Learning Topic Structure in Text Documents using Generative Topic Models


Nitish Srivastava, CS 397 Report. Advisor: Dr Harish Karnick

Abstract— We present a method for estimating the topic structure of a document corpus by combining efficient clustering algorithms with generative topic models. Our method builds the topic structure as a DAG in a layer-wise manner by estimating the number of topics in each layer and constructing topic models for them. It discovers subsumption relations between topic nodes in consecutive layers by mapping the problem to finding maximum edge-weighted cliques on small, sparse graphs. We analyze the sparsity of the induced graphs and give bounds on the running time of our algorithm. Most existing methods for this problem use variants of hierarchical Dirichlet processes to provide a nonparametric prior on the number of topics and estimate that number as the topic model is built. While these models have been shown to perform well under certain metrics, the estimated number of topics is often quite large, limiting the utility of the resulting topic structures. Our approach is to build structures in which the number of topics is smaller and comparable to that in structures built by human experts. We evaluate our method using real-world text datasets.

Keywords— Data management; Data models; Hierarchical systems; Pattern clustering methods

I. INTRODUCTION

Topic models have been shown to be effective tools for the analysis of document corpora, and learning the topic structure is an important problem in this regard. Our method builds arbitrary DAG topic structures over a collection of documents, where nodes represent topics and edges point from a topic to a more specific sub-topic. Our key contributions are the estimation of the number of topics in each layer of the structure and the discovery of subsumption relations between topic nodes in adjacent layers, cast as a maximum-clique problem. Topic models have been proposed which automatically learn the number of topics using various forms of Hierarchical Dirichlet Processes (HDP, Teh et al. [1]), such as nonparametric Bayes PAM [2].
These models place a nonparametric prior on the number of topics. We explore a different strategy for topic structure extraction: we separate the process of learning the topic structure from that of learning the generative topic model. We build the topic structure using simple clustering techniques combined with single-layer topic models. These structures can then be used by the various sophisticated topic models which require the number of topics and the hierarchy structure to be specified; models such as hierarchical Latent Dirichlet Allocation (hLDA, [3]), the Correlated Topic Model (CTM, [4]), the Pachinko Allocation Model (PAM, [5]) and mixture-model extensions of these are some such methods.

Document clustering is a widely studied field; Zhao et al. [6] give an excellent comparison of several clustering paradigms. We use a recursive-bisection-based clustering algorithm [7] to k-cluster a dataset and evaluate the quality of each clustering, and use this to approximate the number of topics at different depths in the DAG. Successive depth levels in the DAG ("layers") represent more fine-grained topics. We learn single-layer topic models for each depth. The topics in consecutive layers are then linked by subsumption relations. We cast the problem of discovering subsumption relations as finding a maximum edge-weighted clique over a sparse topic graph. Besides finding subsumption relations, our formulation also merges very similar topics in the same layer, to further prune the topic structure and make up for errors made in the clustering phase.

In order to validate our results, we use the 20 newsgroups comp5 (ngcomp5) and the NIPS abstracts datasets. These datasets have single-level categories. In order to do a more meaningful evaluation, we collected a hierarchical dataset from a crawl of the webpages linked from the Open Directory Project (ODP). This allows us to compare against human-expert-determined topic structures.

II.
ESTIMATING THE NUMBER OF TOPICS

In this section, we describe a simple approach to obtaining a rough estimate of the number of topics in a document corpus. Nonparametric variants of topic models which use Hierarchical Dirichlet Processes (HDP) [1], such as the nonparametric Bayes Pachinko Allocation Model (NPB-PAM [2]), have been used to estimate the number of topics while building the corresponding topic model. These methods place a nonparametric prior on the number of topics and estimate that number as the model is built. They have been shown to work well in terms of evaluation metrics which involve likelihood estimates (empirical or otherwise) or classification accuracies on held-out data. Inferring the number of topics using HDPs gives more specific topics (as qualitatively shown for the case of NPB-PAM). However, the average number of topics estimated is often very high; for example, NPB-PAM discovers 179 sub-topics in the 20 newsgroups comp5 dataset, which contains 5 topics. Such models are good for application domains where the discovered topics themselves serve only as latent variables. It is not clear whether these topics would correspond to a human-determined ontology. Our goal is to generate a topic structure with typically fewer topic nodes and more resemblance to a topic hierarchy, applicable in domains that use the actual topic structure for organizing unstructured text information.

We begin by obtaining an approximate number of topics using a clustering algorithm based on repeated bisections. Zhao and Karypis [6] have shown that this method produces better clusterings than agglomerative or graph-based methods. A key characteristic of this approach is that it uses a global criterion function whose optimization drives the entire clustering process. The criterion function used is

maximize \sum_{i=1}^{k} \sqrt{ \sum_{u,v \in S_i} u \cdot v }

where each document u is represented as a vector of normalized tf values. We k-cluster the set of documents for each k \in {2, 3, ..., K}, where K is the maximum number of topics allowed in any layer of the topic DAG. Computing a K-clustering takes O(NNZ log K) time, where NNZ is the number of non-zero entries in the document similarity matrix. In the process of K-clustering the dataset, the algorithm also produces k-clusterings for all k < K, since the method performs repeated bisections. Hence, computing all clusterings from 2 to K also takes O(NNZ log K) time. Each clustering C_k corresponds to a set of clusters {C_k^1, C_k^2, ..., C_k^k}. The quality of each C_k is evaluated using the function

F(\mu_{int}, \mu_{ext}, \sigma_{int}) = \frac{ \sum_{i=1}^{k} |C_k^i| \, \mu_{ext}^i \sigma_{int}^i / \mu_{int}^i }{ \sum_{i=1}^{k} |C_k^i| }    (1)

Here \mu_{int} denotes the vector of average intra-cluster similarities, \mu_{ext} the vector of average inter-cluster similarities, and \sigma_{int} the vector of standard deviations of intra-cluster similarity. All similarities are cosine similarities. A lower value of the function indicates a better clustering.

The intuition behind the criterion function is to have low similarity with documents outside the cluster and high similarity, with low standard deviation, inside it. This is to ensure that the clusters are tight. The ratio is weighted by the size of the cluster. This function is computed for each k-clustering, giving a sequence of quality values q_1, q_2, ..., q_K corresponding to the sequence of topic numbers; a low value implies a good clustering. Let {m_1, m_2, ..., m_l} be the sequence of topic numbers corresponding to local minima in the sequence of quality values. These minima represent the regions where the numbers of topics are such that the clustering produced is locally optimal. These are potentially the numbers of topics at increasing depths in the topic structure. The intuition behind this is as follows. Suppose that a document corpus has c topics. If the corpus were clustered into a (c-1)- or (c+1)-clustering, some of the actual c clusters would have to redistribute their documents to fit. This would decrease the tightness of the clusters, which would be captured by the criterion function of Eq. (1). We expect the value of the function to be higher on either side of a good clustering. Figure 1 shows plots of q_1, q_2, ..., q_30 for different datasets.

[Figure 1. Plot of criterion function vs. number of topics: (a) ODP, (b) NIPS, (c) ODP/Computers.]

With this heuristic in place, the goal of the clustering step is to determine {m_1, m_2, ..., m_L}, the sequence of approximate numbers of topics in each layer. For example, in Figure 1(b) the sequence of minima is {6, 9, 14, 18, ...}. These will be chosen as the approximate numbers of topics
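The quality function of Eq. (1) and the extraction of local minima can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the paper: the function names, the dense cosine-similarity-matrix input, the use of the document count as the size normalizer, and the strict-inequality definition of a local minimum are our own assumptions.

```python
import numpy as np

def cluster_quality(sim, labels):
    """Criterion of Eq. (1): size-weighted mean of mu_ext * sigma_int / mu_int.

    sim    : (N, N) cosine-similarity matrix of the documents
    labels : length-N array of cluster ids for one k-clustering
    Lower values indicate tighter, better-separated clusters.
    """
    total, n = 0.0, len(labels)
    for c in np.unique(labels):
        inside = labels == c
        intra = sim[np.ix_(inside, inside)]
        # off-diagonal intra-cluster similarities (drop self-similarity)
        offdiag = intra[~np.eye(inside.sum(), dtype=bool)]
        mu_int, sigma_int = offdiag.mean(), offdiag.std()
        mu_ext = sim[np.ix_(inside, ~inside)].mean()
        total += inside.sum() * mu_ext * sigma_int / mu_int
    return total / n

def local_minima(q):
    """Topic numbers k at which q is a strict local minimum.

    q[0] is assumed to correspond to k = 2, so index i maps to k = i + 2.
    """
    return [i + 2 for i in range(1, len(q) - 1)
            if q[i] < q[i - 1] and q[i] < q[i + 1]]
```

For a quality sequence q_2, ..., q_6 = (5, 3, 4, 2, 6), `local_minima` returns the candidate layer sizes [3, 5].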

in the topic structure corresponding to the NIPS abstracts dataset.

The underlying assumption that mutually exclusive document clusters are representative of topics is not entirely accurate, since the actual topic structure may be more complex and documents may contain content best described as a mixture of different topics. Topics may exhibit significant correlations. However, the purpose of this clustering is not to discover this rich underlying structure but to get a rough estimate of the number of topics in each layer of the topic structure. This estimate is important because it gives a starting point for the structure-building algorithm. More accurate topic numbers, along with the subsumption relations between them, are discovered later. Table ?? shows the top few words in the clusters corresponding to the local minima of the criterion function. While the clusters may not be extremely accurate, they are good enough to justify their use as approximations. Section xxx compares the final topic numbers and structure with these approximations.

III. BUILDING THE DAG

A. Building single-layer topic models

The next step in building the topic structure is to find topic models for each of the topic numbers {m_1, m_2, ..., m_l} found in the previous section. We treat each layer independently and build single-layer topic models for them. Our method allows the use of any statistical topic model as long as it is possible to infer the topic proportions present in each document under that model. In this paper, we demonstrate our method using simple Latent Dirichlet Allocation [8], though it is possible to use more sophisticated models; the simplicity of the model lets us demonstrate the key idea behind the process of discovering subsumption relations more effectively. The framework of building independent models and then subsuming consecutive layers has previously been used by Zavitsanos et al. [9], who demonstrated its use in building document ontologies using LDA. Latent Dirichlet Allocation is a generative probabilistic model in which documents are represented as random mixtures over latent topics.
Topics are themselves modeled as an infinite mixture over an underlying set of topic probabilities. The generative process can be summarized as follows: for each document d, draw topic proportions \theta_d ~ Dirichlet(\alpha); then, for each word position in d, draw a topic z ~ Multinomial(\theta_d) and a word w ~ Multinomial(\beta_z). We use the variational methods described in Blei et al. [8] to infer the topic proportions in each document.

We have the sequence (m_i)_{i=1}^{L} of topic numbers estimated from the clustering step. For each element m_i of this sequence, we build an LDA topic model T_i. For each document d of the corpus, the variational inference method gives a posterior Dirichlet parameter \gamma_d, where the proportion of topic u_i is represented by the component \gamma_{d,u_i}. One way to interpret this is to define a probability function P over the set of topics {u_1, u_2, ..., u_m}, where P(u_i) denotes the probability of occurrence of topic u_i in a random document. Given the document corpus D, this can be estimated as

P(u_i) = \frac{1}{|D|} \sum_{d \in D} \bar{\gamma}_{d,u_i},  where  \bar{\gamma}_{d,u_i} = \frac{\gamma_{d,u_i}}{\sum_{l=1}^{m} \gamma_{d,u_l}}

The joint probability distribution P(u_i, u_j) can be estimated as

P(u_i, u_j) = \frac{1}{|D|} \sum_{d \in D} \bar{\gamma}_{d,u_i} \bar{\gamma}_{d,u_j}    (2)

The joint probabilities are a measure of the co-occurrence of different topics. This idea is exploited next for discovering subsumption relations.

B. Subsumption as a maximum weight clique problem

Now that we have the topic models for each layer, the next step is to find subsumption relations between consecutive layers. The problem of discovering subsumption relations in ontologies has been widely studied. One of the most relevant works in the context of building topic structures is by Zavitsanos et al. [9], who use the idea of conditional independence between sub-topics given a candidate parent topic to decide subsumption relations. In their method, if {u_1, u_2, ..., u_m} is a topic layer and {v_1, v_2, ..., v_n} is the layer of topics just below it, then each pair (v_i, v_j) is assigned a parent u_k if the occurrence of u_k in a document makes the occurrences of topics v_i and v_j conditionally independent. The intuition is that if the co-occurrence of topics v_i and v_j is nullified once we know the parent u_k occurs, then the parent captures what is common between the topics and is therefore a good choice to subsume v_i and v_j.
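These estimates translate directly into a few lines of array code. The sketch below assumes the per-document variational parameters are stacked into a |D| x m matrix; the function name and this layout are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def topic_probabilities(gamma):
    """Estimate marginal and joint topic-occurrence probabilities from the
    per-document variational Dirichlet parameters gamma (shape: |D| x m).

    Each row is normalized so gamma_bar[d, u] is the proportion of topic u
    in document d; averaging over documents gives P(u_i), and averaging the
    products of proportions gives P(u_i, u_j) as in Eq. (2).
    """
    gamma_bar = gamma / gamma.sum(axis=1, keepdims=True)
    p_marginal = gamma_bar.mean(axis=0)                  # P(u_i)
    p_joint = gamma_bar.T @ gamma_bar / gamma.shape[0]   # P(u_i, u_j)
    return p_marginal, p_joint
```

By construction the marginals sum to 1 and the joint matrix is symmetric, which is a useful sanity check on any implementation.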
Such a criterion for subsumption is reasonable, but it suffers from the constraint that it decides the subsumption relations based on pair-wise independence alone. In this paper, we use a different criterion and a more general graph framework that captures pair-wise relations and uses a clique-based method to build higher-order sub-topic sets before determining subsumption relations.

Let there be m parent topics and n child topics in some pair of consecutive layers. For each child topic v_i we find the u_k such that P(v_i | u_k) is maximum, where the probabilities are defined as in Eq. (2). The topic v_i is then directly subsumed under u_k. This process associates each child topic with exactly one parent, building a tree structure. Next, we discover pair-wise subsumption relations, i.e., for each pair of distinct child topics (v_i, v_j) we find the parent topic u_k which best subsumes them; we later use these pair-wise relations to determine global subsumption relations. We would like u_k to subsume v_i and v_j if they could be considered sub-topics of u_k. For them to be sub-topics, two conditions must hold: v_i and v_j should be associated with u_k, but at the same time they must be sufficiently separate so that they can be

considered separate sub-topics. These two conditions can be interpreted as demanding that P(v_i | u_k) and P(v_j | u_k) should both be high but P(v_i, v_j | u_k) must be low, i.e., the joint probability that v_i and v_j occur together given u_k must be low. In other words, each sub-topic individually must have a high conditional probability of occurrence given that the parent topic occurs (indicating that the sub-topics capture part of the parent topic's vocabulary), but at the same time the two sub-topics must not occur together very often given the parent (if they do, they are not separate enough to be considered individual sub-topics). In this sense, the occurrences of v_i and v_j must be negatively correlated given u_k. Hence for every (v_i, v_j), we can find a

u_k = argmax_{u} ( P(v_i | u) P(v_j | u) - P(v_i, v_j | u) )    (3)

subject to the objective value being at least th, where th is a positive threshold on the objective function to suppress very small values. If no such u_k exists, then (v_i, v_j) cannot be subsumed under any parent topic. A positive threshold ensures that the pair of sub-topics respects the two conditions above. A slightly different way to look at this objective function is to note that we want

P(v_j | v_i, u_k) \le P(v_j | u_k)
\Leftrightarrow P(v_i | u_k) P(v_j | v_i, u_k) \le P(v_i | u_k) P(v_j | u_k)
\Leftrightarrow P(v_i, v_j | u_k) \le P(v_i | u_k) P(v_j | u_k)

and the further the left side falls below the right, the better.

Solving the optimization problem in Eq. (3) gives the best parent topic under which a pair of child topics may be subsumed. We emphasize that neither of these child topics might actually be subsumed under this parent in the final hierarchy; this is just the best parent that could subsume this pair. Next, we construct a graph G_k for each parent topic u_k. G_k consists of those sub-topics which occur as part of a pair that is best subsumed under u_k, i.e.,

V(G_k) = { i : u_k subsumes (v_i, v_j) for some j }    (4)
E(G_k) = { (i, j) : u_k subsumes (v_i, v_j) }    (5)
wt(i, j) = P(v_i | u_k) P(v_j | u_k) - P(v_i, v_j | u_k)    (6)

Figure 2 shows some examples of such graphs for the Reuters dataset. The three graphs correspond to the three parent topics; the child layer consists of 8 topics. Note that any clique C in such a graph represents a set of topics which are mutually negatively correlated in the sense of Eq. (3). We want only the largest such set to be subsumed under the corresponding parent. For example, in the first graph in Figure 2, topics 8 and 5 are connected by an edge, meaning that they are negatively correlated; topics 8 and 4 are also connected. However, topics 4 and 5 are not connected, which means they are not sufficiently negatively correlated to be considered separate sub-topics of the parent topic. Hence we would not want both 4 and 5 to be subsumed. The cliques consisting of topics (8,1,4) or (8,2,4) are better choices, since each topic is then sufficiently different from the others; topic 5 would, in essence, be captured by topic 4. The weights associated with the edges denote the strengths of the corresponding negative correlations. All vertices in the maximum edge-weighted clique should be subsumed under the parent topic. Hence the problem of determining the best subsumption set for any parent topic k reduces to finding the maximum weight clique in the graph G_k.

Though this problem is hard to solve in general, we observed that the graphs G_k induced by the real datasets we experimented on are typically very sparse (no more than 1 connected component and no more than 6-7 vertices in any graph). Hence, even an exponential-time algorithm is acceptable, keeping in mind that this ontology-building exercise is done offline. The pair-wise independence criterion used by Zavitsanos et al. [9] can be seen as a special case of this method, in which only the maximum 2-clique is used for building subsumption relations. The next section explores this graph and the weight function in more detail and gives bounds on the sparsity of the induced graphs.

C. Sparsity bounds

Let there be m parent topics and n child topics in some pair of consecutive layers. An edge (i, j) can be in G_k for a unique value of k, since only the best candidate parent u_k subsumes (v_i, v_j). There are O(n^2) such edges distributed across m graphs. Hence, the expected number of edges in each graph is O(n^2 / m). Since m and n are the numbers of topics in adjacent layers, we can assume that m = O(n); therefore, the expected number of edges per graph is O(n), which means the maximum clique can be of size O(\sqrt{n}). Consider the function

f_k(i, j) = P(i | k) P(j | k) - P(i, j | k)

where i and j refer to child topics and k refers to the parent. If the occurrences of topics i and j are independent given k, then P(i, j | k) = P(i | k) P(j | k) and hence f_k(i, j) = 0. If their occurrences are negatively correlated, f_k(i, j) > 0.
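Since the induced graphs rarely exceed 6-7 vertices, exhaustively enumerating vertex subsets is affordable. The following is a sketch of such a brute-force maximum edge-weighted clique search; the representation of G_k as a dictionary of frozenset edge weights is our own choice, not the paper's.

```python
from itertools import combinations

def max_weight_clique(vertices, wt):
    """Exhaustively find the clique with maximum total edge weight.

    vertices : iterable of vertex ids of G_k
    wt       : dict mapping frozenset({i, j}) -> edge weight; pairs absent
               from wt are non-edges.  Weights are positive by construction
    (Eq. (3) keeps only pairs whose objective exceeds the threshold th),
    so starting from weight 0 and an empty clique is safe.  Exhaustive
    search is feasible because the graphs have at most 6-7 vertices.
    """
    best, best_w = (), 0.0
    vs = list(vertices)
    for r in range(2, len(vs) + 1):
        for subset in combinations(vs, r):
            pairs = [frozenset(p) for p in combinations(subset, 2)]
            if all(p in wt for p in pairs):          # subset is a clique
                w = sum(wt[p] for p in pairs)
                if w > best_w:
                    best, best_w = subset, w
    return best, best_w
```

On the Figure 2 example, with edges 8-5, 8-4, 8-1, 1-4, 8-2 and 2-4 (and no 4-5 edge), the search returns the clique {8, 1, 4} whenever its edges carry the larger weights, matching the discussion above.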

[Figure 2. Determining subsumption relations: (a) graphs G_k for Reuters 25 topics; (b) top few layers of the ontology.]

Also,

\sum_{i,j} f_k(i, j) = \sum_{i,j} ( P(i | k) P(j | k) - P(i, j | k) )
                     = \sum_i P(i | k) \sum_j P(j | k) - \sum_{i,j} P(i, j | k)
                     = 1 \cdot 1 - 1 = 0

D. Model Trimming

So far we have built the topic structure assuming that the numbers of topics chosen in the clustering step were correct. Clustering methods are, however, susceptible to errors and use a very primitive notion of similarity. The objective function of Eq. (3) can be put to further use for trimming the topic structure. This time, we look at it as

w_{i,j}(k) = P(v_i | u_k) P(v_j | u_k) - P(v_i, v_j | u_k)

Recall that a high value of w_{i,j}(k) means that topics v_i and v_j are separated enough from each other, and yet associated sufficiently with u_k, to be considered its sub-topics. A low value, on the other hand, means that the topics are quite similar and, as far as u_k is concerned, might be merged. This can be seen as a scheme where each parent topic u_k votes on whether (v_i, v_j) should be merged, the value of the vote being w_{i,j}(k). A high vote value means that the corresponding parent topic wants the pair to be kept separate; a low value indicates that they should be merged. These votes are then pooled together to get a final value:

W(i, j) = \sum_{k=1}^{m} w_{i,j}(k)

If W(i, j) is positive, the pair is left alone; otherwise it is merged. A possible improvement is to weigh the vote of each parent k by P(v_i | u_k) P(v_j | u_k). This factor ensures that the vote of a parent topic whose occurrence is more correlated with the occurrences of the child topics is taken more seriously, the idea being that such a parent carries statistically more weight than a parent topic which has nothing to do with this pair of child topics:

W(i, j) = \sum_{k=1}^{m} P(v_i | u_k) P(v_j | u_k) w_{i,j}(k)

Merge (v_i, v_j) if W(i, j) < 0. While the intuition behind the method seems sound, more experiments need to be done to evaluate it.

IV. EVALUATION

The aim of this study is to provide a method to build semantically meaningful topic structures which are similar to those built by human experts.
We compare the topic structures determined using our method with human-built ones.

A. Datasets

We chose a subset of the Open Directory Project (ODP) category structure and crawled a random selection of webpages linked under those categories. This data was collected using the RDF dump available from the ODP homepage, accessed on 10th April. Besides this, we use standard datasets such as the NIPS abstracts dataset, Reuters, and the 20 newsgroups dataset.

[Figure 3. Plot of criterion function vs. number of topics for different subsets of the Reuters dataset: (a) Reuters-13, (b) Reuters-25.]

[Figure 4. Top 3 layers of the ontology for the Reuters dataset. Black lines indicate the major parent; blue lines indicate a lesser degree of association.]

B. Quality of clustering

Reuters topic dataset. Minima of the criterion function occur at 3, 8, 12, 20 and 24 topics; see Figure 3. The top few words (shown as stems) for a 3-topic clustering using LDA:

Topic 1: ship, offic, union, strike, gulf
Topic 2: tonne, mln, wheat, sugar, export, grain
Topic 3: oil, price, mln, dlr, pct, produc

Top few words for an 8-topic clustering using LDA:

Topic 1: price, market, dlr, futur, exchange, trade
Topic 2: oil, mln, pct, dlr, gold
Topic 3: tonne, export, sugar, wheat, mln
Topic 4: ship, strike, compan, port, union
Topic 5: oil, opec, price, mln, bpd, saud
Topic 6: coffee, produc, export, quota, stock, cocoa
Topic 7: mln, crop, tonne, pct, product, grain, plant
Topic 8: propos, offic, farm, wheat, agricultur, grain

Figure 2 shows the graphs generated for deciding subsumption between the 8-topic layer and the 3-topic layer, and Figure 4 shows the first 3 layers of the generated ontology tree. The subsumption seems quite appropriate considering the topic keywords given above.

ODP/Comp. This data was collected by us using the RDF dump available from the Open Directory Project page, accessed on 10th April. A subset of the science directory URLs was crawled and converted to bags of words. Topic structure: Algorithms; Artificial Intelligence: Academic Departments, People, Conferences and Events, Machine Learning, Natural Language, Neural Networks, Vision; Hardware: Buses, Cables, Peripherals (Audio, Keyboards, Displays, Printers); Open Source; Systems; Internet: Organizations, Searching, Web design and development;

Security: Firewalls, Intrusion Detection Systems, Malicious software.

Figure 1(c) shows the plot of the cluster-quality criterion function for this dataset. A significant amount of data cleaning needs to be done before this set is usable for further experiments.

This presents a very qualitative analysis of the method; a more quantitative comparison needs to be done to fully validate it. We are currently working on implementing other methods, such as nonparametric LDA and nonparametric PAM, which also try to estimate the size and structure of a topic hierarchy. We plan to compare our method against these on metrics such as perplexity and deviation from human-defined structure.

V. CONCLUSIONS AND FUTURE WORK

Our method estimates the size and structure of the topic space underlying a set of documents. However, an important step in using this method for many practical applications is to develop a method for category naming, to label each of the discovered categories. Our method currently lacks a formal justification and a quantitative analysis; we plan to work further in these directions.

REFERENCES

[1] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1566-1581, 2006.
[2] W. Li, D. Blei, and A. McCallum, "Nonparametric Bayes pachinko allocation," in UAI '07, 2007. [Online]. Available: mccallum/papers/npbpamua2007s.pdf
[3] D. Blei, T. L. Griffiths, M. I. Jordan, and J. B. Tenenbaum, "Hierarchical topic models and the nested Chinese restaurant process," in Advances in Neural Information Processing Systems 16, 2003. [Online]. Available: gruffydd/papers/ncrp.pdf
[4] D. M. Blei and J. D. Lafferty, "Correlated topic models," in NIPS, 2005.
[5] W. Li and A. McCallum, "Pachinko allocation: DAG-structured mixture models of topic correlations," in ICML '06: Proceedings of the 23rd International Conference on Machine Learning. New York, NY, USA: ACM, 2006, pp. 577-584.
[6] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Machine Learning, vol. 55, no. 3, pp. 311-331, 2004.
[7] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in KDD Workshop on Text Mining, 2000.
[8] D. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[9] E. Zavitsanos, S. Petridis, G. Paliouras, and G. A. Vouros, "Determining automatically the size of learned ontologies," in ECAI 2008, ser. Frontiers in Artificial Intelligence and Applications, M. Ghallab, C. D. Spyropoulos, N. Fakotakis, and N. M. Avouris, Eds. IOS Press, 2008.


More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Deep Classification in Large-scale Text Hierarchies

Deep Classification in Large-scale Text Hierarchies Deep Classfcaton n Large-scale Text Herarches Gu-Rong Xue Dkan Xng Qang Yang 2 Yong Yu Dept. of Computer Scence and Engneerng Shangha Jao-Tong Unversty {grxue, dkxng, yyu}@apex.sjtu.edu.cn 2 Hong Kong

More information

Query Clustering Using a Hybrid Query Similarity Measure

Query Clustering Using a Hybrid Query Similarity Measure Query clusterng usng a hybrd query smlarty measure Fu. L., Goh, D.H., & Foo, S. (2004). WSEAS Transacton on Computers, 3(3), 700-705. Query Clusterng Usng a Hybrd Query Smlarty Measure Ln Fu, Don Hoe-Lan

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc.

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. [Type text] [Type text] [Type text] ISSN : 0974-74 Volume 0 Issue BoTechnology 04 An Indan Journal FULL PAPER BTAIJ 0() 04 [684-689] Revew on Chna s sports ndustry fnancng market based on market -orented

More information

CSCI 5417 Information Retrieval Systems Jim Martin!

CSCI 5417 Information Retrieval Systems Jim Martin! CSCI 5417 Informaton Retreval Systems Jm Martn! Lecture 11 9/29/2011 Today 9/29 Classfcaton Naïve Bayes classfcaton Ungram LM 1 Where we are... Bascs of ad hoc retreval Indexng Term weghtng/scorng Cosne

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL GROUP ANALYSIS Martn M. Mont UCLA Psychology NITP AGGREGATING MULTIPLE SUBJECTS When we conduct mult-subject analyss we are tryng to understand whether an effect s sgnfcant across a group of people. Whether

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Graph-based Clustering

Graph-based Clustering Graphbased Clusterng Transform the data nto a graph representaton ertces are the data ponts to be clustered Edges are eghted based on smlarty beteen data ponts Graph parttonng Þ Each connected component

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

A Post Randomization Framework for Privacy-Preserving Bayesian. Network Parameter Learning

A Post Randomization Framework for Privacy-Preserving Bayesian. Network Parameter Learning A Post Randomzaton Framework for Prvacy-Preservng Bayesan Network Parameter Learnng JIANJIE MA K.SIVAKUMAR School Electrcal Engneerng and Computer Scence, Washngton State Unversty Pullman, WA. 9964-75

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Adaptive Transfer Learning

Adaptive Transfer Learning Adaptve Transfer Learnng Bn Cao, Snno Jaln Pan, Yu Zhang, Dt-Yan Yeung, Qang Yang Hong Kong Unversty of Scence and Technology Clear Water Bay, Kowloon, Hong Kong {caobn,snnopan,zhangyu,dyyeung,qyang}@cse.ust.hk

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Clustering is a discovery process in data mining.

Clustering is a discovery process in data mining. Cover Feature Chameleon: Herarchcal Clusterng Usng Dynamc Modelng Many advanced algorthms have dffculty dealng wth hghly varable clusters that do not follow a preconceved model. By basng ts selectons on

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Backpropagation: In Search of Performance Parameters

Backpropagation: In Search of Performance Parameters Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,

More information

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE

SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE SHAPE RECOGNITION METHOD BASED ON THE k-nearest NEIGHBOR RULE Dorna Purcaru Faculty of Automaton, Computers and Electroncs Unersty of Craoa 13 Al. I. Cuza Street, Craoa RO-1100 ROMANIA E-mal: dpurcaru@electroncs.uc.ro

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Keyword-based Document Clustering

Keyword-based Document Clustering Keyword-based ocument lusterng Seung-Shk Kang School of omputer Scence Kookmn Unversty & AIrc hungnung-dong Songbuk-gu Seoul 36-72 Korea sskang@kookmn.ac.kr Abstract ocument clusterng s an aggregaton of

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Supervsed vs. Unsupervsed Learnng Up to now we consdered supervsed learnng scenaro, where we are gven 1. samples 1,, n 2. class labels for all samples 1,, n Ths s also

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

TESTING AND IMPROVING LOCAL ADAPTIVE IMPORTANCE SAMPLING IN LJF LOCAL-JT IN MULTIPLY SECTIONED BAYESIAN NETWORKS

TESTING AND IMPROVING LOCAL ADAPTIVE IMPORTANCE SAMPLING IN LJF LOCAL-JT IN MULTIPLY SECTIONED BAYESIAN NETWORKS TESTING AND IMPROVING LOCAL ADAPTIVE IMPORTANCE SAMPLING IN LJF LOCAL-JT IN MULTIPLY SECTIONED BAYESIAN NETWORKS Dan Wu 1 and Sona Bhatt 2 1 School of Computer Scence Unversty of Wndsor, Wndsor, Ontaro

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence

Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence 2nd Internatonal Conference on Software Engneerng, Knowledge Engneerng and Informaton Engneerng (SEKEIE 204) Text Smlarty Computng Based on LDA Topc Model and Word Co-occurrence Mngla Shao School of Computer,

More information

Greedy Technique - Definition

Greedy Technique - Definition Greedy Technque Greedy Technque - Defnton The greedy method s a general algorthm desgn paradgm, bult on the follong elements: confguratons: dfferent choces, collectons, or values to fnd objectve functon:

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches Proceedngs of the Internatonal Conference on Cognton and Recognton Fuzzy Flterng Algorthms for Image Processng: Performance Evaluaton of Varous Approaches Rajoo Pandey and Umesh Ghanekar Department of

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Learning an Image Manifold for Retrieval

Learning an Image Manifold for Retrieval Learnng an Image Manfold for Retreval Xaofe He*, We-Yng Ma, and Hong-Jang Zhang Mcrosoft Research Asa Bejng, Chna, 100080 {wyma,hjzhang}@mcrosoft.com *Department of Computer Scence, The Unversty of Chcago

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

GENETIC ALGORITHMS APPLIED FOR PATTERN GENERATION FOR DOWNHOLE DYNAMOMETER CARDS

GENETIC ALGORITHMS APPLIED FOR PATTERN GENERATION FOR DOWNHOLE DYNAMOMETER CARDS GENETIC ALGORITHMS APPLIED FOR PATTERN GENERATION FOR DOWNHOLE DYNAMOMETER CARDS L. Schntman 1 ; B.C.Brandao 1 ; H.Lepkson 1 ; J.A.M. Felppe de Souza 2 ; J.F.S.Correa 3 1 Unversdade Federal da Baha- Brazl

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information