Weighted Feature Subset Non-Negative Matrix Factorization and its Applications to Document Understanding


2010 IEEE International Conference on Data Mining

Weighted Feature Subset Non-Negative Matrix Factorization and its Applications to Document Understanding

Dingding Wang, Tao Li
School of Computing and Information Sciences, Florida International University, Miami, FL, USA
Email: {dwang003,taoli}@cs.fiu.edu

Chris Ding
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA
Email: chqding@uta.edu

Abstract: Keyword (feature) selection enhances and improves many Information Retrieval (IR) tasks such as document categorization, automatic topic discovery, etc. The problem of keyword selection is usually solved using supervised algorithms. In this paper, we propose an unsupervised approach that combines keyword selection and document clustering (topic discovery). The proposed approach extends non-negative matrix factorization (NMF) by incorporating a weight matrix to indicate the importance of the keywords. The approach is further extended to a weighted version in which each document is also assigned a weight assessing its importance in the cluster. This work considers both theoretical and empirical weighted feature subset selection for NMF and draws the connection between unsupervised feature selection and data clustering. We apply our proposed approaches to various document understanding tasks including document clustering, summarization, and visualization. Experimental results demonstrate the effectiveness of our approach for these tasks.

Keywords: non-negative matrix factorization; feature selection; weighted feature subset non-negative matrix factorization.

I. INTRODUCTION

Recently, many research efforts have been reported on developing efficient and effective techniques for analyzing large document collections. Among these efforts, non-negative matrix factorization (NMF) has been shown to be useful for different document understanding problems, e.g., document clustering [40] and summarization [38]. The success of NMF is largely due to the newly discovered ability of NMF to solve challenging data mining and machine learning problems.
In particular, NMF with the sum-of-squared-error cost function is equivalent to relaxed K-means clustering, the most widely used unsupervised learning algorithm [8]. In addition, NMF with the I-divergence cost function is equivalent to probabilistic latent semantic indexing (PLSI) [22], another unsupervised learning method popularly used in text analysis [10], [14]. Furthermore, NMF is able to model widely varying data distributions and can perform both hard and soft clustering simultaneously. Several variants of NMF with different forms of factorization and regularization have also been developed and applied to many document analysis tasks [11], [18], [38], [39]. Although NMF and its variants have shown their effectiveness in these tasks, they usually perform data clustering over the full feature space. As we know, keyword (feature) selection can enhance and improve many document applications such as document categorization and automatic topic discovery. However, most existing keyword selection techniques are designed for supervised classification problems. In this paper, we extend NMF to solve a novel problem of clustering with double labeling of important features and data points: each data point is marked as belonging to one of the groups, and each feature and data point is also weighted to assess its importance. In particular, we first extend NMF to feature subset NMF, which combines keyword selection and document clustering (topic discovery). The proposed extension incorporates a weight matrix indicating the importance of the keywords. It considers feature subset selection for NMF both theoretically and empirically, and draws the connection between unsupervised feature selection and data clustering. The selected keywords are discriminative across different topics from a global perspective, unlike those obtained in co-clustering, which typically associate strongly with one cluster and are absent from the others.
Also, the selected keywords are not linear combinations of words like those obtained in Latent Semantic Indexing (LSI) [17]: our selected words provide clear semantic meanings of the key features, while LSI features combine different words together and are not easy to interpret. We further extend feature subset NMF into a weighted version which assumes documents (data points) contribute differently to the clustering process, i.e., some documents are tightly related to certain topics, while some can be considered outliers. Finally, we apply the proposed approaches to document understanding problems such as document clustering, summarization, and visualization. Comprehensive experiments demonstrate the effectiveness of our approaches.

The rest of the paper is organized as follows. Section II discusses related work on the NMF framework and various

document understanding tasks. In Section III, we derive a generic theorem on the NMF algorithm. Sections IV and V propose our (weighted) feature subset NMF. An illustrative example is shown in Section VI, and comprehensive experiments on document clustering, summarization, and visualization are conducted in Section VII. Section VIII concludes.

II. RELATED WORK

A. NMF Framework

NMF has been shown to be very useful for data clustering. Lee and Seung [24] proposed the NMF problem and showed that it could be solved by a multiplicative update algorithm. In general, the NMF algorithm attempts to find the subspaces in which the majority of the data points lie. Let the input data matrix X = (x_1, ..., x_n) contain the collection of n nonnegative data column vectors. NMF aims to factorize X into two nonnegative matrices, X ≈ FG^T, where X ∈ R_+^{p×n}, F ∈ R_+^{p×k}, and G ∈ R_+^{n×k}. There are other matrix factorizations which differ from standard NMF in the restrictions on the matrix factors and forms. We list them as follows:

Convex-NMF:       X ≈ X W G^T
Tri-Factorization: X ≈ F S G^T
WFS-NMF:          min ||X − FG^T||²_W

Note that WFS-NMF is our proposed algorithm, which extends NMF by incorporating a weight matrix to indicate the importance of the keywords and data points. The details of the algorithm will be discussed in the following sections. A preliminary study of feature subset NMF which only considers the importance of keywords was presented as a two-page poster [37]. The relations between NMF and some of the other matrix factorization and clustering algorithms have been studied in [25].
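As a concrete illustration of the standard factorization X ≈ FG^T discussed above, the following is a minimal NumPy sketch of the Lee-Seung multiplicative updates for the Frobenius objective (the matrix sizes, iteration count, and random initialization below are illustrative choices, not from the paper):

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for X ~= F @ G.T (Frobenius loss).

    X : (p, n) nonnegative data matrix; k : number of factors.
    Returns nonnegative F (p, k) and G (n, k).
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    F = rng.random((p, k)) + eps
    G = rng.random((n, k)) + eps
    for _ in range(n_iter):
        # updates keep F, G nonnegative and monotonically reduce the loss
        G *= (X.T @ F) / (G @ (F.T @ F) + eps)
        F *= (X @ G) / (F @ (G.T @ G) + eps)
    return F, G

X = np.abs(np.random.default_rng(1).random((8, 6)))
F, G = nmf(X, k=2)
err = np.linalg.norm(X - F @ G.T)
```

A rank-2 approximation of a random 8x6 nonnegative matrix will not be exact, but the reconstruction error stays well below the norm of X itself.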
In general, (1) orthogonal NMF is equivalent to K-means clustering; (2) G-orthogonal NMF, semi-NMF, and convex-NMF are identical to relaxed K-means clustering; (3) tri-factorization with explicit orthogonality constraints can be transformed into 2-factor NMF; (4) PLSI [22] (which is further developed into the more comprehensive Latent Dirichlet Allocation (LDA) model [1]) solves the problem of NMF with Kullback-Leibler divergence; (5) our proposed WFS-NMF combines clustering with double labeling of important features and samples by assigning different weights to each row and column based on the weight matrix.

B. Document Understanding Applications

There exist various document understanding applications in the IR community. Here we briefly review some popular tasks, including document clustering, document summarization, and visualization. In this paper, we apply our proposed approaches to these three applications.

Document Clustering: The problem of document clustering has been extensively studied. Given a collection of documents, document clustering partitions them into different groups (called clusters) so that similar documents belong to the same group while documents in different clusters are dissimilar. Traditional clustering techniques such as hierarchical and partitioning methods have been used for clustering documents, e.g., hierarchical agglomerative clustering (HAC) [12] and K-means clustering [20]. Model-based clustering methods such as PLSI and the more comprehensive LDA have also been successfully applied to document clustering [22], [1]. Recently, matrix- and graph-based clustering algorithms have emerged as promising approaches [39]; two representative examples are spectral clustering [34] and non-negative matrix factorization (NMF) [24], [40]. Co-clustering algorithms have also been proposed which aim at clustering documents and terms simultaneously by making use of the dual relationship information [5], [7], [43].
Subspace clustering algorithms have also been developed for discovering low-dimensional clusters in high-dimensional document space [26], [23].

Multi-Document Summarization: Multi-document summarization aims to generate a short summary for a collection of documents reflecting the major or query-relevant information. Existing summarization methods usually rank the sentences in the documents according to their salience scores calculated from a set of predefined linguistic features, such as term frequency-inverse sentence frequency (TF-ISF) [28], sentence or term position [4], and number of keywords [4]. Gong et al. [16] propose a generic method using latent semantic analysis (LSA) to select highly ranked sentences for summarization. Goldstein et al. [15] propose a maximal marginal relevance (MMR) method that summarizes documents based on the cosine similarity between a query and a sentence, and between the sentence and previously selected sentences. Other approaches include NMF-based summarization [30], Conditional Random Field (CRF) based summarization [33], and a hidden Markov model (HMM) based method [4]. In addition, graph-ranking based approaches, similar in spirit to PageRank, have been proposed to summarize documents using the sentence relationships [13].

Document Visualization: Document visualization focuses on displaying document relationships using various presentation techniques, which helps users understand and navigate information easily. Techniques have been developed to map a document collection into a multivariate space. Typical systems for document visualization include the Galaxy of News [32], Jigsaw [35], and ThemeRiver [21].

In this paper, we extend the NMF model to allow unsupervised feature selection and data clustering and ranking to be conducted simultaneously. We apply the proposed approaches in three document understanding applications to demonstrate their effectiveness for improving document understanding.

III. A GENERIC THEOREM ON THE NMF ALGORITHM

In this paper, we derive several algorithms for NMF problems. Here we first provide a generic theorem on the NMF algorithm; we will use this result repeatedly later. For the following optimization problem

min_{H ≥ 0} J(H) = Tr[−2 R^T H + H^T P H Q],   (1)

where P and Q are constant matrices, the optimal solution for H is given by the following updating algorithm:

H_ik ← H_ik R_ik / (P H Q)_ik.   (2)

Theorem 1. If the algorithm converges, the converged solution satisfies the KKT condition.

Proof. We minimize the Lagrangian function

L(H) = Tr[−2 R^T H + H^T P H Q − 2 β H],

where β = (β_ik) are the Lagrangian multipliers enforcing H_ik ≥ 0. Setting

∂L/∂H_ik = (−2R + 2PHQ − 2β)_ik = 0,

the KKT complementary slackness condition β_ik H_ik = 0 becomes

(−R + PHQ)_ik H_ik = 0.   (3)

Now, when the iterative solution of H converges, it satisfies

H_ik = H_ik R_ik / (PHQ)_ik.   (4)

One can see that Eq.(4) is identical to Eq.(3), whether H_ik = 0 or not. This proves that the converged solution satisfies the KKT condition.

Theorem 2. The updating algorithm of Eq.(2) converges.

Proof. We use the auxiliary function approach [24]. A function Z(H, H') is called an auxiliary function of J(H) if it satisfies

Z(H, H') ≥ J(H),   Z(H, H) = J(H),   (5)

for any H, H'. Define

H^(t+1) = argmin_H Z(H, H^(t)),   (6)

where we note that we require the global minimum. By construction, we have J(H^(t)) = Z(H^(t), H^(t)) ≥ Z(H^(t+1), H^(t)) ≥ J(H^(t+1)). Thus J(H^(t)) is monotonically non-increasing. The key is to find (1) an appropriate Z(H, H') and (2) its global minimum. Using the matrix inequality

Tr(H^T P H Q) ≤ Σ_ik (P H' Q)_ik H_ik² / H'_ik,

where H, H', P, Q ≥ 0 and P = P^T, Q = Q^T, we can see that

Z(H, H') = −2 Σ_ik R_ik H_ik + Σ_ik (P H' Q)_ik H_ik² / H'_ik

is an auxiliary function of J(H) of Eq.(1). Now we solve Eq.(6) by identifying H^(t+1) = H and H^(t) = H'.
Setting

∂Z/∂H_ik = −2 R_ik + 2 (P H' Q)_ik H_ik / H'_ik = 0,   (7)

we obtain

H_ik = H'_ik R_ik / (P H' Q)_ik.   (8)

The second derivatives are

∂²Z / (∂H_ik ∂H_jl) = 2 (P H' Q)_ik / H'_ik · δ_ij δ_kl,

which form a positive semi-definite matrix, ensuring that the local optimum of Eq.(8) obtained from Eq.(7) is the global minimum of Eq.(6). Thus updating H using Eq.(8) decreases J(H). One can see that Eq.(8) is identical to Eq.(2).

IV. FEATURE SUBSET NMF (FS-NMF)

A. Objective

Let X = (x_1, ..., x_n) contain n documents with m keywords (features). In general, NMF factorizes the input nonnegative data matrix X into two nonnegative matrices, X ≈ FG^T, where G ∈ R_+^{n×k} is the cluster indicator matrix for clustering the columns of X, and F = (f_1, ..., f_k) ∈ R_+^{m×k} contains the k cluster centroids. In this paper, we propose a new objective to simultaneously factorize X and rank the features in X as follows:

min_{W≥0, F≥0, G≥0} ||X − FG^T||²_W,  s.t. Σ_j W_j^α = 1,   (9)

where W ∈ R_+^{m×m} is a diagonal matrix indicating the weights of the rows (keywords or features) of X, and α is a parameter (set to 0.7 empirically).

B. Optimization

Minimizing Eq.(9) with respect to W, F, and G jointly has no closed-form solution. We therefore optimize the objective with respect to one variable while fixing the other variables. This procedure repeats until convergence.
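Before deriving the specific update rules, the generic update of Eq.(2) and the monotone decrease guaranteed by Theorem 2 can be sanity-checked numerically. This is an illustrative sketch with random symmetric nonnegative P and Q (all sizes and seeds are ours, not from the paper):

```python
import numpy as np

def generic_update(H, R, P, Q, eps=1e-12):
    # One step of Eq.(2): H_ik <- H_ik * R_ik / (P H Q)_ik
    return H * R / (P @ H @ Q + eps)

def J(H, R, P, Q):
    # Objective of Eq.(1): Tr(-2 R^T H + H^T P H Q)
    return np.trace(-2.0 * R.T @ H + H.T @ P @ H @ Q)

rng = np.random.default_rng(0)
A = rng.random((6, 6)); P = A @ A.T    # symmetric, nonnegative
B = rng.random((3, 3)); Q = B @ B.T    # symmetric, nonnegative
R = rng.random((6, 3))
H = rng.random((6, 3))

values = [J(H, R, P, Q)]
for _ in range(50):
    H = generic_update(H, R, P, Q)
    values.append(J(H, R, P, Q))
```

The recorded objective values should be non-increasing, matching Theorem 2, and H stays nonnegative throughout.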

1) Computation of W: Optimizing Eq.(9) with respect to W is equivalent to optimizing

J_1 = Σ_j W_j u_j − λ (Σ_j W_j^α − 1),   u_j = Σ_i (X − FG^T)²_ij.

From the KKT condition (∂J_1/∂W_j) W_j = (u_j − λ α W_j^{α−1}) W_j = 0, we obtain the following updating formula:

W_j = u_j^{1/(α−1)} / [Σ_j u_j^{α/(α−1)}]^{1/α}.   (10)

2) Computation of G: Optimizing Eq.(9) with respect to G is equivalent to optimizing

J_2(G) = Tr(X^T W X − 2 G F^T W X + F^T W F G^T G).

Using the generic algorithm in Section III, we obtain the following updating formula:

G_jk ← G_jk (X^T W F)_jk / (G F^T W F)_jk.   (11)

3) Computation of F: Optimizing Eq.(9) with respect to F is equivalent to optimizing

J_3(F) = Tr[W X X^T − 2 W X G F^T + W F G^T G F^T].

Using the generic algorithm in Section III, we obtain the following updating formula:

F_ik ← F_ik (W X G)_ik / (W F G^T G)_ik.   (12)

4) Algorithm Procedure: The detailed procedure of FS-NMF is listed as Algorithm 1.

Algorithm 1 FS-NMF Algorithm Description
Input: X: word-document matrix; K: the number of clusters
Output: F: word cluster indicator matrix; G: document cluster indicator matrix; W: word weight matrix
1: Initialize W = I and initialize (F, G) as the output of standard NMF
2: repeat
3:   Update W by Eq.(10), with u_j = Σ_i (X − FG^T)²_ij
4:   Update G by Eq.(11)
5:   Update F by Eq.(12)
6: until convergence

V. WEIGHTED FEATURE SUBSET NMF (WFS-NMF)

In Section IV, different weights are assigned to the term features, indicating the importance of the keywords; however, all documents are treated equally. This assumption no longer holds when different documents carry different importance. Thus, we extend our algorithm to a weighted version in which each document is also assigned a weight. Similar to Eq.(9), the objective of weighted FS-NMF can be written as

min_{W≥0, F≥0, G≥0} ||X − FG^T||²_W,

where we set W_ij = a_i b_j. This becomes

min_{a,b≥0, F≥0, G≥0} Σ_ij (X − FG^T)²_ij a_i b_j,  s.t. Σ_i a_i^α = 1, Σ_j b_j^β = 1,   (13)

where α and β are two parameters with 0 < α < 1, 0 < β < 1.

A. Optimization

1) Computation of W: Since W_ij = a_i b_j, we optimize a = (a_1, ..., a_m) first. Optimizing Eq.(13) with respect to a is equivalent to optimizing

J_a = Σ_i u_i a_i − λ (Σ_i a_i^α − 1),   u_i = Σ_j (X − FG^T)²_ij b_j.
This optimization has been analyzed in Section IV-B. The optimal solution for a is given by

a_i = u_i^{1/(α−1)} / [Σ_i u_i^{α/(α−1)}]^{1/α}.   (14)

We now optimize the objective of Eq.(13) with respect to b = (b_1, ..., b_n), which is equivalent to optimizing

J_b = Σ_j v_j b_j − λ (Σ_j b_j^β − 1),   v_j = Σ_i (X − FG^T)²_ij a_i.

The optimal solution for b is given by

b_j = v_j^{1/(β−1)} / [Σ_j v_j^{β/(β−1)}]^{1/β}.   (15)

2) Computation of F: Let A = diag(a_1, a_2, ..., a_m) and B = diag(b_1, b_2, ..., b_n). Optimizing Eq.(13) with respect to F is equivalent to optimizing

J_4(F) = Σ_ij a_i (X − FG^T)²_ij b_j = ||A^{1/2} (X − FG^T) B^{1/2}||²
       = Tr(X^T A X B − 2 G^T B X^T A F + F^T A F G^T B G).   (16)

Using the generic algorithm of Section III, we obtain

F_ik ← F_ik (A X B G)_ik / (A F G^T B G)_ik.   (17)
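The closed-form weight updates of Eqs.(14) and (15) share one functional form; a small sketch (the function name and the toy residuals below are ours, not from the paper):

```python
import numpy as np

def weight_update(E2, other_w, alpha, eps=1e-12):
    """Closed-form weights of Eqs.(14)/(15).

    E2      : squared residuals (X - F G^T)**2, oriented so that rows are
              the entries being weighted (pass E2.T when updating b).
    other_w : current weights on the opposite axis (b when updating a).
    alpha   : exponent in the constraint sum(w**alpha) = 1, with 0 < alpha < 1.
    """
    u = E2 @ other_w + eps                     # u_i = sum_j E2_ij * w_j
    w = u ** (1.0 / (alpha - 1.0))             # w_i proportional to u_i^{1/(alpha-1)}
    return w / (w ** alpha).sum() ** (1.0 / alpha)   # enforce sum(w**alpha) = 1

# Toy residuals: row 0 fits poorly, row 2 fits well.
E2 = np.array([[4.0, 4.0], [1.0, 1.0], [0.1, 0.1]])
b = np.ones(2)
a = weight_update(E2, b, alpha=0.7)
```

When updating b, pass E2.T and a with β in place of α. Since 1/(α−1) is negative for 0 < α < 1, rows with a smaller weighted residual receive a larger weight, which is what makes well-modeled features rank highly.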

3) Computation of G: Using Eq.(16), the objective for G is

J_5(G) = Tr(X^T A X B − 2 G^T B X^T A F + G^T B G F^T A F).   (18)

Using the generic algorithm of Section III, we obtain

G_jk ← G_jk (B X^T A F)_jk / (B G F^T A F)_jk.   (19)

B. Algorithm Procedure

The detailed procedure of WFS-NMF is listed as Algorithm 2.

Algorithm 2 WFS-NMF Algorithm Description
Input: X: word-document matrix; K: the number of clusters
Output: F: word cluster indicator matrix; G: document cluster indicator matrix; W: word and document weight matrix
1: Initialize W = I and initialize (F, G) as the output of standard NMF
2: repeat
3:   Update W by W_ij = a_i b_j, where a is given by Eq.(14) with u_i = Σ_j (X − FG^T)²_ij b_j, and b is given by Eq.(15) with v_j = Σ_i (X − FG^T)²_ij a_i
4:   Update G by Eq.(19)
5:   Update F by Eq.(17)
6: until convergence

VI. AN ILLUSTRATIVE EXAMPLE

In this section, we use a simple example to illustrate the process of weighting the keywords and data points using the proposed WFS-NMF algorithm. An example dataset with six system log messages is presented in Table I; it is a subset of the Log data described in Section VII-A. The six sample messages belong to two different clusters: start and create.

S1: Start User profile application version 1.0 started successfully.
S2: Database application version 1.1 starts.
S3: Start application version 2.0 for temporary services.
S4: Can not create temporary services for the Oracle engine.
S5: Can not create temporary services on the files.
S6: Create application version 2.0 for temporary services.

Table I
AN EXAMPLE DATASET WITH TWO CLUSTERS.
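The alternating updates of Algorithm 2 can be sketched and run on a toy dataset like the one above. This is an illustrative implementation, not the authors' code: the 0/1 matrix entries are derived from which terms occur in which messages of Table I, the initialization uses uniform weights rather than a standard-NMF warm start, and all sizes, seeds, and iteration counts are ours.

```python
import numpy as np

def normalize(u, p):
    # closed form of Eqs.(14)/(15): w_i ~ u_i^{1/(p-1)}, normalized so sum(w**p) = 1
    w = u ** (1.0 / (p - 1.0))
    return w / (w ** p).sum() ** (1.0 / p)

def wfs_nmf(X, k=2, alpha=0.7, beta=0.7, n_iter=300, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps
    G = rng.random((n, k)) + eps
    a = np.full(m, (1.0 / m) ** (1.0 / alpha))   # uniform, sum(a**alpha) = 1
    b = np.full(n, (1.0 / n) ** (1.0 / beta))
    for _ in range(n_iter):
        E2 = (X - F @ G.T) ** 2
        a = normalize(E2 @ b + eps, alpha)       # Eq.(14)
        b = normalize(E2.T @ a + eps, beta)      # Eq.(15)
        Af = a[:, None] * F                      # diag(a) @ F
        # Eq.(19); the outer diag(b) factor cancels in the ratio
        G *= (X.T @ Af) / (G @ (Af.T @ F) + eps)
        Bg = b[:, None] * G                      # diag(b) @ G
        # Eq.(17); the outer diag(a) factor cancels in the ratio
        F *= (X @ Bg) / (F @ (G.T @ Bg) + eps)
    return F, G, a, b

terms = ["start", "application", "version", "create", "temporary", "service"]
X = np.array([
    [1, 1, 1, 0, 0, 0],   # start:       S1, S2, S3
    [1, 1, 1, 0, 0, 1],   # application: S1, S2, S3, S6
    [1, 1, 1, 0, 0, 1],   # version:     S1, S2, S3, S6
    [0, 0, 0, 1, 1, 1],   # create:      S4, S5, S6
    [0, 0, 1, 1, 1, 1],   # temporary:   S3, S4, S5, S6
    [0, 0, 1, 1, 1, 1],   # service:     S3, S4, S5, S6
], dtype=float)

F, G, a, b = wfs_nmf(X)
top_terms = [terms[i] for i in np.argsort(-a)[:2]]
```

The returned a and b satisfy the constraints of Eq.(13), and sorting the terms by a gives the keyword ranking discussed in the example.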
In the data pre-processing step, the stop words and the words which appear only once are removed, and stemming is performed. The resulting term-message matrix is

X (rows: terms; columns: S1-S6):
start        1 1 1 0 0 0
application  1 1 1 0 0 1
version      1 1 1 0 0 1
create       0 0 0 1 1 1
temporary    0 0 1 1 1 1
service      0 0 1 1 1 1

After the computation by WFS-NMF, the weights for the terms are

a = (start 0.84, application 0.58, version 0.58, create 0.84, temporary 0.65, service 0.65).

Thus the two most important keywords are "start" and "create", which is consistent with our intuition. Similarly, each message receives a weight; S3 and S6 obtain the lowest weights, so they are the least important messages for discriminating the two clusters. From the example, we clearly observe that the proposed approaches can discover key features and samples.

VII. EXPERIMENTS

A. Document Clustering

First of all, we examine the clustering performance of FS-NMF and WFS-NMF using four text datasets as described in Section VII-A1, and compare the results with several widely used document clustering methods as described in Section VII-A2.

Table II
DATASET DESCRIPTIONS (# samples, # dimensions, and # classes for the CSTR, Log, Reuters, and WebACE datasets).

1) Data Sets: Table II summarizes the characteristics of the datasets used in the experiments. Detailed descriptions of the data sets are as follows.

CSTR. This is the dataset of the abstracts of technical reports (TRs) published in the Department of Computer Science at the University of Rochester from 1991 to 2002. The dataset contains 476 abstracts, which are divided into four research areas: Natural

Language Processing (NLP), Robotics/Vision, Systems, and Theory.

Log. This dataset contains 1367 log text messages collected from several different machines with different operating systems at Florida International University, using logdump2td (an NT data collection tool). There are 9 categories of messages: configuration, connection, create, dependency, other, report, request, start, and stop.

Reuters. The Reuters-21578 Text Categorization Test collection contains documents collected from the Reuters newswire in 1987. It is a standard text categorization benchmark and contains 135 categories. In our experiments, we use a subset of the data collection which includes the 10 most frequent categories among the 135 topics; we call it Reuters-top 10.

WebACE. This dataset is from the WebACE project and has been used for document clustering [2], [19]. It contains 2340 documents consisting of news articles from the Reuters news service via the Web in October 1997. These documents are divided into 20 classes.

Newsgroups. The 20 Newsgroups dataset contains approximately 20,000 articles evenly divided among 20 Usenet newsgroups. The raw text size is 26MB.

To pre-process the datasets, we remove the stop words using a standard stop list; all HTML tags are skipped, and all header fields except subject and organization of the posted articles are ignored. In all our experiments, we first select the top 1000 words by mutual information with class labels. The feature selection is done with the rainbow package [29].

2) Implemented Baselines: We compare the clustering performance of FS-NMF and WFS-NMF with the following widely used document clustering methods.
(1) K-means: standard K-means algorithm; (2) PCA-Km: PCA is first applied to reduce the data dimension, followed by K-means clustering; (3) LDA-Km [9]: an adaptive subspace clustering algorithm integrating linear discriminant analysis (LDA) and K-means clustering into a coherent process; (4) ECC: Euclidean co-clustering [3]; (5) MSRC: minimum squared residue co-clustering [3]; (6) NMF: non-negative matrix factorization [40]; (7) TNMF: tri-factor non-negative matrix factorization [11]; (8) Ncut: spectral clustering with normalized cuts [42].

Among these baselines, (a) K-means is one of the most widely used standard clustering algorithms; (b) LDA-Km and PCA-Km are two subspace clustering algorithms which identify clusters existing in subspaces of the original data space; (c) spectral clustering with normalized cuts (Ncut) is included since it has been shown that weighted kernel K-means is equivalent to the normalized cut [6]; (d) both ECC and MSRC are document co-clustering algorithms that are able to find blocks in a rectangular document-term matrix. Co-clustering algorithms generally perform implicit dimension reduction during the clustering process. NMF has been shown to be effective in document clustering [40], and our methods are both based on the NMF framework.

3) Evaluation Measures: To measure the clustering performance, we use accuracy and normalized mutual information as our performance measures. Accuracy discovers the one-to-one relationship between clusters and classes and measures the extent to which each cluster contains data points from the corresponding class. It sums up the matching degree over all class-cluster pairs, and its value lies in [0, 1]. Accuracy can be represented as

ACC = max ( Σ_{C_i, L_j} T(C_i, L_j) ) / N,   (20)

where C_i denotes the i-th cluster, L_j is the j-th class, and T(C_i, L_j) is the number of entities belonging to class L_j that are assigned to cluster C_i. Accuracy computes the maximum sum of T(C_i, L_j) over all one-to-one pairings of clusters and classes; these pairs have no overlaps.
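The accuracy measure of Eq.(20) can be computed by searching over one-to-one cluster-to-class matchings. A minimal sketch, using brute force over permutations since the number of clusters here is small (the function name and toy label vectors are ours):

```python
import numpy as np
from itertools import permutations

def accuracy(labels_true, labels_pred):
    """Clustering accuracy of Eq.(20): the best one-to-one matching between
    clusters and classes (assumes #clusters <= #classes)."""
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    # contingency counts T(C_i, L_j)
    T = np.zeros((len(clusters), len(classes)), dtype=int)
    for c, l in zip(labels_pred, labels_true):
        T[clusters.index(c), classes.index(l)] += 1
    best = max(sum(T[i, p[i]] for i in range(len(clusters)))
               for p in permutations(range(len(classes))))
    return best / len(labels_true)

# a perfect clustering up to relabeling scores 1.0
acc_perfect = accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])
acc_partial = accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 0, 0, 0])
```

For larger numbers of clusters, the same maximization is usually done with the Hungarian algorithm instead of brute force.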
Generally, greater accuracy means better clustering performance.

Normalized mutual information (NMI) is another widely used performance evaluation measure for determining the quality of clusters [36]. For two random variables X and Y, the NMI is defined as

NMI(X, Y) = I(X, Y) / sqrt(H(X) H(Y)),   (21)

where I(X, Y) is the mutual information between X and Y, and H(X) and H(Y) are the entropies of X and Y, respectively. Clearly, NMI(X, X) = 1, and this is the maximum possible value of NMI. Given a clustering result, the NMI in Eq.(21) is estimated as

NMI = [ Σ_{i=1}^k Σ_{j=1}^k n_{i,j} log( n · n_{i,j} / (n_i n̂_j) ) ] / sqrt( (Σ_{i=1}^k n_i log(n_i / n)) (Σ_{j=1}^k n̂_j log(n̂_j / n)) ),   (22)

where n_i denotes the number of data points contained in the cluster C_i (1 ≤ i ≤ k), n̂_j is the number of data points belonging to the j-th class (1 ≤ j ≤ k), and n_{i,j} denotes the number of data points in the intersection of cluster C_i and the j-th class. In general, the larger the NMI value, the better the clustering quality.

4) Clustering Results: Table III and Table IV show the accuracy and NMI evaluation results on the text datasets. From the experimental comparisons, we observe that: On most datasets, subspace clustering algorithms (especially LDA-Km) outperform the standard K-means algorithm due to the pre-processing by LDA or PCA. Co-clustering algorithms (ECC and MSRC) generally outperform K-means since they perform implicit dimension reduction during the clustering process. NMF outperforms K-means significantly since NMF can model widely varying data distributions due to the flexibility of matrix factorization, as compared to

Table III
CLUSTERING ACCURACY (rows: K-means, PCA-Km, LDA-Km, ECC, MSRC, NMF, TNMF, Ncut, FS-NMF, WFS-NMF; columns: WebACE, Log, Reuters, CSTR).

Table IV
CLUSTERING NMI RESULTS (same methods and datasets as Table III).

the rigid spherical clusters that the K-means clustering objective function attempts to capture [8]. TNMF provides a good framework for simultaneously clustering the rows and columns of the input documents; hence TNMF generally outperforms NMF. The results of spectral clustering (Ncut) are better than those of K-means. Note that spectral clustering can be viewed as a weighted version of kernel K-means and hence is able to discover arbitrarily shaped clusters. The experimental results of Ncut are similar to those of NMF; note that it has also been shown that NMF is equivalent to spectral clustering [8]. The proposed FS-NMF and WFS-NMF extend the NMF model and provide a good framework for weighting different terms and documents; hence both of them generally outperform NMF and TNMF on the datasets, and in the meanwhile, important term features can be discovered by our algorithms. Because WFS-NMF considers the importance of different documents instead of treating them equally, it achieves the best performance on most datasets.

B. Document Summarization

1) Data Sets: We use the DUC benchmark datasets (DUC2002 and DUC2004) for generic document summarization tasks. Table V gives a brief description of the data sets.

Table V
DESCRIPTION OF THE DATA SETS FOR MULTI-DOCUMENT SUMMARIZATION
                                 DUC2002     DUC2004
number of document collections   59          50
documents in each collection     10          10
data source                      TREC        TDT
summary length                   200 words   665 bytes

2) Implemented Systems: In this experiment, we compare our algorithms for summarization with several widely used document summarization methods as follows. (1) DUC Best: the method developed by the team achieving the highest scores in the DUC competition. (2) Random: selects sentences randomly for each document collection.
(3) Centroid: similar to the MEAD algorithm proposed in [31], using centroid value, positional value, and first-sentence overlap as features. (4) LexPageRank: a graph-based summarization method recommending sentences by the voting of their neighbors [13]. (5) LSA: conducts latent semantic analysis on the term-by-sentence matrix, as proposed in [16]. (6) NMF: performs NMF on the term-by-sentence matrix and ranks the sentences by their weighted scores [24].

In order to use FS-NMF or WFS-NMF for document summarization, we use the document-sentence matrix as the input data X, which can be generated from the document-term and sentence-term matrices; each feature (column) of X now represents a sentence. The sentences can then be ranked based on the sentence weights in W in both FS-NMF and WFS-NMF, and top-ranked sentences are included in the final summary. Since WFS-NMF weights both the samples and the features, an alternative solution for document summarization is to factorize the sentence-term matrix generated from the original documents; after the computation, the sentences are naturally ranked based on their assigned weights. Thus, we develop three new summarization methods as follows. (7) FS-NMF: performs FS-NMF on the document-sentence matrix, and selects the sentences associated with the highest weights to form summaries. (8) WFS-NMF-1: similar to FS-NMF, performs WFS-NMF on the document-sentence matrix to select the sentences with the highest weights. (9) WFS-NMF-2: performs WFS-NMF on the sentence-term matrix, and selects the sentences associated with the highest weights to form summaries.
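The weight-based sentence ranking behind methods (7)-(9) can be sketched as follows. This is an illustrative FS-NMF ranking following Eqs.(10)-(12), with the rows of X playing the role of sentences; the function name, the toy matrix, and all sizes and seeds are ours, and a real DUC pipeline would build X from the actual documents:

```python
import numpy as np

def fsnmf_sentence_ranking(X, k=2, alpha=0.7, n_iter=150, eps=1e-9, seed=0):
    """Rank the rows (sentences) of a nonnegative matrix by FS-NMF weights.

    Returns row indices sorted by decreasing weight W."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps
    G = rng.random((n, k)) + eps
    W = np.full(m, (1.0 / m) ** (1.0 / alpha))       # sum(W**alpha) = 1
    for _ in range(n_iter):
        u = ((X - F @ G.T) ** 2).sum(axis=1) + eps   # u_j of Eq.(10)
        W = u ** (1.0 / (alpha - 1.0))
        W /= (W ** alpha).sum() ** (1.0 / alpha)     # Eq.(10)
        WF = W[:, None] * F                          # diag(W) @ F
        G *= (X.T @ WF) / (G @ (WF.T @ F) + eps)     # Eq.(11)
        F *= (W[:, None] * (X @ G)) / (W[:, None] * (F @ (G.T @ G)) + eps)  # Eq.(12)
    return np.argsort(-W)

# toy sentence-term matrix: 5 "sentences" over 4 "terms"
X = np.array([[3, 0, 0, 1],
              [2, 1, 0, 0],
              [0, 0, 2, 2],
              [0, 1, 3, 0],
              [1, 1, 1, 1]], dtype=float)
ranks = fsnmf_sentence_ranking(X)
```

Taking the first few indices of `ranks` corresponds to selecting the top-weighted sentences for the summary.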

3) Evaluation Methods: We use the ROUGE toolkit [27] (version 1.5.5) to measure summarization performance; it is widely applied by DUC for performance evaluation. It measures the quality of a summary by counting the unit overlaps between the candidate summary and a set of reference summaries. Several automatic evaluation methods are implemented in ROUGE, such as ROUGE-N, ROUGE-W, and ROUGE-SU. ROUGE-N is an n-gram recall computed as follows:

ROUGE-N = Σ_{S ∈ ref} Σ_{gram_n ∈ S} Count_match(gram_n) / Σ_{S ∈ ref} Σ_{gram_n ∈ S} Count(gram_n),   (23)

where n is the length of the n-gram, and ref stands for the reference summaries. Count_match(gram_n) is the maximum number of n-grams co-occurring in a candidate summary and the reference summaries, and Count(gram_n) is the number of n-grams in the reference summaries. ROUGE-W is based on weighted LCS, and ROUGE-SU is based on skip-bigram plus unigram. Each of these evaluation methods in ROUGE can generate three scores (recall, precision, and F-measure). As we reach similar conclusions with any of the three scores, for simplicity we only report the average F-measure scores generated by ROUGE-1, ROUGE-2, ROUGE-W, and ROUGE-SU to compare the implemented systems.

Table VI
OVERALL PERFORMANCE COMPARISON ON DUC2002 DATA (ROUGE-1, ROUGE-2, ROUGE-W, and ROUGE-SU F-measures for DUC Best, Random, Centroid, LexPageRank, LSA, NMF, FS-NMF, WFS-NMF-1, and WFS-NMF-2).

Table VII
OVERALL PERFORMANCE COMPARISON ON DUC2004 DATA (same systems and measures as Table VI).

4) Summarization Evaluation: The experimental results are shown in Table VI and Table VII. From the results, we have the following observations. All three summarization methods developed from the FS-NMF and WFS-NMF algorithms outperform the state-of-the-art generic summarization methods; the good results benefit from the weighting schemes for sentence features (or sentence samples). Among these three methods, WFS-NMF-1 generally achieves the highest ROUGE scores.
This observation demonstrates that the sentence feature selection is effective and that the weights on the document side also help the sentence weighting process. Looking further at the selected sentences, we find that there is some overlap among the sentences selected by the three proposed summarization methods, which indicates the consistency and effectiveness of the weight assignments on both samples and features. The ROUGE scores of our methods are higher than those of the best team in DUC2004 and comparable to those of the best team from DUC2002. Note that the good results of the best teams come from the fact that they apply deeper natural language processing techniques to resolve pronouns and other anaphoric expressions, which we do not use in our data preprocessing.

C. Visualization

To evaluate the term features selected by our methods in simultaneous document clustering and keyword selection, in this set of experiments we calculate the pairwise document similarity using the top 20 word features selected by different methods. We use the CSTR dataset in this experiment, which contains four classes of text data. We compare the results of our FS-NMF and WFS-NMF algorithms with standard NMF and LSI, and Figure 1 shows the document similarity matrices visually. Note that in the CSTR dataset, we order the documents based on their class labels. From Figure 1, we have the following observations. Word features selected by FS-NMF and WFS-NMF effectively reflect the document distribution, because the keywords identified by FS-NMF discriminate different topics from a global perspective. NMF (Figure 1(b)) shows no obvious patterns at all; the failure of NMF comes from the fact that it tries to group the terms into the topics contained in the documents and uses the terms with the highest probabilities in each topic as keywords, which are not discriminative and are usually redundant. LSI can also find meaningful words; however, the first two clusters are not clearly discovered in Figure 1(a), which indicates that some small classes are hard to identify using the keywords selected by LSI.

VIII.
CONCLUSION

In this paper, we propose weighted feature subset non-negative matrix factorization, an unsupervised approach that simultaneously clusters data points and selects

important features; different data points are also assigned different weights indicating their importance. We apply our proposed approach to various document understanding tasks including document clustering, summarization, and visualization. Experimental results demonstrate the effectiveness of our approaches for these tasks.

ACKNOWLEDGEMENT

The work of D. Wang is supported by a Florida International University (FIU) Dissertation Fellowship. The work of T. Li is partially supported by NSF grants under the IIS, CCF, and DMS programs. The work of C. Ding is partially supported by NSF grants under the DMS and CCF programs.

REFERENCES

[1] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[2] D. Boley. Principal direction divisive partitioning. Data Mining and Knowledge Discovery, 2, 1997.
[3] H. Cho, I. Dhillon, Y. Guan, and S. Sra. Minimum sum-squared residue co-clustering of gene expression data. In Proceedings of SDM 2004.
[4] J. M. Conroy and D. P. O'Leary. Text summarization via hidden Markov models. In SIGIR '01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.
[5] I. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of SIGKDD 2001.
[6] I. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of SIGKDD 2004.
[7] I. Dhillon, S. Mallela, and S. Modha. Information-theoretic co-clustering. In Proceedings of SIGKDD 2003.
[8] C. Ding, X. He, and H. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of SIAM Data Mining, 2005.
[9] C. Ding and T. Li. Adaptive dimension reduction using discriminant analysis and k-means clustering. In Proceedings of ICML 2007.
[10] C. Ding, T. Li, and W. Peng. NMF and PLSI: equivalence and a hybrid algorithm. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.
[11] C. Ding, T. Li, W. Peng, and H. Park.
Orthogonal nonnegatve matrx tr-factorzatons for clusterng. In Proceedngs of SIGKDD 2006, [2] R. Duda, P. Hart, and D. Stork. Pattern Classfcaton. John Wley and Sons, Inc., 200. Fgure. Vsualzaton Results on CSTR Data. CSTR has 4 clusters. [3] G. Erkan and D. Radev. Lexpagerank: Prestge n multdocument text summarzaton. In Proceedngs of EMNLP

[14] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In SIGIR '05: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.
[15] J. Goldstein, M. Kantrowitz, V. Mittal, and J. Carbonell. Summarizing text documents: Sentence selection and evaluation metrics. In Research and Development in Information Retrieval, pages 121-128, 1999.
[16] Y. Gong and X. Liu. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of SIGIR 2001.
[17] A. Graesser, A. Karnavat, and V. Pomeroy. Latent semantic analysis captures causal, goal-oriented, and taxonomic structures. In Proceedings of CogSci.
[18] Q. Gu and J. Zhou. Local learning regularized nonnegative matrix factorization. In IJCAI 2009.
[19] E.-H. S. Han, D. Boley, M. Gini, R. Gross, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore. WebACE: A web agent for document categorization and exploration, 1998.
[20] J. A. Hartigan and M. A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1):100-108, 1979.
[21] S. Havre, E. Hetzler, P. Whitney, and L. Nowell. ThemeRiver: Visualizing thematic changes in large document collections. IEEE Transactions on Visualization and Computer Graphics, 8(1):9-20, 2002.
[22] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the Twenty-Second Annual International SIGIR Conference, 1999.
[23] L. Jing, M. K. Ng, and J. Z. Huang. An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Transactions on Knowledge and Data Engineering, 19(8):1026-1041, 2007.
[24] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2001.
[25] T. Li. The relationships among various nonnegative matrix factorization methods for clustering. In ICDM 2005.
[26] T. Li, S. Ma, and M. Ogihara. Document clustering via adaptive subspace iteration. In Proceedings of the Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2004), 2004.
[27] C.-Y. Lin and E. Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL 2003.
[28] C.-Y. Lin and E. Hovy. From single to multi-document summarization: a prototype system and its evaluation. In ACL '02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.
[29] A. K. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. mccallum/bow, 1996.
[30] S. Park, J.-H. Lee, D.-H. Kim, and C.-M. Ahn. Multi-document summarization based on cluster using non-negative matrix factorization. In SOFSEM '07: Proceedings of the 33rd Conference on Current Trends in Theory and Practice of Computer Science, 2007.
[31] D. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization of multiple documents. Information Processing and Management, 2004.
[32] E. Rennison. Galaxy of news: an approach to visualizing and understanding expansive news landscapes. In UIST '94, pages 3-12, 1994.
[33] D. Shen, J.-T. Sun, H. Li, Q. Yang, and Z. Chen. Document summarization using conditional random fields. In IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
[34] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 1997.
[35] J. Stasko, C. Görg, and Z. Liu. Jigsaw: supporting investigative analysis through interactive visualization. Information Visualization, 7(2):118-132, 2008.
[36] A. Strehl, J. Ghosh, and C. Cardie. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583-617, 2002.
[37] D. Wang, C. H. Q. Ding, and T. Li. Feature subset non-negative matrix factorization and its applications to document understanding. In SIGIR, 2010.
[38] D. Wang, T. Li, S. Zhu, and C. Ding. Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In Proceedings of SIGIR 2008.
[39] F. Wang, C. Zhang, and T. Li. Regularized clustering for documents. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 95-102, 2007.
[40] W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proceedings of SIGIR 2003.
[41] W.-t. Yih, J. Goodman, L. Vanderwende, and H. Suzuki. Multi-document summarization by maximizing informative content-words. In IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
[42] S. X. Yu and J. Shi. Multiclass spectral clustering. In ICCV '03.
[43] H. Zha, X. He, C. Ding, M. Gu, and H. Simon. Bipartite graph partitioning and data clustering. In Proceedings of the International Conference on Information and Knowledge Management (CIKM 2001).
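APPENDIX: ILLUSTRATIVE SKETCH

The conclusion summarizes the key idea: factorize a non-negative term-document matrix to obtain clusters while scoring keyword importance. The exact WFS-NMF update rules are not reproduced in this section, so the sketch below is only a minimal illustration of that idea, not the authors' algorithm: it runs the standard Lee and Seung multiplicative updates [24] for X ≈ FG and then derives a hypothetical keyword-importance score from the basis matrix F (the score and the function name are illustrative assumptions, not from the paper).

```python
import numpy as np

def nmf_with_feature_scores(X, k, n_iter=200, eps=1e-9, seed=0):
    """Illustrative stand-in, NOT the paper's WFS-NMF updates.

    Runs plain multiplicative-update NMF (Lee and Seung) on a
    non-negative terms x documents matrix X ~= F @ G, then reads a
    heuristic keyword score off the basis F.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps   # terms x topics (basis)
    G = rng.random((k, n)) + eps   # topics x documents (encoding)
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius objective
        G *= (F.T @ X) / (F.T @ F @ G + eps)
        F *= (X @ G.T) / (F @ G @ G.T + eps)
    scores = F.max(axis=1)         # hypothetical score: max topic loading per term
    labels = G.argmax(axis=0)      # hard cluster label per document
    return F, G, scores, labels
```

On a toy terms x documents matrix with two disjoint term blocks, the argmax of G recovers the two document groups, which is the clustering side of the approach; replacing the uniform treatment of terms and documents with learned weights is what distinguishes FS-NMF/WFS-NMF from this plain sketch.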


More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Robust Dictionary Learning with Capped l 1 -Norm

Robust Dictionary Learning with Capped l 1 -Norm Proceedngs of the Twenty-Fourth Internatonal Jont Conference on Artfcal Intellgence (IJCAI 205) Robust Dctonary Learnng wth Capped l -Norm Wenhao Jang, Fepng Ne, Heng Huang Unversty of Texas at Arlngton

More information

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 2 Sofa 2016 Prnt ISSN: 1311-9702; Onlne ISSN: 1314-4081 DOI: 10.1515/cat-2016-0017 Hybrdzaton of Expectaton-Maxmzaton

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

Clustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b

Clustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b Internatonal Conference on Advances n Mechancal Engneerng and Industral Informatcs (AMEII 05) Clusterng Algorthm Combnng CPSO wth K-Means Chunqn Gu, a, Qan Tao, b Department of Informaton Scence, Zhongka

More information

Multi-Source Multi-View Clustering via Discrepancy Penalty

Multi-Source Multi-View Clustering via Discrepancy Penalty Mult-Source Mult-Vew Clusterng va Dscrepancy Penalty Wexang Shao, Jawe Zhang, Lfang He, Phlp S. Yu Unversty of Illnos at Chcago Emal: wshao4@uc.edu, jzhan9@uc.edu, psyu@uc.edu Shenzhen Unversty, Chna Emal:

More information

Classification / Regression Support Vector Machines

Classification / Regression Support Vector Machines Classfcaton / Regresson Support Vector Machnes Jeff Howbert Introducton to Machne Learnng Wnter 04 Topcs SVM classfers for lnearly separable classes SVM classfers for non-lnearly separable classes SVM

More information

Overlapping Clustering with Sparseness Constraints

Overlapping Clustering with Sparseness Constraints 2012 IEEE 12th Internatonal Conference on Data Mnng Workshops Overlappng Clusterng wth Sparseness Constrants Habng Lu OMIS, Santa Clara Unversty hlu@scu.edu Yuan Hong MSIS, Rutgers Unversty yhong@cmc.rutgers.edu

More information

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION

BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION BAYESIAN MULTI-SOURCE DOMAIN ADAPTATION SHI-LIANG SUN, HONG-LEI SHI Department of Computer Scence and Technology, East Chna Normal Unversty 500 Dongchuan Road, Shangha 200241, P. R. Chna E-MAIL: slsun@cs.ecnu.edu.cn,

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Web Document Classification Based on Fuzzy Association

Web Document Classification Based on Fuzzy Association Web Document Classfcaton Based on Fuzzy Assocaton Choochart Haruechayasa, Me-Lng Shyu Department of Electrcal and Computer Engneerng Unversty of Mam Coral Gables, FL 33124, USA charuech@mam.edu, shyu@mam.edu

More information

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 1. SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY SSDH: Sem-supervsed Deep Hashng for Large Scale Image Retreval Jan Zhang, and Yuxn Peng arxv:607.08477v2 [cs.cv] 8 Jun 207 Abstract Hashng

More information

Semi-Supervised Discriminant Analysis Based On Data Structure

Semi-Supervised Discriminant Analysis Based On Data Structure IOSR Journal of Computer Engneerng (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 3, Ver. VII (May Jun. 2015), PP 39-46 www.osrournals.org Sem-Supervsed Dscrmnant Analyss Based On Data

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

An Improvement to Naive Bayes for Text Classification

An Improvement to Naive Bayes for Text Classification Avalable onlne at www.scencedrect.com Proceda Engneerng 15 (2011) 2160 2164 Advancen Control Engneerngand Informaton Scence An Improvement to Nave Bayes for Text Classfcaton We Zhang a, Feng Gao a, a*

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information