IEEE TRANSACTIONS ON CYBERNETICS, VOL. 45, NO. 3, MARCH 2015

Boosting for Multi-Graph Classification


Jia Wu, Student Member, IEEE, Shirui Pan, Xingquan Zhu, Senior Member, IEEE, and Zhihua Cai

Abstract—In this paper, we formulate a novel graph-based learning problem, multi-graph classification (MGC), which aims to learn a classifier from a set of labeled bags, each containing a number of graphs. A bag is labeled positive if at least one graph in the bag is positive, and negative otherwise. Such a multi-graph representation suits many real-world applications, such as webpage classification, where a webpage can be regarded as a bag, with the texts and images inside the webpage represented as graphs. This problem is a generalization of multi-instance learning (MIL), but with vital differences, mainly because instances in MIL share a common feature space whereas no feature is available to represent the graphs in a multi-graph bag. To solve the problem, we propose a boosting-based multi-graph classification framework (bMGC). Given a set of labeled multi-graph bags, bMGC employs dynamic weight adjustment at both the bag and graph levels to select one subgraph in each iteration as a weak classifier. In each iteration, bag and graph weights are adjusted such that an incorrectly classified bag receives a higher weight, because its predicted bag label conflicts with the genuine label, whereas an incorrectly classified graph receives a lower weight if the graph is in a positive bag (or a higher weight if the graph is in a negative bag). Accordingly, bMGC is able to differentiate graphs in positive and negative bags and derive effective classifiers that form a boosting model for MGC. Experiments and comparisons on real-world multi-graph learning tasks demonstrate the algorithm's performance.

Index Terms—Boosting, graph classification, multi-graph, multi-instance learning, subgraph mining.
I. INTRODUCTION

Graph classification, in which the object to be classified is a graph, has found many applications in the past decade, such as chemical compounds [1], XML documents [2], program flows [3], and images [4]. Despite its success in a broad spectrum of areas, the standard graph classification setting is rather restrictive for many real-world learning problems. One such problem is multi-graph classification (MGC), in which the object to be classified is a bag of graphs. For example, a webpage may consist of texts and images, where texts can be represented as graphs to preserve contextual information [5] and images can also be represented as graphs to describe the structural dependency between image regions [6]. As a result, a webpage can be regarded as a bag containing a number of graphs, each of which represents a certain part of the webpage content.

Manuscript received July 14, 2013; revised January 25, 2014; accepted May 13, 2014. Date of publication July 8, 2014; date of current version February 12, 2015. This paper was recommended by Associate Editor M. Last. J. Wu is with the School of Computer Science, China University of Geosciences, Wuhan, China, and also with the Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia (e-mail: jia.wu@student.uts.edu.au). S. Pan is with the Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia (e-mail: shirui.pan@student.uts.edu.au). X. Zhu is with the Department of Computer and Electrical Engineering & Computer Science, Florida Atlantic University, Boca Raton, FL, USA (e-mail: xzhu3@fau.edu). Z. Cai is with the School of Computer Science, China University of Geosciences, Wuhan, China (e-mail: zhcai@cug.edu.cn). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TCYB
For an information seeker, a webpage is interesting if one or multiple parts of the webpage (texts and/or images) draw his/her attention; that is, a graph bag is positive if at least one graph in the bag is positive. On the other hand, the webpage is not interesting to the viewer if none of the content attracts the viewer; that is, a graph bag is negative if all graphs inside the bag are negative.

The above multi-graph setting is useful in many other domains. In bio-pharmaceutical tests, labeling individual molecules (which can be represented as graphs) is expensive and time-consuming. Molecular group activity prediction can be used to investigate the activity of a group (i.e., a bag) of molecules, with the active group (i.e., positive bag), in which at least one molecule is active, being further investigated by individual activity tests. Another MGC application is scientific publication classification, where a paper and its references can be represented as a bag of graphs, and each graph (i.e., a paper) is formed by using the correlations between keywords in the paper, as shown in Fig. 1. A bag is labeled positive if the paper or any of its references is relevant to a specific topic. Similarly, for online-review-based product recommendation, each product receives many customer reviews. For each review composed of detailed text descriptions, we can use a graph to represent the review descriptions. Thus, a product can be represented as a bag of graphs. Assume customers are mainly concerned about several key properties of the product, such as affordability and durability. A product (i.e., a bag) can be labeled positive if it receives a very positive review on any of these properties, and negative otherwise. As a result, we can use MGC learning to help recommend products to customers.

Indeed, the MGC problem is a generalization of multi-instance learning (MIL) to graph data, but with significant complications. Existing MIL methods cannot simply be applied to the multi-graph setting because they can only handle bags whose instances are represented in a common vectorial feature space.
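The bag-labeling rule that drives all of these examples (a bag is positive iff at least one graph inside it is positive) can be sketched in a few lines; the function name is illustrative, not from the paper.

```python
# Minimal sketch of the multi-graph bag-labeling rule:
# a bag is positive iff at least one graph inside it is positive.

def bag_label(graph_labels):
    """graph_labels: list of +1/-1 labels for the graphs in one bag."""
    return +1 if any(y == +1 for y in graph_labels) else -1

print(bag_label([-1, -1, +1]))  # one positive graph makes the bag positive -> 1
print(bag_label([-1, -1, -1]))  # all graphs negative -> -1
```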
(© 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.)

Unfortunately, in the MGC problem setting, graphs cannot directly provide feature vectors for learning. On the other hand, existing graph classification methods cannot be used to tackle the MGC problem either, because they require each single graph to be labeled in order to learn a classifier. One simple solution is to represent all graphs in the same feature space, by using subgraph feature selection methods [7]–[9] to convert graphs into instances, and then

Fig. 1. Example of multi-graph representation for a scientific publication. Each paper is represented as a multi-graph bag, where each graph inside the bag corresponds to the abstract of the paper or the abstract of a reference cited in the paper (a graph is formed by using keywords of the abstract as nodes and their correlations as edges). The graph construction details are reported in Section VII-A.

apply existing MIL methods to the instance bags. However, this simple solution suffers from three inherent disadvantages.

1) Large Subgraph Feature Space: The graph substructure feature space grows exponentially with respect to the number of edges and nodes. It is computationally inefficient, or even infeasible, to enumerate all subgraph features and then select some of them for classification.

2) Feature Filtering Inefficiency: By separating subgraph feature mining and feature selection into two steps, the filtering process of finding salient subgraph patterns is decoupled from the subsequent learning algorithm. It is therefore very difficult to theoretically guarantee that the statistical criterion used for filtering provides good features for the learner. This is a problem shared by all filter methods (as discussed in [10]).

3) Bag Constraints: The bag constraints in multi-graph learning provide important information to differentiate positive and negative graphs, whereas the simple solution directly extracts subgraphs from all graphs without considering the multi-graph bag constraints.

In summary, the MGC problem for the aforementioned real-world applications needs to address two essential challenges.

1) Labeling Ambiguity: Labels are only available at the bag level instead of the instance level (i.e., a bag is labeled positive if it has at least one positive graph, and negative otherwise).

2) Structured Data Representation: Instances in a bag are not vectors but graphs, which implies that the instances are not represented in a common feature space for calculating similarities or distances.
Motivated by the above challenges, in this paper we propose a boosting-based multi-graph classification framework (bMGC). In each boosting iteration, bMGC explores the most informative subgraph to construct a single weak classifier, which is used to update the weights of graphs and bags to obtain the next informative subgraph. At the end of the boosting process, the selected weak classifiers are combined to form a strong classifier. A unique characteristic of bMGC is that it combines bag- and graph-level constraints to assess the informativeness score of a subgraph. By adopting the score as a pruning criterion, we combine subgraph mining and informative subgraph exploration to dynamically construct weak classifiers on the fly. As a result, the proposed learning framework not only addresses the labeling ambiguity issue through a novel two-level (bag and graph) weighting strategy but also addresses the structured data representation issue through a dynamic subgraph selection criterion. Experimental results on real-world data demonstrate that bMGC is effective for MGC.

The remainder of the paper is organized as follows. A brief review of related work is given in Section II. The problem definition and the overall framework are described in Sections III and IV, respectively. Section V introduces the proposed subgraph selection criterion. The bMGC algorithm is presented in Section VI, followed by the experiments in Section VII. Section VIII discusses the properties of the proposed bMGC, and we conclude the paper in Section IX.

II. RELATED WORK

A. Multi-Instance Learning

MGC is a generalization of the MIL problem, which was first proposed by Dietterich et al. [11] for drug activity prediction. Since then, it has drawn increasing interest in the machine learning community for many real-world applications, such as image categorization [12], web mining [13], language recognition [14], and computer security [15]. The key assumption of the MIL formulation is that the training set is composed of labeled bags, each of which contains a number of instances.
A bag is labeled positive if at least one of its instances is positive, and negative otherwise. The goal of MIL is to predict the label of an unknown bag. Several off-the-shelf methods have been developed to solve the MIL problem; they can roughly be divided into two categories.

1) Single-Instance Learner Based MIL: One approach to solving MIL problems is to upgrade generic single-instance learning methods to deal with multi-instance data. For example, the lazy learners Citation-KNN and Bayesian-KNN [16] extend the k-nearest neighbor (KNN) algorithm to MIL. The tree learners MITI [17] and MIRI [18] are variations of decision trees for MIL. The rule learner RIPPER-MI adapts the RIPPER algorithm [19] for MIL. The neural network BP-MIP extends standard neural networks [20], and the kernel method MISMO adapts the classical support vector machine [21] for MIL. Logistic learning MILR [22] applies logistic regression to MIL, and ensemble approaches [23], [24] extend bagging and boosting [25] to MIL.

2) Bag-Based MIL Algorithms: The first specifically designed method for MIL is the axis-parallel rectangle (APR) algorithm [11], which approximates the APRs constructed by the conjunction of features. Based on the idea of APR, a number of algorithms have been designed for MIL. Examples include diverse density (DD) [26], which searches for a point in the feature space that maximizes the DD function measuring the co-occurrence of similar instances from different positive bags; MIEMDD [27], which combines the expectation-maximization (EM) algorithm with DD to search for the most likely concept; and MIOptimalBall [28], a boosting-based approach that uses balls (with respect to

various metrics) as weak hypotheses, centered at instances of positive bags.

B. Graph Classification

The MGC problem can also be viewed as a generalization of graph classification where the objects are bags of graphs (instead of individual graphs). Existing graph classification methods can be broadly classified into the following two categories.

1) Global Distance Based Approaches: The global distance based methods consider correlations [29] or similarities between two graphs and plug the kernel matrix into an off-the-shelf learner, such as a support vector machine, to learn a model for graph classification. Examples include graph kernels [30], [31], graph embedding [32], and graph transformation [33]. One obvious drawback of global distance based approaches is that the distance is calculated from the similarity of global graph structures, such as random walks or paths, between two graphs. Therefore, it is not clear which substructures (or which parts of the graph) are most discriminative for differentiating graphs of different classes.

2) Local Subgraph Feature Based Approaches: For many graph classification tasks, such as chemical compound classification [1], research has shown that graphs within the same class may not have high global similarity but merely share some unique substructures. Accordingly, extracting important subgraph features, using some predefined criteria, to represent a graph in a vectorial space has become a popular solution for graph classification. The most common subgraph selection criterion is frequency, which intends to select frequently appearing subgraphs by using frequent subgraph mining methods. For example, one of the most popular algorithms for frequent subgraph mining is gSpan [34]. Other methods include AGM [35], FSG [7], MoFa [36], and Gaston [37]. The subgraph feature mining approach seems applicable to the MGC problem as a preprocessing step to transform all graphs into feature vectors.
However, one major deficiency of this approach is that it is computationally demanding to enumerate all frequent subgraphs in the target graph set, which inhibits its ability to handle large graph sets. To overcome this drawback, some supervised subgraph feature extraction approaches have been developed, such as LEAP [38], gPLS [39], and CORK [8], which search directly for discriminative subgraph patterns for classification. Moreover, Jin et al. [40] propose an efficient graph classification method that uses evolutionary computation to mine discriminative subgraphs in large databases. Besides, some graph boosting methods [41]–[44] use each single subgraph feature as a weak classifier to build a boosting algorithm, and other types of boosting approaches [45], [46] for graph classification also exist.

III. PROBLEM DEFINITION

In this section, we define important notations and concepts, which will be used throughout the paper, and formally define the MGC problem.

Definition 1 (Connected Graph): A graph is represented as G = (V, E, L, l), where V is a set of vertices, E ⊆ V × V is a set of edges, and L is the set of symbols for the vertices and edges. l: V ∪ E → L is the function assigning labels to the vertices and edges. A connected graph is a graph in which there is a path between any pair of vertices.

Fig. 2. Example of subgraph feature representation for bags. B_1^+ and B_2^- are positive and negative bags, respectively. G_1^+ is a positive graph, and G_2, G_3, and G_4 are labeled negative. The feature value of a bag corresponding to each subgraph g_1 or g_2 is set to 1 iff there is a graph in the bag that contains the subgraph, and 0 otherwise.

Definition 2 (Bag of Graphs): A graph bag contains a number of graphs, denoted by B_i = {G_1, ..., G_{n_i}}, where G_j and n_i denote the jth graph and the total number of graphs in the ith bag, respectively. For ease of representation, we also use G_j to denote the jth graph in a given bag. A bag B_i's label is denoted by y_i ∈ {-1, +1}. A bag is either positive (B_i^+) or negative (B_i^-).
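The bag and graph objects just defined can be modeled compactly. The sketch below uses hypothetical names and reduces each graph to the set of subgraph-pattern ids it contains (standing in for real subgraph-isomorphism tests); it anticipates the binary feature vectors introduced in Definitions 4 and 5 below.

```python
# Sketch of the binary subgraph-feature vectors for a graph and for a bag.
# Each "graph" is modeled as the set of subgraph-pattern ids it contains;
# a real system would run subgraph isomorphism tests instead.

def graph_feature(graph, patterns):
    # (x_gk)^G = 1 iff g_k is a subgraph of G
    return [1 if g in graph else 0 for g in patterns]

def bag_feature(bag, patterns):
    # (x_gk)^B = 1 iff some graph in the bag contains g_k
    return [1 if any(g in G for G in bag) else 0 for g in patterns]

patterns = ["g1", "g2"]
B1 = [{"g1", "g2"}, {"g2"}]            # a bag with two graphs
print(bag_feature(B1, patterns))       # -> [1, 1]
print(graph_feature({"g2"}, patterns)) # -> [0, 1]
```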
In this paper, we use B = {B_1, ..., B_p} to denote a set of bags associated with the weights w^B = {w_1^B, ..., w_p^B}, where p denotes the number of bags in B. We can also aggregate all graphs in B as G = {G_1, ..., G_q}, associated with the weights w^G = {w_1^G, ..., w_q^G}, where q denotes the number of graphs in G. Similarly, the set of positive bags in B is denoted by B^+, with B^- denoting the set of negative bags.

Definition 3 (Subgraph): Let G = (V, E, L, l) and g_k = (V', E', L', l') each denote a connected graph. g_k is a subgraph of G, i.e., g_k ⊆ G, iff there exists an injective function φ: V' → V s.t.: 1) ∀v ∈ V', l'(v) = l(φ(v)); and 2) ∀(u, v) ∈ E', (φ(u), φ(v)) ∈ E and l'(u, v) = l(φ(u), φ(v)). If g_k is a subgraph of G, then G is a supergraph of g_k.

Definition 4 (Subgraph Feature Representation for Graph): Let S_g = {g_1, ..., g_s} denote a set of subgraph patterns discovered from a given set of graphs. For each graph G, we use a subgraph feature vector x^G = [(x_{g_1})^G, ..., (x_{g_s})^G] ∈ {0, 1}^s to represent G in the feature space, where (x_{g_k})^G = 1 iff g_k is a subgraph of G (i.e., g_k ⊆ G, g_k ∈ S_g) and (x_{g_k})^G = 0 otherwise.

Definition 5 (Subgraph Feature Representation for Bag): Given a set of subgraphs S_g = {g_1, ..., g_s}, a graph bag B_i can be represented by a feature vector x^{B_i} = [(x_{g_1})^{B_i}, ..., (x_{g_s})^{B_i}] ∈ {0, 1}^s, where (x_{g_k})^{B_i} = 1 iff g_k is a subgraph of any graph G_j in bag B_i (i.e., ∃G_j ∈ B_i: g_k ⊆ G_j, g_k ∈ S_g) and (x_{g_k})^{B_i} = 0 otherwise.

An example of subgraph feature representation for graph bags is illustrated in Fig. 2, where two graph bags (B_1^+ and B_2^- on the left panel) are represented as two 2-D feature vectors (on the right panel) based on two subgraph patterns (g_1 and g_2).

Given a multi-graph set B with a number of labeled graph bags, where each positive bag contains at least one positive graph and all graphs in each negative bag are negative (i.e., the

bag constraint in MGC), the aim of MGC is to build a prediction model from the training multi-graph bag set B that predicts previously unseen graph bags with unknown labels with maximum bag classification accuracy.

IV. OVERALL FRAMEWORK OF bMGC

In multi-graph bags, there is no feature available to represent graphs, so existing MIL methods, which require instances to have a vectorized feature representation, cannot be applied to MGC. In addition, due to the lack of labeling information for individual graphs inside positive bags, subgraph feature based graph classification cannot be directly applied to MGC either. To solve these issues, in this section we propose the bMGC framework, which applies dynamic weight adjustment at both the graph and bag levels to select one subgraph in each iteration to construct a single weak classifier. In each iteration, the bag and graph weights are adjusted by the last bag-level and graph-level weak classifiers, respectively. By doing so, bMGC is able to differentiate graphs in positive and negative bags and derive effective learning models by boosting all the single-subgraph bag-level weak classifiers. The proposed bMGC framework, as shown in Fig. 3, includes the following four major steps.

1) Subgraph Candidate Generation: Generating subgraph candidates is a key step for selecting the most informative subgraph. To find subgraph candidates with diverse structures, we aggregate the graphs in multi-graph bags into three graph sets: a) graphs in all bags; b) graphs in all positive bags; and c) graphs in all negative bags. A gSpan [34] based subgraph mining procedure is applied to each graph set, through which a set of diverse subgraph candidate patterns can be discovered for validation.

2) Bag Constrained Subgraph Exploration: In the tth iteration, an informative subgraph g_t is selected to form a weak classifier for MGC under the weighted bag- and graph-level constraints. To obtain the (t+1)th informative subgraph, the weights of bags and graphs are then updated.
After m iterations, the m selected subgraphs correspond to m weak classifiers for learning.

3) Updating Weights of Bags and Graphs: After we find the tth informative subgraph g_t, a bag-level classifier H_t^B and a graph-level classifier H_t^G are trained, respectively. For graphs, because we propagate bag labels to graphs, some graphs in positive bags have been assigned wrong labels. If a graph G_i in the positive bag set B^+ is misclassified by H_t^G, in the next iteration we decrease G_i's weight to reduce its impact on the learning process. If a graph G_i in the negative bag set B^- is misclassified, its weight is increased, so that G_i plays a more important role in helping the learning algorithm find better subgraphs.

4) Boosting Classification: After subgraphs have been selected in all iterations to form the corresponding single weak classifiers, the classifiers are weighted and combined into a strong classifier for MGC.

In the following two sections, we first propose our subgraph exploration criterion in Section V and then introduce the detailed procedures of bMGC in Section VI.

Fig. 3. Overview of the proposed bMGC framework.

V. SUBGRAPH EXPLORATION

Exploring optimal bag constrained subgraphs in each iteration of bMGC is a nontrivial task. This process has two main challenges: 1) how to utilize the information of the labeled graphs in negative bags, and 2) how to tackle the problem that the labels of graphs in positive bags are unknown.

Assume that a set of candidate graphs is collected from the bag set B. Let S_g denote the complete set of subgraphs in B, and let g_t be the optimal subgraph selected from S_g in the tth iteration. Our bag constrained subgraph exploration aims to find the most informative subgraph g_t in each iteration, with weight updating for both bags and graphs. Let Z(g_k), the evaluation criterion for a single subgraph g_k ∈ S_g, be a function measuring the informativeness of g_k:

    g_t = \arg\max_{g_k \in S_g} Z(g_k).    (1)

The objective function in (1) indicates that the optimal bag constrained subgraph g_t should have the maximum discriminative capability for MGC.
A. Evaluation Criterion for Subgraphs

In order to measure the informativeness of a subgraph g_k, i.e., Z(g_k), so that we can discover the most informative subgraph for bags, we impose constraints on the labeled bags in the multi-graph bag set B, through which the subgraph selection criterion Z(g_k) can be properly defined. For two bags B_i and B_j with the same class label, there is a pairwise must-link constraint between them. If B_i and B_j have different class labels, there is a cannot-link constraint between them.

To further take the data distribution in each bag into consideration, we also add graph-level constraints to ensure that the selected subgraphs make the graphs in each negative bag close to each other and the graphs in each positive bag maximally separated. In summary, a good subgraph should satisfy the following constraints.

1) Weighted Bag Must-Link: If there is a must-link between B_i and B_j, their subgraph feature vectors x^{B_i} and x^{B_j} should be close to each other. In an MGC scenario, each bag B_i is associated with a weight w_i^B. For each pair of bags with the same class label, the selected subgraph should ensure that bags with similar weights (analogous to importance) have a high similarity.

2) Weighted Bag Cannot-Link: If there is a cannot-link between B_i and B_j, the underlying subgraph feature vectors x^{B_i} and x^{B_j} should be distinct from each other. For each pair of bags in different classes, the smaller the weight difference between the two bags, the more impact the constraint has on selecting a subgraph that represents the distinction between them.

3) Weighted Graph Must-Link: If there is a must-link between G_i and G_j, their subgraph feature vectors x^{G_i} and x^{G_j} should be close to each other. In bMGC, only graphs in negative bags are known to have genuine labels, so the feature representations of two weighted graphs in a negative bag should have low diversity.

4) Weighted Graph Separability: If the genuine labels of graphs G_i and G_j are unknown, the corresponding subgraph feature vectors x^{G_i} and x^{G_j} should be different. This is similar to the assumption of principal component analysis (PCA) [47], which aims to find the component with the largest possible variance. This constraint applies to positive bags, because the genuine labels of all graphs in each positive bag are unknown. As a result, we apply this constraint to encourage each positive bag to have a large diversity inside the bag. A similar assumption has also been used in [9] to handle unlabeled graphs in a semi-supervised learning setting.
In summary, the bag must-link and bag cannot-link constraints are applied to bags with the same label and different labels, respectively, while the graph must-link and graph separability constraints are applied only to graphs in negative bags and graphs in positive bags, respectively. By imposing constraints at both the bag and graph levels, our evaluation criterion intends to capture informative subgraph features for MGC. Based on the above considerations, we derive a criterion Z(g_k) for measuring the informativeness of a subgraph g_k as follows:

    Z(g_k) = \frac{1}{2A} \sum_{y_i y_j = -1} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2
           - \frac{1}{2B} \sum_{y_i y_j = 1} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2
           - \frac{1}{2C} \sum_{G_i, G_j \in B^-} \left\| D_{g_k} w_i^G x^{G_i} - D_{g_k} w_j^G x^{G_j} \right\|^2
           + \frac{1}{2D} \sum_{G_i, G_j \in B^+} \left\| D_{g_k} w_i^G x^{G_i} - D_{g_k} w_j^G x^{G_j} \right\|^2    (2)

where w_i^B, w_j^B, w_i^G, and w_j^G are the weights for B_i, B_j, G_i, and G_j, respectively. D_{g_k} = diag(d(g_k)) is a diagonal matrix indicating which subgraph feature is selected from S_g to represent the bags or graphs, with d(g_k)_i = I(g_i = g_k, g_i ∈ S_g), where I(·) equals 1 if the condition inside is true and 0 otherwise. A = \sum_{y_i y_j = -1} 1, B = \sum_{y_i y_j = 1} 1, C = \sum_{G_i, G_j \in B^-} 1, and D = \sum_{G_i, G_j \in B^+} 1 count the total pairwise constraints of the bag cannot-link, bag must-link, graph must-link, and graph separability types.

We define two matrices for the bag-level and graph-level constraints, denoted by M^B = [M_{ij}^B]_{p×p} and M^G = [M_{ij}^G]_{q×q}, respectively, where M_{ij}^B = 1/A if y_i y_j = -1 and M_{ij}^B = -1/B if y_i y_j = 1, and M_{ij}^G = -1/C if G_i, G_j ∈ B^-, M_{ij}^G = 1/D if G_i, G_j ∈ B^+, and M_{ij}^G = 0 otherwise. As a result, (2) can be rewritten as

    Z(g_k) = Z(g_k)^B + Z(g_k)^G
           = \frac{1}{2} \sum_{y_i y_j} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2 M_{ij}^B
           + \frac{1}{2} \sum_{G_i G_j} \left\| D_{g_k} w_i^G x^{G_i} - D_{g_k} w_j^G x^{G_j} \right\|^2 M_{ij}^G.    (3)

For the bag-level evaluation Z(g_k)^B, we have

    Z(g_k)^B = \frac{1}{2} \sum_{y_i y_j} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2 M_{ij}^B
             = tr\left( D_{g_k} X_B W_B (D_B - M^B) W_B^\top X_B^\top D_{g_k}^\top \right)
             = tr\left( D_{g_k} X_B W_B L_B W_B^\top X_B^\top D_{g_k}^\top \right)
             = \left( f_{g_k}^B \right)^\top W_B L_B W_B^\top f_{g_k}^B
             = \left( f_{g_k}^B \right)^\top Q^B f_{g_k}^B    (4)

where L_B = D_B - M^B is a Laplacian matrix, in which D_B = diag(d^B) is a diagonal matrix with d_i^B = \sum_j M_{ij}^B.
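A minimal numerical sketch of the bag-level construction above, assuming uniform bag weights and using numpy for illustration: build M^B from the labels, form the Laplacian L_B = D_B - M^B, and score a subgraph's bag indicator vector via (f^B)^T Q^B f^B with Q^B = W_B L_B W_B^T, as in (4). The tiny labels, weights, and indicator vector are made up.

```python
# Sketch of the bag-level part of the bScore criterion in matrix form.
import numpy as np

y = np.array([+1, +1, -1])                 # bag labels
cannot = np.not_equal.outer(y, y)          # pairs with y_i * y_j = -1
A, B = cannot.sum(), (~cannot).sum()       # pair counts per constraint type
M_B = np.where(cannot, 1.0 / A, -1.0 / B)  # constraint matrix M^B
L_B = np.diag(M_B.sum(axis=1)) - M_B       # Laplacian L_B = D_B - M^B
W_B = np.diag([1.0, 1.0, 1.0])             # bag weights (uniform here)
Q_B = W_B @ L_B @ W_B.T

f_B = np.array([1.0, 0.0, 1.0])            # bags containing subgraph g_k
score_B = f_B @ Q_B @ f_B                  # bag-level term of the bScore
```

Note that, by construction, every row of L_B sums to zero (the defining property of a Laplacian), so a subgraph contained in every bag scores zero at the bag level.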
Q^B = W_B L_B W_B^\top, where W_B is also a diagonal matrix, with (W_B)_{ii} = w_i^B denoting the weight of the ith bag B_i. X_B = [x^{B_1}, ..., x^{B_p}] = [f_{g_1}^B, ..., f_{g_s}^B]^\top ∈ {0, 1}^{s×p}, where f_{g_k}^B is an indicator vector of subgraph g_k with respect to all the bags in B. Specifically, f_{g_k}^B = [f_{g_k}^{B_1}, ..., f_{g_k}^{B_p}]^\top ∈ {0, 1}^p, where f_{g_k}^{B_i} = 1 iff ∃G ∈ B_i: g_k ⊆ G, and f_{g_k}^{B_i} = 0 otherwise.

Similarly, the graph-level evaluation Z(g_k)^G can be rewritten in matrix form. Taking the bag-level and graph-level evaluation functions together, we have

    Z(g_k) = Z(g_k)^B + Z(g_k)^G
           = \left( f_{g_k}^B \right)^\top Q^B f_{g_k}^B + \left( f_{g_k}^G \right)^\top Q^G f_{g_k}^G
           = f_{g_k}^\top Q f_{g_k}    (5)

where Q^G = W_G L_G W_G^\top, with W_G a diagonal matrix, i.e., (W_G)_{ii} = w_i^G, denoting the weight of the ith graph G_i. L_G = D_G - M^G is again a Laplacian matrix, where D_G = diag(d^G) is a diagonal matrix with d_i^G = \sum_j M_{ij}^G. Meanwhile, f_{g_k}^G is an indicator vector of subgraph g_k with respect to

all graphs in G, with f_{g_k}^G = [f_{g_k}^{G_1}, ..., f_{g_k}^{G_q}]^\top ∈ {0, 1}^q, where f_{g_k}^{G_i} = 1 iff g_k ⊆ G_i and f_{g_k}^{G_i} = 0 otherwise. According to (5), we have

    f_{g_k} = \begin{bmatrix} f_{g_k}^B \\ f_{g_k}^G \end{bmatrix}, \quad Q = \begin{bmatrix} Q^B & 0 \\ 0 & Q^G \end{bmatrix}    (6)

where f_{g_k} is the indicator vector of subgraph g_k with respect to the data combining the bag matrix X_B and the graph matrix X_G. By denoting the function as h(g_k, Q) = f_{g_k}^\top Q f_{g_k}, the problem of maximizing Z(g_k) in (1) is equivalent to finding a subgraph that maximizes h(g_k, Q), which can be represented as

    g_t = \arg\max_{g_k \in S_g} h(g_k, Q).    (7)

Definition 6 (bScore): Given two matrices M^B and M^G embedding the label information, and the two corresponding weight matrices W_B and W_G, the informativeness score of a subgraph g_k is defined as

    r(g_k) = h(g_k, Q) = f_{g_k}^\top Q f_{g_k}.    (8)

In the above definition, a larger bScore value r(g_k) represents a stronger dependency between the subgraph feature and the corresponding labels. In other words, good subgraph features should have high bScore values. To find the optimal subgraph in each iteration, we can calculate the bScore values of all subgraphs in S_g and then select the subgraph with the highest r(g_k) value.

B. Upper Bound of bScore

Before we introduce the detailed algorithm for mining the optimal subgraph in each iteration, we derive a bScore upper bound to help prune the subgraph search space.

Theorem 1: Given two subgraphs g_k, g_{k'} ∈ S_g, where g_{k'} is a supergraph of g_k (i.e., g_k ⊆ g_{k'}), the bScore of g_{k'}, r(g_{k'}), is bounded by \hat{r}(g_k), i.e., r(g_{k'}) ≤ \hat{r}(g_k), with

    \hat{r}(g_k) = f_{g_k}^\top \hat{Q} f_{g_k}    (9)

where \hat{Q} = \begin{bmatrix} \hat{Q}^B & 0 \\ 0 & \hat{Q}^G \end{bmatrix}, in which \hat{Q}^B and \hat{Q}^G are defined as \hat{Q}_{ij}^B = \max(0, Q_{ij}^B) and \hat{Q}_{ij}^G = \max(0, Q_{ij}^G). Thus, for any g_{k'} ⊇ g_k, r(g_{k'}) ≤ \hat{r}(g_k). The corresponding proof is given in the Appendix.

C. Mining Bag Constrained Subgraphs

For subgraph selection, we employ a depth-first search (DFS) based algorithm, gSpan [34], to enumerate subgraphs. The key idea of gSpan is that each subgraph has a unique DFS code, defined by a lexicographic order of the discovery times during the search process. Two subgraphs are isomorphic iff they have the same minimum DFS code.
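The branch-and-bound idea behind this bound (explore the pattern tree depth-first, keep the best pattern found so far, and expand a node's subtree only when an upper bound on its descendants' scores can still beat the incumbent) can be sketched generically. Here score, upper_bound, and children are hypothetical stand-ins for the bScore, its bound r̂, and DFS-code extension.

```python
# Simplified branch-and-bound sketch of upper-bound pruning over a pattern
# tree. score(g) evaluates a pattern; upper_bound(g) must dominate the score
# of every descendant of g; children(g) enumerates g's extensions.

def search(root, score, upper_bound, children):
    best, best_score = None, float("-inf")
    stack = [root]
    while stack:
        g = stack.pop()
        s = score(g)
        if s > best_score:
            best, best_score = g, s
        if upper_bound(g) >= best_score:   # a supergraph might still win
            stack.extend(children(g))      # otherwise the subtree is pruned
    return best
```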
By employing a depth-first search strategy on the DFS code tree (where each node is a subgraph), gSpan can enumerate all frequent subgraphs efficiently. Algorithm 1 reports the proposed bag constrained subgraph exploration process, which starts with an empty optimal subgraph and continuously enumerates subgraphs by recursively visiting the DFS code tree.

Algorithm 1 BSE: Bag Constrained Subgraph Exploration
Input: G: a graph dataset; min_sup: the frequent subgraph threshold;
Output: g_t: the optimal subgraph;
1: while recursively visiting the DFS code tree in gSpan do
2:   g_k ← the currently visited subgraph in the DFS code tree of G;
3:   if freq(g_k) < min_sup then
4:     continue;
5:   end if
6:   compute the bScore r(g_k) for subgraph g_k;
7:   if g_t == NULL or r(g_k) > r(g_t) then
8:     g_t ← g_k;
9:   end if
10:  if \hat{r}(g_k) ≥ r(g_t) then
11:    depth-first search the subtree rooted at node g_k;
12:  end if
13: end while
14: return g_t;

If a subgraph g_k is not frequent, both g_k and its subtree are pruned (lines 3–5), where freq(g_k) denotes the percentage of graphs in the dataset G that contain g_k; otherwise, we calculate g_k's bScore value r(g_k) (line 6). If r(g_k) is larger than the current optimal score r(g_t), or this is the first step (i.e., the optimal subgraph is empty), we take g_k as the current optimal subgraph g_t (lines 7–9). After that, the upper bound pruning module checks whether \hat{r}(g_k) is less than r(g_t); if so, the bScore of any supergraph g_{k'} of g_k (i.e., g_k ⊆ g_{k'}) cannot be greater than r(g_t), so we can safely prune the subtree rooted at g_k. If \hat{r}(g_k) is greater than or equal to the bScore of g_t, we cannot prune this space, since there might exist a supergraph g_{k'} with r(g_{k'}) ≥ r(g_t), so the DFS continues with the children of g_k (lines 10–12), until the frequent subgraph mining process is completed.

VI. bMGC

The detailed procedures of bMGC are reported in Algorithm 2, which iteratively expands the candidate graph set to extract informative subgraphs and then explores the optimal subgraphs based on the bScore.
After m iterations, bMGC boosts the m selected weak classifiers to obtain the final classification model.

A. bMGC Algorithm

In Algorithm 2, bMGC differentiates and considers graphs in three sets: graphs in positive bags G^+, graphs in negative bags G^-, and graphs in both positive and negative bags G. The benefit of separating graphs into three sets is that the subgraph mining process, which is carried out on each set respectively, enlarges the candidate set for exploring subgraphs. By doing so, the subgraph space becomes denser, through which good subgraph features can be discovered. The while loop in Algorithm 2 represents the boosting process of bMGC. In each iteration, the subgraph mining is

Algorithm 2 bMGC: Boosting for Multi-Graph Classification
Input: B: multi-graph bag set; G: graph dataset in B; m: the number of iterations; min_sup: the frequent subgraph threshold;
Output: the class label y_k of a testing bag B_k.
1: initialize w^B ← {w_i^B: w_i^B = 1}; w^G ← {w_i^G: w_i^G = 1}; t ← 0;
   // Training Phase:
2: {G^+, G^-} ← graphs in B^+ and B^-, respectively;
3: {p, q^-, q^+} ← the number of bags in B and the numbers of graphs in G^- and G^+;
4: while t < m do
5:   t ← t + 1;
6:   w_i^B ← w_i^B / \sum_{i=1}^p w_i^B; w_i^G ← w_i^G / \sum_{i=1}^q w_i^G;
7:   g_t^G ← BSE(G, min_sup);       // Algorithm 1
8:   g_t^{G+} ← BSE(G^+, min_sup);  // Algorithm 1
9:   g_t^{G-} ← BSE(G^-, min_sup);  // Algorithm 1
10:  g_t ← the subgraph with the highest bScore among (g_t^G, g_t^{G+}, g_t^{G-});
     // Error Calculation:
11:  ε_t^B ← the error of H_t^B corresponding to g_t on B;
12:  if ε_t^B > 1/2 then
13:    H_t^B ← -H_t^B; ε_t^B ← 1 - ε_t^B;
14:  end if
15:  ε_t^G ← the error of H_t^G corresponding to g_t on G^-;
16:  if ε_t^G > 1/2 then
17:    H_t^G ← -H_t^G; ε_t^G ← 1 - ε_t^G;
18:  end if
19:  β_t^B ← (1 - ε_t^B)/ε_t^B;
20:  β_t^G ← ε_t^G/(1 - ε_t^G); β^{G+} ← 1/(1 + \sqrt{2 \ln q^+ / m});
     // Increase weight for each incorrectly classified bag:
21:  w_i^B ← w_i^B (β_t^B)^{I(H_t^B(B_i) ≠ c(B_i))}, ∀B_i ∈ B;
     // Decrease weight for each incorrectly classified graph in B^+:
22:  w_j^{G+} ← w_j^{G+} (β^{G+})^{I(H_t^G(G_j) ≠ c(G_j))}, ∀G_j ∈ G^+;
     // Increase weight for each incorrectly classified graph in B^-:
23:  w_k^{G-} ← w_k^{G-} (β_t^G)^{-I(H_t^G(G_k) ≠ c(G_k))}, ∀G_k ∈ G^-;
24: end while
   // Testing Phase:
25: y_k ← sign\left( \sum_{t=1}^m β_t^B H_t^B(B_k) \right)

carried out on the three graph sets, as shown in lines 7 to 9. The current optimal subgraph g_t is the one with the highest bScore among the subgraphs discovered from the individual graph sets (line 10). In bMGC, the subgraph g_t is directly used as a weak bag classifier H_t^B or a weak graph classifier H_t^G, with H_t^B(B_i) = 1 iff (x_{g_t})^{B_i} = 1, and H_t^B(B_i) = -1 otherwise. The same classification rule is used for the graph-based subgraph classifier H_t^G.
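The weak bag classifier just described, together with the AdaBoost-style bag-weight step of Algorithm 2 (a misclassified bag's weight is multiplied by β_t^B = (1 - ε_t^B)/ε_t^B > 1), can be sketched as follows; bags are modeled as lists of subgraph-id sets, and all names are illustrative.

```python
# Sketch of a single-subgraph weak bag classifier H_t^B and the bag-weight
# update: H_t^B(B_i) = +1 iff some graph in the bag contains g_t, and a
# misclassified bag has its weight multiplied by beta = (1 - eps)/eps.

def weak_bag_classifier(bag, g_t):
    return +1 if any(g_t in G for G in bag) else -1

def update_bag_weights(bags, labels, weights, g_t, eps):
    beta = (1.0 - eps) / eps          # > 1 when eps < 1/2
    return [w * (beta if weak_bag_classifier(b, g_t) != y else 1.0)
            for b, y, w in zip(bags, labels, weights)]
```

For example, with eps = 0.25 a misclassified bag's weight triples while correctly classified bags keep their weight, mirroring line 21 of Algorithm 2.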
Accordingly, the steps in lines 11–20 use the error rates of the weak classifiers to update the parameters of the boosting framework.

1) Updating Bag and Graph Weights: To obtain the (t+1)th optimal subgraph g_{t+1}, we must update the weights of bags and graphs using the tth optimal subgraph g_t. The error ε_t^B (line 11) on the bag set B is defined as

$$\varepsilon_t^B = \sum_{i=1}^{p} \frac{w_i^B}{\sum_{i'=1}^{p} w_{i'}^B}\, I\big(H_t^B(B_i) \neq c(B_i)\big) \qquad (10)$$

where c(B_i) returns the label of the ith bag and I(·) is the indicator function. The error ε_t^G (line 15) on the negative graph set is obtained in a similar way. Note that ε_t^B and ε_t^G are required to be smaller than 1/2. If not, the underlying classifier is worse than a random hypothesis, and we use −H_t^B and −H_t^G to replace the current bag- and graph-level classifiers, respectively. As a result, the underlying errors on the bag set and the negative graph set become 1 − ε_t^B and 1 − ε_t^G, respectively (lines 12–14 and 16–18).

According to the specific characteristics of bags and graphs, we employ two different weighting strategies. Because bags are the target of the classification and their genuine labels are given, if a bag is misclassified by the classifier H_t^B induced by the current subgraph g_t, the bag weight is increased by the weight coefficient factor β_t^B (line 19), so that the next iteration seeks a more informative subgraph to deal with the incorrectly predicted bags (line 21). This bag-level weighting mechanism is similar to the AdaBoost algorithm [25]. At the individual graph level, because we propagate bag labels to graphs at the very beginning of the algorithm, some graphs in positive bags might have been assigned wrong labels. Therefore, if a graph in a positive bag is misclassified (i.e., H_t^G(G_j) ≠ c(G_j)), in the next iteration we decrease its weight to reduce its effect by multiplying its weight by (β^{G+})^{I(H_t^G(G_j) ≠ c(G_j))} ∈ (0, 1], where β^{G+} is the weight coefficient factor for positive graphs (line 20). Thus, the misclassified graphs in positive bags will have reduced impact on the learning process in the next round (line 22).
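The two weighting strategies can be sketched as follows. The square-root form of β^{G+} = 1/(1 + √(2 ln q+ / m)) is our reading of the garbled formula (it matches the TrAdaBoost weight factor); function names are illustrative.

```python
import math

def weighted_error(weights, preds, labels):
    """Eq. (10): weighted fraction of misclassified examples."""
    total = sum(weights)
    return sum(w for w, p, y in zip(weights, preds, labels) if p != y) / total

def update_bag_weights(weights, preds, labels, eps):
    """Line 21 (AdaBoost-style): misclassified bags are multiplied by
    beta_t^B = (1 - eps)/eps > 1, i.e., their weight increases."""
    beta = (1.0 - eps) / eps
    return [w * (beta if p != y else 1.0)
            for w, p, y in zip(weights, preds, labels)]

def update_positive_graph_weights(weights, preds, labels, q_pos, m):
    """Line 22 (TrAdaBoost-style): misclassified graphs in positive bags are
    multiplied by beta_G+ = 1/(1 + sqrt(2 ln q+ / m)) in (0, 1], i.e., their
    weight shrinks because their propagated label may be wrong."""
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(q_pos) / m))
    return [w * (beta if p != y else 1.0)
            for w, p, y in zip(weights, preds, labels)]
```

With four unit-weight bags and one misclassification, ε = 1/4, so the misclassified bag's weight triples, while a misclassified positive-bag graph is shrunk below 1.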
The graphs with large training weights will help the learning algorithm find better subgraphs. For negative bags, the weight updating mechanism is the same for all graphs inside the bag (line 23). This graph-level weighting mechanism is similar to the TrAdaBoost algorithm [48]. In the test phase, a test bag B_k is classified by the weighted classifier sign(Σ_{t=1}^m β_t^B H_t^B(B_k)), which boosts all m weak classifiers H_t^B, t = 1, 2, ..., m, to obtain its class label y_k (line 25).

The key technical advantages of the bMGC process can be summarized as follows.
a) Bag-Constrained Subgraph Mining: The two-level weight updating mechanism seamlessly integrates the unique bag- and graph-level constraints into a repetitive and progressive mining process. It helps explore informative subgraphs to represent multi-graph bags.
b) Implicit Feature Representation: bMGC selects a subgraph to directly form a weak classifier in each iteration. This efficiently tackles the challenge that no feature vectors are available for MGC.
c) Generic Boosting Framework for MGC: The proposed framework solves MGC by exploring informative subgraphs as weak classifiers to form a strong boosting model. The framework can easily be adjusted to accommodate other types of graph or bag classifiers for MGC.

VII. EXPERIMENTS

A. Datasets

1) DBLP Multi-Graph Dataset: The DBLP dataset consists of bibliography data in computer science. We downloaded a DBLP version called DBLP-Citation-network V5 from ArnetMiner. Each record in

WU et al.: BOOSTING FOR MULTI-GRAPH CLASSIFICATION

TABLE I: DBLP DATASET USED IN EXPERIMENTS

TABLE II: NCI CANCER SCREEN DATASETS: NCI(1) AND NCI(109)

DBLP is associated with a number of attributes including title, abstract, author names, year, venue, and reference names [49]. To build multi-graph bags, we select papers published in the artificial intelligence (AI), computer vision (CV), and database (DB) fields to form MGC tasks. The goal is to predict which field a paper belongs to (AI, CV, or DB) by using the abstract of each paper and the abstracts of its references. For each abstract, a fuzzy cognitive map (E-FCM) [50] based approach is used to extract a number of keywords and the correlations between keywords. In our experiments, we use keywords as nodes and the correlations between two keywords as edge weight values to build a graph. A threshold (0.005) is used to remove edges whose correlation values are less than the threshold. In the last step, the graph is converted into an unweighted graph by setting the weight values of all remaining edges to 1. A similar graph representation was also used in previous works [51]–[54]. A conceptual view of building a multi-graph bag is shown in Fig. 1. Notice that AI, CV, and DB overlap in many aspects, such as machine learning, optimization, and data mining, which makes them challenging MGC tasks. The original DBLP dataset contains a significant number of papers without references. We choose 2400 papers, each containing one to ten references, to form two MGC tasks: DBLP (AI versus CV) with positive (AI) and negative (CV) bags, and DBLP (AI versus DB) with positive (AI) and negative (DB) bags. The last two columns of Table I report the number of bags (papers) and graphs (abstracts) in each category.

2) NCI Chemical Compound Multi-Graph Dataset: The NCI cancer screening database is a commonly used graph classification benchmark.
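The NCI bag-construction protocol described next (one to four positive graphs per positive bag, bag sizes between 1 and 10, negative bags of negative graphs only) can be sketched as follows; function and parameter names are illustrative, not from the paper.

```python
import random

def make_nci_bags(pos_graphs, neg_graphs, n_pos_bags, n_neg_bags, seed=0):
    """Sketch of the NCI multi-graph bag generation: a positive bag mixes
    1-4 positive graphs with negative graphs, a negative bag contains only
    negative graphs, and every bag holds 1-10 graphs."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_pos_bags):
        k = rng.randint(1, 4)                    # 1-4 positive graphs
        size = rng.randint(k, 10)                # total bag size up to 10
        members = rng.sample(pos_graphs, k) + rng.sample(neg_graphs, size - k)
        bags.append((members, +1))
    for _ in range(n_neg_bags):
        members = rng.sample(neg_graphs, rng.randint(1, 10))
        bags.append((members, -1))
    return bags
```

Here graphs are stand-in identifiers; in the real datasets each member is a chemical-compound graph.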
We downloaded two NCI datasets, with IDs 1 and 109, from PubChem. Each NCI dataset corresponds to a bioassay task for anticancer activity prediction, where each chemical compound is represented as a graph, with atoms as nodes and bonds as edges. A chemical compound is positive if it is active against the corresponding cancer, and negative otherwise. The original NCI datasets are highly imbalanced, with about 5% positive graphs; we use these graphs to generate our multi-graph bags. To build multi-graph bags, we randomly select one to four positive graphs and several negative graphs to form a positive bag, and randomly select a number of negative graphs to form a negative bag. To address different targets, we design two NCI multi-graph classification tasks: NCI(1), generated from the NCI dataset with ID 1, and NCI(109), generated from the NCI dataset with ID 109. The number of graphs in each bag varies from 1 to 10. Table II summarizes the NCI(1) and NCI(109) datasets used in our experiments, where columns 4–5 show the numbers of positive and negative graphs in all multi-graph bags. In the NCI MGC tasks, a bag of graphs can be regarded as a molecular group. Investigating the activity of a molecular group is meaningful in the bio-pharmaceutical field: because labeling individual compounds is expensive and time-consuming, it is desirable to design effective methods (such as bMGC) to label molecular groups (i.e., bags).

B. Baseline Methods

To demonstrate the effectiveness of our MGC framework, we compare the proposed bMGC with both supervised and unsupervised bag-constrained subgraph selection methods in the traditional MIL framework. The baseline methods are summarized as follows.
1) Information Gain Based Approach (IG+MI): In these methods, a set of frequent subgraphs is mined from the graphs in all bags using gSpan [34]. Supervised feature selection based on information gain (IG) is used to select the m subgraphs with the highest IG scores.
After obtaining the m subgraphs, the IG based multi-instance approach (IG+MI) uses the selected subgraphs to represent the graphs in bags, so a bag of graphs is converted into a bag of instances, through which existing MIL methods can be applied for MGC learning.
2) Top-k Based Approach (Topk+MI): This is an unsupervised feature selection method which uses frequency as the evaluation criterion to select subgraphs discovered by gSpan [34]. The top-k subgraphs with the highest frequency among the graphs in bags are selected. The top-k based multi-instance approach (Topk+MI) then transforms each bag of graphs into a bag of instances for learning.

To compare the performance of our MGC framework bMGC with MIL, two types of benchmark multi-instance classifiers are used in our experiments: boosting based classifiers (MIBoost and MIOptimalBall) and four different kinds of general approaches (CitationKNN, MIRI, MIEMDD, and MISMO). In the following, CitationKNN denotes a lazy learning based method, MIRI is an improvement of a tree learning based approach, MIEMDD is an improved DD [26], and MISMO is an implementation of support vector machines for MIL. The baseline MIL methods used in our experiments and their abbreviations are listed as follows.
1) Boosting Based MI Learning Approaches:
a) MIBoost is an algorithm [24] inspired by AdaBoost that builds a series of weak classifiers (a decision stump is used

Fig. 4. Accuracy on DBLP (AI versus CV) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.
Fig. 5. Accuracy on DBLP (AI versus DB) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.
Fig. 6. Accuracy on NCI(1) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.
Fig. 7. Accuracy on NCI(109) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.

in our experiments) using a single-instance learner on appropriately reweighted versions of the input data.
b) MIOptimalBall treats the weak hypotheses for AdaBoost as balls [28], and classification is based on the distance to a reference point. More specifically, this method attempts to find a ball in the instance space such that all instances of all negative bags lie outside the ball and at least one instance of each positive bag lies inside it.
2) General MI Learning Approaches:
a) CitationKNN, a nearest-neighbor-based approach, measures the distance between bags using the Hausdorff distance [16]. The nearest neighbor of an example to be classified is the one nearest to both its references and its citers.
b) MIEMDD is the EM version of DD with the most-likely-cause model [26], which is used to find the most likely target points based on the DD model that has been learned [27].
c) MIRI is a multi-instance classifier that uses partial MITI trees [17] with a single positive leaf to learn and represent rules. MIRI [18] is a simple modification of MITI that yields a rule learner for MIL.
d) MISMO constructs a support vector machine classifier for multi-instance data [21], where the standard sequential minimal optimization algorithm is used for support vector learning in conjunction with an MI kernel as described in [55].

C. Experiment Settings

In our experiments, all reported results are based on 10 times 10-fold cross-validation, with classification accuracy used as the performance metric.
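The evaluation protocol (repeated k-fold cross-validation, later compared with pairwise t-tests at α = 0.05) can be sketched generically; `fit` and `predict` are stand-ins for any classifier, and the code is illustrative rather than the authors' harness.

```python
import math
import random

def repeated_kfold_accuracy(X, y, fit, predict, times=10, folds=10, seed=0):
    """Minimal sketch of 10-times 10-fold cross-validation; returns the
    per-fold accuracies so they can feed a paired significance test."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    accs = []
    for _ in range(times):
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]                    # every folds-th index
            held = set(test)
            train = [i for i in idx if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train])
            hits = sum(predict(model, X[i]) == y[i] for i in test)
            accs.append(hits / len(test))
    return accs

def paired_t_statistic(a, b):
    """Paired t statistic over matched per-fold accuracies, the kind of test
    used later to compare bMGC against each baseline."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)
```

A trivial majority-vote classifier illustrates the harness: on a dataset with 80% of one label, repeated 10-fold CV yields exactly 0.8 mean accuracy.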
Unless specified otherwise, the default parameter settings are as follows: minimum support threshold min_sup = 4% for the DBLP datasets and min_sup = 15% for the NCI datasets. All the above classifiers for traditional MIL use the versions provided in the WEKA machine learning workbench [56], with default parameter settings. All experiments are conducted on a Linux cluster computing node with an Intel(R) CPU and 3 GB of memory.

D. Accuracy on Multi-Graph Classification

In this section, we report experimental results on the DBLP and NCI datasets, comparing the performance of bMGC with two types of MIL methods, including boosting based and general approaches, under the supervised and unsupervised feature selection settings, respectively. All methods are compared using the same number of subgraphs. For our boosting based bMGC, one subgraph is selected in each iteration until the total number reaches m, whereas for the baseline methods, all m subgraphs are selected at once. As expected, bMGC clearly outperforms existing traditional MIL methods on both the DBLP and NCI multi-graph datasets with different numbers of subgraphs (varying from 1 to 100).
1) bMGC Versus Boosting Based MI Learning Approaches: We compare bMGC to MIBoost and MIOptimalBall, two boosting based baselines that are variants of the well-known AdaBoost algorithm [25] with the objective of minimizing the exponential loss for bags of instances. Like other boosting schemes, these two algorithms greedily fit an additive model to the training data. In each iteration of the sequential boosting process, a weak learner (a decision stump for MIBoost, and a ball for MIOptimalBall) is applied to generate one component of the underlying additive model. The results in Figs. 4(a)–7(a) show that both bMGC and MIBoost can achieve a high accuracy on the DBLP (AI versus

TABLE III: PAIRWISE t-TEST RESULTS OF bMGC VERSUS BOOSTING BASED MI LEARNING METHODS ON (a) DBLP AND (b) NCI DATASETS. A, B, AND C DENOTE bMGC, IG+MI, AND TOPK+MI, RESPECTIVELY. H1 AND H2 DENOTE MIBOOST AND MIOPTIMALBALL, RESPECTIVELY.

CV, AI versus DB) and NCI (1 and 109) datasets. Meanwhile, bMGC consistently outperforms MIBoost when the number of selected subgraphs is 20 or more. On the other hand, comparing bMGC with MIOptimalBall, a significant performance gain can be observed in Figs. 4(b)–7(b) on both datasets. The superior performance of bMGC is due to the optimal subgraph mining strategy combined with the AdaBoost and TrAdaBoost algorithms. Furthermore, it seems that MIOptimalBall fails to adapt to the feature space composed of subgraphs. Our results also show that bMGC has a very low accuracy in early iterations, and its accuracy may be worse than baselines such as MIBoost in some cases. This is mainly because the boosting model of bMGC relies on weak classifiers to achieve better performance: when the number of weak classifiers is small (normally at the early stage of the boosting process), the accuracy of bMGC is noticeably low. To show that this situation does not affect the overall performance of bMGC, we summarize the pairwise t-test results (with confidence level α = 0.05) of bMGC and the boosting based MI learning methods on both datasets in Table III. Each entry denotes the p-value for a t-test between two algorithms, and a p-value less than α = 0.05 indicates that the difference is statistically significant. From Table III, bMGC statistically outperforms the boosting based MI learning baselines in all cases.
2) bMGC Versus General MI Learning Approaches: We carry out another experimental comparison to demonstrate the performance of bMGC against four different types of general MI learning approaches (CitationKNN, MIRI, MIEMDD, and MISMO). From the results in Figs.
8(c)–11(c), MIEMDD shows ineffective performance for MGC, and an increasing number of subgraphs does not yield additional accuracy gain. Although the performance of the CitationKNN, MIRI, and MISMO based methods improves as the number of subgraphs increases, they still cannot reach the best performance achieved by bMGC, except for IG+MIRI on the NCI(109) dataset as shown in Fig. 11(b). It is worth mentioning that bMGC may achieve only comparable performance to other baselines in some cases, such as Topk+CitationKNN [Fig. 9(a)] and MISMO [Figs. 8(d) and 9(d)] on the DBLP datasets, and IG+MISMO [Figs. 10(d) and 11(d)] on the NCI datasets. To further validate the statistical performance of bMGC, in Table IV we also report pairwise t-tests to validate the statistical significance between each pair of methods. From Table IV, bMGC statistically outperforms the general MI learning baselines in all cases. This is mainly attributed to the effectiveness of the proposed bag-constrained subgraph exploration criterion and the specially designed boosting strategy, which weights a set of single weak classifiers under our specially designed weighting mechanism.

E. Effectiveness of Subgraph Candidate Generation in bMGC

As discussed above, one main component of bMGC is the subgraph candidate generation (as described in Section IV). More specifically, in addition to aggregating the graphs in all bags G, we also aggregate: 1) the graphs in all positive bags G+, and 2) the graphs in all negative bags G−. As a result, a set of diverse subgraph candidate patterns can be discovered for validation. To further illustrate the effectiveness of the proposed strategy for subgraph candidate generation and to validate whether using the two extra graph sets G+ and G− can indeed improve the performance of bMGC, we compare bMGC with an approach which only uses G to generate the subgraphs for learning, denoted bMGC-G. In Fig. 12(a) and (b), we report the accuracy with respect to different iterations on the DBLP (AI versus CV) and NCI(1) datasets, respectively.
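The effect this comparison measures can be illustrated with a toy frequency count: a pattern that is frequent within G+ alone can fall below min_sup on the pooled set G. Here each "graph" is simplified to the set of patterns it contains (real bMGC enumerates subgraphs with gSpan); the example is an assumption-laden sketch, not the paper's miner.

```python
from collections import Counter

def frequent_patterns(graphs, min_sup):
    """A pattern is frequent if it occurs in at least a min_sup fraction
    of the graphs in the given set."""
    counts = Counter(p for g in graphs for p in g)
    n = len(graphs)
    return {p for p, c in counts.items() if c / n >= min_sup}
```

With three positive graphs sharing one pattern and seven negative graphs sharing another, that positive pattern is frequent in G+ (support 3/3) but invisible in the pooled set (support 3/10) at min_sup = 0.5, which is exactly why mining the three sets separately enlarges the candidate set.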
The results show that the classification accuracy of bMGC using all three graph sets is normally 3%–5% higher than that of bMGC-G, which only uses G. This is because separating the graphs into G+ and G− helps find unique subgraph patterns which do not appear in the whole graph set G. Indeed, the subgraph exploration essentially relies on a threshold (i.e., the support value) to discover frequent subgraphs: when all graphs are aggregated into one set G, a good subgraph in G+ may not be discovered from G simply because its frequency in G is below the given threshold. Separating the graphs into the three sets G+, G−, and G therefore helps discover a rich set of subgraph candidates, from which bMGC can select the ones with the highest informativeness scores.

F. Convergence Study

Fig. 13 reports the error rate curves of bMGC with respect to the number of iterations on the four multi-graph datasets. The curves are quite smooth and converge well, which is consistent with the theoretical analysis and the existing observations for AdaBoost [25]. After the algorithm reaches convergence, the error rates of bMGC are higher on the DBLP datasets than on the NCI datasets. Overall, bMGC converges quickly on all four datasets. For the NCI datasets, convergence is reached within ten iterations, whereas for the DBLP datasets, bMGC's convergence

Fig. 8. Accuracy on DBLP (AI versus CV) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.
Fig. 9. Accuracy on DBLP (AI versus DB) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.
Fig. 10. Accuracy on NCI(1) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.
Fig. 11. Accuracy on NCI(109) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.

TABLE IV: PAIRWISE t-TEST RESULTS OF bMGC VERSUS GENERAL MI LEARNING METHODS ON (a) DBLP AND (b) NCI DATASETS. A, B, AND C DENOTE bMGC, IG+MI, AND TOPK+MI, RESPECTIVELY. H1, H2, H3, AND H4 DENOTE CITATIONKNN, MIRI, MIEMDD, AND MISMO, RESPECTIVELY.

is reached after 20 or more iterations. Since each weak classifier in bMGC corresponds to one subgraph, this indicates that more subgraph features are needed to differentiate the object classes in the DBLP datasets. Indeed, because the DBLP tasks involve overlapping domains (such as AI versus CV), using more subgraph features (which correspond to keywords

Fig. 12. Accuracy comparisons of bMGC and bMGC-G on the DBLP and NCI datasets. (a) DBLP (AI versus CV) dataset. (b) NCI(1) dataset.
Fig. 13. Error rate curves on the DBLP (AI versus CV, AI versus DB) and NCI (1 and 109) multi-graph datasets with respect to the number of iterations.
Fig. 14. Average CPU runtime of bMGC versus the unpruned ubMGC with different min_sup values, under a fixed number of subgraphs m = 100, on the DBLP and NCI datasets.

and their correlations) consistently helps improve the classification accuracy. For the NCI graphs, the positive and negative graphs are mostly separated by a few unique subgraph features, so as soon as such unique patterns are discovered, the algorithm quickly converges.

G. Effectiveness Results

To evaluate the effectiveness of the pruning module of bMGC in reducing the search space (as described in Section V-C), we compare bMGC with an approach which does not prune the subgraph search space (denoted ubMGC). In our implementation, ubMGC first uses gSpan to find a set of frequent subgraphs, and then selects the optimal subgraph using the same criteria as bMGC in each iteration. In Fig. 14(a) and (b), we report the average CPU runtime with respect to different minimum support values min_sup (the number of selected subgraphs is fixed at 100) on the DBLP (AI versus CV) and NCI(1) datasets, respectively. The results show that as the min_sup value increases, the runtime of both the pruned and unpruned versions decreases, mainly because a larger min_sup value reduces the number of candidates for validation. By incorporating the proposed pruning strategy, bMGC improves the runtime performance: the bScore upper bound of bMGC effectively prunes the subgraph search space without decreasing the quality of classification.

VIII. DISCUSSION

In this paper, we focus on a subgraph based boosting framework for MGC.
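The pruning behavior evaluated in Section G can be sketched generically: a candidate subgraph is expanded only if its anti-monotone upper bound (the role played by Theorem 1's bound on the bScore) could still beat the best score found so far; otherwise its whole subtree is skipped. Here `score`, `upper_bound`, and `expand` are placeholders for bScore, its bound, and gSpan's pattern growth, so this is a structural sketch rather than the paper's miner.

```python
def branch_and_bound_best(roots, score, upper_bound, expand):
    """Depth-first search over a pattern tree, pruning any subtree whose
    upper bound cannot exceed the best score seen so far. Returns the best
    pattern, its score, and how many patterns were visited."""
    best, best_score, visited = None, float("-inf"), 0
    stack = list(roots)
    while stack:
        g = stack.pop()
        visited += 1
        s = score(g)
        if s > best_score:
            best, best_score = g, s
        if upper_bound(g) > best_score:      # bound still promising: expand
            stack.extend(expand(g))
    return best, best_score, visited
```

On a toy tree of binary strings scored by their count of "a", with the (valid) bound "current count plus remaining length", the pruned search visits fewer nodes than an exhaustive search while returning the same optimum, mirroring Fig. 14's runtime gap between bMGC and ubMGC.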
Indeed, the idea of exploiting subgraphs for graph classification has been studied in a number of existing works, including a recent ensemble based semi-supervised graph stream classification approach [9]. The core of the proposed bMGC approach is to combine two types of boosting strategies, AdaBoost [25] for bag-level boosting and TrAdaBoost [48] for graph-level boosting, to integrate graph- and bag-level learning for MGC. Boosting algorithms for graph classification have been studied in several previous works. For example, Kudo et al. [41] proposed an AdaBoost based graph classification approach, which is the original algorithm among many variants [42]–[44]. Meanwhile, LPBoost [57], namely linear programming boosting, is another type of boosting algorithm for graph classification. The proposed bMGC follows subgraph search approaches similar to those used in these existing works: it applies the gSpan algorithm [34] in each boosting iteration, together with the proposed pruning strategy, to explore subgraphs. The main complication of MGC is that the genuine labels of the graphs inside a positive bag are unknown. To tackle the uncertainty inside positive bags, bMGC takes the bag constraints into consideration and explores subgraphs to represent graphs with maximum diversity, as defined in (2). This is similar to the way unlabeled graphs are handled in an existing semi-supervised graph stream classification method [9]. In [9], an instance weighting mechanism has also been proposed, but it differs from the weighting approach in bMGC, where the weights are directly associated with the graphs and bags. In addition, the weight updating strategy in [9] is based on AdaBoost [25], which only considers labeled graphs, whereas in bMGC we borrow the weighting strategy from TrAdaBoost [48] to update the graph weights in both the labeled and unlabeled graph sets. In summary, the idea in [9] provides inspiration that motivated the proposed MGC design. We believe that the proposed bMGC opens a new opportunity to extend existing MIL to increasingly popular graph applications.
Although bMGC uses subgraph mining to tackle the MGC challenges, the principle of combining graph- and bag-level constraints can be extended to many other types of approaches for MGC problems. For example, for kernel based methods, the MGC problem can be approached via two subtasks: 1) adding multi-graph constraints to traditional graph kernels, or 2) proposing a new multi-graph kernel framework. In addition, one can also impose multi-graph constraints on graph embedding methods (e.g., the one in [32]) to directly calculate the distance between two graphs or between two

graph bags. With the calculated distances between graphs and between bags, standard learning algorithms (including MIL algorithms) can be applied to solve MGC tasks.

IX. CONCLUSION

In this paper, we investigated a novel MGC problem, in which a number of graphs form a bag, with each bag labeled either positive or negative. The multi-graph representation can model many real-world applications where a label is only available for a bag of objects with dependency structures. To build a learning model for MGC, we proposed bMGC, which employs dynamic weight adjustment, at both the graph and bag levels, to select one subgraph in each iteration and form a set of weak graph classifiers. MGC is achieved by a weighted combination of the weak graph classifiers. Experiments on two real-world MGC tasks, DBLP citation network classification and NCI chemical compound classification, demonstrate that our method is effective in finding informative subgraphs, and that its accuracy is significantly better than that of the baseline methods.

APPENDIX
PROOF OF THEOREM 1

According to (8), for any g ⊆ g_k we have

\begin{align}
r(g_k) &= \mathbf{f}_{g_k}^{\top} Q \,\mathbf{f}_{g_k}
= \begin{bmatrix} (\mathbf{f}_{g_k}^{B})^{\top} & (\mathbf{f}_{g_k}^{G})^{\top} \end{bmatrix}
  \begin{bmatrix} Q^{B} & 0 \\ 0 & Q^{G} \end{bmatrix}
  \begin{bmatrix} \mathbf{f}_{g_k}^{B} \\ \mathbf{f}_{g_k}^{G} \end{bmatrix} \nonumber \\
&= (\mathbf{f}_{g_k}^{B})^{\top} Q^{B} \mathbf{f}_{g_k}^{B}
 + (\mathbf{f}_{g_k}^{G})^{\top} Q^{G} \mathbf{f}_{g_k}^{G}
= \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g_k)} Q^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g_k)} Q^{G}_{ij} \tag{11}
\end{align}

where B(g_k) = {B_i | g_k ⊆ G_j, G_j ∈ B_i, 1 ≤ i ≤ p, 1 ≤ j ≤ q} and G(g_k) = {G_j | g_k ⊆ G_j, 1 ≤ j ≤ q}. Since g_k is a supergraph of g (i.e., g ⊆ g_k), according to the anti-monotonic property we have B(g_k) ⊆ B(g) and G(g_k) ⊆ G(g). Hence

\begin{align}
r(g_k) &= \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g_k)} Q^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g_k)} Q^{G}_{ij}
\leq \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g_k)} \hat{Q}^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g_k)} \hat{Q}^{G}_{ij} \nonumber \\
&\leq \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g)} \hat{Q}^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g)} \hat{Q}^{G}_{ij}
= (\mathbf{f}_{g}^{B})^{\top} \hat{Q}^{B} \mathbf{f}_{g}^{B}
 + (\mathbf{f}_{g}^{G})^{\top} \hat{Q}^{G} \mathbf{f}_{g}^{G}
= \mathbf{f}_{g}^{\top} \hat{Q}\, \mathbf{f}_{g} = \hat{r}(g). \tag{12}
\end{align}

Thus, for any g ⊆ g_k, r(g_k) ≤ \hat{r}(g).

REFERENCES

[1] M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis, "Frequent substructure-based approaches for classifying chemical compounds," IEEE Trans. Knowl. Data Eng., vol. 17, no. 8, Aug.
[2] W. Lian, D.-L. Cheung, N. Mamoulis, and S.-M.
Yiu, "An efficient and scalable algorithm for clustering XML documents by structure," IEEE Trans. Knowl. Data Eng., vol. 16, no. 1, Jan.
[3] C. Chen et al., "Mining graph patterns efficiently via randomized summaries," in Proc. 35th Int. Conf. VLDB, Lyon, France, 2009.
[4] H. Wang, H. Huang, and C. Ding, "Image categorization using directed graphs," in Proc. 11th ECCV, Crete, Greece, 2010.
[5] R. Angelova and G. Weikum, "Graph-based text classification: Learn from your neighbors," in Proc. 29th Annu. Int. ACM SIGIR, Seattle, WA, USA, 2006.
[6] Z. Harchaoui and F. Bach, "Image classification with segmentation graph kernels," in Proc. 20th IEEE Conf. CVPR, Minneapolis, MN, USA, 2007.
[7] M. Kuramochi and G. Karypis, "Frequent subgraph discovery," in Proc. 1st ICDM, 2001.
[8] M. Thoma et al., "Near-optimal supervised feature selection among frequent subgraphs," in Proc. 9th SDM, 2009.
[9] S. Pan, X. Zhu, C. Zhang, and P. Yu, "Graph stream classification using labeled and unlabeled graphs," in Proc. 29th IEEE ICDE, Brisbane, QLD, Australia, 2013.
[10] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, nos. 1–2.
[11] T. Dietterich, R. Lathrop, and T. Lozano-Pérez, "Solving the multiple instance problem with axis-parallel rectangles," Artif. Intell., vol. 89, nos. 1–2.
[12] Z. Fu, A. Robles-Kelly, and J. Zhou, "MILIS: Multiple instance learning with instance selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, May.
[13] Z.-H. Zhou, K. Jiang, and M. Li, "Multi-instance learning based web mining," Appl. Intell., vol. 22, no. 2.
[14] D. Kelly, J. McDonald, and C. Markham, "Weakly supervised training of a sign language recognition system using multiple instance learning density matrices," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 2, Apr.
[15] Z. Zhou, M. Zhang, S. Huang, and Y. Li, "Multi-instance multi-label learning," Artif. Intell., vol. 176, no. 1.
[16] J. Wang, "Solving the multiple-instance problem: A lazy learning approach," in Proc.
17th ICML, San Francisco, CA, USA, 2000.
[17] H. Blockeel and A. Srinivasan, "Multi-instance tree learning," in Proc. 22nd ICML, Bonn, Germany, 2005.
[18] L. Bjerring and E. Frank, "Beyond trees: Adopting MITI to learn rules and ensemble classifiers for multi-instance data," in Proc. 24th Int. Conf. Adv. AI, Berlin, Heidelberg, 2011.
[19] Y. Chevaleyre and J. Zucker, "A framework for learning rules from multiple instance data," in Proc. 12th ECML, Freiburg, Germany, 2001.
[20] M. Zhang and Z. Zhou, "Improve multi-instance neural networks through feature selection," Neural Process. Lett., vol. 19, no. 1, pp. 1–10.
[21] X. Qi and Y. Han, "Incorporating multiple SVMs for automatic image annotation," Pattern Recogn., vol. 40, no. 2.
[22] S. Ray and M. Craven, "Supervised versus multiple instance learning: An empirical comparison," in Proc. 22nd ICML, New York, NY, USA, 2005.
[23] H. Yuan, M. Fang, and X. Zhu, "Hierarchical sampling for multi-instance ensemble learning," IEEE Trans. Knowl. Data Eng., vol. 25, no. 12, Dec.
[24] X. Xu and E. Frank, "Logistic regression and boosting for labeled bags of instances," in Proc. 8th PAKDD, 2004.
[25] M. Telgarsky, "A primal-dual convergence analysis of boosting," J. Mach. Learn. Res., vol. 13, no. 1.
[26] O. Maron and T. Lozano-Pérez, "A framework for multiple-instance learning," in Proc. 12th Annu. Conf. NIPS, Cambridge, MA, USA, 1998.
[27] Q. Zhang and S. Goldman, "EM-DD: An improved multiple-instance learning technique," in Proc. 15th Annu. Conf. NIPS, 2001.
[28] P. Auer and R. Ortner, "A boosting approach to multiple instance learning," in Proc. 15th ECML, Pisa, Italy, 2004.
[29] J. Wu, X. Zhu, C. Zhang, and Z. Cai, "Multi-instance multi-graph dual embedding learning," in Proc. 13th ICDM, Dallas, TX, USA, 2013.
[30] S. V. N. Vishwanathan, K. M. Borgwardt, R. I. Kondor, and N. N. Schraudolph, "Graph kernels," J. Mach. Learn. Res., vol. 11, Apr.

[31] P. Mahe, N. Ueda, T. Akutsu, J. Perret, and J. Vert, "Extensions of marginalized graph kernels," in Proc. 21st ICML, New York, NY, USA, 2004.
[32] K. Riesen and H. Bunke, "Graph classification by means of Lipschitz embedding," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, Dec.
[33] K. Riesen and H. Bunke, Graph Classification and Clustering Based on Vector Space Embedding. Singapore: World Scientific.
[34] X. Yan and J. Han, "gSpan: Graph-based substructure pattern mining," in Proc. 2nd ICDM, Washington, DC, USA, 2002.
[35] A. Inokuchi, T. Washio, and H. Motoda, "An apriori-based algorithm for mining frequent substructures from graph data," in Proc. 4th Eur. Conf. PKDD, Lyon, France, 2000.
[36] C. Borgelt and M. Berthold, "Mining molecular fragments: Finding relevant substructures of molecules," in Proc. 2nd ICDM, 2002.
[37] S. Nijssen and J. Kok, "A quickstart in frequent structure mining can make a difference," in Proc. 10th ACM SIGKDD, Seattle, WA, USA, 2004.
[38] X. Yan, H. Cheng, J. Han, and P. S. Yu, "Mining significant graph patterns by leap search," in Proc. 27th ACM SIGMOD, Vancouver, BC, Canada, 2008.
[39] H. Saigo, N. Krämer, and K. Tsuda, "Partial least squares regression for graph mining," in Proc. 14th ACM SIGKDD, Las Vegas, NV, USA, 2008.
[40] N. Jin, C. Young, and W. Wang, "GAIA: Graph classification using evolutionary computation," in Proc. 29th ACM SIGMOD, Indianapolis, IN, USA, 2010.
[41] T. Kudo, E. Maeda, and Y. Matsumoto, "An application of boosting to graph classification," in Proc. 18th Annu. Conf. NIPS, 2004.
[42] S. Nowozin, K. Tsuda, T. Uno, T. Kudo, and G. Bakir, "Weighted substructure mining for image analysis," in Proc. 20th IEEE Conf. CVPR, Minneapolis, MN, USA, 2007.
[43] H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda, "gBoost: A mathematical programming approach to graph classification and regression," Mach. Learn., vol. 75, no. 1.
[44] S. Pan and X. Zhu, "Graph classification with imbalanced class distributions and noise," in Proc.
23rd IJCAI, 2013.
[45] H. Fei and J. Huan, "Boosting with structure information in the functional space: An application to graph classification," in Proc. 16th ACM SIGKDD, Washington, DC, USA, 2010.
[46] B. Zhang et al., "Multi-class graph boosting with subgraph sharing for object recognition," in Proc. 20th ICPR, Istanbul, Turkey, 2010.
[47] M. Grbovic, C. Dance, and S. Vucetic, "Sparse principal component analysis with constraints," in Proc. 26th Conf. AAAI, 2012.
[48] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proc. 24th ICML, Corvallis, OR, USA, 2007.
[49] J. Tang et al., "ArnetMiner: Extraction and mining of academic social networks," in Proc. 14th ACM SIGKDD, Las Vegas, NV, USA, 2008.
[50] K. Perusich and M. McNeese, "Using fuzzy cognitive maps for knowledge management in a conflict environment," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 6, Nov.
[51] X. Li, Q. Hu, W. Xu, and Z. Yu, "Discovery of textual knowledge flow based on the management of knowledge maps," Concurr. Comput. Pract. Exp., vol. 20, no. 15.
[52] X. Luo, Z. Xu, J. Yu, and X. Chen, "Building association link network for semantic link on web resources," IEEE Trans. Autom. Sci. Eng., vol. 8, no. 3, Jul.
[53] J. Wu et al., "Multi-graph learning with positive and unlabeled bags," in Proc. 14th SIAM Int. Conf. Data Mining, 2014.
[54] J. Wu, X. Zhu, C. Zhang, and P. Yu, "Bag constrained structure pattern mining for multi-graph classification," IEEE Trans. Knowl. Data Eng., to be published.
[55] T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola, "Multi-instance kernels," in Proc. 19th ICML, 2002.
[56] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Amsterdam, The Netherlands: Morgan Kaufmann.
[57] A. Demiriz, K. P. Bennett, and J. Shawe-Taylor, "Linear programming boosting via column generation," Mach. Learn., vol. 46, no.
1 3, pp , Ja Wu (S 14) receved the bachelor s degree n computer scence from the Chna Unversty of Geoscences, Wuhan, Chna, n 2009, where he s currently pursung the Ph.D. degree n computer scence. He s also pursung the Ph.D. degree from the Centre for Quantum Computaton and Intellgent Systems, Faculty of Engneerng and Informaton Technology, Unversty of Technology, Sydney, Ultmo, NSW, Australa. Hs current research nterests nclude data mnng and machne learnng. Shru Pan receved the master s degree n computer scence from Northwest A&F Unversty, Yanglng, Chna, n He s currently pursung the Ph.D. degree from the Centre for Quantum Computaton and Intellgent Systems, Faculty of Engneerng and Informaton Technology, Unversty of Technology, Sydney, Ultmo, NSW, Australa. Hs current research nterests nclude data mnng and machne learnng. Xngquan Zhu (SM 12) receved the Ph.D. degree n computer scence from Fudan Unversty, Shangha, Chna. He s an Assocate Professor wth the Department of Computer & Electrcal Engneerng and Computer Scence, Florda Atlantc Unversty, Boca Raton, FL, USA. Pror to that, he was wth the Centre for Quantum Computaton and Intellgent Systems, Unversty of Technology, Sydney, Ultmo, NSW, Australa. Hs current research nterests nclude data mnng, machne learnng, and multmeda systems. Snce 2000, he has publshed over 170 refereed journal and conference papers n these areas. Dr. 
Zhu is an Associate Editor of the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, and has served on the Editorial Board of the international journal Social Network Analysis and Mining since 2010, and of the journal Network Modeling Analysis in Health Informatics and Bioinformatics. He served or is serving as a Program Committee Co-Chair for the 14th IEEE International Conference on Bioinformatics and BioEngineering (BIBE-2014), the IEEE International Conference on Granular Computing (GRC-2013), the 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2011), and the 9th International Conference on Machine Learning and Applications (ICMLA-2010). He also served as a Conference Co-Chair for ICMLA. He was a recipient of two Best Paper Awards and one Best Student Paper Award.

Zhihua Cai received the B.Sc. degree from Wuhan University, Wuhan, China, in 1986, the M.Sc. degree from the Beijing University of Technology, Beijing, China, in 1992, and the Ph.D. degree from the China University of Geosciences, Wuhan. He is currently a faculty member with the School of Computer Science, China University of Geosciences. He has published over 50 research papers in journals and international conferences, such as IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE TRANSACTIONS ON CYBERNETICS, Applied Soft Computing, Information Sciences, Knowledge-Based Systems, and Knowledge and Information Systems. His current research interests include data mining, machine learning, evolutionary computation, and their applications.


More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Adaptive Transfer Learning

Adaptive Transfer Learning Adaptve Transfer Learnng Bn Cao, Snno Jaln Pan, Yu Zhang, Dt-Yan Yeung, Qang Yang Hong Kong Unversty of Scence and Technology Clear Water Bay, Kowloon, Hong Kong {caobn,snnopan,zhangyu,dyyeung,qyang}@cse.ust.hk

More information

Competitive Sparse Representation Classification for Face Recognition

Competitive Sparse Representation Classification for Face Recognition Vol. 6, No. 8, 05 Compettve Sparse Representaton Classfcaton for Face Recognton Yng Lu Chongqng Key Laboratory of Computatonal Intellgence Chongqng Unversty of Posts and elecommuncatons Chongqng, Chna

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks 2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Boosting for transfer learning with multiple sources

Boosting for transfer learning with multiple sources Boostng for transfer learnng wth multple sources Y Yao Ganfranco Doretto Vsualzaton and Computer Vson Lab, GE Global Research, Nskayuna, NY 239 yaoy@gecom doretto@researchgecom Abstract Transfer learnng

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information