IEEE TRANSACTIONS ON CYBERNETICS, VOL. 45, NO. 3, MARCH 2015

Boosting for Multi-Graph Classification


Jia Wu, Student Member, IEEE, Shirui Pan, Xingquan Zhu, Senior Member, IEEE, and Zhihua Cai

Abstract—In this paper, we formulate a novel graph-based learning problem, multi-graph classification (MGC), which aims to learn a classifier from a set of labeled bags, each containing a number of graphs. A bag is labeled positive if at least one graph in the bag is positive, and negative otherwise. Such a multi-graph representation suits many real-world applications, such as webpage classification, where a webpage can be regarded as a bag, with the texts and images inside the webpage represented as graphs. This problem is a generalization of multi-instance learning (MIL), but with vital differences, mainly because instances in MIL share a common feature space whereas no feature is available to represent the graphs in a multi-graph bag. To solve the problem, we propose a boosting-based multi-graph classification framework (bMGC). Given a set of labeled multi-graph bags, bMGC employs dynamic weight adjustment at both the bag and graph levels to select one subgraph in each iteration as a weak classifier. In each iteration, bag and graph weights are adjusted such that an incorrectly classified bag receives a higher weight, because its predicted bag label conflicts with the genuine label, whereas an incorrectly classified graph receives a lower weight if the graph is in a positive bag (or a higher weight if the graph is in a negative bag). Accordingly, bMGC is able to differentiate graphs in positive and negative bags and derive effective classifiers that form a boosting model for MGC. Experiments and comparisons on real-world multi-graph learning tasks demonstrate the algorithm's performance.

Index Terms—Boosting, graph classification, multi-graph, multi-instance learning, subgraph mining.
I. INTRODUCTION

Graph classification, in which the object to be classified is a graph, has found many applications in the past decade, such as chemical compounds [1], XML documents [2], program flows [3], and images [4]. Despite its success in a broad spectrum of areas, the standard graph classification setting is rather restrictive for many real-world learning problems. One such problem is multi-graph classification (MGC), in which the object to be classified is a bag of graphs. For example, a webpage may consist of texts and images, where texts can be represented as graphs to preserve contextual information [5] and images can also be represented as graphs to describe the structural dependency between image regions [6]. As a result, a webpage can be regarded as a bag containing a number of graphs, each of which represents a certain part of the webpage content.

Manuscript received July 14, 2013; revised January 25, 2014; accepted May 13, 2014. Date of publication July 8, 2014; date of current version February 12, 2015. This paper was recommended by Associate Editor M. Last. J. Wu is with the School of Computer Science, China University of Geosciences, Wuhan, China, and also with the Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia (e-mail: jia.wu@student.uts.edu.au). S. Pan is with the Centre for Quantum Computation and Intelligent Systems, Faculty of Engineering & Information Technology, University of Technology, Sydney, Ultimo, NSW 2007, Australia (e-mail: shirui.pan@student.uts.edu.au). X. Zhu is with the Department of Computer and Electrical Engineering & Computer Science, Florida Atlantic University, Boca Raton, FL, USA (e-mail: xzhu3@fau.edu). Z. Cai is with the School of Computer Science, China University of Geosciences, Wuhan, China (e-mail: zhcai@cug.edu.cn). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TCYB
For an information seeker, a webpage is interesting if one or multiple parts of the webpage (texts and/or images) draw his/her attention; that is, a graph bag is positive if at least one graph in the bag is positive. On the other hand, the webpage is not interesting to the viewer if none of the content attracts the viewer; that is, a graph bag is negative if all graphs inside the bag are negative.

The above multi-graph setting is useful in many other domains. In bio-pharmaceutical tests, labeling individual molecules (which can be represented as graphs) is expensive and time-consuming. Molecular group activity prediction can be used to investigate the activity of a group (i.e., a bag) of molecules, with the active group (i.e., positive bag), in which at least one molecule is active, being further investigated by individual activity tests. Another MGC application is scientific publication classification, where a paper and its references can be represented as a bag of graphs, and each graph (i.e., a paper) is formed by using the correlations between keywords in the paper, as shown in Fig. 1. A bag is labeled positive if the paper or any of its references is relevant to a specific topic. Similarly, for online-review-based product recommendation, each product receives many customer reviews. For each review composed of detailed text descriptions, we can use a graph to represent the review descriptions. Thus, a product can be represented as a bag of graphs. Assume customers are mainly concerned about several key properties of the product, such as affordability and durability. A product (i.e., a bag) can be labeled positive if it receives a very positive review on any of these properties, and negative otherwise. As a result, we can use MGC learning to help recommend products to customers.

Indeed, the MGC problem is a generalization of multi-instance learning (MIL) to graph data, but with significant complications. Existing MIL methods cannot simply be applied to the multi-graph setting because they can only handle bags whose instances are represented in a common vectorial feature space.
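The bag-labeling rule that drives all of these examples (a bag is positive iff at least one graph inside it is positive) can be sketched in a few lines; the function name is illustrative, not from the paper.

```python
# Minimal sketch of the multi-graph bag-labeling rule:
# a bag is positive iff at least one graph inside it is positive.

def bag_label(graph_labels):
    """graph_labels: list of +1/-1 labels for the graphs in one bag."""
    return +1 if any(y == +1 for y in graph_labels) else -1

print(bag_label([-1, -1, +1]))  # one positive graph makes the bag positive -> 1
print(bag_label([-1, -1, -1]))  # all graphs negative -> -1
```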
(© 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.)

Unfortunately, in the MGC problem setting, graphs cannot directly provide feature vectors for learning. On the other hand, existing graph classification methods cannot be used to tackle the MGC problem either, because they require each single graph to be labeled in order to learn a classifier. One simple solution is to represent all graphs in the same feature space, by using subgraph feature selection methods [7]–[9] to convert graphs into instances, and then

Fig. 1. Example of multi-graph representation for a scientific publication. Each paper is represented as a multi-graph bag, where each graph inside the bag corresponds to the abstract of the paper or the abstract of a reference cited in the paper (a graph is formed by using keywords of the abstract as nodes and their correlations as edges). The graph construction details are reported in Section VII-A.

apply existing MIL methods to the instance bags. However, this simple solution suffers from three inherent disadvantages.

1) Large Subgraph Feature Space: The graph substructure feature space grows exponentially with respect to the number of edges and nodes. It is computationally inefficient, or even infeasible, to enumerate all subgraph features and then select some of them for classification.

2) Feature Filtering Inefficiency: By separating subgraph feature mining and feature selection into two steps, the filtering process of finding salient subgraph patterns is decoupled from the subsequent learning algorithm. It is therefore very difficult to theoretically guarantee that the statistical criterion used for filtering provides good features for the learner. This is a problem shared by all filter methods (as discussed in [10]).

3) Bag Constraints: The bag constraints in multi-graph learning provide important information to differentiate positive and negative graphs, whereas the simple solution directly extracts subgraphs from all graphs without considering the multi-graph bag constraints.

In summary, the MGC problem for the aforementioned real-world applications needs to address two essential challenges.

1) Labeling Ambiguity: Labels are only available at the bag level instead of the instance level (i.e., a bag is labeled positive if it has at least one positive graph, and negative otherwise).

2) Structured Data Representation: Instances in a bag are not vectors but graphs, which implies that the instances are not represented in a common feature space for calculating similarities or distances.
Motivated by the above challenges, in this paper we propose a boosting-based multi-graph classification framework (bMGC). In each boosting iteration, bMGC explores the most informative subgraph to construct a single weak classifier, which is used to update the weights of graphs and bags to obtain the next informative subgraph. At the end of the boosting process, the selected weak classifiers are combined to form a strong classifier. A unique characteristic of bMGC is that it combines bag- and graph-level constraints to assess the informativeness score of a subgraph. By adopting the score as a pruning criterion, we combine subgraph mining and informative subgraph exploration to dynamically construct weak classifiers on the fly. As a result, the proposed learning framework not only addresses the labeling ambiguity issue through a novel two-level (bag and graph) weighting strategy but also addresses the structured data representation issue through a dynamic subgraph selection criterion. Experimental results on real-world data demonstrate that bMGC is effective for MGC.

The remainder of the paper is organized as follows. A brief review of related work is given in Section II. The problem definition and the overall framework are described in Sections III and IV, respectively. Section V introduces the proposed subgraph selection criterion. The bMGC algorithm is presented in Section VI, followed by the experiments in Section VII. Section VIII discusses the properties of the proposed bMGC, and we conclude the paper in Section IX.

II. RELATED WORK

A. Multi-Instance Learning

MGC is a generalization of the MIL problem, which was first proposed by Dietterich et al. [11] for drug activity prediction. Since then, it has drawn increasing interest in the machine learning community for many real-world applications, such as image categorization [12], web mining [13], language recognition [14], and computer security [15]. The key assumption of the MIL formulation is that the training set is composed of labeled bags, each of which contains a number of instances.
A bag is labeled positive if at least one of its instances is positive, and negative otherwise. The goal of MIL is to predict the label of an unknown bag. Several off-the-shelf methods have been developed to solve the MIL problem; they can roughly be divided into two categories.

1) Single-Instance Learner Based MIL: One approach to solving MIL problems is to upgrade generic single-instance learning methods to deal with multi-instance data. For example, the lazy learners Citation-KNN and Bayesian-KNN [16] extend the k-nearest neighbor (KNN) algorithm to MIL. The tree learners MITI [17] and MIRI [18] are variations of decision trees for MIL. The rule learner RIPPER-MI adapts the RIPPER algorithm [19] for MIL. The neural network BP-MIP extends standard neural networks [20], and the kernel method MISMO adapts the classical support vector machine [21] for MIL. Logistic learning MILR [22] applies logistic regression to MIL, and ensemble approaches [23], [24] extend bagging and boosting [25] to MIL.

2) Bag-Based MIL Algorithms: The first specifically designed method for MIL is the axis-parallel rectangle (APR) algorithm [11], which approximates the APRs constructed by the conjunction of features. Based on the idea of APR, a number of algorithms have been designed for MIL. Examples include diverse density (DD) [26], which searches for a point in the feature space that maximizes the DD function measuring the co-occurrence of similar instances from different positive bags; MIEMDD [27], which combines the expectation-maximization (EM) algorithm with DD to search for the most likely concept; and MIOptimalBall [28], a boosting-based approach that uses balls (with respect to

various metrics) as weak hypotheses, centered at instances of positive bags.

B. Graph Classification

The MGC problem can also be viewed as a generalization of graph classification where the objects are bags of graphs (instead of individual graphs). Existing graph classification methods can be broadly classified into the following two categories.

1) Global Distance Based Approaches: The global distance based methods consider correlations [29] or similarities between two graphs and plug the kernel matrix into an off-the-shelf learner, such as a support vector machine, to learn a model for graph classification. Examples include graph kernels [30], [31], graph embedding [32], and graph transformation [33]. One obvious drawback of global distance based approaches is that the distance is calculated from the similarity of global graph structures, such as random walks or paths, between two graphs. Therefore, it is not clear which substructures (or which parts of the graph) are most discriminative for differentiating graphs of different classes.

2) Local Subgraph Feature Based Approaches: For many graph classification tasks, such as chemical compound classification [1], research has shown that graphs within the same class may not have high global similarity but merely share some unique substructures. Accordingly, extracting important subgraph features, using some predefined criteria, to represent a graph in a vectorial space has become a popular solution for graph classification. The most common subgraph selection criterion is frequency, which intends to select frequently appearing subgraphs by using frequent subgraph mining methods. For example, one of the most popular algorithms for frequent subgraph mining is gSpan [34]. Other methods include AGM [35], FSG [7], MoFa [36], and Gaston [37]. The subgraph feature mining approach seems applicable to the MGC problem as a preprocessing step to transform all graphs into feature vectors.
However, one major deficiency of this approach is that it is computationally demanding to enumerate all frequent subgraphs in the target graph set, which inhibits its ability to handle large graph sets. To overcome this drawback, some supervised subgraph feature extraction approaches have been developed, such as LEAP [38], gPLS [39], and CORK [8], which search directly for discriminative subgraph patterns for classification. Moreover, Jin et al. [40] propose an efficient graph classification method that uses evolutionary computation to mine discriminative subgraphs in large databases. Besides, some graph boosting methods [41]–[44] use each single subgraph feature as a weak classifier to build a boosting algorithm, and other types of boosting approaches [45], [46] for graph classification also exist.

III. PROBLEM DEFINITION

In this section, we define important notations and concepts, which will be used throughout the paper, and formally define the MGC problem.

Definition 1 (Connected Graph): A graph is represented as G = (V, E, L, l), where V is a set of vertices, E ⊆ V × V is a set of edges, and L is the set of symbols for the vertices and edges. l: V ∪ E → L is the function assigning labels to the vertices and edges. A connected graph is a graph in which there is a path between any pair of vertices.

Fig. 2. Example of subgraph feature representation for bags. B_1^+ and B_2^- are positive and negative bags, respectively. G_1^+ is a positive graph, and G_2, G_3, and G_4 are labeled negative. The feature value of a bag corresponding to each subgraph g_1 or g_2 is set to 1 iff there is a graph in the bag that contains the subgraph, and 0 otherwise.

Definition 2 (Bag of Graphs): A graph bag contains a number of graphs, denoted by B_i = {G_1, ..., G_{n_i}}, where G_j and n_i denote the jth graph and the total number of graphs in the ith bag, respectively. For ease of representation, we also use G_j to denote the jth graph in a given bag. A bag B_i's label is denoted by y_i ∈ {-1, +1}. A bag is either positive (B_i^+) or negative (B_i^-).
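The bag and graph objects just defined can be modeled compactly. The sketch below uses hypothetical names and reduces each graph to the set of subgraph-pattern ids it contains (standing in for real subgraph-isomorphism tests); it anticipates the binary feature vectors introduced in Definitions 4 and 5 below.

```python
# Sketch of the binary subgraph-feature vectors for a graph and for a bag.
# Each "graph" is modeled as the set of subgraph-pattern ids it contains;
# a real system would run subgraph isomorphism tests instead.

def graph_feature(graph, patterns):
    # (x_gk)^G = 1 iff g_k is a subgraph of G
    return [1 if g in graph else 0 for g in patterns]

def bag_feature(bag, patterns):
    # (x_gk)^B = 1 iff some graph in the bag contains g_k
    return [1 if any(g in G for G in bag) else 0 for g in patterns]

patterns = ["g1", "g2"]
B1 = [{"g1", "g2"}, {"g2"}]            # a bag with two graphs
print(bag_feature(B1, patterns))       # -> [1, 1]
print(graph_feature({"g2"}, patterns)) # -> [0, 1]
```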
In this paper, we use B = {B_1, ..., B_p} to denote a set of bags associated with the weights w^B = {w_1^B, ..., w_p^B}, where p denotes the number of bags in B. We can also aggregate all graphs in B as G = {G_1, ..., G_q}, associated with the weights w^G = {w_1^G, ..., w_q^G}, where q denotes the number of graphs in G. Similarly, the set of positive bags in B is denoted by B^+, with B^- denoting the set of negative bags.

Definition 3 (Subgraph): Let G = (V, E, L, l) and g_k = (V', E', L', l') each denote a connected graph. g_k is a subgraph of G, i.e., g_k ⊆ G, iff there exists an injective function φ: V' → V s.t.: 1) ∀v ∈ V', l'(v) = l(φ(v)); and 2) ∀(u, v) ∈ E', (φ(u), φ(v)) ∈ E and l'(u, v) = l(φ(u), φ(v)). If g_k is a subgraph of G, then G is a supergraph of g_k.

Definition 4 (Subgraph Feature Representation for Graph): Let S_g = {g_1, ..., g_s} denote a set of subgraph patterns discovered from a given set of graphs. For each graph G, we use a subgraph feature vector x^G = [(x_{g_1})^G, ..., (x_{g_s})^G] ∈ {0, 1}^s to represent G in the feature space, where (x_{g_k})^G = 1 iff g_k is a subgraph of G (i.e., g_k ⊆ G, g_k ∈ S_g) and (x_{g_k})^G = 0 otherwise.

Definition 5 (Subgraph Feature Representation for Bag): Given a set of subgraphs S_g = {g_1, ..., g_s}, a graph bag B_i can be represented by a feature vector x^{B_i} = [(x_{g_1})^{B_i}, ..., (x_{g_s})^{B_i}] ∈ {0, 1}^s, where (x_{g_k})^{B_i} = 1 iff g_k is a subgraph of any graph G_j in bag B_i (i.e., ∃G_j ∈ B_i: g_k ⊆ G_j, g_k ∈ S_g) and (x_{g_k})^{B_i} = 0 otherwise.

An example of subgraph feature representation for graph bags is illustrated in Fig. 2, where two graph bags (B_1^+ and B_2^- on the left panel) are represented as two 2-D feature vectors (on the right panel) based on two subgraph patterns (g_1 and g_2).

Given a multi-graph set B with a number of labeled graph bags, where each positive bag contains at least one positive graph and all graphs in each negative bag are negative (i.e., the

bag constraint in MGC), the aim of MGC is to build a prediction model from the training multi-graph bag set B that predicts previously unseen graph bags with unknown labels with maximum bag classification accuracy.

IV. OVERALL FRAMEWORK OF bMGC

In multi-graph bags, there is no feature available to represent graphs, so existing MIL methods, which require instances to have a vectorized feature representation, cannot be applied to MGC. In addition, due to the lack of labeling information for individual graphs inside positive bags, subgraph feature based graph classification cannot be directly applied to MGC either. To solve these issues, in this section we propose the bMGC framework, which applies dynamic weight adjustment at both the graph and bag levels to select one subgraph in each iteration to construct a single weak classifier. In each iteration, the bag and graph weights are adjusted by the last bag-level and graph-level weak classifiers, respectively. By doing so, bMGC is able to differentiate graphs in positive and negative bags and derive effective learning models by boosting all the single-subgraph bag-level weak classifiers. The proposed bMGC framework, as shown in Fig. 3, includes the following four major steps.

1) Subgraph Candidate Generation: Generating subgraph candidates is a key step for selecting the most informative subgraph. To find subgraph candidates with diverse structures, we aggregate the graphs in multi-graph bags into three graph sets: a) graphs in all bags; b) graphs in all positive bags; and c) graphs in all negative bags. A gSpan [34] based subgraph mining procedure is applied to each graph set, through which a set of diverse subgraph candidate patterns can be discovered for validation.

2) Bag Constrained Subgraph Exploration: In the tth iteration, an informative subgraph g_t is selected to form a weak classifier for MGC under the weighted bag- and graph-level constraints. To obtain the (t+1)th informative subgraph, the weights of bags and graphs are then updated.
After m iterations, the m selected subgraphs correspond to m weak classifiers for learning.

3) Updating Weights of Bags and Graphs: After we find the tth informative subgraph g_t, a bag-level classifier H_t^B and a graph-level classifier H_t^G are trained, respectively. For graphs, because we propagate bag labels to graphs, some graphs in positive bags have been assigned wrong labels. If a graph G_i in the positive bag set B^+ is misclassified by H_t^G, in the next iteration we decrease G_i's weight to reduce its impact on the learning process. If a graph G_i in the negative bag set B^- is misclassified, its weight is increased, so that G_i plays a more important role in helping the learning algorithm find better subgraphs.

4) Boosting Classification: After subgraphs have been selected in all iterations to form the corresponding single weak classifiers, the classifiers are weighted and combined into a strong classifier for MGC.

In the following two sections, we first propose our subgraph exploration criterion in Section V and then introduce the detailed procedures of bMGC in Section VI.

Fig. 3. Overview of the proposed bMGC framework.

V. SUBGRAPH EXPLORATION

Exploring optimal bag constrained subgraphs in each iteration of bMGC is a nontrivial task. This process has two main challenges: 1) how to utilize the information of the labeled graphs in negative bags, and 2) how to tackle the problem that the labels of graphs in positive bags are unknown.

Assume that a set of candidate graphs is collected from the bag set B. Let S_g denote the complete set of subgraphs in B, and let g_t be the optimal subgraph selected from S_g in the tth iteration. Our bag constrained subgraph exploration aims to find the most informative subgraph g_t in each iteration, with weight updating for both bags and graphs. Let Z(g_k), the evaluation criterion for a single subgraph g_k ∈ S_g, be a function measuring the informativeness of g_k:

    g_t = \arg\max_{g_k \in S_g} Z(g_k).    (1)

The objective function in (1) indicates that the optimal bag constrained subgraph g_t should have the maximum discriminative capability for MGC.
A. Evaluation Criterion for Subgraphs

In order to measure the informativeness of a subgraph g_k, i.e., Z(g_k), so that we can discover the most informative subgraph for bags, we impose constraints on the labeled bags in the multi-graph bag set B, through which the subgraph selection criterion Z(g_k) can be properly defined. For two bags B_i and B_j with the same class label, there is a pairwise must-link constraint between them. If B_i and B_j have different class labels, there is a cannot-link constraint between them.

To further take the data distribution in each bag into consideration, we also add graph-level constraints to ensure that the selected subgraphs make the graphs in each negative bag close to each other and the graphs in each positive bag maximally separated. In summary, a good subgraph should satisfy the following constraints.

1) Weighted Bag Must-Link: If there is a must-link between B_i and B_j, their subgraph feature vectors x^{B_i} and x^{B_j} should be close to each other. In an MGC scenario, each bag B_i is associated with a weight w_i^B. For each pair of bags with the same class label, the selected subgraph should ensure that bags with similar weights (analogous to importance) have a high similarity.

2) Weighted Bag Cannot-Link: If there is a cannot-link between B_i and B_j, the underlying subgraph feature vectors x^{B_i} and x^{B_j} should be distinct from each other. For each pair of bags in different classes, the smaller the weight difference between the two bags, the more impact the constraint has on selecting a subgraph that represents the distinction between them.

3) Weighted Graph Must-Link: If there is a must-link between G_i and G_j, their subgraph feature vectors x^{G_i} and x^{G_j} should be close to each other. In bMGC, only graphs in negative bags are known to have genuine labels, so the feature representations of two weighted graphs in a negative bag should have low diversity.

4) Weighted Graph Separability: If the genuine labels of graphs G_i and G_j are unknown, the corresponding subgraph feature vectors x^{G_i} and x^{G_j} should be different. This is similar to the assumption of principal component analysis (PCA) [47], which aims to find the component with the largest possible variance. This constraint applies to positive bags, because the genuine labels of all graphs in each positive bag are unknown. As a result, we apply this constraint to encourage each positive bag to have a large diversity inside the bag. A similar assumption has also been used in [9] to handle unlabeled graphs in a semi-supervised learning setting.
In summary, the bag must-link and bag cannot-link constraints are applied to bags with the same label and different labels, respectively, while the graph must-link and graph separability constraints are applied only to graphs in negative bags and graphs in positive bags, respectively. By imposing constraints at both the bag and graph levels, our evaluation criterion intends to capture informative subgraph features for MGC. Based on the above considerations, we derive a criterion Z(g_k) for measuring the informativeness of a subgraph g_k as follows:

    Z(g_k) = \frac{1}{2A} \sum_{y_i y_j = -1} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2
           - \frac{1}{2B} \sum_{y_i y_j = 1} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2
           - \frac{1}{2C} \sum_{G_i, G_j \in B^-} \left\| D_{g_k} w_i^G x^{G_i} - D_{g_k} w_j^G x^{G_j} \right\|^2
           + \frac{1}{2D} \sum_{G_i, G_j \in B^+} \left\| D_{g_k} w_i^G x^{G_i} - D_{g_k} w_j^G x^{G_j} \right\|^2    (2)

where w_i^B, w_j^B, w_i^G, and w_j^G are the weights for B_i, B_j, G_i, and G_j, respectively. D_{g_k} = diag(d(g_k)) is a diagonal matrix indicating which subgraph feature is selected from S_g to represent the bags or graphs, with d(g_k)_i = I(g_i = g_k, g_i ∈ S_g), where I(·) equals 1 if the condition inside is true and 0 otherwise. A = \sum_{y_i y_j = -1} 1, B = \sum_{y_i y_j = 1} 1, C = \sum_{G_i, G_j \in B^-} 1, and D = \sum_{G_i, G_j \in B^+} 1 count the total pairwise constraints of the bag cannot-link, bag must-link, graph must-link, and graph separability types.

We define two matrices for the bag-level and graph-level constraints, denoted by M^B = [M_{ij}^B]_{p×p} and M^G = [M_{ij}^G]_{q×q}, respectively, where M_{ij}^B = 1/A if y_i y_j = -1 and M_{ij}^B = -1/B if y_i y_j = 1, and M_{ij}^G = -1/C if G_i, G_j ∈ B^-, M_{ij}^G = 1/D if G_i, G_j ∈ B^+, and M_{ij}^G = 0 otherwise. As a result, (2) can be rewritten as

    Z(g_k) = Z(g_k)^B + Z(g_k)^G
           = \frac{1}{2} \sum_{y_i y_j} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2 M_{ij}^B
           + \frac{1}{2} \sum_{G_i G_j} \left\| D_{g_k} w_i^G x^{G_i} - D_{g_k} w_j^G x^{G_j} \right\|^2 M_{ij}^G.    (3)

For the bag-level evaluation Z(g_k)^B, we have

    Z(g_k)^B = \frac{1}{2} \sum_{y_i y_j} \left\| D_{g_k} w_i^B x^{B_i} - D_{g_k} w_j^B x^{B_j} \right\|^2 M_{ij}^B
             = tr\left( D_{g_k} X_B W_B (D_B - M^B) W_B^\top X_B^\top D_{g_k}^\top \right)
             = tr\left( D_{g_k} X_B W_B L_B W_B^\top X_B^\top D_{g_k}^\top \right)
             = \left( f_{g_k}^B \right)^\top W_B L_B W_B^\top f_{g_k}^B
             = \left( f_{g_k}^B \right)^\top Q^B f_{g_k}^B    (4)

where L_B = D_B - M^B is a Laplacian matrix, in which D_B = diag(d^B) is a diagonal matrix with d_i^B = \sum_j M_{ij}^B.
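A minimal numerical sketch of the bag-level construction above, assuming uniform bag weights and using numpy for illustration: build M^B from the labels, form the Laplacian L_B = D_B - M^B, and score a subgraph's bag indicator vector via (f^B)^T Q^B f^B with Q^B = W_B L_B W_B^T, as in (4). The tiny labels, weights, and indicator vector are made up.

```python
# Sketch of the bag-level part of the bScore criterion in matrix form.
import numpy as np

y = np.array([+1, +1, -1])                 # bag labels
cannot = np.not_equal.outer(y, y)          # pairs with y_i * y_j = -1
A, B = cannot.sum(), (~cannot).sum()       # pair counts per constraint type
M_B = np.where(cannot, 1.0 / A, -1.0 / B)  # constraint matrix M^B
L_B = np.diag(M_B.sum(axis=1)) - M_B       # Laplacian L_B = D_B - M^B
W_B = np.diag([1.0, 1.0, 1.0])             # bag weights (uniform here)
Q_B = W_B @ L_B @ W_B.T

f_B = np.array([1.0, 0.0, 1.0])            # bags containing subgraph g_k
score_B = f_B @ Q_B @ f_B                  # bag-level term of the bScore
```

Note that, by construction, every row of L_B sums to zero (the defining property of a Laplacian), so a subgraph contained in every bag scores zero at the bag level.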
Q^B = W_B L_B W_B^\top, where W_B is also a diagonal matrix, with (W_B)_{ii} = w_i^B denoting the weight of the ith bag B_i. X_B = [x^{B_1}, ..., x^{B_p}] = [f_{g_1}^B, ..., f_{g_s}^B]^\top ∈ {0, 1}^{s×p}, where f_{g_k}^B is an indicator vector of subgraph g_k with respect to all the bags in B. Specifically, f_{g_k}^B = [f_{g_k}^{B_1}, ..., f_{g_k}^{B_p}]^\top ∈ {0, 1}^p, where f_{g_k}^{B_i} = 1 iff ∃G ∈ B_i: g_k ⊆ G, and f_{g_k}^{B_i} = 0 otherwise.

Similarly, the graph-level evaluation Z(g_k)^G can be rewritten in matrix form. Taking the bag-level and graph-level evaluation functions together, we have

    Z(g_k) = Z(g_k)^B + Z(g_k)^G
           = \left( f_{g_k}^B \right)^\top Q^B f_{g_k}^B + \left( f_{g_k}^G \right)^\top Q^G f_{g_k}^G
           = f_{g_k}^\top Q f_{g_k}    (5)

where Q^G = W_G L_G W_G^\top, with W_G a diagonal matrix, i.e., (W_G)_{ii} = w_i^G, denoting the weight of the ith graph G_i. L_G = D_G - M^G is again a Laplacian matrix, where D_G = diag(d^G) is a diagonal matrix with d_i^G = \sum_j M_{ij}^G. Meanwhile, f_{g_k}^G is an indicator vector of subgraph g_k with respect to

all graphs in G, with f_{g_k}^G = [f_{g_k}^{G_1}, ..., f_{g_k}^{G_q}]^\top ∈ {0, 1}^q, where f_{g_k}^{G_i} = 1 iff g_k ⊆ G_i and f_{g_k}^{G_i} = 0 otherwise. According to (5), we have

    f_{g_k} = \begin{bmatrix} f_{g_k}^B \\ f_{g_k}^G \end{bmatrix}, \quad Q = \begin{bmatrix} Q^B & 0 \\ 0 & Q^G \end{bmatrix}    (6)

where f_{g_k} is the indicator vector of subgraph g_k with respect to the data combining the bag matrix X_B and the graph matrix X_G. By denoting the function as h(g_k, Q) = f_{g_k}^\top Q f_{g_k}, the problem of maximizing Z(g_k) in (1) is equivalent to finding a subgraph that maximizes h(g_k, Q), which can be represented as

    g_t = \arg\max_{g_k \in S_g} h(g_k, Q).    (7)

Definition 6 (bScore): Given two matrices M^B and M^G embedding the label information, and the two corresponding weight matrices W_B and W_G, the informativeness score of a subgraph g_k is defined as

    r(g_k) = h(g_k, Q) = f_{g_k}^\top Q f_{g_k}.    (8)

In the above definition, a larger bScore value r(g_k) represents a stronger dependency between the subgraph feature and the corresponding labels. In other words, good subgraph features should have high bScore values. To find the optimal subgraph in each iteration, we can calculate the bScore values of all subgraphs in S_g and then select the subgraph with the highest r(g_k) value.

B. Upper Bound of bScore

Before we introduce the detailed algorithm for mining the optimal subgraph in each iteration, we derive a bScore upper bound to help prune the subgraph search space.

Theorem 1: Given two subgraphs g_k, g_{k'} ∈ S_g, where g_{k'} is a supergraph of g_k (i.e., g_k ⊆ g_{k'}), the bScore of g_{k'}, r(g_{k'}), is bounded by \hat{r}(g_k), i.e., r(g_{k'}) ≤ \hat{r}(g_k), with

    \hat{r}(g_k) = f_{g_k}^\top \hat{Q} f_{g_k}    (9)

where \hat{Q} = \begin{bmatrix} \hat{Q}^B & 0 \\ 0 & \hat{Q}^G \end{bmatrix}, in which \hat{Q}^B and \hat{Q}^G are defined as \hat{Q}_{ij}^B = \max(0, Q_{ij}^B) and \hat{Q}_{ij}^G = \max(0, Q_{ij}^G). Thus, for any g_{k'} ⊇ g_k, r(g_{k'}) ≤ \hat{r}(g_k). The corresponding proof is given in the Appendix.

C. Mining Bag Constrained Subgraphs

For subgraph selection, we employ a depth-first search (DFS) based algorithm, gSpan [34], to enumerate subgraphs. The key idea of gSpan is that each subgraph has a unique DFS code, defined by a lexicographic order of the discovery times during the search process. Two subgraphs are isomorphic iff they have the same minimum DFS code.
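The branch-and-bound idea behind this bound (explore the pattern tree depth-first, keep the best pattern found so far, and expand a node's subtree only when an upper bound on its descendants' scores can still beat the incumbent) can be sketched generically. Here score, upper_bound, and children are hypothetical stand-ins for the bScore, its bound r̂, and DFS-code extension.

```python
# Simplified branch-and-bound sketch of upper-bound pruning over a pattern
# tree. score(g) evaluates a pattern; upper_bound(g) must dominate the score
# of every descendant of g; children(g) enumerates g's extensions.

def search(root, score, upper_bound, children):
    best, best_score = None, float("-inf")
    stack = [root]
    while stack:
        g = stack.pop()
        s = score(g)
        if s > best_score:
            best, best_score = g, s
        if upper_bound(g) >= best_score:   # a supergraph might still win
            stack.extend(children(g))      # otherwise the subtree is pruned
    return best
```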
By employing a depth-first search strategy on the DFS code tree (where each node is a subgraph), gSpan can enumerate all frequent subgraphs efficiently. Algorithm 1 reports the proposed bag constrained subgraph exploration process, which starts with an empty optimal subgraph and continuously enumerates subgraphs by recursively visiting the DFS code tree.

Algorithm 1 BSE: Bag Constrained Subgraph Exploration
Input: G: a graph dataset; min_sup: the frequent subgraph threshold;
Output: g_t: the optimal subgraph;
1: while recursively visiting the DFS code tree in gSpan do
2:   g_k ← the currently visited subgraph in the DFS code tree of G;
3:   if freq(g_k) < min_sup then
4:     continue;
5:   end if
6:   compute the bScore r(g_k) for subgraph g_k;
7:   if g_t == NULL or r(g_k) > r(g_t) then
8:     g_t ← g_k;
9:   end if
10:  if \hat{r}(g_k) ≥ r(g_t) then
11:    depth-first search the subtree rooted at node g_k;
12:  end if
13: end while
14: return g_t;

If a subgraph g_k is not frequent, both g_k and its subtree are pruned (lines 3–5), where freq(g_k) denotes the percentage of graphs in the dataset G that contain g_k; otherwise, we calculate g_k's bScore value r(g_k) (line 6). If r(g_k) is larger than the current optimal score r(g_t), or this is the first step (i.e., the optimal subgraph is empty), we take g_k as the current optimal subgraph g_t (lines 7–9). After that, the upper bound pruning module checks whether \hat{r}(g_k) is less than r(g_t); if so, the bScore of any supergraph g_{k'} of g_k (i.e., g_k ⊆ g_{k'}) cannot be greater than r(g_t), so we can safely prune the subtree rooted at g_k. If \hat{r}(g_k) is greater than or equal to the bScore of g_t, we cannot prune this space, since there might exist a supergraph g_{k'} with r(g_{k'}) ≥ r(g_t), so the DFS continues with the children of g_k (lines 10–12), until the frequent subgraph mining process is completed.

VI. bMGC

The detailed procedures of bMGC are reported in Algorithm 2, which iteratively expands the candidate graph set to extract informative subgraphs and then explores the optimal subgraphs based on the bScore.
After m iterations, bMGC boosts the m selected weak classifiers to obtain the final classification model.

A. bMGC Algorithm

In Algorithm 2, bMGC differentiates and considers graphs in three sets: graphs in positive bags G^+, graphs in negative bags G^-, and graphs in both positive and negative bags G. The benefit of separating graphs into three sets is that the subgraph mining process, which is carried out on each set respectively, enlarges the candidate set for exploring subgraphs. By doing so, the subgraph space becomes denser, through which good subgraph features can be discovered. The while loop in Algorithm 2 represents the boosting process of bMGC. In each iteration, the subgraph mining is

Algorithm 2 bMGC: Boosting for Multi-Graph Classification
Input: B: multi-graph bag set; G: graph dataset in B; m: the number of iterations; min_sup: the frequent subgraph threshold;
Output: the class label y_k of a testing bag B_k.
1: initialize w^B ← {w_i^B: w_i^B = 1}; w^G ← {w_i^G: w_i^G = 1}; t ← 0;
   // Training Phase:
2: {G^+, G^-} ← graphs in B^+ and B^-, respectively;
3: {p, q^-, q^+} ← the number of bags in B and the numbers of graphs in G^- and G^+;
4: while t < m do
5:   t ← t + 1;
6:   w_i^B ← w_i^B / \sum_{i=1}^p w_i^B; w_i^G ← w_i^G / \sum_{i=1}^q w_i^G;
7:   g_t^G ← BSE(G, min_sup);       // Algorithm 1
8:   g_t^{G+} ← BSE(G^+, min_sup);  // Algorithm 1
9:   g_t^{G-} ← BSE(G^-, min_sup);  // Algorithm 1
10:  g_t ← the subgraph with the highest bScore among (g_t^G, g_t^{G+}, g_t^{G-});
     // Error Calculation:
11:  ε_t^B ← the error of H_t^B corresponding to g_t on B;
12:  if ε_t^B > 1/2 then
13:    H_t^B ← -H_t^B; ε_t^B ← 1 - ε_t^B;
14:  end if
15:  ε_t^G ← the error of H_t^G corresponding to g_t on G^-;
16:  if ε_t^G > 1/2 then
17:    H_t^G ← -H_t^G; ε_t^G ← 1 - ε_t^G;
18:  end if
19:  β_t^B ← (1 - ε_t^B)/ε_t^B;
20:  β_t^G ← ε_t^G/(1 - ε_t^G); β^{G+} ← 1/(1 + \sqrt{2 \ln q^+ / m});
     // Increase weight for each incorrectly classified bag:
21:  w_i^B ← w_i^B (β_t^B)^{I(H_t^B(B_i) ≠ c(B_i))}, ∀B_i ∈ B;
     // Decrease weight for each incorrectly classified graph in B^+:
22:  w_j^{G+} ← w_j^{G+} (β^{G+})^{I(H_t^G(G_j) ≠ c(G_j))}, ∀G_j ∈ G^+;
     // Increase weight for each incorrectly classified graph in B^-:
23:  w_k^{G-} ← w_k^{G-} (β_t^G)^{-I(H_t^G(G_k) ≠ c(G_k))}, ∀G_k ∈ G^-;
24: end while
   // Testing Phase:
25: y_k ← sign\left( \sum_{t=1}^m β_t^B H_t^B(B_k) \right)

carried out on the three graph sets, as shown in lines 7 to 9. The current optimal subgraph g_t is the one with the highest bScore among the subgraphs discovered from the individual graph sets (line 10). In bMGC, the subgraph g_t is directly used as a weak bag classifier H_t^B or a weak graph classifier H_t^G, with H_t^B(B_i) = 1 iff (x_{g_t})^{B_i} = 1, and H_t^B(B_i) = -1 otherwise. The same classification rule is used for the graph-based subgraph classifier H_t^G.
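The weak bag classifier just described, together with the AdaBoost-style bag-weight step of Algorithm 2 (a misclassified bag's weight is multiplied by β_t^B = (1 - ε_t^B)/ε_t^B > 1), can be sketched as follows; bags are modeled as lists of subgraph-id sets, and all names are illustrative.

```python
# Sketch of a single-subgraph weak bag classifier H_t^B and the bag-weight
# update: H_t^B(B_i) = +1 iff some graph in the bag contains g_t, and a
# misclassified bag has its weight multiplied by beta = (1 - eps)/eps.

def weak_bag_classifier(bag, g_t):
    return +1 if any(g_t in G for G in bag) else -1

def update_bag_weights(bags, labels, weights, g_t, eps):
    beta = (1.0 - eps) / eps          # > 1 when eps < 1/2
    return [w * (beta if weak_bag_classifier(b, g_t) != y else 1.0)
            for b, y, w in zip(bags, labels, weights)]
```

For example, with eps = 0.25 a misclassified bag's weight triples while correctly classified bags keep their weight, mirroring line 21 of Algorithm 2.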
Accordingly, the steps in lines 11–20 use the error rates of the weak classifiers to update the parameters of the boosting framework.

1) Updating Bag and Graph Weights: To obtain the (t+1)th optimal subgraph g_{t+1}, we must update the weights of bags and graphs using the tth optimal subgraph g_t. The error ε_t^B (line 11) on the bag set B is defined as

$$\varepsilon_t^B = \sum_{i=1}^{p} \frac{w_i^B}{\sum_{i'=1}^{p} w_{i'}^B}\, I\big(H_t^B(B_i) \neq c(B_i)\big) \qquad (10)$$

where c(B_i) returns the label of the ith bag and I(·) is the indicator function. The error ε_t^G (line 15) on the negative graph set is obtained in a similar way. Note that ε_t^B and ε_t^G are required to be smaller than 1/2. If not, the underlying classifier is worse than a random hypothesis, and we use −H_t^B and −H_t^G to replace the current bag- and graph-level classifiers, respectively. As a result, the underlying errors on the bag set and the negative graph set become 1 − ε_t^B and 1 − ε_t^G, respectively (lines 12–14 and 16–18).

According to the specific characteristics of bags and graphs, we employ two different weighting strategies. Because bags are the target of the classification and their genuine labels are given, if a bag is misclassified by the classifier H_t^B induced by the current subgraph g_t, the bag weight is increased by the weight coefficient factor β_t^B (line 19), so that the next iteration seeks a more informative subgraph to deal with the incorrectly predicted bags (line 21). This bag-level weighting mechanism is similar to the AdaBoost algorithm [25]. At the individual graph level, because we propagate bag labels to graphs at the very beginning of the algorithm, some graphs in positive bags might have been assigned wrong labels. Therefore, if a graph in a positive bag is misclassified (i.e., H_t^G(G_j) ≠ c(G_j)), in the next iteration we decrease its weight to reduce its effect by multiplying its weight by (β^{G+})^{I(H_t^G(G_j) ≠ c(G_j))} ∈ (0, 1], where β^{G+} is the weight coefficient factor for positive graphs (line 20). Thus, the misclassified graphs in positive bags will have reduced impact on the learning process in the next round (line 22).
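The two weighting strategies can be sketched as follows. The square-root form of β^{G+} = 1/(1 + √(2 ln q+ / m)) is our reading of the garbled formula (it matches the TrAdaBoost weight factor); function names are illustrative.

```python
import math

def weighted_error(weights, preds, labels):
    """Eq. (10): weighted fraction of misclassified examples."""
    total = sum(weights)
    return sum(w for w, p, y in zip(weights, preds, labels) if p != y) / total

def update_bag_weights(weights, preds, labels, eps):
    """Line 21 (AdaBoost-style): misclassified bags are multiplied by
    beta_t^B = (1 - eps)/eps > 1, i.e., their weight increases."""
    beta = (1.0 - eps) / eps
    return [w * (beta if p != y else 1.0)
            for w, p, y in zip(weights, preds, labels)]

def update_positive_graph_weights(weights, preds, labels, q_pos, m):
    """Line 22 (TrAdaBoost-style): misclassified graphs in positive bags are
    multiplied by beta_G+ = 1/(1 + sqrt(2 ln q+ / m)) in (0, 1], i.e., their
    weight shrinks because their propagated label may be wrong."""
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(q_pos) / m))
    return [w * (beta if p != y else 1.0)
            for w, p, y in zip(weights, preds, labels)]
```

With four unit-weight bags and one misclassification, ε = 1/4, so the misclassified bag's weight triples, while a misclassified positive-bag graph is shrunk below 1.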
The graphs with large training weights will help the learning algorithm find better subgraphs. For negative bags, the weight updating mechanism is the same for all graphs inside the bag (line 23). This graph-level weighting mechanism is similar to the TrAdaBoost algorithm [48]. In the test phase, a test bag B_k is classified by the weighted classifier sign(Σ_{t=1}^m β_t^B H_t^B(B_k)), which boosts all m weak classifiers H_t^B, t = 1, 2, ..., m, to obtain its class label y_k (line 25).

The key technical advantages of the bMGC process can be summarized as follows.
a) Bag-Constrained Subgraph Mining: The two-level weight updating mechanism seamlessly integrates the unique bag- and graph-level constraints into a repetitive and progressive mining process. It helps explore informative subgraphs to represent multi-graph bags.
b) Implicit Feature Representation: bMGC selects a subgraph to directly form a weak classifier in each iteration. This efficiently tackles the challenge that no feature vectors are available for MGC.
c) Generic Boosting Framework for MGC: The proposed framework solves MGC by exploring informative subgraphs as weak classifiers to form a strong boosting model. The framework can easily be adjusted to accommodate other types of graph or bag classifiers for MGC.

VII. EXPERIMENTS

A. Datasets

1) DBLP Multi-Graph Dataset: The DBLP dataset consists of bibliography data in computer science. We downloaded a DBLP version called DBLP-Citation-network V5 from ArnetMiner. Each record in

WU et al.: BOOSTING FOR MULTI-GRAPH CLASSIFICATION

TABLE I: DBLP DATASET USED IN EXPERIMENTS

TABLE II: NCI CANCER SCREEN DATASETS: NCI(1) AND NCI(109)

DBLP is associated with a number of attributes including title, abstract, author names, year, venue, and reference names [49]. To build multi-graph bags, we select papers published in the artificial intelligence (AI), computer vision (CV), and database (DB) fields to form MGC tasks. The goal is to predict which field a paper belongs to (AI, CV, or DB) by using the abstract of each paper and the abstracts of its references. For each abstract, a fuzzy cognitive map (E-FCM) [50] based approach is used to extract a number of keywords and the correlations between keywords. In our experiments, we use keywords as nodes and the correlations between two keywords as edge weight values to build a graph. A threshold (0.005) is used to remove edges whose correlation values are less than the threshold. In the last step, the graph is converted into an unweighted graph by setting the weight values of all remaining edges to 1. A similar graph representation was also used in previous works [51]–[54]. A conceptual view of building a multi-graph bag is shown in Fig. 1. Notice that AI, CV, and DB overlap in many aspects, such as machine learning, optimization, and data mining, which makes them challenging MGC tasks. The original DBLP dataset contains a significant number of papers without references. We choose 2400 papers, each containing one to ten references, to form two MGC tasks: DBLP (AI versus CV) with positive (AI) and negative (CV) bags, and DBLP (AI versus DB) with positive (AI) and negative (DB) bags. The last two columns of Table I report the number of bags (papers) and graphs (abstracts) in each category.

2) NCI Chemical Compound Multi-Graph Dataset: The NCI cancer screening database is a commonly used graph classification benchmark.
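The NCI bag-construction protocol described next (one to four positive graphs per positive bag, bag sizes between 1 and 10, negative bags of negative graphs only) can be sketched as follows; function and parameter names are illustrative, not from the paper.

```python
import random

def make_nci_bags(pos_graphs, neg_graphs, n_pos_bags, n_neg_bags, seed=0):
    """Sketch of the NCI multi-graph bag generation: a positive bag mixes
    1-4 positive graphs with negative graphs, a negative bag contains only
    negative graphs, and every bag holds 1-10 graphs."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_pos_bags):
        k = rng.randint(1, 4)                    # 1-4 positive graphs
        size = rng.randint(k, 10)                # total bag size up to 10
        members = rng.sample(pos_graphs, k) + rng.sample(neg_graphs, size - k)
        bags.append((members, +1))
    for _ in range(n_neg_bags):
        members = rng.sample(neg_graphs, rng.randint(1, 10))
        bags.append((members, -1))
    return bags
```

Here graphs are stand-in identifiers; in the real datasets each member is a chemical-compound graph.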
We downloaded two NCI datasets, with IDs 1 and 109, from PubChem. Each NCI dataset corresponds to a bioassay task for anticancer activity prediction, where each chemical compound is represented as a graph, with atoms as nodes and bonds as edges. A chemical compound is positive if it is active against the corresponding cancer, and negative otherwise. The original NCI datasets are highly imbalanced, with about 5% positive graphs; we use these graphs to generate our multi-graph bags. To build multi-graph bags, we randomly select one to four positive graphs and several negative graphs to form a positive bag, and randomly select a number of negative graphs to form a negative bag. To address different targets, we design two NCI multi-graph classification tasks: NCI(1), generated from the NCI dataset with ID 1, and NCI(109), generated from the NCI dataset with ID 109. The number of graphs in each bag varies from 1 to 10. Table II summarizes the NCI(1) and NCI(109) datasets used in our experiments, where columns 4–5 show the numbers of positive and negative graphs in all multi-graph bags. In the NCI MGC tasks, a bag of graphs can be regarded as a molecular group. Investigating the activity of a molecular group is meaningful in the bio-pharmaceutical field: because labeling individual compounds is expensive and time-consuming, it is desirable to design effective methods (such as bMGC) to label molecular groups (i.e., bags).

B. Baseline Methods

To demonstrate the effectiveness of our MGC framework, we compare the proposed bMGC with both supervised and unsupervised bag-constrained subgraph selection methods in the traditional MIL framework. The baseline methods are summarized as follows.
1) Information Gain Based Approach (IG+MI): In these methods, a set of frequent subgraphs is mined from the graphs in all bags using gSpan [34]. Supervised feature selection based on information gain (IG) is used to select the m subgraphs with the highest IG scores.
After obtaining the m subgraphs, the IG based multi-instance approach (IG+MI) uses the selected subgraphs to represent the graphs in bags, so a bag of graphs is converted into a bag of instances, through which existing MIL methods can be applied for MGC learning.
2) Top-k Based Approach (Topk+MI): This is an unsupervised feature selection method which uses frequency as the evaluation criterion to select subgraphs discovered by gSpan [34]. The top-k subgraphs with the highest frequency among the graphs in bags are selected. The top-k based multi-instance approach (Topk+MI) then transforms each bag of graphs into a bag of instances for learning.

To compare the performance of our MGC framework bMGC with MIL, two types of benchmark multi-instance classifiers are used in our experiments: boosting based classifiers (MIBoost and MIOptimalBall) and four different kinds of general approaches (CitationKNN, MIRI, MIEMDD, and MISMO). In the following, CitationKNN denotes a lazy learning based method, MIRI is an improvement of a tree learning based approach, MIEMDD is an improved DD [26], and MISMO is an implementation of support vector machines for MIL. The baseline MIL methods used in our experiments and their abbreviations are listed as follows.
1) Boosting Based MI Learning Approaches:
a) MIBoost is an algorithm [24] inspired by AdaBoost that builds a series of weak classifiers (a decision stump is used

Fig. 4. Accuracy on DBLP (AI versus CV) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.
Fig. 5. Accuracy on DBLP (AI versus DB) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.
Fig. 6. Accuracy on NCI(1) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.
Fig. 7. Accuracy on NCI(109) by using the proposed bMGC and boosting based MI learning methods. (a) MIBoost. (b) MIOptimalBall.

in our experiments) using a single-instance learner on appropriately reweighted versions of the input data.
b) MIOptimalBall treats the weak hypotheses for AdaBoost as balls [28], and classification is based on the distance to a reference point. More specifically, this method attempts to find a ball in the instance space such that all instances of all negative bags lie outside the ball and at least one instance of each positive bag lies inside it.
2) General MI Learning Approaches:
a) CitationKNN, a nearest-neighbor-based approach, measures the distance between bags using the Hausdorff distance [16]. The nearest neighbor of an example to be classified is the one nearest to both its references and its citers.
b) MIEMDD is the EM version of DD with the most-likely-cause model [26], which is used to find the most likely target points based on the DD model that has been learned [27].
c) MIRI is a multi-instance classifier that uses partial MITI trees [17] with a single positive leaf to learn and represent rules. MIRI [18] is a simple modification of MITI that yields a rule learner for MIL.
d) MISMO constructs a support vector machine classifier for multi-instance data [21], where the standard sequential minimal optimization algorithm is used for support vector learning in conjunction with an MI kernel as described in [55].

C. Experiment Settings

In our experiments, all reported results are based on 10 times 10-fold cross-validation, with classification accuracy used as the performance metric.
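The evaluation protocol (repeated k-fold cross-validation, later compared with pairwise t-tests at α = 0.05) can be sketched generically; `fit` and `predict` are stand-ins for any classifier, and the code is illustrative rather than the authors' harness.

```python
import math
import random

def repeated_kfold_accuracy(X, y, fit, predict, times=10, folds=10, seed=0):
    """Minimal sketch of 10-times 10-fold cross-validation; returns the
    per-fold accuracies so they can feed a paired significance test."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    accs = []
    for _ in range(times):
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]                    # every folds-th index
            held = set(test)
            train = [i for i in idx if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train])
            hits = sum(predict(model, X[i]) == y[i] for i in test)
            accs.append(hits / len(test))
    return accs

def paired_t_statistic(a, b):
    """Paired t statistic over matched per-fold accuracies, the kind of test
    used later to compare bMGC against each baseline."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)
```

A trivial majority-vote classifier illustrates the harness: on a dataset with 80% of one label, repeated 10-fold CV yields exactly 0.8 mean accuracy.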
Unless specified otherwise, the default parameter settings are as follows: minimum support threshold min_sup = 4% for the DBLP datasets and min_sup = 15% for the NCI datasets. All the above classifiers for traditional MIL use the versions provided in the WEKA machine learning workbench [56], with default parameter settings. All experiments are conducted on a Linux cluster computing node with an Intel(R) CPU and 3 GB of memory.

D. Accuracy on Multi-Graph Classification

In this section, we report experimental results on the DBLP and NCI datasets, comparing the performance of bMGC with two types of MIL methods, including boosting based and general approaches, under the supervised and unsupervised feature selection settings, respectively. All methods are compared using the same number of subgraphs. For our boosting based bMGC, one subgraph is selected in each iteration until the total number reaches m, whereas for the baseline methods, all m subgraphs are selected at once. As expected, bMGC clearly outperforms existing traditional MIL methods on both the DBLP and NCI multi-graph datasets with different numbers of subgraphs (varying from 1 to 100).
1) bMGC Versus Boosting Based MI Learning Approaches: We compare bMGC to MIBoost and MIOptimalBall, two boosting based baselines that are variants of the well-known AdaBoost algorithm [25] with the objective of minimizing the exponential loss for bags of instances. Like other boosting schemes, these two algorithms greedily fit an additive model to the training data. In each iteration of the sequential boosting process, a weak learner (a decision stump for MIBoost, and a ball for MIOptimalBall) is applied to generate one component of the underlying additive model. The results in Figs. 4(a)–7(a) show that both bMGC and MIBoost can achieve a high accuracy on the DBLP (AI versus

TABLE III: PAIRWISE t-TEST RESULTS OF bMGC VERSUS BOOSTING BASED MI LEARNING METHODS ON (a) DBLP AND (b) NCI DATASETS. A, B, AND C DENOTE bMGC, IG+MI, AND TOPK+MI, RESPECTIVELY. H1 AND H2 DENOTE MIBOOST AND MIOPTIMALBALL, RESPECTIVELY.

CV, AI versus DB) and NCI (1 and 109) datasets. Meanwhile, bMGC consistently outperforms MIBoost when the number of selected subgraphs is 20 or more. On the other hand, comparing bMGC with MIOptimalBall, a significant performance gain can be observed in Figs. 4(b)–7(b) on both datasets. The superior performance of bMGC is due to the optimal subgraph mining strategy combined with the AdaBoost and TrAdaBoost algorithms. Furthermore, it seems that MIOptimalBall fails to adapt to the feature space composed of subgraphs. Our results also show that bMGC has a very low accuracy in early iterations, and its accuracy may be worse than baselines such as MIBoost in some cases. This is mainly because the boosting model of bMGC relies on weak classifiers to achieve better performance: when the number of weak classifiers is small (normally at the early stage of the boosting process), the accuracy of bMGC is noticeably low. To show that this situation does not affect the overall performance of bMGC, we summarize the pairwise t-test results (with confidence level α = 0.05) of bMGC and the boosting based MI learning methods on both datasets in Table III. Each entry denotes the p-value for a t-test between two algorithms, and a p-value less than α = 0.05 indicates that the difference is statistically significant. From Table III, bMGC statistically outperforms the boosting based MI learning baselines in all cases.
2) bMGC Versus General MI Learning Approaches: We carry out another experimental comparison to demonstrate the performance of bMGC against four different types of general MI learning approaches (CitationKNN, MIRI, MIEMDD, and MISMO). From the results in Figs.
8(c)–11(c), MIEMDD shows ineffective performance for MGC, and an increasing number of subgraphs does not yield additional accuracy gain. Although the performance of the CitationKNN, MIRI, and MISMO based methods improves as the number of subgraphs increases, they still cannot reach the best performance achieved by bMGC, except for IG+MIRI on the NCI(109) dataset as shown in Fig. 11(b). It is worth mentioning that bMGC may achieve only comparable performance to other baselines in some cases, such as Topk+CitationKNN [Fig. 9(a)] and MISMO [Figs. 8(d) and 9(d)] on the DBLP datasets, and IG+MISMO [Figs. 10(d) and 11(d)] on the NCI datasets. To further validate the statistical performance of bMGC, in Table IV we also report pairwise t-tests to validate the statistical significance between each pair of methods. From Table IV, bMGC statistically outperforms the general MI learning baselines in all cases. This is mainly attributed to the effectiveness of the proposed bag-constrained subgraph exploration criterion and the specially designed boosting strategy, which weights a set of single weak classifiers under our specially designed weighting mechanism.

E. Effectiveness of Subgraph Candidate Generation in bMGC

As discussed above, one main component of bMGC is the subgraph candidate generation (as described in Section IV). More specifically, in addition to aggregating the graphs in all bags G, we also aggregate: 1) the graphs in all positive bags G+, and 2) the graphs in all negative bags G−. As a result, a set of diverse subgraph candidate patterns can be discovered for validation. To further illustrate the effectiveness of the proposed strategy for subgraph candidate generation and to validate whether using the two extra graph sets G+ and G− can indeed improve the performance of bMGC, we compare bMGC with an approach which only uses G to generate the subgraphs for learning, denoted bMGC-G. In Fig. 12(a) and (b), we report the accuracy with respect to different iterations on the DBLP (AI versus CV) and NCI(1) datasets, respectively.
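The effect this comparison measures can be illustrated with a toy frequency count: a pattern that is frequent within G+ alone can fall below min_sup on the pooled set G. Here each "graph" is simplified to the set of patterns it contains (real bMGC enumerates subgraphs with gSpan); the example is an assumption-laden sketch, not the paper's miner.

```python
from collections import Counter

def frequent_patterns(graphs, min_sup):
    """A pattern is frequent if it occurs in at least a min_sup fraction
    of the graphs in the given set."""
    counts = Counter(p for g in graphs for p in g)
    n = len(graphs)
    return {p for p, c in counts.items() if c / n >= min_sup}
```

With three positive graphs sharing one pattern and seven negative graphs sharing another, that positive pattern is frequent in G+ (support 3/3) but invisible in the pooled set (support 3/10) at min_sup = 0.5, which is exactly why mining the three sets separately enlarges the candidate set.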
The results show that the classification accuracy of bMGC using all three graph sets is normally 3%–5% higher than that of bMGC-G, which only uses G. This is because separating the graphs into G+ and G− helps find unique subgraph patterns which do not appear in the whole graph set G. Indeed, the subgraph exploration essentially relies on a threshold (i.e., the support value) to discover frequent subgraphs: when all graphs are aggregated into one set G, a good subgraph in G+ may not be discovered from G simply because its frequency in G is below the given threshold. Separating the graphs into the three sets G+, G−, and G therefore helps discover a rich set of subgraph candidates, from which bMGC can select the ones with the highest informativeness scores.

F. Convergence Study

Fig. 13 reports the error rate curves of bMGC with respect to the number of iterations on the four multi-graph datasets. The curves are quite smooth and converge well, which is consistent with the theoretical analysis and the existing observations for AdaBoost [25]. After the algorithm reaches convergence, the error rates of bMGC are higher on the DBLP datasets than on the NCI datasets. Overall, bMGC converges quickly on all four datasets. For the NCI datasets, convergence is reached within ten iterations, whereas for the DBLP datasets, bMGC's convergence

Fig. 8. Accuracy on DBLP (AI versus CV) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.
Fig. 9. Accuracy on DBLP (AI versus DB) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.
Fig. 10. Accuracy on NCI(1) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.
Fig. 11. Accuracy on NCI(109) by using bMGC and general MI learning methods. (a) CitationKNN. (b) MIRI. (c) MIEMDD. (d) MISMO.

TABLE IV: PAIRWISE t-TEST RESULTS OF bMGC VERSUS GENERAL MI LEARNING METHODS ON (a) DBLP AND (b) NCI DATASETS. A, B, AND C DENOTE bMGC, IG+MI, AND TOPK+MI, RESPECTIVELY. H1, H2, H3, AND H4 DENOTE CITATIONKNN, MIRI, MIEMDD, AND MISMO, RESPECTIVELY.

is reached after 20 or more iterations. Since each weak classifier in bMGC corresponds to one subgraph, this indicates that more subgraph features are needed to differentiate the object classes in the DBLP datasets. Indeed, because the DBLP tasks involve overlapping domains (such as AI versus CV), using more subgraph features (which correspond to keywords

Fig. 12. Accuracy comparisons of bMGC and bMGC-G on the DBLP and NCI datasets. (a) DBLP (AI versus CV) dataset. (b) NCI(1) dataset.
Fig. 13. Error rate curves on the DBLP (AI versus CV, AI versus DB) and NCI (1 and 109) multi-graph datasets with respect to the number of iterations.
Fig. 14. Average CPU runtime of bMGC versus the unpruned ubMGC with different min_sup values, under a fixed number of subgraphs m = 100, on the DBLP and NCI datasets.

and their correlations) consistently helps improve the classification accuracy. For the NCI graphs, the positive and negative graphs are mostly separated by a few unique subgraph features, so as soon as such unique patterns are discovered, the algorithm quickly converges.

G. Effectiveness Results

To evaluate the effectiveness of the pruning module of bMGC in reducing the search space (as described in Section V-C), we compare bMGC with an approach which does not prune the subgraph search space (denoted ubMGC). In our implementation, ubMGC first uses gSpan to find a set of frequent subgraphs, and then selects the optimal subgraph using the same criteria as bMGC in each iteration. In Fig. 14(a) and (b), we report the average CPU runtime with respect to different minimum support values min_sup (the number of selected subgraphs is fixed at 100) on the DBLP (AI versus CV) and NCI(1) datasets, respectively. The results show that as the min_sup value increases, the runtime of both the pruned and unpruned versions decreases, mainly because a larger min_sup value reduces the number of candidates for validation. By incorporating the proposed pruning strategy, bMGC improves the runtime performance: the bScore upper bound of bMGC effectively prunes the subgraph search space without decreasing the quality of classification.

VIII. DISCUSSION

In this paper, we focus on a subgraph based boosting framework for MGC.
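The pruning behavior evaluated in Section G can be sketched generically: a candidate subgraph is expanded only if its anti-monotone upper bound (the role played by Theorem 1's bound on the bScore) could still beat the best score found so far; otherwise its whole subtree is skipped. Here `score`, `upper_bound`, and `expand` are placeholders for bScore, its bound, and gSpan's pattern growth, so this is a structural sketch rather than the paper's miner.

```python
def branch_and_bound_best(roots, score, upper_bound, expand):
    """Depth-first search over a pattern tree, pruning any subtree whose
    upper bound cannot exceed the best score seen so far. Returns the best
    pattern, its score, and how many patterns were visited."""
    best, best_score, visited = None, float("-inf"), 0
    stack = list(roots)
    while stack:
        g = stack.pop()
        visited += 1
        s = score(g)
        if s > best_score:
            best, best_score = g, s
        if upper_bound(g) > best_score:      # bound still promising: expand
            stack.extend(expand(g))
    return best, best_score, visited
```

On a toy tree of binary strings scored by their count of "a", with the (valid) bound "current count plus remaining length", the pruned search visits fewer nodes than an exhaustive search while returning the same optimum, mirroring Fig. 14's runtime gap between bMGC and ubMGC.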
Indeed, the idea of exploiting subgraphs for graph classification has been studied in a number of existing works, including a recent ensemble based semi-supervised graph stream classification approach [9]. The core of the proposed bMGC approach is to combine two types of boosting strategies, AdaBoost [25] for bag-level boosting and TrAdaBoost [48] for graph-level boosting, to integrate graph- and bag-level learning for MGC. Boosting algorithms for graph classification have been studied in several previous works. For example, Kudo et al. [41] proposed an AdaBoost based graph classification approach, which is the original algorithm among many variants [42]–[44]. Meanwhile, LPBoost [57], namely linear programming boosting, is another type of boosting algorithm for graph classification. The proposed bMGC follows subgraph search approaches similar to those used in these existing works: it applies the gSpan algorithm [34] in each boosting iteration, together with the proposed pruning strategy, to explore subgraphs. The main complication of MGC is that the genuine labels of the graphs inside a positive bag are unknown. To tackle the uncertainty inside positive bags, bMGC takes the bag constraints into consideration and explores subgraphs to represent graphs with maximum diversity, as defined in (2). This is similar to the way unlabeled graphs are handled in an existing semi-supervised graph stream classification method [9]. In [9], an instance weighting mechanism has also been proposed, but it differs from the weighting approach in bMGC, where the weights are directly associated with the graphs and bags. In addition, the weight updating strategy in [9] is based on AdaBoost [25], which only considers labeled graphs, whereas in bMGC we borrow the weighting strategy from TrAdaBoost [48] to update the graph weights in both the labeled and unlabeled graph sets. In summary, the idea in [9] provides inspiration that motivated the proposed MGC design. We believe that the proposed bMGC opens a new opportunity to extend existing MIL to increasingly popular graph applications.
Although bMGC uses subgraph mining to tackle the MGC challenges, the principle of combining graph- and bag-level constraints can be extended to many other types of approaches for MGC problems. For example, for kernel based methods, the MGC problem can be approached via two subtasks: 1) adding multi-graph constraints to traditional graph kernels, or 2) proposing a new multi-graph kernel framework. In addition, one can also impose multi-graph constraints on graph embedding methods (e.g., the one in [32]) to directly calculate the distance between two graphs or between two

graph bags. With the calculated distances between graphs and between bags, standard learning algorithms (including MIL algorithms) can be applied to solve MGC tasks.

IX. CONCLUSION

In this paper, we investigated a novel MGC problem, in which a number of graphs form a bag, with each bag labeled either positive or negative. The multi-graph representation can model many real-world applications where a label is only available for a bag of objects with dependency structures. To build a learning model for MGC, we proposed bMGC, which employs dynamic weight adjustment, at both the graph and bag levels, to select one subgraph in each iteration and form a set of weak graph classifiers. MGC is achieved by a weighted combination of the weak graph classifiers. Experiments on two real-world MGC tasks, DBLP citation network classification and NCI chemical compound classification, demonstrate that our method is effective in finding informative subgraphs, and that its accuracy is significantly better than that of the baseline methods.

APPENDIX
PROOF OF THEOREM 1

According to (8), for any g ⊆ g_k we have

\begin{align}
r(g_k) &= \mathbf{f}_{g_k}^{\top} Q \,\mathbf{f}_{g_k}
= \begin{bmatrix} (\mathbf{f}_{g_k}^{B})^{\top} & (\mathbf{f}_{g_k}^{G})^{\top} \end{bmatrix}
  \begin{bmatrix} Q^{B} & 0 \\ 0 & Q^{G} \end{bmatrix}
  \begin{bmatrix} \mathbf{f}_{g_k}^{B} \\ \mathbf{f}_{g_k}^{G} \end{bmatrix} \nonumber \\
&= (\mathbf{f}_{g_k}^{B})^{\top} Q^{B} \mathbf{f}_{g_k}^{B}
 + (\mathbf{f}_{g_k}^{G})^{\top} Q^{G} \mathbf{f}_{g_k}^{G}
= \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g_k)} Q^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g_k)} Q^{G}_{ij} \tag{11}
\end{align}

where B(g_k) = {B_i | g_k ⊆ G_j, G_j ∈ B_i, 1 ≤ i ≤ p, 1 ≤ j ≤ q} and G(g_k) = {G_j | g_k ⊆ G_j, 1 ≤ j ≤ q}. Since g_k is a supergraph of g (i.e., g ⊆ g_k), according to the anti-monotonic property we have B(g_k) ⊆ B(g) and G(g_k) ⊆ G(g). Hence

\begin{align}
r(g_k) &= \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g_k)} Q^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g_k)} Q^{G}_{ij}
\leq \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g_k)} \hat{Q}^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g_k)} \hat{Q}^{G}_{ij} \nonumber \\
&\leq \sum_{i,j:\, B_i, B_j \in \mathcal{B}(g)} \hat{Q}^{B}_{ij}
 + \sum_{i,j:\, G_i, G_j \in \mathcal{G}(g)} \hat{Q}^{G}_{ij}
= (\mathbf{f}_{g}^{B})^{\top} \hat{Q}^{B} \mathbf{f}_{g}^{B}
 + (\mathbf{f}_{g}^{G})^{\top} \hat{Q}^{G} \mathbf{f}_{g}^{G}
= \mathbf{f}_{g}^{\top} \hat{Q}\, \mathbf{f}_{g} = \hat{r}(g). \tag{12}
\end{align}

Thus, for any g ⊆ g_k, r(g_k) ≤ \hat{r}(g).

REFERENCES

[1] M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis, "Frequent substructure-based approaches for classifying chemical compounds," IEEE Trans. Knowl. Data Eng., vol. 17, no. 8, Aug.
[2] W. Lian, D.-L. Cheung, N. Mamoulis, and S.-M.
Yiu, "An efficient and scalable algorithm for clustering XML documents by structure," IEEE Trans. Knowl. Data Eng., vol. 16, no. 1, Jan.
[3] C. Chen et al., "Mining graph patterns efficiently via randomized summaries," in Proc. 35th Int. Conf. VLDB, Lyon, France, 2009.
[4] H. Wang, H. Huang, and C. Ding, "Image categorization using directed graphs," in Proc. 11th ECCV, Crete, Greece, 2010.
[5] R. Angelova and G. Weikum, "Graph-based text classification: Learn from your neighbors," in Proc. 29th Annu. Int. ACM SIGIR, Seattle, WA, USA, 2006.
[6] Z. Harchaoui and F. Bach, "Image classification with segmentation graph kernels," in Proc. 20th IEEE Conf. CVPR, Minneapolis, MN, USA, 2007.
[7] M. Kuramochi and G. Karypis, "Frequent subgraph discovery," in Proc. 1st ICDM, 2001.
[8] M. Thoma et al., "Near-optimal supervised feature selection among frequent subgraphs," in Proc. 9th SDM, 2009.
[9] S. Pan, X. Zhu, C. Zhang, and P. Yu, "Graph stream classification using labeled and unlabeled graphs," in Proc. 29th IEEE ICDE, Brisbane, QLD, Australia, 2013.
[10] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, nos. 1–2.
[11] T. Dietterich, R. Lathrop, and T. Lozano-Pérez, "Solving the multiple instance problem with axis-parallel rectangles," Artif. Intell., vol. 89, nos. 1–2.
[12] Z. Fu, A. Robles-Kelly, and J. Zhou, "MILIS: Multiple instance learning with instance selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, May.
[13] Z.-H. Zhou, K. Jiang, and M. Li, "Multi-instance learning based web mining," Appl. Intell., vol. 22, no. 2.
[14] D. Kelly, J. McDonald, and C. Markham, "Weakly supervised training of a sign language recognition system using multiple instance learning density matrices," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 2, Apr.
[15] Z. Zhou, M. Zhang, S. Huang, and Y. Li, "Multi-instance multi-label learning," Artif. Intell., vol. 176, no. 1.
[16] J. Wang, "Solving the multiple-instance problem: A lazy learning approach," in Proc.
17th ICML, San Francisco, CA, USA, 2000.
[17] H. Blockeel and A. Srinivasan, "Multi-instance tree learning," in Proc. 22nd ICML, Bonn, Germany, 2005.
[18] L. Bjerring and E. Frank, "Beyond trees: Adopting MITI to learn rules and ensemble classifiers for multi-instance data," in Proc. 24th Int. Conf. Adv. AI, Berlin, Heidelberg, 2011.
[19] Y. Chevaleyre and J. Zucker, "A framework for learning rules from multiple instance data," in Proc. 12th ECML, Freiburg, Germany, 2001.
[20] M. Zhang and Z. Zhou, "Improve multi-instance neural networks through feature selection," Neural Process. Lett., vol. 19, no. 1, pp. 1–10.
[21] X. Qi and Y. Han, "Incorporating multiple SVMs for automatic image annotation," Pattern Recogn., vol. 40, no. 2.
[22] S. Ray and M. Craven, "Supervised versus multiple instance learning: An empirical comparison," in Proc. 22nd ICML, New York, NY, USA, 2005.
[23] H. Yuan, M. Fang, and X. Zhu, "Hierarchical sampling for multi-instance ensemble learning," IEEE Trans. Knowl. Data Eng., vol. 25, no. 12, Dec.
[24] X. Xu and E. Frank, "Logistic regression and boosting for labeled bags of instances," in Proc. 8th PAKDD, 2004.
[25] M. Telgarsky, "A primal-dual convergence analysis of boosting," J. Mach. Learn. Res., vol. 13, no. 1.
[26] O. Maron and T. Lozano-Pérez, "A framework for multiple-instance learning," in Proc. 12th Annu. Conf. NIPS, Cambridge, MA, USA, 1998.
[27] Q. Zhang and S. Goldman, "EM-DD: An improved multiple-instance learning technique," in Proc. 15th Annu. Conf. NIPS, 2001.
[28] P. Auer and R. Ortner, "A boosting approach to multiple instance learning," in Proc. 15th ECML, Pisa, Italy, 2004.
[29] J. Wu, X. Zhu, C. Zhang, and Z. Cai, "Multi-instance multi-graph dual embedding learning," in Proc. 13th ICDM, Dallas, TX, USA, 2013.
[30] S. V. N. Vishwanathan, K. M. Borgwardt, R. I. Kondor, and N. N. Schraudolph, "Graph kernels," J. Mach. Learn. Res., vol. 11, Apr.

[31] P. Mahe, N. Ueda, T. Akutsu, J. Perret, and J. Vert, "Extensions of marginalized graph kernels," in Proc. 21st ICML, New York, NY, USA, 2004.
[32] K. Riesen and H. Bunke, "Graph classification by means of Lipschitz embedding," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, Dec.
[33] K. Riesen and H. Bunke, Graph Classification and Clustering Based on Vector Space Embedding. Singapore: World Scientific.
[34] X. Yan and J. Han, "gSpan: Graph-based substructure pattern mining," in Proc. 2nd ICDM, Washington, DC, USA, 2002.
[35] A. Inokuchi, T. Washio, and H. Motoda, "An apriori-based algorithm for mining frequent substructures from graph data," in Proc. 4th Eur. Conf. PKDD, Lyon, France, 2000.
[36] C. Borgelt and M. Berthold, "Mining molecular fragments: Finding relevant substructures of molecules," in Proc. 2nd ICDM, 2002.
[37] S. Nijssen and J. Kok, "A quickstart in frequent structure mining can make a difference," in Proc. 10th ACM SIGKDD, Seattle, WA, USA, 2004.
[38] X. Yan, H. Cheng, J. Han, and P. S. Yu, "Mining significant graph patterns by leap search," in Proc. 27th ACM SIGMOD, Vancouver, BC, Canada, 2008.
[39] H. Saigo, N. Krämer, and K. Tsuda, "Partial least squares regression for graph mining," in Proc. 14th ACM SIGKDD, Las Vegas, NV, USA, 2008.
[40] N. Jin, C. Young, and W. Wang, "GAIA: Graph classification using evolutionary computation," in Proc. 29th ACM SIGMOD, Indianapolis, IN, USA, 2010.
[41] T. Kudo, E. Maeda, and Y. Matsumoto, "An application of boosting to graph classification," in Proc. 18th Annu. Conf. NIPS, 2004.
[42] S. Nowozin, K. Tsuda, T. Uno, T. Kudo, and G. Bakir, "Weighted substructure mining for image analysis," in Proc. 20th IEEE Conf. CVPR, Minneapolis, MN, USA, 2007.
[43] H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, and K. Tsuda, "gBoost: A mathematical programming approach to graph classification and regression," Mach. Learn., vol. 75, no. 1.
[44] S. Pan and X. Zhu, "Graph classification with imbalanced class distributions and noise," in Proc.
23rd IJCAI, 2013.
[45] H. Fei and J. Huan, "Boosting with structure information in the functional space: An application to graph classification," in Proc. 16th ACM SIGKDD, Washington, DC, USA, 2010.
[46] B. Zhang et al., "Multi-class graph boosting with subgraph sharing for object recognition," in Proc. 20th ICPR, Istanbul, Turkey, 2010.
[47] M. Grbovic, C. Dance, and S. Vucetic, "Sparse principal component analysis with constraints," in Proc. 26th Conf. AAAI, 2012.
[48] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proc. 24th ICML, Corvallis, OR, USA, 2007.
[49] J. Tang et al., "ArnetMiner: Extraction and mining of academic social networks," in Proc. 14th ACM SIGKDD, Las Vegas, NV, USA, 2008.
[50] K. Perusich and M. McNeese, "Using fuzzy cognitive maps for knowledge management in a conflict environment," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 6, Nov.
[51] X. Li, Q. Hu, W. Xu, and Z. Yu, "Discovery of textual knowledge flow based on the management of knowledge maps," Concurr. Comput. Pract. Exp., vol. 20, no. 15.
[52] X. Luo, Z. Xu, J. Yu, and X. Chen, "Building association link network for semantic link on web resources," IEEE Trans. Autom. Sci. Eng., vol. 8, no. 3, Jul.
[53] J. Wu et al., "Multi-graph learning with positive and unlabeled bags," in Proc. 14th SIAM Int. Conf. Data Mining, 2014.
[54] J. Wu, X. Zhu, C. Zhang, and P. Yu, "Bag constrained structure pattern mining for multi-graph classification," IEEE Trans. Knowl. Data Eng., to be published.
[55] T. Gärtner, P. A. Flach, A. Kowalczyk, and A. J. Smola, "Multi-instance kernels," in Proc. 19th ICML, 2002.
[56] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Amsterdam, The Netherlands: Morgan Kaufmann.
[57] A. Demiriz, K. P. Bennett, and J. Shawe-Taylor, "Linear programming boosting via column generation," Mach. Learn., vol. 46, no.
1 3, pp , Ja Wu (S 14) receved the bachelor s degree n computer scence from the Chna Unversty of Geoscences, Wuhan, Chna, n 2009, where he s currently pursung the Ph.D. degree n computer scence. He s also pursung the Ph.D. degree from the Centre for Quantum Computaton and Intellgent Systems, Faculty of Engneerng and Informaton Technology, Unversty of Technology, Sydney, Ultmo, NSW, Australa. Hs current research nterests nclude data mnng and machne learnng. Shru Pan receved the master s degree n computer scence from Northwest A&F Unversty, Yanglng, Chna, n He s currently pursung the Ph.D. degree from the Centre for Quantum Computaton and Intellgent Systems, Faculty of Engneerng and Informaton Technology, Unversty of Technology, Sydney, Ultmo, NSW, Australa. Hs current research nterests nclude data mnng and machne learnng. Xngquan Zhu (SM 12) receved the Ph.D. degree n computer scence from Fudan Unversty, Shangha, Chna. He s an Assocate Professor wth the Department of Computer & Electrcal Engneerng and Computer Scence, Florda Atlantc Unversty, Boca Raton, FL, USA. Pror to that, he was wth the Centre for Quantum Computaton and Intellgent Systems, Unversty of Technology, Sydney, Ultmo, NSW, Australa. Hs current research nterests nclude data mnng, machne learnng, and multmeda systems. Snce 2000, he has publshed over 170 refereed journal and conference papers n these areas. Dr. 
Zhu is an Associate Editor of the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, and has served on the Editorial Board of the international journal Social Network Analysis and Mining since 2010, and of the journal Network Modeling Analysis in Health Informatics and Bioinformatics. He served or is serving as a Program Committee Co-Chair for the 14th IEEE International Conference on Bioinformatics and BioEngineering (BIBE-2014), the IEEE International Conference on Granular Computing (GRC-2013), the 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2011), and the 9th International Conference on Machine Learning and Applications (ICMLA-2010). He also served as a Conference Co-Chair for ICMLA. He was a recipient of two Best Paper Awards and one Best Student Paper Award.

Zhihua Cai received the B.Sc. degree from Wuhan University, Wuhan, China, in 1986, the M.Sc. degree from the Beijing University of Technology, Beijing, China, in 1992, and the Ph.D. degree from the China University of Geosciences, Wuhan. He is currently a faculty member with the School of Computer Science, China University of Geosciences. He has published over 50 research papers in journals and international conferences, such as IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE TRANSACTIONS ON CYBERNETICS, Applied Soft Computing, Information Sciences, Knowledge-Based Systems, and Knowledge and Information Systems. His current research interests include data mining, machine learning, evolutionary computation, and their applications.


More information

Face Detection with Deep Learning

Face Detection with Deep Learning Face Detecton wth Deep Learnng Yu Shen Yus122@ucsd.edu A13227146 Kuan-We Chen kuc010@ucsd.edu A99045121 Yzhou Hao y3hao@ucsd.edu A98017773 Mn Hsuan Wu mhwu@ucsd.edu A92424998 Abstract The project here

More information

Incremental Learning with Support Vector Machines and Fuzzy Set Theory

Incremental Learning with Support Vector Machines and Fuzzy Set Theory The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and

More information

Alignment Results of SOBOM for OAEI 2010

Alignment Results of SOBOM for OAEI 2010 Algnment Results of SOBOM for OAEI 2010 Pegang Xu, Yadong Wang, Lang Cheng, Tany Zang School of Computer Scence and Technology Harbn Insttute of Technology, Harbn, Chna pegang.xu@gmal.com, ydwang@ht.edu.cn,

More information

Adaptive Transfer Learning

Adaptive Transfer Learning Adaptve Transfer Learnng Bn Cao, Snno Jaln Pan, Yu Zhang, Dt-Yan Yeung, Qang Yang Hong Kong Unversty of Scence and Technology Clear Water Bay, Kowloon, Hong Kong {caobn,snnopan,zhangyu,dyyeung,qyang}@cse.ust.hk

More information

Competitive Sparse Representation Classification for Face Recognition

Competitive Sparse Representation Classification for Face Recognition Vol. 6, No. 8, 05 Compettve Sparse Representaton Classfcaton for Face Recognton Yng Lu Chongqng Key Laboratory of Computatonal Intellgence Chongqng Unversty of Posts and elecommuncatons Chongqng, Chna

More information

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm

Recommended Items Rating Prediction based on RBF Neural Network Optimized by PSO Algorithm Recommended Items Ratng Predcton based on RBF Neural Network Optmzed by PSO Algorthm Chengfang Tan, Cayn Wang, Yuln L and Xx Q Abstract In order to mtgate the data sparsty and cold-start problems of recommendaton

More information

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks

FAHP and Modified GRA Based Network Selection in Heterogeneous Wireless Networks 2017 2nd Internatonal Semnar on Appled Physcs, Optoelectroncs and Photoncs (APOP 2017) ISBN: 978-1-60595-522-3 FAHP and Modfed GRA Based Network Selecton n Heterogeneous Wreless Networks Xaohan DU, Zhqng

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Virtual Machine Migration based on Trust Measurement of Computer Node

Virtual Machine Migration based on Trust Measurement of Computer Node Appled Mechancs and Materals Onlne: 2014-04-04 ISSN: 1662-7482, Vols. 536-537, pp 678-682 do:10.4028/www.scentfc.net/amm.536-537.678 2014 Trans Tech Publcatons, Swtzerland Vrtual Machne Mgraton based on

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Boosting for transfer learning with multiple sources

Boosting for transfer learning with multiple sources Boostng for transfer learnng wth multple sources Y Yao Ganfranco Doretto Vsualzaton and Computer Vson Lab, GE Global Research, Nskayuna, NY 239 yaoy@gecom doretto@researchgecom Abstract Transfer learnng

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Discriminative classifiers for object classification. Last time

Discriminative classifiers for object classification. Last time Dscrmnatve classfers for object classfcaton Thursday, Nov 12 Krsten Grauman UT Austn Last tme Supervsed classfcaton Loss and rsk, kbayes rule Skn color detecton example Sldng ndo detecton Classfers, boostng

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information