Guo-Jun Qi (University of Illinois at Urbana-Champaign), Charu C. Aggarwal (IBM T. J. Watson Research Center), Thomas Huang (University of Illinois at Urbana-Champaign). Towards Semantic Knowledge Propagation from Text to Web Images. WWW Conference, Hyderabad, India, 2011.
Introduction
The problem of image classification is challenging in many scenarios: labeled image data may be scarce in many settings, and the image features are not directly related to the semantic concepts inherent in the class labels. The combination of these factors can be especially challenging. Goal: to address these challenges with the use of semantic knowledge propagation.
Semantic Challenges
The semantic challenges of image features are evident when we attempt to recognize complex abstract concepts. The visual features often fail to discriminate such concepts. Classifiers naturally work better with features that have semantic interpretability. Class labels are also usually designed on the basis of application-specific semantic criteria. Text features are inherently friendly to the classification process in a way that is often a challenge for image representations.
Observations in the Context of Web and Social Networks
In many real web and social media applications, it is possible to obtain co-occurrence information between text and images. There is a tremendous amount of linkage between text and images on the web, social media, and information networks: in web pages, images co-occur with text on the same page; comments appear on image sharing sites; posts appear in social networks.
Learning from Semantic Bridges
The copious availability of bridging relationships between text and images in the context of web and social network data can be leveraged for better learning models. It is reasonable to assume that the content of the text and the images is highly correlated in both scenarios. The relationships between text and images can then be used to facilitate the learning process.
Observations
The co-occurrence data provides a raw semantic bridge which needs to be further learned and refined in functional form. This learned bridge is then leveraged to translate the semantic information in the text features into the image domain. This is related to the problem of transfer learning, applied to the multimedia domain in order to leverage the semantic labels in text corpora to annotate image corpora with scarce labels. Co-occurrence information is noisy, but sufficiently rich on an aggregate basis.
Modeling
We develop a mathematical model for the functional relationships between text and image features, so as to indirectly transfer semantic knowledge through feature transformations. This feature transformation is accomplished by mapping instances from the different domains into a common space of unspecified topics. This space is used as a bridge to semantically connect the two heterogeneous spaces. We evaluate our knowledge transfer techniques on an image classification task with labeled text corpora.
Broad Approach
Design a translator function which represents the functional relationships between images and text (from the common topic space). Both the correspondence information and an auxiliary image training set are used to learn the translator, which links instances across the heterogeneous text and image spaces. We follow the principle of parsimony and encode as few topics as possible. After the translator function is learned, the semantic labels can be propagated from any labeled text corpus to any new image by a process of cross-domain label propagation.
Notations and Definitions
Let R^a and R^b be the source and target feature spaces. In the source (text) space, we have a set of n^(s) text documents in R^a. Each text document is represented by a feature vector x_i^(s) ∈ R^a, 1 ≤ i ≤ n^(s). This text corpus has already been annotated with class labels A^(s) = {(x_i^(s), y_i^(s))}, 1 ≤ i ≤ n^(s), where y_i^(s) ∈ {+1, −1} is the binary label. The binary assumption is without loss of generality, because the extension to the multi-class case is straightforward.
Notations and Definitions
The images are represented by feature vectors x^(t) in the target space R^b. A key component which provides bridging information about the relationship between the text and image feature spaces is a co-occurrence set C = {(x_k^(s), x_l^(t), c(x_k^(s), x_l^(t)))}. For a text document x_k^(s) and its corresponding image feature vector x_l^(t) in the co-occurrence set, we denote the co-occurrence frequency by c(x_k^(s), x_l^(t)). For brevity, we use c_{k,l} to denote c(x_k^(s), x_l^(t)).
Notations and Definitions
Besides the linkage-based co-occurrence set, we sometimes also have a small set A^(t) = {(x_i^(t), y_i^(t))}, 1 ≤ i ≤ n^(t), of labeled target instances. This is an auxiliary set, and its size is usually much smaller than that of the set of labeled source examples: n^(t) ≪ n^(s). The auxiliary set is used in order to enhance the accuracy of the transfer learning process.
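As a concrete illustration of these definitions, the three data sources can be held in simple array containers. This is our own sketch, not the authors' code; all names, sizes, and random values below are made up, with n^(t) ≪ n^(s) reflected in the set sizes.

```python
import numpy as np

# Illustrative containers for the three data sources (hypothetical sizes).
rng = np.random.default_rng(0)

a, b = 20, 12        # text (source) and image (target) feature dimensions
n_s, n_t = 50, 5     # n^(t) << n^(s): few labeled images, many labeled texts

X_s = rng.normal(size=(n_s, a))        # labeled source (text) corpus A^(s)
y_s = rng.choice([-1, 1], size=n_s)    # binary labels y^(s) in {+1, -1}

X_aux = rng.normal(size=(n_t, b))      # small auxiliary labeled image set A^(t)
y_aux = rng.choice([-1, 1], size=n_t)

# Co-occurrence set C as (k, l, c_kl) triples: text document k co-occurred
# with image l with frequency c_kl.
C = [(0, 1, 3.0), (2, 0, 1.0), (4, 3, 2.0)]
```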
Formal Approach with Translator Function
One of the key intermediate steps during this process is the design of a translator function between text and images. This translator function serves as a conduit to measure the linking strength between text and image features. The translator T is a function defined on the text space R^a and the image space R^b as T : R^a × R^b → R. It assigns a real value to each pair of text and image instances to weigh their linking strength. The value can be either positive or negative, representing either positive or negative linkages.
Formal Approach with Translator Function
Given a new image x^(t), its label is determined by a discriminant function formed as a linear combination of the class labels in A^(s), weighted by the corresponding translator values:
f_T(x^(t)) = Σ_{i=1}^{n^(s)} y_i^(s) T(x_i^(s), x^(t))
The sign of f_T(x^(t)) provides the class label of x^(t). The key to translating from text to images is to learn a translator which can properly explain the correspondence between the text and image spaces.
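A minimal sketch of this cross-domain label propagation (our own illustration, not the authors' code): a new image is classified by a translator-weighted vote over the labeled text corpus, where `translator` is a placeholder for the learned T.

```python
import numpy as np

def discriminant(x_t, X_s, y_s, translator):
    """f_T(x_t) = sum_i y_i^(s) * T(x_i^(s), x_t); its sign is the label."""
    return sum(y * translator(x_s, x_t) for x_s, y in zip(X_s, y_s))

# Toy check with a deliberately trivial translator (plain inner product,
# pretending text and image features live in the same 2-d space).
X_s = np.array([[1.0, 0.0], [0.0, 1.0]])
y_s = np.array([1, -1])
T = lambda xs, xt: float(xs @ xt)
print(np.sign(discriminant(np.array([1.0, 0.0]), X_s, y_s, T)))  # prints 1.0
```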
Learning the Translator Function
The key to an effective transfer learning process is to learn the function T. We formulate an optimization problem which maximizes the correspondence between the two spaces. We set up a canonical form for the translator function in terms of matrices which represent topic spaces. The parameters of this canonical form are then optimized in order to learn the translator function.
Learning the Translator Function
We propose to solve the following optimization problem to learn the semantic translator:
min_T γ Σ_{i=1}^{n^(t)} l(y_i^(t) f_T(x_i^(t))) + λ Σ_C χ(c_{k,l} T(x_k^(s), x_l^(t))) + Ω(T)
γ and λ are balancing parameters. The formulation uses both the co-occurrence data and the auxiliary data.
Designing the Translator Function
We design the canonical form of the translator function in terms of underlying topic spaces. This provides a closed form for the translator function, which can be effectively optimized. Topic spaces provide a natural intermediate representation which can semantically link the information between the text and the images.
Designing the Translator Function
Topic spaces are represented by transformation matrices:
W^(s) ∈ R^{p×a}: R^a → R^p, x^(s) ↦ W^(s) x^(s)
W^(t) ∈ R^{p×b}: R^b → R^p, x^(t) ↦ W^(t) x^(t)
The translator function is defined on a source and a target instance by computing the inner product in the hypothetical topic space implied by these transformation matrices:
T(x^(s), x^(t)) = ⟨W^(s) x^(s), W^(t) x^(t)⟩ = x^(s)ᵀ W^(s)ᵀ W^(t) x^(t) = x^(s)ᵀ S x^(t)
Observations
The choice of the transformation matrices (or rather the product matrix W^(s)ᵀ W^(t)) impacts the translator function T directly. We use the notation S to briefly denote the matrix W^(s)ᵀ W^(t). It suffices to learn this product matrix S rather than the two transformation matrices separately. The definition of the matrix S can be used to rewrite the discriminant function as follows:
f_S(x^(t)) = Σ_{i=1}^{n^(s)} y_i^(s) x_i^(s)ᵀ S x^(t)
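The bilinear form can be sketched directly (our own illustration; variable names are ours). The assertion below confirms that the product matrix S = W^(s)ᵀ W^(t) reproduces the topic-space inner product, so only S needs to be learned.

```python
import numpy as np

def translate(x_s, S, x_t):
    """T(x_s, x_t) = x_s^T S x_t, with S = W_s^T W_t."""
    return float(x_s @ S @ x_t)

def f_S(x_t, X_s, y_s, S):
    """Discriminant f_S(x_t) = sum_i y_i x_i^T S x_t as one matrix product."""
    return float(y_s @ (X_s @ S @ x_t))

# Factored form and product form agree:
rng = np.random.default_rng(1)
p, a, b = 3, 5, 4                       # topic, text, image dimensions
W_s, W_t = rng.normal(size=(p, a)), rng.normal(size=(p, b))
S = W_s.T @ W_t
x_s, x_t = rng.normal(size=a), rng.normal(size=b)
assert np.isclose(translate(x_s, S, x_t), (W_s @ x_s) @ (W_t @ x_t))
```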
Regularization
We use the conventional squared Frobenius norm for regularization:
Ω(T) = (1/2)(‖W^(s)‖_F² + ‖W^(t)‖_F²)
We use the trace norm as a substitute to force convexity. It is defined as follows:
‖S‖_Σ = inf_{S = W^(s)ᵀ W^(t)} (1/2)(‖W^(s)‖_F² + ‖W^(t)‖_F²)
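A standard fact behind this substitution is that the trace norm equals the sum of singular values, and the infimum in the definition is attained by balanced SVD factors. A small numpy check (our own sketch, not from the paper):

```python
import numpy as np

def trace_norm(S):
    """||S||_Sigma: the sum of singular values of S."""
    return float(np.linalg.svd(S, compute_uv=False).sum())

# The infimum over factorizations S = W_s^T W_t of
# (||W_s||_F^2 + ||W_t||_F^2) / 2 is attained by balanced SVD factors:
rng = np.random.default_rng(2)
S = rng.normal(size=(4, 3))
U, sig, Vt = np.linalg.svd(S, full_matrices=False)  # S = U diag(sig) Vt
W_s = np.sqrt(sig)[:, None] * U.T                   # so W_s^T W_t == S
W_t = np.sqrt(sig)[:, None] * Vt
bal = 0.5 * ((W_s**2).sum() + (W_t**2).sum())
assert np.isclose(bal, trace_norm(S))
```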
Objective Function after Regularization
The regularized objective function can be rewritten as follows:
min_S γ Σ_{i=1}^{n^(t)} l(y_i^(t) f_S(x_i^(t))) + λ Σ_C χ(c_{k,l} x_k^(s)ᵀ S x_l^(t)) + ‖S‖_Σ
This is a convex function which can be optimized with a gradient descent method.
Choice of Loss Function
We need to decide which functions are used for the loss functions l(·) and χ(·). These functions measure the margin of the discriminant function f_S(·) on the auxiliary data set and compliance with the observed co-occurrence, respectively. We use the well-known logistic loss function l(z) = log(1 + exp(−z)) for the first, and the exponentially decreasing function χ(z) = exp(−z) for the second.
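These two losses are one-liners; the sketch below (our own, with our names) records them together with the monotonicity that motivates them: both penalize small margins and weak compliance, and reward large positive values of z.

```python
import numpy as np

def l_logistic(z):
    """Margin loss on the auxiliary labeled images: l(z) = log(1 + exp(-z))."""
    return np.log1p(np.exp(-z))

def chi_exp(z):
    """Co-occurrence compliance loss: chi(z) = exp(-z)."""
    return np.exp(-z)

# Both decrease monotonically in z, so large margins / strong compliance
# are rewarded and violations are penalized:
assert l_logistic(2.0) < l_logistic(0.0) < l_logistic(-2.0)
assert chi_exp(2.0) < chi_exp(0.0) < chi_exp(-2.0)
```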
Optimization Strategy
After performing the aforementioned substitutions, the objective function is non-linear. One possibility is to optimize an objective of this form by using semi-definite programming on the dual formulation (Srebro et al.). However, this approach does not scale well with the size of the problem. Our general approach is to adapt a proximal gradient method to minimize such non-linear objective functions with a trace norm regularizer (Toh & Yun '09).
Proximal Gradient Method
In order to represent the objective function more succinctly, we introduce the function F(S):
F(S) = γ Σ_{i=1}^{n^(t)} l(y_i^(t) f_S(x_i^(t))) + λ Σ_C χ(c_{k,l} x_k^(s)ᵀ S x_l^(t))
The optimization objective can then be rewritten as F(S) + ‖S‖_Σ. In order to optimize this objective, the proximal gradient method quadratically approximates it by a Taylor expansion at the current value S = S_τ with Lipschitz coefficient α, as follows:
Q(S, S_τ) = F(S_τ) + ⟨∇F(S_τ), S − S_τ⟩ + (α/2)‖S − S_τ‖_F² + ‖S‖_Σ
Objective Function Gradient
The gradient of the function needs to be evaluated in order to enable the iterative method. The gradient ∇F(S_τ) can be computed as follows:
∇F(S_τ) = γ Σ_{i=1}^{n^(t)} l′(y_i^(t) f_{S_τ}(x_i^(t))) y_i^(t) (Σ_{j=1}^{n^(s)} y_j^(s) x_j^(s)) x_i^(t)ᵀ + λ Σ_C χ′(c_{k,l} x_k^(s)ᵀ S_τ x_l^(t)) c_{k,l} x_k^(s) x_l^(t)ᵀ
Here l′(z) = −e^(−z)/(1 + e^(−z)) and χ′(z) = −e^(−z) are the derivatives of l(z) and χ(z) with respect to z.
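The gradient can be assembled term by term from the formula above. The sketch below is our own illustration (names such as `grad_F` are ours, not the authors'); a finite-difference check against F(S) is a good way to validate this kind of derivation.

```python
import numpy as np

def l_prime(z):    # derivative of l(z) = log(1 + exp(-z))
    return -np.exp(-z) / (1.0 + np.exp(-z))

def chi_prime(z):  # derivative of chi(z) = exp(-z)
    return -np.exp(-z)

def grad_F(S, X_s, y_s, X_aux, y_aux, X_img, C, gamma, lam):
    """Gradient of the smooth part F(S) of the objective."""
    u = y_s @ X_s                         # sum_j y_j^(s) x_j^(s)
    G = np.zeros_like(S)
    for x_t, y in zip(X_aux, y_aux):      # auxiliary labeled images
        f = float(u @ S @ x_t)            # f_S(x_t)
        G += gamma * l_prime(y * f) * y * np.outer(u, x_t)
    for k, l, c in C:                     # co-occurrence triples (k, l, c_kl)
        z = c * float(X_s[k] @ S @ X_img[l])
        G += lam * c * chi_prime(z) * np.outer(X_s[k], X_img[l])
    return G
```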
Proximal Gradient Method
S can be updated by iteratively minimizing Q(S, S_τ) with S_τ fixed. This minimization can be solved by singular value thresholding (Cai, Candès & Shen '08). The convergence of the proximal gradient algorithm can be accelerated by making an initial estimate of α and increasing it by a constant factor η (Toh & Yun '09). The approach continues until F(S_{τ+1}) + ‖S_{τ+1}‖_Σ ≤ Q(S_{τ+1}, S_τ). At this point, it is deemed that we are sufficiently close to an optimum solution, and the algorithm terminates.
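The inner minimization of Q(S, S_τ) has a closed form via singular value thresholding. A sketch of one proximal update (our own code; the backtracking estimate of α from Toh & Yun is omitted):

```python
import numpy as np

def svt(M, tau):
    """Prox of tau * ||.||_Sigma at M: soft-threshold the singular values."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(sig - tau, 0.0)) @ Vt

def prox_step(S_tau, grad_F_at_S_tau, alpha):
    """argmin_S Q(S, S_tau), i.e. svt(S_tau - grad/alpha, 1/alpha)."""
    return svt(S_tau - grad_F_at_S_tau / alpha, 1.0 / alpha)

# Thresholding shrinks every singular value by tau (clipping at zero),
# which is what drives the learned S toward low rank:
rng = np.random.default_rng(4)
M = rng.normal(size=(5, 4))
sig = np.linalg.svd(M, compute_uv=False)
sig_shrunk = np.linalg.svd(svt(M, 0.5), compute_uv=False)
assert np.allclose(sig_shrunk, np.maximum(sig - 0.5, 0.0))
```

The low-rank bias of this step is consistent with the small topic-matrix ranks reported in the experiments below.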
Experimental Results
We tested the method on a number of real data sets, using Wikipedia and Flickr data for text and associated images, with class labels used as query keywords to obtain the text and associated images. We compared our technique (TTI) to an SVM-based image classifier (Image only) and to transfer learning methods (TLRisk, HTL).
Ten Auxiliary Training Images (error rates)

Category    Image only       HTL              TLRisk           TTI
birds       0.2639±0.0012    0.2619±0.0015    0.2546±0.0018    0.2520±0.0008
buildings   0.2856±0.0002    0.2707±0.0021    0.2555±0.0014    0.2303±0.0017
cars        0.3027±0.0073    0.3065±0.0030    0.2543±0.0029    0.2299±0.0031
cat         0.2755±0.0043    0.2525±0.0038    0.2553±0.0028    0.2424±0.0026
dog         0.2252±0.0039    0.2343±0.0037    0.2545±0.0031    0.2162±0.0027
horses      0.2667±0.0019    0.2500±0.0021    0.2551±0.0016    0.2383±0.0013
mountain    0.3176±0.0010    0.3097±0.0003    0.2541±0.0011    0.2626±0.0007
plane       0.2667±0.0009    0.2133±0.0008    0.2546±0.0005    0.2567±0.0012
train       0.2624±0.0029    0.2716±0.0118    0.2552±0.0025    0.2346±0.0031
waterfall   0.2611±0.0008    0.2435±0.0009    0.2555±0.0016    0.2546±0.0007
Two Auxiliary Training Images (error rates)

Category    Image only       HTL              TLRisk           TTI
birds       0.3293±0.0105    0.3293±0.0124    0.2817±0.0097    0.2738±0.0080
buildings   0.3272±0.0061    0.3295±0.0041    0.2758±0.0023    0.2329±0.0032
cars        0.2529±0.0059    0.2759±0.0048    0.2639±0.0032    0.1647±0.0058
cat         0.3333±0.0071    0.3333±0.0060    0.2480±0.0109    0.2525±0.0083
dog         0.3694±0.0031    0.3694±0.0087    0.2793±0.0161    0.2520±0.0092
horses      0.2500±0.0087    0.3000±0.0050    0.2679±0.0069    0.2000±0.0015
mountain    0.3311±0.0016    0.3322±0.0009    0.2817±0.0021    0.2699±0.0004
plane       0.2667±0.0019    0.2250±0.0006    0.2758±0.0006    0.2517±0.0011
train       0.3333±0.0084    0.3333±0.0068    0.2738±0.0105    0.2099±0.0060
waterfall   0.2693±0.0009    0.2694±0.0016    0.2659±0.0020    0.2570±0.0007
Rank of Topic Matrix

Category    Two training ex.    Ten training ex.
birds       11                  9
buildings   88                  102
cars        19                  3
cat         18                  2
dog         7                   5
horses      4                   1
mountain    6                   1
plane       15                  25
train       6                   3
waterfall   21                  26
Sensitivity to Varying Number of Training Images
[Figure: error rate (roughly 0.23–0.31) vs. number of auxiliary training examples (1–10), comparing Image Only, HTL, TLRisk, and TTI.]
Sensitivity to Number of Co-occurring Examples
[Figure: average error rate (roughly 0.24–0.28) vs. number of co-occurring text-image pairs (500–2000), comparing Image Only, HTL, TLRisk, and TTI.]
Sensitivity to Parameters
[Figure: error rate (roughly 0.25–0.34) vs. λ (0 to 2), for γ = 0.1, 0.5, 1.0, 2.0.]
Conclusions and Summary
We presented a new method for transfer learning between text and web images. It uses co-occurrence data as a bridge for the transfer process, builds a new topic space based on the co-occurrence data, and leverages that topic space for classification. Experimental results show advantages over competing methods.