Towards Semantic Knowledge Propagation from Text to Web Images

Guo-Jun Qi (University of Illinois at Urbana-Champaign), Charu C. Aggarwal (IBM T. J. Watson Research Center), Thomas Huang (University of Illinois at Urbana-Champaign)
Towards Semantic Knowledge Propagation from Text to Web Images
WWW Conference, Hyderabad, India, 2011

Introduction
The problem of image classification is challenging in many scenarios: labeled image data may be scarce in many settings, and the image features are not directly related to the semantic concepts inherent in class labels. The combination of these factors compounds the challenge. Goal: to address these challenges with the use of semantic knowledge propagation.

Semantic Challenges
The semantic challenges of image features become evident when we attempt to recognize complex abstract concepts: visual features often fail to discriminate such concepts. Classifiers naturally work better with features that have semantic interpretability, and class labels are usually designed on the basis of application-specific semantic criteria. Text features are therefore inherently friendly to the classification process in a way that image representations often are not.

Observations in the Context of Web and Social Networks
In many real web and social media applications, it is possible to obtain co-occurrence information between text and images. There is a tremendous amount of linkage between text and images on the web, in social media, and in information networks: images co-occur with text on the same web page, in comments on image-sharing sites, and in posts on social networks.

Learning from Semantic Bridges
The copious availability of bridging relationships between text and images in web and social network data can be leveraged for better learning models. It is reasonable to assume that the content of the text and the images is highly correlated in both scenarios, so these relationships between text and images can be used to facilitate the learning process.

Observations
The co-occurrence data provides a raw semantic bridge, which must be further learned and refined into functional form. This learned bridge is then leveraged to translate the semantic information in the text features into the image domain. The problem is related to transfer learning, applied here to the multimedia domain in order to leverage the semantic labels in text corpora to annotate image corpora with scarce labels. Co-occurrence information is noisy, but sufficiently rich on an aggregate basis.

Modeling
Develop a mathematical model of the functional relationships between text and image features, so as to indirectly transfer semantic knowledge through feature transformations. The feature transformation is accomplished by mapping instances from the different domains into a common space of unspecified topics, which is used as a bridge to semantically connect the two heterogeneous spaces. We evaluate our knowledge transfer techniques on an image classification task with labeled text corpora.

Broad Approach
Design a translator function which represents the functional relationships between images and text (through the common topic space). Both the correspondence information and an auxiliary image training set are used to learn the translator, which links instances across the heterogeneous text and image spaces. Follow the principle of parsimony and encode as few topics as possible. After the translator function is learned, semantic labels can be propagated from any labeled text corpus to any new image by a process of cross-domain label propagation.

Notations and Definitions
Let $\mathbb{R}^a$ and $\mathbb{R}^b$ be the source and target feature spaces. In the source (text) space, we have a set of $n^{(s)}$ text documents in $\mathbb{R}^a$. Each text document is represented by a feature vector $x_i^{(s)} \in \mathbb{R}^a$, $1 \leq i \leq n^{(s)}$. This text corpus has already been annotated with class labels $A^{(s)} = \{(x_i^{(s)}, y_i^{(s)})\}_{1 \leq i \leq n^{(s)}}$, where $y_i^{(s)} \in \{+1, -1\}$ is the binary label. The binary assumption is without loss of generality, because the extension to the multi-class case is straightforward.

Notations and Definitions
The images are represented by feature vectors $x^{(t)}$ in the target space $\mathbb{R}^b$. A key component which provides bridging information about the relationship between the text and image feature spaces is a co-occurrence set $C = \{(x_k^{(s)}, x_l^{(t)}, c(x_k^{(s)}, x_l^{(t)}))\}$. For a text document $x_k^{(s)}$ and its corresponding image feature vector $x_l^{(t)}$ in the co-occurrence set, we denote the co-occurrence frequency by $c(x_k^{(s)}, x_l^{(t)})$. For brevity, we use $c_{k,l}$ to denote $c(x_k^{(s)}, x_l^{(t)})$.
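To fix ideas, the following minimal numpy sketch sets up toy stand-ins for the objects just defined. All dimensions, sample counts, and variable names (X_s, y_s, X_t, y_t, X_co, C) are illustrative assumptions, not part of the original formulation; the later sketches in this deck reuse these variables.

```python
import numpy as np

rng = np.random.default_rng(0)

a, b = 50, 30        # hypothetical text / image feature dimensions
n_s, n_t = 200, 10   # many labeled text documents, few labeled images

X_s = rng.normal(size=(n_s, a))       # source (text) feature vectors
y_s = rng.choice([-1, 1], size=n_s)   # binary source labels
X_t = rng.normal(size=(n_t, b))       # auxiliary labeled image vectors
y_t = rng.choice([-1, 1], size=n_t)

# Co-occurrence set C: triples (text index k, image index l, frequency c_{k,l}).
X_co = rng.normal(size=(500, b))      # images appearing in co-occurrence pairs
C = [(int(rng.integers(n_s)), int(rng.integers(500)), int(rng.integers(1, 5)))
     for _ in range(1000)]
```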

Notations and Definitions
Besides the linkage-based co-occurrence set, we sometimes also have a small set $A^{(t)} = \{(x_i^{(t)}, y_i^{(t)})\}_{1 \leq i \leq n^{(t)}}$ of labeled target instances. This is an auxiliary set of labeled target instances, and its size is usually much smaller than that of the set of labeled source examples: $n^{(t)} \ll n^{(s)}$. The auxiliary set is used to enhance the accuracy of the transfer learning process.

Formal Approach with Translator Function
One of the key intermediate steps in this process is the design of a translator function between text and images, which serves as a conduit to measure the linking strength between text and image features. The translator $T$ is a function defined on the text space $\mathbb{R}^a$ and the image space $\mathbb{R}^b$ as $T: \mathbb{R}^a \times \mathbb{R}^b \to \mathbb{R}$. It assigns a real value to each pair of text and image instances to weigh their linking strength. The value can be either positive or negative, representing either positive or negative linkage.

Formal Approach with Translator Function
Given a new image $x^{(t)}$, its label is determined by a discriminant function formed as a linear combination of the class labels in $A^{(s)}$, weighted by the corresponding translator values:
$$f_T(x^{(t)}) = \sum_{i=1}^{n^{(s)}} y_i^{(s)}\, T(x_i^{(s)}, x^{(t)})$$
The sign of $f_T(x^{(t)})$ provides the class label of $x^{(t)}$. The key to translating from text to images is to learn a translator which properly explains the correspondence between the text and image spaces.
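This discriminant translates directly into code once a translator is available. In this sketch, T is any callable scoring a (text, image) pair; the bilinear parameterization on the following slides supplies a concrete T, and the names here are illustrative:

```python
def f_T(x_t, X_s, y_s, T):
    """Weighted vote of the source labels; the weights come from the translator T."""
    return sum(y_s[i] * T(X_s[i], x_t) for i in range(len(y_s)))

# The predicted label of a new image x_new is np.sign(f_T(x_new, X_s, y_s, T)).
```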

Learning the Translator Function
The key to an effective transfer learning process is to learn the function $T$. We formulate an optimization problem which maximizes the correspondence between the two spaces: set up a canonical form for the translator function in terms of matrices which represent topic spaces, then optimize the parameters of this canonical form in order to learn the translator function.

Learning the Translator Function
We propose to optimize the following problem to learn the semantic translator:
$$\min_T \; \gamma \sum_{i=1}^{n^{(t)}} l\big(y_i^{(t)} f_T(x_i^{(t)})\big) + \lambda \sum_{C} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \Omega(T)$$
Here $\gamma$ and $\lambda$ are balancing parameters, and $\Omega(T)$ is a regularizer. The formulation uses both the co-occurrence data and the auxiliary data.

Designing the Translator Function
We design the canonical form of the translator function in terms of underlying topic spaces. This provides a closed form for the translator function, which can be effectively optimized. Topic spaces provide a natural intermediate representation which can semantically link the information between the text and the images.

Designing the Translator Function
Topic spaces are represented by transformation matrices:
$$W^{(s)} \in \mathbb{R}^{p \times a}: \mathbb{R}^a \to \mathbb{R}^p,\quad x^{(s)} \mapsto W^{(s)} x^{(s)}$$
$$W^{(t)} \in \mathbb{R}^{p \times b}: \mathbb{R}^b \to \mathbb{R}^p,\quad x^{(t)} \mapsto W^{(t)} x^{(t)}$$
The translator function is defined on a source and a target instance by computing the inner product in the hypothetical topic space implied by these transformation matrices:
$$T(x^{(s)}, x^{(t)}) = \langle W^{(s)} x^{(s)},\, W^{(t)} x^{(t)} \rangle = x^{(s)\top} W^{(s)\top} W^{(t)} x^{(t)} = x^{(s)\top} S\, x^{(t)}$$

Observations
The choice of the transformation matrices (or rather the product matrix $W^{(s)\top} W^{(t)}$) impacts the translator function $T$ directly. We use the notation $S$ to denote the matrix $W^{(s)\top} W^{(t)}$; it suffices to learn this product matrix $S$ rather than the two transformation matrices separately. This definition of the matrix $S$ can be used to rewrite the discriminant function as follows:
$$f_S(x^{(t)}) = \sum_{i=1}^{n^{(s)}} y_i^{(s)}\, x_i^{(s)\top} S\, x^{(t)}$$
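In the bilinear form, both the translator and the discriminant reduce to matrix products. A sketch continuing the toy setup above, with the number of latent topics p chosen arbitrarily:

```python
p = 5                           # hypothetical number of latent topics
W_s = rng.normal(size=(p, a))   # text-side topic transform
W_t = rng.normal(size=(p, b))   # image-side topic transform
S = W_s.T @ W_t                 # the a x b product matrix; only S needs learning

def translator(x_s, x_t, S):
    # <W_s x_s, W_t x_t> = x_s^T S x_t
    return x_s @ S @ x_t

def f_S(x_t, X_s, y_s, S):
    # sum_i y_i x_i^T S x_t = (y^T X_s) S x_t, vectorized over the source corpus
    return (y_s @ X_s) @ S @ x_t
```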

Regularization
Use the conventional squared Frobenius norm for regularization:
$$\Omega(T) = \frac{1}{2}\big(\|W^{(s)}\|_F^2 + \|W^{(t)}\|_F^2\big)$$
Use the trace norm as a substitute in order to force convexity. It is defined as follows:
$$\|S\|_\Sigma = \inf_{S = W^{(s)\top} W^{(t)}} \frac{1}{2}\big(\|W^{(s)}\|_F^2 + \|W^{(t)}\|_F^2\big)$$
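The infimum in this definition is attained by the sum of the singular values of $S$ (the nuclear norm), so the regularizer can be evaluated directly from an SVD. A one-line sketch:

```python
def trace_norm(S):
    # ||S||_Sigma equals the sum of the singular values of S (nuclear norm)
    return np.linalg.svd(S, compute_uv=False).sum()
```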

Objective Function after Regularization
The regularized objective function can be rewritten as follows:
$$\min_S \; \gamma \sum_{i=1}^{n^{(t)}} l\big(y_i^{(t)} f_S(x_i^{(t)})\big) + \lambda \sum_{C} \chi\big(c_{k,l}\, x_k^{(s)\top} S\, x_l^{(t)}\big) + \|S\|_\Sigma$$
This is a convex function which can be optimized with a gradient descent method.

Choice of Loss Function
We need to decide which functions to use for the losses $l(\cdot)$ and $\chi(\cdot)$. These functions measure, respectively, the margin of the discriminant function $f_S(\cdot)$ on the auxiliary data set and compliance with the observed co-occurrences. We use the well-known logistic loss $l(z) = \log\{1 + \exp(-z)\}$ for the first, and the exponentially decreasing function $\chi(z) = \exp(-z)$ for the second.
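With these loss choices fixed, the data-fitting term and the full regularized objective from the previous slide can be written out. A sketch reusing the toy variables above; gamma and lam stand for the balancing parameters $\gamma$ and $\lambda$, and the default values are assumptions:

```python
def l_loss(z):
    return np.log1p(np.exp(-z))   # logistic loss l(z) = log(1 + exp(-z))

def chi_loss(z):
    return np.exp(-z)             # exponentially decreasing loss chi(z) = exp(-z)

def F(S, gamma=1.0, lam=1.0):
    # Margin term on the auxiliary labeled images.
    aux = sum(l_loss(y_t[i] * f_S(X_t[i], X_s, y_s, S)) for i in range(n_t))
    # Compliance term on the co-occurrence triples.
    co = sum(chi_loss(c * (X_s[k] @ S @ X_co[l])) for k, l, c in C)
    return gamma * aux + lam * co

def objective(S, gamma=1.0, lam=1.0):
    return F(S, gamma, lam) + trace_norm(S)
```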

Optimization Strategy
After performing the aforementioned substitutions, the objective function is non-linear. One possibility for optimizing an objective of this form is semidefinite programming on the dual formulation (Srebro et al.), but that approach does not scale well with the size of the problem. The general approach is instead to adapt a proximal gradient method to minimize such non-linear objective functions with a trace-norm regularizer (Toh & Yun '09).

Proximal Gradient Method
In order to represent the objective function more succinctly, we introduce the function $F(S)$:
$$F(S) = \gamma \sum_{i=1}^{n^{(t)}} l\big(y_i^{(t)} f_S(x_i^{(t)})\big) + \lambda \sum_{C} \chi\big(c_{k,l}\, x_k^{(s)\top} S\, x_l^{(t)}\big)$$
The optimization objective can then be rewritten as $F(S) + \|S\|_\Sigma$. To optimize this objective, the proximal gradient method quadratically approximates it by a Taylor expansion at the current value $S = S_\tau$ with Lipschitz coefficient $\alpha$:
$$Q(S, S_\tau) = F(S_\tau) + \langle \nabla F(S_\tau),\, S - S_\tau \rangle + \frac{\alpha}{2}\|S - S_\tau\|_F^2 + \|S\|_\Sigma$$

Objective Function Gradient
The gradient of the function must be evaluated in order to enable the iterative method. The gradient $\nabla F(S_\tau)$ can be computed as follows:
$$\nabla F(S_\tau) = \gamma \sum_{j=1}^{n^{(s)}} y_j^{(s)} x_j^{(s)} \sum_{i=1}^{n^{(t)}} l'\big(y_i^{(t)} f_{S_\tau}(x_i^{(t)})\big)\, y_i^{(t)} x_i^{(t)\top} + \lambda \sum_{C} \chi'\big(c_{k,l}\, x_k^{(s)\top} S_\tau x_l^{(t)}\big)\, c_{k,l}\, x_k^{(s)} x_l^{(t)\top}$$
Here $l'(z) = -\frac{e^{-z}}{1+e^{-z}}$ and $\chi'(z) = -e^{-z}$ are the derivatives of $l(z)$ and $\chi(z)$ with respect to $z$.
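A literal numpy transcription of this gradient (a sketch; the loss derivatives follow by differentiating the $l$ and $\chi$ chosen earlier, and the toy variables from the earlier sketches are reused):

```python
def dl(z):
    return -np.exp(-z) / (1.0 + np.exp(-z))   # l'(z) for the logistic loss

def dchi(z):
    return -np.exp(-z)                         # chi'(z) for the exponential loss

def grad_F(S, gamma=1.0, lam=1.0):
    v = y_s @ X_s                # sum_j y_j x_j^(s), the fixed source direction
    G = np.zeros_like(S)
    for i in range(n_t):         # auxiliary-margin part of the gradient
        z = y_t[i] * f_S(X_t[i], X_s, y_s, S)
        G += gamma * dl(z) * y_t[i] * np.outer(v, X_t[i])
    for k, l, c in C:            # co-occurrence part of the gradient
        z = c * (X_s[k] @ S @ X_co[l])
        G += lam * dchi(z) * c * np.outer(X_s[k], X_co[l])
    return G
```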

Proximal Gradient Method
$S$ can be updated by iteratively minimizing $Q(S, S_\tau)$ with $S_\tau$ fixed; each such subproblem can be solved by singular value thresholding (Cai, Candès & Shen '08). The convergence of the proximal gradient algorithm can be accelerated by making an initial estimate of $\alpha$ and increasing it by a constant factor $\eta$ (Toh & Yun '09). The approach is continued until $F(S_{\tau+1}) + \|S_{\tau+1}\|_\Sigma \leq Q(S_{\tau+1}, S_\tau)$, at which point we are deemed sufficiently close to an optimal solution and the algorithm terminates.
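The minimizer of $Q(S, S_\tau)$ has a closed form: take a gradient step $S_\tau - \nabla F(S_\tau)/\alpha$ and soft-threshold its singular values by $1/\alpha$ (singular value thresholding). A minimal sketch of the resulting loop follows; the increase of alpha by eta mirrors the slide above, while the initial alpha, eta, iteration cap, and outer stopping tolerance are illustrative assumptions:

```python
def svt(M, thresh):
    # Singular value thresholding: soft-threshold the singular values of M.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt

def proximal_gradient(S0, gamma=1.0, lam=1.0, alpha=1.0, eta=2.0, iters=100):
    S = S0
    for _ in range(iters):
        G = grad_F(S, gamma, lam)
        while True:  # increase alpha by eta until the quadratic bound holds
            S_new = svt(S - G / alpha, 1.0 / alpha)
            Q = (F(S, gamma, lam) + np.sum(G * (S_new - S))
                 + 0.5 * alpha * np.linalg.norm(S_new - S) ** 2
                 + trace_norm(S_new))
            if F(S_new, gamma, lam) + trace_norm(S_new) <= Q + 1e-12:
                break
            alpha *= eta
        if np.linalg.norm(S_new - S) < 1e-6:   # assumed convergence test
            return S_new
        S = S_new
    return S

# e.g. S_hat = proximal_gradient(np.zeros((a, b)))
```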

Experimental Results
We tested the method on a number of real data sets, using Wikipedia and Flickr data for the text and associated images, with class labels serving as query keywords to obtain text and associated images. We compared our technique (TTI) to an SVM-based image classifier (Image only) and to the transfer learning methods TLRisk and HTL.

Ten Auxiliary Training Images (error rates)

Category | Image only | HTL | TLRisk | TTI
birds | 0.2639±0.0012 | 0.2619±0.0015 | 0.2546±0.0018 | 0.252±0.0008
buildings | 0.2856±0.0002 | 0.2707±0.0021 | 0.2555±0.0014 | 0.2303±0.0017
cars | 0.3027±0.0073 | 0.3065±0.0030 | 0.2543±0.0029 | 0.2299±0.0031
cat | 0.2755±0.0043 | 0.2525±0.0038 | 0.2553±0.0028 | 0.2424±0.0026
dog | 0.2252±0.0039 | 0.2343±0.0037 | 0.2545±0.0031 | 0.2162±0.0027
horses | 0.2667±0.0019 | 0.2500±0.0021 | 0.2551±0.0016 | 0.2383±0.0013
mountain | 0.3176±0.0010 | 0.3097±0.0003 | 0.2541±0.0011 | 0.2626±0.0007
plane | 0.2667±0.0009 | 0.2133±0.0008 | 0.2546±0.0005 | 0.2567±0.0012
train | 0.2624±0.0029 | 0.2716±0.0118 | 0.2552±0.0025 | 0.2346±0.0031
waterfall | 0.2611±0.0008 | 0.2435±0.0009 | 0.2555±0.0016 | 0.2546±0.0007

Two Auxiliary Training Images (error rates)

Category | Image only | HTL | TLRisk | TTI
birds | 0.3293±0.0105 | 0.3293±0.0124 | 0.2817±0.0097 | 0.2738±0.0080
buildings | 0.3272±0.0061 | 0.3295±0.0041 | 0.2758±0.0023 | 0.2329±0.0032
cars | 0.2529±0.0059 | 0.2759±0.0048 | 0.2639±0.0032 | 0.1647±0.0058
cat | 0.3333±0.0071 | 0.3333±0.0060 | 0.2480±0.0109 | 0.2525±0.0083
dog | 0.3694±0.0031 | 0.3694±0.0087 | 0.2793±0.0161 | 0.252±0.0092
horses | 0.25±0.0087 | 0.3±0.0050 | 0.2679±0.0069 | 0.2±0.0015
mountain | 0.3311±0.0016 | 0.3322±0.0009 | 0.2817±0.0021 | 0.2699±0.0004
plane | 0.2667±0.0019 | 0.225±0.0006 | 0.2758±0.0006 | 0.2517±0.0011
train | 0.3333±0.0084 | 0.3333±0.0068 | 0.2738±0.0105 | 0.2099±0.0060
waterfall | 0.2693±0.0009 | 0.2694±0.0016 | 0.2659±0.0020 | 0.257±0.0007

Rank of Topic Matrix

Category | Two training examples | Ten training examples
birds | 11 | 9
buildings | 88 | 102
cars | 19 | 3
cat | 18 | 2
dog | 7 | 5
horses | 4 | 1
mountain | 6 | 1
plane | 15 | 25
train | 6 | 3
waterfall | 21 | 26

Sensitivity to Varying Number of Training Images
[Figure: error rate vs. number of auxiliary training examples (1 to 10) for Image Only, HTL, TLRisk, and TTI]

Sensitivity to Number of Co-occurring Examples
[Figure: average error rate vs. number of co-occurring text-image pairs (500 to 2000) for Image Only, HTL, TLRisk, and TTI]

Sensitivity to Parameters
[Figure: error rate vs. λ (0 to 2) for γ = 0.1, 0.5, 1.0, and 2.0]

Conclusions and Summary
We presented a new method for transfer learning between text and web images. It uses co-occurrence data as a bridge for the transfer process, builds a new topic space based on the co-occurrence data, and leverages that topic space for classification. Experimental results show advantages over competing methods.