A Bilinear Model for Sparse Coding


David B. Grimes and Rajesh P. N. Rao
Department of Computer Science and Engineering
University of Washington, Seattle, WA 98195-2350, U.S.A.
{grimes, rao}@cs.washington.edu

Abstract

Recent algorithms for sparse coding and independent component analysis (ICA) have demonstrated how localized features can be learned from natural images. However, these approaches do not take image transformations into account. As a result, they produce image codes that are redundant because the same feature is learned at multiple locations. We describe an algorithm for sparse coding based on a bilinear generative model of images. By explicitly modeling the interaction between image features and their transformations, the bilinear approach helps reduce redundancy in the image code and provides a basis for transformation-invariant vision. We present results demonstrating bilinear sparse coding of natural images. We also explore an extension of the model that can capture spatial relationships between the independent features of an object, thereby providing a new framework for parts-based object recognition.

1 Introduction

Algorithms for redundancy reduction and efficient coding have been the subject of considerable attention in recent years [6, 3, 4, 7, 9, 5, 11]. Although the basic ideas can be traced to the early work of Attneave [1] and Barlow [2], recent techniques such as independent component analysis (ICA) and sparse coding have helped formalize these ideas and have demonstrated the feasibility of efficient coding through redundancy reduction. These techniques produce an efficient code by attempting to minimize the dependencies between elements of the code using appropriate constraints.

One of the most successful applications of ICA and sparse coding has been in the area of image coding. Olshausen and Field showed that sparse coding of natural images produces localized, oriented basis filters that resemble the receptive fields of simple cells in primary visual cortex [6, 7]. Bell and Sejnowski obtained similar results using their algorithm for ICA [3]. However, these approaches do not take image transformations into account. As a result, the same oriented feature is often learned at different locations, yielding a redundant code. Moreover, the presence of the same feature at multiple locations prevents more complex features from being learned and leads to a combinatorial explosion when one attempts to scale the approach to large image patches or hierarchical networks.

In this paper, we propose an approach to sparse coding that explicitly models the interaction between image features and their transformations. A bilinear generative model is used to learn both the independent features in an image as well as their transformations. Our approach extends Tenenbaum and Freeman's work on bilinear models for learning content and style [12] by casting the problem within a probabilistic sparse coding framework. Thus, whereas prior work on bilinear models used global decomposition methods such as SVD, the approach presented here emphasizes the extraction of local features by removing higher-order redundancies through sparseness constraints. We show that for natural images, this approach produces localized, oriented filters that can be translated by different amounts to account for image features at arbitrary locations. Our results demonstrate how an image can be factored into a set of basic local features and their transformations, providing a basis for transformation-invariant vision. We conclude by discussing how the approach can be extended to allow parts-based object recognition, wherein an object is modeled as a collection of local features (or "parts") and their relative transformations.

2 Bilinear Generative Models

We begin by considering the standard linear generative model used in algorithms for ICA and sparse coding [3, 7, 9]:

$$\mathbf{z} = \sum_{i=1}^{m} \mathbf{w}_i x_i \qquad (1)$$

where $\mathbf{z}$ is a $k$-dimensional input vector (e.g. an image), $\mathbf{w}_i$ is a $k$-dimensional basis vector, and $x_i$ is its scalar coefficient. Given the linear generative model above, the goal of ICA is to learn the basis vectors $\mathbf{w}_i$ such that the $x_i$ are as independent as possible, while the goal in sparse coding is to make the distribution of the $x_i$ highly kurtotic given Equation 1.

The linear generative model in Equation 1 can be extended to the bilinear case by using two independent sets of coefficients $x_i$ and $y_j$ (or equivalently, two vectors $\mathbf{x}$ and $\mathbf{y}$) [12]:

$$\mathbf{z} = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbf{w}_{ij} x_i y_j \qquad (2)$$

The coefficients $x_i$ and $y_j$ jointly modulate a set of basis vectors $\mathbf{w}_{ij}$ to produce an input vector $\mathbf{z}$. For the present study, the coefficient $x_i$ can be regarded as encoding the presence of object feature $i$ in the image, while the values $y_j$ determine the transformation present in the image. In the terminology of Tenenbaum and Freeman [12], $\mathbf{x}$ describes the content of the image while $\mathbf{y}$ encodes its style.

Equation 2 can also be expressed as a linear equation in $\mathbf{x}$ for a fixed $\mathbf{y}$:

$$\mathbf{z} = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} \mathbf{w}_{ij} y_j \right) x_i = \sum_{i=1}^{m} \mathbf{w}^y_i x_i \qquad (3)$$

Likewise, for a fixed $\mathbf{x}$, one obtains a linear equation in $\mathbf{y}$. Indeed, this is the definition of bilinear: given one fixed factor, the model is linear with respect to the other factor. The power of bilinear models stems from the rich non-linear interactions that can be represented by varying both $\mathbf{x}$ and $\mathbf{y}$ simultaneously.
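The bilinear reconstruction in Equation 2, and its linear-in-$\mathbf{x}$ form in Equation 3, can be made concrete in a few lines of NumPy. The sketch below is illustrative only; the dimensions and random arrays are placeholders, not values from the paper:

```python
import numpy as np

# Illustrative dimensions: k-pixel patches, m object features,
# n transformation coefficients (all placeholders).
k, m, n = 100, 100, 9

rng = np.random.default_rng(0)
W = rng.standard_normal((k, m, n))  # basis vectors w_ij, one k-vector per (i, j)
x = rng.standard_normal(m)          # feature ("content") coefficients x_i
y = rng.standard_normal(n)          # transformation ("style") coefficients y_j

# Equation 2: z = sum_ij w_ij * x_i * y_j
z = np.einsum('kmn,m,n->k', W, x, y)

# Equation 3: for a fixed y the model is linear in x, with effective
# basis vectors w_i^y = sum_j w_ij * y_j.
W_y = np.einsum('kmn,n->km', W, y)
assert np.allclose(z, W_y @ x)
```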

3 Learning Sparse Bilinear Models

3.1 Learning Bilinear Models

Our goal is to learn from image data an appropriate set of basis vectors $\mathbf{w}_{ij}$ that effectively describe the interactions between the feature vector $\mathbf{x}$ and the transformation vector $\mathbf{y}$. A commonly used approach in unsupervised learning is to minimize the squared pixel-wise reconstruction error, summed over all images:

$$E(\mathbf{w}, \mathbf{x}, \mathbf{y}) = \left\| \mathbf{z} - \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbf{w}_{ij} x_i y_j \right\|^2 \qquad (4)$$

where $\|\cdot\|$ denotes the $L_2$ norm of a vector. A standard approach to minimizing such a function is to use gradient descent, alternating between minimization with respect to $(\mathbf{x}, \mathbf{y})$ and minimization with respect to $\mathbf{w}_{ij}$. Unfortunately, the optimization problem as stated is underconstrained. The function has many local minima, and results from our simulations indicate that convergence is difficult in many cases. There are many different ways to represent an image, making it difficult for the method to converge to a basis set that can generalize effectively.

A related approach is presented by Tenenbaum and Freeman [12]. Rather than using gradient descent, their method estimates the parameters directly by computing the singular value decomposition (SVD) of a matrix containing input data corresponding to each content class in every style. Their approach can be regarded as an extension of methods based on principal component analysis (PCA) applied to the bilinear case. The SVD approach avoids the difficulties of convergence that plague the gradient descent method and is much faster in practice. Unfortunately, the learned features tend to be global and non-localized, similar to those obtained from PCA-based methods based on second-order statistics. As a result, the method is unsuitable for the problem of learning local features of objects and their transformations.

The underconstrained nature of the problem can be remedied by imposing constraints on $\mathbf{x}$ and $\mathbf{y}$. In particular, we can cast the problem within a probabilistic framework and impose specific prior distributions on $\mathbf{x}$ and $\mathbf{y}$ with higher probabilities for values that achieve certain desirable properties. We focus here on the class of sparse prior distributions for several reasons: (a) by forcing most of the coefficients to be zero for any given input, sparse priors minimize redundancy and encourage statistical independence between the various $x_i$ and between the various $y_j$ [7]; (b) there is growing evidence for sparse representations in the brain: the distribution of neural responses in visual cortical areas is highly kurtotic, i.e. a cell exhibits little activity for most inputs but responds vigorously for a few inputs, yielding a distribution with a high peak near zero and long tails; (c) previous approaches based on sparseness constraints have obtained encouraging results [7]; and (d) enforcing sparseness on the $x_i$ encourages parts and local features shared across objects to be learned, while imposing sparseness on the $y_j$ allows object transformations to be explained in terms of a small set of basic transformations.

3.2 Bilinear Sparse Coding

We assume the following priors for $x_i$ and $y_j$:

$$P(x_i) = \frac{1}{Z_x} e^{-\alpha S(x_i)} \qquad (6)$$

$$P(y_j) = \frac{1}{Z_y} e^{-\beta S(y_j)} \qquad (7)$$

where $Z_x$ and $Z_y$ are normalization constants, $\alpha$ and $\beta$ are parameters that control the degree of sparseness, and $S$ is a sparseness function. For this study, we used $S(a) = \log(1 + a^2)$.
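The sparseness function and its derivative are worth writing down explicitly, since the derivative $S'(a) = 2a/(1+a^2)$ reappears in the update rules of the next subsection. The following is a self-contained sketch of our own, not the authors' code:

```python
import numpy as np

def S(a):
    # Sparseness function from the priors (6) and (7): S(a) = log(1 + a^2).
    return np.log1p(np.square(a))

def S_prime(a):
    # Derivative S'(a) = 2a / (1 + a^2), used in Equations 9 and 10.
    return 2.0 * a / (1.0 + np.square(a))

# The penalty is sharply curved near zero and nearly flat for large |a|,
# so minimizing it drives most coefficients toward zero while tolerating
# a few large ones, i.e. the kurtotic, sparse codes described above.
coeffs = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(S(coeffs))        # small near 0, grows only logarithmically with |a|
print(S_prime(coeffs))  # shrinkage force peaks for mid-sized coefficients
```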

Within a probabilistic framework, the squared error function summed over all images can be interpreted as the negative log likelihood of the data given the parameters, $-\log P(\mathbf{z} \mid \mathbf{w}, \mathbf{x}, \mathbf{y})$ (see, for example, [7]). The priors $P(\mathbf{x})$ and $P(\mathbf{y})$ can be used to marginalize this likelihood to obtain the new likelihood function $P(\mathbf{z} \mid \mathbf{w})$. The goal then is to find the basis $\mathbf{w}$ that maximizes $P(\mathbf{z} \mid \mathbf{w})$ or, equivalently, minimizes its negative log. Under certain reasonable assumptions (discussed in [7]), this is equivalent to minimizing the following optimization function over all input images:

$$E_1 = \sum \left[ \left\| \mathbf{z} - \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbf{w}_{ij} x_i y_j \right\|^2 + \alpha \sum_{i} S(x_i) + \beta \sum_{j} S(y_j) \right] \qquad (8)$$

Gradient descent can be used to derive update rules for the components $x_i$ and $y_j$ of the feature vector and transformation vector respectively for any image, assuming a fixed basis $\mathbf{w}_{ij}$:

$$\dot{x}_i \propto (\mathbf{z} - \hat{\mathbf{z}})^{T} \sum_{j=1}^{n} \mathbf{w}_{ij} y_j - \frac{\alpha}{2} S'(x_i) \qquad (9)$$

$$\dot{y}_j \propto (\mathbf{z} - \hat{\mathbf{z}})^{T} \sum_{i=1}^{m} \mathbf{w}_{ij} x_i - \frac{\beta}{2} S'(y_j) \qquad (10)$$

where $\hat{\mathbf{z}} = \sum_{i,j} \mathbf{w}_{ij} x_i y_j$ is the current reconstruction. Given a training set of inputs, the values of $\mathbf{x}$ and $\mathbf{y}$ for each image after convergence can be used to update the basis set in batch mode according to:

$$\Delta \mathbf{w}_{ij} \propto (\mathbf{z} - \hat{\mathbf{z}}) \, x_i y_j \qquad (11)$$

As suggested by Olshausen and Field [7], in order to keep the basis vectors from growing without bound, we adapted the norm of each basis vector in such a way that the variances of the $x_i$ and $y_j$ were maintained at a fixed desired level.
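Equations 8 through 11 suggest an alternating scheme: infer sparse $\mathbf{x}$ and $\mathbf{y}$ for each patch by gradient descent with the basis held fixed, then update the basis on the batch. The sketch below is our own minimal rendering of that scheme; the hyperparameters, initialization, and the unit-norm rescaling are placeholder assumptions, not the paper's settings:

```python
import numpy as np

def S_prime(a):
    return 2.0 * a / (1.0 + np.square(a))

def fit_bilinear_sparse(Z, m, n, alpha=0.1, beta=0.1,
                        lr_xy=0.01, lr_w=0.1, n_iter=200, inner=50):
    """Alternating gradient descent for Equations 8-11 (illustrative).

    Z: (batch, k) array of pre-whitened image patches.
    """
    batch, k = Z.shape
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((k, m, n))
    for _ in range(n_iter):
        # Infer x and y for each patch with the basis W held fixed.
        X = 0.1 * rng.standard_normal((batch, m))
        Y = 0.1 * rng.standard_normal((batch, n))
        for _ in range(inner):
            R = Z - np.einsum('kmn,bm,bn->bk', W, X, Y)  # residual z - z_hat
            gx = 2.0 * np.einsum('bk,kmn,bn->bm', R, W, Y) - alpha * S_prime(X)
            gy = 2.0 * np.einsum('bk,kmn,bm->bn', R, W, X) - beta * S_prime(Y)
            X += lr_xy * gx  # Equation 9
            Y += lr_xy * gy  # Equation 10
        # Equation 11: batch update of the basis from the converged codes.
        R = Z - np.einsum('kmn,bm,bn->bk', W, X, Y)
        W += lr_w * 2.0 * np.einsum('bk,bm,bn->kmn', R, X, Y) / batch
        # Simplification: rescale each w_ij to unit norm so the basis cannot
        # grow without bound (the paper instead adapts the norms so that the
        # coefficient variances stay at a fixed desired level).
        W /= np.maximum(np.linalg.norm(W, axis=0, keepdims=True), 1e-8)
    return W
```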

4 Results

4.1 Training Paradigm

We tested the algorithm for bilinear sparse coding on natural image data. The natural images we used are distributed by Olshausen and Field [7], along with the code for their algorithm. The training set consisted of patches randomly extracted from ten source images. The images were pre-whitened to equalize large variances in frequency and thus speed convergence. We chose to use a complete basis, where the number of features $m$ equals the input dimensionality $k$, and we let $n$ be at least as large as the number of transformations (including the no-transformation case). The sparseness parameters $\alpha$ and $\beta$ were set to fixed values. To assist convergence, all learning occurred in batch mode, with each batch consisting of a fixed number of image patches. The step size for gradient descent using Equation 11 was set to a small constant. The transformations were chosen to be 2D translations in the range $[-3, +3]$ pixels along both axes. The style/content separation was enforced by learning a single $\mathbf{x}$ vector to describe an image patch regardless of its translation and, likewise, a single $\mathbf{y}$ vector to describe a particular style given any image patch content.

4.2 Bilinear Sparse Coding of Natural Images

Figure 1 shows the results of training on natural image data. A comparison between the learned features for the linear generative model (Equation 1) and the bilinear model is provided in Figure 1(a). Although both show simple, localized, and oriented features, the bilinear method is able to model the same features under different transformations. In this case, horizontal translations in the range $[-3, +3]$ pixels were used in training the bilinear model. Figure 1(b) provides an example of how the bilinear sparse coding model encodes a natural image patch and the same patch after it has been translated. Note that both the $\mathbf{x}$ and $\mathbf{y}$ vectors are sparse.

Figure 1: Representing natural images and their transformations with a sparse bilinear model. (a) A comparison of learned features between a standard linear model and a bilinear model, both trained with the same sparseness priors. The two rows for the bilinear case depict the translated object features $\mathbf{w}^y_i$ (see Equation 3) for translations of $-3$ to $+3$ pixels. (b) The representation of an example natural image patch, and of the same patch translated to the left. Note that the bar plot representing the $\mathbf{x}$ vector is indeed sparse, having only three significant coefficients. The code for the style vectors $\mathbf{y}$ for both the canonical patch and the translated one is likewise sparse. The basis images are shown for those dimensions which have non-zero coefficients for $x_i$ or $y_j$.

Figure 2 shows how the model can account for a given localized feature at different locations by varying the $\mathbf{y}$ vector. As shown in the last column of the figure, the translated local feature is generated by linearly combining a sparse set of basis vectors; a small sketch of this construction follows Figure 2.

Figure 2: Translating a learned feature to multiple locations. The two rows of eight images represent the individual basis vectors $\mathbf{w}_{ij}$ for two values of $i$. The $y_j$ values for two selected transformations of each feature are shown as bar plots. $y(a, b)$ denotes a translation of $(a, b)$ pixels in the Cartesian plane. The last column shows the resulting basis vectors after translation.
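The construction in Figure 2, translating one learned feature by choosing a sparse style code, amounts to evaluating Equation 3 for a single feature. A hypothetical sketch (the feature index and style code below are arbitrary illustrations):

```python
import numpy as np

# Hypothetical learned basis: W[:, i, :] holds the n basis images w_ij
# (each flattened to k pixels) associated with feature i.
k, m, n = 100, 100, 9
rng = np.random.default_rng(0)
W = rng.standard_normal((k, m, n))

i = 4                  # arbitrary feature index (illustrative)
y = np.zeros(n)
y[3] = 1.0             # sparse style code selecting one translation

# Equation 3 restricted to feature i: the translated feature is a sparse
# linear combination of that feature's basis vectors.
w_i_translated = W[:, i, :] @ y
```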

4.3 Towards Parts-Based Object Recognition

The bilinear generative model in Equation 2 uses the same set of transformation values $y_j$ for all the features $\mathbf{w}_{ij}$. Such a model is appropriate for global transformations that apply to an entire image region, such as a shift of the whole patch by several pixels or a global illumination change.

Consider the problem of representing an object in terms of its constituent parts. In this case, we would like to be able to transform each part independently of the other parts in order to account for the location, orientation, and size of each part in the object image. The standard bilinear model can be extended to address this need as follows:

$$\mathbf{z} = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbf{w}_{ij} x_i y^i_j \qquad (12)$$

Note that each object feature $i$ now has its own set of transformation values $y^i_j$ for $j = 1, \ldots, n$; the double summation is thus no longer symmetric. Also note that the standard model (Equation 2) is a special case of Equation 12 in which $y^i_j = y_j$ for all $i$.

We have conducted preliminary experiments to test the feasibility of Equation 12 using a set of object features learned for the standard bilinear model. Figure 3 shows the results. These results suggest that allowing independent transformations for the different features provides a rich substrate for modeling images and objects in terms of a set of local features (or parts) and their individual transformations. A code sketch of the extended model appears below.
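As a rough illustration of Equation 12 (again a sketch with placeholder arrays, not the authors' implementation), each feature can be given its own transformation vector by replacing the shared $\mathbf{y}$ with one row per feature:

```python
import numpy as np

k, m, n = 100, 10, 9
rng = np.random.default_rng(0)
W = rng.standard_normal((k, m, n))
x = rng.standard_normal(m)

# Equation 12: each feature i carries its own transformation vector y^i,
# stored here as the rows of an (m, n) matrix.
Y = rng.standard_normal((m, n))
z_parts = np.einsum('kmn,m,mn->k', W, x, Y)

# The standard model (Equation 2) is the special case in which every
# feature shares a single transformation vector y.
y_shared = rng.standard_normal(n)
z_standard = np.einsum('kmn,m,n->k', W, x, y_shared)
assert np.allclose(z_standard,
                   np.einsum('kmn,m,mn->k', W, x, np.tile(y_shared, (m, 1))))
```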

Figure 3: Modeling independently transformed features. (a) The standard bilinear method of generating a translated feature by combining basis vectors using the same set of $y_j$ values for two different features ($i = 57$ and $i = 81$). (b) Four examples of images generated by allowing different $\mathbf{y}^i$ values for the two features. Note the significant differences between the resulting images, which cannot be obtained using the standard bilinear model.

5 Summary and Conclusion

A fundamental problem in vision is to simultaneously recognize objects and their transformations [8, 10]. Bilinear generative models provide a tractable way of addressing this problem by factoring an image into object features and transformations using a bilinear equation. Previous approaches used unconstrained bilinear models and produced global basis vectors for image representation [12]. In contrast, recent research on image coding has stressed the importance of localized, independent features derived from metrics that emphasize the higher-order statistics of inputs [6, 3, 7, 5]. This paper introduces a new probabilistic framework for learning bilinear generative models based on the idea of sparse coding.

Our results demonstrate that bilinear sparse coding of natural images produces localized, oriented basis vectors that can simultaneously represent features in an image and their transformations. We showed how the learned generative model can be used to translate a basis vector to different locations, thereby reducing the need to learn the same basis vector at multiple locations as in traditional sparse coding methods. We also proposed an extension of the bilinear model that allows each feature to be transformed independently of the other features. Our preliminary results suggest that such an approach could provide a flexible platform for adaptive parts-based object recognition, wherein objects are described by a set of independent, shared parts and their transformations. The importance of parts-based methods has long been recognized in object recognition in view of their ability to handle a combinatorially large number of objects by combining parts and their transformations. Few methods, if any, exist for learning representations of object parts and their transformations directly from images. Our ongoing efforts are therefore focused on deriving efficient algorithms for parts-based object recognition based on the combination of bilinear models and sparse coding.

Acknowledgments

This research is supported by NSF grant no. 133592 and a Sloan Research Fellowship to RPNR.

References

[1] F. Attneave. Some informational aspects of visual perception. Psychological Review, 61(3):183–193, 1954.
[2] H. B. Barlow. Possible principles underlying the transformation of sensory messages. In W. A. Rosenblith, editor, Sensory Communication, pages 217–234. Cambridge, MA: MIT Press, 1961.
[3] A. J. Bell and T. J. Sejnowski. The independent components of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.
[4] G. E. Hinton and Z. Ghahramani. Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society B, 352:1177–1190, 1997.
[5] M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.
[6] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[7] B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.
[8] R. P. N. Rao and D. H. Ballard. Development of localized oriented receptive fields by learning a translation-invariant code for natural images. Network: Computation in Neural Systems, 9(2):219–234, 1998.
[9] R. P. N. Rao and D. H. Ballard. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87, 1999.
[10] R. P. N. Rao and D. L. Ruderman. Learning Lie groups for invariant visual perception. In Advances in Neural Information Processing Systems 11, pages 810–816. Cambridge, MA: MIT Press, 1999.
[11] O. Schwartz and E. P. Simoncelli. Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8):819–825, August 2001.
[12] J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000.