arxiv: v1 [cs.cv] 17 Apr 2018


Zero-shot Learning with Complementary Attributes

Xiaofeng Xu 1,2, Ivor W. Tsang 2, Chuancai Liu 1
1 School of Computer Science and Engineering, Nanjing University of Science and Technology
2 Centre for Artificial Intelligence, University of Technology Sydney
xuxf1992@gmail.com, Ivor.Tsang@uts.edu.au, chuancailiu@njust.edu.cn

Abstract
Zero-shot learning (ZSL) aims to recognize unseen objects using disjoint seen objects, with attributes transferring semantic information from training data to testing data. The generalization performance of ZSL is governed by the attributes, which represent the relatedness between the seen classes and the unseen classes. In this paper, we propose a novel ZSL method that uses complementary attributes as a supplement to the original attributes. We first expand the attributes with their complementary forms, and then pre-train classifiers for both the original and the complementary attributes using training data. After ranking classes for each attribute, we use a rank aggregation framework to calculate the optimized rank among testing classes, and the top-ranked class is assigned as the label of the testing sample. We empirically demonstrate that complementary attributes effectively improve ZSL models. Experimental results show that our approach outperforms state-of-the-art methods on standard ZSL datasets.

1 Introduction
With the rapid improvement of computing power and the development of machine learning technology, especially the rise of deep neural networks, visual object recognition has made tremendous progress in recent years [Sabour et al., 2017]. These recognition systems, however, require a massive amount of labelled training data to perform well, and they fail when the testing objects did not appear at training time. Humans, in contrast, can identify a new object from its description even though they have never seen it before [Murphy, 2004].
Taking the above challenge into account, and inspired by the human cognition system, zero-shot learning (ZSL) has recently received increasing attention in the field of computer vision. Zero-shot learning aims to recognize new objects using disjoint training objects, so some high-level semantic properties of objects are required to transfer information between different domains. As a widely employed high-level feature, an attribute is a typical nameable description of objects [Farhadi et al., 2009]. An attribute can be the color or shape of an object, a certain part of it, or a manual description of it. For example, the object elephant has the attributes big and long nose, and the object zebra has the attribute striped. Attributes can take binary values that describe the presence or absence of a property for an object, or continuous values that indicate the correlation between the property and the object.

ZSL usually adopts a two-stage strategy to recognize unseen objects given seen objects. Lampert et al. [2014] proposed an attribute-based model for zero-shot visual object categorization, i.e. direct attribute prediction (DAP). In this two-stage model, attribute classifiers are first pre-trained using seen training data, and then the label of unseen testing data is inferred by combining the outputs of the classifiers via the class-attribute relationship. Attributes play the key role in sharing information between classes and thus govern the performance of ZSL. However, some attributes may be irrelevant to some objects even though they are suitable for describing others. All of the attributes are used to recognize an object, whether or not they are relevant to it: the relevant attributes contribute a lot of information for recognizing the object, while irrelevant attributes contribute little [Fu et al., 2012]. For example, the attribute water can play a critical role in recognizing the object fish, while it is nearly useless for the object bird.
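The two-stage DAP idea described above can be illustrated with a minimal sketch. It assumes binary class-attribute signatures and the product-of-posteriors decision rule usually associated with DAP; the function name `dap_predict`, the toy signatures, and the uniform attribute prior of 0.5 are illustrative assumptions, not details given in this paper.

```python
import numpy as np


def dap_predict(attr_probs, class_attrs, attr_prior=0.5, eps=1e-12):
    """Score unseen classes from attribute classifier outputs (DAP-style sketch).

    attr_probs:  (M,) array, p(a_m = 1 | x) from M pre-trained attribute classifiers.
    class_attrs: (L, M) binary array, attribute signature of each unseen class.
    attr_prior:  assumed prior p(a_m = 1); 0.5 is a placeholder, not a paper value.
    """
    p = np.clip(np.asarray(attr_probs, dtype=float), eps, 1.0 - eps)
    Z = np.asarray(class_attrs, dtype=float)

    # log p(a_m = z_m | x): use p where the class has the attribute, 1 - p otherwise.
    log_post = Z * np.log(p) + (1.0 - Z) * np.log(1.0 - p)
    # log p(a_m = z_m) under the assumed prior.
    log_prior = Z * np.log(attr_prior) + (1.0 - Z) * np.log(1.0 - attr_prior)

    scores = (log_post - log_prior).sum(axis=1)
    return int(np.argmax(scores)), scores


# Hypothetical example: two attributes, two unseen classes.
class_attrs = np.array([[1, 0],   # class 0 has attribute 0 only
                        [0, 1]])  # class 1 has attribute 1 only
label, scores = dap_predict([0.9, 0.2], class_attrs)
print("predicted class index:", label)
```

Because every attribute enters the product with equal standing, an attribute that is irrelevant to a class still affects its score, which is the situation the complementary attributes introduced next are meant to address.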
In order to make every attribute useful for zero-shot learning, whether it is relevant or irrelevant, we introduce complementary attributes as a supplement to the original attributes. Complementary attributes are the opposite form of the original attributes. For example, the complementary attribute of small is not small, and that of have a tail is have no tail. In this way, complementary attributes can contribute a lot to recognizing objects when the original attributes contribute little. Original and complementary attributes together therefore describe objects more comprehensively, which makes the ZSL system more discriminative and robust.

DAP made zero-shot learning a reality, but it suffers from the drawback that it assumes attributes are independent from each other [Akata et al., 2016]. In fact, correlation between attributes is inescapable since attributes are manually created. Taking this drawback into account and combining it with complementary attributes, we present a novel ZSL model, illustrated in Figure 1. First, complementary attributes are introduced and a similarity matrix is calculated to capture the correlation between attributes and objects. After ranking testing classes for each attribute, we apply the rank aggregation framework to calculate the optimized rank among testing classes. Finally, the top-ranked class of the optimized rank is assigned as the label of the testing sample. The proposed approach has been evaluated on several standard ZSL datasets, and experimental results show that it outperforms state-of-the-art methods.

[Figure 1 panels. Training: attributes (small, have a tail, agile, smelly), complementary attributes (not small, have no tail, not agile, not smelly), similarities, training data (seen), attribute classifiers, ranks per attribute. Testing: test data (unseen), probability outputs, rank aggregation framework, ranks for test samples.]
Figure 1: The proposed approach for zero-shot learning using complementary attributes and the rank aggregation framework. In the training stage (left), classifiers of both attributes and complementary attributes are trained using seen training data, and testing classes are ranked per attribute. In the testing stage (right), the label of a testing sample is predicted by the rank aggregation framework, which combines the ranks per attribute with the probability outputs of the attribute classifiers.

We summarize our main contributions as follows:
1) We introduce complementary attributes as a supplement to the original attributes;
2) We use a rank aggregation framework to fuse the ranks per attribute;
3) We conduct experiments on three ZSL datasets, demonstrating that the proposed approach is effective for zero-shot learning.

The rest of the paper is organized as follows. We review related work in Section 2, introduce our approach in Section 3, present experimental results and analysis in Section 4, and conclude in Section 5.

2 Related Work
Zero-shot learning is a novel but practical task in computer vision and it has attracted increasing interest in recent years.
ZSL recognizes new objects by transferring information between seen classes and unseen classes through shared high-level semantic properties. We briefly review related work on semantic representation and zero-shot learning below.

2.1 Semantic Representation
A semantic representation is a description or property of objects that can be used to share knowledge between training data and testing data for ZSL. As prior information, semantic representations can be derived from various sources. The attribute is a popular visual semantic representation of objects. It can be the appearance, a part, or a property of objects. Attributes have been used as an intermediate layer between images and classes in various ZSL models. Lampert et al. [2014] proposed DAP, which categorizes zero-shot visual objects using the attribute-label relationship as auxiliary information. Akata et al. [2016] proposed attribute label embedding (ALE), which learns a compatibility function mapping image features to an attribute-based label embedding. Instead of using attributes with definite values, Parikh and Grauman [2011] presented relative attributes to describe objects. They proposed a model in which the supervisor relates the unseen objects to previously seen objects. Relative attributes allow for a richer language of supervision and description, for example: lions are larger than dogs, as large as tigers, but smaller than elephants.

In addition to attributes, alternative sources of auxiliary information can be leveraged for ZSL. The semantic correlation between high-level properties and object descriptions, calculated by the word2vec model [Mikolov et al., 2013], is an effective semantic feature that has recently been widely used in ZSL [Fu et al., 2015b]. Knowledge mined from the Web [Mensink et al., 2014], semantic class taxonomies [Mensink et al., 2012], and class similarities [Yu et al., 2013] are other high-level features of prior information used for ZSL.
2.2 Zero-shot Learning
Zero-shot learning recognizes new objects by transferring knowledge from seen classes to unseen classes. ZSL has attracted increasing attention and many ZSL methods have been proposed in recent years.

Some researchers adopt a two-stage strategy to transfer information. Lampert et al. [2014] proposed two popular benchmark models, i.e. DAP and indirect attribute prediction (IAP). DAP learns probabilistic attribute classifiers using the seen data and infers the labels of unseen data by combining the results of the pre-trained classifiers. IAP induces a distribution over the labels of testing data through the posterior distribution of the seen data by means of the class-attribute relationship. Similar to IAP, Norouzi et al. [2014] proposed a method using a convex combination of semantic embeddings with no additional training.

An increasing number of methods directly learn functions that map input image features to a semantic embedding space. Some researchers learned linear compatibility functions. Akata et al. [2016] presented an attribute label embedding model which learns a compatibility function using a ranking loss. Frome et al. [2013] presented a deep visual-semantic embedding model using semantic information gleaned from unannotated text. Akata et al. [2015] presented a structured joint embedding framework that recognizes images by finding the label with the highest joint compatibility score. Romera-Paredes and Torr [2015] proposed an approach which models the relationships among features, attributes and classes as a network with two linear layers. Other researchers learned nonlinear compatibility functions. Xian et al. [2016] presented a nonlinear embedding model that augments the bilinear compatibility model by incorporating latent variables.

3 Approach
We consider zero-shot learning to be the task of classifying images from unseen classes after training on samples from seen classes. Given a training set $\mathcal{S} = \{(x_n, y_n), n = 1, \dots, N_{tr}\}$ of input/output pairs with $x_n \in \mathcal{X}$ and $y_n \in \mathcal{Y}$, our goal is to learn a nontrivial classifier $f: \mathcal{X} \to \mathcal{Z}$ by transferring knowledge between $\mathcal{Y}$ and $\mathcal{Z}$ through attributes, where the training classes $\mathcal{Y}$ (seen classes) and the testing classes $\mathcal{Z}$ (unseen classes) are disjoint, i.e. $\mathcal{Y} \cap \mathcal{Z} = \emptyset$. In the proposed approach, we first expand the attributes with complementary attributes and calculate the similarity matrix between classes and attributes. After calculating weights using the pre-trained attribute classifiers, the rank aggregation framework is used to calculate the proximal rank among testing classes for each testing sample.
The top-ranked class is assigned as the label of the testing sample.

3.1 Complementary Attributes
Attributes, as semantic properties of visual objects, are used to solve zero-shot learning problems in many works [Fu et al., 2015a; Al-Halah et al., 2016; Wang et al., 2017]. The value of an attribute indicates the presence or absence of a semantic property for a specific object. For example, the attribute wing is present for the object bird and absent for the object fish. Attributes of ZSL datasets are designed to describe objects as clearly as possible. However, some attributes have little correlation with some objects even though they are well suited to describing others. For example, the attribute small is effective for recognizing the object rat, while it has little similarity with the object elephant. We therefore expand each attribute with a complementary attribute, which can be considered a supplement to the original attribute that describes objects more comprehensively. For example, we add not small as the complementary attribute of small; the complementary attribute not small is clearly more relevant to the object elephant than the original attribute small. Hence a complementary attribute (like not small) can contribute more information for recognizing a zero-shot object (like elephant) when the original attribute (like small) contributes little.

Attributes can be specified either on a per-class level or on a per-image level, and they can be divided into binary-valued and continuous-valued attributes according to the value type. In this work, per-class continuous-valued attributes are adopted for ZSL, since they require less human effort to collect than per-image attributes. Suppose we have $K$ seen classes for training and $L$ unseen classes for testing. Each class has an attribute vector $a_j = [a_j^1, \dots, a_j^i, \dots, a_j^m]^T$, where $a_j^i$ denotes the value of the $i$-th attribute for class $j$.
The attribute matrix $A$ consists of the attribute vectors of all classes, including the training classes $\mathcal{Y}$ and the testing classes $\mathcal{Z}$:

$$A = [a_{\mathcal{Y}}, a_{\mathcal{Z}}] = [a_{Y_1}, \dots, a_{Y_K}, a_{Z_1}, \dots, a_{Z_L}] \tag{1}$$

The similarity matrix $S$, which indicates the correlation between classes and attributes, can be calculated from the attribute matrix $A$ as follows:

$$S = [s_{Y_1}, \dots, s_{Y_K}, s_{Z_1}, \dots, s_{Z_L}] \tag{2}$$

where $s_j$, the $j$-th column of $S$, is computed as

$$s_j = \frac{a_j}{\|a_j\|_2^2} \tag{3}$$

and $s_{ij}$ in matrix $S$ denotes the similarity between the $i$-th attribute and class $j$. After obtaining the attribute similarity matrix $S$, we can easily get the complementary attribute similarity matrix $\bar{S} = \mathbf{1} - S$, where $\mathbf{1}$ is the matrix of all ones (with the same shape as $S$). We then expand the original attribute similarity matrix $S$ with the complementary attribute similarity matrix $\bar{S}$:

$$\tilde{S} = \begin{bmatrix} S \\ \bar{S} \end{bmatrix} \tag{4}$$

3.2 ZSL using Rank Aggregation
Suppose we have $M$ attributes, including original attributes and complementary attributes, and each attribute classifier is trained independently on the training data. At testing time, each trained attribute classifier provides a probability estimate $p(a_m \mid x)$ for the testing sample $x$. From the expanded similarity matrix $\tilde{S}$, $M$ orderings among testing classes can be induced, one per attribute, and the ordering $r^m$ corresponding to the $m$-th attribute can be expressed as follows:

$$\text{class } i \text{ ranked above class } j \iff r_i^m > r_j^m \tag{5}$$

where $r_i^m$ and $r_j^m$ are equal to the entries $\tilde{s}_{mi}$ and $\tilde{s}_{mj}$ of $\tilde{S}$, respectively. Different attributes usually have distinct similarities that induce different orderings among testing classes. So we need to recover a proximal ordering which has the least empirical error according to the $M$ orderings for each testing sample. A straightforward strategy is to use the weighted average of the similarity orderings:

$$r = \sum_{m=1}^{M} p(a_m \mid x) \, r^m \tag{6}$$

This strategy works reasonably well, but it suffers from some drawbacks. First, similarities calculated from manually assigned continuous attribute values may not be smooth, since the attribute values may contain gross outliers. Second, similarities may not be on the same scale, so they cannot simply be averaged. We therefore adopt the rank aggregation framework [Gleich and Lim, 2011] to calculate the proximal rank. For each attribute, the ordering $r^m \in \mathbb{R}^L$ among testing classes can be converted to a pairwise comparison matrix $C^m \in \mathbb{R}^{L \times L}$ as follows:

$$C^m = r^m \mathbf{1}^T - \mathbf{1} (r^m)^T \tag{7}$$

where $\mathbf{1}$ is the vector of all ones (with the same shape as $r^m$). Clearly, $C_{ij}^m = r_i^m - r_j^m$, and $C_{ij}^m > 0$ means that testing class $i$ has greater similarity with the $m$-th attribute than testing class $j$. After obtaining the $M$ comparison matrices $C^1, \dots, C^M$, we want to find the proximal matrix $C$ as their consensus. Using the probability estimate $p(a_m \mid x)$ as the weight $w_m$ of the $m$-th attribute for testing sample $x$, the proximal matrix $C$ can be found by solving the following problem:

$$\min_{C, E^m} \sum_{m=1}^{M} w_m \left\| C - C^m + E^m \right\|_2^2 + \gamma \left\| E^m \right\|_1 \tag{8}$$

where $\|\cdot\|_2$ and $\|\cdot\|_1$ are the $\ell_2$ and $\ell_1$ norms, respectively. The error matrix $E^m$ is introduced to offset possible errors produced by the attribute similarities and is penalized by the $\ell_1$ norm to keep it sparse.

[Table 1 columns: Dataset; # of Attributes; Classes (# of total, # of seen, # of unseen); Images (# of total, # of seen, # of unseen). Rows: AwA, aPY, CUB.]
Table 1: Statistics of the datasets AwA, aPY and CUB. Seen classes and seen images are used as training data; unseen classes and unseen images are used as testing data.
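To make Equations (3), (4), (6) and (7) concrete, the following NumPy sketch runs them on a toy example. It is an illustration under stated assumptions rather than the authors' code: the function names, the 2 x 2 attribute matrix, and the classifier outputs are hypothetical, the matrix holds only unseen-class columns, and every class is assumed to have at least one nonzero attribute value so that the normalization is defined.

```python
import numpy as np


def similarity_matrix(A):
    """Eq. (3): divide each class column a_j by its squared l2 norm."""
    A = np.asarray(A, dtype=float)
    return A / np.sum(A ** 2, axis=0, keepdims=True)


def expand_with_complements(S):
    """Eq. (4): stack S on top of its complement 1 - S."""
    S = np.asarray(S, dtype=float)
    return np.vstack([S, 1.0 - S])


def comparison_matrix(r):
    """Eq. (7): C = r 1^T - 1 r^T, i.e. C[i, j] = r[i] - r[j]."""
    r = np.asarray(r, dtype=float)
    return r[:, None] - r[None, :]


def weighted_average_rank(S_expanded, p):
    """Eq. (6): sum over attributes of p(a_m | x) * r^m, where r^m is row m."""
    return np.asarray(p, dtype=float) @ np.asarray(S_expanded, dtype=float)


# Hypothetical attribute matrix: 2 attributes (rows) x 2 unseen classes (columns).
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
S = similarity_matrix(A)
S_expanded = expand_with_complements(S)  # 4 rows: 2 original + 2 complementary

# Hypothetical classifier outputs for one test sample, ordered like the rows above.
p = np.array([0.9, 0.1, 0.1, 0.9])
r = weighted_average_rank(S_expanded, p)

print("expanded similarities:\n", S_expanded)
print("comparison matrix of attribute 0:\n", comparison_matrix(S_expanded[0]))
print("predicted class index:", int(np.argmax(r)))
```

Note that the complementary rows give the second class a nonzero score on the first attribute's complement, which is how an attribute that says little about a class in its original form can still contribute to its ranking.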
$E^m$ can be eliminated from Equation (8) by introducing the Huber loss function, which is robust to outliers [Chang et al., 2015]:

$$\min_{C} \sum_{m=1}^{M} w_m H_m(C - C^m) \tag{9}$$

where $H_m(X)$ is the Huber loss function [Huber, 1964]. The problem in Equation (9) is convex and can be solved using the gradient descent algorithm [Burges et al., 2005]. The matrix $C$, the solution to this problem, gives the pairwise comparison between testing classes for each sample. We can therefore recover the proximal rank $r$ among testing classes by inverting Equation (7):

$$r = \arg\min_{r} \left\| r \mathbf{1}^T - \mathbf{1} r^T - C \right\|_2^2 \tag{10}$$

The top-ranked class in $r$ is assigned as the label of the testing sample.

Suppose that there are $n$ testing samples belonging to $L$ classes, and that the number of attributes, original and complementary, is $M$. The testing-time complexity is $O(nML^2)$, where $O(ML^2)$ is the complexity of computing Equation (9). This complexity is acceptable since $M$ and $L$ are fixed and usually small. Combining complementary attributes with the rank aggregation framework, the procedure of the proposed approach is given in Algorithm 1.

Algorithm 1 ZSL with complementary attributes
Input: Training samples and testing samples; correlation matrix $A$ between attributes and classes.
Output: Labels of testing samples.
1: Calculate the similarity between attributes and classes using Equation (2);
2: Expand the attribute similarity matrix with complementary attributes using Equation (4);
3: Get the probability estimate for each testing sample using the pre-trained attribute classifiers;
4: Rank the testing classes for each attribute according to the similarity matrix (Equation (5));
5: Calculate the optimized rank using the rank aggregation framework (Equation (9)), and recover the top-ranked class as the label of the testing sample (Equation (10)).

4 Experiments
We evaluated the proposed approach and compared it to several popular ZSL methods on three benchmark datasets, i.e. AwA, CUB and aPY.
In this section, we first introduce the experimental setup and then compare the proposed approach with DAP on the AwA dataset. Finally, we compare our method with several state-of-the-art methods and analyze the results.
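As a companion to Algorithm 1 in Section 3.2, the rank aggregation step (Equations (9) and (10)) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a single Huber threshold `delta` is shared by all attributes, the step size and iteration count are arbitrary choices, all names are hypothetical, and the matrix norm in Equation (10) is read as the Frobenius norm, which yields a closed-form rank recovery.

```python
import numpy as np


def aggregate_ranks(R, w, delta=1.0, n_iter=200):
    """Aggregate per-attribute class scores with a Huber-loss consensus.

    R: (M, L) array, row m is the ordering r^m over the L testing classes.
    w: (M,) positive weights, e.g. classifier outputs p(a_m | x).
    Returns the recovered rank vector r (length L) and the consensus matrix C.
    """
    R = np.asarray(R, dtype=float)
    w = np.asarray(w, dtype=float)
    n_classes = R.shape[1]

    # Eq. (7) for every attribute at once: Cs[m, i, j] = r^m_i - r^m_j.
    Cs = R[:, :, None] - R[:, None, :]

    # Start from the weighted mean of the comparison matrices.
    C = np.tensordot(w, Cs, axes=1) / w.sum()

    # The Huber gradient is 1-Lipschitz, so 1 / sum(w) is a safe step size.
    step = 1.0 / w.sum()
    for _ in range(n_iter):
        residual = C[None, :, :] - Cs
        # Huber derivative: identity inside [-delta, delta], clipped outside.
        grad = np.tensordot(w, np.clip(residual, -delta, delta), axes=1)
        C = C - step * grad

    # Least-squares solution of Eq. (10), with the free offset fixed by sum(r) = 0.
    r = (C - C.T).sum(axis=1) / (2.0 * n_classes)
    return r, C


# Three attributes rank class 0 above class 1; the fourth is a gross outlier.
R = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [-10.0, 0.0]])
w = np.ones(4)
r_huber, _ = aggregate_ranks(R, w, delta=0.5)
r_mean = w @ R  # plain weighted sum, Eq. (6)
print("Huber consensus picks class", int(np.argmax(r_huber)))
print("Weighted average picks class", int(np.argmax(r_mean)))
```

In this toy case the single outlier is enough to flip the weighted average of Equation (6), while the Huber consensus sides with the three agreeing attributes, which is the robustness argument made for the rank aggregation framework.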

[Table 2 columns: DAP, DAP+C, DAP+R, DAP+C+R. Rows: Traditional Features, Deep Features.]
Table 2: Zero-shot learning results in terms of mean class accuracy (in %) on dataset AwA using traditional features and deep features. DAP denotes direct attribute prediction; DAP+C denotes DAP with complementary attributes; DAP+R denotes DAP with rank aggregation; and DAP+C+R denotes DAP with both complementary attributes and rank aggregation.

[Figure 2 axis labels, both panels: persian cat, hippopotamus, leopard, humpback whale, seal, chimpanzee, rat, giant panda, pig, raccoon.]
Figure 2: Confusion matrices (in %) between the 10 testing classes on dataset AwA using ResNet features. (a) DAP; (b) the proposed approach.

4.1 Experimental Setup
Datasets. We conducted experiments on three standard ZSL datasets: (1) Animals with Attributes (AwA) [Lampert et al., 2014], (2) attribute-Pascal-Yahoo (aPY) [Farhadi et al., 2009], and (3) Caltech-UCSD Birds (CUB) [Welinder et al., 2010]. The overall statistics of these datasets are summarized in Table 1.

Image Features. Deep neural network features are extracted for the experiments. Image features are extracted from the entire image for AwA and CUB, and from the bounding boxes mentioned in [Farhadi et al., 2009] for aPY. The original ResNet-101 [He et al., 2016], pre-trained on ImageNet with 1K classes, is used to compute the 2048-dimensional top-layer pooling units as image features.

Semantic Representation. Attributes are used as the semantic representation to transfer information between classes. We use 85-, 64- and 312-dimensional continuous-valued attributes for the datasets AwA, aPY and CUB, respectively.
Other kinds of prior information can serve as alternative semantic auxiliary information, as long as they can be formalized as a relationship matrix (Equation (2)).

Evaluation protocols. The unified dataset splits shown in Table 1 are used for each method to obtain fair comparison results. Mean class accuracy, i.e. per-class averaged top-1 accuracy, is used as the evaluation criterion since the datasets are not well balanced with respect to the number of images per class.

4.2 Evaluation of Complementary Attributes and Rank Aggregation
We compared the proposed approach with the DAP baseline using both traditional features (the same as used in [Lampert et al., 2014]) and deep features (i.e. ResNet features) on dataset AwA. Our primary contributions are introducing complementary attributes and using the rank aggregation framework to overcome the drawbacks of DAP. We therefore evaluate four methods on the same experimental setup: DAP, DAP improved by complementary attributes, DAP improved by rank aggregation, and DAP improved by both complementary attributes and rank aggregation (i.e. the proposed approach). The classification results of these four methods are shown in Table 2. Both strategies, i.e. complementary attributes and rank aggregation, improve on DAP, and the proposed method, which uses both, improves the baseline by a large margin.

Complementary attributes, as a supplement to the original attributes, describe objects more comprehensively and therefore make the semantic embedding space more complete. The experimental results show that adding complementary attributes to DAP improves the mean class accuracy by 1.32 and 4.94 percentage points on traditional features and deep features, respectively. The other strategy, the rank aggregation framework, overcomes the loose assumption of DAP that attributes are independent from each other. It improves DAP by 1.01 and 1.99 percentage points on traditional features and deep features, respectively. When both complementary attributes and the rank aggregation framework are used, the improvements of the proposed approach reach 3.71 and 9.57 percentage points on traditional features and deep features, respectively.

[Table 3 columns: Method, AwA, aPY, CUB. Rows: DAP [Lampert et al., 2014]; IAP [Lampert et al., 2014]; CONSE [Norouzi et al., 2014]; CMT [Socher et al., 2013]; ESZSL [Romera-Paredes and Torr, 2015]; ALE [Akata et al., 2016]; SJE [Akata et al., 2015]; COSTA [Mensink et al., 2014]; Our approach.]
Table 3: Detailed zero-shot object classification results in terms of mean class accuracy (in %) on the datasets AwA, aPY and CUB using ResNet features. Within each column, bold font indicates the best result and italics indicate the second best.

In order to analyze the robustness of the proposed method compared with DAP, the confusion matrices of our approach and of DAP on dataset AwA using ResNet features are shown in Figure 2. The numbers on the diagonal (yellow patches) of the confusion matrices indicate the classification accuracy per class. Our method performs better on most of the testing classes, and its accuracy nearly doubles that of DAP on some testing classes, such as Persian cat, seal and raccoon. More importantly, the confusion matrix of our method contains less noise (i.e.
smaller numbers in the off-diagonal regions (white patches) of the confusion matrices) than DAP's, which suggests that our method has less prediction uncertainty. In other words, our approach is more robust than DAP on zero-shot learning.

4.3 Comparison with the State-of-the-art
We compared the proposed approach with the DAP baseline and several state-of-the-art ZSL methods mentioned in the Introduction and Related Work sections. Experiments are conducted on three standard ZSL datasets (i.e. AwA, aPY and CUB) using deep features, and the results are reported in Table 3. The same dataset splits and image features are used to ensure comparability and ease of analysis.

Our approach outperforms the other methods on all three datasets. On dataset AwA, our approach achieves 74.0% accuracy, which is nearly 10 percentage points higher than DAP (64.4%) and 7.7 percentage points higher than the second best method, SJE (66.3%). Our approach also performs better than the others on both aPY and CUB. Since the number of testing images per class in AwA (about 618) is much greater than in aPY (about 220) and CUB (about 58), the results on AwA carry less uncertainty and are more credible than those on the other two datasets. The improvement of the proposed approach on AwA and on the other two datasets shows that the complementary attributes and the rank aggregation framework we presented are effective in improving the performance of zero-shot object classification.

5 Conclusion
We have developed a novel approach to zero-shot learning by introducing complementary attributes and adopting a rank aggregation framework. A complementary attribute can be considered a supplement to the original attribute that helps describe objects more comprehensively, and hence makes the presented method more discriminative for zero-shot object classification.
The rank aggregation framework is a more accurate model for fusing the predictions of the attribute classifiers with the similarities between attributes and classes. The proposed approach has been evaluated on several standard ZSL datasets, and experimental results show that complementary attributes are effective in improving the performance of current ZSL models. Our approach outperforms state-of-the-art methods on all of the datasets used in the experiments.

The attribute is a representative high-level semantic property of objects, and the complementary attribute is introduced as its supplement in this work. Other auxiliary information that can be formalized as a relationship matrix can easily be added to the presented rank aggregation model for zero-shot object classification. As future work, we plan to fuse multi-source auxiliary information to increase the discriminative power of current zero-shot learning models.

References
[Akata et al., 2015] Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee, and Bernt Schiele. Evaluation of output embeddings for fine-grained image classification. In CVPR, 2015.
[Akata et al., 2016] Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. Label-embedding for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7), 2016.
[Al-Halah et al., 2016] Ziad Al-Halah, Makarand Tapaswi, and Rainer Stiefelhagen. Recovering the missing link: Predicting class-attribute associations for unsupervised zero-shot learning. In CVPR, 2016.
[Burges et al., 2005] Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. Learning to rank using gradient descent. In ICML. ACM, 2005.
[Chang et al., 2015] Xiaojun Chang, Yi Yang, Alexander G Hauptmann, Eric P Xing, and Yaoliang Yu. Semantic concept discovery for large-scale zero-shot event detection. In IJCAI, 2015.
[Farhadi et al., 2009] Ali Farhadi, Ian Endres, Derek Hoiem, and David Forsyth. Describing objects by their attributes. In CVPR, 2009.
[Frome et al., 2013] Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Tomas Mikolov, et al. DeViSE: A deep visual-semantic embedding model. In NIPS, 2013.
[Fu et al., 2012] Yanwei Fu, Timothy M Hospedales, Tao Xiang, and Shaogang Gong. Attribute learning for understanding unstructured social activity. In ECCV, 2012.
[Fu et al., 2015a] Yanwei Fu, Timothy M Hospedales, Tao Xiang, and Shaogang Gong. Transductive multi-view zero-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(11), 2015.
[Fu et al., 2015b] Zhenyong Fu, Tao Xiang, Elyor Kodirov, and Shaogang Gong. Zero-shot object recognition by semantic manifold distance. In CVPR, 2015.
[Gleich and Lim, 2011] David F Gleich and Lek-Heng Lim. Rank aggregation via nuclear norm minimization. In ACM SIGKDD, pages 60-68, 2011.
[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[Huber, 1964] Peter J Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 1964.
[Lampert et al., 2014] Christoph H Lampert, Hannes Nickisch, and Stefan Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(3), 2014.
[Mensink et al., 2012] Thomas Mensink, Jakob Verbeek, Florent Perronnin, and Gabriela Csurka. Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In Computer Vision - ECCV 2012, 2012.
[Mensink et al., 2014] Thomas Mensink, Efstratios Gavves, and Cees GM Snoek. COSTA: Co-occurrence statistics for zero-shot classification. In CVPR, 2014.
[Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
[Murphy, 2004] Gregory Murphy. The Big Book of Concepts. MIT Press, 2004.
[Norouzi et al., 2014] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S Corrado, and Jeffrey Dean. Zero-shot learning by convex combination of semantic embeddings. In ICLR, 2014.
[Parikh and Grauman, 2011] Devi Parikh and Kristen Grauman. Relative attributes. In ICCV, 2011.
[Romera-Paredes and Torr, 2015] Bernardino Romera-Paredes and Philip Torr. An embarrassingly simple approach to zero-shot learning. In ICML, 2015.
[Sabour et al., 2017] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. In NIPS, 2017.
[Socher et al., 2013] Richard Socher, Milind Ganjoo, Christopher D Manning, and Andrew Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013.
[Wang et al., 2017] Yaqing Wang, James T Kwok, Quanming Yao, and Lionel M Ni. Zero-shot learning with a partial set of observed attributes. In IJCNN, 2017.
[Welinder et al., 2010] Peter Welinder, Steve Branson, Takeshi Mita, Catherine Wah, Florian Schroff, Serge Belongie, and Pietro Perona. Caltech-UCSD Birds, 2010.
[Xian et al., 2016] Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein, and Bernt Schiele. Latent embeddings for zero-shot classification. In CVPR, pages 69-77, 2016.
[Yu et al., 2013] Felix X Yu, Liangliang Cao, Rogerio S Feris, John R Smith, and Shih-Fu Chang. Designing category-level attributes for discriminative visual recognition. In CVPR, 2013.
