arxiv: v2 [cs.cv] 14 Apr 2014


Deep Convolutional Ranking for Multilabel Image Annotation

Yunchao Gong (UNC Chapel Hill), Alexander Toshev (Google Research), Yangqing Jia (Google Research), Thomas K. Leung (Google Research), Sergey Ioffe (Google Research)

Abstract

Multilabel image annotation is one of the most important challenges in computer vision, with many real-world applications. While existing work usually uses conventional visual features for multilabel annotation, features based on deep neural networks have shown potential to significantly boost performance. In this work, we propose to leverage the advantage of such features and analyze the key components that lead to better performance. Specifically, we show that a significant performance gain can be obtained by combining convolutional architectures with approximate top-k ranking objectives, as they naturally fit the multilabel tagging problem. In our experiments on the NUS-WIDE dataset, this approach outperforms conventional visual features by about 10%, obtaining the best reported performance in the literature.

1 Introduction

Multilabel image annotation [25, 14] is an important and challenging problem in computer vision. Most existing work focuses on single-label classification problems [6, 21], where each image is assumed to have only one class label. However, this is not necessarily true for real-world applications, as an image may be associated with multiple semantic tags (Figure 1). As a practical example, images from Flickr are usually accompanied by several tags that describe the content of the image, such as objects, activities, and scene descriptions. Images on the Internet, in general, are usually associated with sentences or descriptions instead of a single class label, which may be deemed a type of multitagging. Therefore, it is a practical and important problem to accurately assign multiple labels to one image.

Single-label image classification has been extensively studied in the vision community, with the most recent advances reported on the large-scale ImageNet database [6].
Most existing work focuses on designing visual features for improving recognition accuracy. For example, sparse coding [33, 36], Fisher vectors [28], and VLAD [18] have been proposed to reduce the quantization error of bag-of-words-type features. Spatial pyramid matching [21] has been developed to encode spatial information for recognition. Very recently, deep convolutional neural networks (CNNs) have demonstrated promising results for single-label image classification [20]. Such algorithms have all focused on learning a better feature representation for one-vs-rest classification problems, and it is not yet clear how to best train an architecture for multilabel annotation problems.

In this work, we are interested in leveraging the highly expressive convolutional network for the problem of multilabel image annotation. We employed a network structure similar to that of [20], which contains several convolutional and densely connected layers, as the basic architecture. We studied

and compared several other popular multilabel losses, such as the ranking loss [19] that optimizes the area under the ROC curve (AUC), and the cross-entropy loss used in TagProp [14]. Specifically, we propose to train the network with a top-k ranking loss inspired by the embedding approach of [34]. Using the largest publicly available multilabel dataset, NUS-WIDE [4], we observe a significant performance boost over conventional features, reporting the best retrieval performance on the benchmark dataset in the literature.

Figure 1: Sample images from the NUS-WIDE dataset, where each image is annotated with several tags. Example tag sets: (green, flower sun, flowers, zoo, day, sunny, sunshine); (london, traffic, raw); (art, girl, woman, wow, dance, jump, dancing).

1.1 Previous Work

In this section we first review related work on multilabel image annotation and then briefly discuss work on deep convolutional networks.

Modeling Internet images and their corresponding textual information (e.g., sentences, tags) has been of great interest in the vision community [2, 10, 12, 15, 30, 29, 34]. In this work, we focus on the image annotation problem and summarize several important lines of related research. Early work in this area was mostly devoted to annotation models inspired by machine translation techniques [1, 8]. The work by Barnard et al. [1, 8] applied machine translation methods to parse natural images and tried to establish a relationship between image regions and words.

More recently, image annotation has been formulated as a classification problem. Early works focused on generative-model-based tagging [1, 3, 26], centered upon the idea of learning a parametric model to perform predictions. However, because image annotation is a highly nonlinear problem, a parametric model might not be sufficient to capture the complex distribution of the data. Several recent works on image tagging have mostly focused on nonparametric nearest-neighbor methods, which offer higher expressive power. The work by Makadia et al.
[25], which proposed a simple nearest-neighbor-based tag transfer approach, achieved significant improvement over previous model-based methods. Recent improvements on the nonparametric approach include TagProp [14], which learns a discriminative metric for nearest neighbors to improve tagging.

Convolutional neural networks (CNNs) [20, 22, 23, 17, 5] are a special type of neural network that utilizes specific network structures, such as convolutions and spatial pooling, and have exhibited good generalization power in image-related applications. Combined with recent techniques such as Dropout and fast parallel training, CNN models have outperformed existing handcrafted features. Krizhevsky et al. [20] reported record-breaking results on ILSVRC 2012, which contains 1000 visual object categories. However, this study was mostly concerned with single-label image classification, and the images in the dataset contain only one prominent object class. At a finer scale, several methods focus on improving specific network designs. Notably, Zeiler et al. [37] investigated different pooling methods for training CNNs, and several regularization methods, such as Dropout [16], DropConnect [32], and Maxout [13], have been proposed to improve the robustness and representation power of the networks. In addition, earlier studies [7] have shown that CNN features are suitable as a general feature for various tasks under conventional classification schemes. Our work instead focuses on how to directly train a deep network from raw pixels, using a multilabel ranking loss, to address the multilabel annotation problem.

2 Multilabel Deep Convolutional Ranking Net

In our approach to multilabel image annotation, we adopted the architecture proposed in [20] as our basic framework and mainly focused on training the network with loss functions tailored for multilabel prediction tasks.

2.1 Network Architecture

The basic architecture of the network we use is similar to the one used in [20]. We use five convolutional layers and three densely connected layers. Before feeding the images to the convolutional layers, each image is resized to […]. Next, […] patches are extracted from the whole image, at the center and the four corners, to provide an augmentation of the dataset. Convolution filter sizes are set to squares of size 11, 9, and 5, respectively, for the different convolutional layers, and max pooling layers are used in some of the convolutional layers to introduce invariance. Each densely connected layer has an output size of […]. Dropout layers follow each of the densely connected layers, with a dropout ratio of 0.6. For all the layers, we used rectified linear units (ReLU) as our nonlinear activation function.

The whole network is optimized by asynchronous stochastic gradient descent with a momentum term of weight 0.9 and a mini-batch size of 32. The global learning rate for the whole network is set to […] at the beginning, and a staircase weight decay is applied after a few epochs. The same optimization parameters and procedure are applied to all the different methods. For our dataset with 150,000 training images, it usually takes one day to obtain a good model by training on a cluster. Unlike previous work that usually used ImageNet to pre-train the network, we train the whole network directly on the training images from the NUS-WIDE dataset for a fair comparison with conventional vision baselines.

2.2 Multilabel Ranking Losses

We mainly focused on the loss layer, which specifies how the network training penalizes the deviation between the predicted and true labels, and investigated several multilabel loss functions for training the network. The first loss function was inspired by TagProp [14], for which we minimized the multilabel softmax regression loss. The second loss was a simple modification of a pairwise-ranking loss [19] that takes multiple labels into account.
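The five-patch augmentation described in Section 2.1 (one patch at the center and one at each of the four corners) can be illustrated with a small sketch. The function names `five_patch_boxes` and `crop` are hypothetical, and the image and patch sizes are left as parameters because the text above does not preserve the values actually used; the 8 × 8 image and 6 × 6 patch in the usage lines are toy values chosen only for illustration.

```python
def five_patch_boxes(height, width, patch):
    """Return (top, left, bottom, right) boxes for the center patch
    followed by the four corner patches of a height x width image."""
    if patch > height or patch > width:
        raise ValueError("patch must fit inside the image")
    top_c = (height - patch) // 2
    left_c = (width - patch) // 2
    origins = [
        (top_c, left_c),                   # center
        (0, 0),                            # top-left corner
        (0, width - patch),                # top-right corner
        (height - patch, 0),               # bottom-left corner
        (height - patch, width - patch),   # bottom-right corner
    ]
    return [(t, l, t + patch, l + patch) for t, l in origins]


def crop(image, box):
    """Crop a nested-list image (rows of pixel values) to the given box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    # Toy sizes only: an 8 x 8 "image" and 6 x 6 patches.
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    for box in five_patch_boxes(8, 8, 6):
        patch = crop(image, box)
        print(box, len(patch), len(patch[0]))
```

Each training image then contributes five patches, all sharing the tags of the original image.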
The third loss function was a multilabel variant of the WARP loss [34], which uses a sampling trick to optimize top-k annotation accuracy.

For notation, assume that we have a set of images x and that we denote the convolutional network by f(·), where the convolutional layers and densely connected layers filter the images. The output of f(·) is a scoring function of the data point x that produces a vector of activations. We assume there are n training images and c tags.

Softmax

The softmax loss has been used for multilabel annotation in TagProp [14], and is also used in single-label image classification [20]; therefore, we adopted it in our context. The posterior probability of an image $x_i$ and class $j$ can be expressed as

$$p_{ij} = \frac{\exp(f_j(x_i))}{\sum_{k=1}^{c} \exp(f_k(x_i))}, \tag{1}$$

where $f_j(x_i)$ is the activation value for image $x_i$ and class $j$. We then minimized the KL divergence between the predictions and the ground-truth probabilities. Assuming that each image has multiple labels, and that we can form a label vector $y \in \mathbb{R}^{1 \times c}$ where $y_j = 1$ means the presence of a label and $y_j = 0$ means its absence for an image, we can obtain the ground-truth probability by normalizing $y$ as $y / \|y\|_1$. If the ground-truth probability for image $i$ and class $j$ is defined as $\bar{p}_{ij}$, the cost function to be minimized is

$$J = -\frac{1}{m} \sum_{i=1}^{n} \sum_{j=1}^{c} \bar{p}_{ij} \log(p_{ij}) = -\frac{1}{m} \sum_{i=1}^{n} \sum_{j=1}^{c_+} \frac{1}{c_+} \log(p_{ij}),$$

where $c_+$ denotes the number of positive labels for each image. For ease of exposition and without loss of generality, we set $c_+$ to be the same for all images.

Pairwise Ranking

The second loss function we considered was the pairwise-ranking loss [19], which directly models the annotation problem. In particular, we wanted to rank the positive labels to always have higher

scores than negative labels, which led to the following minimization problem:

$$J = \sum_{i=1}^{n} \sum_{j=1}^{c_+} \sum_{k=1}^{c_-} \max(0,\, 1 - f_j(x_i) + f_k(x_i)), \tag{2}$$

where $c_+$ is the number of positive labels and $c_-$ is the number of negative labels. During back-propagation, we computed the sub-gradient of this loss function. One limitation of this loss is that it optimizes the area under the ROC curve (AUC) but does not directly optimize the top-k annotation accuracy. Because we were mostly interested in top-k annotations for image annotation problems, this pairwise ranking loss did not best fit our purpose.

Weighted Approximate Ranking (WARP)

The third loss we considered was the weighted approximate ranking (WARP) loss, which was first described in [34]. It specifically optimizes the top-k accuracy for annotation by using a stochastic sampling approach. Such an approach fits the stochastic optimization framework of the deep architecture very well. It minimizes

$$J = \sum_{i=1}^{n} \sum_{j=1}^{c_+} \sum_{k=1}^{c_-} L(r_j) \max(0,\, 1 - f_j(x_i) + f_k(x_i)), \tag{3}$$

where $L(\cdot)$ is a weighting function for different ranks, and $r_j$ is the rank of the $j$th class for image $i$. The weighting function $L(\cdot)$ used in our work is defined as

$$L(r) = \sum_{j=1}^{r} \alpha_j, \quad \text{with } \alpha_1 \ge \alpha_2 \ge \dots \ge 0. \tag{4}$$

We defined $\alpha_j$ as equal to $1/j$, which is the same as in [34]. The weights defined by $L(\cdot)$ control the top-k of the optimization. In particular, if a positive label is ranked at the top of the label list, then $L(\cdot)$ assigns a small weight to the loss, so the label contributes little to the cost. However, if a positive label is not ranked at the top, $L(\cdot)$ assigns a much larger weight to the loss, which pushes the positive label to the top.

The last question was how to estimate the rank $r_j$. We followed the sampling method in [34]: for a positive label, we continued to sample negative labels until we found a violation; then we recorded the number of trials $s$ we had sampled for negative labels. The rank was estimated by the following formulation

$$r_j = \left\lfloor \frac{c - 1}{s} \right\rfloor, \tag{5}$$

for $c$ classes and $s$ sampling trials. We computed the sub-gradient for this layer during optimization.
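The three losses of Section 2.2 can be sketched for a single image in plain Python. This is a minimal illustration under stated assumptions rather than the training code used in the experiments: the function names are hypothetical, `scores` stands for the activation vector f(x_i), `positives` lists the indices of the positive tags, negatives are sampled uniformly with replacement, sampling stops after at most c − 1 trials, and the WARP value uses only the sampled violator as a stochastic stand-in for the inner sum over negatives in Eq. (3).

```python
import math
import random


def softmax_multilabel_loss(scores, positives):
    """Cross-entropy between softmax(scores), Eq. (1), and y / ||y||_1."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum(scores[j] - log_z for j in positives) / len(positives)


def pairwise_ranking_loss(scores, positives):
    """Eq. (2) for one image: hinge over all (positive, negative) pairs."""
    pos = set(positives)
    negatives = [k for k in range(len(scores)) if k not in pos]
    return sum(max(0.0, 1.0 - scores[j] + scores[k])
               for j in positives for k in negatives)


def rank_weight(r):
    """Eq. (4) with alpha_j = 1/j: L(r) = 1 + 1/2 + ... + 1/r."""
    return sum(1.0 / j for j in range(1, r + 1))


def warp_loss(scores, positives, rng):
    """Sampled WARP loss for one image, using the rank estimate of Eq. (5)."""
    c = len(scores)
    pos = set(positives)
    negatives = [k for k in range(c) if k not in pos]
    total = 0.0
    for j in positives:
        # Assumption: give up after c - 1 trials if no violator is found.
        for s in range(1, c):
            k = rng.choice(negatives)
            margin = 1.0 - scores[j] + scores[k]
            if margin > 0:                  # violation found on trial s
                r = (c - 1) // s            # Eq. (5): estimated rank
                total += rank_weight(r) * margin
                break
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [2.0, 0.5, -1.0, 0.0]
    print(softmax_multilabel_loss(scores, [0, 1]))
    print(pairwise_ranking_loss(scores, [1]))
    print(warp_loss(scores, [1], rng))
```

A violator found on the first trial yields the largest rank estimate, c − 1, and therefore the largest weight L(r); a violator found only after many trials yields a small estimate and a small weight, which is how the sampling concentrates the loss on positive tags that are ranked low.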
As a minor note, the approximate objective we optimize is a looser upper bound compared to the original WARP loss proposed in [34]. To see this, notice that the original paper assumes that the probability of sampling a violator is $p = \frac{r}{\#Y - 1}$ for a positive example $(x, y)$ with rank $r$, where $\#Y$ is the number of labels. Here, $r$ labels have higher scores than $y$, and the assumption holds only if all of these $r$ labels are negative. In our case, however, other positive labels may have higher scores than $y$ due to the multilabel nature of the problem, so we effectively have $p \le \frac{r}{\#Y - 1}$.

3 Visual Feature based Image Annotation Baselines

We used a set of 9 different visual features and combined them to serve as our baseline features. Although such a set of features might not be the best possible one we could obtain, it already serves as a very strong visual representation, and the computation of such features is nontrivial. On top of these features, we ran two simple but powerful classifiers (kNN and SVM) for image annotation. We also experimented with TagProp [14], but found that it cannot easily scale to a large training set because of its $O(n^2)$ time complexity. After using a small training set to train the TagProp model, we found the performance to be unsatisfactory and therefore do not compare it here.

3.1 Visual Features

GIST [27]: We resized each image to […] and used three different scales [8, 8, 4] to filter each RGB channel, resulting in 960-dimensional (320 × 3) GIST feature vectors.

SIFT: We used two different sampling methods and three different local descriptors to extract texture features, which gave us a total of 6 different features. We used dense sampling and a Harris corner detector as our patch-sampling methods. For local descriptors, we extracted SIFT [24], CSIFT [31], and RGBSIFT [31], formed a codebook of size 1000 using k-means clustering, and then built a two-level spatial pyramid [21] that resulted in a 5000-dimensional vector for each image. We refer to these six features as D-SIFT, D-CSIFT, D-RGBSIFT, H-SIFT, H-CSIFT, and H-RGBSIFT.

HOG: To represent texture information at a larger scale, we used 2 × 2 overlapping HOG as described in [35]. We quantized the HOG features to a codebook of size 1000 and used the same spatial pyramid scheme as above, which resulted in 5000-dimensional feature vectors.

Color: We used a joint RGB color histogram of 8 bins per dimension, for a 512-dimensional feature.

The same set of features was used in [11], where it achieved state-of-the-art performance for image retrieval and annotation. The combination of this set of features has a total dimensionality of 36,472 (960 + 6 × 5000 + 5000 + 512), which makes learning very expensive. We followed [11] and performed simple dimensionality reductions to reduce computation. In particular, we performed a kernel PCA (KPCA) separately on each feature to reduce its dimensionality to 500. Then we concatenated all of the feature vectors to form a 4500-dimensional (9 × 500) global image feature vector and ran the different learning algorithms on it.

3.2 Visual feature + kNN

The simplest baseline, which remains very powerful, directly applies a weighted kNN to the visual feature vectors. kNN is a very strong baseline for image annotation, as suggested by Makadia et al.
[25], mainly because multilabel image annotation is a highly nonlinear problem and handling the heavy-tailed label distribution is usually very difficult. By contrast, kNN is a highly nonlinear and adaptive algorithm that better handles rare tags. For each test image, we found its k nearest neighbors in the training set and computed the posterior probability $p(c \mid i)$ as

$$p(c \mid i) = \sum_{j=1}^{k} \frac{1}{k} \exp\left(-\frac{\|x_i - x_j\|_2^2}{\sigma}\right) y_{jc}, \tag{6}$$

where $y_{jc}$ indexes the labels of the training data: $y_{jc} = 1$ when training image $j$ carries tag $c$, and $y_{jc} = 0$ otherwise. $\sigma$ is the bandwidth, which needs to be tuned. After obtaining the prediction probabilities for each image, we sorted the scores and annotated each test image with the top-k tags.

3.3 Visual feature + SVM

Another way to perform image annotation is to treat each tag separately and to train a different one-vs-all classifier for each. We trained a linear SVM [9] for each tag and used the outputs of the different SVMs to rank the tags. Because we had already applied a nonlinear mapping to the data during the KPCA stage, we found a linear SVM to be sufficient. We then assigned the top-k tags to each image, based on the ranking of the output scores of the SVMs.

4 Experiments

4.1 Dataset

We performed experiments on the largest publicly available multilabel dataset, NUS-WIDE [4]. This dataset contains 269,648 images downloaded from Flickr that have been manually annotated, with several tags (2-5 on average) per image. After ignoring the small subset of images that are not annotated with any tag, we had a total of 209,347 images for training and testing. We used a subset of 150,000 images for training and the rest of the images for testing. The tag dictionary for the images contains 81 different tags. Some sample images and annotations are shown in Figure 1.
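The weighted kNN baseline of Section 3.2 (Eq. 6) can be sketched as follows. This is a minimal illustration, assuming dense feature vectors stored as Python lists and a binary label matrix `train_labels[j][c]`; the function names and the toy data are hypothetical, and a brute-force neighbor search stands in for whatever search structure a real system would use.

```python
import math


def knn_tag_scores(query, train_features, train_labels, k, sigma):
    """Eq. (6): score each tag by the kernel-weighted votes of the
    k nearest training images (squared Euclidean distance)."""
    dists = []
    for idx, feat in enumerate(train_features):
        d2 = sum((a - b) ** 2 for a, b in zip(query, feat))
        dists.append((d2, idx))
    dists.sort()                      # nearest first
    num_tags = len(train_labels[0])
    scores = [0.0] * num_tags
    for d2, idx in dists[:k]:
        weight = math.exp(-d2 / sigma) / k
        for c, y in enumerate(train_labels[idx]):
            scores[c] += weight * y
    return scores


def top_k_tags(scores, k):
    """Indices of the k highest-scoring tags, best first."""
    return sorted(range(len(scores)), key=lambda c: -scores[c])[:k]


if __name__ == "__main__":
    # Toy data: three training images with 2-D features and three tags.
    train_features = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    train_labels = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
    scores = knn_tag_scores([0.0, 0.0], train_features, train_labels,
                            k=2, sigma=1.0)
    print(scores, top_k_tags(scores, 2))
```

Because the weight decays with distance, a rare tag carried by a single very close neighbor can still outrank a frequent tag carried by several distant ones, which is consistent with the observation above that kNN handles rare tags comparatively well.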

Table 1: Image annotation results on NUS-WIDE with k = 3 annotated tags per image. See text in Section 4.4 for the definition of "Upper bound".
Columns: method / metric, per-class recall, per-class precision, overall recall, overall precision, N+.
Rows: Upper bound; Visual Feature + kNN; Visual Feature + SVM; CNN + Softmax; CNN + Ranking; CNN + WARP.

Table 2: Image annotation results on NUS-WIDE with k = 5 annotated tags per image. See text in Section 4.4 for the definition of "Upper bound".
Columns: method / metric, per-class recall, per-class precision, overall recall, overall precision, N+.
Rows: Upper bound; Visual Feature + kNN; Visual Feature + SVM; CNN + Softmax; CNN + Ranking; CNN + WARP.

4.2 Evaluation Protocols

We followed previous research [25] in our use of the following protocols to evaluate different methods. For each image, we assigned the k (e.g., k = 3, 5) highest-ranked tags to the image and compared the assigned tags to the ground-truth tags. We computed the recall and precision for each tag separately, and then report the mean per-class recall and mean per-class precision:

$$\text{per-class recall} = \frac{1}{c} \sum_{i=1}^{c} \frac{N_i^c}{N_i^g}, \qquad \text{per-class precision} = \frac{1}{c} \sum_{i=1}^{c} \frac{N_i^c}{N_i^p}, \tag{7}$$

where $c$ is the number of tags, $N_i^c$ is the number of correctly annotated images for tag $i$, $N_i^g$ is the number of ground-truth tags for tag $i$, and $N_i^p$ is the number of predictions for tag $i$. The above evaluations are biased toward infrequent tags, because making them correct would have a very significant impact on the final accuracy. Therefore we also report the overall recall and overall precision:

$$\text{overall recall} = \frac{\sum_{i=1}^{c} N_i^c}{\sum_{i=1}^{c} N_i^g}, \qquad \text{overall precision} = \frac{\sum_{i=1}^{c} N_i^c}{\sum_{i=1}^{c} N_i^p}. \tag{8}$$

For these two metrics, the frequent classes are dominant and have a larger impact on final performance. Finally, we also report the percentage of recalled tag words out of all tag words as N+. We believe that evaluating all of these metrics makes the evaluation unbiased.

4.3 Baseline Parameters

In our preliminary evaluation, we optimized the parameters for the visual-feature-based baseline systems.
For visual-feature dimensionality reduction, we followed the suggestions in Gong et al. [11] to reduce the dimensionality of each feature to 500 and then concatenated the PCA-reduced vectors into a 4500-dimensional global image descriptor, which worked as well as the original feature. For kNN, we set the bandwidth σ to 1 and k to 50, having found that these settings work best. For SVM, we set the regularization parameter to C = 2, which works best for this dataset.

4.4 Results

We first report results with respect to the metrics introduced above. In particular, we vary the number k of predicted keywords for each image and mainly consider k = 3 and k = 5. Before doing so, however, we must define an upper bound for our evaluation. In the dataset, each image has a different number of ground-truth tags, which made it hard for us to precisely compute an upper

bound for performance with different k. For each image, when the number of ground-truth tags was larger than k, we randomly chose k ground-truth tags and assigned them to that image; when the number of ground-truth tags was smaller than k, we assigned all ground-truth tags to that image and randomly chose other tags to fill the remaining slots. We believe this baseline represents the best possible performance when the ground truth is known.

The results for assigning 3 keywords per image are reported in Table 1. The results indicate that the deep network achieves a substantial improvement over existing visual-feature-based annotation methods. The CNN+Softmax method outperforms the VisualFeature+SVM baseline by about 10%. Comparing the same CNN network with different loss functions, the results show that softmax already gives a very powerful baseline. Although the pairwise ranking loss does not improve on softmax, the weighted approximate ranking loss (WARP) achieves a substantial improvement over softmax. This is probably because pairwise ranking does not directly optimize the top-k accuracy, and because WARP pushes classes that are not ranked at the top more heavily, which boosts the performance of rare tags. From these results, we can see that all loss functions achieved comparable overall recall and overall precision, but that the WARP loss achieved significantly better per-class recall and per-class precision. Results for k = 5, which are given in Table 2, show similar trends to k = 3.

We also provide a more detailed analysis of per-class recall and per-class precision. The recall for each tag appears in Figure 2, and the precision for each tag in Figure 3.

Figure 2: Analysis of per-class recall of the 81 tags in the NUS-WIDE dataset with k = 3 (y-axis: recall; x-axis: tags in decreasing frequency; curves: Softmax, Ranking, WARP).

Figure 3: Analysis of per-class precision of the 81 tags in the NUS-WIDE dataset with k = 3 (y-axis: precision; x-axis: tags in decreasing frequency; curves: Softmax, Ranking, WARP).
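The evaluation protocol of Section 4.2 and the upper-bound construction described above can be sketched together. This is an illustrative sketch, not the evaluation script behind the tables: the function names are hypothetical, a tag with no ground-truth occurrences (or no predictions) is assumed to contribute 0 to the corresponding per-class average, and N+ is read here as the fraction of tags that are correctly predicted at least once.

```python
import random


def annotation_metrics(predicted, truth, num_tags):
    """Per-class and overall recall/precision (Eqs. 7 and 8) plus N+."""
    correct = [0] * num_tags   # N_i^c: correct annotations of tag i
    ground = [0] * num_tags    # N_i^g: ground-truth occurrences of tag i
    preds = [0] * num_tags     # N_i^p: predictions of tag i
    for p_tags, t_tags in zip(predicted, truth):
        t_set = set(t_tags)
        for t in t_set:
            ground[t] += 1
        for t in set(p_tags):
            preds[t] += 1
            if t in t_set:
                correct[t] += 1

    def ratio(a, b):
        # Assumption: an undefined ratio counts as 0.
        return a / b if b else 0.0

    return {
        "per_class_recall": sum(ratio(a, g) for a, g in zip(correct, ground)) / num_tags,
        "per_class_precision": sum(ratio(a, p) for a, p in zip(correct, preds)) / num_tags,
        "overall_recall": ratio(sum(correct), sum(ground)),
        "overall_precision": ratio(sum(correct), sum(preds)),
        "n_plus": sum(1 for a in correct if a > 0) / num_tags,
    }


def upper_bound_assignment(truth_tags, k, num_tags, rng):
    """Assign exactly k tags using the ground truth: subsample when there
    are more than k true tags, pad with random other tags when fewer."""
    truth = list(truth_tags)
    if len(truth) >= k:
        return rng.sample(truth, k)
    taken = set(truth)
    others = [t for t in range(num_tags) if t not in taken]
    return truth + rng.sample(others, k - len(truth))


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [[0, 1], [1], [2]]
    bound = [upper_bound_assignment(t, 2, 3, rng) for t in truth]
    print(annotation_metrics(bound, truth, 3))
```

Even with the ground truth known, forcing exactly k tags per image keeps precision below 1 for images with fewer than k true tags and recall below 1 for images with more than k, which is why this construction serves as the reference point for a fixed k.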
The results for different tags are sorted by the frequency of each tag, in descending order. From these results, we see that the accuracy for frequent tags is greater than that for infrequent tags. The different losses performed comparably to each other for frequent classes, and WARP worked better than the other loss functions for infrequent classes. Finally, we show some image annotation examples in Figure 4. Even though some of the predicted tags for these images do not match the ground truth, they are still very meaningful.

5 Discussion and Future Work

In this work, we proposed to use ranking to train deep convolutional neural networks for multilabel image annotation problems. We investigated several different ranking-based loss functions for training the CNN, and found that the weighted approximate ranking loss works particularly well for multilabel annotation problems. We performed experiments on the largest publicly available multilabel image dataset, NUS-WIDE, and demonstrated the effectiveness of using top-k ranking to train the network. In the future, we would like to use a very large amount of noisily labeled multilabel images from the Internet (e.g., from Flickr or image searches) to train the network.

Figure 4: Qualitative image annotation results obtained with WARP (columns: image, ground truth, predictions).

References

[1] Kobus Barnard and David Forsyth. Learning the semantics of words and pictures. In ICCV.
[2] Tamara Berg and David Forsyth. Animals on the web. CVPR.
[3] Gustavo Carneiro, Antoni B. Chan, Pedro J. Moreno, and Nuno Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(3).
[4] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yan-Tao Zheng. NUS-WIDE: A real-world web image database from National University of Singapore. In Proc. of ACM Conf. on Image and Video Retrieval (CIVR 09), Santorini, Greece, July 8-10.
[5] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc Le, Mark Mao, Marc Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Ng. Large scale distributed deep networks. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25.
[6] Jia Deng, W. Dong, R. Socher, Lijia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. CVPR.
[7] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint.
[8] Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV.
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR.
[10] Rob Fergus, Antonio Torralba, and Yair Weiss. Semi-supervised learning in gigantic image collections. NIPS.
[11] Yunchao Gong, Qifa Ke, Michael Isard, and Svetlana Lazebnik. A multi-view embedding space for internet images, tags, and their semantics. IJCV.
[12] Yunchao Gong and Svetlana Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. CVPR.
[13] Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. Maxout networks. ICML.
[14] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. ICCV.
[15] Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. Multimodal semi-supervised learning for image classification. CVPR.
[16] Geoffrey Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv.
[17] K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? CVPR.
[18] Hervé Jégou, M. Douze, Cordelia Schmid, and Patrick Perez. Aggregating local descriptors into a compact image representation. CVPR.
[19] Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 02, New York, NY, USA. ACM.
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS.
[21] Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR.
[22] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel. Handwritten digit recognition with a back-propagation network. NIPS.
[23] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML.
[24] David G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV.
[25] Ameesh Makadia, Vladimir Pavlovic, and Sanjiv Kumar. A new baseline for image annotation. In ECCV.
[26] F. Monay and D. Gatica-Perez. PLSA-based image auto-annotation: Constraining the latent space. In ACM Multimedia.
[27] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV.
[28] Florent Perronnin and Christopher R. Dance. Fisher kernels on visual vocabularies for image categorization. CVPR.
[29] A. Quattoni, M. Collins, and T. Darrell. Learning visual representations using images with captions. CVPR.
[30] N. Rasiwasia, P.J. Moreno, and N. Vasconcelos. Bridging the gap: Query by semantic example. IEEE Transactions on Multimedia.
[31] Koen E. A. van de Sande, Theo Gevers, and Cees G. M. Snoek. Evaluating color descriptors for object and scene recognition. PAMI.
[32] Li Wan, Matt Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. Regularization of neural networks using DropConnect. ICML.
[33] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, and Yihong Gong. Locality-constrained linear coding for image classification. CVPR.
[34] Jason Weston, Samy Bengio, and Nicolas Usunier. Wsabie: Scaling up to large vocabulary image annotation. In IJCAI.
[35] Jianxiong Xiao, James Hays, Krista Ehinger, Aude Oliva, and Antonio Torralba. SUN database: Large-scale scene recognition from abbey to zoo. CVPR.
[36] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. Linear spatial pyramid matching using sparse coding for image classification. CVPR.
[37] Matt Zeiler and Rob Fergus. Stochastic pooling for regularization of deep convolutional neural networks. ICLR.


More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

A scheme for racquet sports video analysis with the combination of audio-visual information

A scheme for racquet sports video analysis with the combination of audio-visual information A sheme for raquet sports video analysis with the ombination of audio-visual information Liyuan Xing a*, Qixiang Ye b, Weigang Zhang, Qingming Huang a and Hua Yu a a Graduate Shool of the Chinese Aadamy

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

One Against One or One Against All : Which One is Better for Handwriting Recognition with SVMs?

One Against One or One Against All : Which One is Better for Handwriting Recognition with SVMs? One Against One or One Against All : Whih One is Better for Handwriting Reognition with SVMs? Jonathan Milgram, Mohamed Cheriet, Robert Sabourin To ite this version: Jonathan Milgram, Mohamed Cheriet,

More information

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules

Improved Vehicle Classification in Long Traffic Video by Cooperating Tracker and Classifier Modules Improved Vehile Classifiation in Long Traffi Video by Cooperating Traker and Classifier Modules Brendan Morris and Mohan Trivedi University of California, San Diego San Diego, CA 92093 {b1morris, trivedi}@usd.edu

More information

A Novel Validity Index for Determination of the Optimal Number of Clusters

A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

Learning Deep Context-Network Architectures for Image Annotation

Learning Deep Context-Network Architectures for Image Annotation 1 Learning Deep Context-Network Arhitetures for Image Annotation Mingyuan Jiu and Hihem Sahbi arxiv:1803.08794v1 [s.cv] 23 Mar 2018 Abstrat Context plays an important role in visual pattern reognition

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Cluster-Based Cumulative Ensembles

Cluster-Based Cumulative Ensembles Cluster-Based Cumulative Ensembles Hanan G. Ayad and Mohamed S. Kamel Pattern Analysis and Mahine Intelligene Lab, Eletrial and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1,

More information

KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION

KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION KERNEL SPARSE REPRESENTATION WITH LOCAL PATTERNS FOR FACE RECOGNITION Cuiui Kang 1, Shengai Liao, Shiming Xiang 1, Chunhong Pan 1 1 National Laboratory of Pattern Reognition, Institute of Automation, Chinese

More information

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors

Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors Eurographis Symposium on Geometry Proessing (003) L. Kobbelt, P. Shröder, H. Hoppe (Editors) Rotation Invariant Spherial Harmoni Representation of 3D Shape Desriptors Mihael Kazhdan, Thomas Funkhouser,

More information

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition A Coarse-to-Fine Classifiation Sheme for Faial Expression Reognition Xiaoyi Feng 1,, Abdenour Hadid 1 and Matti Pietikäinen 1 1 Mahine Vision Group Infoteh Oulu and Dept. of Eletrial and Information Engineering

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Gray Codes for Reflectable Languages

Gray Codes for Reflectable Languages Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

A New RBFNDDA-KNN Network and Its Application to Medical Pattern Classification

A New RBFNDDA-KNN Network and Its Application to Medical Pattern Classification A New RBFNDDA-KNN Network and Its Appliation to Medial Pattern Classifiation Shing Chiang Tan 1*, Chee Peng Lim 2, Robert F. Harrison 3, R. Lee Kennedy 4 1 Faulty of Information Siene and Tehnology, Multimedia

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

Multi-modal Clustering for Multimedia Collections

Multi-modal Clustering for Multimedia Collections Multi-modal Clustering for Multimedia Colletions Ron Bekkerman and Jiwoon Jeon Center for Intelligent Information Retrieval University of Massahusetts at Amherst, USA {ronb jeon}@s.umass.edu Abstrat Most

More information

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT?

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? 3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? Bernd Girod, Peter Eisert, Marus Magnor, Ekehard Steinbah, Thomas Wiegand Te {girod eommuniations Laboratory, University of Erlangen-Nuremberg

More information

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,

More information

Contour Box: Rejecting Object Proposals Without Explicit Closed Contours

Contour Box: Rejecting Object Proposals Without Explicit Closed Contours Contour Box: Rejeting Objet Proposals Without Expliit Closed Contours Cewu Lu, Shu Liu Jiaya Jia Chi-Keung Tang The Hong Kong University of Siene and Tehnology Stanford University The Chinese University

More information

Comparing Images Under Variable Illumination

Comparing Images Under Variable Illumination ( This paper appeared in CVPR 8. IEEE ) Comparing Images Under Variable Illumination David W. Jaobs Peter N. Belhumeur Ronen Basri NEC Researh Institute Center for Computational Vision and Control The

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

A {k, n}-secret Sharing Scheme for Color Images

A {k, n}-secret Sharing Scheme for Color Images A {k, n}-seret Sharing Sheme for Color Images Rastislav Luka, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos The Edward S. Rogers Sr. Dept. of Eletrial and Computer Engineering, University

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features

More information

Towards Optimal Naive Bayes Nearest Neighbor

Towards Optimal Naive Bayes Nearest Neighbor Towards Optimal Naive Bayes Nearest Neighbor Régis Behmo 1, Paul Marombes 1,2, Arnak Dalalyan 2,andVéronique Prinet 1 1 NLPR / LIAMA, Institute of Automation, Chinese Aademy of Sienes 2 IMAGINE, LIGM,

More information

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes Deteting Outliers in High-Dimensional Datasets with Mixed Attributes A. Koufakou, M. Georgiopoulos, and G.C. Anagnostopoulos 2 Shool of EECS, University of Central Florida, Orlando, FL, USA 2 Dept. of

More information

arxiv: v1 [cs.mm] 12 Jan 2016

arxiv: v1 [cs.mm] 12 Jan 2016 Learning Subclass Representations for Visually-varied Image Classification Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic Multimedia Information Retrieval Lab, Delft University of Technology

More information

Semantic Concept Detection Using Weighted Discretization Multiple Correspondence Analysis for Disaster Information Management

Semantic Concept Detection Using Weighted Discretization Multiple Correspondence Analysis for Disaster Information Management Semanti Conept Detetion Using Weighted Disretization Multiple Correspondene Analysis for Disaster Information Management Samira Pouyanfar and Shu-Ching Chen Shool of Computing and Information Sienes Florida

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

Relevance for Computer Vision

Relevance for Computer Vision The Geometry of ROC Spae: Understanding Mahine Learning Metris through ROC Isometris, by Peter A. Flah International Conferene on Mahine Learning (ICML-23) http://www.s.bris.a.uk/publiations/papers/74.pdf

More information

M ULTI -S CALE D ENSE N ETWORKS

M ULTI -S CALE D ENSE N ETWORKS Under review as a onferene paper at ICLR 208 M ULTI -S CALE D ENSE N ETWORKS FOR R ESOURCE E FFICIENT I MAGE C LASSIFICATION Anonymous authors Paper under double-blind review A BSTRACT In this paper we

More information

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Malaysian Journal of Computer Siene, Vol 10 No 1, June 1997, pp 36-41 A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Md Rafiqul Islam, Harihodin Selamat and Mohd Noor Md Sap Faulty of Computer Siene and

More information

Multi-Scale Orderless Pooling of Deep Convolutional Activation Features

Multi-Scale Orderless Pooling of Deep Convolutional Activation Features Multi-Scale Orderless Pooling of Deep Convolutional Activation Features Yunchao Gong, Liwei Wang 2, Ruiqi Guo 2, and Svetlana Lazebnik 2 University of North Carolina at Chapel Hill yunchao@cs.unc.edu 2

More information

Progressive Generative Hashing for Image Retrieval

Progressive Generative Hashing for Image Retrieval Progressive Generative Hashing for Image Retrieval Yuqing Ma, Yue He, Fan Ding, Sheng Hu, Jun Li, Xianglong Liu 2018.7.16 01 BACKGROUND the NNS problem in big data 02 RELATED WORK Generative adversarial

More information

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT 1 ZHANGGUO TANG, 2 HUANZHOU LI, 3 MINGQUAN ZHONG, 4 JIAN ZHANG 1 Institute of Computer Network and Communiation Tehnology,

More information

HEXA: Compact Data Structures for Faster Packet Processing

HEXA: Compact Data Structures for Faster Packet Processing Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for

More information

Pipelined Multipliers for Reconfigurable Hardware

Pipelined Multipliers for Reconfigurable Hardware Pipelined Multipliers for Reonfigurable Hardware Mithell J. Myjak and José G. Delgado-Frias Shool of Eletrial Engineering and Computer Siene, Washington State University Pullman, WA 99164-2752 USA {mmyjak,

More information

Extracting Partition Statistics from Semistructured Data

Extracting Partition Statistics from Semistructured Data Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk

More information

Micro-Doppler Based Human-Robot Classification Using Ensemble and Deep Learning Approaches

Micro-Doppler Based Human-Robot Classification Using Ensemble and Deep Learning Approaches Miro-Doppler Based Human-Robot Classifiation Using Ensemble and Deep Learning Approahes Sherif Abdulatif, Qian Wei, Fady Aziz, Bernhard Kleiner, Urs Shneider Department of Biomehatroni Systems, Fraunhofer

More information

Hello neighbor: accurate object retrieval with k-reciprocal nearest neighbors

Hello neighbor: accurate object retrieval with k-reciprocal nearest neighbors Hello neighbor: aurate objet retrieval with -reiproal nearest neighbors Danfeng Qin Stephan Gammeter Luas Bossard Till Qua,2 Lu van Gool,3 ETH Zürih 2 Kooaba AG 3 K.U. Leuven {ind,gammeter,bossard,tua,vangool}@vision.ee.ethz.h

More information

arxiv: v5 [cs.lg] 7 Jun 2018

arxiv: v5 [cs.lg] 7 Jun 2018 Published as a onferene paper at ICLR 08 M ULTI -S CALE D ENSE N ETWORKS FOR R ESOURCE E FFICIENT I MAGE C LASSIFICATION Gao Huang Cornell University Danlu Chen Fudan University Tianhong Li Tsinghua University

More information

Video Data and Sonar Data: Real World Data Fusion Example

Video Data and Sonar Data: Real World Data Fusion Example 14th International Conferene on Information Fusion Chiago, Illinois, USA, July 5-8, 2011 Video Data and Sonar Data: Real World Data Fusion Example David W. Krout Applied Physis Lab dkrout@apl.washington.edu

More information

Semi-Supervised Affinity Propagation with Instance-Level Constraints

Semi-Supervised Affinity Propagation with Instance-Level Constraints Semi-Supervised Affinity Propagation with Instane-Level Constraints Inmar E. Givoni, Brendan J. Frey Probabilisti and Statistial Inferene Group University of Toronto 10 King s College Road, Toronto, Ontario,

More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Metric Learning for Large Scale Image Classification:

Metric Learning for Large Scale Image Classification: Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Thomas Mensink 1,2 Jakob Verbeek 2 Florent Perronnin 1 Gabriela Csurka 1 1 TVPA - Xerox Research Centre

More information

FUZZY WATERSHED FOR IMAGE SEGMENTATION

FUZZY WATERSHED FOR IMAGE SEGMENTATION FUZZY WATERSHED FOR IMAGE SEGMENTATION Ramón Moreno, Manuel Graña Computational Intelligene Group, Universidad del País Vaso, Spain http://www.ehu.es/winto; {ramon.moreno,manuel.grana}@ehu.es Abstrat The

More information

Metric Learning for Large-Scale Image Classification:

Metric Learning for Large-Scale Image Classification: Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Florent Perronnin 1 work published at ECCV 2012 with: Thomas Mensink 1,2 Jakob Verbeek 2 Gabriela Csurka

More information

Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introduction Information Retrieval... 8

Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introduction Information Retrieval... 8 Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introdution... 1 1.1. Internet Information...2 1.2. Internet Information Retrieval...3 1.2.1. Doument Indexing...4 1.2.2. Doument Retrieval...4

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Probabilistic Classification of Image Regions using an Observation-Constrained Generative Approach

Probabilistic Classification of Image Regions using an Observation-Constrained Generative Approach 9 Probabilisti Classifiation of mage Regions using an Observation-Constrained Generative Approah Sanjiv Kumar, Alexander C. Loui 2, and Martial Hebert The Robotis nstitute, Carnegie Mellon University,

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Model Based Approach for Content Based Image Retrievals Based on Fusion and Relevancy Methodology

Model Based Approach for Content Based Image Retrievals Based on Fusion and Relevancy Methodology The International Arab Journal of Information Tehnology, Vol. 12, No. 6, November 15 519 Model Based Approah for Content Based Image Retrievals Based on Fusion and Relevany Methodology Telu Venkata Madhusudhanarao

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

Discrete sequential models and CRFs. 1 Case Study: Supervised Part-of-Speech Tagging

Discrete sequential models and CRFs. 1 Case Study: Supervised Part-of-Speech Tagging 0-708: Probabilisti Graphial Models 0-708, Spring 204 Disrete sequential models and CRFs Leturer: Eri P. Xing Sribes: Pankesh Bamotra, Xuanhong Li Case Study: Supervised Part-of-Speeh Tagging The supervised

More information

DeCAF: a Deep Convolutional Activation Feature for Generic Visual Recognition

DeCAF: a Deep Convolutional Activation Feature for Generic Visual Recognition DeCAF: a Deep Convolutional Activation Feature for Generic Visual Recognition ECS 289G 10/06/2016 Authors: Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng and Trevor Darrell

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index IJCSES International Journal of Computer Sienes and Engineering Systems, ol., No.4, Otober 2007 CSES International 2007 ISSN 0973-4406 253 An Optimized Approah on Applying Geneti Algorithm to Adaptive

More information

arxiv: v1 [cs.lg] 20 Dec 2013

arxiv: v1 [cs.lg] 20 Dec 2013 Unsupervised Feature Learning by Deep Sparse Coding Yunlong He Koray Kavukcuoglu Yun Wang Arthur Szlam Yanjun Qi arxiv:1312.5783v1 [cs.lg] 20 Dec 2013 Abstract In this paper, we propose a new unsupervised

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

Experiments of Image Retrieval Using Weak Attributes

Experiments of Image Retrieval Using Weak Attributes Columbia University Computer Science Department Technical Report # CUCS 005-12 (2012) Experiments of Image Retrieval Using Weak Attributes Felix X. Yu, Rongrong Ji, Ming-Hen Tsai, Guangnan Ye, Shih-Fu

More information

We don t need no generation - a practical approach to sliding window RLNC

We don t need no generation - a practical approach to sliding window RLNC We don t need no generation - a pratial approah to sliding window RLNC Simon Wunderlih, Frank Gabriel, Sreekrishna Pandi, Frank H.P. Fitzek Deutshe Telekom Chair of Communiation Networks, TU Dresden, Dresden,

More information

Weighted Convolutional Neural Network. Ensemble.

Weighted Convolutional Neural Network. Ensemble. Weighted Convolutional Neural Network Ensemble Xavier Frazão and Luís A. Alexandre Dept. of Informatics, Univ. Beira Interior and Instituto de Telecomunicações Covilhã, Portugal xavierfrazao@gmail.com

More information

Using Augmented Measurements to Improve the Convergence of ICP

Using Augmented Measurements to Improve the Convergence of ICP Using Augmented Measurements to Improve the onvergene of IP Jaopo Serafin, Giorgio Grisetti Dept. of omputer, ontrol and Management Engineering, Sapienza University of Rome, Via Ariosto 25, I-0085, Rome,

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

Exploiting noisy web data for largescale visual recognition

Exploiting noisy web data for largescale visual recognition Exploiting noisy web data for largescale visual recognition Lamberto Ballan University of Padova, Italy CVPRW WebVision - Jul 26, 2017 Datasets drive computer vision progress ImageNet Slide credit: O.

More information

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking Algorithms for External Memory Leture 6 Graph Algorithms - Weighted List Ranking Leturer: Nodari Sithinava Sribe: Andi Hellmund, Simon Ohsenreither 1 Introdution & Motivation After talking about I/O-effiient

More information

Transition Detection Using Hilbert Transform and Texture Features

Transition Detection Using Hilbert Transform and Texture Features Amerian Journal of Signal Proessing 1, (): 35-4 DOI: 1.593/.asp.1.6 Transition Detetion Using Hilbert Transform and Texture Features G. G. Lashmi Priya *, S. Domni Department of Computer Appliations, National

More information

TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM

TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM TUMOR DETECTION IN MRI BRAIN IMAGE SEGMENTATION USING PHASE CONGRUENCY MODIFIED FUZZY C MEAN ALGORITHM M. Murugeswari 1, M.Gayathri 2 1 Assoiate Professor, 2 PG Sholar 1,2 K.L.N College of Information

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

Multivariate Texture-based Segmentation of Remotely Sensed. Imagery for Extraction of Objects and Their Uncertainty

Multivariate Texture-based Segmentation of Remotely Sensed. Imagery for Extraction of Objects and Their Uncertainty Multivariate Texture-based Segmentation of Remotely Sensed Imagery for Extration of Objets and Their Unertainty Arko Luieer*, Alfred Stein* & Peter Fisher** * International Institute for Geo-Information

More information

PASCAL VOC Classification: Local Features vs. Deep Features. Shuicheng YAN, NUS

PASCAL VOC Classification: Local Features vs. Deep Features. Shuicheng YAN, NUS PASCAL VOC Classification: Local Features vs. Deep Features Shuicheng YAN, NUS PASCAL VOC Why valuable? Multi-label, Real Scenarios! Visual Object Recognition Object Classification Object Detection Object

More information

Supervised Hashing for Image Retrieval via Image Representation Learning

Supervised Hashing for Image Retrieval via Image Representation Learning Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Supervised Hashing for Image Retrieval via Image Representation Learning Rongkai Xia 1, Yan Pan 1, Hanjiang Lai 1,2, Cong Liu

More information

Creating Adaptive Web Sites Through Usage-Based Clustering of URLs

Creating Adaptive Web Sites Through Usage-Based Clustering of URLs Creating Adaptive Web Sites Through Usage-Based Clustering of URLs Bamshad Mobasher Dept. of Computer Siene, DePaul University, Chiago, IL mobasher@s.depaul.edu Robert Cooley, Jaideep Srivastava Dept.

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Context-Aware Activity Modeling using Hierarchical Conditional Random Fields

Context-Aware Activity Modeling using Hierarchical Conditional Random Fields Context-Aware Ativity Modeling using Hierarhial Conditional Random Fields Yingying Zhu, Nandita M. Nayak, and Amit K. Roy-Chowdhury Abstrat In this paper, rather than modeling ativities in videos individually,

More information