SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval


SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval
Jian Zhang, Yuxin Peng, and Junchao Zhang
arXiv preprint [cs.CV], 28 Jul 2016

This work was supported by the National Hi-Tech Research and Development Program of China (863 Program) under Grant 2014AA015102, and by the National Natural Science Foundation of China. The authors are with the Institute of Computer Science and Technology, Peking University, Beijing 100871, China. Corresponding author: Yuxin Peng (e-mail: pengyuxin@pku.edu.cn).

Abstract—The hashing methods have been widely used for efficient similarity retrieval on large scale image datasets. The traditional hashing methods learn hash functions to generate binary codes from hand-crafted features, which achieve limited accuracy since the hand-crafted features cannot optimally represent the image content and preserve the semantic similarity. Recently, several deep hashing methods have shown better performance because the deep architectures generate more discriminative feature representations. However, these deep hashing methods are mainly designed for supervised scenarios, which only exploit the semantic similarity information but ignore the underlying data structures. In this paper, we propose the semi-supervised deep hashing (SSDH) method to perform more effective hash learning by simultaneously preserving the semantic similarity and the underlying data structures. Our proposed approach can be divided into two phases. First, a deep network is designed to extensively exploit both the labeled and unlabeled data, in which we construct the similarity graph online in a mini-batch with the deep feature representations. To the best of our knowledge, our proposed deep network is the first deep hashing method that can perform hash code learning and feature learning simultaneously in a semi-supervised fashion. Second, we propose a loss function suitable for the semi-supervised scenario by jointly minimizing the empirical error on the labeled data as well as the embedding error on both the labeled and unlabeled data, which can preserve the semantic similarity as well as capture the meaningful neighbors on the underlying data structures for effective hashing. Experiment results on 4 widely used datasets show that the proposed approach outperforms state-of-the-art hashing methods.

Index Terms—Semi-supervised deep hashing, large scale image retrieval, underlying data structures, online graph construction.

I. INTRODUCTION

RECENT years have witnessed the rapid growth of image data on the web, and content-based image retrieval (CBIR) on large scale datasets has attracted much attention, which tries to return similar images that match the visual query from a large image dataset. Nearest Neighbor (NN) search is the most basic scheme for content-based image retrieval. However, retrieving the exact nearest neighbors is computationally impracticable when the reference dataset becomes very large, since NN search usually exhaustively compares the query with each sample in the reference dataset. Therefore, many efforts have been focused on approximate nearest neighbor (ANN) search, which is sufficient and efficient enough for many practical applications. Instead of finding the exact nearest neighbors,
ANN search [1] relaxes the goal of retrieval by finding an approximate nearest neighbor p of the query q such that d(p, q) ≤ (1 + ǫ) d(p*, q), where ǫ > 0 and p* denotes the exact nearest neighbor of the query q; this results in a minor loss in accuracy but provides considerable computational efficiency. Many approaches have been designed to perform efficient ANN search, including tree-based methods and hash-based methods. The tree-based methods, such as the KD tree [2], [3], usually organize data samples in efficient data structures to achieve fast search. However, the tree-based methods suffer from the curse of dimensionality, that is to say, the search efficiency greatly degrades with high-dimensional data. The hash-based methods, representatively Locality Sensitive Hashing (LSH) [4], aim to map the high-dimensional data into low-dimensional binary code representations. On the one hand, hash codes are usually organized into a hash table which only takes constant query time; on the other hand, hash-based methods lead to substantially reduced storage as they only need to store the binary codes. Thus the hash-based methods support more efficient ANN search and have attracted more attention in recent years.

Generally speaking, the hashing methods learn compact binary codes for each data point, so that similar data examples are mapped into similar binary codes. Traditional hashing methods [4]-[7] take pre-extracted image features as input, then learn the hash functions by exploiting the data structures or applying similarity preserving regularizations. These methods can be divided into two categories, namely unsupervised and supervised methods. Recently, inspired by the remarkable progress of deep networks, some deep hashing methods have also been proposed [5], [8]-[12], which take advantage of deep networks to perform hash learning.

The unsupervised methods design the hash functions by using the unlabeled data to generate the binary codes. Locality Sensitive Hashing (LSH) [4] is a typical unsupervised method, which uses random linear projections to map data into binary codes. Instead of using randomly generated hash functions, some data-dependent methods [7], [13]-[19] have been proposed to capture the data properties, such as data distributions and manifold structures. For example, Spectral Hashing (SH) [7] designs the hash functions to be balanced and uncorrelated. Anchor Graph Hashing (AGH) [15] and Locally Linear Hashing (LLH) [14] utilize the manifold structures of data to learn compact hash codes, of which the former uses anchor graphs to discover the neighborhood structures, and the latter uses locality sensitive sparse coding to capture the local linear structures and then recovers these structures

in Hamming space. Iterative Quantization (ITQ) [13] attempts to generate the hash codes by minimizing the quantization error of mapping data to the vertices of a binary hypercube. Topology Preserving Hashing (TPH) [16], [18] performs hash learning by preserving consistent neighborhood rankings of data points in Hamming space.

While the unsupervised methods are promising for retrieving neighbors under some kind of distance metric (such as the L2 distance), the neighbors in the feature space cannot optimally reflect the semantic similarity. Therefore, supervised hashing methods have been proposed, which utilize semantic information such as image labels to generate effective hash codes. For example, Binary Reconstructive Embedding (BRE) [20] learns the hash codes by minimizing the reconstruction error between the original distances and the reconstructed distances in the Hamming space. Supervised Hashing with Kernels (KSH) [21] learns the hash codes by preserving the pairwise relations between data samples provided by labels. Semi-supervised Hashing (SSH) [6] learns the hash codes by minimizing the empirical error over the labeled data while maximizing the information entropy of the generated hash codes over the labeled and unlabeled data. In [22], Norouzi et al. propose to learn hash functions based on a triplet ranking loss that can preserve relative semantic similarity. Ranking Preserving Hashing (RPH) [23] and Order Preserving Hashing (OPH) [24] learn the hash codes by preserving the ranking information, which is obtained based on the number of shared semantic labels between data examples. Supervised Discrete Hashing (SDH) [25] leverages label information to obtain hash codes by integrating hash code generation and classifier training.

For most of the unsupervised and supervised hashing methods, the input images are represented by hand-crafted features (e.g., GIST [26]), which cannot optimally represent the semantic information of images. Inspired by the fast progress of deep networks on image classification [27], some deep hashing methods have been proposed [5], [8]-[12] to take advantage of the superior feature representation power of deep networks. Convolutional Neural Network Hashing (CNNH) [8] is a supervised hashing method based on convolutional networks. CNNH is a two-stage framework, which learns fixed hash codes in the first stage, and learns the hash functions and image representations in the second stage. Although the learned hash codes can guide the feature learning, the learned image representations cannot provide feedback for learning better hash codes. To overcome the shortcomings of the two-stage learning scheme, some approaches have been proposed to perform feature learning and hash code learning simultaneously. Network in Network Hashing (NINH) [5] designs a triplet ranking loss to capture the relative similarities of images. NINH is a one-stage supervised method, thus the image representation learning and hash code learning can benefit each other in the deep architecture. Some similar ranking-based deep hashing methods [9], [10] have also been proposed recently, which also design the hash codes to preserve the ranking information obtained from labels. However, the existing deep hashing methods are mainly supervised methods, which heavily rely on labeled images, and the deep networks often require a large amount of labeled data to achieve better performance.
But labeling images consumes large amounts of time and human resources, which is not practical in real applications. Thus it is necessary to make full use of the unlabeled images to improve the deep hashing performance. In this paper, we propose the semi-supervised deep hashing (SSDH) method, which learns the hash codes by preserving the semantic similarity and the underlying data structures simultaneously. To the best of our knowledge, our proposed SSDH is the first deep hashing method that can perform hash code learning and feature learning simultaneously in a semi-supervised fashion. The main contributions of this paper can be summarized as follows:

Existing deep hashing methods only design the loss functions to preserve the semantic similarity but ignore the underlying data structures, which are essential to capture the semantic neighborhoods and return meaningful nearest neighbors for effective hashing [15]. To address this problem, we propose a semi-supervised loss function to exploit the underlying structures of the unlabeled data for more effective hashing. By jointly minimizing the empirical error on the labeled data as well as the embedding error on both the labeled and unlabeled data, the learned hash codes can not only preserve the semantic similarity, but also capture the meaningful neighbors on the underlying data structures. Thus our proposed SSDH method can benefit greatly from both the labeled and unlabeled data.

A deep network structure is designed to make full use of both the labeled and unlabeled data to learn hash codes and image features simultaneously. In our deep network, the representation learning layers extract discriminative deep features from the input images. Then the hash code learning layer and the classification layer are connected, of which the former maps the image features into q-dimensional hash vectors, and the latter predicts the pseudo labels of the unlabeled data. The last representation learning layer, the hash code learning layer and the classification layer are connected to the loss layer, and the proposed semi-supervised loss function is designed to model the semantic similarity constraints while preserving the underlying data structures of the image features. The whole network is trained in an end-to-end way, thus our proposed method can perform hash code learning and feature learning simultaneously.

The traditional graph construction methods are time consuming due to their O(n^2) complexity, which is intractable for large scale data. To address this issue, we propose an online graph construction strategy. Rather than constructing the similarity graph over all the data in advance, our online graph construction strategy constructs the similarity graph within a mini-batch during the training procedure. On the one hand, the online graph construction strategy only needs to consider a much smaller mini-batch of data, which is efficient and suitable for the batch-based training of deep networks. On the other hand, the online graph construction can benefit from the effective feature

representations extracted from deep networks.

Experiments on 4 widely used datasets show that our proposed SSDH method achieves the best search accuracy compared with state-of-the-art methods. The rest of this paper is organized as follows. Section II presents a brief review of related deep hashing methods. Section III introduces our proposed SSDH method. Section IV shows our experiments on the 4 widely used image datasets, and Section V concludes this paper.

II. RELATED WORK

The hashing methods have become one of the most popular methods for similarity retrieval on large scale image datasets. In recent years, some hashing methods have been proposed to perform hash learning with deep networks and have made considerable progress. In this section, we briefly review some representative deep hashing methods.

A. Convolutional Neural Network Hashing

Convolutional Neural Network Hashing (CNNH) [8] decomposes the hash learning into two stages: a hash code learning stage and a hash function learning stage. Given the training images I = {I_1, I_2, ..., I_n}, in the hash code learning stage (Stage 1), CNNH learns an approximate hash code for each training image by optimizing the following loss function:

\min_H \| S - \frac{1}{q} H H^T \|_F^2    (1)

where \|\cdot\|_F denotes the Frobenius norm; S denotes the semantic similarity of the image pairs in I, in which S_{ij} = 1 when images I_i and I_j are semantically similar, and otherwise S_{ij} = -1; H ∈ {-1, 1}^{n×q} denotes the approximate hash code matrix, each row of which is a q-dimensional hash code. Following KSH [21], which utilizes the equivalence between optimizing the code inner products and the Hamming distances, CNNH also tries to approximately reconstruct the similarity matrix S by the hash code inner product \frac{1}{q} H H^T, thus optimizing equation (1) means minimizing the reconstruction error. CNNH first relaxes the integer constraints on H and randomly initializes H ∈ [-1, 1]^{n×q}, then optimizes equation (1) using a coordinate descent algorithm, which is equivalent to optimizing the following equation:

\min_{H_{\cdot j}} \| H_{\cdot j} H_{\cdot j}^T - (q S - \sum_{c \neq j} H_{\cdot c} H_{\cdot c}^T) \|_F^2    (2)

where H_{\cdot j} and H_{\cdot c} denote the j-th and the c-th columns of H respectively. In the hash function learning stage (Stage 2), CNNH utilizes deep networks to learn the image feature representations and hash functions. Specifically, CNNH adopts the deep framework in [28] as its basic network, and designs an output layer with softmax activation to generate q-dimensional hash codes. CNNH trains the designed deep network in a supervised way, in which the approximate hash codes learned in Stage 1 are used as the ground truth. In addition, if the discrete class labels of the training images are available, CNNH also incorporates these image labels to learn the hash functions. Based on the deep network, CNNH learns deep features and hash functions at the same time. However, CNNH is a two-stage framework, where the learned deep features in Stage 2 cannot help to improve the approximate hash code learning in Stage 1, which limits the performance of hash learning.

B. Network in Network Hashing

Different from the two-stage framework of CNNH [8], Network in Network Hashing (NINH) [5] integrates the image representation learning and hash code learning into a one-stage framework.
NINH constructs the deep framework for hash learning based on the Network in Network architecture [29], with a shared sub-network composed of several stacked convolutional layers to extract the image features, as well as a divide-and-encode module encouraged by a sigmoid activation function and a piece-wise threshold function to output the binary hash codes. During the learning process, without generating approximate hash codes in advance, NINH designs a triplet ranking loss function to exploit the relative similarity of the training images to directly guide the hash learning:

\ell_{triplet}(I, I^+, I^-) = \max(0, 1 - (\|b(I) - b(I^-)\|_H - \|b(I) - b(I^+)\|_H))
s.t. b(I), b(I^+), b(I^-) ∈ {0, 1}^q    (3)

where I, I^+ and I^- specify the triplet constraint that image I is more similar to image I^+ than to image I^- based on the image labels; b(\cdot) denotes the binary hash code, and \|\cdot\|_H denotes the Hamming distance. For easy optimization of equation (3), NINH applies two relaxation tricks: relaxing the integer constraint on the binary hash codes and replacing the Hamming distance with the Euclidean distance.

C. Bit-scalable Deep Hashing

The Bit-scalable Deep Hashing (BS-DRSCH) method [9] is also a one-stage deep hashing method that constructs an end-to-end architecture for hash learning. In addition, BS-DRSCH is able to flexibly manipulate the hash code length by weighing each bit of the hash codes. BS-DRSCH defines a weighted Hamming affinity to measure the dissimilarity between two hash codes:

H(b(I_i), b(I_j)) = \sum_{k=1}^{q} w_k^2 \, b_k(I_i) b_k(I_j)    (4)

where I_i and I_j are training images; b(\cdot) denotes the q-dimensional binary hash code, and b_k(\cdot) denotes the k-th hash bit in b(\cdot); w_k is the weight value attached to b_k(\cdot), which is to be learned and is expected to represent the significance of b_k(\cdot). Similar to NINH [5], BS-DRSCH organizes the training images into triplet tuples, then formulates the loss function as follows:

loss = \sum_{I, I^+, I^-} \max(H(b(I), b(I^+)) - H(b(I), b(I^-)), C) + \frac{1}{2} \sum_{i,j} H(b(I_i), b(I_j)) S_{ij}    (5)

where S_{ij} denotes the similarity between images I_i and I_j based on labels. The defined loss is the sum of two terms: the first term is a max-margin term, which is designed to preserve the ranking orders; the second term is a regularization term, which is defined to keep the pairwise adjacency relations. BS-DRSCH builds a deep convolutional neural network to perform hash learning, in which the convolutional layers are used to extract deep features, while the fully-connected layer followed by a tanh-like layer is used to generate the hash codes. For learning the weight values {w_k}, k = 1, ..., q, BS-DRSCH designs an element-wise layer on the top of the deep architecture, which is able to weigh each bit of the hash code. Both NINH [5] and BS-DRSCH [9] are supervised deep hashing methods, and they only design the loss functions to preserve the semantic similarity based on the labeled training data, but ignore the underlying data structures of the unlabeled data, which are essential to capture the semantic neighborhoods for effective hashing.

Fig. 1. Overview of our proposed deep network architecture. The training set is composed of both the labeled and unlabeled images (the notations y_1, y_2 and y_3 denote different image labels). The representation learning layers extract discriminative deep features for the input images, which are also used for online graph construction. Then the hash code learning layer and the classification layer are connected, of which the former maps the image features into q-dimensional hash vectors, and the latter predicts the pseudo labels of the unlabeled data. The last representation learning layer, the hash code learning layer and the classification layer are all connected to the semi-supervised loss layer. In addition, a softmax loss layer is also used to train the classification layer.

III. SEMI-SUPERVISED DEEP HASHING

Given a set of n images X composed of the labeled images (X_L, Y_L) = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} and the unlabeled images X_U = {x_{l+1}, x_{l+2}, ..., x_n}, where x_i ∈ R^D, i = 1, ..., n and y_i ∈ {1, 2, ..., C}, i = 1, ..., l, Y_L is the label information. The goal of hashing methods is to learn a mapping function H: X → {-1, 1}^q, which encodes an image x ∈ X into a q-dimensional binary code H(x) in the Hamming space, while preserving the semantic similarity of images. Existing deep hashing methods [5], [8], [9] are designed for supervised scenarios, which only use the labeled data (X_L, Y_L) to preserve the semantic similarity but ignore the underlying data structures. In this paper, we propose a novel deep hashing method called semi-supervised deep hashing (SSDH) to learn better hash codes by preserving the semantic similarity and the underlying data structures simultaneously. As shown in Fig. 1, the deep architecture of SSDH includes three main components: 1) A deep convolutional neural network is designed to learn discriminative deep features for images, which takes both the labeled and unlabeled image data as input. 2) A hash code learning layer and a classification layer are connected, of which the former is used to map the image features into q-dimensional hash codes, and the latter is used to predict the pseudo labels of the unlabeled data. 3) A semi-supervised loss is designed to preserve the semantic similarity and meanwhile capture the neighborhood structures for effective hashing, which is formulated as minimizing the empirical error on the labeled data as well as the embedding error on both the labeled and unlabeled data.
These three components are integrated into a unified framework, thus our proposed SSDH can perform image representation learning and hash code learning simultaneously. In the following parts of this section, we first introduce the proposed deep network architecture, and then we introduce the proposed semi-supervised loss function as well as the training of the proposed network.

A. The Proposed Deep Network

The deep architecture of the proposed SSDH is shown in Fig. 1. We adopt the Fast CNN architecture (denoted as VGG-F for simplicity) from [30] as the base network. The first seven layers are configured the same as those in VGG-F [30], which build the representation learning layers to extract the discriminative image features. We mainly focus on designing the hash code learning layer and the semi-supervised loss

function. The eighth layer is the hash code learning layer, which is constructed by a fully-connected layer followed by a sigmoid activation function, and its outputs are hash codes defined as:

h(x) = sigmoid(W^T f(x) + v)    (6)

where f(x) is the deep feature extracted from the last representation learning layer, i.e., the output of the fully-connected layer fc7, W denotes the weights in the hash code learning layer, and v is the bias parameter. Through the hash code learning layer, the image features f(x) are mapped into [0, 1]^q. Since the hash codes h(x) ∈ [0, 1]^q are continuous real values, we apply a thresholding function g(x) = sgn(x - 0.5) to obtain binary codes:

b_k(x) = g(h_k(x)) = sgn(h_k(x) - 0.5),  k = 1, 2, ..., q    (7)

The ninth layer is the classification layer, which predicts the pseudo labels of the unlabeled data; we use the softmax loss function to train the classification layer. The outputs of the last representation learning layer (fc7), the hash code learning layer (fc8) and the classification layer (fc9) are connected with the semi-supervised loss function, so that the proposed loss function can make full use of the underlying data structures of the extracted image features and the pseudo labels to improve the hash learning performance. The whole network is an end-to-end structure, which can perform hash code learning and feature learning simultaneously.

TABLE I
DETAILED NETWORK CONFIGURATIONS OF OUR PROPOSED SSDH MODEL

Layers | Configuration
conv1  | filter 64x11x11, st. 4x4, pad 0, LRN, pool 2x2
conv2  | filter 256x5x5, st. 1x1, pad 2, LRN, pool 2x2
conv3  | filter 256x3x3, st. 1x1, pad 1
conv4  | filter 256x3x3, st. 1x1, pad 1
conv5  | filter 256x3x3, st. 1x1, pad 1, pool 2x2
fc6    | 4096, dropout
fc7    | 4096, dropout
fc8    | q-dimensional, sigmoid
fc9    | c-dimensional

The detailed configurations of our network structure are shown in Table I. For the five convolutional layers: the filter parameter specifies the number and the receptive field size of the convolutional kernels as num x size x size; the st. and pad parameters specify the convolution stride and spatial padding respectively; LRN indicates whether Local Response Normalization (LRN) [27] is used; and pool indicates whether max-pooling downsampling is applied, and specifies the pooling window size as size x size. For the fully-connected layers, we specify their dimensionality, of which the first two are 4096, the hash layer is the same as the hash code length q, and the classification layer is the same as the class number c. The dropout entry indicates whether the fully-connected layer is regularized by dropout [27]. The hash code learning layer takes sigmoid as its activation function, while the other weighted layers take the Rectified Linear Unit (ReLU) [27] as activation functions. Note that although we adopt VGG-F [30] as our base network, it is practicable to replace the base network with other deep convolutional network structures, such as the Network in Network (NIN) [29] model or the AlexNet [27] model.

B. Semi-supervised Deep Hashing Learning

Most of the existing deep hashing methods are designed for the supervised scenario, and construct their loss functions to preserve the pairwise similarity [8] or ranking similarity [5], [9], [10] based on the label annotations. The deep networks often require a large amount of labeled data to achieve better performance; however, labeling images consumes large amounts of time and human resources, which is not practical in real-world application scenarios.
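Before turning to the loss design, a minimal NumPy sketch may help make the hash mapping of equations (6) and (7) concrete: a fully-connected projection with sigmoid activation followed by thresholding at 0.5. It is an illustration only; in SSDH the weights W and bias v are learned end-to-end inside the network, and the shapes and random stand-in parameters below are assumptions.

```python
import numpy as np

def hash_layer(f_x, W, v):
    """Hash code learning layer of equation (6): h(x) = sigmoid(W^T f(x) + v)."""
    z = W.T @ f_x + v
    return 1.0 / (1.0 + np.exp(-z))        # relaxed codes h(x) in [0, 1]^q

def binarize(h):
    """Thresholding of equation (7): b_k(x) = sgn(h_k(x) - 0.5), giving codes in {-1, +1}^q."""
    return np.where(h >= 0.5, 1, -1)

# Illustrative usage with random stand-ins for the learned parameters.
d, q = 4096, 48                             # fc7 feature size and hash code length
rng = np.random.default_rng(0)
f_x = rng.normal(size=d)                    # fc7 feature of one image (assumed given)
W, v = 0.01 * rng.normal(size=(d, q)), np.zeros(q)
b = binarize(hash_layer(f_x, W, v))         # 48-bit binary code
```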
To address the problem of scarce labeled data, we propose a semi-supervised loss function, which can jointly preserve the semantic similarity and capture the neighborhood constraints on the underlying data structures. Our proposed semi-supervised loss function is composed of three terms: a supervised ranking term applied on the labeled data, a semi-supervised embedding term applied on both the labeled and unlabeled data, and a pseudo-label term applied on the unlabeled data. The first term preserves the semantic similarity by minimizing the empirical error, the second term captures the underlying data structures by minimizing the embedding error, and the third term encourages the decision boundary to lie in low density regions, which is complementary to the semi-supervised embedding term. Thus the proposed SSDH model can fully benefit from both the labeled and unlabeled data. We introduce these three terms separately in the following parts of this subsection.

1) Supervised Ranking Term: Inspired by [5], [9], we formulate the supervised ranking term as a triplet ranking regularization to preserve the relative semantic similarity provided by the label annotations. For the labeled images (X_L, Y_L), we sample a set of triplet tuples depending on the labels, T = {(x_i^L, x_i^{L+}, x_i^{L-})}_{i=1}^{t}, in which x_i^L and x_i^{L+} are two similar images with the same labels, while x_i^L and x_i^{L-} are two dissimilar images with different labels, and t is the number of sampled triplet tuples. For the triplet tuple (x_i^L, x_i^{L+}, x_i^{L-}), i = 1, ..., t, the supervised ranking term is defined as:

J_L(x_i^L, x_i^{L+}, x_i^{L-}) = \max(0, m_t + \|b(x_i^L) - b(x_i^{L+})\|_H - \|b(x_i^L) - b(x_i^{L-})\|_H)    (8)

where \|\cdot\|_H denotes the Hamming distance, and the constant parameter m_t defines the margin between the relative similarities of the two pairs (x_i^L, x_i^{L+}) and (x_i^L, x_i^{L-}); that is to say, we expect the Hamming distance of the dissimilar pair (x_i^L, x_i^{L-}) to be larger than that of the similar pair (x_i^L, x_i^{L+}) by at least m_t. Thus minimizing J_L reaches our goal of preserving the semantic ranking information provided by labels. In equation (8), the binary hash codes and the Hamming distance make it hard to optimize directly. Similar to NINH [5], we first replace the Hamming distance with the Euclidean distance, then

we relax the binary hash codes b(x) to the continuous real-valued hash codes h(x). Equation (8) can then be rewritten as:

J_L(x_i^L, x_i^{L+}, x_i^{L-}) = \max(0, m_t + \|h(x_i^L) - h(x_i^{L+})\|^2 - \|h(x_i^L) - h(x_i^{L-})\|^2)    (9)

2) Semi-supervised Embedding Term: Although we can train the network solely based on the supervised ranking term, deep models with multiple layers often require a large amount of labeled data to achieve better performance. Usually it is hard to collect abundant labeled images, since labeling images consumes large amounts of time and human resources. Thus it is necessary to improve the search accuracy by utilizing the unlabeled data, which are more convenient to obtain. The manifold assumption [31] states that data points within the same manifold are likely to have the same labels. Based on this assumption, graph-based embedding algorithms are often used to capture the underlying manifold structures of data. Inspired by the semi-supervised embedding algorithms in deep networks [32], we add an embedding term which captures the underlying structures of both the labeled and unlabeled data.

Modeling the neighborhood structures of data often requires building a neighborhood graph. A neighborhood graph is defined as G = (V, E), where the vertices V represent the data points, and the edges E, which are often represented by an n x n adjacency matrix A, indicate the adjacency relations among the data points in a specified space (usually the feature space). But for large scale data, building the neighborhood graph on all the data is time and memory consuming, since the time complexity and memory cost of building the k-NN graph on all the data are both O(n^2), which is intractable for large n. Instead of constructing the graph offline on all the data, we propose to construct the graph online within a mini-batch by using the features extracted from the proposed deep network. More specifically, given the mini-batch of r images, B = {x_i}_{i=1}^{r}, and the deep features {f(x_i) : x_i ∈ B}, we use the deep features to calculate the k nearest neighbors NN_k(i) for each image x_i, and then construct the adjacency matrix as:

A(i, j) = 1,  if I(i, j) = 0 and x_j ∈ NN_k(i)
A(i, j) = 0,  if I(i, j) = 0 and x_j ∉ NN_k(i)
A(i, j) = -1, if I(i, j) = 1    (10)

where I(i, j) is an indicator function: I(i, j) = 1 if x_i and x_j are both labeled images; otherwise I(i, j) = 0. A(i, j) = -1 means that we do not build the neighborhood graph between two labeled image points, since we can use the supervised ranking term to model their similarity relations. Because the size of a mini-batch is much smaller than the size of the whole training data, i.e., r << n, the graph is efficient to compute online. Furthermore, building the neighborhood graph online can benefit from the representative deep features, since the training procedure of the deep network generates more and more discriminative feature representations, which leads to capturing the semantic neighbors more accurately. Based on the constructed adjacency matrix, we expect the neighbor image pairs to have small distances, while the non-neighbor image pairs have large distances. We apply a pairwise contrastive regularization [33] to achieve this goal.
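Before writing out the embedding term, a minimal sketch of the online graph construction of equation (10) may help: it builds the adjacency matrix from the fc7 features of one mini-batch (a NumPy sketch under assumed inputs; the function and variable names are illustrative, not taken from the original implementation).

```python
import numpy as np

def online_adjacency(feats, is_labeled, k):
    """Mini-batch adjacency matrix of equation (10).

    feats      : (r, d) fc7 features of the r images in the mini-batch
    is_labeled : (r,) boolean mask, True for labeled images
    k          : number of nearest neighbors used to define NN_k(i)
    Returns A with A[i, j] in {1, 0, -1}:
      1  -> x_j is among the k nearest neighbors of x_i (and not both labeled)
      0  -> not a neighbor (and not both labeled)
      -1 -> both images are labeled, so the supervised ranking term handles them
    """
    r = feats.shape[0]
    # Pairwise squared Euclidean distances between deep features.
    sq = np.sum(feats ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    np.fill_diagonal(dist, np.inf)          # exclude self-matches

    A = np.zeros((r, r), dtype=int)
    for i in range(r):
        nn = np.argsort(dist[i])[:k]        # NN_k(i) within the mini-batch
        A[i, nn] = 1
    both_labeled = np.outer(is_labeled, is_labeled)
    A[both_labeled] = -1                    # I(i, j) = 1 case
    return A
```

Since only r images are involved and r << n, the O(r^2) distance computation here is negligible compared with the forward and backward passes of the network.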
For one image pair (x_i, x_j), our semi-supervised embedding term is then formulated as:

J_U(x_i, x_j) = \|b(x_i) - b(x_j)\|_H,                 if A(i, j) = 1
J_U(x_i, x_j) = \max(0, m_p - \|b(x_i) - b(x_j)\|_H),  if A(i, j) = 0    (11)

where J_U(x_i, x_j) is the contrastive loss between the two images x_i and x_j based on the neighborhood graph, \|\cdot\|_H represents the Hamming distance, and the constant m_p is a margin parameter for the distance metric of non-neighbor image pairs. With the neighborhood graphs constructed on the mini-batches, the semi-supervised embedding term J_U captures the embedding error on both the labeled and unlabeled data. We minimize J_U to preserve the underlying neighborhood structures. Similar to equation (8), we adopt the same relaxation strategy on equation (11), and rewrite the embedding term as:

J_U(x_i, x_j) = \|h(x_i) - h(x_j)\|^2,                 if A(i, j) = 1
J_U(x_i, x_j) = \max(0, m_p - \|h(x_i) - h(x_j)\|)^2,  if A(i, j) = 0    (12)

3) Pseudo-label Term: Apart from our graph approach, D.-H. Lee [34] proposes the pseudo-label method for the image classification task, which can also make use of the unlabeled data. The pseudo-label method assigns to each unlabeled data point the class with the maximum predicted probability as its true label. The maximum a posteriori estimation encourages this approach to place the decision boundary in low density regions, which improves the generalization of the classification model. Different from our proposed graph approach, which encourages neighbor image pairs to have small distances and non-neighbor image pairs to have large distances, the pseudo-label method encourages each image to have a 1-of-C prediction; these two approaches are complementary to each other. Thus we integrate the pseudo-label method into our framework. More specifically, in our network structure, the top layers have two branches: one is the hash code learning layer, and the other is a classification layer, which predicts the pseudo labels of the unlabeled data. We also take the class with the maximum predicted probability as the true label. Then we use the contrastive loss function to model the semantic similarity between the pseudo labels. Similar to equation (12), the pseudo-label term is defined as follows:

J_P(x_i, x_j) = \|h(x_i) - h(x_j)\|^2,                 if PL_i = PL_j
J_P(x_i, x_j) = \max(0, m_p - \|h(x_i) - h(x_j)\|)^2,  if PL_i ≠ PL_j    (13)

where PL_i and PL_j are the predicted pseudo labels.

4) The Semi-supervised Loss: Combining the supervised ranking term, the semi-supervised embedding term and the

pseudo-label term, we define the semi-supervised loss function as:

J = \sum_{i=1}^{t} J_L(x_i^L, x_i^{L+}, x_i^{L-}) + λ \sum_{i,j=1}^{n} J_U(x_i, x_j) + µ \sum_{i,j=1}^{n} J_P(x_i, x_j)
s.t. x_i^L, x_i^{L+}, x_i^{L-} ∈ X_L,  x_i, x_j ∈ X    (14)

where t is the number of triplet tuples sampled from the labeled images, and λ and µ are the balance parameters. In order to keep the neighbor and non-neighbor image pairs balanced, we only use a small part of the image pairs obtained from the neighborhood graph to calculate the loss. More specifically, for each image x_i, we randomly select a neighbor pair (x_i, x_i^+) and a non-neighbor pair (x_i, x_i^-) for the semi-supervised embedding term, and we randomly select a positive pair (x_i^L, x_{p_i}^{L+}) and a negative pair (x_i^L, x_{p_i}^{L-}) for the pseudo-label term; then we rewrite the loss function as:

J = \sum_{i=1}^{t} J_L(x_i^L, x_i^{L+}, x_i^{L-}) + λ \sum_{i=1}^{n} (J_U(x_i, x_i^+) + J_U(x_i, x_i^-)) + µ \sum_{i=1}^{n} (J_P(x_i^L, x_{p_i}^{L+}) + J_P(x_i^L, x_{p_i}^{L-}))    (15)

In our semi-supervised loss function, the supervised ranking term is applied on the labeled data to preserve the semantic similarity provided by image labels, while the semi-supervised embedding term and the pseudo-label term are applied on both the labeled and unlabeled data to preserve the underlying data structures. Thus the proposed loss function fits the semi-supervised scenario, and makes full use of both the labeled and unlabeled data to improve the hash learning performance.

C. Network Training

The proposed SSDH network is trained in a mini-batch way, that is to say, we only use a mini-batch of training images to update the network's parameters in one iteration. For each iteration, the input image batch B ⊂ X is composed of some labeled images (X_L^B, Y_L^B) = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} and some unlabeled images X_U^B = {x_{m+1}, x_{m+2}, ..., x_r}, where m denotes the number of labeled images in the mini-batch, and r denotes the total number of images in the mini-batch. The proposed network structure has two branches on the top, i.e., the hash code learning layer and the classification layer; we train these two layers simultaneously. For the classification layer, we use the softmax loss function to model the prediction errors on the labeled images; minimizing the softmax loss makes the predicted pseudo labels more accurate. For the hash code learning layer, with equations (9), (12), (13) and (15), we adopt stochastic gradient descent (SGD) to train the deep network. For each triplet tuple and image pair, we use the back-propagation algorithm (BP) to update the parameters of the network. More specifically, according to equation (9), when the hinge loss is positive, we can compute the gradients for each triplet tuple (x_i^L, x_i^{L+}, x_i^{L-}) with regard to h(x_i^L), h(x_i^{L+}) and h(x_i^{L-}) respectively as:

∂J_L / ∂h(x_i^L) = 2(h(x_i^{L-}) - h(x_i^{L+}))
∂J_L / ∂h(x_i^{L+}) = 2(h(x_i^{L+}) - h(x_i^L))
∂J_L / ∂h(x_i^{L-}) = 2(h(x_i^L) - h(x_i^{L-}))    (16)

Since the semi-supervised embedding term and the pseudo-label term have the same form, according to equations (12) and (13), for the neighbor pair (x_i, x_i^+) (or the positive pseudo-label pair (x_i^L, x_{p_i}^{L+})), we compute the gradients w.r.t. h(x_i) and h(x_i^+) respectively as:

∂J_U / ∂h(x_i) = 2(h(x_i) - h(x_i^+))
∂J_U / ∂h(x_i^+) = 2(h(x_i^+) - h(x_i))    (17)

and for the non-neighbor pair (x_i, x_i^-) (or the negative pseudo-label pair (x_i^L, x_{p_i}^{L-})), we compute the gradients w.r.t.
h(x_i) and h(x_i^-) respectively as:

∂J_U / ∂h(x_i) = -2(m_p - \|h(x_i) - h(x_i^-)\|) \cdot sgn(h(x_i) - h(x_i^-))
∂J_U / ∂h(x_i^-) = -2(m_p - \|h(x_i) - h(x_i^-)\|) \cdot sgn(h(x_i^-) - h(x_i))    (18)

which hold when the margin in equation (12) is violated; otherwise the gradients are zero. These derivative values can be fed into the network via the back-propagation algorithm to update the parameters of each layer in the deep network. The training procedure ends when the loss converges or a predefined maximal iteration number is reached. After the deep model is trained, we can generate a q-bit binary code for any input image. For an input image x, it is first encoded into a hash code h(x) ∈ [0, 1]^q by the trained model; then we can compute the binary code for hashing by equation (7). In this way, the deep network can be trained end-to-end in a semi-supervised fashion, newly input images can be encoded into binary hash codes by the deep model, and we can perform efficient image retrieval via fast Hamming distance calculation between the binary hash codes.

IV. EXPERIMENTS

In this section, we introduce the experiments conducted on 4 widely used datasets (MNIST, CIFAR-10, NUS-WIDE and MIR Flickr) with 7 state-of-the-art methods, including the unsupervised methods LSH [4], SH [7] and ITQ [13], and the supervised methods SDH [25], CNNH [8], NINH [5] and DRSCH [9]. LSH, SH, ITQ and SDH are traditional hashing methods, which take hand-crafted image features as input to learn hash functions, while CNNH, NINH, DRSCH and the proposed SSDH approach are deep hashing methods, which take raw image pixels as input to conduct hash learning. In the following of this section, we first introduce the datasets and experiment settings, and then we compare the proposed approach with the state-of-the-art methods. To further verify each term in

the proposed semi-supervised loss function, we also present the results of only using the first term (the supervised ranking term), of using the first and second terms (the supervised ranking term and the semi-supervised embedding term), and of using all three terms (our SSDH).

Fig. 2. The comparison results on the MNIST dataset: (a) precision curves within top 500 retrieved samples w.r.t. different lengths of hash codes; (b) precision curves with 48-bit hash codes w.r.t. different numbers of top retrieved samples; (c) precision-recall curves of Hamming ranking with 48-bit hash codes.

A. Datasets

We conduct experiments on 4 widely used image retrieval datasets:

The MNIST dataset contains 70,000 gray scale handwritten digit images from 0 to 9 with a resolution of 28x28. Following [8], 1,000 images are randomly selected as the query set (100 images per class). For the unsupervised methods, all the remaining images are used as the training set. For the supervised methods, 5,000 images (500 images per class) are randomly selected from the remaining images as the training set.

The CIFAR-10 dataset consists of 60,000 color images with a resolution of 32x32 from 10 classes, each of which has 6,000 images. For a fair comparison, following [5], [8], 1,000 images are randomly selected as the query set (100 images per class). For the unsupervised methods, all the remaining images are used as the training set. For the supervised methods, 5,000 images (500 images per class) are further randomly selected from the remaining images to form the training set.

The NUS-WIDE [35] dataset contains nearly 270,000 images, and each image is associated with one or multiple labels from 81 semantic concepts. Following [5], [8], only the 21 most frequent concepts are used, where each concept has at least 5,000 images. Following [5], [8], 2,100 images are randomly selected as the query set (100 images per concept). For the unsupervised methods, all the remaining images are used as the training set. For the supervised methods, 500 images from each of the 21 concepts are randomly selected to form the training set.

The MIR Flickr [36] dataset consists of 25,000 images collected from Flickr, and each image is associated with one or multiple labels from 38 semantic concepts. 1,000 images are randomly selected as the query set. For the unsupervised methods, all the remaining images are used as the training set. For the supervised methods, 5,000 images are randomly selected from the remaining images to form the training set.

B. Experiment Settings

We implement the proposed SSDH method based on the open-source framework Caffe [37]. The parameters of our network are initialized with the VGG-F network [30], which is pre-trained on the ImageNet dataset [38]; a similar initialization strategy has been used in other deep hashing methods [10], [11]. In all experiments, our networks are trained with an initial learning rate of 0.001, and we divide the learning rate by 10 every 20,000 steps. The size of the mini-batch is 128, and weight decay is applied. For the four parameters in the proposed loss function, we simply set m_t = 1, m_p = 2, λ = 0.1 and µ = 0.1 in all the experiments.

For the proposed SSDH method, CNNH, NINH and DRSCH, we use the raw image pixels as input. The implementations of CNNH [8] and DRSCH [9] are provided by their authors, while NINH [5] is our own implementation.
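To make the role of these parameters concrete, the following minimal sketch assembles the mini-batch loss of equation (15) from the relaxed terms of equations (9), (12) and (13) with the settings above (m_t = 1, m_p = 2, λ = 0.1, µ = 0.1). It is a NumPy sketch under the assumption that the triplets and pairs have already been sampled as described in Section III-B; it is not the Caffe implementation used in the experiments.

```python
import numpy as np

m_t, m_p, lam, mu = 1.0, 2.0, 0.1, 0.1   # margins and balance weights used in all experiments

def ranking_term(h, h_pos, h_neg):
    """Relaxed triplet ranking term of equation (9) for one labeled triplet."""
    return max(0.0, m_t + np.sum((h - h_pos) ** 2) - np.sum((h - h_neg) ** 2))

def contrastive_term(h_i, h_j, is_neighbor):
    """Relaxed contrastive term of equations (12)/(13) for one image pair."""
    d = np.linalg.norm(h_i - h_j)
    return d ** 2 if is_neighbor else max(0.0, m_p - d) ** 2

def batch_loss(triplets, embed_pairs, pseudo_pairs):
    """Semi-supervised loss of equation (15) over one mini-batch.

    triplets     : list of (h, h_pos, h_neg) relaxed codes of labeled triplets
    embed_pairs  : list of (h_i, h_j, is_neighbor) pairs sampled from the online graph
    pseudo_pairs : list of (h_i, h_j, same_pseudo_label) pairs from the fc9 predictions
    """
    j_l = sum(ranking_term(*t) for t in triplets)
    j_u = sum(contrastive_term(hi, hj, nb) for hi, hj, nb in embed_pairs)
    j_p = sum(contrastive_term(hi, hj, same) for hi, hj, same in pseudo_pairs)
    return j_l + lam * j_u + mu * j_p
```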
Since the network structures of these four deep hashing methods are different from each other, for a fair comparison we use the same VGG-F network as the base structure for the four deep methods, and the network parameters of the deep hashing methods are initialized with the same pre-trained VGG-F model; thus we can perform a fair comparison among these deep hashing methods. The results of CNNH [8], NINH [5] and DRSCH [9] under this setting are referred to as CNNH*, NINH* and DRSCH* respectively. For the other compared methods without deep networks, i.e., the traditional hashing methods, we represent each image by hand-crafted features and deep features respectively. For

hand-crafted features, we represent images in CIFAR-10 and MIR Flickr by 512-dimensional GIST features, images in MNIST by 784-dimensional gray scale values, and images in NUS-WIDE by 500-dimensional bag-of-words features. For a fair comparison between the traditional methods and the deep hashing methods, we also conduct experiments on the traditional methods with features extracted from deep networks, where we extract a 4096-dimensional deep feature for each image from the activation of the second fully-connected layer (fc7) of the pre-trained VGG-F network. We denote the results of the traditional methods using hand-crafted features by LSH, SH, ITQ and SDH, and the results of the traditional methods using deep features by LSH-VGGF, SH-VGGF, ITQ-VGGF and SDH-VGGF. The results of SDH, SH and ITQ are obtained from the implementations provided by the authors of the original papers, while the results of LSH are from our own implementation.

Fig. 3. The comparison results on the CIFAR-10 dataset: (a) precision curves within top 500 retrieved samples w.r.t. different lengths of hash codes; (b) precision curves with 48-bit hash codes w.r.t. different numbers of top retrieved samples; (c) precision-recall curves of Hamming ranking with 48-bit hash codes.

TABLE II
MAP SCORES WITH DIFFERENT LENGTHS OF HASH CODES (12, 24, 32 AND 48 BITS) ON THE MNIST DATASET FOR SSDH, DRSCH* [9], NINH* [5], CNNH* [8], SDH-VGGF, ITQ-VGGF, SH-VGGF, LSH-VGGF, SDH [25], ITQ [13], SH [7] AND LSH [4].

C. Experiment Results and Analysis

To evaluate the retrieval accuracy of the proposed approach and the compared methods, we use four evaluation metrics: Mean Average Precision (MAP), precision-recall curves, precision curves w.r.t. different numbers of top retrieved samples, and precision curves within top 500 retrieved samples w.r.t. different hash code lengths, which can objectively and comprehensively evaluate the accuracy of our approach and the compared methods.

TABLE III
MAP SCORES WITH DIFFERENT LENGTHS OF HASH CODES (12, 24, 32 AND 48 BITS) ON THE CIFAR-10 DATASET FOR THE SAME SET OF METHODS.

1) Experiment results on MNIST: Table II shows the MAP scores within all the retrieved results on the MNIST dataset for different lengths of hash codes. Figs. 2(a), 2(b) and 2(c) show the precision curves within top 500 retrieved samples for different code lengths, the precision curves with 48-bit hash codes for different numbers of top retrieved samples, and the precision-recall curves of 48-bit hash codes on MNIST, respectively. We can clearly see that: (1) The proposed SSDH approach outperforms all compared state-of-the-art methods for all hash code lengths. Note that MNIST is a relatively simple dataset compared to the other 3 datasets. Although the deep method DRSCH* has achieved a high average MAP score of 95.6%, our method still achieves the highest average MAP score of 97.0%. (2) From these 3 figures, although the compared deep hashing methods achieve a high precision on the three metrics, our proposed SSDH still achieves the highest precision at all recall levels, the highest precision for all numbers of retrieved samples, as well as the highest precision within top 500 retrieved samples for all lengths of hash codes. (3) Our proposed SSDH method outperforms the other 3 supervised deep hashing methods, which demonstrates the effectiveness

of exploiting the underlying structures of the unlabeled data. (4) Comparing the traditional methods using deep features with the identical methods using hand-crafted features, we can observe that deep features can boost the performance of the traditional methods.

2) Experiment results on CIFAR-10: Table III shows the MAP scores within all the retrieved results on the CIFAR-10 dataset for different hash code lengths. We can observe from the result table that: (1) Compared to the state-of-the-art supervised deep hashing methods, the proposed SSDH approach improves the average MAP from 73.2% (DRSCH*), 67.2% (NINH*) and 56.0% (CNNH*) to 81.0%. This is because the proposed SSDH designs a semi-supervised loss function, which not only preserves the semantic similarity provided by labels, but also captures the meaningful neighborhood structures, thus improving the search accuracy. (2) SSDH, DRSCH* and NINH* significantly outperform the two-stage method CNNH*, because they learn the hash codes and image features simultaneously, which can benefit from each other, thus leading to better performance compared to CNNH*. (3) The traditional hashing methods using deep features significantly outperform the identical methods using hand-crafted features. For example, compared to SDH, SDH-VGGF improves the average MAP score from 32.2% to 49.0%, which indicates that deep features can also boost the performance of traditional hashing methods.

Fig. 3(a) shows the precision curves within top 500 retrieved samples for different lengths of hash codes; we can observe that the proposed SSDH approach achieves the best search accuracy for all hash code lengths (over 80% search precision). Fig. 3(b) shows the precision curves with 48-bit hash codes for different numbers of top retrieved samples; the proposed SSDH consistently achieves over 80% search precision as the number of retrieved results increases. Fig. 3(c) shows the precision-recall curves of Hamming ranking with 48-bit hash codes, and the proposed SSDH method still achieves the best results.

TABLE IV
MAP SCORES WITH DIFFERENT LENGTHS OF HASH CODES (12, 24, 32 AND 48 BITS) ON THE NUS-WIDE DATASET FOR THE SAME SET OF METHODS.

3) Experiment results on NUS-WIDE: Table IV shows the MAP scores within all the retrieved results on NUS-WIDE. The precision curves within top 500 retrieved samples for different lengths of hash codes are shown in Fig. 4(a). The precision curves with 48-bit hash codes for different numbers of top retrieved samples are illustrated in Fig. 4(b). The precision-recall curves of Hamming ranking with 48-bit hash codes are shown in Fig. 4(c). Similar results can be observed on NUS-WIDE. Our proposed SSDH achieves the best results on all four evaluation metrics: (1) SSDH achieves an average MAP score of 72.5%, which outperforms all the compared state-of-the-art supervised deep hashing methods DRSCH* (64.5%), NINH* (63.1%) and CNNH* (53.1%). This demonstrates the effectiveness of jointly preserving the semantic similarity and the underlying data structures. (2) The proposed SSDH achieves over 80.0% precision within the top 500 retrieved results for all code lengths, and outperforms the other state-of-the-art methods. (3) SSDH consistently achieves over 80.0% search accuracy as the number of retrieved samples increases with 48-bit hash codes, which shows the effectiveness of the proposed method on Hamming ranking. (4) SSDH achieves the best precision at all recall levels, which further demonstrates the effectiveness of the proposed approach.
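As a reference for how the reported numbers can be obtained, the following minimal sketch computes Hamming-ranking based MAP and precision@k for one query (NumPy; binary codes are assumed to lie in {-1, +1}^q as produced by equation (7), and `relevant` is assumed to mark the database items sharing a label with the query).

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query (codes in {-1, +1}^q)."""
    dist = 0.5 * (db_codes.shape[1] - db_codes @ query_code)  # Hamming distance
    return np.argsort(dist)

def average_precision(rank, relevant):
    """Average precision of one query; `relevant` is a boolean array over the database."""
    hits = relevant[rank]
    if hits.sum() == 0:
        return 0.0
    precision_at_hit = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return np.mean(precision_at_hit[hits])

def precision_at_k(rank, relevant, k=500):
    """Precision within the top-k retrieved samples."""
    return relevant[rank[:k]].mean()
```

MAP is then the mean of `average_precision` over all queries, and the precision curves correspond to `precision_at_k` evaluated at varying k or code lengths.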
TABLE V
MAP SCORES WITH DIFFERENT LENGTHS OF HASH CODES (12, 24, 32 AND 48 BITS) ON THE MIR FLICKR DATASET FOR THE SAME SET OF METHODS.

4) Experiment results on MIR Flickr: Table V shows the MAP scores within all the retrieved results on MIR Flickr. Similar results can be observed: our SSDH method achieves an average MAP score of 77.7%, which outperforms the other supervised deep hashing methods DRSCH* (73.6%), NINH* (71%) and CNNH* (65.9%). Fig. 5(a) shows the precision curves within top 500 retrieved samples for different hash code lengths; SSDH achieves over 85.0% precision for all code lengths, which shows a clear advantage over the state-of-the-art methods. Fig. 5(b) illustrates the precision curves with 48-bit hash codes for different numbers of top retrieved samples; it can be observed that SSDH steadily achieves over 85% search accuracy as the number of retrieved results increases, again a clear advantage over the state-of-the-art methods. Fig. 5(c) shows the precision-recall curves of Hamming ranking with 48-bit hash codes, and SSDH still achieves the best results. On all four evaluation metrics, it can be observed that our proposed SSDH outperforms the other state-of-the-art supervised hashing methods, which shows the benefit of jointly preserving the semantic similarity and the neighborhood structures.

In all metrics on the 4 datasets, the proposed SSDH method shows a clear advantage over the state-of-the-art hashing methods, which demonstrates the effectiveness of the proposed SSDH method, and SSDH outperforms the supervised deep hashing methods.


More information

MoonRiver: Deep Neural Network in C++

MoonRiver: Deep Neural Network in C++ MoonRiver: Deep Neural Network in C++ Chung-Yi Weng Computer Science & Engineering University of Washington chungyi@cs.washington.edu Abstract Artificial intelligence resurges with its dramatic improvement

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Rank Preserving Hashing for Rapid Image Search

Rank Preserving Hashing for Rapid Image Search 205 Data Compression Conference Rank Preserving Hashing for Rapid Image Search Dongjin Song Wei Liu David A. Meyer Dacheng Tao Rongrong Ji Department of ECE, UC San Diego, La Jolla, CA, USA dosong@ucsd.edu

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction Akarsh Pokkunuru EECS Department 03-16-2017 Contractive Auto-Encoders: Explicit Invariance During Feature Extraction 1 AGENDA Introduction to Auto-encoders Types of Auto-encoders Analysis of different

More information

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group 1D Input, 1D Output target input 2 2D Input, 1D Output: Data Distribution Complexity Imagine many dimensions (data occupies

More information

HASHING has been a hot research topic on image processing. Bilinear Supervised Hashing Based on 2D Image Features

HASHING has been a hot research topic on image processing. Bilinear Supervised Hashing Based on 2D Image Features 1 Bilinear Supervised Hashing Based on 2D Image Features Yujuan Ding, Wai Kueng Wong, Zhihui Lai and Zheng Zhang arxiv:1901.01474v1 [cs.cv] 5 Jan 2019 Abstract Hashing has been recognized as an efficient

More information

Deep Learning for Embedded Security Evaluation

Deep Learning for Embedded Security Evaluation Deep Learning for Embedded Security Evaluation Emmanuel Prouff 1 1 Laboratoire de Sécurité des Composants, ANSSI, France April 2018, CISCO April 2018, CISCO E. Prouff 1/22 Contents 1. Context and Motivation

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation

AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation Introduction Supplementary material In the supplementary material, we present additional qualitative results of the proposed AdaDepth

More information

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016 Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised

More information

arxiv: v1 [cond-mat.dis-nn] 30 Dec 2018

arxiv: v1 [cond-mat.dis-nn] 30 Dec 2018 A General Deep Learning Framework for Structure and Dynamics Reconstruction from Time Series Data arxiv:1812.11482v1 [cond-mat.dis-nn] 30 Dec 2018 Zhang Zhang, Jing Liu, Shuo Wang, Ruyue Xin, Jiang Zhang

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

Deep Face Recognition. Nathan Sun

Deep Face Recognition. Nathan Sun Deep Face Recognition Nathan Sun Why Facial Recognition? Picture ID or video tracking Higher Security for Facial Recognition Software Immensely useful to police in tracking suspects Your face will be an

More information

Learning Discrete Representations via Information Maximizing Self-Augmented Training

Learning Discrete Representations via Information Maximizing Self-Augmented Training A. Relation to Denoising and Contractive Auto-encoders Our method is related to denoising auto-encoders (Vincent et al., 2008). Auto-encoders maximize a lower bound of mutual information (Cover & Thomas,

More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes

More information

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University

Approximate Nearest Neighbor Search. Deng Cai Zhejiang University Approximate Nearest Neighbor Search Deng Cai Zhejiang University The Era of Big Data How to Find Things Quickly? Web 1.0 Text Search Sparse feature Inverted Index How to Find Things Quickly? Web 2.0, 3.0

More information

Adaptive Hash Retrieval with Kernel Based Similarity

Adaptive Hash Retrieval with Kernel Based Similarity Adaptive Hash Retrieval with Kernel Based Similarity Xiao Bai a, Cheng Yan a, Haichuan Yang a, Lu Bai b, Jun Zhou c, Edwin Robert Hancock d a School of Computer Science and Engineering, Beihang University,

More information

arxiv: v3 [cs.cv] 3 Dec 2017

arxiv: v3 [cs.cv] 3 Dec 2017 Structured Deep Hashing with Convolutional Neural Networks for Fast Person Re-identification Lin Wu, Yang Wang ISSR, ITEE, The University of Queensland, Brisbane, QLD, 4072, Australia The University of

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Advanced Video Analysis & Imaging

Advanced Video Analysis & Imaging Advanced Video Analysis & Imaging (5LSH0), Module 09B Machine Learning with Convolutional Neural Networks (CNNs) - Workout Farhad G. Zanjani, Clint Sebastian, Egor Bondarev, Peter H.N. de With ( p.h.n.de.with@tue.nl

More information

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images Marc Aurelio Ranzato Yann LeCun Courant Institute of Mathematical Sciences New York University - New York, NY 10003 Abstract

More information

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

More information

Semantic Segmentation. Zhongang Qi

Semantic Segmentation. Zhongang Qi Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in

More information

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Intro to Deep Learning Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn Why this class? Deep Features Have been able to harness the big data in the most efficient and effective

More information

Learning Label Preserving Binary Codes for Multimedia Retrieval: A General Approach

Learning Label Preserving Binary Codes for Multimedia Retrieval: A General Approach 1 Learning Label Preserving Binary Codes for Multimedia Retrieval: A General Approach KAI LI, GUO-JUN QI, and KIEN A. HUA, University of Central Florida Learning-based hashing has received a great deal

More information

arxiv: v1 [cs.lg] 22 Feb 2016

arxiv: v1 [cs.lg] 22 Feb 2016 Noname manuscript No. (will be inserted by the editor) Structured Learning of Binary Codes with Column Generation Guosheng Lin Fayao Liu Chunhua Shen Jianxin Wu Heng Tao Shen arxiv:1602.06654v1 [cs.lg]

More information

Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval

Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval Ting Yao, Fuchen Long, Tao Mei,

More information

Lab meeting (Paper review session) Stacked Generative Adversarial Networks

Lab meeting (Paper review session) Stacked Generative Adversarial Networks Lab meeting (Paper review session) Stacked Generative Adversarial Networks 2017. 02. 01. Saehoon Kim (Ph. D. candidate) Machine Learning Group Papers to be covered Stacked Generative Adversarial Networks

More information

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan CENG 783 Special topics in Deep Learning AlchemyAPI Week 11 Sinan Kalkan TRAINING A CNN Fig: http://www.robots.ox.ac.uk/~vgg/practicals/cnn/ Feed-forward pass Note that this is written in terms of the

More information

Transfer Learning. Style Transfer in Deep Learning

Transfer Learning. Style Transfer in Deep Learning Transfer Learning & Style Transfer in Deep Learning 4-DEC-2016 Gal Barzilai, Ram Machlev Deep Learning Seminar School of Electrical Engineering Tel Aviv University Part 1: Transfer Learning in Deep Learning

More information

Supplementary A. Overview. C. Time and Space Complexity. B. Shape Retrieval. D. Permutation Invariant SOM. B.1. Dataset

Supplementary A. Overview. C. Time and Space Complexity. B. Shape Retrieval. D. Permutation Invariant SOM. B.1. Dataset Supplementary A. Overview This supplementary document provides more technical details and experimental results to the main paper. Shape retrieval experiments are demonstrated with ShapeNet Core55 dataset

More information

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Chapter 7. Learning Hypersurfaces and Quantisation Thresholds

Chapter 7. Learning Hypersurfaces and Quantisation Thresholds Chapter 7 Learning Hypersurfaces and Quantisation Thresholds The research presented in this Chapter has been previously published in Moran (2016). 7.1 Introduction In Chapter 1 I motivated this thesis

More information

Similarity-Preserving Binary Hashing for Image Retrieval in large databases

Similarity-Preserving Binary Hashing for Image Retrieval in large databases Universidad Politécnica de Valencia Master s Final Project Similarity-Preserving Binary Hashing for Image Retrieval in large databases Author: Guillermo García Franco Supervisor: Dr. Roberto Paredes Palacios

More information

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference

Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Detecting Burnscar from Hyperspectral Imagery via Sparse Representation with Low-Rank Interference Minh Dao 1, Xiang Xiang 1, Bulent Ayhan 2, Chiman Kwan 2, Trac D. Tran 1 Johns Hopkins Univeristy, 3400

More information

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

NEarest neighbor search plays an important role in

NEarest neighbor search plays an important role in 1 EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on knn Graph Cong Fu, Deng Cai arxiv:1609.07228v3 [cs.cv] 3 Dec 2016 Abstract Approximate nearest neighbor (ANN) search

More information

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images Marc Aurelio Ranzato Yann LeCun Courant Institute of Mathematical Sciences New York University - New York, NY 10003 Abstract

More information

Learning Deep Binary Descriptor with Multi-Quantization

Learning Deep Binary Descriptor with Multi-Quantization Learning Deep Binary Descriptor with Multi-Quantization Yueqi Duan, Jiwen Lu, Senior Member, IEEE, Ziwei Wang, Jianjiang Feng, Member, IEEE, and Jie Zhou, Senior Member, IEEE Abstract In this paper, we

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

FastText. Jon Koss, Abhishek Jindal

FastText. Jon Koss, Abhishek Jindal FastText Jon Koss, Abhishek Jindal FastText FastText is on par with state-of-the-art deep learning classifiers in terms of accuracy But it is way faster: FastText can train on more than one billion words

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Machine Learning. MGS Lecture 3: Deep Learning

Machine Learning. MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer

More information

arxiv: v1 [cs.cv] 19 Oct 2017 Abstract

arxiv: v1 [cs.cv] 19 Oct 2017 Abstract Improved Search in Hamming Space using Deep Multi-Index Hashing Hanjiang Lai and Yan Pan School of Data and Computer Science, Sun Yan-Sen University, China arxiv:70.06993v [cs.cv] 9 Oct 207 Abstract Similarity-preserving

More information

NEarest neighbor search plays an important role in

NEarest neighbor search plays an important role in 1 EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on knn Graph Cong Fu, Deng Cai arxiv:1609.07228v2 [cs.cv] 18 Nov 2016 Abstract Approximate nearest neighbor (ANN) search

More information

A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search

A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search A Unified Approach to Learning Task-Specific Bit Vector Representations for Fast Nearest Neighbor Search Vinod Nair Yahoo! Labs Bangalore vnair@yahoo-inc.com Dhruv Mahajan Yahoo! Labs Bangalore dkm@yahoo-inc.com

More information

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

More information

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage Qiuyu Zhu Shanghai University zhuqiuyu@staff.shu.edu.cn Ruixin Zhang Shanghai University chriszhang96@shu.edu.cn

More information

1 Training/Validation/Testing

1 Training/Validation/Testing CPSC 340 Final (Fall 2015) Name: Student Number: Please enter your information above, turn off cellphones, space yourselves out throughout the room, and wait until the official start of the exam to begin.

More information

Como funciona o Deep Learning

Como funciona o Deep Learning Como funciona o Deep Learning Moacir Ponti (com ajuda de Gabriel Paranhos da Costa) ICMC, Universidade de São Paulo Contact: www.icmc.usp.br/~moacir moacir@icmc.usp.br Uberlandia-MG/Brazil October, 2017

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

COMPUTER vision is moving from predicting discrete,

COMPUTER vision is moving from predicting discrete, 1 Learning Two-Branch Neural Networks for Image-Text Matching Tasks Liwei Wang, Yin Li, Jing Huang, Svetlana Lazebnik arxiv:1704.03470v3 [cs.cv] 28 Dec 2017 Abstract Image-language matching tasks have

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

Improving Image Segmentation Quality Via Graph Theory

Improving Image Segmentation Quality Via Graph Theory International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network

Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Research on Pruning Convolutional Neural Network, Autoencoder and Capsule Network Tianyu Wang Australia National University, Colledge of Engineering and Computer Science u@anu.edu.au Abstract. Some tasks,

More information

Column Sampling based Discrete Supervised Hashing

Column Sampling based Discrete Supervised Hashing Column Sampling based Discrete Supervised Hashing Wang-Cheng Kang, Wu-Jun Li and Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Collaborative Innovation Center of Novel Software Technology

More information

arxiv: v1 [cs.cv] 11 Apr 2018

arxiv: v1 [cs.cv] 11 Apr 2018 Unsupervised Segmentation of 3D Medical Images Based on Clustering and Deep Representation Learning Takayasu Moriya a, Holger R. Roth a, Shota Nakamura b, Hirohisa Oda c, Kai Nagara c, Masahiro Oda a,

More information

Alternatives to Direct Supervision

Alternatives to Direct Supervision CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

FINE-GRAINED image classification aims to recognize. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization

FINE-GRAINED image classification aims to recognize. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization 1 Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization Xiangteng He, Yuxin Peng and Junjie Zhao arxiv:1710.01168v1 [cs.cv] 30 Sep 2017 Abstract Fine-grained image classification

More information

The Boundary Graph Supervised Learning Algorithm for Regression and Classification

The Boundary Graph Supervised Learning Algorithm for Regression and Classification The Boundary Graph Supervised Learning Algorithm for Regression and Classification! Jonathan Yedidia! Disney Research!! Outline Motivation Illustration using a toy classification problem Some simple refinements

More information

Autoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN.

Autoencoders. Stephen Scott. Introduction. Basic Idea. Stacked AE. Denoising AE. Sparse AE. Contractive AE. Variational AE GAN. Stacked Denoising Sparse Variational (Adapted from Paul Quint and Ian Goodfellow) Stacked Denoising Sparse Variational Autoencoding is training a network to replicate its input to its output Applications:

More information

CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM

CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM 96 CHAPTER 6 IDENTIFICATION OF CLUSTERS USING VISUAL VALIDATION VAT ALGORITHM Clustering is the process of combining a set of relevant information in the same group. In this process KM algorithm plays

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

Probabilistic Siamese Network for Learning Representations. Chen Liu

Probabilistic Siamese Network for Learning Representations. Chen Liu Probabilistic Siamese Network for Learning Representations by Chen Liu A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of Electrical

More information