Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

Nobuyuki Morioka
The University of New South Wales & NICTA, Sydney, Australia

Shin'ichi Satoh
National Institute of Informatics, Tokyo, Japan

Abstract

Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, $\ell_1$ regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding to every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents Generalized Lasso based Approximation of Sparse coding (GLAS). By representing the distribution of sparse coefficients with the slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases, which is essential in an overcomplete setting. The experiments show that GLAS obtains performance comparable to $\ell_1$ regularized sparse coding, yet achieves a significant speed up, demonstrating its effectiveness for large-scale visual recognition problems.

1 Introduction

Recently, sparse coding [3, 18] has attracted much attention in computer vision research. Its applications range from image denoising [23] to image segmentation [17] and image classification [10, 24], achieving state-of-the-art results. Sparse coding interprets an input signal $x \in \mathbb{R}^{D \times 1}$ with a sparse vector $u \in \mathbb{R}^{K \times 1}$ whose linear combination with an overcomplete set of $K$ bases (i.e., $D \ll K$), also known as a dictionary $B \in \mathbb{R}^{D \times K}$, reconstructs the input as precisely as possible. To enforce sparseness on $u$, the $\ell_1$ norm is a popular choice due to its computational convenience and its interesting connection with the NP-hard $\ell_0$ norm in compressed sensing [2]. Several efficient $\ell_1$ regularized sparse coding algorithms have been proposed [4, 14] and are adopted in visual recognition [10, 24]. In particular, Yang et al. [24] compute the sparse codes of many local feature descriptors with sparse coding. However, because the $\ell_1$ norm is non-smooth convex, the sparse coding algorithm needs to optimize iteratively until convergence. Therefore, the local feature descriptor coding step becomes a major bottleneck for large-scale problems like visual recognition.
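To make the coding step concrete, here is a minimal sketch of the per-descriptor $\ell_1$-regularized coding problem, using scikit-learn's Lasso as a stand-in solver; the paper itself relies on feature-sign search [14], and the random dictionary, the penalty value `lam` and the `alpha` rescaling below are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(x, B, lam):
    """Solve min_u ||x - B u||^2 + lam * ||u||_1 for one descriptor x.

    B is a D x K dictionary. sklearn's Lasso minimizes
    (1/(2*D)) * ||x - B u||^2 + alpha * ||u||_1, hence the rescaling.
    """
    D = B.shape[0]
    solver = Lasso(alpha=lam / (2.0 * D), fit_intercept=False, max_iter=10000)
    solver.fit(B, x)
    return solver.coef_  # sparse code u of length K

# Toy usage: random dictionary with unit-norm bases (hypothetical data).
rng = np.random.default_rng(0)
D, K = 128, 1024
B = rng.standard_normal((D, K))
B /= np.linalg.norm(B, axis=0)          # enforce ||b_k||_2 = 1
x = rng.standard_normal(D)
u = sparse_code(x, B, lam=0.15)
print(np.count_nonzero(u), "active bases")
```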

The goal of this paper is to achieve state-of-the-art performance on large-scale visual recognition that is comparable to the work of Yang et al. [24], but with a significant improvement in efficiency. To this end, we propose Generalized Lasso based Approximation of Sparse coding, GLAS for short. Specifically, we encode the distribution of each dimension in sparse codes with the slice transform representation [9] and learn a piece-wise linear mapping function with the generalized lasso [21] to obtain the best fit to $\ell_1$ regularized sparse coding. We further propose an efficient post-refinement procedure to capture the dependency between overcomplete bases. The effectiveness of our approach is demonstrated on several challenging object and scene category datasets, showing performance comparable to Yang et al. [24] and better than other fast algorithms that obtain sparse codes [22].

While there have been several supervised dictionary learning methods for sparse coding that obtain more discriminative sparse representations [16, 25], they have not been evaluated on visual recognition with many object categories due to their computational challenges. Furthermore, Ranzato et al. [19] have empirically shown that unsupervised learning of visual features can obtain a more general and effective representation. Therefore, in this paper, we focus on learning a fast approximation of sparse coding in an unsupervised manner.

The paper is organized as follows: Section 2 reviews related work, including the linear spatial pyramid combined with sparse coding and other fast algorithms for obtaining sparse codes. Section 3 presents GLAS. This is followed by experimental results on several challenging categorization datasets in Section 4. Section 5 concludes the paper with discussion and future work.

2 Related Work

2.1 Linear Spatial Pyramid Matching Using Sparse Coding

This section reviews the linear spatial pyramid matching based on sparse coding by Yang et al. [24]. Given a collection of $N$ local feature descriptors randomly sampled from training images, $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$, an overcomplete dictionary $B = [b_1, b_2, \ldots, b_K] \in \mathbb{R}^{D \times K}$ is learned by

$$\min_{B, U} \sum_{i=1}^{N} \|x_i - B u_i\|_2^2 + \lambda \|u_i\|_1 \quad \text{s.t.} \quad \|b_k\|_2^2 \le 1, \; k = 1, 2, \ldots, K. \quad (1)$$

The cost function above is a combination of the reconstruction error and the $\ell_1$ sparsity penalty, which is controlled by $\lambda$. The $\ell_2$ norm of each $b_k$ is constrained to be less than or equal to 1 to avoid a trivial solution. Since both $B$ and $U = [u_1, u_2, \ldots, u_N]$ are unknown a priori, an alternating optimization technique is often used [14] to optimize over the two parameter sets.

Under the spatial pyramid matching framework, each image is divided into a set of sub-regions $r = [r_1, r_2, \ldots, r_R]$. For example, if $1 \times 1$, $2 \times 2$ and $4 \times 4$ partitions are used on an image, we have 21 sub-regions. Then, we compute the sparse solutions of all local feature descriptors appearing in each sub-region $r_j$, denoted as $U_{r_j}$, by

$$\min_{U_{r_j}} \|X_{r_j} - B U_{r_j}\|_2^2 + \lambda \|U_{r_j}\|_1. \quad (2)$$

The sparse solutions are max pooled for each sub-region and concatenated with those of the other sub-regions to build a statistic of the image (see the sketch at the end of this subsection):

$$h = [\max(|U_{r_1}|)^\top, \max(|U_{r_2}|)^\top, \ldots, \max(|U_{r_R}|)^\top]^\top, \quad (3)$$

where $\max(\cdot)$ finds the maximum value in each row of a matrix and returns a column vector. Finally, a linear SVM is trained on the set of image statistics for classification.

The main advantage of using sparse coding is that state-of-the-art results can be achieved with a simple linear classifier, as reported in [24]. Compared to kernel-based methods, this dramatically speeds up training and testing of the classifier. However, the step of finding a sparse code for each local descriptor now becomes the major bottleneck. Using the efficient sparse coding algorithm based on feature-sign search [14], the time to compute the solution for one local descriptor $u$ is $O(KZ)$, where $Z$ is the number of non-zeros in $u$. This paper proposes an approximation method whose time complexity reduces to $O(K)$. With the post-refinement procedure, its time complexity is $O(K + Z^2)$, which is still much lower than $O(KZ)$.
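The pooling step of Eqns. (2)-(3) reduces to a few lines of array code. The sketch below assumes the codes $U$ and the per-region descriptor indices are already available; names such as `region_ids` are hypothetical.

```python
import numpy as np

def spatial_max_pool(U, region_ids, num_regions):
    """Eqn. (3): max-pool absolute sparse codes within each sub-region.

    U          : K x N matrix of codes for N descriptors (Eqn. (2)).
    region_ids : list of index arrays, one per pyramid sub-region
                 (21 regions for 1x1 + 2x2 + 4x4 partitions).
    Returns the concatenated K * num_regions image statistic h.
    """
    K = U.shape[0]
    h = np.zeros(K * num_regions)
    for j in range(num_regions):
        cols = region_ids[j]
        if len(cols) > 0:
            # max over rows of the absolute coefficients in region j
            h[j * K:(j + 1) * K] = np.abs(U[:, cols]).max(axis=1)
    return h
```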
2.2 Predictive Sparse Decomposition

Predictive sparse decomposition (PSD), described in [10, 11], is a feedforward network that applies a non-linear mapping function to linearly transformed input data to match the optimal sparse coding solution as accurately as possible. The feedforward network is defined as $\hat{u}_i = G g(W x_i, \theta)$, where $g(z, \theta)$ denotes a non-linear parametric mapping function which can be of any form; common choices include the hyperbolic tangent, $\tanh(z + \theta)$, and soft shrinkage, $\mathrm{sign}(z) \max(|z| - \theta, 0)$. The function is applied to the linearly transformed data $W x_i$, and the result is subsequently scaled by a diagonal matrix $G$.
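A minimal sketch of this feedforward coder with the soft shrinkage non-linearity follows, assuming the parameters $W$, $G$ and $\theta$ have already been trained; the array shapes are our own convention.

```python
import numpy as np

def psd_encode(X, W, G_diag, theta):
    """PSD feedforward coder u_hat = G g(W x, theta) with soft shrinkage.

    X      : D x N input descriptors.
    W      : K x D learned filter bank.
    G_diag : length-K diagonal of the scaling matrix G.
    theta  : length-K shrinkage thresholds.
    """
    Z = W @ X                                            # K x N responses
    shrunk = np.sign(Z) * np.maximum(np.abs(Z) - theta[:, None], 0.0)
    return G_diag[:, None] * shrunk                      # scale by diagonal G
```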

Given training samples $\{x_i\}_{i=1}^{N}$, the parameters can be estimated either jointly with or separately from the dictionary $B$. When learning jointly, we minimize the cost function given below:

$$\min_{B, G, W, \theta, U} \sum_{i=1}^{N} \|x_i - B u_i\|_2^2 + \lambda \|u_i\|_1 + \gamma \|u_i - G g(W x_i, \theta)\|_2^2. \quad (4)$$

When learning separately, $B$ and $U$ are obtained with Eqn. (1) first. Then, the remaining parameters $G$, $W$ and $\theta$ are estimated by minimizing the last term of Eqn. (4) only. Gregor and LeCun [7] have later proposed a better, but iterative, approximation scheme for $\ell_1$ regularized sparse coding.

One downside of the parametric approach is that its accuracy largely depends on how well the parametric function fits the target statistical distribution, as argued by Hel-Or and Shaked [9]. This paper explores a non-parametric approach which can fit any distribution as long as the available data samples are representative. The advantage of our approach over the parametric approach is that we do not need to seek an appropriate parametric function for each distribution. This is particularly useful in visual recognition with multiple feature types, as the function form for each feature type is estimated automatically from data. We demonstrate this with two different local descriptor types in our experiments.

2.3 Locality-constrained Linear Coding

Another notable work that overcomes the bottleneck of the local descriptor coding step is locality-constrained linear coding (LLC) proposed by Wang et al. [22], a fast version of local coordinate coding [26]. Given a local feature descriptor $x_i$, LLC searches for the $M$ nearest dictionary bases of $x_i$; these nearest bases stacked in columns are denoted as $B_{\phi_i} \in \mathbb{R}^{D \times M}$, where $\phi_i$ is the index list of the bases. Then, the coefficients $u_{\phi_i} \in \mathbb{R}^{M \times 1}$ whose linear combination with $B_{\phi_i}$ reconstructs $x_i$ are solved for by:

$$\min_{u_{\phi_i}} \|x_i - B_{\phi_i} u_{\phi_i}\|_2^2 \quad \text{s.t.} \quad \mathbf{1}^\top u_{\phi_i} = 1. \quad (5)$$

This is a least squares problem which can be solved quite efficiently. The final sparse code $u_i$ is obtained by setting its elements indexed by $\phi_i$ to $u_{\phi_i}$. The time complexity of LLC is $O(K + M^2)$, excluding the time required to find the $M$ nearest neighbours. While it is fast, the resulting sparse solutions are not as discriminative as the ones obtained by sparse coding. This may be because $M$ is fixed across all local feature descriptors: some descriptors may need more bases for accurate representation, while others may need fewer bases for more distinctiveness. In contrast, the number of bases selected by our post-refinement procedure to handle the mutual inhibition differs for each local descriptor.
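For reference, Eqn. (5) admits a closed-form solution on the local dictionary; the sketch below follows the standard covariance-trick derivation used by LLC implementations, with the small ridge term added for numerical stability being our own assumption.

```python
import numpy as np

def llc_encode(x, B, M=5):
    """LLC code (Eqn. (5)): least squares over the M nearest bases
    with a sum-to-one constraint, solved in closed form."""
    K = B.shape[1]
    dists = np.linalg.norm(B - x[:, None], axis=0)
    phi = np.argsort(dists)[:M]                 # indices of nearest bases
    Bp = B[:, phi]                              # D x M local dictionary
    C = (Bp - x[:, None]).T @ (Bp - x[:, None]) # M x M data covariance
    C += 1e-8 * np.trace(C) * np.eye(M)         # ridge for stability
    w = np.linalg.solve(C, np.ones(M))
    w /= w.sum()                                # enforce 1^T u = 1
    u = np.zeros(K)
    u[phi] = w
    return u
```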
3 Generalized Lasso based Approximation of Sparse Coding

This section describes GLAS. We first learn a dictionary from a collection of local feature descriptors as given in Eqn. (1). Then, based on the slice transform representation, we fit a piece-wise linear mapping function with the generalized lasso to approximate the optimal sparse solutions of the local feature descriptors under $\ell_1$ regularized sparse coding. Finally, we propose an efficient post-refinement procedure to perform the mutual inhibition.

3.1 Slice Transform Representation

The slice transform representation was introduced by Hel-Or and Shaked [9] as a way to discretize a function space so as to fit a piece-wise linear function for the purpose of image denoising, and was later adopted by Adler et al. [1] for single image super-resolution. In this paper, we utilise the representation to approximate sparse coding, so that sparse codes for local feature descriptors are obtained as fast as possible.

Given a local descriptor $x$, we linearly transform it with $B$ to obtain $z$. For the moment, we consider just one dimension of $z$, denoted as $z$, which is a real value lying in a half-open interval $[a, b)$. The interval is divided into $Q - 1$ equal-sized bins whose boundaries form a vector $q = [q_1, q_2, \ldots, q_Q]$ such that $a = q_1 < q_2 < \cdots < q_Q = b$.

Figure 1: Different approaches to fitting a piece-wise linear mapping function: regularized least squares (RLS) in red (see Eqn. (8)), the lasso (L1) in magenta (see Eqn. (9)), and GLAS in green (see Eqn. (10)). (a) All three methods achieve a good fit. (b) A case where L1 fails to extrapolate well at the end and RLS tends to align itself to $q$ (in black). (c) A case where data samples around 0.25 are removed artificially to illustrate that RLS fails to interpolate, as no neighboring prior is used. In contrast, GLAS can both interpolate and extrapolate well in the case of missing or noisy data.

The interval into which the value of $z$ falls is expressed as $\pi(z) = j$ if $z \in [q_{j-1}, q_j)$, and its corresponding residue is calculated by

$$r(z) = \frac{z - q_{\pi(z)-1}}{q_{\pi(z)} - q_{\pi(z)-1}}.$$

Based on the above, we can re-express $z$ as

$$z = (1 - r(z)) q_{\pi(z)-1} + r(z) q_{\pi(z)} = S_q(z) q, \quad (6)$$

where $S_q(z) = [0, \ldots, 0, 1 - r(z), r(z), 0, \ldots, 0]$. If we now come back to the multivariate case of $z = B^\top x$, then we have $z = [S_q(z_1) q, S_q(z_2) q, \ldots, S_q(z_K) q]^\top$, where $z_k$ denotes the $k$-th dimension of $z$. We then replace the boundary vector $q$ with per-basis vectors $p = \{p_1, p_2, \ldots, p_K\}$ such that the resulting vector approximates the optimal sparse solution of $x$ under $\ell_1$ regularized sparse coding as closely as possible. This is written as

$$\hat{u} = [S_q(z_1) p_1, S_q(z_2) p_2, \ldots, S_q(z_K) p_K]^\top. \quad (7)$$

Hel-Or and Shaked [9] have formulated the problem of learning each $p_k$ as regularized least squares, either independently in a transform domain or jointly in a spatial domain. Unlike their setting, we have a significantly larger number of bases, which makes joint optimization of all $p_k$ difficult. Moreover, since we are interested in approximating the sparse solutions, which live in the transform domain, we learn each $p_k$ independently. Given $N$ local descriptors $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$ and their corresponding sparse solutions $U = [u_1, u_2, \ldots, u_N] = [y_1, y_2, \ldots, y_K]^\top \in \mathbb{R}^{K \times N}$ obtained with $\ell_1$ regularized sparse coding, we have the optimization problem

$$\min_{p_k} \|y_k - S_k p_k\|_2^2 + \alpha \|q - p_k\|_2^2, \quad (8)$$

where $S_k = S_q(z_k)$ stacks the slice transform rows of the $k$-th responses over all $N$ samples. The regularization of the second term is essential to avoid singularity when computing the inverse, and its consequence is that $p_k$ is encouraged to align itself to $q$ when not many data samples are available. This might be a reasonable prior for image denoising [9], but it is not desirable for approximating sparse coding, as we would like to suppress most of the coefficients in $u$ to zero. Figure 1 shows, for one dimension, the distribution of sparse coefficients against $z$ obtained from a collection of SIFT descriptors; $q$ does not resemble this distribution. This motivates us to look at the generalized lasso [21] as an alternative for obtaining a better fit of the distribution of the coefficients.
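Once the lookup tables $p_k$ are learned (Section 3.2), the approximate coding step is just binning plus linear interpolation. The sketch below implements Eqn. (7) in the $O(K)$ lookup form derived later as Eqn. (13), followed by the $\ell_1$ normalization used in Section 3.2; how out-of-range responses are handled (linear extrapolation from the boundary bins here) is our own choice.

```python
import numpy as np

def glas_encode(x, B, q, P):
    """Approximate sparse coding by per-dimension piece-wise linear maps.

    B : D x K dictionary, q : length-Q shared bin boundaries,
    P : K x Q learned mapping tables (one row p_k per basis).
    Implements Eqn. (13): linear interpolation of p_k at z_k = (B^T x)_k.
    """
    z = B.T @ x
    j = np.clip(np.searchsorted(q, z, side='right'), 1, len(q) - 1)  # pi(z)
    r = (z - q[j - 1]) / (q[j] - q[j - 1])                           # r(z)
    k = np.arange(len(z))
    u = (1.0 - r) * P[k, j - 1] + r * P[k, j]
    norm = np.abs(u).sum()
    return u / norm if norm > 0 else u          # l1 normalization
```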

3.2 Generalized Lasso

In the previous section, we argued that regularized least squares as stated in Eqn. (8) does not give the desired result; instead, most intervals need to be set to zero. This naturally leads us to consider $\ell_1$ regularization, also known as the lasso, which is formulated as:

$$\min_{p_k} \|y_k - S_k p_k\|_2^2 + \alpha \|p_k\|_1. \quad (9)$$

However, the drawback is that the learnt piece-wise linear function may become unstable when training data is noisy or missing, as illustrated in Figure 1 (b) and (c). It turns out that $\ell_1$ trend filtering [12], a special case of the generalized lasso [21], can overcome this problem. This is expressed as

$$\min_{p_k} \|y_k - S_k p_k\|_2^2 + \alpha \|D p_k\|_1, \quad (10)$$

where $D \in \mathbb{R}^{(Q-2) \times Q}$ is referred to as a penalty matrix and is the second-order difference operator

$$D = \begin{bmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{bmatrix}. \quad (11)$$

To solve the above optimization problem, we can turn it into a standard lasso problem [21]. Since $D$ is not invertible, the key is to augment $D$ with $A \in \mathbb{R}^{2 \times Q}$ to build a square matrix $\tilde{D} = [D; A] \in \mathbb{R}^{Q \times Q}$ such that $\mathrm{rank}(\tilde{D}) = Q$ and the rows of $A$ are orthogonal to the rows of $D$. To satisfy these constraints, $A$ can for example be set to $[1, 2, \ldots, Q; 2, 3, \ldots, Q+1]$. If we let $\theta = [\theta_1; \theta_2] = \tilde{D} p_k$ where $\theta_1 = D p_k$ and $\theta_2 = A p_k$, then $S_k p_k = S_k \tilde{D}^{-1} \theta = S_{k1} \theta_1 + S_{k2} \theta_2$, where $[S_{k1}, S_{k2}]$ is the corresponding column partition of $S_k \tilde{D}^{-1}$. After some substitutions, we see that $\theta_2$ can be solved by $\theta_2 = (S_{k2}^\top S_{k2})^{-1} S_{k2}^\top (y_k - S_{k1} \theta_1)$, given that $\theta_1$ is solved already. Now, to solve $\theta_1$, we have the following lasso problem:

$$\min_{\theta_1} \|(I - P) y_k - (I - P) S_{k1} \theta_1\|_2^2 + \alpha \|\theta_1\|_1, \quad (12)$$

where $P = S_{k2} (S_{k2}^\top S_{k2})^{-1} S_{k2}^\top$. Having computed both $\theta_1$ and $\theta_2$, we recover $p_k = \tilde{D}^{-1} \theta$. Further details can be found in [21].

Given the learnt $p$, we can approximate the sparse solution of $x$ by Eqn. (7). However, explicitly computing $S_q(z)$ and multiplying it by $p$ is somewhat redundant. Thus, we can alternatively compute each component of $\hat{u}$ as

$$\hat{u}_k = (1 - r(z_k)) \, p_k(\pi(z_k) - 1) + r(z_k) \, p_k(\pi(z_k)), \quad (13)$$

whose time complexity is $O(K)$. In Eqn. (13), since we essentially use each $p_k$ as a lookup table, the complexity is independent of $Q$. This is followed by $\ell_1$ normalization of $\hat{u}$.

While $\hat{u}$ can readily be used for the spatial max pooling stated in Eqn. (3), it does not yet capture any explaining-away effect, where the coefficients of correlated bases mutually inhibit each other to remove redundancy. This is because each $p_k$ is estimated independently in the transform domain [9]. In the next section, we propose an efficient post-refinement technique for mutual inhibition between the bases.

3.3 Capturing Dependency Between Bases

To handle the mutual inhibition between overcomplete bases, this section explains how to refine the sparse codes by solving regularized least squares on a significantly smaller active basis set. Given a local descriptor $x$ and its initial sparse code $\hat{u}$ estimated with the above method, we set the non-zero components of the code to be active. Denoting the set of these active components as $\phi$, we have $\hat{u}_\phi$ and $B_\phi$, the corresponding subsets of the sparse code and the dictionary bases respectively. The goal is to compute a refined code $\hat{v}_\phi$ such that $B_\phi \hat{v}_\phi$ reconstructs $x$ as accurately as possible. We formulate this as regularised least squares:

$$\min_{\hat{v}_\phi} \|x - B_\phi \hat{v}_\phi\|_2^2 + \beta \|\hat{v}_\phi - \hat{u}_\phi\|_2^2, \quad (14)$$

where $\beta$ is the weight of the regularisation. This is convex and has the analytical solution $\hat{v}_\phi = (B_\phi^\top B_\phi + \beta I)^{-1} (B_\phi^\top x + \beta \hat{u}_\phi)$. The intuition behind this formulation is that the initial sparse code $\hat{u}$ is a good starting point for refinement, from which the reconstruction error is further reduced by allowing redundant bases to compete against each other. Empirically, the number of active components for each $\hat{u}$ is substantially small compared to the whole basis set, hence the linear system to be solved is much smaller and computationally cheap. We also make sure that we do not deviate too much from the initial solution by introducing the regularization on $\hat{v}_\phi$. This refinement procedure may look similar to LLC [22]. However, in our case, we do not preset the number of active bases; it is determined by the non-zero components of $\hat{u}$. More importantly, we base our final solution on $\hat{u}$ and do not perform a nearest neighbor search. With this refinement procedure, the total time complexity becomes $O(K + Z^2)$. We refer to GLAS with this post-refinement procedure as GLAS+.
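Two hedged sketches of the pieces above. First, fitting one lookup table $p_k$ under Eqn. (10): rather than the paper's reduction to the lasso of Eqn. (12), this sketch hands the objective to the generic convex solver cvxpy, purely for clarity; the default `alpha` follows the experimental setting of Section 4.

```python
import numpy as np
import cvxpy as cp

def fit_lookup_table(y_k, S_k, alpha=0.1):
    """Fit one p_k by Eqn. (10) with the second-difference penalty D.

    y_k : length-N targets (k-th row of U from l1 sparse coding).
    S_k : N x Q slice-transform design matrix for basis k.
    The paper instead reduces Eqn. (10) to a lasso (Eqn. (12));
    a generic solver is used here only for exposition.
    """
    Q = S_k.shape[1]
    # (Q-2) x Q second-order difference matrix of Eqn. (11)
    D2 = (np.diag(np.ones(Q))[:-2]
          - 2 * np.diag(np.ones(Q - 1), 1)[:-2]
          + np.diag(np.ones(Q - 2), 2)[:-2])
    p = cp.Variable(Q)
    objective = cp.sum_squares(y_k - S_k @ p) + alpha * cp.norm1(D2 @ p)
    cp.Problem(cp.Minimize(objective)).solve()
    return p.value
```

Second, the post-refinement of Eqn. (14) on the active set, using the analytical solution given above; the default `beta` again follows Section 4.

```python
import numpy as np

def glas_refine(x, B, u_hat, beta=0.25):
    """Post-refinement (Eqn. (14)) on the active set of u_hat:
    v_phi = (B_phi^T B_phi + beta I)^{-1} (B_phi^T x + beta u_phi)."""
    phi = np.flatnonzero(u_hat)                 # active bases only
    if phi.size == 0:
        return u_hat
    Bp = B[:, phi]                              # D x Z sub-dictionary, Z small
    v_phi = np.linalg.solve(Bp.T @ Bp + beta * np.eye(phi.size),
                            Bp.T @ x + beta * u_hat[phi])
    v = np.zeros_like(u_hat)
    v[phi] = v_phi
    return v
```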

4 Experimental Results

This section evaluates GLAS and GLAS+ on several challenging categorization datasets. To learn the mapping function, we have used 50,000 local descriptors as data samples. The parameters $Q$, $\alpha$ and $\beta$ are fixed to 10, 0.1 and 0.25 respectively in all experiments, unless otherwise stated. For comparison, we have implemented the methods discussed in Section 2. SC is our re-implementation of Yang et al. [24]. LLC is locality-constrained linear coding proposed by Wang et al. [22]; the number of nearest neighbors is set to 5. PSD is predictive sparse decomposition [11]; the soft shrinkage function is used as its parametric mapping function. We also include KM, which builds its codebook with k-means clustering and adopts hard assignment as its local descriptor coding. For all methods, exactly the same local feature descriptors, spatial max pooling technique and linear SVM are used, so that only the local feature descriptor coding techniques are compared.

As for the descriptors, SIFT [15] and Local Self-Similarity [20] are used. SIFT is a histogram of gradient directions computed over an image patch, capturing appearance information; we have sampled a patch at every 8 pixels. In contrast, Local Self-Similarity computes the correlation between a small image patch of interest and its surrounding region, capturing the geometric layout of a local region. Spatial max pooling with $1 \times 1$, $2 \times 2$ and $4 \times 4$ image partitions is used. The implementation is all done in MATLAB for fair comparison.

Table 1: Recognition accuracy on Caltech-101 for SIFT (128 dim.) [15] and Local Self-Similarity (30 dim.) [20]. Rows compare 15 and 30 training images per class across KM, LLC [22], PSD [11], SC [24], GLAS and GLAS+, together with the time (sec) taken to process 1000 local descriptors. The dictionary sizes for all methods are set to 1024. [Numeric entries are not recoverable from the source text.]

4.1 Caltech-101

The Caltech-101 dataset [5] consists of 9144 images divided into 101 object categories. The images are scaled down, preserving their aspect ratios. We train with 15/30 images per class and test with 15 images per class. The dictionary size of each method is set to 1024 for both SIFT and Local Self-Similarity. The results are averaged over eight random training and testing splits and are reported in Table 1.

For SIFT, GLAS+ is consistently better than GLAS, demonstrating the effectiveness of the mutual inhibition performed by the post-refinement procedure. Both GLAS and GLAS+ perform better than the other fast algorithms that produce sparse codes. In addition, GLAS and GLAS+ perform competitively against SC; in fact, GLAS+ is slightly better when 30 training images per class are used. While the sparse codes for both GLAS and GLAS+ are learned from the solutions of SC, the approximated codes are not exactly the same as those of SC. Moreover, SC sometimes produces unstable codes due to the non-smooth convex property of the $\ell_1$ norm, as previously observed in [6]. In contrast, GLAS+ approximates its sparse codes with a relatively smooth piece-wise linear mapping function learned with the generalized lasso (note that the $\ell_1$ penalty discourages changes in the shape of the function) and performs smooth post-refinement. We suspect these differences may be contributing to the slightly better results of GLAS+ on this dataset.

Although PSD performs quite close to GLAS for SIFT, this is not the case for Local Self-Similarity. GLAS outperforms PSD, probably because the distribution of sparse codes is not captured well by a simple shrinkage function; GLAS may therefore be effective for a wider range of distributions. This is useful for recognition using multiple feature types where speed is critical. GLAS performs worse than SC, but GLAS+ closes the gap between GLAS and SC. We suspect that because Local Self-Similarity (30 dim.) is of relatively low dimension compared to SIFT (128 dim.), the mutual inhibition becomes more important. This might also explain why LLC performs reasonably well for this descriptor.

Table 1 also reports the computational time taken to process 1000 local descriptors with each method. GLAS and GLAS+ are slower than KM and PSD, but slightly faster than LLC and significantly faster than SC. This demonstrates the practical importance of our approach, where competitive recognition results are achieved with fast computation.

Figure 2: (a) Varying $Q$, the number of bins used to quantize the interval of each sparse code component. (b) Varying $\alpha$, the parameter that controls the weight of the penalty in the generalized lasso. (c) When some data samples are missing, GLAS is more robust than the regularized least squares of Eqn. (8).

Different values of $Q$, $\alpha$ and $\beta$ are evaluated one parameter at a time. Figure 2 (a) shows the results for different $Q$: the results are very stable beyond 10 bins, and since sparse codes are computed by Eqn. (13), the time complexity is unaffected by the choice of $Q$. Figure 2 (b) shows the results for different $\alpha$, which are also very stable, and we observe similar stability for $\beta$. We also validate whether the generalized lasso of Eqn. (10) is more robust than the regularized least squares solution of Eqn. (8) when some data samples are missing. When learning each $p_k$, we artificially remove data samples from an interval centered around a randomly sampled point, as also illustrated in Figure 1 (c). We evaluate with different numbers of data samples removed, expressed as percentages of the whole data sample set. The results are shown in Figure 2 (c): the performance of RLS drops significantly as the amount of missing data increases, whereas both GLAS and GLAS+ are not affected much.

4.2 Caltech-256

Caltech-256 [8] contains 30,607 images and 256 object categories in total. Like Caltech-101, we scale the images down, preserving their aspect ratios. The results are averaged over eight random training and testing splits and are reported in Table 2; we use 25 testing images per class. This time, for SIFT, GLAS performs slightly worse than SC, but GLAS+ outperforms SC, probably for the same reasons given for Caltech-101. For Local Self-Similarity, results similar to Caltech-101 are obtained. The performance of PSD is close to KM and is outperformed by GLAS, again suggesting an inadequate fit of the sparse codes. LLC performs slightly better than GLAS, but not better than GLAS+. While SC performs best, the performance of GLAS+ is quite close to it.
We also plot the computational time taken by each method against its achieved accuracy, on SIFT and Local Self-Similarity, in Figure 3 (a) and (b) respectively.

Table 2: Recognition accuracy on Caltech-256, in the same format as Table 1. The dictionary sizes are all set to 2048 for SIFT and 1024 for Local Self-Similarity. [Numeric entries are not recoverable from the source text.]

Figure 3: Computational time vs. average recognition accuracy. (a) and (b) are SIFT and Local Self-Similarity respectively, evaluated on Caltech-256 with 30 training images and the dictionary sizes of Table 2. (c) is SIFT evaluated on 15 Scenes with a dictionary size of 1024.

4.3 15 Scenes

The 15 Scenes dataset [13] contains 4485 images divided into 15 scene classes, ranging from indoor to outdoor scenes. 100 images per class are used for training and the rest for testing. We used SIFT to learn 1024 dictionary bases for each method. The results are plotted against computational time in Figure 3 (c). The result of GLAS+ (80.6%) is very similar to that of SC (80.7%), yet the former is significantly faster. In summary, our approach works well on three different challenging datasets.

5 Conclusion

This paper has presented GLAS, an approximation of $\ell_1$ regularized sparse coding based on the generalized lasso. It is further extended with a post-refinement procedure to handle the mutual inhibition between bases, which is essential in an overcomplete setting. The experiments have shown competitive performance of GLAS against SC with a significant computational speed up. We have also demonstrated the effectiveness of GLAS on two local descriptor types, namely SIFT and Local Self-Similarity, where LLC and PSD each perform well on only one type. GLAS is not restricted to approximating $\ell_1$ regularized sparse coding, but should be applicable to other variants of sparse coding in general. For example, it may be interesting to apply it to Laplacian sparse coding [6], which achieves smoother sparse codes than $\ell_1$ sparse coding.

Acknowledgment

NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

References

[1] A. Adler, Y. Hel-Or, and M. Elad. A Shrinkage Learning Approach for Single Image Super-Resolution with Overcomplete Representations. In ECCV.
[2] D.L. Donoho. For Most Large Underdetermined Systems of Linear Equations the Minimal l1-norm Solution is also the Sparsest Solution. Communications on Pure and Applied Mathematics.
[3] D.L. Donoho and M. Elad. Optimally Sparse Representation in General (Nonorthogonal) Dictionaries via l1 Minimization. PNAS, 100(5).
[4] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least Angle Regression. Annals of Statistics.
[5] L. Fei-Fei, R. Fergus, and P. Perona. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories. In CVPR Workshop.
[6] S. Gao, W. Tsang, L. Chia, and P. Zhao. Local Features Are Not Lonely: Laplacian Sparse Coding for Image Classification. In CVPR.
[7] K. Gregor and Y. LeCun. Learning Fast Approximations of Sparse Coding. In ICML.
[8] G. Griffin, A. Holub, and P. Perona. Caltech-256 Object Category Dataset. Technical Report, California Institute of Technology.
[9] Y. Hel-Or and D. Shaked. A Discriminative Approach for Wavelet Denoising. TIP.
[10] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the Best Multi-Stage Architecture for Object Recognition? In ICCV.
[11] K. Kavukcuoglu, M. Ranzato, and Y. LeCun. Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition. Technical Report CBLL-TR, Computational and Biological Learning Lab, Courant Institute, NYU.
[12] S.-J. Kim, K. Koh, S. Boyd, and D. Gorinevsky. l1 Trend Filtering. SIAM Review.
[13] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In CVPR.
[14] H. Lee, A. Battle, R. Raina, and A.Y. Ng. Efficient Sparse Coding Algorithms. In NIPS.
[15] D.G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV.
[16] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised Dictionary Learning. In NIPS.
[17] J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation. In ECCV.
[18] B.A. Olshausen and D.J. Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research, 37.
[19] M. Ranzato, F.J. Huang, Y. Boureau, and Y. LeCun. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. In CVPR.
[20] E. Shechtman and M. Irani. Matching Local Self-Similarities across Images and Videos. In CVPR.
[21] R. Tibshirani and J. Taylor. The Solution Path of the Generalized Lasso. The Annals of Statistics.
[22] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained Linear Coding for Image Classification. In CVPR.
[23] J. Yang, J. Wright, T. Huang, and Y. Ma. Image Super-Resolution via Sparse Representation. TIP.
[24] J. Yang, K. Yu, Y. Gong, and T.S. Huang. Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. In CVPR.
[25] J. Yang, K. Yu, and T. Huang. Supervised Translation-Invariant Sparse Coding. In CVPR.
[26] K. Yu, T. Zhang, and Y. Gong. Nonlinear Learning Using Local Coordinate Coding. In NIPS.


Extended Dictionary Learning : Convolutional and Multiple Feature Spaces Extended Dictionary Learning : Convolutional and Multiple Feature Spaces Konstantina Fotiadou, Greg Tsagkatakis & Panagiotis Tsakalides kfot@ics.forth.gr, greg@ics.forth.gr, tsakalid@ics.forth.gr ICS-

More information

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012)

Three things everyone should know to improve object retrieval. Relja Arandjelović and Andrew Zisserman (CVPR 2012) Three things everyone should know to improve object retrieval Relja Arandjelović and Andrew Zisserman (CVPR 2012) University of Oxford 2 nd April 2012 Large scale object retrieval Find all instances of

More information

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma Presented by Hu Han Jan. 30 2014 For CSE 902 by Prof. Anil K. Jain: Selected

More information

arxiv: v1 [cs.cv] 2 Sep 2013

arxiv: v1 [cs.cv] 2 Sep 2013 A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification Xiaojiang Peng 1,2, Qiang Peng 1, Yu Qiao 2, Junzhou Chen 1, and Mehtab Afzal 1 1 Southwest Jiaotong University,

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Codebook Graph Coding of Descriptors

Codebook Graph Coding of Descriptors Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'5 3 Codebook Graph Coding of Descriptors Tetsuya Yoshida and Yuu Yamada Graduate School of Humanities and Science, Nara Women s University, Nara,

More information

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY. Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin

REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY. Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin REJECTION-BASED CLASSIFICATION FOR ACTION RECOGNITION USING A SPATIO-TEMPORAL DICTIONARY Stefen Chan Wai Tim, Michele Rombaut, Denis Pellerin Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble, France ABSTRACT

More information

Combining Selective Search Segmentation and Random Forest for Image Classification

Combining Selective Search Segmentation and Random Forest for Image Classification Combining Selective Search Segmentation and Random Forest for Image Classification Gediminas Bertasius November 24, 2013 1 Problem Statement Random Forest algorithm have been successfully used in many

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Multiple Kernel Learning for Emotion Recognition in the Wild

Multiple Kernel Learning for Emotion Recognition in the Wild Multiple Kernel Learning for Emotion Recognition in the Wild Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, Gwen Littlewort and Marian S. Bartlett Machine Perception Laboratory UCSD EmotiW Challenge,

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

Frequent Inner-Class Approach: A Semi-supervised Learning Technique for One-shot Learning

Frequent Inner-Class Approach: A Semi-supervised Learning Technique for One-shot Learning Frequent Inner-Class Approach: A Semi-supervised Learning Technique for One-shot Learning Izumi Suzuki, Koich Yamada, Muneyuki Unehara Nagaoka University of Technology, 1603-1, Kamitomioka Nagaoka, Niigata

More information

Learning Dictionaries of Discriminative Image Patches

Learning Dictionaries of Discriminative Image Patches Downloaded from orbit.dtu.dk on: Nov 22, 2018 Learning Dictionaries of Discriminative Image Patches Dahl, Anders Bjorholm; Larsen, Rasmus Published in: Proceedings of the British Machine Vision Conference

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers 3D Human Pose Inference Difficulties Towards

More information