arxiv: v1 [cs.cv] 13 Jul 2015


Unconstrained Facial Landmark Localization with Backbone-Branches Fully-Convolutional Networks

Zhujin Liang, Shengyong Ding, Liang Lin
Sun Yat-Sen University, Guangzhou Higher Education Mega Center, Guangzhou 6, PR China
alfredtofu@gmail.com, marcding@63.com, linliang@ieee.org

arXiv:7.349v [cs.CV] 13 Jul 2015

Abstract

This paper investigates how to rapidly and accurately localize facial landmarks in unconstrained, cluttered environments rather than in well-segmented face images. We present a novel Backbone-Branches Fully-Convolutional Neural Network (BB-FCN), which produces facial landmark response maps directly from raw images without relying on pre-processing or sliding-window approaches. BB-FCN contains one backbone and a number of network branches, each corresponding to one landmark type, and it operates in a progressive manner. Specifically, the backbone roughly detects the locations of facial landmarks by taking the whole image as input, and the branches further refine the localizations based on a local observation from the backbone's intermediate feature map. Moreover, our backbone-branches architecture does not contain fully-connected layers for location regression, leading to efficient learning and inference. Our extensive experiments show that our model outperforms other state-of-the-art methods under both the constrained (i.e., with face regions) and the "in the wild" scenarios.

1. Introduction

Localizing facial landmarks plays a critical role in face recognition, and it also benefits a range of face-based applications such as face hallucination [21] and person verification [14]. Most existing methods for facial landmark estimation are developed in a controlled context, e.g., the face regions are well segmented as a pre-processing step. Such a setting has drawbacks when dealing with "in the wild" images (e.g., cluttered surveillance scenes), where automated face detection is not always reliable.
This work aims at the task of unconstrained facial landmark estimation, i.e., how to rapidly and accurately localize facial landmarks in real-world, cluttered environments (see Figure 1(a) for an example). Specifically, we consider the following challenges in developing such a system. First, faces exhibit large appearance and structure variations in an unconstrained scene, caused by diverse views, head poses and expressions as well as facial accessories (e.g., glasses and hats) and aging. Thus, traditional global models may not work well, as the usual assumptions (e.g., certain spatial layouts) do not hold. Second, the search space of facial landmarks is quite large when the number and the scale of faces are both unknown. Third, partial occlusions and the adjacency of nearby faces are two further difficulties that cannot be ignored. It is thus infeasible to handle our task with existing deformable part-based models that rely on exhaustive image pyramid searching.

Figure 1. Facial landmark predictions from unconstrained environments. (a) Two cluttered images including an unknown number of faces. (b) The dense response maps generated by our approach, where the color represents the different types of landmarks.

To overcome the above issues, we present a novel deep model based on the convolutional neural network (CNN), which produces facial landmark response maps directly from raw images without relying on any pre-processing or feature engineering. Two typical results generated by our approach are shown in Figure 1(b).

Besides achieving outstanding performance in image classification [10] [16] [7], deep convolutional neural network models have demonstrated their effectiveness in object detection and localization [6]. These models usually take an image patch as input and output a parameterized object localization by regression. For example, Sun et al. [15] proposed to detect landmarks in face images with a three-level cascaded CNN framework, in which each landmark's location is gradually predicted via fully-connected layers. Despite the substantial progress, it is complicated for these models to jointly handle the classification problem (i.e., whether a landmark exists) and the localization problem. Thus, applying the method in [15] to an unconstrained scene requires an exhaustive sliding-window search over the image.

Very recently, Long et al. [12] presented a new fully-convolutional network (FCN), which takes input of arbitrary size and produces a correspondingly-sized dense label map, and showed convincing results for semantic image segmentation. Notably, classification and localization can be obtained simultaneously from the dense label map. The FCN also offers efficient inference by sharing convolutions among overlapping image patches. The success of this work inspires us to adopt the FCN in our task, i.e., producing pixelwise facial landmark predictions. Nevertheless, we need a specialized architecture, as our task requires more accurate prediction than general image labeling. Considering both computational efficiency and localization accuracy, we pose facial landmark estimation as a coarse-to-fine filtering process.
In particular, the locations of facial landmarks are roughly detected in the global context and then further refined by observing the local regions. To this end, we introduce a novel architecture of fully-convolutional networks that directly follows this coarse-to-fine pipeline. Specifically, our architecture contains one backbone and a number of network branches, each corresponding to one landmark type, and it operates in a progressive manner. First, by taking the whole image as input, the network backbone, in which several convolutional layers and max-pooling layers are sequentially stacked, handles all of the facial landmarks together and generates a coarse, low-resolution response map. Then, for each type of facial landmark, the network branch takes a local observation from the intermediate feature map of the backbone and produces a fine and accurate prediction, using only convolutional layers. We thus call our architecture the Backbone-Branches Fully-Convolutional Network (BB-FCN).

Our BB-FCN has three important properties for handling the challenges of unconstrained facial landmark localization. i) It is trained end-to-end, pixels-to-pixels, without requiring extra supervision. ii) It does not rely on fully-connected layers for accurate location regression, leading to efficient learning and inference. iii) It naturally combines global and local information, in accordance with the human perception process.

We extensively evaluate BB-FCN on several standard benchmarks (e.g., AFW [26], AFLW [9]), and our experiments show that BB-FCN outperforms other state-of-the-art methods under both the constrained (i.e., with face regions) and the "in the wild" scenarios. In particular, our BB-FCN decreases the average mean error of the current state of the art from 8.2% to 7.6% on AFW and from 8% to 7.% on AFLW.

The rest of the paper is organized as follows. Section II presents a brief review of related work.
Section III introduces the main pipeline of our BB-FCN, followed by a discussion of network implementation and optimization in Section IV. The experimental results, comparisons and component analysis are presented in Section V. Section VI concludes the paper.

2. Related Work

Facial landmark localization has long been studied due to its critical role in many applications. Generally speaking, most methods are based on local detectors and global constraints. Local detectors are designed to give evidence of part existence and are implemented by classifiers on hand-crafted features. For example, Belhumeur et al. [1] use SVMs with SIFT features as the local detectors, and Liang et al. [11] use AdaBoost on Haar wavelet features. For global constraints, there are several ways to model the part relationships. For example, Zhu et al. [26] apply a tree-based deformable part model to encode the constraints, which achieves good performance. Valstar et al. [19] model the constraints as a Markov random field to speed up the process and improve the robustness of the algorithm. Cao et al. [3] use the whole face region as input and random ferns as the regressor, with the shapes to be predicted expressed as linear combinations of training shapes. All these methods have in common that they use hand-crafted features (e.g., HoG features). In comparison to learned features (as in our work), hand-crafted features have poorer generalization performance and discriminative power.

Recently, significant progress on facial landmark detection has been achieved. Deep models, such as Convolutional Neural Networks (CNNs), Deep Auto-encoders (DAEs) and Restricted Boltzmann Machines (RBMs), play a vital role in this progress, and most of these works are based on regression. In the regression-based methods, facial landmark detection is formulated as a regression task, and a holistic regressor is used to jointly compute the landmark coordinates. Sun et al. [15] propose a cascaded regression method for facial landmark detection that includes three levels of carefully designed convolutional networks. Further, Zhou et al. [25] propose a four-level convolutional network cascade. Zhang et al. [24] optimize facial landmark detection jointly with other related tasks such as pose estimation and facial expression analysis. Zhang et al. [23] propose Coarse-to-Fine Auto-Encoder Networks for facial landmark detection.

Figure 2. Landmark localization as a progressive filtering process with two stages. The backbone network first generates low-resolution response maps identifying the rough locations with a large stride. The branch network then produces fine response maps with a small stride for accurate landmark detection. There are five branches corresponding to five facial landmarks, and each branch refines the response map separately.

Our method differs from most of the existing deep models in the way we model the outputs. Previous work models the output as the landmark locations, while we model the output as a response map, which does not require fully-connected layers. The works most similar to ours are [18] [17]. In [17], Tompson et al. propose to use heat maps to detect body joints with carefully designed dropout; they rely on segmented images and assume there is exactly one target. We generalize the response-map-based model to the face landmark localization problem with the standard SGD algorithm.
In the experiments, we quantitatively compare the performance of the regression-based model with ours to demonstrate the effectiveness of the response map model.

3. The Proposed BB-FCN Architecture

We aim at localizing all the facial landmarks in unconstrained images. Suppose there are $K$ part types (e.g., eyes, nose and mouth), and let $L_i^k = (x_i^k, y_i^k, s_i^k)$ denote the location of the $i$-th instance of part $k$ in the image $I$, where $x_i^k$, $y_i^k$ and $s_i^k$ represent the coordinates and scale of the detected part, respectively. Our task can then be defined as follows, with $k = 1, 2, \ldots, K$:

$$\mathrm{Det}(I) = \{(x_i^k, y_i^k, s_i^k)\} \tag{1}$$

Note that the number of instances may differ between parts due to pose variation and occlusion. Unlike the existing approaches that predict the landmarks by regressors, we address this problem with a backbone-branches fully convolutional network model, in which the backbone network generates coarse response maps for rough location inference and the branch networks produce fine response maps for accurate location refinement.

Before going into the details of our architecture, we first explain our model from an intuitive perspective, the filtering perspective, to reveal the key difference between our model and the regression-based model. In a nutshell, the response map of our model can be seen as the output of a filtering function, with high values representing high confidence in the presence of a landmark. Let $F_{W^k}(P)$ denote the filtering function parameterised by $W^k$ for part $k$, defined on a patch $P$ of size $w \times h$. An ideal filtering function should have the following property: patches containing the target part should have strong responses, while patches without that part should have weak responses. We model the response as decreasing exponentially with the distance $r$ between the part and the center of the patch, as below, where $\beta$ controls the decay. In this paper, $\beta$ is set to . in the backbone network and . in the branch network.

$$F_{W^k}(P) = \begin{cases} e^{-\beta r} & \text{if } P \text{ contains part } k; \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

For an input image $I$, applying this function in a sliding-window manner with stride $\delta$ generates a response map $F_{W^k} \ast I$ whose values are given by:

$$(F_{W^k} \ast I)(x, y) = F_{W^k}\big(I(x\delta,\, y\delta,\, x\delta + w,\, y\delta + h)\big) \tag{3}$$

Here, $I(x\delta, y\delta, x\delta + w, y\delta + h)$ stands for the patch of size $w \times h$ starting at $(x\delta, y\delta)$ in image $I$. With this response map, a simple landmark localization approach for part $k$ can be formulated as below, where $\theta$ denotes a threshold.

$$\mathrm{Det}(I) = \{(x_i\delta + w/2,\; y_i\delta + h/2,\; ) \mid (F_{W^k} \ast I)(x_i, y_i) > \theta\} \tag{4}$$

Of course, in order to achieve better results, we need to detect the landmarks across a set of scales and suppress the non-maximum values, as typical detection approaches do; this is discussed in later sections.

According to Equation (3), there is a trade-off between the magnitude of the localization error and the computational cost. To achieve high accuracy, the stride $\delta$ should be as small as possible. However, to speed up the detection process, the stride needs to be enlarged, which results in a low-resolution response map. This inspires us to apply a two-stage filtering process to localize the landmarks progressively.
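The filtering view of Equations (2) to (4) can be illustrated with a small NumPy sketch. It is not the network itself: it simply evaluates the ideal response of Equation (2) on the sliding-window grid of Equation (3) for one known landmark and then applies the threshold rule of Equation (4), returning only the two spatial coordinates. The function names, the patch size, the stride and the value of `beta` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def ground_truth_response(image_hw, landmark_xy, patch_wh=(16, 16), stride=4, beta=0.1):
    """Ideal response map of Eq. (2), sampled on the sliding-window grid of Eq. (3).

    patch_wh, stride and beta are hypothetical values chosen for illustration.
    """
    img_h, img_w = image_hw
    w, h = patch_wh
    lx, ly = landmark_xy
    # Top-left corners (x * stride, y * stride) of every sliding window.
    lefts = np.arange((img_w - w) // stride + 1) * stride
    tops = np.arange((img_h - h) // stride + 1) * stride
    left, top = np.meshgrid(lefts, tops)  # both arrays have shape (rows, cols)
    # Distance r between the landmark and the center of each window.
    r = np.hypot(left + w / 2.0 - lx, top + h / 2.0 - ly)
    # A window "contains" the part if the landmark falls inside it.
    contains = (left <= lx) & (lx < left + w) & (top <= ly) & (ly < top + h)
    return np.where(contains, np.exp(-beta * r), 0.0)


def detect(response, patch_wh=(16, 16), stride=4, theta=0.5):
    """Threshold rule of Eq. (4): map grid cells above theta back to image coordinates."""
    w, h = patch_wh
    rows, cols = np.nonzero(response > theta)
    return [(float(x * stride + w / 2.0), float(y * stride + h / 2.0))
            for y, x in zip(rows, cols)]


if __name__ == "__main__":
    resp = ground_truth_response((64, 64), landmark_xy=(32, 24))
    print(resp.shape)                 # grid of window responses
    print(detect(resp, theta=0.99))   # windows whose center is (almost) on the landmark
```

With a larger stride the grid becomes coarser and the recovered coordinates can be off by up to half a stride, which is the trade-off that motivates the two-stage process.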
More specifically, the first filtering process generates a coarse response map with a relatively large stride, identifying the rough locations of the landmarks. We then apply another filtering process to the local patches centered at the estimated landmarks to obtain a fine response map for accurate landmark localization. This two-stage strategy enables us to detect the landmarks quickly.

3.1. Coarse-to-Fine Localization by BB-FCN

In this part, we show that the coarse-to-fine strategy can be implemented by our proposed BB-FCN architecture. The key ingredient is that a fully convolutional network can equivalently model a filtering process. In fact, the output of a fully convolutional network on an input image is just the result of a deep architecture applied to the sliding windows of that input. In the context of fully convolutional networks, the sliding window is called a receptive field. Figure 3 illustrates the relationship between the receptive fields and the outputs.

Figure 3. An example of receptive fields. There are two convolution layers with kernels of x and 7x7, respectively. The receptive field of a x response in layer 3 is a patch of size 7x7 in layer 2, and the receptive field of the 7x7 patch in layer 2 is a patch of size x in layer .

We therefore design a deep architecture with one backbone and a number of branches for efficient and accurate landmark localization. The backbone produces a coarse response map for each part at the top layer. It uses a relatively large receptive field in order to exploit global texture and ensure the quality of the coarse response map. With the rough locations provided by the coarse map, we then apply another network, i.e., a branch for each part, to refine the response map. In order to share features, this branch takes patches from intermediate features of the backbone as input.
This input is then convolved with a set of filters, without any pooling, to generate a fine-grained response map.

Let $W_c$ denote the parameters of the backbone network and $H_k(R; W_c)$ denote the response map of input $R$ for part $k$. We train the backbone network with the following loss function:

$$L_1(R; W_c) = \sum_{k=1}^{K} \big\| H_k(R; W_c) - H_k^{*}(R) \big\|^2 \tag{5}$$

where $H_k^{*}(R)$ denotes the ground truth response map. This ground truth response map is defined according to Equation (2), with the receptive field $\mathrm{rec\_field}(x, y)$ acting as the patch $P$:

$$H_k^{*}(x, y) = \begin{cases} e^{-\beta r} & \text{if } \mathrm{rec\_field}(x, y) \text{ contains part } k; \\ 0 & \text{otherwise} \end{cases} \tag{6}$$
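As a minimal sketch of the backbone objective in Equation (5), the loss is a sum of squared differences between predicted and ground truth response maps over the K part channels. The array layout `(K, height, width)` and the function names are assumptions made for illustration; in a real framework the gradient would come from automatic differentiation or the framework's own loss layer.

```python
import numpy as np


def response_map_loss(pred, target):
    """Sum over the K channels of the squared L2 distance between response maps.

    pred and target are assumed to have shape (K, height, width).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.sum((pred - target) ** 2))


def response_map_loss_grad(pred, target):
    """Gradient of the loss above with respect to the predicted maps."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return 2.0 * (pred - target)


if __name__ == "__main__":
    pred = np.zeros((5, 2, 2))    # five parts, 2x2 coarse map (toy sizes)
    target = np.ones((5, 2, 2))
    print(response_map_loss(pred, target))
```

The same function applies unchanged to a single branch by passing one-channel maps, since the branch objective has the same squared-error form.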

The branch networks are trained in almost the same way as the backbone, except that each branch takes patches of intermediate features of the backbone as input rather than image patches. Let $W_f^k$ denote the parameters of the branch network for part $k$, and let $H(P; W_f^k)$ and $H_k^{*}(P)$ denote the predicted response map and the ground truth response map of patch $P$, respectively. The loss function of this network is again defined as follows, with $H_k^{*}(P)$ set as in Equation (2):

$$L_2(P; W_f^k) = \big\| H(P; W_f^k) - H_k^{*}(P) \big\|^2 \tag{7}$$

3.2. Non-maximum Suppression over Scales

As the network is trained at a fixed scale, a testing image must be rescaled to a set of images so that each landmark appears at the trained scale in one of them. We thus obtain a pyramid of response maps for each part of each testing image. We first apply non-maximum suppression to localize the maximum point in the pyramid of coarse response maps. Given this maximum point with its scale and location, we obtain its fine-grained response map by feeding the patch centered at the predicted location to the fine network. We then take the maximum point in this response map as the final detected landmark.

Figure 4. Example images from AFW and AFLW and the results of facial landmark detection, where the color represents the different types of landmarks. This figure is best viewed in the electronic edition.

Figure 5. Comparison of our method with other methods (TSPM, ESR, CDM, Luxand, RCPR, SDM, TCDCN) on the AFW and AFLW datasets, reported as mean error (%) for LE, RE, N, LM and RM. The top row is the result on AFW, with the right column showing the mean error averaged over the different parts, and the bottom row is the result on AFLW.

4. Implementation

In this section, we describe the detailed network architecture of our implementation. Figure 2 shows the architecture of our network, which consists of a backbone and a number of branches.

Backbone Network: The backbone produces coarse response maps for rough location estimation at high speed. As the rough locations are the foundation of the fine locations, this network is designed to exploit global context through several pooling layers. More specifically, this network is trained with 6x6 inputs ranging from . to .8 times the face size. The coarse response maps are produced by a set of stacked convolution and pooling layers, as depicted in Figure 2. The first six layers contain three convolutional layers, each taking the pooled output of the previous layer as input. For simplicity, these six layers are drawn as three layers in Figure 2. The last three layers are all convolutional layers without any pooling. Note that the output of the final convolutional layer is our desired response map, with each channel corresponding to one part. In summary, the kernel sizes of the convolutional layers are x, x, x, 9x9, x and x, and the feature map sizes are 32x6x6, 32x8x8, 32x4x4, 28x2x2, 64x2x2 and x2x2, respectively.

Branch Network: The branch network is designed to generate a fine response map for precise landmark prediction. We use one branch per part. There are thus five branches in total, and we use the same architecture for each branch. Since we do not want to introduce any pooling operations, each branch is attached to the first convolutional layer of the backbone, taking the patch of size 64x64 centered at the predicted location as input. This input is then convolved by four convolutional layers without any pooling operation, so that the output has a high resolution for accurate localization. Attaching the branch to a backbone convolutional layer allows the two networks to benefit from each other, as they share common feature maps. As depicted in Figure 2, the kernel sizes of these four convolutional layers are x, 7x7, 9x9 and x, and the feature map sizes are 6x64x64, 6x64x64, 6x64x64 and x64x64, respectively.

We implement our model in Caffe [8], an open source framework for deep learning. Thanks to Caffe's flexible configuration, the two networks are integrated as one model, so that we do not need to invoke the forward propagation of the two networks separately. The model is trained and tested on a host equipped with a Titan Black GPU with 6G of memory. The training phase takes about 2 hours and .8G of memory, with 64 samples per mini-batch.
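The hand-off from backbone to branch described above (take the maximum response position in each coarse channel, then crop a patch of backbone features centered there) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the mapping from a coarse cell to feature coordinates (`row * stride + stride // 2`), the zero padding at the borders, and all function names are hypothetical choices, and the stride value must be supplied by the caller since it depends on the actual pooling configuration.

```python
import numpy as np


def coarse_peaks(coarse):
    """Return the (row, col) of the maximum response in each of the K channels."""
    k, _, w = coarse.shape
    flat_idx = coarse.reshape(k, -1).argmax(axis=1)
    return [divmod(int(i), w) for i in flat_idx]


def crop_centered(features, center_rc, size):
    """Crop a size x size window from (C, H, W) features, zero-padded at the borders.

    The feature at center_rc lands at index (size // 2, size // 2) of the patch.
    """
    half = size // 2
    padded = np.pad(features, ((0, 0), (half, half), (half, half)), mode="constant")
    r, c = center_rc
    return padded[:, r:r + size, c:c + size]


def branch_inputs(coarse, features, stride, size=64):
    """One feature patch per part, centered at that part's coarse peak.

    Assumes a coarse cell (r, c) corresponds to feature position
    (r * stride + stride // 2, c * stride + stride // 2); this is an
    illustrative convention, not one stated in the paper.
    """
    _, feat_h, feat_w = features.shape
    patches = []
    for r, c in coarse_peaks(coarse):
        fr = min(r * stride + stride // 2, feat_h - 1)
        fc = min(c * stride + stride // 2, feat_w - 1)
        patches.append(crop_centered(features, (fr, fc), size))
    return patches


if __name__ == "__main__":
    features = np.random.rand(32, 160, 160)   # toy backbone feature map
    coarse = np.random.rand(5, 20, 20)        # toy coarse response maps, five parts
    patches = branch_inputs(coarse, features, stride=8, size=64)
    print(len(patches), patches[0].shape)
```

Cropping from a shared feature map rather than from the image is what lets the branches reuse the backbone's first-layer computation.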
For testing, each constrained image takes only about 8.8 ms, and each unconstrained image of size 4x4 takes about 7ms for 14 scales, which are 778x778, 482x482, 23x23, 29x29, 88x88, 7x7, 96x96, 497x497, 44x44, 34x34, 288x288, 24x24, 2x2 and 67x67.

5. Experiment

Dataset: We create our dataset for training and validation from three sources: (1) 737 face images (637 for training, for validation) collected from the web, which are manually labelled with five facial landmarks (left eye, right eye, nose, left lip corner and right lip corner); (2) 99 images (7998 for training, 97 for validation) randomly selected from AFLW [9]; and (3) 67 natural scene images (28 for training, 43 for validation) without people from the INRIAPerson database [4], used as negative examples. In total, there are 33 samples for training and 26 samples for validation. For evaluation we use two challenging public datasets, AFW [26] and AFLW (Figure 4 shows samples from these two datasets and our detection results). There is no overlap among the training, validation and evaluation sets. The images in AFW and AFLW are collected in the wild, which poses a more challenging scenario than other datasets (e.g., XM2VTS [13]). Besides, these two datasets differ from others (e.g., LFPW [1]) in their annotation of multiple, non-frontal faces in a single image. The AFW dataset consists of 2 images with 468 faces. The evaluation images of AFLW are the same as in [24], which randomly selects 3 faces from AFLW, 39% of which are non-frontal.

Evaluation Metric: The evaluation metric adopted is the mean error, measured by the distance between the estimated landmarks and the ground truths, normalized with respect to the inter-ocular distance. It can be formulated as

$$\mathrm{err} = \frac{\sqrt{(x - x')^2 + (y - y')^2}}{l} \tag{8}$$

where $(x, y)$ and $(x', y')$ are the ground truth and predicted locations, respectively, and $l$ is the inter-ocular distance. In our experiments, we evaluate the performance on five facial landmarks, i.e., LE (left eye), RE (right eye), N (nose), LM (left mouth corner) and RM (right mouth corner), and also report A (the mean error averaged over the five facial landmarks).

Figure 6. The average precision of landmarks on AFW. "fine" and "coarse" stand for the network with and without the branch network, and "regression" stands for the regression model trained by us. Panels: LE, RE, N, LM, RM and A.

Figure 7. The average precision of landmarks on AFLW.

5.1. Comparison with the State-of-the-art Methods

Our method supports both constrained and unconstrained images. In this part, we compare our method with other published results on constrained image datasets, i.e., AFW and AFLW, for which the bounding boxes are available. In this case, the maximum response locations are taken as the landmarks. The compared methods include academic state-of-the-art methods as well as commercial software (the results of the other methods are from [24]): (1) Robust Cascaded Pose Regression (RCPR) [2]; (2) Tree Structured Part Model (TSPM) [26]; (3) Luxand face SDK (Luxand Incorporated: Luxand Face SDK); (4) Explicit Shape Regression (ESR) [3]; (5) a Cascaded Deformable Shape Model (CDM) [22]; (6) Supervised Descent Method (SDM) [20]; (7) Tasks-Constrained Deep Convolutional Network (TCDCN) [24].

Our model consistently achieves better performance than the others on these two datasets. On AFW, our average mean error is 7.6 percent over the five parts, a relative improvement of 7.3 percent over the state-of-the-art TCDCN. On AFLW, we achieve 7. percent average mean error, a 6.2 percent improvement over TCDCN. Figure 5 shows that our method outperforms all the other methods on both AFW and AFLW.
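The evaluation metric of Equation (8) and the relative improvement quoted above can be sketched as follows. The function names and the array layout are assumptions made for illustration; the relative improvement is taken here as the error reduction divided by the error of the compared method, which reproduces the quoted 7.3 percent from the AFW figures of 8.2 and 7.6.

```python
import numpy as np


def normalized_errors(pred, gt, left_eye, right_eye):
    """Eq. (8): landmark distance divided by the inter-ocular distance l.

    pred and gt are (N, 2) arrays of (x, y) points; left_eye and right_eye are
    the ground truth eye positions used to compute l.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    inter_ocular = np.linalg.norm(
        np.asarray(right_eye, dtype=np.float64) - np.asarray(left_eye, dtype=np.float64)
    )
    return np.linalg.norm(pred - gt, axis=-1) / inter_ocular


def relative_improvement(reference_error, our_error):
    """Error reduction relative to the compared method, in percent."""
    return 100.0 * (reference_error - our_error) / reference_error


if __name__ == "__main__":
    errs = normalized_errors([[53, 64]], [[50, 60]], left_eye=(30, 40), right_eye=(70, 40))
    print(100.0 * errs.mean())              # mean error in percent
    print(relative_improvement(8.2, 7.6))   # AFW figures from the text
```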
5.2. Performance Under Unconstrained Scenarios

As far as we know, very few facial landmark localizers have been studied in unconstrained contexts. We thus use the precision-recall curve [5] to evaluate the performance of facial landmark detection under unconstrained scenarios. Similar to object detection, a detected landmark is counted as correct only if there exists a ground truth landmark of the same type within % of the inter-ocular distance. For fairness, we also implement the baseline method as a fully convolutional network that has almost the same architecture as our backbone network except for the top layer. In our baseline deep model, the top layer is a fully-connected layer in which each part corresponds to three outputs, i.e., two for location regression within the receptive field and one for existence classification. Given a threshold value, we can thus find all the regions that contain the target part and use the predicted locations as the final detected landmarks. Figure 6 and Figure 7 show the PR curves of the different parts. Our method clearly outperforms the regression-based deep model.

Figure 8. Comparison of the branch model and the backbone network ("fine" versus "coarse") for LE, RE, N, LM, RM and A. The left plot shows the average mean error (%) of the five parts and the middle plot shows the relative improvement (%), where Relative Improvement = (reduced average error) / (average error of the method in comparison). The right figure shows localized examples, with the top row for the backbone network and the bottom row for the branch network. Solid circles are ground truths and empty ones are predictions.

5.3. Benefit of Branch Networks

Our method relies on two networks to progressively refine the landmark locations. In this part, we evaluate the effect of the branch network by conducting the same experiments under two settings, one with the branch network and one without it. The results are shown in Figure 8, from which we can see that the branch network effectively improves the performance of landmark detection. With the branch network, the performance achieves about 3.77% relative improvement.

6. Conclusion

In this work, we proposed a novel Backbone-Branches Fully-Convolutional Network (BB-FCN) that progressively produces prediction maps of facial landmarks in an end-to-end way. Our extensive experiments suggest that BB-FCN achieves very promising results on both the traditional constrained benchmarks and cluttered, real-world scenes.
In the future, we will integrate our BB-FCN model with object recognition and detection systems, where accurate part-based localization can be very helpful for improving performance.

References

[1] P. N. Belhumeur, D. W. Jacobs, D. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In Computer Vision and Pattern Recognition (CVPR), 2 IEEE Conference on, pages 4 2. IEEE, 2.
[2] X. P. Burgos-Artizzu, P. Perona, and P. Dollár. Robust face landmark estimation under occlusion. In Computer Vision (ICCV), 23 IEEE International Conference on, pages 3 2. IEEE.
[3] X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression, Dec. US Patent App. 3/728,84.
[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In C. Schmid, S. Soatto, and C. Tomasi, editors, International Conference on Computer Vision & Pattern Recognition, volume 2, INRIA Rhône-Alpes, ZIRST-6, av. de l'Europe, Montbonnot-38334, June 2.
[5] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):33 338, 2.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 24 IEEE Conference on. IEEE.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arxiv:2.82, 2.
[8] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. CoRR, abs/48.93.
[9] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 97.
[11] L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In Computer Vision ECCV 28. Springer.
[12] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. arXiv preprint arxiv:4.438.
[13] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre. XM2VTSDB: The extended M2VTS database. In Second International Conference on Audio and Video-based Biometric Person Authentication, volume 964. Citeseer.
[14] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In Advances in Neural Information Processing Systems, 24.
[15] Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In Computer Vision and Pattern Recognition (CVPR), 23 IEEE Conference on. IEEE, 23.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint.
[17] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler. Efficient object localization using convolutional networks. arXiv preprint arxiv:4.428.
[18] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems.
[19] M. Valstar, B. Martinez, X. Binefa, and M. Pantic. Facial point detection using boosted regression and graph models. In Computer Vision and Pattern Recognition (CVPR), 2 IEEE Conference on. IEEE, 2.
[20] X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In Computer Vision and Pattern Recognition (CVPR), 23 IEEE Conference on. IEEE.
[21] C.-Y. Yang, S. Liu, and M.-H. Yang. Structured face hallucination. In Computer Vision and Pattern Recognition (CVPR), 23 IEEE Conference on. IEEE, 23.
[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas. Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In Computer Vision (ICCV), 23 IEEE International Conference on. IEEE.
[23] J. Zhang, S. Shan, M. Kan, and X. Chen. Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In Computer Vision ECCV 24, pages 6. Springer.
[24] Z. Zhang, P. Luo, C. C. Loy, and X. Tang. Facial landmark detection by deep multi-task learning. In Computer Vision ECCV 24. Springer, 24.
[25] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin. Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In Computer Vision Workshops (ICCVW), 23 IEEE International Conference on. IEEE.
[26] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Computer Vision and Pattern Recognition (CVPR), 22 IEEE Conference on. IEEE, 22.

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

FACIAL POINT DETECTION USING CONVOLUTIONAL NEURAL NETWORK TRANSFERRED FROM A HETEROGENEOUS TASK

FACIAL POINT DETECTION USING CONVOLUTIONAL NEURAL NETWORK TRANSFERRED FROM A HETEROGENEOUS TASK FACIAL POINT DETECTION USING CONVOLUTIONAL NEURAL NETWORK TRANSFERRED FROM A HETEROGENEOUS TASK Takayoshi Yamashita* Taro Watasue** Yuji Yamauchi* Hironobu Fujiyoshi* *Chubu University, **Tome R&D 1200,

More information

FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE. Chubu University 1200, Matsumoto-cho, Kasugai, AICHI

FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE. Chubu University 1200, Matsumoto-cho, Kasugai, AICHI FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE Masatoshi Kimura Takayoshi Yamashita Yu Yamauchi Hironobu Fuyoshi* Chubu University 1200, Matsumoto-cho,

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Improved Face Detection and Alignment using Cascade Deep Convolutional Network

Improved Face Detection and Alignment using Cascade Deep Convolutional Network Improved Face Detection and Alignment using Cascade Deep Convolutional Network Weilin Cong, Sanyuan Zhao, Hui Tian, and Jianbing Shen Beijing Key Laboratory of Intelligent Information Technology, School

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

A Fully End-to-End Cascaded CNN for Facial Landmark Detection

A Fully End-to-End Cascaded CNN for Facial Landmark Detection 2017 IEEE 12th International Conference on Automatic Face & Gesture Recognition A Fully End-to-End Cascaded CNN for Facial Landmark Detection Zhenliang He 1,2 Meina Kan 1,3 Jie Zhang 1,2 Xilin Chen 1 Shiguang

More information

arxiv: v1 [cs.cv] 29 Sep 2016

arxiv: v1 [cs.cv] 29 Sep 2016 arxiv:1609.09545v1 [cs.cv] 29 Sep 2016 Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge Adrian Bulat and Georgios Tzimiropoulos Computer Vision

More information

Extensive Facial Landmark Localization with Coarse-to-fine Convolutional Network Cascade

Extensive Facial Landmark Localization with Coarse-to-fine Convolutional Network Cascade 2013 IEEE International Conference on Computer Vision Workshops Extensive Facial Landmark Localization with Coarse-to-fine Convolutional Network Cascade Erjin Zhou Haoqiang Fan Zhimin Cao Yuning Jiang

More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA

More information

Intensity-Depth Face Alignment Using Cascade Shape Regression

Intensity-Depth Face Alignment Using Cascade Shape Regression Intensity-Depth Face Alignment Using Cascade Shape Regression Yang Cao 1 and Bao-Liang Lu 1,2 1 Center for Brain-like Computing and Machine Intelligence Department of Computer Science and Engineering Shanghai

More information

Robust FEC-CNN: A High Accuracy Facial Landmark Detection System

Robust FEC-CNN: A High Accuracy Facial Landmark Detection System Robust FEC-CNN: A High Accuracy Facial Landmark Detection System Zhenliang He 1,2 Jie Zhang 1,2 Meina Kan 1,3 Shiguang Shan 1,3 Xilin Chen 1 1 Key Lab of Intelligent Information Processing of Chinese Academy

More information

Tweaked residual convolutional network for face alignment

Tweaked residual convolutional network for face alignment Journal of Physics: Conference Series PAPER OPEN ACCESS Tweaked residual convolutional network for face alignment To cite this article: Wenchao Du et al 2017 J. Phys.: Conf. Ser. 887 012077 Related content

More information

Unconstrained Face Alignment without Face Detection

Unconstrained Face Alignment without Face Detection Unconstrained Face Alignment without Face Detection Xiaohu Shao 1,2, Junliang Xing 3, Jiangjing Lv 1,2, Chunlin Xiao 4, Pengcheng Liu 1, Youji Feng 1, Cheng Cheng 1 1 Chongqing Institute of Green and Intelligent

More information

Locating Facial Landmarks Using Probabilistic Random Forest

Locating Facial Landmarks Using Probabilistic Random Forest 2324 IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 12, DECEMBER 2015 Locating Facial Landmarks Using Probabilistic Random Forest Changwei Luo, Zengfu Wang, Shaobiao Wang, Juyong Zhang, and Jun Yu Abstract

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

OVer the last few years, cascaded-regression (CR) based

OVer the last few years, cascaded-regression (CR) based 1 Random Cascaded-Regression Copse for Robust Facial Landmark Detection Zhen-Hua Feng 1,2, Student Member, IEEE, Patrik Huber 2, Josef Kittler 2, Life Member, IEEE, William Christmas 2, and Xiao-Jun Wu

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

arxiv: v1 [cs.cv] 26 Jun 2017

arxiv: v1 [cs.cv] 26 Jun 2017 Detecting Small Signs from Large Images arxiv:1706.08574v1 [cs.cv] 26 Jun 2017 Zibo Meng, Xiaochuan Fan, Xin Chen, Min Chen and Yan Tong Computer Science and Engineering University of South Carolina, Columbia,

More information

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Face Alignment across Large Pose via MT-CNN based 3D Shape Reconstruction

Face Alignment across Large Pose via MT-CNN based 3D Shape Reconstruction Face Alignment across Large Pose via MT-CNN based 3D Shape Reconstruction Gang Zhang 1,2, Hu Han 1, Shiguang Shan 1,2,3, Xingguang Song 4, Xilin Chen 1,2 1 Key Laboratory of Intelligent Information Processing

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Robust Facial Landmark Detection under Significant Head Poses and Occlusion

Robust Facial Landmark Detection under Significant Head Poses and Occlusion Robust Facial Landmark Detection under Significant Head Poses and Occlusion Yue Wu Qiang Ji ECSE Department, Rensselaer Polytechnic Institute 110 8th street, Troy, NY, USA {wuy9,jiq}@rpi.edu Abstract There

More information

arxiv: v2 [cs.cv] 23 May 2016

arxiv: v2 [cs.cv] 23 May 2016 Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition arxiv:1605.06217v2 [cs.cv] 23 May 2016 Xiao Liu Jiang Wang Shilei Wen Errui Ding Yuanqing Lin Baidu Research

More information

Topic-aware Deep Auto-encoders (TDA) for Face Alignment

Topic-aware Deep Auto-encoders (TDA) for Face Alignment Topic-aware Deep Auto-encoders (TDA) for Face Alignment Jie Zhang 1,2, Meina Kan 1, Shiguang Shan 1, Xiaowei Zhao 3, and Xilin Chen 1 1 Key Lab of Intelligent Information Processing of Chinese Academy

More information

Deep Multi-Center Learning for Face Alignment

Deep Multi-Center Learning for Face Alignment 1 Deep Multi-Center Learning for Face Alignment Zhiwen Shao 1, Hengliang Zhu 1, Xin Tan 1, Yangyang Hao 1, and Lizhuang Ma 2,1 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University,

More information

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Supplementary material for Analyzing Filters Toward Efficient ConvNet Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable

More information

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France

More information

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks Xuepeng Shi 1,2 Shiguang Shan 1,3 Meina Kan 1,3 Shuzhe Wu 1,2 Xilin Chen 1 1 Key Lab of Intelligent Information Processing

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

arxiv: v1 [cs.cv] 5 Oct 2015

arxiv: v1 [cs.cv] 5 Oct 2015 Efficient Object Detection for High Resolution Images Yongxi Lu 1 and Tara Javidi 1 arxiv:1510.01257v1 [cs.cv] 5 Oct 2015 Abstract Efficient generation of high-quality object proposals is an essential

More information

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection Zeming Li, 1 Yilun Chen, 2 Gang Yu, 2 Yangdong

More information

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization

More information

Detecting facial landmarks in the video based on a hybrid framework

Detecting facial landmarks in the video based on a hybrid framework Detecting facial landmarks in the video based on a hybrid framework Nian Cai 1, *, Zhineng Lin 1, Fu Zhang 1, Guandong Cen 1, Han Wang 2 1 School of Information Engineering, Guangdong University of Technology,

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL Yingxin Lou 1, Guangtao Fu 2, Zhuqing Jiang 1, Aidong Men 1, and Yun Zhou 2 1 Beijing University of Posts and Telecommunications, Beijing,

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana

FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana FaceNet Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana Introduction FaceNet learns a mapping from face images to a compact Euclidean Space

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Yuanjun Xiong 1 Kai Zhu 1 Dahua Lin 1 Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University

More information

arxiv: v1 [cs.cv] 31 Mar 2017

arxiv: v1 [cs.cv] 31 Mar 2017 End-to-End Spatial Transform Face Detection and Recognition Liying Chi Zhejiang University charrin0531@gmail.com Hongxin Zhang Zhejiang University zhx@cad.zju.edu.cn Mingxiu Chen Rokid.inc cmxnono@rokid.com

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

arxiv: v1 [cs.cv] 11 Jun 2015

arxiv: v1 [cs.cv] 11 Jun 2015 Pose-Invariant 3D Face Alignment Amin Jourabloo, Xiaoming Liu Department of Computer Science and Engineering Michigan State University, East Lansing MI 48824 {jourablo, liuxm}@msu.edu arxiv:506.03799v

More information

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space Towards Real-Time Automatic Number Plate Detection: Dots in the Search Space Chi Zhang Department of Computer Science and Technology, Zhejiang University wellyzhangc@zju.edu.cn Abstract Automatic Number

More information

Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet

Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet 1 Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet Naimish Agarwal, IIIT-Allahabad (irm2013013@iiita.ac.in) Artus Krohn-Grimberghe, University of Paderborn (artus@aisbi.de)

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

arxiv: v1 [cs.cv] 20 Dec 2016

arxiv: v1 [cs.cv] 20 Dec 2016 End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation arxiv:1612.06558v1 [cs.cv] 20 Dec 2016 Heechul Jung heechul@dgist.ac.kr Min-Kook Choi mkchoi@dgist.ac.kr

More information

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

The Nottingham eprints service makes this work by researchers of the University of Nottingham available open access under the following conditions. Sagonas, Christos and Tzimiropoulos, Georgios and Zafeiriou, Stefanos and Pantic, Maja (2013) 300 Faces in-the-wild Challenge: the first facial landmark localization challenge. In: 2013 IEEE International

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

Regionlet Object Detector with Hand-crafted and CNN Feature

Regionlet Object Detector with Hand-crafted and CNN Feature Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu Overview of this section Regionlet

More information

arxiv: v1 [cs.cv] 12 Sep 2018

arxiv: v1 [cs.cv] 12 Sep 2018 In Proceedings of the 2018 IEEE International Conference on Image Processing (ICIP) The final publication is available at: http://dx.doi.org/10.1109/icip.2018.8451026 A TWO-STEP LEARNING METHOD FOR DETECTING

More information

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

LEARNING A MULTI-CENTER CONVOLUTIONAL NETWORK FOR UNCONSTRAINED FACE ALIGNMENT. Zhiwen Shao, Hengliang Zhu, Yangyang Hao, Min Wang, and Lizhuang Ma

LEARNING A MULTI-CENTER CONVOLUTIONAL NETWORK FOR UNCONSTRAINED FACE ALIGNMENT. Zhiwen Shao, Hengliang Zhu, Yangyang Hao, Min Wang, and Lizhuang Ma LEARNING A MULTI-CENTER CONVOLUTIONAL NETWORK FOR UNCONSTRAINED FACE ALIGNMENT Zhiwen Shao, Hengliang Zhu, Yangyang Hao, Min Wang, and Lizhuang Ma Department of Computer Science and Engineering, Shanghai

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

3D model classification using convolutional neural network

3D model classification using convolutional neural network 3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

Learn to Combine Multiple Hypotheses for Accurate Face Alignment

Learn to Combine Multiple Hypotheses for Accurate Face Alignment 2013 IEEE International Conference on Computer Vision Workshops Learn to Combine Multiple Hypotheses for Accurate Face Alignment Junjie Yan Zhen Lei Dong Yi Stan Z. Li Center for Biometrics and Security

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification

More information

A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection

A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection A Deep Regression Architecture with Two-Stage Re-initialization for High Performance Facial Landmark Detection Jiangjing Lv 1,2, Xiaohu Shao 1,2, Junliang Xing 3, Cheng Cheng 1, Xi Zhou 1 1 Chongqing Institute

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Finding Tiny Faces Supplementary Materials

Finding Tiny Faces Supplementary Materials Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva}@cs.cmu.edu 1. Error analysis Quantitative analysis We plot the distribution

More information

Deep Convolutional Neural Network in Deformable Part Models for Face Detection

Deep Convolutional Neural Network in Deformable Part Models for Face Detection Deep Convolutional Neural Network in Deformable Part Models for Face Detection Dinh-Luan Nguyen 1, Vinh-Tiep Nguyen 2, Minh-Triet Tran 2, Atsuo Yoshitaka 3 1,2 University of Science, Vietnam National University,

More information

Extended Supervised Descent Method for Robust Face Alignment

Extended Supervised Descent Method for Robust Face Alignment Extended Supervised Descent Method for Robust Face Alignment Liu Liu, Jiani Hu, Shuo Zhang, Weihong Deng Beijing University of Posts and Telecommunications, Beijing, China Abstract. Supervised Descent

More information

FCHD: A fast and accurate head detector

FCHD: A fast and accurate head detector JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 FCHD: A fast and accurate head detector Aditya Vora, Johnson Controls Inc. arxiv:1809.08766v2 [cs.cv] 26 Sep 2018 Abstract In this paper, we

More information

MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction Ayush Tewari Michael Zollhofer Hyeongwoo Kim Pablo Garrido Florian Bernard Patrick Perez Christian Theobalt

More information

Cascade Region Regression for Robust Object Detection

Cascade Region Regression for Robust Object Detection Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Cascade Region Regression for Robust Object Detection Jiankang Deng, Shaoli Huang, Jing Yang, Hui Shuai, Zhengbo Yu, Zongguang Lu, Qiang Ma, Yali

More information

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization

More information

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

More information

arxiv: v1 [cs.cv] 29 Jan 2016

arxiv: v1 [cs.cv] 29 Jan 2016 Face Alignment by Local Deep Descriptor Regression arxiv:1601.07950v1 [cs.cv] 29 Jan 2016 Amit Kumar University of Maryland College Park, MD 20742 akumar14@umd.edu Abstract Rajeev Ranjan University of

More information

Hybrid Cascade Model for Face Detection in the Wild Based on Normalized Pixel Difference and a Deep Convolutional Neural Network

Hybrid Cascade Model for Face Detection in the Wild Based on Normalized Pixel Difference and a Deep Convolutional Neural Network Hybrid Cascade Model for Face Detection in the Wild Based on Normalized Pixel Difference and a Deep Convolutional Neural Network Darijan Marčetić [-2-6556-665X], Martin Soldić [-2-431-44] and Slobodan

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

arxiv: v1 [cs.cv] 24 May 2016

arxiv: v1 [cs.cv] 24 May 2016 Dense CNN Learning with Equivalent Mappings arxiv:1605.07251v1 [cs.cv] 24 May 2016 Jianxin Wu Chen-Wei Xie Jian-Hao Luo National Key Laboratory for Novel Software Technology, Nanjing University 163 Xianlin

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:

More information

arxiv: v4 [cs.cv] 6 Jul 2016

arxiv: v4 [cs.cv] 6 Jul 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu, C.-C. Jay Kuo (qinhuang@usc.edu) arxiv:1603.09742v4 [cs.cv] 6 Jul 2016 Abstract. Semantic segmentation

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information
