DeMeshNet: Blind Face Inpainting for Deep MeshFace Verification

Shu Zhang, Student Member, IEEE, Ran He, Senior Member, IEEE, Zhenan Sun, Member, IEEE, and Tieniu Tan, Fellow, IEEE

Shu Zhang is with the Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), No. 95 ZhongGuanCun East Street, HaiDian District, Beijing, P.R. China. E-mail: shu.zhang@nlpr.ia.ac.cn. Ran He, Zhenan Sun and Tieniu Tan are with the Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA) and the Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Chinese Academy of Sciences (CAS). E-mail: {rhe, znsun, tnt}@nlpr.ia.ac.cn. Ran He is the corresponding author. Manuscript received March 8.

Abstract: MeshFace photos have been widely used in many Chinese business organizations to protect ID face photos from being misused. The occlusions incurred by random meshes severely degenerate the performance of face verification systems, which raises the MeshFace verification problem between MeshFaces and daily photos. Previous methods cast this problem as a typical low-level vision problem, i.e., blind inpainting. They recover perceptually pleasing clear ID photos from MeshFaces by enforcing pixel level similarity between the recovered ID images and the ground-truth clear ID images and then perform face verification on them. Essentially, however, face verification is conducted in a compact feature space rather than the image pixel space. Therefore, this paper argues that pixel level similarity and feature level similarity jointly offer the key to improving verification performance. Based on this insight, we propose a novel feature oriented blind face inpainting framework. Specifically, we implement it by establishing a novel DeMeshNet, which consists of three parts. The first part addresses blind inpainting of the MeshFaces by implicitly exploiting extra supervision from the occlusion positions to enforce pixel level similarity. The second part explicitly enforces feature level similarity in the compact feature space, which explores informative supervision from the feature space to produce better inpainting results for verification. The last part copes with face alignment within the net via a customized spatial transformer module when extracting deep facial features. All three parts are implemented within an end-to-end network that facilitates efficient optimization. Extensive experiments on two MeshFace datasets demonstrate the effectiveness of the proposed DeMeshNet as well as the insight of this paper.

Index Terms: MeshFace, face verification, blind inpainting, deep learning, DeMeshNet, spatial transformer.

1 INTRODUCTION

BEING one of the most significant biometric techniques, face recognition has received continuous attention from the computer vision community since as far back as [1]. Compared to other biometric techniques, such as iris recognition [2] and fingerprint identification [3], face recognition can work in a passive manner and is therefore more readily used in many real-world scenarios, including access control systems, social networks, forensics and public security [4], [5], [6], [7]. However, this ease of use comes at a cost: it presents many challenges for the research community to come up with robust systems that can deal with large variations in pose, expression and illumination. Recently, benefiting from advancements in deep representation learning, there have been remarkable improvements in face recognition (verification in particular) performance [8], [9], [10]. Many works have shown that deep learning based solutions have the ability to surpass humans in the task of face verification under uncontrolled environments.
Nevertheless, this does not signal the end of face recognition research. One specific problem, Face Verification Between photos from Identification documents (ID, Citizen Card or Passport Card) and Daily photos (FVBID) [11], has started to gain traction and continues to present new challenges to researchers. FVBID uses face images from digitalized ID photos as the gallery and photos captured in daily life (with cell phones, surveillance cameras, etc.) as the probe. An individual's identity can be easily certified as long as he or she has an ID, so there is no need for registration in advance, which is a typical routine for all biometric systems. Therefore, this technique has quickly become very popular in commercial banks, customs houses, etc. However, when FVBID is applied to real-world scenarios, such as automated customs control and Very Important People (VIP) recognition in commercial banks, the ID photo in an identity card may potentially be misused or illegally distributed. Therefore, ID photos are often deliberately corrupted with mesh-like lines or watermarks by the government for privacy protection when delivered to business organizations such as banks and hotels. For convenience, we denote this type of corrupted ID photo as a MeshFace. As shown in Fig. 1, MeshFaces have a catastrophic influence on face recognition systems [6], [12]. Directly verifying MeshFaces against daily photos leads to very poor accuracy [13]. Therefore, these corruptions raise a novel and challenging problem called MeshFace verification, which deals with face verification between MeshFaces and daily photos.

Some efforts have been made to address this challenging problem. Zhang et al. [13] propose a multi-task residual learning Convolutional Neural Network (CNN) for this problem. They propose to learn a non-linear transformation with a Super-Resolution Convolutional Neural Network (SRCNN) [14] based architecture to recover clear ID photos from MeshFaces. Then, the recovered clear ID photos are used for face verification. They treat the recovery of clear ID photos from MeshFaces as blind face inpainting because the positions of the corruptions are unknown during the testing phase.

Fig. 1. MeshFaces (first row) refer to ID photos corrupted by randomly generated mesh-like lines or watermarks. The corruptions significantly degenerate the performance of facial landmark detection and facial feature extraction, thus leading to poor verification accuracy.

Fig. 2. A recovered ID with higher PSNR may be further away from the ground-truth clear ID in the compact deep feature space; for instance, the green and yellow samples. Best viewed in color.

In a related vein of research, many contemporary works [15], [16], [17], [18] have shown that CNNs are very effective at solving hole filling (non-blind inpainting) problems [19], because this data-driven learning method can exploit the structure of natural images to predict occluded parts. Improved verification performance is observed after blind face inpainting in [13], because using an occlusion-free image greatly improves the accuracy of face detection and alignment. However, in their work, the performance gap between using their inpainted IDs and the clear IDs is still very large. On one hand, this is because SRCNN is less powerful in modeling corruption distributions and recovering the exact image content. As a result, the difference in image content between the recovered ID photos and the ground-truth clear ID photos is too large (as illustrated by the red and the purple samples in Fig. 2). On the other hand, existing works often assume that recovered ID photos with higher PSNR are more likely to achieve better verification performance [13]. Therefore, they only enforce pixel level similarity when learning a blind face inpainting network. But in fact, face similarity is compared in a compact feature space rather than the image pixel space. Moreover, it has been shown that a CNN can potentially be fooled by adding even a tiny amount of noise to the original input [20], [21], [22]. That is, when facial features are extracted by a CNN, two perceptually indistinguishable face images (e.g., the ground-truth clear ID and its recovered version) may still have very large feature level differences (as shown by the green and yellow samples in Fig. 2). It is a common belief that a large intra-class feature distance will generally deteriorate face verification performance. Therefore, treating blind face inpainting as a typical low-level vision problem by only enforcing pixel level similarity can hardly guarantee an improvement in verification performance.

To address the aforementioned problems, we present DeMeshNet, which takes verification performance into account when dealing with the blind face inpainting problem. DeMeshNet is trained on a large-scale dataset of MeshFace/clear ID photo pairs to learn a non-linear transformation that recovers clear ID photos from MeshFaces. Note that DeMeshNet aims to improve MeshFace verification performance rather than simply to recover perceptually pleasing clear ID photos. Therefore, we refer to DeMeshNet as a feature oriented blind face inpainting framework. The proposed framework is an end-to-end network that facilitates efficient optimization with gradient back propagation. For clarity, we divide the explanation of the whole framework into three parts, and we briefly introduce each part of DeMeshNet and our contributions in the following paragraphs.
In the first part, we enforce pixel level similarity to exploit the structure of face images so as to recover uncorrupted ID photos from MeshFaces. Specifically, we adopt a fully convolutional network (FCN) [23] with a weighted Euclidean loss to minimize the pixel differences between the ground-truth clear ID/recovered ID photo pairs. Extra supervision from the corruption positions is further exploited to accurately model the corruption distribution. In the second part, we enforce feature level similarity between the ground-truth clear ID photos and the recovered ID photos in a compact deep feature space. This feature space is spanned by a pre-trained CNN, and distance in it directly corresponds to a measure of face similarity. This part forces the inpainted image to have a smaller distance to the ground-truth clear ID in the deep feature space, which in turn facilitates accurate verification. Moreover, we propose to measure the feature level similarity with the reverse Huber loss function so that it can be more efficiently optimized when the differences are very small.

Fig. 3. Conceptual diagram of DeMeshNet. The upper part denotes the training phase of the proposed end-to-end DeMeshNet. Note that the corruption mask and the clear ID are only used in the training phase to provide ground-truth for the pixel and feature level losses. The pre-trained feature extraction model has fixed parameters during training and is used to provide feature level supervision during the end-to-end training process. The lower part illustrates the MeshFace verification process. The trained DeMeshNet is first used to restore clean ID photos from the MeshFaces, and then the restored (inpainted) clean ID photos are compared against daily photos for face verification.

In the third part, we employ a customized spatial transformer module [24] to align and crop the face region for accurate feature extraction within the network. It is essential to take alignment into account because the MeshFace, which is the input of DeMeshNet, and the aligned face, which is the input of the feature extraction sub-net, differ in size, scale and orientation (as shown in Fig. 3). By integrating these three parts into one end-to-end training network, the proposed DeMeshNet has the ability to maintain both pixel and feature level similarity while learning to inpaint the MeshFace. Extensive experimental results on two MeshFace datasets demonstrate that DeMeshNet achieves the best verification accuracy and outperforms previous work by a large margin. Overall, as demonstrated by the experimental results, the proposed deep architecture can achieve superior recognition accuracy to previous work at the cost of some visual effects. Furthermore, we thoroughly evaluate different configurations of DeMeshNet to gain insight into the factors behind such significant improvements.

2 RELATED WORK

MeshFace inpainting is a rarely studied problem compared to other related subjects, such as hole filling [19], rain removal [25] and de-fencing [26]. All these tasks fall into the category of inpainting research, and they share a common goal, i.e., to recover a semantically plausible and visually pleasing image from an image with unknown corruption patterns. Specifically, MeshFace inpainting deals with the problem of improving MeshFace verification performance, which has attracted increasingly more attention with the rapid development of FVBID technologies. Recently, Zhang et al. [13] proposed a multi-task ConvNet to inpaint MeshFaces. In the following paragraphs, we briefly go through some recent advances in the related studies of hole filling and rain removal.

Hole Filling (or image completion) is a fundamental research topic in the field of computer vision. In the early years of study, researchers resorted to prior knowledge such as neighborhood resemblance to fill in missing areas with textures around the hole [27], [28]. These methods focus on a local propagation scheme to acquire smooth blending around the holes but fail to take the global structure into consideration; therefore, they cannot guarantee a semantically consistent inpainting result. To compensate for the semantic loss, researchers introduced data-driven learning methods into this field, exploiting a large reference database to find similar image patches to fill the missing holes [29]. Recently, deep learning based methods have also shown great potential in solving this ill-posed low-level vision problem with their power to model both the local and global structure of images.
In particular, Pathak et al. [16] proposed the Context Encoders with a combined L2 norm and adversarial loss [30] for semantic hole filling. Yeh et al. [31] further employ a pre-trained Deep

Convolutional Generative Adversarial Network (DCGAN) [32] to restore images with random or central missing regions.

Rain Removal is a typical application of the blind inpainting problem. Different from the traditional inpainting problem, where the positions of corrupted regions are provided in advance, blind inpainting deals with images for which the positions of corrupted regions are not given. A common idea for solving this specific problem is to exploit the rich temporal information in video sequences to first identify the corrupted regions and then employ an inpainting algorithm for further restoration [33]. Other researchers explore the possibility of single image rain removal with the help of image priors [34], [35]. Kang et al. [36] proposed a dictionary learning based approach for layer decomposition and rain removal. More recently, Eigen et al. [37] proposed a deep learning based method, where a data-driven convolutional auto-encoder model is trained to directly recover clean patches from corrupted ones. The proposed method demonstrates effective removal of rain in outdoor test conditions.

3 APPROACH

3.1 Overview

In this section, we present an overview of the proposed DeMeshNet. We cast the proposed feature oriented blind face inpainting problem as a dense regression problem, which aims to regress a perceptually pleasing and verification favorable clear ID photo from a MeshFace X. For convenience, we refer to the ground-truth clear ID photo as the target, termed Y, and to the recovered ID photo from our blind face inpainting model ψ as the prediction, termed ψ(X). The prediction will be used for face verification against clear daily photos.

We model the highly non-linear function for dense regression as an FCN, as illustrated in part A of Fig. 3. Part B shows the customized spatial transformer module. The following part is a pre-trained CNN that is utilized to compute the feature representation of the aligned face region. It should be noted that the parameters in both the spatial transformer module and the pre-trained CNN are fixed during training. The learnable parameters in the FCN are optimized by minimizing a unified loss function that jointly models the pixel and feature level similarities between the prediction and the target pairs. No identity information is needed to train such a blind inpainting network. The pixel level loss helps to obtain perceptually pleasing inpainting results and serves as a means to capture the distribution difference between actual face texture and mesh-like corruptions. The feature level loss explores supervision in a compact feature space to regularize the network training. Thus, the network's prediction will not only have a similar appearance but also a similar feature representation to that of the target. Specifically, we develop a weighted Euclidean loss to model the pixel level similarity and employ a reverse Huber function [38] to characterize the feature level loss on the spatially transformed face region.
Combining the pixel level loss and the spatially transformed feature level loss, we define the unified loss function for DeMeshNet as follows:

L_i = l_{pixel} + l_{feature} = \|\psi(X_i) - Y_i\|_F^2 + \lambda_1 \|M_i \odot (\psi(X_i) - Y_i)\|_F^2 + \lambda_2 \sum_{j=1}^{2} RH\big(\|\phi_j(\psi(ST(X_i))) - \phi_j(ST(Y_i))\|\big)    (1)

where ST denotes the spatial transformation that samples an aligned face region from the original input based solely on the positions of facial landmarks, RH is the reverse Huber function, and \lambda_1 and \lambda_2 are balance parameters. In general, combining more than one loss is usually no trivial task. However, in our experiments we have observed that our network is robust to the exact weights on the feature and pixel level losses, as long as it can simultaneously decrease both. Therefore, in our final experiments we empirically set those two parameters to 1. We postpone explanations of the remaining symbols to the sections where they first appear. For simplicity, we omit the regularization term on the parameters of the FCN (weight decay) in Equation 1, which is used to reduce overfitting when optimizing our network. The objective function can be efficiently optimized by gradient back propagation in an end-to-end manner. We elaborate on each of the three parts in the following subsections.

3.2 Pixel Level Regression Network

Pixel Level Loss

Blind face inpainting is naturally characterized as a pixel-wise regression problem. The first part of DeMeshNet learns a highly non-linear transformation by optimizing a well-designed pixel level loss. Although the positions of the corruptions are not provided during the testing phase of blind face inpainting, we can still make use of this information when training DeMeshNet. Specifically, we propose to implicitly exploit this extra supervision by introducing a weighted Euclidean loss function:

l_{pixel} = \|\psi(X_i) - Y_i\|_F^2 + \lambda \|M_i \odot (\psi(X_i) - Y_i)\|_F^2    (2)

where X_i, Y_i and \psi(X_i) are a corrupted input, the target and the prediction, respectively. M_i is the binary mask, with a value of 1 indicating that a pixel is corrupted and 0 otherwise. \odot is the element-wise product operation. The second term in the loss function therefore measures the Euclidean loss only on corrupted areas, emphasizing losses on those areas with a weighting parameter \lambda. This loss function helps the network learn the distribution of corrupted pixels better by exploiting extra supervision from the corruption positions. Experimental results demonstrate that it also helps to generate predictions with higher PSNR.
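To make Eq. (2) concrete, below is a minimal PyTorch-style sketch of the weighted pixel level loss. The paper's implementation is in Caffe, so the framework, the tensor shapes and the summation-based reduction here are illustrative assumptions only.

import torch

def weighted_pixel_loss(pred: torch.Tensor,
                        target: torch.Tensor,
                        mask: torch.Tensor,
                        lam: float = 1.0) -> torch.Tensor:
    # pred/target: N x 1 x H x W batches; mask: binary corruption mask M_i
    diff = pred - target
    plain = (diff ** 2).sum()            # ||psi(X) - Y||_F^2
    masked = ((mask * diff) ** 2).sum()  # masked term, corrupted pixels only
    return plain + lam * masked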
Network Architecture

Since FCNs have achieved outstanding performance in dense prediction tasks such as depth prediction [39] and semantic segmentation [23], we are motivated to use an FCN as the non-linear function for blind face inpainting. The main difference between the FCN and the architecture in [13] is the introduction of down-sampling and up-sampling layers. This simple adjustment enables the FCN to admit much deeper layers and to expand the receptive fields at the same computational cost. The expanded receptive fields are critical to blind inpainting, as they enclose more contextual information for identifying the corrupted areas. In this work, we use the network architecture of SegNet as proposed in [40]. Specifically, a VGG-16 Net [41] is used at the front end of our network, and unpooling and convolutional layers with Rectified Linear Unit (ReLU) non-linearities are symmetrically concatenated to the bottleneck layer (the output of conv53) to progressively upsample the feature maps to the original size. A ReLU activation is used after the last convolution layer to ensure that the predicted pixel values are all positive. It is feasible to adopt other architectures such as ResidualNet [42] and the Deconvolution Network [43], but this is beyond the scope of this paper. The input and output of the network are MeshFaces and clear ID photos, respectively. Gray-scale images of a fixed size are used for input and output throughout the paper.
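The following is a small PyTorch sketch of this encoder-decoder pattern, shrunk to two stages instead of the full VGG-16 depth; the channel widths are invented for illustration and do not reproduce the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniSegNet(nn.Module):
    """Toy SegNet-style FCN: conv blocks with max pooling, mirrored by
    unpooling layers that reuse the pooling indices of the encoder."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(True))
        self.dec2 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(True))
        self.dec1 = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):                    # x: N x 1 x H x W gray-scale MeshFace
        x = self.enc1(x)
        x, idx1 = F.max_pool2d(x, 2, return_indices=True)
        x = self.enc2(x)
        x, idx2 = F.max_pool2d(x, 2, return_indices=True)
        x = F.max_unpool2d(x, idx2, 2)       # upsample with the stored indices
        x = self.dec2(x)
        x = F.max_unpool2d(x, idx1, 2)
        return F.relu(self.dec1(x))          # final ReLU keeps pixel values non-negative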

Fig. 4. Schematic architecture for face feature extraction. It uses Max-Feature-Map non-linearities instead of ReLU.

3.3 Feature Level Regression Network

Feature Level Loss

As mentioned in Section 1, images with a very small Euclidean distance may have a large feature distance. This problem was raised in [22] and has been shown to severely deteriorate classification performance because of the enlarged intra-class feature distance. In fact, the enlarged intra-class feature distance can also influence the verification task at hand. To improve the verification performance, it is not enough to only enforce pixel level similarity. Therefore, in the second part, we explicitly enforce the target and the prediction to have a small distance in the compact feature space computed by the pre-trained CNN φ. Let φ_j(ψ(X_i)) be the activations of the jth layer of the pre-trained face model φ. We intend to improve the verification performance by minimizing the residual r = φ(ψ(X_i)) − φ(Y_i) at each position. To efficiently back-propagate the errors when the residual is very small, we employ the reverse Huber loss function [38] to measure the feature level difference; its formulation is:

RH(r) = \begin{cases} |r|, & |r| \le c \\ \frac{r^2 + c^2}{2c}, & |r| > c \end{cases}    (3)

The reverse Huber loss is equivalent to the L1 norm when the residual r is in the interval [−c, c] and equals a transformed L2 norm otherwise. This loss experimentally works better than the L2 norm. Note that when c is smaller than 1, the derivative of the L1 norm is greater than that of the L2 norm, which speeds up error back-propagation when the residual is very tiny. As in [38], we use a dynamic threshold for c: in each batch-minimization step, c is set to 20% of the maximum residual in that batch. Since our objective is to improve verification performance, we impose the reverse Huber loss on the output of eltwise 6 in the pre-trained model φ, which is a 256-dim feature vector used for similarity comparison. Drawing inspiration from [44], where a feature loss is used for super-resolution, we also impose the reverse Huber loss on an early layer (conv2 in our case) of the pre-trained face model φ to include deeper supervision [45]. Therefore, our final loss function for feature level regression is:

l_{feature} = \sum_{j=1}^{2} RH\big(\|\phi_j(\psi(X_i)) - \phi_j(Y_i)\|\big)    (4)

Feature losses have recently been considered in the literature on super-resolution and sketch inversion for better visual performance [44], [46], [47]. It should be noted, however, that in this paper the feature level loss is proposed from a totally different perspective. Instead of pursuing perceptually pleasing image transformation results, we want to deal with the large intra-class feature distance problem and improve verification performance [48].
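A PyTorch sketch of the berHu loss of Eq. (3) with the dynamic threshold, and of the two-layer feature loss of Eq. (4), is given below. The interface of the frozen feature extractor phi, called here with the name of the desired intermediate layer, is a hypothetical stand-in for however the activations are actually exposed.

import torch

def berhu(residual: torch.Tensor) -> torch.Tensor:
    # reverse Huber: L1 inside [-c, c], shifted L2 outside; c is set
    # dynamically to 20% of the largest absolute residual in the batch
    r = residual.abs()
    c = (0.2 * r.max()).detach().clamp(min=1e-8)
    return torch.where(r <= c, r, (r ** 2 + c ** 2) / (2 * c)).sum()

def feature_loss(phi, pred_aligned: torch.Tensor,
                 target_aligned: torch.Tensor) -> torch.Tensor:
    loss = pred_aligned.new_zeros(())
    for layer in ("conv2", "eltwise6"):   # hypothetical layer handles
        loss = loss + berhu(phi(pred_aligned, layer) - phi(target_aligned, layer))
    return loss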
Pre-trained CNN for Face Feature Extraction

Both training DeMeshNet and evaluating the verification performance require a pre-trained CNN to compute facial features for face images. We use the architecture proposed in [49] (code name CRIPAC-9, as it has nine layers) for facial feature extraction because it is computationally efficient in both time and space. Fig. 4 briefly illustrates the architecture of the CRIPAC-9 model, which uses Max-Feature-Map non-linearities instead of ReLU and thus can generate dense features at its output. The network takes aligned gray-scale face images as input and returns a 256-dim feature (the output of eltwise 6). Alignment is conducted by transforming two facial landmarks (i.e., the centers of the two eyes) to (32, 32) and (96, 32) with a similarity transformation. We train a base model on the purified MS-Celeb-1M dataset [50] (the original data is very noisy, so we purify it before training). A single base model achieves a verification accuracy of 98.80% on the LFW [51] benchmark, which is very competitive among published works [52], [53]. We further finetune the base model with a triplet loss on a large-scale ID/daily photo dataset collected from the web and use the finetuned model φ to extract compact facial features in DeMeshNet. Note that this model's parameters stay fixed during the training of DeMeshNet: we only use it for feature computation, and no identity related supervision is adopted to finetune φ while learning the blind inpainting FCN. The obtained feature extraction model is also used for the subsequent face verification after the inpainting process.
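The Max-Feature-Map (MFM) operation can be written in a few lines. The following generic sketch (not the authors' released code) assumes the channel dimension is split into two halves whose element-wise maximum is taken, so MFM halves the channel count instead of zeroing activations like ReLU.

import torch

def max_feature_map(x: torch.Tensor) -> torch.Tensor:
    # x: N x 2C x H x W -> N x C x H x W
    a, b = torch.chunk(x, 2, dim=1)
    return torch.max(a, b)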

3.4 Spatial Transformer for Face Alignment

Fig. 5. The customized spatial transformer module used in our model. It uses the locations of facial landmarks to calculate a transformation matrix for face alignment.

Fig. 6. An image triplet sample from the training dataset [13].

Face alignment is essential for feature extraction. It helps to improve verification performance by providing a normalized input. The pre-trained model φ requires an aligned face region to compute the 256-dim feature, but DeMeshNet takes an un-aligned MeshFace as input and outputs a prediction of the same size. Therefore, in order to compute the 256-dim feature in the fully connected layer eltwise 6, we must implement face alignment within the network. Moreover, unlike image classification models that are trained with multi-scale natural images from ImageNet [54], the pre-trained facial feature extraction model φ only takes single-scale, well-aligned face regions for training. This means that even if we only compute the features from conv layers as in [44], [46], [47], we still need to align the MeshFaces first to acquire an accurate feature representation.

We incorporate a customized spatial transformer module [24] between the pixel level regression sub-net and the feature level regression sub-net to sample an aligned face region from the prediction according to the facial landmarks. This procedure is illustrated in Fig. 3 and detailed in Fig. 5. The spatial transformer module comprises a localization network, a grid generator and a sampler. Since the similarity transformation τ_θ for face alignment is uniquely determined by the coordinates of the two eye centers, we do not need to learn it through the localization network as in [24]. τ_θ is parameterized by θ = [a, b, t_1; b, a, t_2] and can be determined from the eye-center correspondences:

\begin{pmatrix} x_l & x_r \\ y_l & y_r \end{pmatrix} = \begin{bmatrix} a & b & t_1 \\ b & a & t_2 \end{bmatrix} \begin{pmatrix} x'_l & x'_r \\ y'_l & y'_r \\ 1 & 1 \end{pmatrix}    (5)

where (x_l, y_l) and (x_r, y_r) are the normalized coordinates (normalized to [−1, 1]) of the two eye centers in the original image, and (x'_l, y'_l) = (−0.5, −0.5) and (x'_r, y'_r) = (0.5, −0.5) are the fixed eye positions in the aligned face image.

Given the similarity transformation matrix τ_θ, we can define the forward and backward passes of the spatial transformer module, which employ a bilinear sampling kernel to sample the original image into its aligned version and to back-propagate the feature loss from the aligned image to the original input, respectively. In the forward pass, a sampling grid is first determined from the given τ_θ. A sampling grid is a set of points with continuous coordinates; sampling an input image according to this grid generates the transformed output. Defining the output pixels to lie on a regular grid G = {G_i} of pixels G_i = (x_i^t, y_i^t), the sampling grid τ_θ(G) is given by the point-wise transformation:

\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \tau_\theta(G_i) = \begin{bmatrix} a & b & t_1 \\ b & a & t_2 \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}    (6)

where (x_i^s, y_i^s) are the source coordinates in the input feature map and (x_i^t, y_i^t) are the target coordinates on the regular grid. Since the coordinates in the sampling grid are continuous numbers, a bilinear kernel is applied at those positions to produce the corresponding pixel values in the output:

Q = \max(0, 1 - |x_i^s - m|)\,\max(0, 1 - |y_i^s - n|)    (7)

P_{(x_i^t, y_i^t)} = \sum_n^H \sum_m^W P_{(x_m^s, y_n^s)}\, Q    (8)

where H and W are the height and width of the input image, respectively, and P_{(x_i^t, y_i^t)} represents the pixel value at (x_i^t, y_i^t).
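For intuition, here is a hedged PyTorch re-implementation of this customized spatial transformer built on affine_grid/grid_sample, which realize the grid generation of Eq. (6) and the bilinear kernel of Eqs. (7)-(8). The standard rotation-scale form of the matrix and the aligned eye targets (-0.5, -0.5) and (0.5, -0.5) are our reading of Eq. (5), not code from the paper.

import torch
import torch.nn.functional as F

def align_face(img: torch.Tensor, eye_l, eye_r, out_hw=(128, 128)):
    # img: N x C x H x W; eye_l/eye_r: eye centers normalized to [-1, 1]
    (xl, yl), (xr, yr) = eye_l, eye_r
    # solve source = [a -b tx; b a ty] * (x_t, y_t, 1)^T from the two
    # correspondences (-0.5, -0.5) -> left eye and (0.5, -0.5) -> right eye
    a, b = xr - xl, yr - yl
    tx = xl + 0.5 * a - 0.5 * b
    ty = yl + 0.5 * b + 0.5 * a
    theta = torch.tensor([[a, -b, tx],
                          [b,  a, ty]], dtype=img.dtype,
                         device=img.device).unsqueeze(0)
    n = img.size(0)
    grid = F.affine_grid(theta.expand(n, -1, -1),
                         size=(n, img.size(1), *out_hw), align_corners=False)
    # bilinear sampling, differentiable with respect to the input image
    return F.grid_sample(img, grid, mode='bilinear', align_corners=False)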
To allow the feature loss defined in the last subsection to be back-propagated from the output of the spatial transformer module to the input image, we give the gradients with respect to the input image as follows:

\frac{\partial P_{(x_i^t, y_i^t)}}{\partial P_{(x_m^s, y_n^s)}} = \sum_n^H \sum_m^W Q    (9)

The gradients of the feature loss defined earlier can easily be flowed back to the input image using the chain rule with Equation 9. Note that the gradients with respect to the sampling grid coordinates (x_i^s, y_i^s) are not derived, because the transformation parameters are not learned in the customized spatial transformer module.

4 EXPERIMENTS

In this section, we experimentally evaluate the proposed framework. We begin by introducing the datasets for training and testing. Then we specify the baseline methods and implementation details. Finally, we present a detailed algorithmic evaluation as well as comparisons with other methods.

4.1 Datasets

All the compared models are trained on the dataset from [13], which contains over 500,000 data triplets of 11,648 individuals. It should be noted that, to increase the diversity of the patterns, the random corruption is simulated with sine and cosine waves of different magnitudes and phases. Random affine transformations are also applied to the corruption patterns. Furthermore, the percentage of corruption is controlled by manipulating the line width of the generated wavy lines.
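A rough NumPy illustration of this synthesis procedure is given below. Every parameter range here is invented for illustration (the paper does not publish them), and the random affine transformations applied to the patterns are omitted for brevity.

import numpy as np

def add_mesh(img: np.ndarray, n_lines: int = 4, width: int = 3, rng=None):
    # img: H x W gray-scale image in [0, 1]; returns (MeshFace, binary mask)
    rng = rng or np.random.default_rng()
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    xs = np.arange(w)
    for _ in range(n_lines):
        amp = rng.uniform(2.0, 8.0)           # magnitude of the wave
        freq = rng.uniform(0.02, 0.08)        # spatial frequency
        phase = rng.uniform(0.0, 2.0 * np.pi)
        y0 = rng.uniform(0.1, 0.9) * h
        ys = y0 + amp * np.sin(freq * xs + phase)
        for t in range(width):                # crude control of line width
            rows = np.clip(np.round(ys) + t, 0, h - 1).astype(int)
            mask[rows, xs] = True
    meshface = img.copy()
    meshface[mask] = 1.0                      # paint the corruption white
    return meshface, mask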

Each data triplet consists of an m × n MeshFace, its clear version and a corruption mask (as illustrated in Fig. 6). Facial landmarks (the two eye centers) are detected using Intraface [55] to aid the spatial transformer module. Data triplets of 400 individuals are sampled for validation and testing (200 each), and all the other individuals are used for training.

Fig. 7. Sample image pairs from SV1000. Note that this dataset is very hard, as the variations in illumination, pose and hair style are very significant.

Fig. 8. Sample image pairs from SYN500. Images of some Chinese celebrities are collected from the Internet.

Besides the SYN500 dataset used in [13], we collect another dataset, named SV1000, with 1000 MeshFace/daily photo pairs from 1000 individuals to evaluate MeshFace verification performance. Daily photos in SV1000 are captured by surveillance cameras. As shown in Fig. 7, this dataset not only contains more individuals but also presents more variation in the daily photos, which makes it more challenging than SYN500 (shown in Fig. 8). We develop a protocol for evaluating verification performance on these two datasets. Specifically, face comparison is conducted between all possible recovered clear ID/daily photo pairs in the compact feature space (spanned by model φ) with the cosine distance. For a dataset with N data pairs, N^2 comparisons are conducted in total. To exclude influences from metric learning methods, no supervised learning methods, e.g., Joint Bayesian [56], are employed on the extracted features.
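To make the protocol concrete, the following NumPy sketch (our own, not the authors' evaluation code) scores feature pairs with cosine similarity and reads FNMR off at a target FMR by thresholding the impostor score distribution.

import numpy as np

def cosine(f1: np.ndarray, f2: np.ndarray) -> float:
    # cosine similarity between two 256-dim feature vectors
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12))

def fnmr_at_fmr(genuine: np.ndarray, impostor: np.ndarray, fmr: float = 0.01) -> float:
    # threshold such that the given fraction of impostor scores exceeds it,
    # then report the fraction of genuine scores that fall below it
    thr = np.quantile(impostor, 1.0 - fmr)
    return float(np.mean(genuine < thr))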
4.2 Baselines and Implementation Details

Although many algorithms have been proposed for non-blind inpainting, few have been developed to address the blind inpainting problem, and even fewer for the blind face inpainting problem addressed in this paper. We implement the multi-task CNN (MtNet) [13] as a baseline. This method employs an architecture that resembles SRCNN [14] and uses multi-task learning to exploit the information of the corruption positions in the training phase. To give a detailed evaluation of each part of the proposed DeMeshNet, we also compare it with various configurations. The compared configurations include the FCN with the Euclidean pixel level loss (FCNE), the FCN with the weighted pixel level loss (FCNW) and the FCN with only the feature level loss (FCNF, without the spatial transformer module). All three configurations use the FCN as the backbone for the blind face inpainting task. Both FCNE and FCNW adopt only the pixel level loss during training, but FCNW introduces implicit supervision from the corruption mask with a weighted loss in addition to the Euclidean loss. FCNF takes both the weighted pixel loss and a whole-image feature level loss into consideration, but the feature level differences are only computed at the output of conv2, using the whole image as input to the pre-trained feature extraction network φ. As in [44], we do not implement face alignment within the network for FCNF. To further reveal the importance of the pixel level loss, we also implement an alternative DeMeshNet structure without the pixel level loss, named DeMeshNet F.

All the compared models are trained on the training set of photo pairs; gray-scale images normalized to [0, 1] are used in all the experiments. For all the compared network structures, [57] is used to initialize the weights, and training is carried out using Adam [58] with a batch size of 30. The learning rate is set to 10^-4 initially and decreased by a factor of 10 every 40k iterations. The training process takes approximately 160k iterations to converge. For the proposed approach, the feature level loss is computed at layers conv2 and eltwise 6 of the pre-trained face model φ. For FCNF, only conv2 is used to compute the feature loss, as the computation of the fc layer eltwise 6 requires the input to be of a fixed size. All the MeshFace verification experiments use the pre-trained face model φ for facial feature extraction. All the experiments are conducted with the Caffe framework [59] on a single GTX Titan X GPU. Code for implementing DeMeshNet is available at cripac/demeshnet.

4.3 Evaluation of Verification Results

In this section, we conduct MeshFace verification experiments on two datasets, i.e., SYN500 and SV1000. Recovered ID photos are used for FVBID according to the aforementioned protocol. We report ROC curves in Fig. 9. FNMR@FMR=1% (the false non-match rate when the false match rate is 1%), FNMR@FMR=0.1% and FNMR@FMR=0.01% are reported in Table 1 for closer inspection. We also present the face verification performance with ground-truth clear ID photos and with MeshFaces (denoted as Clear and Corrupted) for fair comparison.

Fig. 9. ROC curves for SYN500 and SV1000. Note that verification performance on the challenging SV1000 dataset is significantly worse than on SYN500.

As expected, when the MeshFaces are used for verification, the accuracy suffers a severe drop on both datasets due to large detection and alignment errors. After processing the MeshFaces with blind face inpainting models, face verification performance with the recovered ID photos sees a great improvement. Owing to the deeper FCN architecture and expanded receptive fields, all the FCN based models outperform the baseline model MtNet [13] by a large margin on both datasets. As shown in Fig. 9, the feature loss based models (DeMeshNet, FCNF, DeMeshNet F) perform better than the models that only seek visually pleasing inpainting results (FCNE, FCNW). This suggests that by enforcing feature level similarity during training, predictions from DeMeshNet lie in a low-dimensional feature space that is closer to the ground-truth clear ID photos than predictions from pixel-level-only networks. This is validated in the next section, where we calculate the RMSE (root mean square error) between the features of the ground-truth clear IDs and the recovered IDs.

TABLE 1
Verification performance on SYN500/SV1000 and inpainting results on the testing set. RMSE illustrates the feature distance between the recovered ID photos and the ground-truth clear ID photos, while PSNR indicates the pixel distance. Note that a smaller RMSE consistently indicates better verification performance, but a higher PSNR does not guarantee that.

Method       FNMR@FMR=1%      FNMR@FMR=0.1%    FNMR@FMR=0.01%   PSNR  RMSE
MtNet        16.40% / 21.20%  37.20% / 42.50%  63.20% / 64.60%
Clear         1.20% / 11.90%  10.60% / 25.70%  32.60% / 46.40%  -     0
Corrupted    52.60% / 56.80%  66.20% / 71.50%  81.60% / 81.80%
FCNE          4.80% / 16.80%  20.20% / 36.10%  46.20% / 57.00%
FCNW          4.60% / 14.30%  21.60% / 35.10%  47.20% / 55.60%
FCNF          4.80% / 14.20%  17.20% / 33.60%  45.60% / 53.70%
DeMeshNet F   3.20% / 13.80%  17.80% / 30.60%  45.20% / 53.70%
DeMeshNet E   3.40% / 13.70%  16.10% / 30.00%  45.20% / 53.50%
DeMeshNet     2.60% / 13.30%  15.20% / 29.30%  44.80% / 53.00%

However, enforcing only the feature level similarity leads to inferior verification results, as shown by the comparison between DeMeshNet and DeMeshNet F. This is possibly due to increased artifacts introduced by the feature level loss, as it cannot maintain image details without enforcing the pixel level similarity.

We further investigate the role of the spatial transformer module in our model by comparing DeMeshNet with FCNF, which uses the image at its original size as input to the feature loss component. We find that DeMeshNet consistently performs better than FCNF. This is because FCNF takes the whole image, which differs from the aligned face region in both scale and orientation, for feature extraction. Unlike the ImageNet [54] models that are trained with multi-scale natural images, the pre-trained face model φ only takes single-scale, well-aligned face regions for training. The scale and orientation differences lead to the performance degeneration. Therefore, it is important to take face alignment into account when optimizing the feature level loss, as done in DeMeshNet. It should be noted that for blind inpainting models there is an upper limit on verification performance, namely the performance with ground-truth clear ID photos. From Table 1, we can observe that the gap between DeMeshNet and the clear IDs at FNMR@FMR=1% is very small (1.4% for both datasets, although the overall performance is significantly worse on SV1000 than on SYN500), validating the outstanding performance of DeMeshNet.

4.4 Evaluation of Inpainting Results

Fig. 10. Visual inspection of the inpainting results. Although these inpainting results all look very good and are quite similar, they lead to entirely different verification rates because of different RMSE in the feature space.

In this section, we qualitatively and quantitatively evaluate the inpainting results on the testing set (200 individuals). First, we qualitatively evaluate the compared models by visual inspection of the inpainting results in Fig. 10. It is observed that MtNet fails to identify and recover some portions of the corruptions in these cases (cherry-picked to illustrate the point). In contrast, the FCN based models handle all the corrupted areas very well because they can enclose more contextual information with expanded receptive fields. This demonstrates the improved capacity of FCN over SRCNN based architectures. Regarding the details of the recovered ID photos, the models trained with only the pixel level loss (FCNE, FCNW, MtNet) better preserve the consistency of pixels and thus provide a smooth and clear photo that is more similar to the ground-truth.
But the images recovered by the models trained with a feature level loss (FCNF in particular) contain many artifacts, making them visually less appealing. This is because the high-level features are robust to pixel level changes in texture, shape and even color. However, images recovered by DeMeshNet look much better than the ones from FCNF due to the introduction of the spatial transformer module within DeMeshNet.

Next, we quantitatively evaluate the models by measuring both the pixel level and the feature level (eltwise 6) differences. The ground-truth clear ID photo is used as the baseline. Specifically, the average PSNR and the Euclidean distance between features (root mean square error, RMSE) are shown in Table 1. The pixel level loss based models yield better PSNR, as they explicitly optimize PSNR in their loss function. We also observe that FCNW performs slightly better than FCNF. This implies that exploiting extra supervision from the corruption positions in the training phase is beneficial for acquiring visually pleasing inpainting results. To validate the choice of the reverse Huber loss over the Euclidean loss for the feature level difference, we also implement a DeMeshNet E that uses the Euclidean loss as the feature level loss. From Table 1, we can see that the RMSE for DeMeshNet E is slightly larger than for DeMeshNet. This indicates that the reverse Huber loss is more effective at minimizing the feature level distance, thanks to the imposed L1 norm when residuals are small. Furthermore, from Table 1 we observe that a smaller RMSE usually means better verification performance, but a higher PSNR does not guarantee a smaller RMSE. This reveals that the models trained with only the pixel level loss do suffer from the easily fooled nature of CNNs [20], [21], [22]. Moreover, visually appealing inpainting results do not necessarily produce better verification results. By exploring supervision from the deep feature space, DeMeshNet can capture a distribution that is more robust to the transformation in φ and thus provides a stable high-level representation in the compact deep feature space.

5 CONCLUSIONS

This paper addresses the MeshFace verification problem, which verifies corrupted ID photos against clear daily photos. Specifically, we have proposed DeMeshNet, which consists of three parts, to blindly inpaint the MeshFace before conducting verification. The proposed DeMeshNet distinguishes itself from previous works by explicitly taking verification performance into consideration while recovering a clear ID photo. The training objective of DeMeshNet is motivated by the fact that minimizing pixel level differences alone cannot guarantee a small intra-class feature distance in the compact deep feature space, which is crucial for accurate face verification. By further incorporating a spatial transformer module, DeMeshNet can implement face alignment within the network, resulting in an end-to-end network. For optimizing DeMeshNet, a well-performing facial feature extraction network has been trained in advance.

Experimental results on two MeshFace datasets demonstrate that the proposed DeMeshNet outperforms previous work in verification performance. Although we have achieved good performance on synthetic datasets, our current framework still has trouble generalizing to unseen real-world MeshFaces. It would be interesting to develop new methods that deal with real-world MeshFaces in future work.

ACKNOWLEDGEMENT

This work is funded by the National Natural Science Foundation of China, the Beijing Municipal Science and Technology Commission and the State Key Development Program (Grant No. 2016YFB).

REFERENCES

[1] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1991.
[2] Z. Sun and T. Tan, "Ordinal measures for iris recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, Dec.
[3] A. K. Jain, A. Ross, and S. Pankanti, "Biometrics: a tool for information security," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2.
[4] C. Ding and D. Tao, "Pose-invariant face recognition with homography-based normalization," Pattern Recognition.
[5] Y. Sun, K. Nasrollahi, Z. Sun, and T. Tan, "Complementary cohort strategy for multimodal face pair matching," IEEE Transactions on Information Forensics and Security, vol. 11, no. 5, May.
[6] R. He, W. S. Zheng, and B. G. Hu, "Maximum correntropy criterion for robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, Aug.
[7] D. F. Smith, A. Wiliem, and B. C. Lovell, "Face recognition on consumer devices: Reflections on replay attacks," IEEE Transactions on Information Forensics and Security, vol. 10, no. 4, April.
[8] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[9] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[10] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[11] E. Zhou, Z. Cao, and Q. Yin, "Naive-deep face recognition: Touching the limit of LFW benchmark or not?" arXiv preprint.
[12] X. P. Burgos-Artizzu, P. Perona, and P. Dollár, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision, 2013.
[13] S. Zhang, R. He, Z. Sun, and T. Tan, "Multi-task ConvNet for blind face inpainting with application to face verification," in Proceedings of the IEEE International Conference on Biometrics, June 2016.
[14] C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in Proceedings of the European Conference on Computer Vision.
[15] X.-J. Mao, C. Shen, and Y.-B. Yang, "Image restoration using convolutional auto-encoders with symmetric skip connections," arXiv preprint.
[16] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: Feature learning by inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June.
[17] J. S. Ren, L. Xu, Q. Yan, and W. Sun, "Shepard convolutional neural networks," in Advances in Neural Information Processing Systems, 2015.

[18] J. Xie, L. Xu, and E. Chen, "Image denoising and inpainting with deep neural networks," in Advances in Neural Information Processing Systems, 2012.
[19] A. Criminisi, P. Pérez, and K. Toyama, "Region filling and object removal by exemplar-based image inpainting," IEEE Transactions on Image Processing, vol. 13, no. 9.
[20] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint.
[21] A. Nguyen, J. Yosinski, and J. Clune, "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[22] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint.
[23] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[24] M. Jaderberg, K. Simonyan, A. Zisserman et al., "Spatial transformer networks," in Advances in Neural Information Processing Systems, 2015.
[25] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, "Rain streak removal using layer priors," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June.
[26] R. Yi, J. Wang, and P. Tan, "Automatic fence segmentation in videos of dynamic scenes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June.
[27] V. Kwatra, I. Essa, A. Bobick, and N. Kwatra, "Texture optimization for example-based synthesis," ACM Transactions on Graphics (TOG), vol. 24, no. 3.
[28] A. A. Efros and T. K. Leung, "Texture synthesis by non-parametric sampling," in Proceedings of the IEEE International Conference on Computer Vision, vol. 2, 1999.
[29] J. Hays and A. A. Efros, "Scene completion using millions of photographs," ACM Transactions on Graphics (TOG), vol. 26, no. 3, 2007.
[30] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014.
[31] R. Yeh, C. Chen, T. Y. Lim, M. Hasegawa-Johnson, and M. N. Do, "Semantic image inpainting with perceptual and contextual losses," arXiv preprint.
[32] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," CoRR.
[33] K. Garg and S. K. Nayar, "Detection and removal of rain from videos," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[34] S. Roth and M. J. Black, "Fields of experts: A framework for learning image priors," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2005.
[35] D. Zoran and Y. Weiss, "From learning models of natural image patches to whole image restoration," in Proceedings of the IEEE International Conference on Computer Vision, 2011.
[36] L.-W. Kang, C.-W. Lin, and Y.-H. Fu, "Automatic single-image-based rain streaks removal via image decomposition," IEEE Transactions on Image Processing, vol. 21, no. 4.
[37] D. Eigen, D. Krishnan, and R. Fergus, "Restoring an image taken through a window covered with dirt or rain," in Proceedings of the IEEE International Conference on Computer Vision, 2013.
[38] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, "Deeper depth prediction with fully convolutional residual networks," arXiv preprint.
[39] D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," in Advances in Neural Information Processing Systems, 2014.
[40] V. Badrinarayanan, A. Handa, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling," arXiv preprint.
[41] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint.
[42] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June.
[43] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, December.
[44] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," arXiv preprint.
[45] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, "Deeply-supervised nets," arXiv preprint.
[46] Y. Güçlütürk, U. Güçlü, R. van Lier, and M. A. van Gerven, "Convolutional sketch inversion," arXiv preprint.
[47] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," CoRR.
[48] R. Huang, S. Zhang, T. Li, and R. He, "Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis," arXiv preprint.
[49] X. Wu, R. He, and Z. Sun, "A lightened CNN for deep face representation," arXiv preprint.
[50] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao, "MS-Celeb-1M: A dataset and benchmark for large-scale face recognition," arXiv preprint.
[51] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," University of Massachusetts, Amherst, Tech. Rep., October.
[52] C. Ding and D. Tao, "Robust face recognition via multimodal deep face representation," IEEE Transactions on Multimedia, vol. 17, no. 11.
[53] D. Wang, C. Otto, and A. K. Jain, "Face search at scale," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1-1.
[54] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3.
[55] X. Xiong and F. De la Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[56] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun, "Bayesian face revisited: A joint formulation," in Proceedings of the European Conference on Computer Vision. Springer, 2012.
[57] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[58] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint.
[59] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint.
Shu Zhang received the B.E. degree in Automation from Northwestern Polytechnical University. He is currently pursuing a Ph.D. degree in Computer Application Technology at the Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China. His research interests focus on deep learning, pattern recognition, and computer vision.

Ran He received the B.E. and M.S. degrees in computer science from the Dalian University of Technology in 2001 and 2004, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences. Since 2010, he has been with NLPR, where he is currently a Professor. His research interests focus on information theoretic learning, pattern recognition, and computer vision. He currently serves as an Associate Editor of Neurocomputing (Elsevier) and also serves on the program committees of several conferences.

Zhenan Sun received the B.E. degree in industrial automation from the Dalian University of Technology, China, in 1999, the M.S. degree in system engineering from the Huazhong University of Science and Technology, China, in 2002, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), China. Since 2006, he has been with NLPR, CASIA, as a Faculty Member. He is currently a Professor with the Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, CASIA. He is a member of the IEEE Computer Society and the IEEE Signal Processing Society. His current research interests include biometrics, pattern recognition, and computer vision. He has authored or coauthored over 100 technical papers. He is an Associate Editor of the IEEE Transactions on Information Forensics and Security and the IEEE Biometrics Compendium.

Tieniu Tan received his B.Sc. degree in electronic engineering from Xi'an Jiaotong University, China, in 1984, and his M.Sc. and Ph.D. degrees in electronic engineering from Imperial College London, U.K., in 1986 and 1989, respectively. In October 1989, he joined the Department of Computer Science, The University of Reading, U.K., where he worked as a Research Fellow, Senior Research Fellow and Lecturer. In January 1998, he returned to China to join the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CAS) as a full professor. He served as the Director General of the CAS Institute of Automation and as the Director of the NLPR. He is currently Director of the Center for Research on Intelligent Perception and Computing at the Institute of Automation and also serves as Deputy Secretary-General of the CAS and Director General of the CAS Bureau of International Cooperation. He has published more than 450 research papers in refereed international journals and conferences in the areas of image processing, computer vision and pattern recognition, and has authored or edited 11 books. He holds more than 70 patents. His current research interests include biometrics, image and video understanding, and information forensics and security. Dr. Tan is a Member (Academician) of the Chinese Academy of Sciences, a Fellow of The World Academy of Sciences (TWAS), an International Fellow of the UK Royal Academy of Engineering, and a Fellow of the IEEE and the IAPR (the International Association for Pattern Recognition). He is Editor-in-Chief of the International Journal of Automation and Computing. He has given invited talks and keynotes at many universities and international conferences, and has received numerous national and international awards and recognitions.
