JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, XX. Composition-Aided Face Photo-Sketch Synthesis. Jun Yu, Senior Member, IEEE, Shengjie Shi, Fei Gao, Dacheng Tao, Fellow, IEEE, and Qingming Huang, Fellow, IEEE. arXiv v2 [cs.CV] 10 Jul 2018.

Abstract—Face photo-sketch synthesis aims at generating a facial sketch (or photo) conditioned on a given photo (or sketch). It has wide applications, including digital entertainment and law enforcement. Despite the great progress achieved by existing methods, they mostly yield blurring and great deformation over various facial components. To tackle this challenge, we propose to use facial composition information to aid the synthesis of face sketches/photos. Specifically, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis. First, we utilize paired inputs, including a face photo/sketch and the corresponding pixel-wise face labels, for generating the sketch/photo. Second, we propose an improved pixel loss, termed compositional loss, to focus training on hard-to-generate components and delicate facial structures. Moreover, we use stacked CA-GANs (SCA-GAN) to further rectify defects and add compelling details. Experimental results show that our method is capable of generating identity-preserving and visually comfortable sketches and photos over a wide range of challenging data. Besides, cross-dataset photo-sketch synthesis evaluations demonstrate that the proposed method has considerable generalization ability.

Index Terms—Face photo-sketch synthesis, face hallucination, image translation, generative adversarial network, compositional loss.

I. INTRODUCTION

Face photo-sketch synthesis refers to synthesizing a face sketch (or photo) given one input face photo (or sketch). It has a wide range of applications, such as digital entertainment and law enforcement.
Ideally, the synthesized photo or sketch portrait should be appearance-preserving and photo/sketch-realistic, so that it yields both high sketch identification accuracy and excellent perceptual quality. Despite the great success achieved in this area, existing photo-sketch synthesis methods [1], even the most advanced deep learning based method [2], yield serious blurring and deformation in synthesized sketches and photos [3] (see Fig. 1). Recently, generative adversarial networks (GANs) [5] have achieved great success in image transformation, e.g. image style transfer [4], image super-resolution [6], and image-to-image translation [7].

Jun Yu and Shengjie Shi are with the Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China. E-mail: jacobshi777@hotmail.com, yujun@hdu.edu.cn. Fei Gao is with the Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China, and the State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China. E-mail: gaofei@hdu.edu.cn. Dacheng Tao is with the UBTech Sydney Artificial Intelligence Institute, and the School of Information Technologies, in the Faculty of Engineering and Information Technologies, The University of Sydney, J12 Cleveland St, Darlington, NSW 2008, Australia. E-mail: dacheng.tao@sydney.edu.au. Qingming Huang is with the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China. E-mail: qmhuang@ucas.ac.cn. Corresponding author: Fei Gao, gaofei@hdu.edu.cn.

Fig. 1. Illustration of results of existing methods and the proposed methods. (a) Input, (b) MrFSPS [1], (c) cgan [4], (d) our CA-GAN, (e) our SCA-GAN, (f) Ground truth. Our results show more natural textures and details.
The face photo-sketch synthesis process can be naturally formulated as a photo-to-sketch and sketch-to-photo translation problem, which can be handled by a conditional generative adversarial network (cgan) model [4]. Wang et al. [8] therefore tested the vanilla cgan for facial sketch generation. The results show that cgan is promising for yielding sketch-like textures. However, as the vanilla cgan only takes the face photo as input, it is difficult for the model to learn the structural relationship among the facial components given no composition information, thus resulting in deformation of some facial parts (see Fig. 1). Since faces are under strong geometric constraints with complicated structural details, it is promising to use facial composition information to help the generation of sketch portraits. In this paper, we propose to use pixel-wise face labelling masks to characterize the facial composition. This is motivated by the following two observations. First, the facial structure can be well represented by pixel-wise face labelling masks. In particular, the pixel-wise labels can be mapped to a face photo/sketch one-by-one, thus preserving the personal

information in faces. Second, it is easy to access pixel-wise facial labels thanks to recent developments in face parsing techniques [9], thus avoiding heavy human annotation and remaining practical at test time. Moreover, we propose an improved pixel loss, termed compositional loss, for learning the photo/sketch generator. In typical image generation methods, the pixel loss (i.e. reconstruction error) is uniformly calculated across the whole image as (part of) the objective [4]. Thus large components that comprise a vast number of pixels dominate the training procedure, preventing the model from generating delicate facial structures. However, for face photos/sketches, large components are typically unimportant for recognition (e.g. background) or easy to generate (e.g. facial skin). In contrast, small components (e.g. eyes) are critical for recognition and difficult to generate, because they comprise complicated structures. To eliminate this barrier, we introduce a weighting factor for the distinct pixel loss of each component, which down-weights the loss assigned to large components. In other words, our compositional loss focuses training on hard components and prevents the large components from overwhelming the generator during training.

In this paper, we propose a Composition-Aided Generative Adversarial Network (CA-GAN) for face photo-sketch synthesis. Our model is based on the cgan infrastructure. First, we utilize paired inputs, including a face photo and the corresponding pixel-wise face labelling masks, for generating the portrait. Second, we use the proposed novel compositional loss for training the GAN. Moreover, we use stacked CA-GANs (SCA-GAN) for refinement, which proves capable of rectifying defects and adding compelling details [6]. As the proposed framework jointly exploits the image appearance space and the structural composition space, it is capable of generating natural face photos and sketches.
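The effect of such an inverse-frequency weighting factor can be illustrated with a toy Python sketch; the component sizes and per-pixel errors below are invented purely for illustration:

```python
# Toy illustration: contribution of each facial component to a uniform
# (global) pixel loss versus an inverse-frequency-weighted loss.
# All sizes (in pixels) and mean per-pixel L1 errors are made-up numbers.
total = 200 * 250  # hypothetical image size
sizes = {"background": 20000, "skin": 24000, "hair": 5000, "eyes": 600, "mouth": 400}
errors = {"background": 0.02, "skin": 0.03, "hair": 0.06, "eyes": 0.12, "mouth": 0.10}

# Uniform pixel loss: each component contributes (size * mean error) / total,
# so large, easy regions dominate the objective.
uniform = {c: sizes[c] * errors[c] / total for c in sizes}

# Weighting by gamma_c = total / size_c rescales each contribution back to
# the component's mean per-pixel error, letting small parts matter.
weighted = {c: (total / sizes[c]) * uniform[c] for c in sizes}

print(uniform["background"] > uniform["eyes"])    # True: background dominates
print(weighted["eyes"] > weighted["background"])  # True: eyes dominate now
```

After weighting, each component contributes its mean per-pixel error regardless of its size, so small components are no longer drowned out.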
Experimental results show that our methods outperform existing methods in terms of perceptual quality, and obtain highly comparable quantitative evaluation results. We also verify the excellent generalization ability of our new model across different datasets. The contributions of this paper are mainly three-fold. First, to the best of our knowledge, this is the first work to employ facial composition information in the loop of learning a face photo-sketch synthesis model. Second, we propose an improved pixel loss, termed compositional loss, to focus training on hard-to-generate components and delicate facial structures, which is demonstrated to be highly effective. This both speeds the training up and greatly stabilizes it. Third, the proposed method yields identity-preserving, realistic, and visually comfortable photos and sketches over a wide range of challenging data. Besides, our methods show considerable generalization ability.

The rest of this paper is organized as follows. Section II introduces related work. Section III details the proposed sketch portrait generation framework. Experimental results and analysis are presented in Section IV. Section V concludes this paper.

II. RELATED WORK

A. Face Photo-Sketch Synthesis

Tremendous efforts have been made to develop facial photo-sketch synthesis methods, which can be broadly classified into two groups: data-driven methods and model-driven methods [10]. Data-driven refers to methods that try to synthesize a photo/sketch by using a linear combination of similar training photo/sketch patches [11], [12], [13], [14], [15], [16]. These methods have two main parts: similar photo/sketch patch searching and linear combination weight computation. The similar photo/sketch searching process heavily increases test time and makes it difficult to use a large-scale training dataset.
Model-driven refers to methods that learn a mathematical function offline to map a photo to a sketch or a sketch to a photo [1], [17], [18], [19]. Traditionally, researchers have made great efforts to explore handcrafted features, neighbour searching strategies, and learning techniques. However, these methods typically yield serious blurring and great deformation in synthesized face photos and sketches. Inspired by the great success achieved by deep learning techniques [20], [5] in various image-to-image translation tasks [7], some attempts have been made to learn deep learning based face sketch synthesis models. To name a few, Zhang et al. [21] propose to use a branched fully convolutional network (FCN) for generating structural and textural representations, respectively, and then use face parsing results to fuse them together. However, the resulting sketches have blurring and ringing effects. Recently, Wang et al. [8] proposed to first use the vanilla cgan to generate a sketch and then refine it by using a post-processing approach termed back projection. Experimental results show that cgan can produce sketch-like structures in the synthesized portrait. However, there is also great deformation in various facial parts. More recently, Wang et al. [22] use CycleGAN [23] as the prototype, and propose to use multi-scale discriminators [24] for generating high-resolution sketches/photos. This method shows distinctly improved performance and yields sketch-realistic textures. However, there are still slight blurring defects and degradations in the color components. A few existing methods use composition information to guide the generation of the face sketch [21], [25]. In particular, they try to learn a specific generator for each component and then combine the components to form the entire face. Similar ideas have also been proposed for face image hallucination [26], [27].
In contrast, we propose to employ facial composition information in the loop of learning the generator to boost performance.

B. Image-to-Image Translation

Our work is highly related to image-to-image translation, which has achieved significant progress with the development of generative adversarial networks (GANs) [5], [28] and variational auto-encoders (VAEs) [29]. Among them, the conditional generative adversarial network (cgan) [4] has attracted growing attention, with many interesting works based on it, including conditional face generation [30] and text-to-image

synthesis [6], and image style transfer [31]. All of these have obtained impressive results. Inspired by these observations, we are interested in generating sketch-realistic portraits by using cgan. However, we found the vanilla cgan insufficient for this task, and thus propose to boost performance by both developing the network architecture and modifying the objective.

III. METHOD

A. Preliminaries

The proposed method is capable of handling both sketch synthesis and photo synthesis, because these two procedures are symmetric. In this section, we take face sketch synthesis as an example to introduce our method. Our problem is defined as follows. Given a face photo X, we would like to generate a sketch portrait Y that shares the same identity and has a sketch-realistic appearance. Our key idea is to use the face composition information to help the generation of the sketch portrait. The first step is to obtain the structural composition of a face. As face parsing can well represent the facial composition, we employ pixel-wise face labelling masks M as prior knowledge of the facial composition. The remaining problem is to generate the sketch portrait based on the face photo and composition masks: $\{X, M\} \rightarrow Y$. Here, we propose a composition-aided GAN (CA-GAN) for this purpose. We further employ stacked CA-GANs (SCA-GAN) to refine the generated sketch portraits. Details are given in the remainder of this section.

B. Face Decomposition

Assume that the given face photo is $X \in \mathbb{R}^{m \times n \times d}$, where $m$, $n$, and $d$ are the height, width, and number of channels, respectively. We decompose the input photo into $C$ components (e.g. hair, nose, mouth, etc.) by employing the face parsing method proposed by Liu et al. [9] due to its excellent performance. For notational convenience, we refer to this model as P-Net. By using P-Net, we get the pixel-wise labels related to 8 components, i.e.
two eyes, two eyebrows, nose, upper and lower lips, inner mouth, facial skin, hair, and background [9]. We propose to use soft labels (probabilistic outputs) in this paper. Let $M = \{M^{(1)}, \ldots, M^{(C)}\} \in \mathbb{R}^{m \times n \times C}$ denote the pixel-wise face labelling masks. Here, $M^{(c)}_{i,j} \in [0, 1]$, s.t. $\sum_c M^{(c)}_{i,j} = 1$, denotes the probability that pixel $X_{i,j}$ belongs to the $c$-th component, as predicted by P-Net, $c = 1, \ldots, C$ with $C = 8$. In a preliminary implementation, we also tested the performance of hard labels (binary outputs), i.e. each value $M^{(c)}_{i,j}$ denotes whether $X_{i,j}$ belongs to the $c$-th component. Because it is almost impossible to get absolutely precise pixel-wise face labels, using hard labels occasionally yields deformation in the border area between two nearby components.

C. Composition-Aided GAN (CA-GAN)

In the proposed framework, we first utilize paired inputs, including a face photo and the corresponding pixel-wise face labels, for generating the portrait. Second, we propose an improved pixel loss, termed the compositional loss, to focus training on hard-to-generate components and delicate facial structures. Moreover, we use stacked CA-GANs to further rectify defects and add compelling details. Details are introduced in the following subsections.

Fig. 2. Generator architecture of the proposed composition-aided generative adversarial network (CA-GAN). The facial labels produced by the (fixed) P-Net enter a Composition Encoder, the input photo enters an Appearance Encoder, and a shared Decoder produces the generated sketch.

1) Generator Architecture: The architecture of the generator in CA-GAN is presented in Fig. 2. In our case, the generator needs to translate two inputs (i.e., the face photo X and the face labelling masks M) into a single output Y. Because X and M are of different modalities, we propose to use distinct encoders to model them, referred to as the Appearance Encoder and the Composition Encoder, respectively. The features of these two encoders are concatenated at the bottleneck layer for the decoder [32].
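A shape-level numpy sketch of this two-encoder fusion follows; the layer sizes and downsampling factor are invented for illustration, and the real encoders are learned convolutional stacks:

```python
import numpy as np

def encoder(x, channels_out):
    """Stand-in for a conv encoder: 8x spatial downsampling plus a channel
    projection. CA-GAN uses learned U-Net conv stacks; this mimics shapes only."""
    pooled = x[::8, ::8, :]                        # fake 8x downsampling
    w = np.ones((pooled.shape[-1], channels_out))  # fake 1x1 channel projection
    return pooled @ w

photo = np.zeros((64, 64, 3))   # appearance input X (toy size)
masks = np.zeros((64, 64, 8))   # composition input M, with C = 8 components

f_app = encoder(photo, 128)     # Appearance Encoder features
f_cmp = encoder(masks, 128)     # Composition Encoder features

# Bottleneck fusion: concatenate the two feature maps along the channel axis;
# a decoder (omitted here) would then upsample the fused representation.
bottleneck = np.concatenate([f_app, f_cmp], axis=-1)
assert bottleneck.shape == (8, 8, 256)
```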
In this way, the information of both the face photo and the facial composition can be well modeled. The architectures of the encoder, decoder, and discriminator are exactly the same as those used in [4] but without dropout, following the shape of a U-Net. Specifically, we concatenate all channels at layer $i$ in both encoders with those at layer $n - i$ in the decoder. Details of the network can be found in the appendix of [4]. In addition, we tested a network with one single encoder that takes the concatenation of X and M, i.e. $[X, M^{(1)}, \ldots, M^{(C)}] \in \mathbb{R}^{m \times n \times (d + C)}$, as the input. This network is the most straightforward solution for simultaneously encoding the face photo and the composition masks. Experimental results show that using this structure decreases the face sketch recognition accuracy by about 2 percent and yields slightly blurred effects in the hair area.

2) Compositional Loss: Previous approaches to cgans have found it beneficial to mix the GAN objective with a pixel loss (i.e. reconstruction error) for various tasks, e.g. image translation [4] and super-resolution reconstruction [7]. Besides, using the normalized $L_1$ distance encourages less blurring than the $L_2$ distance. We therefore use the normalized $L_1$ distance between the generated sketch $\hat{Y}$ and the target $Y$ in the computation of the pixel loss. We introduce the compositional loss starting from the standard pixel loss for image generation. In previous works on cgans, the pixel loss is calculated over the whole image. For distinction, we refer to it as the global pixel loss in this paper.

Global pixel loss. Suppose both $\hat{Y}$ and $Y$ have shape $m \times n$. Let $\mathbf{1}$ be an $m \times n$ matrix of ones. The global pixel loss is

expressed as:
$$L_{L_1,\mathrm{global}}(Y, \hat{Y}) = \frac{1}{mn} \| Y - \hat{Y} \|_1. \quad (1)$$
In the global pixel loss, the $L_1$ loss related to the $c$-th component, $c = 1, 2, \ldots, C$, can be expressed as:
$$L^{(c)}_{L_1,\mathrm{global}} = \frac{1}{mn} \| Y \odot M^{(c)} - \hat{Y} \odot M^{(c)} \|_1, \quad (2)$$
with $L_{L_1,\mathrm{global}} = \sum_c L^{(c)}_{L_1,\mathrm{global}}$. Here, $\odot$ denotes the pixel-wise product operation. As all the pixels are treated equally in the global pixel loss, large components (e.g. background and facial skin) contribute more to learning the generator than small components (e.g. eyes and mouth).

Compositional loss. To eliminate this barrier, we introduce a weighting factor, $\gamma_c$, to balance the distinct pixel loss of each component. Specifically, inspired by the idea of the balanced cross-entropy loss [33], we set $\gamma_c$ by inverse component frequency. When we adopt the soft facial labels, $M^{(c)} \ast \mathbf{1}$ is the sum of the probabilities of every pixel belonging to the $c$-th component. Here, $\ast$ denotes the convolution operation. If we adopt the hard facial labels, it becomes the number of pixels belonging to the $c$-th component. The component frequency is thus $\frac{M^{(c)} \ast \mathbf{1}}{mn}$. So we set $\gamma_c = \frac{mn}{M^{(c)} \ast \mathbf{1}}$ and multiply it with $L^{(c)}_{L_1,\mathrm{global}}$, resulting in the balanced $L_1$ loss:
$$L^{(c)}_{L_1,\mathrm{cmp}} = \frac{1}{M^{(c)} \ast \mathbf{1}} \| Y \odot M^{(c)} - \hat{Y} \odot M^{(c)} \|_1. \quad (3)$$
Obviously, the balanced $L_1$ loss is exactly the normalized $L_1$ loss over the related componential region. The compositional loss is defined as
$$L_{L_1,\mathrm{cmp}}(Y, \hat{Y}) = \sum_{c=1}^{C} L^{(c)}_{L_1,\mathrm{cmp}}. \quad (4)$$
As $\gamma_c$ is broadly in inverse proportion to the component size, it reduces the loss contribution from large components. From the other perspective, it up-weights the losses assigned to small and hard-to-generate components. In practice we use a weighted average of the global pixel loss and the compositional loss:
$$L_{L_1}(Y, \hat{Y}) = \alpha L_{L_1,\mathrm{cmp}} + (1 - \alpha) L_{L_1,\mathrm{global}}, \quad (5)$$
where $\alpha \in [0, 1]$ is used to balance the global pixel loss and the compositional pixel loss.
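Under the definitions above, Eqs. (1)-(5) can be sketched in numpy for single-channel images; the random masks below merely stand in for real parsing output:

```python
import numpy as np

def pixel_losses(Y, Y_hat, M, alpha=0.7):
    """Global (Eq. 1), compositional (Eqs. 3-4), and combined (Eq. 5) L1 losses.

    Y, Y_hat : (m, n) target and generated sketches
    M        : (m, n, C) soft component masks summing to 1 at every pixel
    """
    m, n, C = M.shape
    diff = np.abs(Y - Y_hat)                 # per-pixel L1 error
    L_global = diff.sum() / (m * n)          # Eq. (1)
    # Eq. (3): each component's loss is normalized by its soft size ||M^(c)||_1
    L_cmp = sum((diff * M[..., c]).sum() / M[..., c].sum() for c in range(C))
    return L_global, L_cmp, alpha * L_cmp + (1 - alpha) * L_global  # Eq. (5)

rng = np.random.default_rng(1)
Y, Y_hat = rng.random((16, 16)), rng.random((16, 16))
M = rng.random((16, 16, 8))
M /= M.sum(axis=-1, keepdims=True)           # make the soft masks valid
L_global, L_cmp, L_total = pixel_losses(Y, Y_hat, M)
```

Because the per-pixel mask values sum to 1, the component-wise terms of Eq. (2) recover Eq. (1) exactly.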
We adopt this form in our experiments and set $\alpha = 0.7$, as it yields slightly improved perceptual quality over the compositional loss alone.

3) Objective: Following the objective of the vanilla cgan, we express the adversarial loss of CA-GAN as:
$$L_{\mathrm{adv}}(G, D) = \mathbb{E}_{X,M,Y \sim p_{\mathrm{data}}(X,M,Y)}[\log D(X, M, Y)] + \mathbb{E}_{X,M \sim p_{\mathrm{data}}(X,M)}[\log(1 - D(X, M, G(X, M)))]. \quad (6)$$
Similar to the settings in [4], we do not add a Gaussian noise $z$ as an input. Besides, we do not use dropout in the generator. Finally, we use a combination of the adversarial loss and the weighted pixel loss to learn the generator. We aim to solve:
$$G^* = \arg\min_G \max_D L_{\mathrm{adv}} + \lambda L_{L_1}, \quad (7)$$
where $\lambda$ is a weighting factor.

Fig. 3. Pipeline of the proposed stacked composition-aided generative adversarial network (SCA-GAN). The Stage-I generator $G^{(1)}$ takes the input photo and facial labels; the Stage-II generator $G^{(2)}$ additionally takes the Stage-I output. Each stage has a reconstruction error against the target sketch and a discriminator ($D^{(1)}$, $D^{(2)}$) judging real/fake pairs.

D. Stacked Refinement Network

Finally, we use stacked CA-GANs (SCA-GAN) to further boost the quality of the generated sketch portrait [6]. The architecture of SCA-GAN is illustrated in Fig. 3. SCA-GAN includes two-stage GANs, each comprising a generator and a discriminator, denoted sequentially by $G^{(1)}$, $D^{(1)}$, $G^{(2)}$, $D^{(2)}$. In SCA-GAN, the Stage-I GAN yields an initial portrait, $\hat{Y}^{(1)}$, based on the given face photo X and pixel-wise label masks M. Afterwards, the Stage-II GAN takes $\{X, M, \hat{Y}^{(1)}\}$ as inputs to rectify defects and add compelling details, yielding a refined sketch portrait, $\hat{Y}^{(2)}$. The network architectures of these two GANs are almost the same, except that the inputs of $G^{(2)}$ and $D^{(2)}$ have one more channel (i.e. the initial sketch) than those of $G^{(1)}$ and $D^{(1)}$, correspondingly. Here, the given photo and the initial sketch are concatenated and input into the appearance encoder. In the implementation, we also tested an SCA-GAN network with one single discriminator shared by the two GANs.
However, it cannot yield vivid hair.

E. Optimization and Implementation

In the proposed method, the input image should be of a fixed size. In the default setting of cgan [4], an input image of arbitrary size is resized to the target size. However, we observed that resizing the input face photo yields serious blurring and great deformation in the generated sketch [8], [22]. In contrast, by padding the input image to the target size, we obtain considerable performance improvement. We therefore use padding across all the experiments. To optimize our networks, following [4], we alternate between one gradient descent step on D and one step on G. We use minibatch SGD and apply the Adam solver. For clarity, we illustrate the optimization procedure of SCA-GAN in Algorithm 1. In our experiments, we use a batch size of 1 and run for 700 epochs in all the experiments. Besides, we apply instance normalization, which has shown great superiority over batch normalization in the task of image generation [4]. We trained our models on a single Pascal Titan X GPU. With a training set of 500 samples, it took about 3 hours to train the CA-GAN model and 6 hours to train the SCA-GAN model. At test time, all models run in well under one second on this GPU.
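The pad-rather-than-resize preprocessing can be sketched in numpy; the 256 × 256 target used here is an arbitrary example, not a size taken from the paper:

```python
import numpy as np

def pad_to(img, target_h, target_w, fill=0):
    """Center-pad an image to (target_h, target_w) without resampling it,
    so no interpolation artefacts are introduced."""
    h, w = img.shape[:2]
    top, left = (target_h - h) // 2, (target_w - w) // 2
    widths = ((top, target_h - h - top), (left, target_w - w - left))
    widths += ((0, 0),) * (img.ndim - 2)      # leave channel axes untouched
    return np.pad(img, widths, mode="constant", constant_values=fill)

photo = np.ones((250, 200, 3), dtype=np.uint8)  # a toy input photo
padded = pad_to(photo, 256, 256)
assert padded.shape == (256, 256, 3)
assert padded.sum() == photo.sum()              # original pixels are untouched
```

Unlike resizing, padding preserves every original pixel value, which is consistent with the improvement reported above.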

Algorithm 1 Optimization procedure of SCA-GAN (for sketch synthesis).
Input: a set of training instances, each in the form of a triplet {a face photo X, pixel-wise label masks M, a target sketch Y}; iteration counter t = 0; max iteration T.
Output: optimal $G^{(1)}, D^{(1)}, G^{(2)}, D^{(2)}$.
Initialize $G^{(1)}, D^{(1)}, G^{(2)}, D^{(2)}$;
for t = 1 to T do
1. Randomly select one training instance {X, M, Y}.
2. Estimate the initial sketch portrait: $\hat{Y}^{(1)} = G^{(1)}(X, M)$.
3. Estimate the refined sketch portrait: $\hat{Y}^{(2)} = G^{(2)}(X, M, \hat{Y}^{(1)})$.
4. Update $D^{(1)}$: $D^{(1)} = \arg\max_{D^{(1)}} L_{\mathrm{adv}}(G^{(1)}, D^{(1)})$.
5. Update $D^{(2)}$: $D^{(2)} = \arg\max_{D^{(2)}} L_{\mathrm{adv}}(G^{(2)}, D^{(2)})$.
6. Update $G^{(1)}$: $G^{(1)} = \arg\min_{G^{(1)}} L_{\mathrm{adv}}(G^{(1)}, D^{(1)}) + \lambda L_{L_1}(Y, \hat{Y}^{(1)})$.
7. Update $G^{(2)}$: $G^{(2)} = \arg\min_{G^{(2)}} L_{\mathrm{adv}}(G^{(2)}, D^{(2)}) + \lambda L_{L_1}(Y, \hat{Y}^{(2)})$.
end for

IV. EXPERIMENTS

In this section, we first introduce the experimental settings and then present a series of empirical results to verify the effectiveness of the proposed method.

A. Settings

1) Datasets: We conducted experiments on three publicly available databases: the CUHK Face Sketch database (CUHK) [34], the CUFSF database [35], and the VIPSL-FS database [19], [36]. The CUHK database consists of 606 face photos from three databases: the CUHK student database [37] (188 persons), the AR database [38] (123 persons), and the XM2VTS database [39] (295 persons). The CUFSF database includes 1194 persons [40]. In the CUFSF database, there are lighting variations in the face photos and shape exaggeration in the sketches, making CUFSF very challenging. For each person, there is one face photo and one face sketch drawn by an artist in both the CUHK database and the CUFSF database. The VIPSL-FS database includes 200 persons; for each person, there are 5 sketches drawn by different artists.
Because the original sketches in the VIPSL-FS database are of high resolution, we use it to test the performance of the proposed method for generating high-resolution face photos/sketches. Following existing methods [3], all the face images (photos and sketches) are geometrically aligned relying on three points: the two eye centers and the mouth center. For the CUHK and CUFSF databases, the aligned images are cropped to a fixed size. For the VIPSL-FS database, the aligned face photos are first cropped and then resized; the corresponding pixel-wise label masks are estimated from the photo and then resized accordingly, and the aligned sketches are cropped to the same size. In the following, we present a series of experiments. First, we perform face photo-sketch synthesis on the CUHK, CUFSF, and VIPSL-FS databases, respectively, to evaluate the performance of the proposed methods (see Part IV-B and Part IV-C). Second, we conduct cross-dataset experiments to verify whether the proposed method is independent of the training data (see Part IV-D). Third, we discuss the network configurations of our proposed method on the CUHK database and CUFSF database (see Part IV-E). We use the proposed architecture for both sketch synthesis and photo synthesis, and release all the synthesized sketches and photos online. It is well known that a large training dataset is necessary for learning a GAN-based model. In the experiments, unless otherwise specified, we randomly split each dataset into a training set (80%) and a testing set (20%), with no overlap between them. Besides, we ran the training-testing process 10 times and calculated the average values of the following criteria as the performance measure.

2) Criteria: We adopt the Peak Signal-to-Noise Ratio (PSNR) and Feature Similarity Index Metric (FSIM) [41] between the synthesized image and the ground-truth image to objectively assess the quality of the synthesized image.
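PSNR follows directly from the mean squared error between the two images; a minimal numpy version (the 8 × 8 test images are arbitrary):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-shape images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
b = a + 10.0                             # uniform error of 10, so MSE = 100
print(round(psnr(a, b), 2))              # 10 * log10(255^2 / 100) ≈ 28.13
```

FSIM additionally compares phase congruency and gradient magnitude features, so it is usually computed with a dedicated implementation rather than a few lines.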
It is worth mentioning that, although these metrics work well for evaluating the quality of natural images and have become prevalent in the face photo-sketch synthesis community, their performance on synthesized images is indicative but not infallible [42]. In addition, sketch-based face recognition is often used to assist law enforcement, so it is necessary to verify whether the synthesized images can be used for identity recognition. We therefore statistically evaluate the face recognition accuracy while using the ground-truth image (the photo or the sketch drawn by the artist) as the probe image and the synthesized images (photos or sketches) as the images in the gallery. Null-space linear discriminant analysis (NLDA) [43] is employed to conduct the face recognition experiments. We repeat each face recognition experiment 20 times by randomly partitioning the data and report the average accuracy.

B. Face Sketch Synthesis

Comparison with existing methods: There is great divergence in the experimental settings among existing face sketch synthesis methods. Besides, existing methods are typically tested on the CUHK database and CUFSF database. In this paper, we follow the work presented in [3] and split the datasets in the following ways. For the CUHK student database, 88 photo-sketch pairs are taken for training and the rest for testing. For the AR database, we randomly choose 80 pairs for training and the remaining 43 pairs for testing. For the XM2VTS database, we randomly choose 100 pairs for training and the remaining 195 pairs for testing. Fig. 4 presents some synthesized face sketches from different methods on the CUHK database and the CUFSF database. Four advanced methods are compared: MrFSPS [1], RSLCR [3], FCN [2], and cgan [4]. All the synthesized sketches by RSLCR and FCN are those released by Wang et al. at: All the synthesized sketches by MrFSPS are those released by the author Peng at: MrFSPS.html.

Fig. 4. Examples of synthesized face sketches on the CUHK database and the CUFSF database. (a) Photo, (b) MrFSPS [1], (c) RSLCR [3], (d) FCN [2], (e) BP-GAN [8], (f) cgan [4], (g) CA-GAN, (h) SCA-GAN, and (i) Sketch drawn by artist. From top to bottom, the examples are selected from the CUHK student database [37], the AR database [38], the XM2VTS database [39], and the CUFSF database [40], sequentially.

As shown in Fig. 4, the cgan, CA-GAN, and SCA-GAN methods can generate sketch-like textures (e.g. the hair region) and shadows. In contrast, BP-GAN yields over-smoothed sketch portraits, while MrFSPS, RSLCR, and FCN yield serious blurring and great deformation in various facial parts. Besides, there are deformations in the sketches synthesized by cgan, especially in the mouth area. In contrast, CA-GAN alleviates such defects, and SCA-GAN almost eliminates them. This illustrates the effectiveness of the proposed methods.

Table I presents the average PSNR, FSIM, and face sketch recognition accuracy (Acc.) of the most advanced face sketch synthesis methods and the proposed ones, on the CUHK database and CUFSF database. The evaluation method is exactly the same as that presented in [3]. Specifically, in the face sketch recognition experiment, we randomly split the CUHK database into a training set (150 synthesized sketches and corresponding ground-truths) and a testing set (188 sketches) that constitutes the gallery. For the CUFSF database, we randomly choose 300 synthesized sketches and corresponding ground-truths for training and 644 synthesized sketches as the gallery. We repeat each face recognition experiment 20 times by randomly partitioning the data. As shown in Table I, the PSNR values of all these methods are highly comparable.
According to FSIM, cgan, CA-GAN, and SCA-GAN outperform existing methods, except MrFSPS, on both the CUHK database and the CUFSF database. According to recognition accuracy, cgan, CA-GAN, and SCA-GAN are 2-3 percent inferior to MrFSPS and RSLCR on the CUHK database, but 4-5 percent superior to them on the CUFSF database. Note that the CUFSF database is much larger than the CUHK database; besides, the lighting variation in face photos and the shape exaggeration in sketches both increase the difficulty of face sketch-photo synthesis and recognition. We therefore conclude that cgan, CA-GAN, and SCA-GAN outperform existing methods in terms of face sketch recognition accuracy. There is no considerable difference among cgan, CA-GAN, and SCA-GAN, in terms of these three criteria, across both the CUHK database and the CUFSF database. In addition, PS2-MAN [22] achieves an FSIM value on the CUHK database which is slightly better

than both CA-GAN and SCA-GAN.

TABLE I. COMPARISON WITH EXISTING FACE SKETCH SYNTHESIS METHODS IN TERMS OF THE AVERAGE PSNR, FSIM (%), AND FACE RECOGNITION ACCURACY (ACC.) (%), ON THE CUHK AND CUFSF DATABASES. THE EXPERIMENTAL SETTINGS FOLLOW RSLCR [3]. Columns: MrFSPS [1], RSLCR [3], FCN [2], BP-GAN [8], cgan [4], CA-GAN, and SCA-GAN; rows: PSNR, FSIM, and Acc. on the CUHK and CUFSF databases.

From Fig. 4 and Table I, we can safely conclude that both CA-GAN and SCA-GAN generate much better sketches and achieve highly comparable quantitative evaluations, in comparison with existing face sketch synthesis methods.

High-resolution sketch synthesis: We add one convolutional layer to the encoders and one deconvolutional layer to the decoders of cgan, CA-GAN, and SCA-GAN, for the purpose of generating high-resolution sketches. We use 1000 photo-sketch pairs from the VIPSL-FS database here, randomly split into a training set and a testing set by 80%:20%. Fig. 5 shows the sketch portraits generated by cgan [4] and SCA-GAN. Obviously, cgan yields checkerboard-like textures and blurring in the hair area. Besides, cgan yields deformation in small facial components (see the left eye of the first person in Fig. 5). In contrast, SCA-GAN generates very high-quality, sketch-realistic portraits, alleviating such defects.

Quantitative evaluation: Since GANs typically need a large amount of training data, we further conduct the sketch synthesis experiment on the CUHK, CUFSF, and VIPSL-FS databases by randomly splitting each database into a training set (80%) and a testing set (20%). In face sketch recognition, we randomly split the CUHK database into a training set (70 synthesized sketches and corresponding ground-truths) and a testing set (188 sketches) that constitutes the gallery.
For the CUFSF database, we randomly choose 120 synthesized sketches and the corresponding ground truths for training, and use 250 synthesized sketches as the gallery. For the VIPSL-FS database, we randomly choose 20 synthesized sketches and the corresponding ground truths for training, and use 40 synthesized sketches as the gallery. We repeat each face sketch recognition experiment 20 times by randomly partitioning the data. We run the training-testing process 10 times, and calculate the average PSNR, FSIM, and face recognition accuracy (Acc.) of the synthesized sketches. The corresponding results are shown in Table II. As shown in Table II, there is no distinct difference among cGAN, CA-GAN, and SCA-GAN in terms of PSNR. According to FSIM, CA-GAN is highly comparable with cGAN, and SCA-GAN shows slight superiority over both of them. In addition, both CA-GAN and SCA-GAN achieve higher face sketch recognition accuracy on the CUHK database, and remain comparable with cGAN on both the CUFSF and VIPSL-FS databases. Recall that the sketches generated by SCA-GAN look most like the input face (as illustrated in Figs. 1, 4, and 5). We can safely draw the conclusion that both CA-GAN and SCA-GAN are capable of generating identity-preserving and sketch-realistic sketch portraits.

Fig. 5. Examples of high-resolution synthesized face sketches on the VIPSL-FS database. (a) Photo, (b) cGAN [4], (c) CA-GAN, (d) SCA-GAN, and (e) sketch drawn by the artist.

TABLE II. Average PSNR, FSIM (%), and face recognition accuracy (Acc.) (%) of the synthesized sketches (cGAN, CA-GAN, and SCA-GAN) on the CUHK, CUFSF, and VIPSL-FS databases. Each database is randomly split into a training set (80%) and a testing set (20%).

C. Face Photo Synthesis

We exchange the roles of the sketch and photo in the proposed model, and evaluate the face photo synthesis performance on the aforementioned datasets separately.
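The random 80%/20% partitioning with repeated runs, used for both the sketch synthesis and photo synthesis evaluations, can be sketched as follows. Here `score_fn` is a hypothetical stand-in for computing any one of the three metrics (PSNR, FSIM, or recognition accuracy) on a single partition; this is an illustrative protocol sketch, not the authors' code.

```python
import random

def repeated_split_eval(items, score_fn, train_frac=0.8, repeats=10, seed=0):
    """Average a metric over repeated random train/test partitions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_train = int(len(items) * train_frac)
    scores = []
    for _ in range(repeats):
        shuffled = items[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(score_fn(train, test))
    return sum(scores) / len(scores)

# Toy usage: the "score" is just the test-set size, constant across repeats.
pairs = list(range(1000))  # stand-ins for 1000 photo-sketch pairs
avg = repeated_split_eval(pairs, lambda tr, te: len(te))
print(avg)  # → 200.0 for a 1000-item set with an 80/20 split
```

Averaging over many random partitions reduces the variance introduced by any single lucky or unlucky split, which matters on the smaller databases used here.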
Fig. 6 illustrates the synthesized face photos of MrFSPS [1], cGAN, CA-GAN, and SCA-GAN. All the synthesized photos of MrFSPS are those released by the author Peng at: MrFSPS.html. Obviously, the face photos synthesized by MrFSPS are heavily blurred. Besides, there are serious degradations in the photos synthesized by cGAN. In contrast, the photos generated by both CA-GAN and SCA-GAN consistently show considerable improvement in perceptual quality.

Fig. 6. Examples of synthesized face photos. (a) Sketch drawn by the artist, (b) MrFSPS [1], (c) cGAN, (d) CA-GAN, (e) SCA-GAN, and (f) ground-truth photo. From top to bottom, the examples are selected from the CUHK student database [37], the AR database [38], the XM2VTS database [39], and the CUFSF database [40].

Table III presents the average PSNR, FSIM, and face recognition accuracy (Acc.) on the CUHK, CUFSF, and VIPSL-FS databases. Obviously, SCA-GAN obtains the best FSIM values across all three databases. Besides, both CA-GAN and SCA-GAN outperform cGAN in terms of face recognition on the CUHK and VIPSL-FS databases, but are inferior to cGAN on the CUFSF database. From Fig. 6 and Table III, we can see that both CA-GAN and SCA-GAN generate better face photos and achieve highly comparable quantitative results as compared with cGAN. We can safely draw the conclusion that both CA-GAN and SCA-GAN are capable of generating identity-preserving and natural face photos.

TABLE III. Average PSNR, FSIM (%), and face recognition accuracy (Acc.) (%) of the synthesized photos (cGAN, CA-GAN, and SCA-GAN) on the CUHK, CUFSF, and VIPSL-FS databases. Each database is randomly split into a training set (80%) and a testing set (20%).

Comparison with existing methods: To date, only a few methods have been proposed for face photo synthesis. Here we compare the proposed method with two advanced methods: MrFSPS [1] and PS²-MAN [22]. MrFSPS achieves an FSIM of 80.31% and a face recognition accuracy of 96.7% with the synthesized photos on the CUHK database; on the CUFSF database, it achieves a face recognition accuracy of 59.37%. As reported in [22], PS²-MAN achieves an FSIM value of 80.62% on the CUHK database. In general, the performance of CA-GAN and SCA-GAN is highly comparable with that of MrFSPS and PS²-MAN, while the synthesized photos are perceptually better. Specifically, there are serious blurred effects in the photos synthesized by MrFSPS, and visible degradations in the color components of those synthesized by PS²-MAN [22]. In contrast, the results of CA-GAN and SCA-GAN express more natural colors and details.

Fig. 7. Examples of synthesized high-resolution face photos on the VIPSL-FS database. (a) Sketch drawn by the artist, (b) cGAN [4], (c) CA-GAN, (d) SCA-GAN, and (e) ground-truth photo.

High-resolution photo synthesis: In addition, we evaluate the performance of cGAN, CA-GAN, and SCA-GAN in synthesizing high-resolution photos on the VIPSL-FS database. The experimental settings are exactly the same as those presented in Section IV-B, except that the roles of the sketch and photo are exchanged. Fig. 7 illustrates the synthesized photos. Obviously, cGAN yields checkerboard-like textures and blurred effects in the hair area, and deforms small facial components (e.g., the left eye of the second person in Fig. 7(b)). In contrast, both CA-GAN and SCA-GAN generate very high-quality and natural face photos that alleviate such defects, and the photos synthesized by SCA-GAN are of the best perceptual quality.

D. Dataset Independence

To verify the generalization ability of the learned model, we conduct two cross-dataset experiments.

Cross-database experiment: First, we apply the model learned from the CUHK training set to the whole VIPSL-FS database. There is great divergence in person identity, background, and sketch style between these two datasets. Fig. 8 illustrates the synthesized sketches on the VIPSL-FS
database, and Fig. 9 the synthesized photos. Obviously, both CA-GAN and SCA-GAN generate much better sketches and photos than cGAN, and the results of SCA-GAN express the best appearance. Table IV lists the average PSNR, FSIM (%), and face recognition accuracy (Acc.) (%) of the synthesized photos/sketches on the VIPSL-FS database. For the face sketch recognition task, we randomly choose 100 synthesized sketches and the corresponding ground truths for training, and use 200 synthesized sketches as the gallery; each face sketch recognition experiment is repeated 20 times by randomly partitioning the data. CA-GAN and SCA-GAN outperform cGAN according to PSNR and FSIM, but are inferior to cGAN according to the face recognition accuracy.

TABLE IV. Average PSNR, FSIM (%), and face recognition accuracy (Acc.) (%) of the synthesized photos/sketches (cGAN, CA-GAN, and SCA-GAN) on the whole VIPSL-FS database, with the model learned from the CUHK training set.

Fig. 8. Synthesized sketches on the VIPSL-FS database while the model is trained on the CUHK database. (a) Photo, (b) cGAN, (c) CA-GAN, (d) SCA-GAN, (e) ground-truth sketch drawn by the artist.

Fig. 9. Synthesized photos on the VIPSL-FS database while the model is trained on the CUHK database. (a) Sketch drawn by the artist, (b) cGAN, (c) CA-GAN, (d) SCA-GAN, (e) ground-truth photo.

Face photo-sketch synthesis of Chinese celebrities: In addition, we test the CA-GAN and SCA-GAN models, trained on the CUHK database, on photos and sketches of Chinese celebrities. These photos and sketches are downloaded from the web, and involve different lighting conditions and backgrounds compared with the images in the training set. Fig. 10 shows the synthesized sketches, and Fig. 11 the synthesized photos. Obviously, our results express more natural textures and details than those of cGAN.

Fig. 10. Synthesized sketches of Chinese celebrities. (a) Photo, (b) cGAN, (c) CA-GAN, (d) SCA-GAN.

Limitations: It is inspiring that both CA-GAN and SCA-GAN show outstanding generalization ability in the sketch synthesis task. However, as shown in Fig. 8, the proposed method cannot handle black margins well, and yields ink marks in the corresponding areas. This might be caused by the fact that there are few black margins in the CUHK dataset, so the generator learns little about how to process them. In addition, the synthesized photos in the cross-dataset experiment are unsatisfactory. This might be due to the great divergence between the input sketches in terms of textures and styles. It is necessary to further improve the generalization ability of the photo synthesis models.

E. Discussions on the Network Configurations

1) Ablation Study: There are mainly three components in CA-GAN, i.e., (i) using face labels in G; (ii) using face labels in D; and (iii) the compositional loss. To illustrate the contribution of each component, we accordingly evaluate the performance of the following settings: cGAN, cGAN+i, cGAN+ii, cGAN+iii, CA-GAN (i.e., cGAN+i+ii+iii), and SCA-GAN. We separately conduct the photo synthesis and sketch synthesis experiments on the CUHK database and the
CUFSF database. We randomly split each database into two parts: 80% for training and the rest for testing, with no overlap between the training set and the testing set. We show the corresponding PSNR, FSIM, and recognition accuracy (Acc.) of the synthesized sketches and photos in Table V and Table VI, respectively. There is no distinct difference between the different settings. Fig. 12 illustrates the synthesized sketches and photos. Compared to (b), (c)-(e) express fewer deformations and sharper margins in the areas of the nose, mouth, and eyes. In other words, all three proposed components improve the quality of the generated sketches.

Fig. 11. Synthesized photos of Chinese celebrities. (a) Sketch, (b) cGAN, (c) CA-GAN, (d) SCA-GAN.

2) Stability of the Training Procedure: We discover that our proposed approaches considerably stabilize the training procedure of the network. Fig. 13 shows the (smoothed) training loss curves of cGAN [4], CA-GAN, and SCA-GAN on the CUHK database. Specifically, (a) and (b) show the reconstruction error (global L1 loss) and the adversarial loss in the sketch synthesis task; (c) and (d) show the reconstruction error and the adversarial loss in the photo synthesis task, respectively. For clarity, we smooth the initial loss curves by averaging adjacent 40 loss values. Obviously, there are large impulses in the adversarial loss of cGAN. In contrast, the corresponding curves of CA-GAN and SCA-GAN are much smoother. The reconstruction errors of both CA-GAN and SCA-GAN are smaller than that of cGAN. Besides, SCA-GAN achieves the smallest reconstruction errors and the smoothest loss curves. This observation explains why stacked generators are capable of refining the generation performance [44].

V. CONCLUSION

In this paper, we propose a novel composition-aided generative adversarial network for face photo-sketch synthesis. Our approach produces high-quality face photos and sketches over a wide range of challenging data. We hope that the presented approach can support other image generation problems. Besides, it is essential to develop models that can handle photos/sketches with great variations in head poses, lighting conditions, and styles. Finally, exciting work remains to be done to qualitatively evaluate the quality of the synthesized sketches and photos.

REFERENCES

[1] C. Peng, X. Gao, N. Wang, D. Tao, X. Li, and J. Li, "Multiple representations-based face sketch-photo synthesis," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 11.
[2] L. Zhang, L. Lin, X. Wu, S. Ding, and L. Zhang, "End-to-end photo-sketch generation via fully convolutional representation learning," in Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015.
[3] N. Wang, X. Gao, and J. Li, "Random sampling for fast face sketch synthesis," Pattern Recognition.
[4] P. Isola, J. Zhu, T. Zhou, and A. Efros, "Image-to-image translation with conditional adversarial networks," arXiv preprint.
[5] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in International Conference on Neural Information Processing Systems, 2014.
[6] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas, "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks," Tech. Rep.
[7] J. Johnson, A. Alahi, and F. F. Li, "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision, 2016.
[8] N. Wang, W. Zha, J. Li, and X. Gao, "Back projection: An effective postprocessing method for GAN-based face sketch synthesis," Pattern Recognition Letters, pp. 1-7.
[9] S. Liu, J. Yang, C. Huang, and M. H. Yang, "Multi-objective convolutional learning for face labeling," in Computer Vision and Pattern Recognition, 2015.
[10] N. Wang, M. Zhu, J. Li, B. Song, and Z. Li, "Data-driven vs. model-driven: Fast face sketch synthesis," Neurocomputing.
[11] Y. Song, J. Zhang, L. Bao, and Q. Yang, "Fast preprocessing for robust face sketch synthesis," in Proceedings of the International Joint Conference on Artificial Intelligence, 2017.
[12] Y. Song, L. Bao, S. He, Q. Yang, and M. H. Yang, "Stylizing face images via multiple exemplars," Computer Vision and Image Understanding.
[13] X. Gao, N. Wang, D. Tao, and X. Li, "Face sketch-photo synthesis and retrieval using sparse representation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 8.
[14] Y. Song, L. Bao, Q. Yang, and M. H. Yang, "Real-time exemplar-based face sketch synthesis," in European Conference on Computer Vision, 2014.
[15] Q. Pan, Y. Liang, L. Zhang, and S. Wang, "Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis," in Computer Vision and Pattern Recognition, 2012.
[16] X. Wang and X. Tang, "Face photo-sketch synthesis and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11.
[17] S. Zhang, X. Gao, N. Wang, J. Li, and M. Zhang, "Face sketch synthesis via sparse representation-based greedy search," IEEE Transactions on Image Processing, vol. 24, no. 8.
[18] S. Zhang, X. Gao, N. Wang, and J. Li, "Robust face sketch style synthesis," IEEE Transactions on Image Processing, vol. 25, no. 1.
[19] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, "Transductive face sketch-photo synthesis," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 9, 2013.

Fig. 12. Illustration of synthesized face sketches and photos under different configurations. (a) Input, (b) cGAN, (c) cGAN with face labels in G (cGAN+i), (d) cGAN with face labels in D (cGAN+ii), (e) cGAN with the compositional loss (cGAN+iii), (f) CA-GAN, (g) SCA-GAN, (h) ground truth.

TABLE V. Average PSNR, FSIM, and face recognition accuracy (Acc.) of the synthesized sketches on the CUHK and CUFSF databases, under the settings cGAN, cGAN+i, cGAN+ii, cGAN+iii, CA-GAN, and SCA-GAN. (i) Using face labels in G; (ii) using face labels in D; and (iii) the compositional loss.

TABLE VI. Average PSNR, FSIM, and face recognition accuracy (Acc.) of the synthesized photos on the CUHK and CUFSF databases, under the settings cGAN, cGAN+i, cGAN+ii, cGAN+iii, CA-GAN, and SCA-GAN. (i) Using face labels in G; (ii) using face labels in D; and (iii) the compositional loss.

Fig. 13. Training loss curves of cGAN, CA-GAN, and SCA-GAN on the CUHK database. (a) Reconstruction error in the sketch synthesis task, (b) adversarial loss in the sketch synthesis task, (c) reconstruction error in the photo synthesis task, and (d) adversarial loss in the photo synthesis task.
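The smoothing applied to the curves in Fig. 13 (averaging adjacent 40 loss values) is a plain moving average. A minimal sketch, assuming a simple "valid"-mode window so each output point averages exactly 40 recorded losses:

```python
import numpy as np

def smooth(values, window=40):
    """Moving average over `window` adjacent loss values ('valid' mode)."""
    v = np.asarray(values, dtype=np.float64)
    kernel = np.ones(window) / window  # uniform averaging weights
    return np.convolve(v, kernel, mode="valid")

# Toy check: a constant loss sequence is unchanged by averaging,
# and 'valid' mode yields len(values) - window + 1 output points.
curve = [2.5] * 100
s = smooth(curve)
print(len(s), float(s[0]))  # → 61 2.5
```

Such smoothing suppresses per-iteration noise so that the relative stability of the cGAN, CA-GAN, and SCA-GAN training curves can be compared visually.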


More information

A Novel Multi-Frame Color Images Super-Resolution Framework based on Deep Convolutional Neural Network. Zhe Li, Shu Li, Jianmin Wang and Hongyang Wang

A Novel Multi-Frame Color Images Super-Resolution Framework based on Deep Convolutional Neural Network. Zhe Li, Shu Li, Jianmin Wang and Hongyang Wang 5th International Conference on Measurement, Instrumentation and Automation (ICMIA 2016) A Novel Multi-Frame Color Images Super-Resolution Framewor based on Deep Convolutional Neural Networ Zhe Li, Shu

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

A Novel Technique for Sketch to Photo Synthesis

A Novel Technique for Sketch to Photo Synthesis A Novel Technique for Sketch to Photo Synthesis Pulak Purkait, Bhabatosh Chanda (a) and Shrikant Kulkarni (b) (a) Indian Statistical Institute, Kolkata (b) National Institute of Technology Karnataka, Surathkal

More information

S+U Learning through ANs - Pranjit Kalita

S+U Learning through ANs - Pranjit Kalita S+U Learning through ANs - Pranjit Kalita - (from paper) Learning from Simulated and Unsupervised Images through Adversarial Training - Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Unsupervised Learning

Unsupervised Learning Deep Learning for Graphics Unsupervised Learning Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL Timetable Niloy

More information

Intensity-Depth Face Alignment Using Cascade Shape Regression

Intensity-Depth Face Alignment Using Cascade Shape Regression Intensity-Depth Face Alignment Using Cascade Shape Regression Yang Cao 1 and Bao-Liang Lu 1,2 1 Center for Brain-like Computing and Machine Intelligence Department of Computer Science and Engineering Shanghai

More information

MULTI-POSE FACE HALLUCINATION VIA NEIGHBOR EMBEDDING FOR FACIAL COMPONENTS. Yanghao Li, Jiaying Liu, Wenhan Yang, Zongming Guo

MULTI-POSE FACE HALLUCINATION VIA NEIGHBOR EMBEDDING FOR FACIAL COMPONENTS. Yanghao Li, Jiaying Liu, Wenhan Yang, Zongming Guo MULTI-POSE FACE HALLUCINATION VIA NEIGHBOR EMBEDDING FOR FACIAL COMPONENTS Yanghao Li, Jiaying Liu, Wenhan Yang, Zongg Guo Institute of Computer Science and Technology, Peking University, Beijing, P.R.China,

More information

arxiv: v1 [cs.cv] 6 Sep 2018

arxiv: v1 [cs.cv] 6 Sep 2018 arxiv:1809.01890v1 [cs.cv] 6 Sep 2018 Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto

More information

2.1 Optimized Importance Map

2.1 Optimized Importance Map 3rd International Conference on Multimedia Technology(ICMT 2013) Improved Image Resizing using Seam Carving and scaling Yan Zhang 1, Jonathan Z. Sun, Jingliang Peng Abstract. Seam Carving, the popular

More information

Face Transfer with Generative Adversarial Network

Face Transfer with Generative Adversarial Network Face Transfer with Generative Adversarial Network Runze Xu, Zhiming Zhou, Weinan Zhang, Yong Yu Shanghai Jiao Tong University We explore the impact of discriminators with different receptive field sizes

More information

arxiv: v1 [cs.cv] 16 Jul 2017

arxiv: v1 [cs.cv] 16 Jul 2017 enerative adversarial network based on resnet for conditional image restoration Paper: jc*-**-**-****: enerative Adversarial Network based on Resnet for Conditional Image Restoration Meng Wang, Huafeng

More information

Color Local Texture Features Based Face Recognition

Color Local Texture Features Based Face Recognition Color Local Texture Features Based Face Recognition Priyanka V. Bankar Department of Electronics and Communication Engineering SKN Sinhgad College of Engineering, Korti, Pandharpur, Maharashtra, India

More information

Alternatives to Direct Supervision

Alternatives to Direct Supervision CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of

More information

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material Yi Li 1, Gu Wang 1, Xiangyang Ji 1, Yu Xiang 2, and Dieter Fox 2 1 Tsinghua University, BNRist 2 University of Washington

More information

arxiv: v2 [cs.cv] 14 Jul 2018

arxiv: v2 [cs.cv] 14 Jul 2018 Constrained Neural Style Transfer for Decorated Logo Generation arxiv:1803.00686v2 [cs.cv] 14 Jul 2018 Gantugs Atarsaikhan, Brian Kenji Iwana, Seiichi Uchida Graduate School of Information Science and

More information

SYNTHESIS OF IMAGES BY TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS. Qiang Huang, Philip J.B. Jackson, Mark D. Plumbley, Wenwu Wang

SYNTHESIS OF IMAGES BY TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS. Qiang Huang, Philip J.B. Jackson, Mark D. Plumbley, Wenwu Wang SYNTHESIS OF IMAGES BY TWO-STAGE GENERATIVE ADVERSARIAL NETWORKS Qiang Huang, Philip J.B. Jackson, Mark D. Plumbley, Wenwu Wang Centre for Vision, Speech and Signal Processing University of Surrey, Guildford,

More information

Improving Image Segmentation Quality Via Graph Theory

Improving Image Segmentation Quality Via Graph Theory International Symposium on Computers & Informatics (ISCI 05) Improving Image Segmentation Quality Via Graph Theory Xiangxiang Li, Songhao Zhu School of Automatic, Nanjing University of Post and Telecommunications,

More information

Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid

Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid Robust Face Sketch Synthesis via Generative Adversarial Fusion of Priors and Parametric Sigmoid Shengchuan Zhang 1,2, Rongrong Ji 1,2, Jie Hu 1,2, Yue Gao 3, Chia-Wen Lin 4 1 Fujian Key Laboratory of Sensing

More information

arxiv: v1 [cs.cv] 17 Nov 2016

arxiv: v1 [cs.cv] 17 Nov 2016 Inverting The Generator Of A Generative Adversarial Network arxiv:1611.05644v1 [cs.cv] 17 Nov 2016 Antonia Creswell BICV Group Bioengineering Imperial College London ac2211@ic.ac.uk Abstract Anil Anthony

More information

3D model classification using convolutional neural network

3D model classification using convolutional neural network 3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing

More information

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of

More information

Semi-supervised Data Representation via Affinity Graph Learning

Semi-supervised Data Representation via Affinity Graph Learning 1 Semi-supervised Data Representation via Affinity Graph Learning Weiya Ren 1 1 College of Information System and Management, National University of Defense Technology, Changsha, Hunan, P.R China, 410073

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

Image Restoration with Deep Generative Models

Image Restoration with Deep Generative Models Image Restoration with Deep Generative Models Raymond A. Yeh *, Teck-Yian Lim *, Chen Chen, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do Department of Electrical and Computer Engineering, University

More information

Sparse Shape Registration for Occluded Facial Feature Localization

Sparse Shape Registration for Occluded Facial Feature Localization Shape Registration for Occluded Facial Feature Localization Fei Yang, Junzhou Huang and Dimitris Metaxas Abstract This paper proposes a sparsity driven shape registration method for occluded facial feature

More information

arxiv: v1 [cs.cv] 19 Apr 2017

arxiv: v1 [cs.cv] 19 Apr 2017 Generative Face Completion Yijun Li 1, Sifei Liu 1, Jimei Yang 2, and Ming-Hsuan Yang 1 1 University of California, Merced 2 Adobe Research {yli62,sliu32,mhyang}@ucmerced.edu jimyang@adobe.com arxiv:1704.05838v1

More information

CS231N Project Final Report - Fast Mixed Style Transfer

CS231N Project Final Report - Fast Mixed Style Transfer CS231N Project Final Report - Fast Mixed Style Transfer Xueyuan Mei Stanford University Computer Science xmei9@stanford.edu Fabian Chan Stanford University Computer Science fabianc@stanford.edu Tianchang

More information

Introduction to Generative Adversarial Networks

Introduction to Generative Adversarial Networks Introduction to Generative Adversarial Networks Luke de Oliveira Vai Technologies Lawrence Berkeley National Laboratory @lukede0 @lukedeo lukedeo@vaitech.io https://ldo.io 1 Outline Why Generative Modeling?

More information

Texture Sensitive Image Inpainting after Object Morphing

Texture Sensitive Image Inpainting after Object Morphing Texture Sensitive Image Inpainting after Object Morphing Yin Chieh Liu and Yi-Leh Wu Department of Computer Science and Information Engineering National Taiwan University of Science and Technology, Taiwan

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Single Image Super Resolution of Textures via CNNs. Andrew Palmer

Single Image Super Resolution of Textures via CNNs. Andrew Palmer Single Image Super Resolution of Textures via CNNs Andrew Palmer What is Super Resolution (SR)? Simple: Obtain one or more high-resolution images from one or more low-resolution ones Many, many applications

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2016) Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian Hebei Engineering and

More information

arxiv: v1 [cs.ne] 11 Jun 2018

arxiv: v1 [cs.ne] 11 Jun 2018 Generative Adversarial Network Architectures For Image Synthesis Using Capsule Networks arxiv:1806.03796v1 [cs.ne] 11 Jun 2018 Yash Upadhyay University of Minnesota, Twin Cities Minneapolis, MN, 55414

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection

Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection Zhen Qin (University of California, Riverside) Peter van Beek & Xu Chen (SHARP Labs of America, Camas, WA) 2015/8/30

More information

Face Alignment Under Various Poses and Expressions

Face Alignment Under Various Poses and Expressions Face Alignment Under Various Poses and Expressions Shengjun Xin and Haizhou Ai Computer Science and Technology Department, Tsinghua University, Beijing 100084, China ahz@mail.tsinghua.edu.cn Abstract.

More information

arxiv: v1 [cs.cv] 22 Feb 2017

arxiv: v1 [cs.cv] 22 Feb 2017 Synthesising Dynamic Textures using Convolutional Neural Networks arxiv:1702.07006v1 [cs.cv] 22 Feb 2017 Christina M. Funke, 1, 2, 3, Leon A. Gatys, 1, 2, 4, Alexander S. Ecker 1, 2, 5 1, 2, 3, 6 and Matthias

More information

Stacked Denoising Autoencoders for Face Pose Normalization

Stacked Denoising Autoencoders for Face Pose Normalization Stacked Denoising Autoencoders for Face Pose Normalization Yoonseop Kang 1, Kang-Tae Lee 2,JihyunEun 2, Sung Eun Park 2 and Seungjin Choi 1 1 Department of Computer Science and Engineering Pohang University

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information

Deep Manga Colorization with Color Style Extraction by Conditional Adversarially Learned Inference

Deep Manga Colorization with Color Style Extraction by Conditional Adversarially Learned Inference Information Engineering Express International Institute of Applied Informatics 2017, Vol.3, No.4, P.55-66 Deep Manga Colorization with Color Style Extraction by Conditional Adversarially Learned Inference

More information

arxiv: v1 [eess.sp] 23 Oct 2018

arxiv: v1 [eess.sp] 23 Oct 2018 Reproducing AmbientGAN: Generative models from lossy measurements arxiv:1810.10108v1 [eess.sp] 23 Oct 2018 Mehdi Ahmadi Polytechnique Montreal mehdi.ahmadi@polymtl.ca Mostafa Abdelnaim University de Montreal

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

Facial Animation System Design based on Image Processing DU Xueyan1, a

Facial Animation System Design based on Image Processing DU Xueyan1, a 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 206) Facial Animation System Design based on Image Processing DU Xueyan, a Foreign Language School, Wuhan Polytechnic,

More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

Leaf Image Recognition Based on Wavelet and Fractal Dimension

Leaf Image Recognition Based on Wavelet and Fractal Dimension Journal of Computational Information Systems 11: 1 (2015) 141 148 Available at http://www.jofcis.com Leaf Image Recognition Based on Wavelet and Fractal Dimension Haiyan ZHANG, Xingke TAO School of Information,

More information