Improving Semantic Style Transfer Using Guided Gram Matrices


Chung Nicolas 1,2, Rong Xie 1,2, Li Song 1,2, and Wenjun Zhang 1,2

1 Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
2 Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China

nicolas.chung@insa-lyon.fr, {xierong, song li, zhangwenjun}@sjtu.edu.cn

Abstract. Style transfer is a computer vision task that attempts to transfer the style of an artistic image to a content image. Thanks to advances in Deep Convolutional Neural Networks, exciting style transfer results have been achieved, but traditional algorithms do not fully understand semantic information. Those algorithms are not aware of which regions in the style image have to be transferred to which regions in the content image. A common failure case is style transfer involving landscape images. After stylization, the textures and colors of the land are often found in incoherent places such as the river or the sky. In this work, we investigate semantic style transfer for content images with more than 2 semantic regions. We combine guided Gram matrices with gradient capping and multi-scale representations. Our approach simplifies the parameter tuning problem, improves the style transfer results and is faster than current semantic methods.

Keywords: semantic style transfer, guided Gram matrices, gradient capping, multi-scale representation, deep learning.

1 Introduction

Much research in the past three years has focused on style transfer using Deep Convolutional Neural Networks. Style transfer on images aims to transfer the style of an artistic image to a content image. It has been widely investigated in computer vision and graphics to solve problems such as headshot portraits, weather transfer, texture synthesis and object transfiguration.
In order to produce good results, style transfer algorithms must preserve the content features of the content image while changing important style features such as texture, colors and line strokes. One way to address that problem is to solve an image optimization problem. Gatys et al. achieve astonishing results by considering a 19-layer VGG network pre-trained on image classification [7]. The objective to minimize is composed of a content loss and a style loss. The stylized images are obtained by performing gradient descent on white noise images. Image optimization methods are flexible but computationally expensive. To overcome that problem, Johnson et al. introduce a style transfer network based on model optimization [6]. Stylized images are generated three orders of magnitude faster, but the model loses flexibility: a model has to be trained for each style image.

However, although traditional models (whether based on image optimization or model optimization) can achieve exciting results, more work is needed to exploit semantic information. Traditional models apply the style to the whole image. If the content and style weights are not well chosen, the results are poor (see Fig. 1).

Fig. 1. Style: Rain Princess, Leonid Afremov, 2014. An example of poor style transfer. Traditional methods consider the content image as a whole. These algorithms cannot distinguish the background from the object of interest (the cat here). In this case, too much content information is lost. A better style/content weight ratio needs to be used, and finding it can be long and tedious.

Semantic style transfer aims to spatially control neural style transfer. With semantic style transfer, the user selects which style to apply to each region in the content image. In [13], the authors attempt to mask the gradient using a simple threshold, but they were not completely successful. In [4], the authors try to combine capped gradients with Gram matrix manipulation, but the results are far from perfect. Recently, the authors of [9] introduced guided Gram matrices and showed good results for simple content images.

In this study, we show how simple modifications of the original guided Gram matrices [9] improve the results of semantic style transfer. Our work concentrates on content images with more than 2 semantic regions. We simplify the parameter tuning problem by considering the pixel ratios of the segmentation masks. By doing so, our approach can represent the different styles uniformly.
As a second contribution, we demonstrate that spatial control can be reinforced by combining Gram matrices with gradient capping. During backpropagation, the idea is to update each region in the stylized image by stopping the gradient of the other regions. We also introduce a multi-scale semantic style transfer algorithm. Our model produces qualitatively better results and is faster than current semantic style transfer methods.

The rest of this paper is organized as follows. We summarize existing work in Section 2. We briefly introduce the neural style transfer algorithm in Section 3. We present our proposed model in Section 4. We detail the experiments in Section 5. We show and analyze the results in Section 6. We draw conclusions in Section 7.

2 Related Work

Style transfer. Traditional style transfer methods solve an image optimization problem. Gatys et al. demonstrate that content and style features can be extracted directly from the hidden activation layers of the VGG network [7]. Extracted style features are then used to compute feature correlations known as Gram matrices. Alternative methods replace Gram matrices with a Markov random field (MRF) regularizer [3,10]. MRFs are patch-based approaches that improve the precision of photo-realistic style transfer. In some cases, the colors of the content image should be preserved. The authors of [9] accomplish color preservation by implementing two simple linear methods. In [13], a better per-layer content and style weighting scheme is investigated. The same authors demonstrate that more style layers lead to better results. In [14], a histogram loss is used to improve the stability of style transfer. These methods solve an image optimization problem: the quality of the resulting images is high but generating them is time-consuming. Moreover, these methods do not exploit semantic information.

Fast style transfer. Image optimization models are slow. To speed up style transfer, Johnson et al. propose a model optimization problem [6]. They formulate the style transfer problem as an image transformation task. A feed-forward transformation network is trained. The per-pixel loss functions are replaced by perceptual losses. Compared to image optimization problems, stylized images are generated three orders of magnitude faster. It was found that generated images can be qualitatively improved using instance normalization [15]. Other methods such as [5,12] developed a single network that can represent several styles. Even though these methods are faster than image optimization models, model optimization models cannot perform semantic style transfer.
Those models lack flexibility: a model would need to be trained for each content image and each combination of style images.

Semantic style transfer. The aim of semantic style transfer is to spatially control style transfer. In [9], the authors introduce guided Gram matrices. Unlike full style transfer, a Gram matrix is computed for each region in the content image. The authors obtain good results for content images with 2 semantic regions. In [11], the authors address the computational bottleneck of back-propagation in semantic style transfer with a decoder network. Other works attempt to perform semantic style transfer using patch-based methods [3,17]. All these methods have in common that they consider 1 content image and 1 style image with similar semantic regions. Those methods perform well for portrait style transfer. In this work, we consider 1 content image and n style images. This is harder since each style image does not necessarily have the same semantics as the content image. We demonstrate the efficiency of our method for content images with more than 2 semantic regions.

3 Neural style transfer

In this section, we briefly review the style transfer method proposed by Gatys et al. [7]. Our work solves the same image optimization problem but combines several reconstruction losses. We review them below.

Content and style losses. Gatys et al. generate stylized images by minimizing a content loss and a style loss [7]. We denote by $C$ the content image, $S$ the style image, $I$ the generated image and $F^l$ the activation map of convolutional layer $l$. The content of an image is represented by the activations of one convolutional layer $l_c$. The content loss compares the content representations of $C$ and $I$:

$$L_{content} = \lVert F^{l_c}(C) - F^{l_c}(I) \rVert^2 \tag{1}$$

Statistical features known as Gram matrices represent the style of $I$ [7]. When those statistics are computed for each region $r$ in $I$, they are known as guided Gram matrices [9]. Guided Gram matrices require semantic segmentation masks. Contrary to content features, multiple layers represent the style of an image. Let $G_r^l$ be the guided Gram matrix of the $r$-th region at layer $l$ and $\beta_r$ its associated weighting factor. The style loss is:

$$L_{style} = \sum_r \sum_l \beta_r \lVert G_r^l(S) - G_r^l(I) \rVert^2 \tag{2}$$

Regularizer loss. Johnson et al. [6] show that a regularizer loss improves results for fast style transfer methods. It is a per-pixel loss function that encourages spatial smoothness by reducing distortions and visual artifacts. It is defined as [16]:

$$L_{regularizer} = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2 \right)^{1/2} \tag{3}$$

4 Proposed Model

Our algorithm takes as input a content image (an ordinary photograph), n segmentation masks (each mask is associated with a region in the content image) and n style images. Fig. 2 illustrates our method.

4.1 Loss function

We generate stylized images by jointly minimizing a content, a style and a regularizer loss. Refer to the previous section for the details of each loss. We control the trade-off between content, style and regularizer with the weighting factors $\alpha$, $\beta$ and $\gamma$.
As in [9], we use a 19-layer VGG network pre-trained on image classification to extract the high-level features. The total loss is:

$$L_{total} = \alpha L_{content} + \beta L_{style} + \gamma L_{regularizer} \tag{4}$$
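As a rough illustration of Eqs. 1, 3 and 4, the sketch below evaluates the content loss, the regularizer loss and the weighted total on plain Python lists that stand in for activation maps and images. The function names are hypothetical, the default weights are the values reported in Section 5, and restricting the regularizer sum to pixels that have both a right and a bottom neighbour is an assumption about border handling that the text does not specify.

```python
import math


def content_loss(feat_c, feat_i):
    """Eq. 1: squared L2 distance between two flattened activation maps."""
    return sum((c - i) ** 2 for c, i in zip(feat_c, feat_i))


def regularizer_loss(x):
    """Eq. 3: sum over pixels of the local gradient magnitude.

    x is a 2-D list (rows of pixel values). Assumption: only pixels with
    both a right and a bottom neighbour contribute to the sum.
    """
    height, width = len(x), len(x[0])
    total = 0.0
    for i in range(height - 1):
        for j in range(width - 1):
            dx = x[i][j + 1] - x[i][j]  # horizontal difference
            dy = x[i + 1][j] - x[i][j]  # vertical difference
            total += math.sqrt(dx * dx + dy * dy)
    return total


def total_loss(l_content, l_style, l_regularizer,
               alpha=5e0, beta=5e2, gamma=1e3):
    """Eq. 4: weighted sum of the three losses (weights from Section 5)."""
    return alpha * l_content + beta * l_style + gamma * l_regularizer
```

Raising `gamma` penalizes sharp pixel-to-pixel changes more strongly, which is the knob used later to soften region boundaries.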


Fig. 2. System overview. (a) Step 1: Features extraction. We input the styles, masks and content, and initialize the optimization with the content. We extract content and style features and compute their associated losses. (b) Step 2: Image update. We cap the gradient and update the stylized image. We repeat steps (a) and (b) N times. (c) Step 3: Next scale. We upscale the current stylized image and set it as the initialization for the next scale. We repeat the whole process until the final resolution is reached.

4.2 Improved Guided Gram Matrices

Auto-tuning of weights. To understand our auto-tuning approach, we have to understand how guided Gram matrices are computed. Consider an image $X$ with $R$ regions. Let $G_r^l$ and $T_r^l$ be, respectively, the guided Gram matrix and the guidance channel at layer $l$ for the $r$-th region, $r \in [0, R]$. Let $F_r^l$ be the activation maps masked by $T_r^l$. We denote by $\circ$ the element-wise multiplication operator. $G_r^l$ is the inner product between the $F_r^l$:

$$F_r^l(X) = T_r^l \circ F^l(X) \tag{5a}$$

$$G_r^l(X) = F_r^l(X)^T F_r^l(X) \tag{5b}$$

The main difficulty in obtaining good results comes from parameter tuning. A content image composed of $R$ semantic regions introduces $R$ more parameters. We simplify that problem by tuning $\beta_r$ (defined in Eq. 2) automatically. For a region $r$ in $C$, we compute $\beta_r$ as the ratio between the number of pixels in that region and the total number of pixels:

$$\beta_r = \frac{p_r}{\sum_i p_i} \tag{6}$$

where $p_i$ is the number of pixels of region $i$. Because of masking (Eq. 5a), bigger regions contribute more to the style loss than smaller regions (more pixels).
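Eqs. 5a, 5b and 6 can be sketched in a few lines. This is a minimal illustration on plain Python lists, not the authors' implementation: each feature map is assumed to be a list of channels flattened over spatial positions, each mask is a flat list of 0/1 values of the same length, and all function names are hypothetical.

```python
def masked_features(features, mask):
    """Eq. 5a: multiply every channel element-wise by the guidance channel."""
    return [[m * f for m, f in zip(mask, channel)] for channel in features]


def gram(features):
    """Eq. 5b: inner products between all pairs of (masked) channels."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
            for ci in features]


def guided_gram(features, mask):
    """Guided Gram matrix of one region: mask first, then correlate."""
    return gram(masked_features(features, mask))


def auto_beta(masks):
    """Eq. 6: weight of each region = its pixel count / total pixel count."""
    counts = [sum(1 for v in mask if v > 0) for mask in masks]
    total = sum(counts)
    return [count / total for count in counts]


# Two channels over four positions; the region covers the first two positions.
features = [[1, 2, 3, 4], [0, 1, 0, 1]]
region_gram = guided_gram(features, [1, 1, 0, 0])

# A three-pixel region and a one-pixel region.
betas = auto_beta([[1, 1, 1, 0], [0, 0, 0, 1]])
```

In this toy case the larger region receives the larger weight, which matches the pixel-ratio rule of Eq. 6.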

To address the larger contribution of bigger regions, we tried to normalize the guidance channels such that $\sum_i (T_r^l)_i^2 = 1$ [9], but by doing so we found that the style of the smaller regions was more marked than that of the bigger ones. Our weighting factor counterbalances that phenomenon by assigning smaller weights to smaller regions.

Erosion/Dilation. Recall that style features are represented with several layers. Because neurons in deeper layers have bigger receptive fields, the styles of adjacent regions overlap at the boundaries. The authors of [8] attempt to increase spatial control by using erosion and dilation on the guidance channels. The results are good but the number and the shapes of the considered regions are simple: only two large regions. We found that results for small regions are poor. For small regions, only a few neurons capture style features (see Eq. 5a). After erosion, the remaining neurons are not enough to represent the style well. We propose a method that could improve blending but that we have not implemented yet. Our idea is to define a zone of pixels at the boundaries using morphological filtering. The weights of two adjacent styles are set to 0.5 and 0.5. For n overlapping stylized regions, the weights are set to 1/n. The aim is to smooth the transition between the different stylized regions.

4.3 Gradient Capping

In the previous subsection, we saw that the styles overlap at the boundaries. To address that problem, we combine guided Gram matrices with gradient capping. We ensure that only the desired regions are updated by stopping the propagation of the gradient through the unwanted regions. Consider $X_r^t$ the $r$-th region of the stylized image $X$ at iteration $t$, $T_r$ the associated segmentation mask, $\lambda$ the learning rate, $\nabla$ the gradient and $L_{total}$ the loss function defined in Eq. 4. At each iteration we have:

$$X_r^{t+1} = X_r^t - \lambda \, T_r \circ \nabla_{X^t} L_{total} \tag{7}$$

Note that guided Gram matrices mask style features. To match the dimensions of those features (see Eq. 5a), the masks $T_r$ have to be downsampled with pooling. Gradient capping masks regions in the stylized image itself (the input masks $T_r$ are enough). With gradient capping, the transition between objects is sharp at the boundaries. We solve this problem by increasing the regularization weight $\gamma$ (see Eq. 4). As suggested in [4], we tried to minimize the content, style and regularizer losses independently, but the results were bad. The reason is that the minimum of a sum of functions is not, in general, equal to the sum of the minima of the functions. This property can easily be verified by considering functions that take both positive and negative values, such as the gradient. Thus, we have to jointly minimize the three losses.

4.4 Multi-scale representation

In order to generate high resolution stylized images, multi-scale representation has been used in traditional style transfer methods [7,14]. The first step is to generate a low resolution image. This stylized image is then upscaled and set as the initialization image for the next scale. The process is then repeated until the final resolution is reached. By doing so, details are added at each successive scale and fewer iterations are required at high resolution. We extend this multi-scale representation to semantic style transfer. In addition to content and style images, our algorithm requires the segmentation masks. We upscale images with bilinear interpolation. Refer to Alg. 1 for the details.

Algorithm 1: Multi-scale semantic style transfer. $X^k$ indicates image $X$ at resolution $\alpha_k$, $N$ the number of scales, $\uparrow$ upscaling, $\downarrow$ downscaling.
  input : Content image $C$, style images $S$, content masks $T_c$, style masks $T_s$
  output: Stylized image $I^k$ at scale $\alpha_k$
  begin
    /* Initialization with the smallest scale */
    $C^0, S^0, T_c^0, T_s^0 \leftarrow C, S, T_c, T_s \downarrow \alpha_0$
    $I^0 \leftarrow C^0$
    /* Multi-scale representation */
    for $k = 0, \ldots, N-1$ do
      /* Image optimization */
      $I^k \leftarrow$ semantic-style-transfer($C^k, S^k, T_c^k, T_s^k, I^k$)
      /* Initialization next scale */
      $I^{k+1} \leftarrow I^k \uparrow \alpha_{k+1}$
      /* Upscale content image, style images and masks */
      $C^{k+1}, S^{k+1}, T_c^{k+1}, T_s^{k+1} \leftarrow C^k, S^k, T_c^k, T_s^k \uparrow \alpha_{k+1}$

Even though multi-scale representation requires fewer iterations at high resolution, it is not always faster than one-scale representation. There is a preprocessing phase that is not negligible. We empirically found that using 2 scales is a good trade-off between speed and quality of the generated images.

5 Experiment details

The number of iterations was fixed to N = 700. Content features are extracted from layer ReLU4_2. Style features are extracted from layers ReLU1_1, ReLU2_1, ReLU3_1, ReLU4_1 and ReLU5_1. The optimization process is initialized from the content image (cleaner results than white noise initialization). We use the content masks as style masks. The Adam optimizer is used with learning rate 1e+1. The content weight is 5e+0.
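The capped update of Eq. 7 and the coarse-to-fine loop of Alg. 1 can be sketched as follows. This is a simplified illustration under several assumptions: images, masks and gradients are flat Python lists, one gradient list is supplied per region (how each is obtained is left open here), a plain gradient step replaces the Adam optimizer, a single list of masks is used because the content masks also serve as style masks, and `optimize` and `resize` are hypothetical callables provided by the caller. Unlike Alg. 1, which upscales the previous scale, the sketch resizes the inputs from the originals at every scale.

```python
def capped_update(x, region_grads, masks, lr):
    """Eq. 7: each region's gradient is multiplied by that region's mask,
    so a region is never updated by a gradient meant for another region."""
    new_x = list(x)
    for grad, mask in zip(region_grads, masks):
        for p, (g, m) in enumerate(zip(grad, mask)):
            new_x[p] -= lr * m * g
    return new_x


def multi_scale_transfer(content, styles, masks, scales, optimize, resize):
    """Alg. 1 sketch: optimize at each scale, then upscale the result and
    use it as the initialization of the next scale.

    scales   -- increasing resolutions, smallest first
    optimize -- hypothetical callable (content, styles, masks, init) -> image
    resize   -- hypothetical callable (image, scale) -> image
    """
    image = resize(content, scales[0])  # initialize from the content image
    for k, scale in enumerate(scales):
        c_k = resize(content, scale)
        s_k = [resize(s, scale) for s in styles]
        m_k = [resize(m, scale) for m in masks]
        image = optimize(c_k, s_k, m_k, image)
        if k + 1 < len(scales):
            image = resize(image, scales[k + 1])  # init for the next scale
    return image


# Two regions over four pixels, each with its own gradient.
updated = capped_update(
    [1.0, 1.0, 1.0, 1.0],
    region_grads=[[1, 1, 1, 1], [2, 2, 2, 2]],
    masks=[[1, 1, 0, 0], [0, 0, 1, 1]],
    lr=0.5,
)
```

Because the masks do not overlap in this toy case, each pixel moves only along the gradient of its own region.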
The style weight is 5e+2. Unless otherwise mentioned, we use the geometric weighting scheme defined in [13]. The strength of the total variation loss is 1e+3. Our implementation is based on the work of [1]. We plan to make our code publicly available upon acceptance for future research. On two NVIDIA K20c GPUs, it takes around 1 minute to process one frame.

Table 1. Running time comparison, in seconds (s), of different semantic methods. Time is averaged over 20 samples. N = 700 iterations; r: number of semantic regions. A content image was used. Columns: number of regions r, Castillo et al. [2], Gatys et al. [9], Ours, speed-up vs. [2], speed-up vs. [9].

6 Results and discussion

6.1 Proposed vs other semantic methods

In this section, we qualitatively compare targeted style transfer [2], guided Gram matrices [9] and our method. Two semantic regions are considered and the style is applied only to the background. The results are presented in Fig. 3. With [2], the foreground is the same as in the content image. This method applies style transfer to the whole content image and then segments the stylized regions. This method is simple but it cannot control which regions in the style images have to be transferred to which regions in the content image. Our method and [9] produce a close imitation of the foreground, but ours blends regions better. In Fig. 3 (a), the background between the arms and the lower body of the person riding the bike should be orange but is somewhat greyish with [9]. With ours, the background stays orange. In Fig. 3 (b), with [9] the semantic style transfer fails around the head and the tail of the horse (because of erosion). With ours it does not. For [2], stylized regions blend naturally but another optimization problem is solved to avoid crude results.

6.2 Multiple semantic regions

One of the main drawbacks of targeted transfer [2] is that the running time increases linearly with the number of semantic regions. When R styles are applied, [2] solves the style transfer problem R times. Our method and [9] solve the problem only once, whatever the value of R. A running time comparison is shown in Tab. 1.
[2] takes 92 s to produce a stylized image with 2 semantic regions and 185 s with 4 semantic regions: the time complexity is linear. With ours and [9], the running time increases by approximately 10 seconds for each additional region. Ours is the fastest, also beating [9]. We show some images generated with our method in Fig. 4.

6.3 One-scale vs Multi-scale

We compare here one-scale and multi-scale methods. We used 700 iterations for both methods. We used the weighting scheme in [1] (better results in this case). Results with two scales are presented in Fig. 5 and Fig. 6. We also report quantitative results with more scales in Tab. 2. Our two-scale algorithm (Fig. 5h) produces visually more pleasing results than the one-scale method (Fig. 5g). Even after 700 iterations, the one-scale method does not capture the style of the soil well: this problem is known as ghosting. Our two-scale method does not show those instabilities.

Loss curves are shown in Fig. 7. Each optimization begins with a preprocessing phase. During this phase, we extract the content features from the content image and the style features from the style images. This phase is longer for higher resolution images (10 s at the lower resolution versus 20 s at the higher one). Increasing the number of iterations is not an efficient way to reduce the style loss. With one scale, the style loss remains almost constant after 330 iterations (50 s). After optimization, both methods present roughly the same content loss (2e+6) and style loss (1e+6). However, care has to be taken when analyzing those values. As can be seen in Fig. 5, the style loss fails to take ghosting instabilities into account. The two-scale method generates images faster than the one-scale one (68 s compared to 83 s). Using more scales is slightly faster but the quality of the generated images is worse (see Tab. 2).

6.4 Ablation study

To study which parts of our model are effective, we perform an ablation study. Results are shown in Fig. 8. Fig. 8a demonstrates that not using regularization results in a noisy image. It can be seen in Fig. 8b that with guided Gram matrices alone, the semantics are not well respected (especially around the shoes). Auto-tuning roughly halves the style loss (Fig. 8c). It also reduces the content loss.
With gradient capping, the semantics are respected (smaller content loss) but the contours are visible. This explains why the style loss is higher (see around the body and ear of the cat in Fig. 8d). Combining Gram matrices, auto-tuning and gradient capping offers a good trade-off between content and style loss (Fig. 8e). We can observe in Fig. 8f that blending at the boundaries can be improved by increasing the regularization weight.

7 Conclusion

In this work, we proposed a semantic style transfer method that can process content images with more than 2 semantic regions. We have shown how simple modifications of the original guided Gram matrices can simplify the parameter tuning problem and improve the style transfer results. This study indicates that gradient capping is an efficient solution for spatial control of neural style transfer. We further proposed a multi-scale algorithm that generates high quality images faster than single-scale methods. In the current implementation, our method requires the segmentation masks of both the style and content images. We are working on a better solution that could perform both segmentation and style transfer at the same time.

Acknowledgment

This work was supported by NSFC and the Shanghai Key Laboratory of Digital Media Processing and Transmissions.

References

1. Athalye, A.: Neural style. github.com/anishathalye/neural-style (2015)
2. Castillo, C., De, S., Han, X., Singh, B., Yadav, A., Goldstein, T.: Son of Zorn's lemma: Targeted style transfer using instance-aware semantic segmentation. arXiv preprint (2017)
3. Champandard, A.J.: Semantic style transfer and turning two-bit doodles into fine artwork. arXiv preprint (2016)
4. Chan, E., Bhargava, R.: Show, divide and neural: Weighted style transfer. CS231n: Convolutional Neural Networks for Visual Recognition, Project Reports (2016)
5. Dumoulin, V., Shlens, J., Kudlur, M.: A learned representation for artistic style. arXiv preprint (2017)
6. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. arXiv preprint (2016)
7. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. arXiv preprint (2015)
8. Gatys, L.A., Ecker, A.S., Bethge, M., Hertzmann, A., Shechtman, E.: Supplementary material: Controlling perceptual factors in neural style transfer. bethgelab.org (2016)
9. Gatys, L.A., Ecker, A.S., Bethge, M., Hertzmann, A., Shechtman, E.: Controlling perceptual factors in neural style transfer. arXiv preprint (2017)
10. Li, C., Wand, M.: Combining Markov random fields and convolutional neural networks for image synthesis. arXiv preprint (2016)
11. Lu, M., Zhao, H.L., Yao, A., Xu, F., Chen, Y., Zhang, L.: Decoder network over lightweight reconstructed feature for fast semantic style transfer. The IEEE International Conference on Computer Vision (ICCV) (2017)
12. Zhang, H., Dana, K.: Multi-style generative network for real-time transfer. arXiv preprint (2017)
13. Novak, R., Nikulin, Y.: Improving the neural algorithm of artistic style. arXiv preprint (2016)
14. Risser, E., Wilmot, P., Barnes, C.: Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv preprint (2017)
15. Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: The missing ingredient for fast stylization. arXiv preprint (2017)
16. Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. arXiv preprint (2017)
17. Zhao, H., Rosin, P.L., Lai, Y.: Automatic semantic style transfer using deep convolutional neural networks and soft masks. arXiv preprint (2017)


Fig. 3. Targeted transfer [2] vs guided Gram matrices [9] vs ours, on (a) Motorbike and (b) Horse. Results obtained with [2] are pleasing, but ours offers more spatial control. Compared to [9], our method gives better blending results. With [9], the background is not consistent in (a) around the motorbike and in (b) around the head and the tail of the horse. Content images were extracted from the DAVIS dataset and resized.

Fig. 4. Our results for (f) 3 and (m) 4 semantic regions. Panels: (a) Mask, (b) Style 1, (c) Style 2, (d) Style 3, (e) Content, (f) 3-style combination, (g) Mask, (h) Style 1, (i) Style 2, (j) Style 3, (k) Style 4, (l) Content, (m) 4-style combination. Numbers in the masks correspond to the number of the style that is applied. Mixing styles with the same predominant colors looks good ((m) yellow/orange). Content images and masks are from the COCO dataset.


Fig. 5. (g) One-scale transfer, (h) Two-scale transfer. Panels: (a) Content, (b) Mask, (c) Style 1, (d) Style 2, (e) Style 3, (f) Style 4, (g) One-scale, (h) Two-scale. We used 700 iterations for both methods. We detail the 2-scale process in Fig. 6. Multi-scaling generates a more pleasing image: with (g), ghosting instabilities can be seen in the soil. The perceptual loss curves are shown in Fig. 7.

Fig. 6. Multi-scale transfer: 500 iterations (it) are performed at low resolution. The resulting image is then upscaled and set as the initialization for the next scale, where we perform 200 more iterations. We report quantitative results for different numbers of scales in Tab. 2.

Table 2. Comparison of content loss, style loss and running time for different numbers of scales. The image used is the one in Fig. 5. The total number of iterations is fixed to N = 700. Values are averaged over 20 samples. For each successive scale, we approximately halve the number of iterations. Using 2 scales offers a good compromise between speed and quality of the generated images. Columns: number of scales, iterations per scale, content loss (1e+6), style loss (1e+6), time (s).

Fig. 7. Perceptual loss curves for 1-scale vs 2-scale representation: (a) content loss and (b) style loss, plotted against time (s), with the preprocessing phases (PP) marked for the one-scale run (PP, 700 it) and the two-scale run (PP, 500 it, PP, 200 it). Visual results are shown in Fig. 5. With the same number of iterations (700 it), our 2-scale algorithm generates the stylized image faster (68 s compared to 83 s). The final content and style losses are roughly the same for both methods, but multi-scaling reduces ghosting instabilities (see Fig. 5).

Fig. 8. Ablation study. The image used is the same one as in Fig. 4e. γ is the regularization weight defined in Eq. 4. We report content / style loss (1e+6) for each image. Zoom in for details, especially around the right side of the shoes.
(a) Gram matrices (γ = 0): 2.79 / 2.42
(b) Gram matrices (γ = 1e+2): 2.79 / 2.43
(c) Auto-tuning + Fig. 8b: 2.24 / 1.32
(d) Gradient capping + Fig. 8b: 2.19 / 2.81
(e) Final (γ = 1e+2): 2.07 / 1.42
(f) Final (γ = 1e+3): 1.92 / 1.39
