IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS

In-Jae Yu, Do-Guk Kim, Jin-Seok Park, Jong-Uk Hou, Sunghee Choi, and Heung-Kyu Lee
Korea Advanced Institute of Science and Technology, School of Computing
(ijyu, dgkim, jspark, juheo)@mmc.kaist.ac.kr, (sunghee, heunglee)@kaist.ac.kr

ABSTRACT

As computer graphics technology advances, it is becoming increasingly difficult to determine whether a given picture was taken by a camera or produced by computer graphics. In this work, we propose a method that uses a simple convolutional neural network (CNN) to identify photorealistic computer graphics (PRCG). The network is trained to identify the source of image patches. The network without pooling layers achieved 98.2% accuracy, which is 2.1% higher than the result of a conventional object-recognition network. When testing random patches drawn from an image, the accuracy of identifying the whole image reached 98.5%. Furthermore, the method makes it possible to detect regions of an image where a photograph and PRCG have been synthesized.

Index Terms: Digital Forensics, Image Source Identification, Convolutional Neural Networks, Photo-Realistic Computer Graphics

1. INTRODUCTION

Camera and computer graphics technology have been developing simultaneously. Advances in digital camera technology have dramatically improved the performance of the cameras built into mobile phones, and now everyone with a cell phone can shoot high-resolution or full-HD images. At the same time, computer graphics technology keeps developing. With improvements in rendering software, 3D scene modeling has become more sophisticated and rendered results have become more realistic. In addition, with the development of GPU technology, the quality of real-time graphics such as computer games has also improved greatly. It is therefore very easy to obtain a graphic image that looks similar to a real photograph, and its quality can be very high. We call such images photorealistic computer graphics (PRCG). As shown in Fig. 1, current graphics technology can convincingly reproduce the direction of light and the texture of objects.

Fig. 1: Examples of PRCG and photograph images: (a) PRCG, (b) Photograph

Studies have been conducted to distinguish PRCG from photographs. They classify images using machine learning techniques such as the Support Vector Machine (SVM) after calculating statistical properties of the images. Ng et al. [1] used a feature vector consisting of a 33-dimensional power spectrum, 24-dimensional local patch features, and 72-dimensional higher-order wavelet coefficients. Lyu et al. [2] proposed a method using wavelet models of natural images: a 216-dimensional feature vector generated by constructing a QMF pyramid in each color channel of the image was learned through an SVM. Wang et al. [3] proposed 70-dimensional contourlet transform and homomorphic filtering features. Peng et al. [4] proposed 31-dimensional texture and statistical features and achieved over 97% accuracy for both PRCG and photographs using LIBSVM [5]. Although existing PRCG identification methods use different statistical features, there is no clear relationship between them. Features such as third- or fourth-order wavelet coefficients may be appropriate for PRCG detection, but there is insufficient explanation as to why such differences exist.
On the other hand, when extracting features, existing techniques either work on the entire image or split the image, extract a feature from each block, and merge the results into a single feature vector. Depending on the size of the image, it is therefore difficult to determine whether these techniques work well. Moreover, when a photograph and PRCG are synthesized into one image, these methods have difficulty making accurate judgments. In this work, we propose a method using convolutional neural networks (CNNs) to identify PRCG. The network is trained to classify input image patches as PRCG or photograph.

Unlike existing studies, the features required for classification are learned by the network. In addition, since the trained network makes a decision only on a small image patch, not an entire image, it can perform two functions depending on its use. First, based on the network's high accuracy on image patches, it is possible to identify whether an image is PRCG using a small number of patches from the image. Second, it is possible to detect the area where PRCG has been synthesized by testing a very large number of patches on a regular grid. The rest of this paper is organized as follows: the proposed method is described in Section 2, the experiments and discussion are presented in Section 3, and conclusions and future work are given in Section 4.

2. THE PROPOSED METHOD

2.1. Network training process

Fig. 2 describes the training process. In the preprocessing phase, image patches are randomly extracted from the photograph and PRCG datasets. The size of each patch is 32 x 32 x 3. The photograph database consists of images taken with different digital cameras and mobile phones. The PRCG dataset consists of images collected from the web, such as PRCG competition entries and datasets from previous works. The numbers of patches selected from the photograph and PRCG datasets were the same. In the network training phase, a CNN is trained to classify the input patch as photograph or PRCG using the prepared patch data.

Fig. 2: Training process

2.2. PRCG identification

Previous methods used an entire image, or a very large part of it, in both the training and test phases, and therefore made their decision directly on whether the whole image is PRCG. In our method, the trained network judges only a tiny patch of the image. To test an image, we pick image patches randomly from it. The image is determined to be PRCG if the ratio of patches classified as PRCG exceeds 50%. The overall process is described in Fig. 3.

Fig. 3: PRCG identification

2.3. PRCG-photograph synthesized area detection

Using the PRCG-CNN, it is possible to detect the area of an image where PRCG and a photograph have been synthesized. Rather than extracting patches randomly, the patches are extracted at regular intervals and tested with the trained network. Each grid cell is colored white when its patch is classified as PRCG, and black otherwise.

Fig. 4: Synthesized area detection

2.4. Network models used in training

The basic structure of the networks is VGG-net [6]. We replaced dropout [7] with batch normalization [8]. Because the size of our image patches is much smaller than that of the ImageNet dataset [9], we used a structure that reduces the depth and filter sizes of the original VGG-net. We also used a second network model, which consists of only convolutional layers without any pooling layer. The reason for using such a network is that max-pooling is not suitable for learning low-level differences, in contrast to ordinary classification tasks that rely on high-level features; we therefore removed the pooling layers. The two networks are referred to as Type 1 and Type 2, respectively. Fig. 5 shows the detailed structure of each network. The differences are that Type 2 has no pooling layers and that padding is removed from each convolutional layer to reduce the number of training parameters.

Fig. 5: Architecture of the two networks
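To make the Type 2 design concrete, the following is a minimal PyTorch sketch of a conv-only patch classifier in the spirit of Sec. 2.4. It is not the paper's Caffe implementation, and since Fig. 5 is not reproduced here, the layer count and filter widths (32-32-64-64-128) are illustrative assumptions; only the stated properties are preserved: 32 x 32 x 3 input patches, batch normalization in place of dropout, no padding, and no pooling layers.

# Illustrative Type-2-style patch classifier (hypothetical layer sizes;
# the exact configuration of Fig. 5 is not reproduced here).
import torch
import torch.nn as nn

class Type2PatchCNN(nn.Module):
    """Conv-only classifier for 32x32x3 patches: no pooling, no padding,
    and batch normalization instead of dropout, as described in Sec. 2.4."""
    def __init__(self, num_classes=2):
        super().__init__()
        layers, in_ch = [], 3
        # Each unpadded 3x3 convolution shrinks the spatial size by 2
        # (32 -> 30 -> 28 -> ...), so the depth stays modest for 32x32 input.
        for out_ch in (32, 32, 64, 64, 128):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=0),
                       nn.BatchNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # After five unpadded 3x3 convs: 32 - 2*5 = 22, i.e. 22x22 maps.
        self.classifier = nn.Linear(128 * 22 * 22, num_classes)

    def forward(self, x):
        x = self.features(x)       # (N, 128, 22, 22)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # logits for {photograph, PRCG}

# Sanity check on a batch of random 32x32 patches.
net = Type2PatchCNN()
print(net(torch.randn(8, 3, 32, 32)).shape)  # torch.Size([8, 2])

A Type 1 variant would differ only by keeping padding and inserting max-pooling stages between convolution groups, as in a reduced VGG-net.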

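Both inference modes of Secs. 2.2 and 2.3 then reduce to how patches are sampled from the test image. The sketch below reuses the hypothetical Type2PatchCNN above; the 50% threshold and the regular-grid scan follow the text, while the function names, the default patch count, and the grid stride are assumptions for illustration.

import torch

def identify_image(net, image, num_patches=1000, patch=32):
    """Sec. 2.2: an image (float tensor of shape (3, H, W)) is declared
    PRCG if more than 50% of randomly sampled patches are classified as
    PRCG (class index 1 assumed)."""
    net.eval()
    _, h, w = image.shape
    ys = torch.randint(0, h - patch + 1, (num_patches,)).tolist()
    xs = torch.randint(0, w - patch + 1, (num_patches,)).tolist()
    patches = torch.stack([image[:, y:y + patch, x:x + patch]
                           for y, x in zip(ys, xs)])
    with torch.no_grad():
        preds = net(patches).argmax(dim=1)
    return preds.float().mean().item() > 0.5  # True -> PRCG

def detection_map(net, image, patch=32, stride=32):
    """Sec. 2.3: test patches on a regular grid; cells classified as PRCG
    are marked 1 (white), photograph cells 0 (black)."""
    net.eval()
    _, h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    grid = torch.zeros(rows, cols)
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                p = image[:, i * stride:i * stride + patch,
                             j * stride:j * stride + patch]
                grid[i, j] = net(p.unsqueeze(0)).argmax(dim=1).item()
    return grid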
3. EXPERIMENTS AND DISCUSSIONS

The photograph dataset consisted of 1000 JPEG images taken with several digital cameras and cell phones. The PRCG dataset consisted of 1000 JPEG images drawn from the Columbia Image Dataset [10] and from the photorealism competitions conducted monthly by 3D rendering software communities (e.g., 3D Studio Max, Maya, Blender). Recently rendered images have resolutions from HD to full HD, while the images in the Columbia dataset have very low resolution. For each of the photograph and PRCG datasets, the numbers of images used for training and test were 750 and 250, respectively. Image patches were extracted from the datasets as input to the network; the numbers of patches used for training and test were 200,000 and 80,000, respectively. The computer used in the experiments had an i7-6700K 4 GHz CPU and an NVIDIA GeForce GTX 1070 GPU with 8 GB of memory.

3.1. Training result of the network

We used Caffe [11] for the experiments. All learning parameters (e.g., learning rate, batch size) applied to the two networks were the same. Fig. 6(a) shows the performance of the two networks: the test errors fell to 3.81% and 1.78%, respectively. When testing 1000 patches randomly selected from each image (Section 2.2), it was possible to make an accurate decision on 98% of images. The rate of misclassifying patches extracted from PRCG was much lower than that for photographs. Fig. 7(a) shows the result of pasting a photograph-originated object into a PRCG image; the network correctly detects the synthesized region (Fig. 7(b)).

Fig. 6: Training/test results of each network: (a) test error, (b) training loss. There is no significant difference in the error reduction tendency, but Type 2 showed a more stable error reduction, and its training loss decreased much more rapidly.

Fig. 7: The trained network successfully detects the synthesized region of an image: (a) Synthesized image, (b) Detection result

3.2. The effect of eliminating pooling

Fig. 6(b) shows that the training losses of the two networks differ by a factor of 100. Applying patches extracted on a constant grid to the trained networks showed that Type 1 is less robust in edge areas (Fig. 8). This is because passing through a pooling layer loses the association between adjacent pixels. Although pooling improves object-recognition performance by finding hierarchically high-level features and speeds up training, it is not suited to digital forensics.

Fig. 8: Results of testing patches on a constant grid from a photograph and a PRCG image: (a) Photograph, (b) Result of Type 1, (c) Result of Type 2, (d) PRCG, (e) Result of Type 1, (f) Result of Type 2

3.3. Robustness to image processing

Fig. 9 shows the results of the robustness test against image processing on photographs. In the case of PRCG, almost all patches were classified as PRCG before image processing, and this did not change after processing. On the other hand, in the case of photographs, image processing significantly degrades the characteristics of photography, which significantly increases the rate of incorrect network decisions. If the patches classified as PRCG were uniformly distributed throughout the image, the network could still be used to detect synthesized regions, but the distribution is skewed according to the image content. Therefore, the current network is not suitable for detecting synthesized regions if the image has been processed.

Fig. 9: Test of robustness to image processing on a photograph: (a) Photograph, (b) Resized to half, (c) JPEG quality 90, (d) JPEG quality 70
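The robustness experiment of Sec. 3.3 can be outlined in the same way: apply resizing or JPEG recompression to a photograph and re-run the patch-level test. A minimal sketch, assuming Pillow for the image processing and reusing net and identify_image from the sketches above; photo.jpg is a hypothetical input, while the half-size resizing and the quality factors 90 and 70 follow the text.

import io
import numpy as np
import torch
from PIL import Image

def pil_to_tensor(img):
    """Convert a PIL RGB image to a float (3, H, W) tensor in [0, 1]."""
    arr = np.asarray(img, dtype=np.float32) / 255.0  # (H, W, 3)
    return torch.from_numpy(arr).permute(2, 0, 1)

def jpeg_recompress(img, quality):
    """Re-encode a PIL image as JPEG at the given quality factor."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

# Hypothetical robustness check in the spirit of Sec. 3.3.
img = Image.open("photo.jpg").convert("RGB")
variants = {
    "original": img,
    "resized_half": img.resize((img.width // 2, img.height // 2)),
    "jpeg_90": jpeg_recompress(img, 90),
    "jpeg_70": jpeg_recompress(img, 70),
}
for name, v in variants.items():
    is_prcg = identify_image(net, pil_to_tensor(v))  # from earlier sketches
    print(name, "->", "PRCG" if is_prcg else "photograph")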

4. CONCLUSION

In this paper, we proposed a method for identifying PRCG using convolutional neural networks. We trained a network to classify tiny image patches as photograph or PRCG. Both the object-recognition network (Type 1) and the network without pooling (Type 2) showed excellent results, while Type 2 was less sensitive to edge and plain regions. Furthermore, given the high accuracy of the network, it was possible to detect synthesized regions within an image. However, the proposed network was not robust to image processing such as resizing and JPEG compression. For future work, we plan to design a network with high accuracy that is also robust to various kinds of image processing, and to extend this work to other forensic tasks.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2016R1A2B2009595).

5. REFERENCES

[1] Tian-Tsong Ng and Shih-Fu Chang, "Classifying photographic and photorealistic computer graphic images using natural image statistics," 2004.

[2] Siwei Lyu and Hany Farid, "How realistic is photorealistic?," IEEE Transactions on Signal Processing, vol. 53, no. 2, pp. 845-850, 2005.

[3] Xiaofeng Wang, Yong Liu, Bingchao Xu, Lu Li, and Jianru Xue, "A statistical feature based approach to distinguish PRCG from photographs," Computer Vision and Image Understanding, vol. 128, pp. 84-93, 2014.

[4] Fei Peng, Jiao-ting Li, and Min Long, "Identification of natural images and computer-generated graphics based on statistical and textural features," Journal of Forensic Sciences, vol. 60, no. 2, pp. 435-443, 2015.

[5] Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 27, 2011.

[6] Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, "Return of the devil in the details: Delving deep into convolutional nets," arXiv preprint arXiv:1405.3531, 2014.

[7] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.

[8] Sergey Ioffe and Christian Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.

[9] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, 2015.

[10] T.-T. Ng, S.-F. Chang, J. Hsu, and M. Pepeljugoski, "Columbia photographic images and photorealistic computer graphics dataset," Tech. Rep. 205-2004-5, ADVENT, Columbia University, 2004.

[11] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.