Exemplar-Supported Generative Reproduction for Class Incremental Learning Supplementary Material

HE ET AL.: EXEMPLAR-SUPPORTED GENERATIVE REPRODUCTION

Chen He 1,2 (chen.he@vipl.ict.ac.cn), Ruiping Wang 1,2 (wangruiping@ict.ac.cn), Shiguang Shan 1,2 (sgshan@ict.ac.cn), Xilin Chen 1,2 (xlchen@ict.ac.cn)

1 Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
2 University of Chinese Academy of Sciences, Beijing, 100049, China

© 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

In the supplementary material, we provide the implementation details (Section 4.1) and the preliminary results of the variants of our method that use a more balanced training set, which perform better under the 10-class adding setting on CIFAR-100 (mentioned in Section 4.2).

1 Implementation Details

Exemplar-Supported Generative Reproduction (ESGR). For CIFAR-100 [2], we train a WGAN with gradient penalty (WGAN-GP) [1] for each class (under both the one-class adding and the batch-of-class adding settings). Each WGAN-GP is trained for 10,000 iterations with a base Adam learning rate of 0.001 and a batch size of 64. For ImageNet-Dogs, we add an auxiliary classifier [4] to WGAN-GP, making it an AC-WGAN-GP. Each generator is trained for 20,000 iterations with a base Adam learning rate of 0.0002 and a batch size of 32. The generated images on CIFAR-100 and ImageNet-Dogs are shown in Figure 1 and Figure 3, respectively. All of the generator implementations are based on the code of WGAN-GP [1]. According to the original implementation, the generator should be trained for 200,000 iterations, but the training loss and the inception scores do not change much, and the generated images are good enough, after 10,000 iterations on CIFAR-100 and 20,000 iterations on ImageNet-Dogs, respectively. Thus, we stop early, which saves much training time for ESGR.

iCaRL [5]. For convenience's sake, we use the code released on GitHub by the authors and only change the network architectures to be the same as ours to make fair comparisons. For both LeNet and ResNet, we use the output of the layer before the final fully connected layer as the extracted features, as iCaRL did.

Learning without Forgetting (LwF) [3]. Although LwF is not designed for this task, we still evaluate its performance. We simply refer to the implementation of hybrid1 in the
code released by iCaRL as the implementation of LwF (hybrid1 is an implementation of LwF except that hybrid1 uses sigmoid instead of the softmax used in the original paper). Supposing that we want to add N classes to an M-class classifier and make it an (M+N)-class classifier, the one-hot vectors for the old task and the new task are set to be M-dimensional and N-dimensional, respectively.

Deep generative replay (DGR) [6]. The authors neither offer code nor conduct experiments on CIFAR-100 and ImageNet-Dogs, so we implement DGR ourselves. WGAN-GP is used as the generator, with the same network architecture as that of ESGR. For CIFAR-100, we use a base Adam learning rate of 0.0001 and train for 20,000 iterations in each training session. The generated images are shown in Figure 2. The ratio is set to 0.8 for training the classifier since we find that 0.8 yields better results. For ImageNet-Dogs, we also train the generator for 20,000 iterations in each training session, but the generated images become unsatisfactory from the 8th training session (Figure 4). The reason might be that it is a difficult dataset and DGR cannot handle high-resolution pictures of so many classes. Careful tuning of the hyper-parameters might make DGR perform better, but that is a difficult task beyond the scope of this paper.

Figure 1: The generated images by ESGR on CIFAR-100.

Figure 2: The generated images by DGR on CIFAR-100 after the final training session.

Figure 3: The generated images by ESGR on ImageNet-Dogs. Although most of the generated images are distorted since it is a difficult dataset, for some of them we can still recognize what is in the picture.

Figure 4: The generated images by DGR on ImageNet-Dogs after the 8th training session. From the 8th training session onward, the training is less stable and the generated images become unsatisfactory.

2 Results using a Balanced Version of ESGR

In Section 4.2, we mention that the less satisfactory result of our method under the 10-class adding setting on CIFAR-100 arises because the training set is in essence imbalanced, and we suggest that it can be easily solved by using only generated data for the new classes for
ESGR-gens and a proper combination of some randomly selected real data and generated data for ESGR-mix (please refer to the released code for more details). Below are the results:

Figure 5: Accuracy curves of the original versions of ESGR-gens and ESGR-mix and of the balanced versions under the 10-class adding setting on CIFAR-100. (Plot: accuracy, 0% to 100%, versus number of classes, 10 to 100; curves: ESGR-mix (balanced), ESGR-mix (imbalanced), ESGR-gens (balanced), ESGR-gens (imbalanced).)

From Figure 5, we can see that the balanced version of ESGR-gens has a 1-2% improvement over the imbalanced one in the final evaluation. But when fewer than 70 classes have been seen, the original imbalanced version performs better. This is because there are more real data in the training set of the imbalanced version. An extreme case is when the algorithm sees the first 10 classes: for the imbalanced version the training set is composed of real samples only; for the balanced version the training set is composed of generated samples only. At this point the imbalanced version clearly performs better than the balanced version. But as the number of seen classes increases, the negative influence of the imbalance increases, and the performance of the imbalanced version drops more quickly than that of the balanced version. Note that when adding more classes, say 1,000, the two curves will be almost the same, since the portion of real samples is too small to cause imbalance. The balanced version of ESGR-mix shows similar phenomena, for almost the same reason.

From Table 1, we can see that the forget rate drops considerably when the original version is adapted to a balanced one. It is even lower than that of iCaRL.
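
The composition argument behind Figure 5 can be illustrated with a little arithmetic. The sketch below is not taken from the released code: the function name `real_class_fraction` is hypothetical, and it assumes that in the imbalanced ESGR-gens setup only the newest batch of classes is represented by real samples, with every class contributing the same number of training samples.

```python
def real_class_fraction(seen_classes: int, classes_per_session: int = 10) -> float:
    """Fraction of seen classes backed by real samples in the imbalanced setup.

    Assumption (illustrative only): the newest `classes_per_session` classes
    use real data, all older classes use generated data, and each class
    contributes the same number of samples.
    """
    if classes_per_session <= 0 or seen_classes < classes_per_session:
        raise ValueError("need at least one full session of classes")
    return classes_per_session / seen_classes


if __name__ == "__main__":
    # First session: real samples only. Many classes: real share becomes tiny.
    for seen in (10, 50, 100, 1000):
        print(f"{seen:>5} seen classes: {real_class_fraction(seen):.1%} real")
```

Under these assumptions the real share is 100% for the first 10 classes and about 1% at 1,000 classes, which matches the two extreme cases described in the text.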

Figure 6: Visualization of the confusion matrix of the balanced version of ESGR-gens under the 10-class adding setting on CIFAR-100 (true label versus predicted label, classes 10 to 100). Note that it does not have a distinct bar on the right of the image anymore, implying that a test sample may be predicted as each of the seen classes equally.

Table 1: Results of different methods under the 10-class adding setting on CIFAR-100

Method                    Final    Adaptation   Forget Rate
Joint training            0.4611   0.5437       15.19%
ESGR-gens (imbalanced)    0.3388   0.6393       47.00%
ESGR-gens (balanced)      0.3597   0.4624       22.21%
ESGR-mix (imbalanced)     0.3576   0.6243       42.72%
ESGR-mix (balanced)       0.3803   0.5082       25.17%
iCaRL                     0.356    0.4865       26.82%
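
The three columns of Table 1 appear to be linked by a simple relation. This supplementary material does not define the forget rate, so the formula below, (Adaptation - Final) / Adaptation, is an inference from the reported numbers rather than the authors' stated definition. The names `TABLE_1` and `forget_rate` are hypothetical. A minimal sketch that recomputes the last column from the first two:

```python
# (final, adaptation, reported forget rate in %) copied from Table 1.
TABLE_1 = {
    "Joint training": (0.4611, 0.5437, 15.19),
    "ESGR-gens (imbalanced)": (0.3388, 0.6393, 47.00),
    "ESGR-gens (balanced)": (0.3597, 0.4624, 22.21),
    "ESGR-mix (imbalanced)": (0.3576, 0.6243, 42.72),
    "ESGR-mix (balanced)": (0.3803, 0.5082, 25.17),
    "iCaRL": (0.356, 0.4865, 26.82),
}


def forget_rate(final: float, adaptation: float) -> float:
    """Relative drop from adaptation accuracy to final accuracy.

    Assumed formula, inferred from the table values, not quoted from the paper.
    """
    if adaptation <= 0:
        raise ValueError("adaptation accuracy must be positive")
    return (adaptation - final) / adaptation


if __name__ == "__main__":
    for method, (final, adaptation, reported) in TABLE_1.items():
        computed = 100 * forget_rate(final, adaptation)
        print(f"{method:<24} computed {computed:6.2f}%   reported {reported:6.2f}%")
```

With this assumed formula, every recomputed value agrees with the reported forget rate to two decimal places.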

Figure 7: Visualization of the confusion matrix of the balanced version of ESGR-mix under the 10-class adding setting on CIFAR-100 (true label versus predicted label, classes 10 to 100). Similar to ESGR-gens (Figure 6), a test sample may be predicted as each of the seen classes equally.
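
The "distinct bar on the right" of a confusion matrix can also be checked numerically instead of visually. The sketch below is one possible way to do so and is not part of the authors' code: the function name `predicted_share_per_group` and the two toy 4-class matrices are made up for illustration. It assumes rows are true labels, columns are predicted labels, and classes are ordered by the session in which they were added.

```python
from typing import List, Sequence


def predicted_share_per_group(
    confusion: Sequence[Sequence[int]], group_size: int
) -> List[float]:
    """Share of all predictions that fall into each consecutive class group.

    A share much larger than 1 / (number of groups) for the last group
    corresponds to a bright bar on the right of the confusion matrix.
    """
    n = len(confusion)
    if n == 0 or group_size <= 0 or n % group_size != 0:
        raise ValueError("class count must be a positive multiple of group_size")
    if any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square")
    column_totals = [sum(row[j] for row in confusion) for j in range(n)]
    total = sum(column_totals)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return [
        sum(column_totals[start:start + group_size]) / total
        for start in range(0, n, group_size)
    ]


if __name__ == "__main__":
    # Toy data: 4 classes added in 2 sessions of 2 classes, 10 test samples each.
    biased = [[5, 0, 1, 4], [0, 5, 3, 2], [0, 0, 9, 1], [0, 0, 1, 9]]
    balanced = [[8, 1, 1, 0], [1, 8, 0, 1], [1, 0, 8, 1], [0, 1, 1, 8]]
    print(predicted_share_per_group(biased, group_size=2))    # [0.25, 0.75]
    print(predicted_share_per_group(balanced, group_size=2))  # [0.5, 0.5]
```

For the 10-class adding setting on CIFAR-100, `group_size` would be 10, and roughly equal shares across the ten groups would correspond to the absence of a bar.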

This lower forget rate indicates that the imbalance is indeed the main cause of the forgetting effect. Also, from the visualization of the confusion matrices of ESGR-gens (balanced) and ESGR-mix (balanced) (Figures 6 and 7), it can be easily observed that there is no longer a distinct bar on the right of the confusion matrix, indicating that a sample can be predicted as each of the seen classes equally and that there is no imbalance problem.

References

[1] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5769-5779, 2017.
[2] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
[3] Zhizhong Li and Derek Hoiem. Learning without forgetting. In European Conference on Computer Vision, pages 614-629. Springer, 2016.
[4] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585, 2016.
[5] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5533-5542, 2017.
[6] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pages 2994-3003, 2017.