Deep Neural Network Hyperparameter Optimization with Genetic Algorithms


EvoDevo: A Genetic Algorithm Framework
Aaron Vose, Jacob Balma, Geert Wenes, and Rangan Sukumar
Cray Inc., October 2017

EvoDevo: Motivation and Description
- Improve time-to-accuracy as well as final accuracy in DNN training; take the trial-and-error out of DNN training.
- Genetic / evolutionary algorithm (GA/EA) framework: a fitness function plus crossover and migration mechanisms evolves local, somewhat isolated pools (demes) of hyperparameters over multiple generations per training epoch.
- Optimizes neural network hyperparameters and topology (a sketch of such a genome follows below):
  - Number of filters and kernel size of convolutional layers.
  - Size of fully-connected layers.
  - Dropout rate and momentum used during training.
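The slides do not show how an individual's genome is encoded; the following is a minimal Python sketch, assuming a simple dictionary of named genes. The gene names follow slides 8 and 10, but the ranges are invented for illustration and are not EvoDevo's actual encoding.

```python
import random

# Hypothetical gene ranges; the names follow slides 8 and 10, the ranges
# are illustrative only and are NOT EvoDevo's actual encoding.
GENE_RANGES = {
    "c1kern":   (3, 7),        # kernel size, first conv layer
    "c1filt":   (8, 64),       # filter count, first conv layer
    "c2kern":   (3, 7),        # kernel size, second conv layer
    "c2filt":   (16, 128),     # filter count, second conv layer
    "fullconn": (128, 1024),   # size of the fully-connected layer
    "dropout":  (0.0, 0.7),    # dropout rate used during training
    "momentum": (1e-4, 1e-3),  # momentum used during training (slide 8 range)
}

def random_individual():
    """Sample one genome uniformly at random from the gene ranges."""
    return {name: (random.randint(lo, hi) if isinstance(lo, int)
                   else random.uniform(lo, hi))
            for name, (lo, hi) in GENE_RANGES.items()}
```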

EvoDevo: Integration
- Built into existing CNNs; supports multiple toolkits (Google's TensorFlow, Microsoft's Cognitive Toolkit, Keras, and others).
- A C wrapper provides a generic interface; multi-node support via MPI.
References: inspired by previous work in theoretical biology at UTK [1]; see also the recent MENNDL work out of ORNL [6].

Datasets: MNIST
- Size: 70,000 images (60,000 train and 10,000 test).
- Resolution: 28×28 greyscale pixels.
- Classes: 10, one for each digit 0-9.
Figure 1: Selected example images from MNIST [4].

Datasets: CIFAR-10
- Size: 60,000 images (50,000 train and 10,000 test).
- Resolution: 32×32 color pixels.
- Classes: 10 (airplane, bird, cat, ...).
Figure 2: Selected example images from CIFAR-10 [3].
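The presentation does not include loader code, but both datasets ship with Keras, so a quick shape check reproduces the sizes quoted on these two slides (assuming a TensorFlow installation with the bundled Keras API):

```python
from tensorflow.keras.datasets import mnist, cifar10

# MNIST: 60,000 train / 10,000 test, 28x28 greyscale
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)

# CIFAR-10: 50,000 train / 10,000 test, 32x32 RGB
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, x_test.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)
```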

NN Architectures: LeNet-5 in TensorFlow
- Model: 7-layer (5 hidden) LeNet [5].
- Toolkit: Google's TensorFlow.
- Language: Python code calling the TF API.
Figure 3: LeNet-5 neural network architecture.
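The authors' TensorFlow code is not reproduced on the slides; as a minimal sketch, a LeNet-style model with the evolved hyperparameters exposed as arguments might look like the following in Keras (defaults match the initial topology on slide 8; the exact layer arrangement is an assumption):

```python
from tensorflow.keras import layers, models

def build_lenet(c1kern=5, c1filt=32, c2kern=5, c2filt=64,
                fullconn=1024, dropout=0.5, num_classes=10):
    """LeNet-style CNN; the keyword arguments are the genes EvoDevo evolves."""
    return models.Sequential([
        layers.Conv2D(c1filt, c1kern, padding="same", activation="relu",
                      input_shape=(28, 28, 1)),
        layers.MaxPooling2D(2),
        layers.Conv2D(c2filt, c2kern, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(fullconn, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])
```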

NN Architectures: ResNet-110 in CNTK
- Model: 110-layer ResNet [2].
- Toolkit: Microsoft's CNTK.
- Language: Configuration script read by CNTK (C++).
Figure 4: ResNet neural network architecture (34-layer version shown for clarity).
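The actual model is defined in a CNTK configuration script, which the slides do not reproduce; purely as an illustration in the same language as the other sketches here, a Keras rendering of the three-stage CIFAR ResNet, with the per-stage filter counts (the cstack*filt genes on slide 10) as the evolved quantities, could look like this:

```python
from tensorflow.keras import Input, layers, models

def residual_block(x, filters, stride=1):
    """Basic two-conv residual block, after He et al. [2]."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if stride != 1 or shortcut.shape[-1] != filters:
        # Project the shortcut when the spatial or channel shape changes.
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    return layers.Activation("relu")(layers.add([y, shortcut]))

def build_cifar_resnet(stack_filters=(16, 32, 64), n=18, num_classes=10):
    """6n+2-layer CIFAR ResNet; n=18 gives ResNet-110.

    stack_filters plays the role of cstack1filt/cstack2filt/cstack3filt.
    """
    inp = Input((32, 32, 3))
    x = layers.Conv2D(stack_filters[0], 3, padding="same",
                      activation="relu")(inp)
    for i, f in enumerate(stack_filters):
        for b in range(n):
            x = residual_block(x, f, stride=2 if (i > 0 and b == 0) else 1)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inp, out)
```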

Results: Time to Accuracy (LeNet-5, MNIST, TensorFlow)
- Model: 7-layer (5 hidden) LeNet.
- Momentum: 1e-4 → 1e-3.
- Topology (c1kern×c1filt, c2kern×c2filt, fullconn): 5×32, 5×64, 1024 → 5×18, 5×32, 512.
- Gain: 70% reduction in training time to a validation accuracy of 99.1%.
Genetic algorithm:
- Fitness function: 5 samples of training time to 99.1% accuracy (see the sketch below).
- Optimization time: 24 hours.
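Slide 13 gives the fitness expression applied to these runtimes; as a short worked sketch, the five timing samples might be reduced to a fitness score as follows (averaging the samples is an assumption, since the slide says only "5 samples"):

```python
import math

def fitness_from_runtimes(runtimes, pop_min, pop_max):
    """Map time-to-99.1%-accuracy into (0, 1]; faster runs are fitter.

    pop_min and pop_max are the fastest and slowest runtimes in the
    current population, matching min and max in the slide-13 formula.
    """
    t = sum(runtimes) / len(runtimes)        # mean of the 5 samples (assumed)
    z = (t - pop_min) / (pop_max - pop_min)  # normalize runtime to [0, 1]
    return math.exp(-z ** 2)                 # fitness = e^(-z^2)
```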

Results: Time to Accuracy
Figure 5: Validation accuracy during training.

Results: Final Accuracy (ResNet-110, CIFAR-10, CNTK)
- Model: 110-layer ResNet.
- Topology (cstack1filt, cstack2filt, cstack3filt): 16, 32, 64 → 32, 15, 128.
- Error: 6.35% initial → 5.91% optimized, a 7% relative reduction in final top-1 classification error.
Genetic algorithm:
- Fitness function: 3 samples of validation accuracy at 2 epochs.
- Optimization time: 24 hours.

Results: Final Accuracy
Figure 6: The best individual's two-epoch validation accuracy improves over successive generations of EvoDevo's evolutionary algorithm.

Genetic Algorithm Life Cycle Details: Typical Parameters
- PARAM_EPOCHS = 31 epochs.
- PARAM_GENERATIONS = 5 generations per epoch.
- PARAM_DEMES = 4 demes (local populations) in a 2×2 grid.
- PARAM_POPULATION_SIZE = 4 demes × 25 to 85 individuals.
Infrastructure: results obtained on 16 Cray XC-50 nodes with NVIDIA P100 GPUs.

Genetic Algorithm Life Cycle Details: Generations and Epochs

g ← 0
P_g ← initial population
while g < PARAM_GENERATIONS:
    ∀ p ∈ P_g: p.runtime ← execute(p)
    ∀ p ∈ P_g: p.fitness ← e^(−((p.runtime − min) / (max − min))²)
    while |P_(g+1)| < PARAM_POPULATION_SIZE:
        p_a ← p ∈ P_g with probability p.fitness / Σ_(q∈P_g) q.fitness
        p_b ← p ∈ P_g with probability p.fitness / Σ_(q∈P_g) q.fitness
        c ← mutate(crossover(p_a, p_b))
        P_(g+1) ← P_(g+1) ∪ {c}
    g ← g + 1
    if MOD(g, PARAM_GENERATIONS / PARAM_EPOCHS) == 0:
        migrate_best_population_member(north, south, east, west)
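A compact, runnable Python rendering of one epoch of this loop follows; execute, crossover, and mutate are caller-supplied stubs, and EvoDevo itself distributes execute across nodes via MPI, which this single-process sketch omits:

```python
import math
import random

GENERATIONS_PER_EPOCH = 5   # PARAM_GENERATIONS on slide 12
POP_SIZE = 25               # individuals per deme (lower bound on slide 12)

def evolve_deme(population, execute, crossover, mutate):
    """Run one epoch (5 generations) of evolution inside a single deme."""
    for _ in range(GENERATIONS_PER_EPOCH):
        # 1. Evaluate: execute() trains the network, returning its runtime.
        for p in population:
            p["runtime"] = execute(p)
        lo = min(p["runtime"] for p in population)
        hi = max(p["runtime"] for p in population)
        for p in population:
            z = (p["runtime"] - lo) / (hi - lo + 1e-12)
            p["fitness"] = math.exp(-z ** 2)
        # 2. Reproduce: fitness-proportional selection of parent pairs.
        weights = [p["fitness"] for p in population]
        nxt = []
        while len(nxt) < POP_SIZE:
            pa, pb = random.choices(population, weights=weights, k=2)
            nxt.append(mutate(crossover(pa, pb)))
        population = nxt
    return population  # the caller then migrates the best member (slide 15)
```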

Genetic Algorithm Life Cycle Details: Crossover
Figure 7: Crossover combines the hyperparameters of two parents to create a new child.

Genetic Algorithm Life Cycle Details: Migration
Figure 8: Migration copies the best individuals to neighboring demes each epoch. (A sketch of this step follows below.)
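A minimal sketch of the migration step on the 2×2 deme grid; the grid topology and copy-the-best policy follow slides 12 and 15, while the list-of-lists data structure and wraparound neighborhood are assumptions:

```python
def migrate(demes):
    """Copy each deme's fittest member into its grid neighbors.

    demes: a 2D list of populations (e.g. the 2x2 grid from slide 12);
    each population is a list of individuals with a "fitness" key.
    """
    rows, cols = len(demes), len(demes[0])
    best = [[max(pop, key=lambda p: p["fitness"]) for pop in row]
            for row in demes]
    for r in range(rows):
        for c in range(cols):
            # North/south/east/west neighbors, wrapping around the grid.
            neighbors = {((r + dr) % rows, (c + dc) % cols)
                         for dr, dc in ((-1, 0), (1, 0), (0, 1), (0, -1))}
            for nr, nc in neighbors - {(r, c)}:
                demes[nr][nc].append(dict(best[r][c]))
```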

Conclusions: Evolution of DNN Topologies and Hyperparameters with EvoDevo
- HPC-scalable solution for exploration of DNN topologies and hyperparameters.
- Simultaneous evolution of hyperparameters and topology widens the search space and maximizes training speed or validation accuracy.
- Supports individuals with distributed training node-sets (via MPI), enabling large data-parallel training tasks; population size scales with machine resources.
- Time-to-accuracy: shown to significantly reduce training time for DNNs by selecting for individuals that reach the target accuracy fastest.
- Final accuracy: shown to improve validation accuracy over a known best topology on CIFAR-10; prunes the search space of topologies when a good starting topology is not known (applies to new datasets, similar to MENNDL).

Future Work
Expand hyperparameter evolution:
- Stride of convolutional and pooling layers.
- Number of convolutional and fully-connected layers.
- Activation function (e.g., logistic, tanh, ReLU).
- Random seed value for better initial weights.
Larger runs:
- Larger datasets such as CIFAR-100 and ImageNet.
- Larger EvoDevo runs on more compute nodes.
New applications:
- Unsupervised learning with Generative Adversarial Networks (GANs).

References
[1] S. Gavrilets and A. Vose. Dynamic patterns of adaptive radiation. Proceedings of the National Academy of Sciences, 102(50):18040-18045, 2005.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[3] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
[4] Y. LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[6] T. E. Potok, C. D. Schuman, S. R. Young, R. M. Patton, F. Spedalieri, J. Liu, K.-T. Yao, G. Rose, and G. Chakma. A study of complex deep learning networks on high performance, neuromorphic, and quantum computers. In Proceedings of the Workshop on Machine Learning in High Performance Computing Environments, pages 47-55. IEEE Press, 2016.