Advanced Video Analysis & Imaging

Advanced Video Analysis & Imaging (5LSH0), Module 09B Machine Learning with Convolutional Neural Networks (CNNs) - Workout Farhad G. Zanjani, Clint Sebastian, Egor Bondarev, Peter H.N. de With ( p.h.n.de.with@tue.nl ) Program a Neural Network Define a neural network Define network architecture: operations and layers (e.g. fullyconnected? convolution? recurrent?) Define the data I/O: read what data from where? Define a loss function/optimization objective: L2 loss? Softmax?... Define an optimization algorithm: SGD? Momentum SGD? etc. Auto-differential Libraries will then take over Dataflow graph. Current auto-differential libraries 3 Outline Program a Neural Network Define a neural network Auto-differential libraries Dataflow graph. Current auto-differential libraries Case studies Image classification Image segmentation Object detection Summary Program a Neural Network Define a neural network Neural network has modular design. A composition of computational modules (convolution, activation function, pooling, LSTM, ) Adapt the architecture to specific needs, change connectivity patterns, attach specialized layers and etc. Most of publications in the field, concentrate on designing a new architecture (In case studies section, some well-known architectures for image analysis will be introduced 2 4 1

Program a Neural Network (recap)
The same four design steps are revisited, now focusing on the data I/O and the loss function (the loss options also include a ranking loss besides L2 and softmax).

Define the data I/O: read what data from where?
- Data input: using the whole input image frame, cropping patches, multiple frames (e.g. video), ...
- Data sampling strategy: random, candidate selection, hard mining, ...
- Data augmentation: spatial transformation, spatial distortion, color distortion, fancy PCA, ... (a small sketch follows below)
- Data output: what is the objective? Classification, regression, clustering, ...
- Using a single label per image or labeled bags (multiple-instance learning)
- Using only unlabeled data (unsupervised learning) or unlabeled data along with labeled data (semi-supervised learning)
- Using fine-annotated data along with coarsely annotated instances (weakly supervised learning)

Define a loss function/optimization objective: L2 loss? Softmax? ...
- A supervised learning problem measures the compatibility between a prediction (e.g. the class scores in classification) and the ground-truth label.
- The data loss takes the form of an average over the losses of the individual examples. Total loss: $L = \frac{1}{N} \sum_{i=1}^{N} L_i$, where $N$ is the number of training samples.
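Referring back to the data-augmentation options above, here is a minimal sketch using torchvision transforms (an arbitrary toolkit choice; the crop size, jitter strengths and the dataset path in the comment are illustrative assumptions, not part of the slides).

```python
# Sketch of common data augmentations (spatial transformation, color distortion, horizontal flip).
# Crop size and jitter strengths are illustrative.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),            # spatial transformation: random crop + rescale
    transforms.RandomHorizontalFlip(p=0.5),       # spatial transformation: mirror the image
    transforms.ColorJitter(brightness=0.2,        # color distortion
                           contrast=0.2,
                           saturation=0.2),
    transforms.ToTensor(),                        # convert PIL image to a CHW float tensor
])

# Usage: applied to each image when building a dataset, e.g.
# dataset = torchvision.datasets.ImageFolder("path/to/train", transform=train_transform)
```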

Define a loss function/optimization objective: examples of classification losses
With class scores $s = f(x_i; W)$ and correct class $y_i$:
- SVM (hinge) loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
- Squared hinge loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$
- Cross entropy (softmax) loss: $L_i = -\log\left( e^{s_{y_i}} / \sum_j e^{s_j} \right)$
with total loss $L = \frac{1}{N} \sum_i L_i$.

Define a loss function/optimization objective: examples of regression losses
Sum over the output dimensions $k$, with prediction $f(x_i; W)$ and target $y_i$:
- L1 loss: $L_i = \sum_k | f(x_i; W)_k - y_{i,k} |$
- L2 loss: $L_i = \sum_k ( f(x_i; W)_k - y_{i,k} )^2$
The L2 loss is more sensitive to outliers. Further details: http://cs231n.github.io/neural-networks-2/#losses
A sketch of these losses follows below.

Program a Neural Network (recap)
The remaining design step is the optimization algorithm: SGD? Momentum SGD? etc. Auto-differential libraries will then take over (dataflow graph, current auto-differential libraries).
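Before moving on to the optimization algorithms, here is a NumPy sketch of the loss functions written out above, computed for a single example (the score values at the bottom are arbitrary).

```python
# NumPy sketch of the losses above, for a single example.
import numpy as np

def hinge_loss(scores, y):
    """Multiclass SVM (hinge) loss: sum_j max(0, s_j - s_y + 1) over j != y."""
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0
    return margins.sum()

def squared_hinge_loss(scores, y):
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0
    return (margins ** 2).sum()

def cross_entropy_loss(scores, y):
    """Softmax cross-entropy: -log(e^{s_y} / sum_j e^{s_j}), computed stably."""
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

def l1_loss(pred, target):
    return np.abs(pred - target).sum()

def l2_loss(pred, target):
    return ((pred - target) ** 2).sum()

scores = np.array([2.0, 1.0, -0.5])   # class scores f(x; W), arbitrary example values
print(hinge_loss(scores, y=0), cross_entropy_loss(scores, y=0))
```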

Gradient Descent Optimization Algorithms

Gradient Descent (GD) Variants
Batch GD, Stochastic GD (SGD), Mini-batch GD, SGD+Momentum, Adam, RMSprop, Adagrad, Adadelta, AdaMax, Nadam.
We will not discuss algorithms that are infeasible to compute in practice for high-dimensional data sets, e.g. second-order methods such as Newton's method. Further reading: http://ruder.io/optimizing-gradient-descent/

How much data do we use to compute the gradient?
- Batch GD: the entire training dataset
- Mini-batch GD: a part of the dataset
- Stochastic GD (SGD): only 1 sample

Problems with Gradient Descent (GD)
- Adjusting a proper learning rate can be difficult. If set too small, it leads to painfully slow convergence; if set too large, it can hinder convergence and cause the loss function to fluctuate around the minimum or even diverge.
- Predefined learning-rate schedules. Strategy 1: change the rate according to the number of completed epochs. Strategy 2: change it w.r.t. the objective-function values. Problem: unable to adapt to the data characteristics.
- The same learning rate applies to all parameter updates. If the data is sparse and our features have very different frequencies, we might not want to update all of them to the same extent, but perform a larger update for rarely occurring features.

SGD + Momentum
(Figure: gradient descent without momentum (left) vs. with momentum (right).)
While training, gradient descent often gets stuck at a local minimum, which slows down training. To accelerate training, a fraction of the previous weight-update vector is added to the current one:
$v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta)$, $\theta = \theta - v_t$,
where $v_t$ is the update vector at time step $t$ and $\gamma$ is the momentum term. A sketch of this update follows below.
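A NumPy sketch of the momentum update on a toy quadratic loss; the loss, matrix and hyperparameters are illustrative assumptions, chosen only to show the update rule in action.

```python
# NumPy sketch of SGD with momentum on a toy quadratic loss.
import numpy as np

def grad(theta):
    # gradient of the toy loss L(theta) = 0.5 * theta^T A theta (A is ill-conditioned on purpose)
    A = np.array([[10.0, 0.0], [0.0, 1.0]])
    return A @ theta

theta = np.array([1.0, 1.0])
v = np.zeros_like(theta)
lr, gamma = 0.01, 0.9          # learning rate eta and momentum term gamma

for step in range(100):
    g = grad(theta)
    v = gamma * v + lr * g     # v_t = gamma * v_{t-1} + eta * grad
    theta = theta - v          # theta = theta - v_t

print(theta)                   # approaches the minimum at the origin
```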

Gradient Descent (GD) Variants: visualizations (Alec Radford's animations)
- Noisy moons example: smoothing effect of momentum-based techniques (which also results in overshooting and correction).
- Beale's function example: due to the large initial gradient, velocity-based techniques shoot off and bounce around.
- Long valley example: effect of scaling based on gradient information.
- Saddle point example: notice that SGD and Momentum find it difficult to break the symmetry.

Gradient Descent (GD) Variants: adaptive learning rates
Goal: move quickly in directions with small but consistent gradients; move slowly in directions with big but inconsistent gradients. Solution: adaptively tune the learning rate based on observed behavior (e.g. Adam, RMSprop, ...).

RMSprop: divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight:
$E[g^2]_t = \gamma E[g^2]_{t-1} + (1-\gamma) g_t^2$, $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} g_t$
Typical values: $\gamma = 0.9$ and a small $\epsilon$ (e.g. $10^{-6}$).

Adaptive Moment Estimation (Adam):
$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$ (average of past gradients)
$v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$ (average of past squared gradients)
Bias correction (because $m$ and $v$ are initialized as vectors of 0's, they are consequently biased toward 0): $\hat{m}_t = m_t / (1-\beta_1^t)$, $\hat{v}_t = v_t / (1-\beta_2^t)$
Update: $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t$
Typical values: $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$. A sketch of both updates follows below.

Program a Neural Network (recap)
With the architecture, data I/O, loss and optimizer defined, auto-differential libraries will then take over: dataflow graph, current auto-differential libraries.
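A NumPy sketch of the RMSprop and Adam updates written out above; the default hyperparameters follow the typical values listed, and the toy quadratic loss in the usage example is an illustrative assumption.

```python
# NumPy sketch of the RMSprop and Adam parameter updates.
import numpy as np

def rmsprop_step(theta, g, sq_avg, lr=0.001, gamma=0.9, eps=1e-6):
    """E[g^2]_t = gamma*E[g^2]_{t-1} + (1-gamma)*g^2;  theta -= lr * g / sqrt(E[g^2]_t + eps)."""
    sq_avg = gamma * sq_avg + (1.0 - gamma) * g ** 2
    theta = theta - lr * g / np.sqrt(sq_avg + eps)
    return theta, sq_avg

def adam_step(theta, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: first/second moment estimates with bias correction, then the parameter update."""
    m = beta1 * m + (1.0 - beta1) * g            # average of past gradients
    v = beta2 * v + (1.0 - beta2) * g ** 2       # average of past squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias correction (moments start at 0)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on a toy quadratic loss L(theta) = 0.5 * ||theta||^2 (gradient = theta):
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adam_step(theta, g=theta, m=m, v=v, t=t, lr=0.01)
print(theta)   # approaches the minimum at the origin
```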

A Computational Layer in DL / From Layers to Network
(Figures: a single computational layer, and layers composed into a network.)
Training the neural network involves deriving the gradient of its parameters with a backward pass (next slides): back-propagation through a NN.
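To make the backward pass concrete, here is a hand-written NumPy sketch of the forward and backward computation of a single fully-connected + ReLU layer; the toy objective and the shapes are illustrative assumptions.

```python
# NumPy sketch: forward and backward pass of one fully-connected + ReLU layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))       # input batch (4 samples, 3 features)
W = rng.standard_normal((3, 2))       # layer parameters
b = np.zeros(2)

# Forward: z = xW + b, a = ReLU(z)
z = x @ W + b
a = np.maximum(z, 0.0)
loss = 0.5 * (a ** 2).sum()           # toy objective on the activations

# Backward: chain rule, layer by layer (this is what back-propagation automates)
da = a                                # dLoss/da for the toy objective
dz = da * (z > 0)                     # ReLU gate passes gradient only where z > 0
dW = x.T @ dz                         # dLoss/dW
db = dz.sum(axis=0)                   # dLoss/db
dx = dz @ W.T                         # gradient passed to the previous layer

print(dW.shape, db.shape, dx.shape)   # (3, 2) (2,) (4, 3)
```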

A Layer / A Network as a Dataflow Graph
Given the forward computation flow, gradients can be computed by auto-differentiation: the backward gradient-flow graph is derived automatically from the forward dataflow graph. (Image from the TensorFlow website.)

Gradient Descent via Back-propagation
The computational workflow of deep learning:
- Forward (which we usually also call inference): the forward dataflow over the data and model parameters.
- Backward: derives the gradients (backward gradient flow).
- Apply/update the gradients and repeat.

Auto-differential Libraries
An auto-differential library automatically derives the gradients following the back-propagation rule. A lot of auto-differentiation libraries have been developed: the so-called deep learning toolkits. A small autograd sketch follows below.
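As one example of such a library, a minimal PyTorch autograd sketch: the forward dataflow is recorded, and the backward gradient flow is derived automatically (shapes and the toy loss are illustrative).

```python
# PyTorch sketch: the forward dataflow is recorded, and the backward gradient flow
# is derived automatically by the auto-differentiation library.
import torch

W = torch.randn(3, 2, requires_grad=True)   # model parameters
x = torch.randn(4, 3)                       # data

# Forward dataflow: matmul -> ReLU -> toy loss
z = x @ W
a = torch.relu(z)
loss = 0.5 * (a ** 2).sum()

# Backward gradient flow: gradients w.r.t. all parameters in the graph
loss.backward()
print(W.grad.shape)                         # same shape as W: torch.Size([3, 2])

# Apply/update gradients and repeat (one plain gradient-descent step)
with torch.no_grad():
    W -= 0.01 * W.grad
    W.grad.zero_()
```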

Deep Learning Toolkits
They are adopted differently in different domains (for example, vision vs. NLP). They are also designed differently: symbolic vs. imperative programming.
- Symbolic: write symbols to assemble the network first, evaluate later.
- Imperative: immediate evaluation.

Symbolic
- Good: easy to optimize for developers (e.g. distributed execution, batching, parallelization); more efficient.
- Bad: the way of programming might be counter-intuitive; hard to debug user programs; less flexible, since you need to write symbols before actually doing anything.
Imperative
- Good: more flexible (write one line, evaluate one line); easy to program and easy to debug, because it matches the way we use C++ or Python.
- Bad: less efficient; more difficult to optimize.
A minimal illustration of the two styles follows below.
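A toy Python illustration of the two programming styles; the tiny Var/Add/Mul/Const classes are hypothetical stand-ins for a real symbolic toolkit, written only to show "assemble first, evaluate later" versus "evaluate immediately".

```python
# Toy illustration of imperative vs. symbolic programming styles.

# Imperative style: every line is evaluated immediately (like NumPy or eager frameworks).
a = 2.0
b = 3.0
c = a * b + 1.0          # c already holds the value 7.0
print(c)

# Symbolic style: first assemble a graph of symbols, evaluate later with concrete inputs.
class Var:
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Const:
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Add:
    def __init__(self, x, y): self.x, self.y = x, y
    def eval(self, env): return self.x.eval(env) + self.y.eval(env)

class Mul:
    def __init__(self, x, y): self.x, self.y = x, y
    def eval(self, env): return self.x.eval(env) * self.y.eval(env)

graph = Add(Mul(Var("a"), Var("b")), Const(1.0))   # nothing is computed yet
print(graph.eval({"a": 2.0, "b": 3.0}))            # evaluated only now: 7.0
```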

Deep Learning Toolkits
They are also designed differently: dataflow graphs vs. layer-by-layer construction.

Good and Bad of Dataflow Graphs
Dataflow graphs seem to be the dominant choice for representing deep learning models.
What is good about dataflow graphs:
- Good for static workflows: define once, run for arbitrary batches/data.
- Programming convenience: easy to program once you get used to it.
- Easy to parallelize/batch for a fixed graph.
- Easy to optimize: a lot of off-the-shelf optimization techniques for graphs.
What is bad about dataflow graphs:
- Not good for dynamic workflows: you need to define a graph for every training sample.
- Hard to program dynamic neural networks: how can you define dynamic graphs using a language for static graphs (e.g. LSTM, tree-LSTM)?
- Not easy for debugging.
- Difficult to parallelize/batch across multiple graphs: every graph is different, so there is no natural batching.

Case studies
- Image classification: AlexNet (2012), VGGNet (2013), GoogleNet (2014), Residual Net (2015)
- Image segmentation: Fully Convolutional Networks (2013), SegNet (2015)
- Object detection: RCNN / Fast RCNN / Faster RCNN (2014, 2015)
It is recommended to read the papers for details.

Image Classification
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
- Started in 2010, annual competition
- 1000 categories; 1.2 million images for training, 50K validation images, 150K test images
- Variable-resolution images
- Top-1 and Top-5 error rates. Top-5 error rate: the fraction of images for which the correct label is not among the five labels considered most probable by the model.
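A quick NumPy sketch of the top-5 error-rate definition above; the random scores and labels are only there to exercise the function.

```python
# NumPy sketch of the top-5 error rate: the fraction of images whose correct label
# is not among the five highest-scoring classes.
import numpy as np

def top_k_error(scores, labels, k=5):
    """scores: (num_images, num_classes) class scores; labels: (num_images,) ground truth."""
    topk = np.argsort(-scores, axis=1)[:, :k]          # indices of the k best-scoring classes
    hit = (topk == labels[:, None]).any(axis=1)        # is the correct label among the top k?
    return 1.0 - hit.mean()

scores = np.random.randn(100, 1000)                    # fake scores for 1000 classes
labels = np.random.randint(0, 1000, size=100)
print(top_k_error(scores, labels, k=5))                # about 0.995 for random scores
```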

Image Classification - example I: AlexNet (2012)
- Data augmented with image crops of 224 x 224 and horizontal flips. RGB pixel intensities altered along their principal components. Refer to the paper for more details.
- The authors trained on two GPUs, hence the outputs of the convolutions are split in the architecture diagram (also known as grouped convolutions).
- ReLUs are applied after each convolution and fully connected layer (except the one before the softmax).
- Local Response Normalization (not covered in the course) is added before Max Pooling 1 and 2. It had minimal performance gains and is rarely (if ever) used in practice; refer to the paper to read more.
- Top-5 error rate on the test set: 15.3% vs. 26.2% for the previous best.

Image Classification - example II: VGGNet (2013), Very Deep Convolutional Networks for Large-Scale Image Recognition
- Two variants: VGG16 (16 layers deep) and VGG19 (19 layers deep).
- Uses only 3 x 3 kernels. Stacking multiple 3 x 3 kernels yields a larger receptive field while using fewer parameters: two stacked 3 x 3 kernels give an effective receptive field of 5 x 5, and two 3 x 3 kernels need 18 parameters whereas a single 5 x 5 kernel needs 25.
- Multiple 3 x 3 convolution layers are stacked together to make a deeper network. Each convolution layer and fully connected layer (except the one before the softmax) is followed by a ReLU.
- For architecture details and hyperparameter settings refer to the paper. Top-5 error rate: 6.8%.
A sketch of the 3 x 3 vs. 5 x 5 comparison follows below.
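A PyTorch sketch of the receptive-field/parameter argument above: two stacked 3 x 3 convolutions vs. one 5 x 5 convolution with the same input/output channels. The channel count is an illustrative assumption; biases are omitted so the counts match the per-channel 18-vs-25 figure.

```python
# PyTorch sketch: two stacked 3x3 convolutions vs. one 5x5 convolution.
# Both keep the spatial size, but the stacked version uses fewer parameters
# (18*C^2 vs. 25*C^2 here, matching the 18-vs-25 count per channel).
import torch
import torch.nn as nn

C = 64                                              # illustrative channel count
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)
single_5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked_3x3), count(single_5x5))        # 73728 (=18*64^2) vs. 102400 (=25*64^2)

x = torch.randn(1, C, 32, 32)
print(stacked_3x3(x).shape, single_5x5(x).shape)    # both keep the 32x32 spatial size
```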

Image Classification - example III: GoogleNet (2014), Going Deeper with Convolutions
- Introduced the inception module (left: naive version, right: with dimensionality reduction).
- Naive version: concatenation of the outputs of 1 x 1, 3 x 3, 5 x 5 convolutions and max pooling.
- Inception with dimensionality reduction: 1 x 1 convolutions with a lower number of outputs before the 3 x 3 and 5 x 5 convolutions.
- Before the outputs are concatenated, they are padded so as to maintain the same size (width and height).
- Inception modules are stacked over one another; the network is 22 layers deep, with average pooling before the last fully connected layer and dropout of 0.7 after the average pooling. For architecture details and a clearer view of GoogleNet, refer to the paper.
- Top-5 error rate: 6.67% on the test set. Other variants: Inception V2, V3, V4.

Image Classification - example IV: Residual Net (2015), Deep Residual Learning for Image Recognition
- Introduced residual modules, the building blocks of ResNet.
- Degradation: in very deep networks, both training and test error increase as the network gets deeper (not overfitting or underfitting).
- Solution: add an identity mapping to the output; no extra parameters needed. (Figure: 34-layer plain and residual networks.)
- Residual block (left: residual block, right: residual block with bottleneck), defined as $x_{l+1} = x_l + F(x_l)$, where $x_{l+1}$ is the ($l$+1)-th layer, $x_l$ is the $l$-th layer and $F(\cdot)$ is the applied transformation.
- It is easier to propagate gradients through the shorter (skip) connections.
- The residual block with bottleneck is used in architectures with 50+ layers. Note: Batch Normalization is applied after each convolution.
A residual-block sketch follows below.
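A minimal PyTorch sketch of the basic (non-bottleneck) residual block $x_{l+1} = x_l + F(x_l)$, with Batch Normalization after each convolution as noted above; the channel count and the conv-BN-ReLU ordering are a common simplified arrangement, not a verbatim reproduction of the paper's code.

```python
# PyTorch sketch of a basic residual block: x_{l+1} = x_l + F(x_l),
# with Batch Normalization after each convolution.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))    # F(x): conv -> BN -> ReLU -> conv -> BN
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)                   # identity shortcut: no extra parameters

block = ResidualBlock(64)
x = torch.randn(2, 64, 16, 16)
print(block(x).shape)                               # torch.Size([2, 64, 16, 16])
```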

Image Classification - ResNet results
- The authors were able to train a 1202-layer ResNet without degradation problems, but it overfits.
- Top-5 error rate of 4.49% (single-model ResNet-152).
- Later ablation studies showed that ResNet behaves like an ensemble of shallower networks: residual networks do not use all paths equally for propagating gradients, and the effective paths are much shorter than the network depth.
- Other interesting reads: Wide Residual Networks; Identity Mappings in Deep Residual Networks; Residual Networks of Residual Networks; Aggregated Residual Transformations for Deep Neural Networks; Densely Connected Convolutional Networks (best paper, CVPR 2017); Attention ResNet for image classification; Squeeze-and-Excitation Networks (ILSVRC 2017 winner).

Image Segmentation - example I: Fully Convolutional Network (2013)
- Fully connected layers are replaced with convolution layers.
- The image is up-sampled using transposed convolution (also known as fractionally strided convolution or sometimes as deconvolution). Stacking a few transposed convolution layers with ReLU can learn a non-linear up-sampling.
- Simple up-sampling gives coarser results. Skip connections are added: pooling-layer outputs (after a 1 x 1 convolution) are summed element-wise with the transposed-convolution layer outputs. Adding skip connections results in finer outputs.
- FCN variants: FCN-32s, FCN-16s, FCN-8s. The number denotes the stride taken in the last transposed convolution layer; FCN-8s has a stride of 8 in its last layer.
- Mean IoU: 62.2% on Pascal VOC 2012. The backbone architecture can be replaced with VGGNet, GoogleNet, ResNet etc., with transposed convolutions stacked on top.
An up-sampling and skip-connection sketch follows below.
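A simplified PyTorch sketch of the FCN idea above: a 1 x 1 convolution on an earlier pooling-layer feature map, summed element-wise with a transposed-convolution output, followed by a stride-8 up-sampling. The channel counts, spatial sizes and kernel choices are illustrative assumptions, not the exact FCN configuration.

```python
# PyTorch sketch of an FCN-style skip connection with learned up-sampling.
import torch
import torch.nn as nn

num_classes = 21
deep_scores = torch.randn(1, num_classes, 8, 8)          # coarse class scores from the deepest layer
pool_features = torch.randn(1, 512, 16, 16)              # feature map from an earlier pooling layer

upsample_2x = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4, stride=2, padding=1)
score_pool = nn.Conv2d(512, num_classes, kernel_size=1)  # 1x1 convolution to class scores

fused = upsample_2x(deep_scores) + score_pool(pool_features)   # element-wise sum (skip connection)

upsample_8x = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=16, stride=8, padding=4)
output = upsample_8x(fused)                               # final up-sampling with stride 8 (FCN-8s style)
print(fused.shape, output.shape)                          # (1, 21, 16, 16) and (1, 21, 128, 128)
```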

Image Segmentation - example II: SegNet (2015)
- Encoder-decoder structure without any fully connected layers.
- The encoder does not down-sample too aggressively, to limit the loss of spatial resolution: boundary and small-object information is difficult to learn in street-view images.
- Pooling indices are stored on the encoder side and restored on the decoder side. Storing the pooling indices captures boundary information before down-sampling. (A pooling-indices sketch follows after the list below.)
- Results on the CamVid dataset (11 classes from street-view images): SegNet mean IoU 60.1%.

Image Segmentation - other segmentation architectures:
- Learning Deconvolution Network for Semantic Segmentation
- DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Conditional Random Fields as Recurrent Neural Networks
- The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
- Pyramid Scene Parsing Network
- Stacked Deconvolutional Network for Semantic Segmentation
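Returning to SegNet's stored pooling indices, a minimal PyTorch sketch using max pooling with returned indices and max unpooling; the channel count and spatial size are illustrative, and this is a simplified stand-in for the full encoder-decoder.

```python
# PyTorch sketch of SegNet-style pooling: max-pooling indices are stored on the encoder
# side and reused for up-sampling (unpooling) on the decoder side.
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)              # encoder feature map
pooled, indices = pool(x)                   # down-sample and remember where the maxima were
print(pooled.shape)                         # torch.Size([1, 64, 16, 16])

# ... decoder convolutions would process `pooled` here ...

upsampled = unpool(pooled, indices)         # place values back at the remembered positions
print(upsampled.shape)                      # torch.Size([1, 64, 32, 32])
```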

Object Detection - example I: RCNN (Regions with CNN features)
Object detection in three stages:
- Objects are localized through a region proposal method (selective search, edge boxes, etc.). The authors use the selective search algorithm, which generates about 2000 region proposals at test time.
- Features are extracted using a CNN (either from a fully connected layer or from a convolution layer in the deeper part of the network). The CNN is pretrained on ImageNet, the input is warped to a fixed size, and the network is then trained on the new data for classification.
- Linear SVMs are trained on the CNN features. Deeper layers give better results; there is a trade-off with computation time, and the feature vector can also be taken from earlier layers.

Object Detection - other examples: Fast RCNN and Faster RCNN
- The expensive part of RCNN is the region proposal step; Fast RCNN and Faster RCNN try to alleviate this problem. Faster RCNN uses a Region Proposal Network (RPN) to propose regions. (Figure: framework of Fast RCNN.)
- Fast RCNN: the object proposals are applied on the last convolutional feature map. Each object proposal is converted to a fixed size using Region of Interest (RoI) pooling. For example, if a proposal has a size of 10 x 15 pixels and the desired output is 5 x 5 pixels, the pooling window takes a size of 2 x 3. The pooled feature is branched into two outputs: one to a softmax classifier and the other to a bounding-box regressor. The network is trained jointly, with an objective function that combines the classification and regression losses.
- Faster RCNN proposes regions using a Region Proposal Network (RPN) on the last convolution layer. The RPN combined with Fast RCNN gives the Faster RCNN framework. The convolution feature map is shared with the RPN, which reduces the computational complexity. The RPN is a fully convolutional network that is slid over an n x n window of the last convolution-layer feature map. The RoI pooling after the RPN produces a fixed-size output.
A RoI-pooling sketch follows below.
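A simplified PyTorch sketch of the RoI-pooling step described above: crop a proposal from the shared feature map and max-pool it to a fixed output size. Adaptive max pooling is used here as a stand-in for the exact window arithmetic; the feature-map size and proposal coordinates are illustrative assumptions.

```python
# Simplified RoI-pooling sketch: crop a proposal from the feature map and max-pool it
# to a fixed output size, whatever the proposal size.
import torch
import torch.nn.functional as F

feature_map = torch.randn(1, 256, 40, 60)       # shared convolutional feature map
y0, y1, x0, x1 = 5, 15, 20, 35                  # one proposal, in feature-map coordinates

roi = feature_map[:, :, y0:y1, x0:x1]           # 10 x 15 region (as in the example above)
pooled = F.adaptive_max_pool2d(roi, output_size=(5, 5))
print(pooled.shape)                             # torch.Size([1, 256, 5, 5]) -> fixed-size output

# The fixed-size pooled feature is then fed to the two branches:
# a softmax classifier and a bounding-box regressor.
```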

Object Detection - Faster RCNN anchors
- At each position, N anchor boxes with 3 scales and 3 aspect ratios. Anchor boxes are translation invariant. (An anchor-generation sketch appears at the end of this section.)
- The fixed-size output is connected to two branches, for classification and regression. The RPN and the detection network are trained end-to-end. For details on how the loss functions are defined, refer to the paper.

Object Detection - other object detection works:
- You Only Look Once: Unified, Real-Time Object Detection
- SSD: Single Shot MultiBox Detector
- R-FCN: Object Detection via Region-based Fully Convolutional Networks
- Mask R-CNN
- Focal Loss for Dense Object Detection

Object Detection - comparison
                          RCNN      Fast RCNN   Faster RCNN
  Mean Average Precision  66.0 %    66.9 %      66.9 %
  Test time per image     50 sec.   2 sec.      0.2 sec.

Generalization Problem of Neural Networks
Pitfall of neural networks: examples of image misclassification after applying a small distortion, for AlexNet (all incorrectly predicted as ostrich!). (Figure panels: true image, additive noise, false prediction.) Szegedy et al. 2014.
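Returning to the RPN anchors above, a NumPy sketch that generates 9 anchor boxes (3 scales x 3 aspect ratios) centered on every feature-map position; the stride and the scale/ratio values are illustrative assumptions.

```python
# NumPy sketch of RPN-style anchor generation: at every feature-map position,
# 9 anchor boxes (3 scales x 3 aspect ratios) centered on that position.
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride   # anchor center in image coordinates
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)     # same area, different aspect ratio
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)                                   # (feat_h * feat_w * 9, 4) boxes

anchors = generate_anchors(feat_h=4, feat_w=4)
print(anchors.shape)                                           # (144, 4)
```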

Summary
- Deep learning (DL) has a modular design, which makes it possible to represent a network as a dataflow graph.
- Auto-differentiable libraries facilitate the development of DL.
- Advances in image analysis (classification, segmentation, localization, ...) have been greatly accelerated by DL.
- The reliability and stability of DL models are still a controversial issue.