Face Recognition A Deep Learning Approach

Similar documents
FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Deep Learning for Vision

Deep Learning with Tensorflow AlexNet

Deep Learning and Its Applications

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Convolutional Layer Pooling Layer Fully Connected Layer Regularization

Deep Face Recognition. Nathan Sun

Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Scorelevel Fusion for Face Recognition

Perceptron: This is convolution!

Robust Face Recognition Based on Convolutional Neural Network

Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Structured Prediction using Convolutional Neural Networks

Deep Learning for Computer Vision II

Computer Vision Lecture 16

Understanding Faces. Detection, Recognition, and. Transformation of Faces 12/5/17

Computer Vision Lecture 16

Clustering Lightened Deep Representation for Large Scale Face Identification

Smart Parking System using Deep Learning. Sheece Gardezi Supervised By: Anoop Cherian Peter Strazdins

Lecture 20: Neural Networks for NLP. Zubin Pahuja

An Associate-Predict Model for Face Recognition FIPA Seminar WS 2011/2012

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

An Exploration of Computer Vision Techniques for Bird Species Classification

Convolutional Neural Networks

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Supplementary material for Analyzing Filters Toward Efficient ConvNet

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

Advanced Video Analysis & Imaging

Deep Neural Networks:

A Patch Strategy for Deep Face Recognition

Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet

Computer Vision Lecture 16

arxiv: v1 [cs.cv] 9 Jun 2016

Machine Learning With Python. Bin Chen Nov. 7, 2017 Research Computing Center

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Automatic Detection of Multiple Organs Using Convolutional Neural Networks

Neural Networks with Input Specified Thresholds

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

Real-time Object Detection CS 229 Course Project

SHIV SHAKTI International Journal in Multidisciplinary and Academic Research (SSIJMAR) Vol. 7, No. 2, April 2018 (ISSN )

Face Recognition by Deep Learning - The Imbalance Problem

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Tiny ImageNet Visual Recognition Challenge

Fuzzy Set Theory in Computer Vision: Example 3

DECISION TREES & RANDOM FORESTS X CONVOLUTIONAL NEURAL NETWORKS

Como funciona o Deep Learning

arxiv: v1 [cs.cv] 31 Mar 2017

Channel Locality Block: A Variant of Squeeze-and-Excitation

Vignette: Reimagining the Analog Photo Album

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. By Joa õ Carreira and Andrew Zisserman Presenter: Zhisheng Huang 03/02/2018

Deep neural networks II

Improving Face Recognition by Exploring Local Features with Visual Attention

CNN Basics. Chongruo Wu

Deconvolutions in Convolutional Neural Networks

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets)

POINT CLOUD DEEP LEARNING

Convolution Neural Networks for Chinese Handwriting Recognition

Deep Learning for Face Recognition. Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong

arxiv: v1 [cs.cv] 16 Nov 2015

3D model classification using convolutional neural network

Using Machine Learning for Classification of Cancer Cells

FACE recognition has been one of the most extensively. Robust Face Recognition via Multimodal Deep Face Representation

arxiv: v3 [cs.cv] 18 Jan 2017

arxiv: v1 [cs.cv] 1 Jun 2018

Residual Networks for Tiny ImageNet

Stochastic Function Norm Regularization of DNNs

All You Want To Know About CNNs. Yukun Zhu

Deep Learning Architectures for Face Recognition in Video Surveillance

CSE 559A: Computer Vision

Know your data - many types of networks

Traffic Multiple Target Detection on YOLOv2

Video-Based Face Recognition Using Ensemble of Haar-Like Deep Convolutional Neural Networks

INTRODUCTION TO DEEP LEARNING

Machine Learning 13. week

Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

CS 179 Lecture 16. Logistic Regression & Parallel SGD

INF 5860 Machine learning for image classification. Lecture 11: Visualization Anne Solberg April 4, 2018

Overall Description. Goal: to improve spatial invariance to the input data. Translation, Rotation, Scale, Clutter, Elastic

Object detection with CNNs

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Measuring Aristic Similarity of Paintings

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Is Bigger CNN Better? Samer Hijazi on behalf of IPG CTO Group Embedded Neural Networks Summit (enns2016) San Jose Feb. 9th

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

MoonRiver: Deep Neural Network in C++

Multi-Glance Attention Models For Image Classification

Weighted Convolutional Neural Network. Ensemble.

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Bilevel Sparse Coding

Inception Network Overview. David White CS793

FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE. Chubu University 1200, Matsumoto-cho, Kasugai, AICHI

Spatial Localization and Detection. Lecture 8-1

arxiv: v1 [cs.cv] 14 Jul 2017

Transcription:

Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar

2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison Discussion

3 Articles DeepFace: Closing the Gap to Human-Level Performance in Face Verification Link - https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf FaceNet: A Unified Embedding for Face Recognition and Clustering Link - https://arxiv.org/pdf/1503.03832v3.pdf

4 The Face Recognition Problem 512 Matrix of pixels 512x512=262144 Need to find a face Compare to a database Curse of dimensionality 512

5 Face recognition pipeline (Classical) Detection and Alignment Recognition

6 Face recognition pipeline (Classical) Features handpicked Works well on small datasets Fails on illumination variations and facial expressions Fails on large datasets Detection and Alignment Recognition

7 Face patterns lie on a complex nonlinear and non convex manifold in the high dimensional space.

8 Ok So how can we solve this nonlinear, nonconvex problem?

9 Modern Face Recognition More Data attained by crawling more faces! Google / Facebook etc. Stronger Hardware More powerful statistical models Neural networks / Deep Learning

10 The Network will find the features itself

11 Lets dive into 2 examples DeepFace (2014) FaceNet (2015)

12 DeepFace (Taigman and Wolf 2014) Detection Alignment Deep Learning Representation Classification

13 DeepFace (Taigman and Wolf 2014) Detection Alignment Deep Learning Representation Classification 2D/3D face modeling and alignment using affine transformations 9 layer deep neural network 120 million parameters

14 DeepFace - Alignment (Frontalization) Fiducial points (face landmarks) 2D and 3D affine transformations Frontal face view Detection Alignment Deep Learning Representation Classification

15 DeepFace - Deep Learning Input: 3D aligned 3 channel (RGB) face image 152x152 pixels 9 layer deep neural network architecture Performs soft max for minimizing cross entropy loss Uses SGD, Dropout, ReLU Outputs k-class prediction Detection Alignment Deep Learning Representation Classification

16 DeepFace - Deep Learning Architecture Detection Alignment Deep Learning Representation Classification

17 DeepFace - Deep Learning Layer 1-3 : Intuition Convolution layers - extract low-level features (e.g. simple edges and texture) ReLU after each conv. layer Max-pooling: make convolution network more robust to local translations. Detection Alignment Deep Learning Representation Classification

18 DeepFace - Deep Learning Layer 4-6: Intuition Apply filters to different locations on the map Similar to a conv. layer but spatially dependent Detection Alignment Deep Learning Representation Classification

19 DeepFace - Deep Learning Layer F7 is fully connected and generates 4096d vector Sparse representation of face descriptor 75% of outputs are zero Detection Alignment Deep Learning Representation Classification

20 DeepFace - Deep Learning Layer F8 is fully connected and generates 4030d vector Acts as K class SVM Detection Alignment Deep Learning Representation Classification

21 DeepFace Representation F8 calculates probability with softmax Cross-entropy loss function: L log Computed using SGD and performs backpropagation k p k p exp o / exp o k k h h Detection Alignment Deep Learning Representation Classification

22 DeepFace Training Trained on SFC 4M faces (4030 identities, 800-1200 images per person) We will focus on Labeled Faces in the Wild (LFW) evaluation Used SGD with momentum of 0.9 Learning rate 0.01 with manual decreasing, final rate was 0.0001 Random weight initializing 15 epochs of training 3 days total on a GPU-based engine

23 DeepFace results DF was evaluated on LFW (Labeled Faces in the Wild) dataset 13233 images collected from the web 1680 identities. 0.9735±0.0025 accuracy

24 DeepFace results Experimented with different derivatives of deep architecture Single the network we discussed Ensemble a combination of 3 networks with different inputs Final K class computer by: Kcombined Ksingle Kgradient Kalign2d K x, y x y 2

25 DeepFace deeper is better Experimented with different depths of networks Removed C3, L4, L5 Compared error rate to number of classes K

26 DeepFace Summary Close to human accuracy 120M parameters Proves that going deeper brings better results Computation efficiency 0.33 second per face image @2.2GHz CPU Invariant to pose, illumination, expression and image quality Our work is done

27 Going Deeper with convolutions Szegedy, Liu, Jia, Semanet, Reed, Anguelov, Erhan, Vanhoucke and Rabinovitch CoRR, abs/1409.4842, 2014. 2, 4,5,6,9

28 FaceNet (Schroff and Philbin 2015) Detection Deep Learning Normalization Representation Triplet Loss Classification Alignment Cross- Entropy Loss

29 FaceNet (Schroff and Philbin 2015) Detection Deep Learning Normalization Representation Triplet Loss Classification Deep CNN (22 layers) Works on pure data Embedding (State-Of-The-Art face recognition using only 128 features per face > efficient!) Triplet images for training and loss function Uses SGD, Dropout, ReLU

Deep Learning Seminar 30 FaceNet Deep Learning Detection Deep Learning Normalization Representation Triplet Loss Classification

31 FaceNet Deep Learning 22 layers: 11 convolutions 3 normalizations 4 max-pooling 1 concatenation 3 fully-connected 140 million parameters Detection Deep Learning Normalization Representation Triplet Loss Classification

32 FaceNet Normalization Detection Deep Learning Visualizing and understanding convolutional networks M. D. Zeiler and R. Fergus CoRR, abs/1311.2901, 2013. 2, 3, 4, 6 Normalization Representation Triplet Loss Classification

33 FaceNet Representation Objective is to minimize L2 distance between same face representations Embedding concept: Transform an image to a low dimensional feature space (128 d) Concept known for Natural Language Processing (NLP) Detection Deep Learning Normalization Representation Triplet Loss Classification

34 FaceNet Triplet Loss Train model with triplets of roughly aligned matching / non-matching face patches T set of all possible triplets Loss function: N i 1 2 2 a p a n a p n d i i i i i, i, i, L f x f x f x f x x x x f x 2 2 Constraint : f x 2 1 Detection Deep Learning Normalization Representation Triplet Loss Classification

35 FaceNet Triplet Selection Crucial to ensure fast convergence Select triplets that violet the triplet constraint: 2 2 a p a n a p n,, i i i i i i i f x f x f x f x x x x 2 2 Offline Generate triplets every n steps, using the most recent network checkpoint and computing argmin/argmax on a subset of the data Online Selecting hard positive/negative examplers from a mini-batch Detection Deep Learning Normalization Representation Triplet Loss Classification

36 FaceNet Online Triplet Selection Compute arg max 2 2 a p a n p f xi f xi, arg min n f x x i 2 x i i f xi 2 Better: Choose all anchor-positive pairs in a mini-batch while selecting only hard-negatives To avoid local minima they chose negative semi-hard examplers that satisfy 2 2 a p a n i i i i f x f x f x f x 2 2 Detection Deep Learning Normalization Representation Triplet Loss Classification

37 FaceNet Classification Clustering: K-means or other clustering algorithms LFW (Labeled Faces in the Wild) dataset: 13233 images collected from the web, 1680 identities. Results: 0.9887±0.15 accuracy with face alignment 0.9963±0.09 (DeepFace: 0.9735±0.0025 accuracy) Detection Deep Learning Normalization Representation Triplet Loss Classification

38 FaceNet LFW Classification Detection Deep Learning Normalization Representation Triplet Loss Classification

39 FaceNet LFW Classification Detection Deep Learning Normalization Representation Triplet Loss Classification

40 FaceNet Some more information Detection Deep Learning Normalization Representation Triplet Loss Classification

41 FaceNet Summary Important new concepts: Triplet loss and Embeddings 140M parameters Proves that going deeper brings better results for the face recognition problem Computation efficiency ~0.73 second per face image (1.6B FLOPS) @2.2GHZ CPU Invariant to pose, illumination, expression and image quality Is our work done?

42 Comparison DeepFace FaceNet Multi-class probability Embedding 9 layers 22 layers 120M parameters 140M parameters 0.33 sec per image @2.2GHZ CPU ~0.73 sec per image @2.2GHZ CPU 2D/3D alignment Crop and scaling Cross-Entropy Loss Triplet loss 0.9735±0.0025 0.9963±0.09

43

44 Discussion Different representations (1-hot vs. feature vector) NN depth importance Computational complexity vs. classification performance (trade-off) Results shown here are updated with the articles Better results have already been shown

Thank you!