Deep Learning for Vision

Similar documents
DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Face Recognition A Deep Learning Approach

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Computer Vision Lecture 16

Understanding Faces. Detection, Recognition, and. Transformation of Faces 12/5/17

Computer Vision Lecture 16

Spatial Localization and Detection. Lecture 8-1

Deep Learning for Face Recognition. Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong

Deep Learning for Computer Vision II

FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana

Deep Face Recognition. Nathan Sun

DeepPose & Convolutional Pose Machines

Computer Vision Lecture 16

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation

arxiv: v1 [cs.cv] 16 Nov 2015

Deep Learning with Tensorflow AlexNet

Introduction to Deep Learning

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning for Computer Vision with MATLAB By Jon Cherrie

Rich feature hierarchies for accurate object detection and semantic segmentation

Human-Robot Interaction

Vignette: Reimagining the Analog Photo Album

Human Pose Estimation using Global and Local Normalization. Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang

arxiv: v3 [cs.cv] 9 Jun 2015

Towards Automatic Identification of Elephants in the Wild

Weighted Convolutional Neural Network. Ensemble.

Deep Neural Networks:

ECE 6554:Advanced Computer Vision Pose Estimation

CS229 Final Project Report. A Multi-Task Feature Learning Approach to Human Detection. Tiffany Low

LOCALIZATION OF HUMANS IN IMAGES USING CONVOLUTIONAL NETWORKS

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

Content-Based Image Recovery

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE. Chubu University 1200, Matsumoto-cho, Kasugai, AICHI

Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Scorelevel Fusion for Face Recognition

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Deep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Clustering Lightened Deep Representation for Large Scale Face Identification

THE task of human pose estimation is to determine the. Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation

FACIAL POINT DETECTION USING CONVOLUTIONAL NEURAL NETWORK TRANSFERRED FROM A HETEROGENEOUS TASK

Object detection with CNNs

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

ImageNet Classification with Deep Convolutional Neural Networks

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition

Learning to Recognize Faces in Realistic Conditions

Using Siamese CNNs for Removing Duplicate Entries From Real-Estate Listing Databases

Classification of objects from Video Data (Group 30)

A Patch Strategy for Deep Face Recognition

Getting started with Caffe. Jon Barker, Solutions Architect

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Object Recognition II

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Implementing Deep Learning for Video Analytics on Tegra X1.

Robust Face Recognition Based on Convolutional Neural Network

All You Want To Know About CNNs. Yukun Zhu

Regionlet Object Detector with Hand-crafted and CNN Feature

Convolutional Neural Networks

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

POINT CLOUD DEEP LEARNING

Machine Learning 13. week

Visual Inspection of Storm-Water Pipe Systems using Deep Convolutional Neural Networks

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

An Efficient Convolutional Network for Human Pose Estimation

Project 3 Q&A. Jonathan Krause

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Structured Prediction using Convolutional Neural Networks

SHIV SHAKTI International Journal in Multidisciplinary and Academic Research (SSIJMAR) Vol. 7, No. 2, April 2018 (ISSN )

Yiqi Yan. May 10, 2017

Describable Visual Attributes for Face Verification and Image Search

Motion Interchange Patterns for Action Recognition in Unconstrained Videos

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

NVIDIA FOR DEEP LEARNING. Bill Veenhuis

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

Object Detection Based on Deep Learning

Single Image Depth Estimation via Deep Learning

Learning Deep Representations for Visual Recognition

arxiv: v1 [cs.cv] 1 Jun 2018

3D Shape Analysis with Multi-view Convolutional Networks. Evangelos Kalogerakis

Articulated Pose Estimation with Flexible Mixtures-of-Parts

arxiv: v1 [cs.cv] 20 Dec 2016

An Exploration of Computer Vision Techniques for Bird Species Classification

Perceptron: This is convolution!

Apparel Classifier and Recommender using Deep Learning

Deep Fisher Faces. 1 Introduction. Harald Hanselmann

An Efficient Convolutional Network for Human Pose Estimation

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

Tutorial on Machine Learning Tools

Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks *

Learning to Align from Scratch

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

Transcription:

Deep Learning for Vision Presented by Kevin Matzen

Quick Intro - DNN Feed-forward Sparse connectivity (layer to layer) Different layer types Recently popularized for vision [Krizhevsky, et. al. NIPS 2012]

The Layers Convolution Fully connected Loss functions Image processing Pooling Neuron activation function Normalization

deeplearning.net/tutorial/lenet.html

[Krizhevsky, NIPS 2012]

Software code.google.com/p/cuda-convnet/ [nvidia gpu] github.com/ucb-icsi-vision-group/decaf-release/ [deprecated; cpu-only] caffe.berkeleyvision.org [cpu; nvidia gpu] research.google.com/archive/ large_deep_networks_nips2012.html [proprietary; distributed system]

DeepPose: Human Pose Estimation via Deep Neural Networks Alexander Toshev, Christian Szegedy - CVPR 2014 DeepFace: Closing the Gap to Human-Level Performance in Face Verification Yaniv Taigman, Ming Yang, Marc Aurelio Ranzato, Lior Wolf - CVPR 2014

DeepPose: Human Pose Estimation via Deep Neural Networks Alexander Toshev, Christian Szegedy - CVPR 2014 DeepFace: Closing the Gap to Human-Level Performance in Face Verification Yaniv Taigman, Ming Yang, Marc Aurelio Ranzato, Lior Wolf - CVPR 2014

Input: Uncropped photo Output: Joint locations

Pipeline 1. Person detection 2. Joint position regression 3. Joint refinement

Datasets Leeds Sports Pose (LSP) [Johnson, et. al. BMVC 2010] 14 joint locations 2000 main person - 150 px Frames Labeled in Cinema (FLIC) [Sapp, et. al. CVPR 2013] Image Parse [Ramanan NIPS 2006] Buffy Stickmen 305 images similar to leeds includes casual photos 5003 person detector every 10 frames of 30 movies 20k candidates mturk 10 upperbody joints 748 frames

Person Detection Input: Uncropped image Output: Cropped image LSP dataset - No person detector FLIC dataset - Enlarged face detector

Main difference

Runtime 0.1s per image - 12 cores (SotA - 1.5s, 4s) Training stage 0-3 days Training refinement - 7 days each

Evaluation Percentage of Correct Parts (PCP) Correct if predicted limb is within 1/2 of correct limb length Percentage of Detected Joints (PDJ) Predicted and correct joints are within some factor of torso diameter

DeepPose: Human Pose Estimation via Deep Neural Networks Alexander Toshev, Christian Szegedy - CVPR 2014 DeepFace: Closing the Gap to Human-Level Performance in Face Verification Yaniv Taigman, Ming Yang, Marc Aurelio Ranzato, Lior Wolf - CVPR 2014

Pipeline Detect faces Correct out-of-plane rotation Generate features via CNN Classify

Alignment

Fiducial Detection LBP histograms Support Vector Regressor Iteratively transform and predict 6 fiducial points for 2D alignment 67 fiducial points for 3D alignment

3D Alignment Iterative affine camera PnP 3D reference - Average mesh of USF Human-ID dataset Considers fiducial covariance Residuals applied to reference mesh Affine warp texture

CNN Architecture

CNN Architecture Features

CNN Architecture weight sharing no weight sharing

Training softmax cross-entropy loss -log pk

Sparsity ReLU nonlinearly - rectified linear unit max(0, x) 75% model parameters = 0 Dropout - first fully connected layer

Normalization ReLU - unbounded Normalize features to [0, 1] based on holdout

Verification Metrics Unsupervised - dot product χ 2 similarity Siamese network

Χ 2 Similarity Χ 2 (f1,f2) = Σiwi(f1[i] - f2[i]) 2 /(f1[i] + f2[i]) weights learned via svm

Siamese Network - FC 4096-to-1

Datasets Social Face Classification (SFC) Presumably Facebook photos 4.4 mil faces; 4,030 people No overlap with other datasets

Datasets Labeled Faces in the Wild (LFW) 13,323 faces; 5,749 celebs 6,000 pairs Restricted protocol - same/not same labels at training Unrestricted protocol - identities during training Unsupervised - no training on LFW

Datasets YouTube Faces (YTF) 3,425 videos of 1,595 subjects Subset of celebs from LFW

SFC Training Perf Reduce data by omitting people Reduce data by omitting examples Remove layers from network

LFW Perf

Runtime 0.18 s - feature extraction (1 core; 2.2 GHz) 0.05 s - alignment 0.33 s - total

Questions?