Face Detection with Deep Learning

Yu Shen, Yus122@ucsd.edu, A13227146
Kuan-Wei Chen, kuc010@ucsd.edu, A99045121
Yizhou Hao, y3hao@ucsd.edu, A98017773
Min Hsuan Wu, mhwu@ucsd.edu, A92424998

Abstract

This project presents an implementation of face detection using deep learning. The main idea used in this project is the Multi-task Cascaded Convolutional Neural Network (MTCNN), which chains three sub-networks that together learn to recognize human faces through several stages of decomposition and filtering. The dataset used is the FDDB dataset, which contains over five thousand faces in a set of around two thousand eight hundred images.

1. Introduction

With the development of machine learning and computer vision, face detection has become a very popular and useful technology. The history of face detection can be divided into three phases. The first period was from 1964 to 1990. It was the start of face detection, and the research in this period was mainly based on geometric features. At this early stage, there were not many applications related to face detection. The second period was from 1991 to 1997. Although this period was short (only seven years), it was a peak time of face detection research and development. Many new algorithms and business applications for face detection appeared in this time, and research moved from geometric-feature-based methods to Eigenface-based methods. In the third period, from 1998 to now, researchers and scientists have focused on face detection under non-ideal conditions. For example, if the light is too strong or people are moving around too fast, it is hard to track the faces. Hence, people tried to implement face detection based on three-dimensional models to track faces better. Today, face detection has been implemented and is widely used in different industries and products. Face detection can be used in the access control system of a door or of a building. It can also be used by companies or schools to check employees' or students' attendance.
The new iPhone released several months ago also uses this technology to unlock the screen. Face detection is going to be an indispensable part of the intelligent society in the near future, and it may be built into even more applications, such as digital passports. As students studying machine learning, we are curious and enthusiastic about this new and popular technology. Hence, in this project, we use deep learning to detect human faces in images.

2. Related Work

As mentioned in the introduction, face detection has been studied for over 50 years. Hence, there are many different face detection methods, and each of them has its own advantages and disadvantages. For example, the Viola-Jones method [6] is the first framework that allowed face detection in real time. It is a three-step framework, which includes feature computation, classification, and combination of classifiers. Overall, Viola-Jones is a successful method, with fast detection speed, good accuracy, and a low false positive rate. However, it takes a long time to train and is restricted in the head poses it can handle. Local Binary Pattern (LBP) [1] is another effective method. In this method, every pixel is assigned a texture value, which can be combined with the target for tracking. The advantages of LBP include fast computation, successful description of texture features, and simple steps. However, it can only be used for gray images, and it does not have good accuracy. AdaBoost, as described in the 2003 survey [3], is a learning algorithm that creates a strong classifier by choosing visual features from a set of simple classifiers and combining them linearly. It is very simple to implement because it does not require any prior knowledge about the face structure. Also, it can be used with many different classifiers and improves classification accuracy.
The disadvantages of AdaBoost are, first, that it trains slowly and, second, that it is sensitive to noisy data, which can lead to low final detection accuracy. The SMQT features and SNoW classifier method [5] is a relatively new method published in 2011. This method has two phases. The first phase is face luminance, which gets pixel information from the image. The second phase is detection, which uses SMQT features for feature extraction and SNoW to speed up the classifier. This method is computationally efficient, but it has a low false positive rate. The last method discussed here is the neural-network-based method [4], which is the foundation of MTCNN [7]. In this early neural network method, there are two stages: filtering, and merging and arbitrating. The advantages of this method are an acceptable false detection rate and acceptable accuracy. The disadvantages are a slow detection process and a complex methodology.

The method we implemented in our project is MTCNN. Compared to all the methods above, MTCNN has the highest accuracy. Although training takes some time, we can save time by using a pre-trained model and still keep relatively high accuracy. MTCNN even supports real-time face detection. Hence, MTCNN is a very good method for face detection.

3. Dataset and Features

The data used is the FDDB dataset [2]. It contains 5171 faces in a set of 2845 images (both gray-scale and color). The dataset is divided into 10 folds with roughly 520 labeled faces in each fold. Each image may contain multiple faces. The dataset labels the positions of human faces within each picture. The labels were made by drawing an ellipse around each human face and recording the radii of both axes, the center of the ellipse, and the angle of the ellipse.

Figure 1. Sample data

Figure 2. Sample data with elliptical label

As seen in Figure 1, the data has multiple faces within a picture under different lighting. Notice that the two faces at the back have poor lighting, and their bodies, and even the edges of their faces, are blocked by the people in front. In Figure 2, the labels are drawn as ellipses. Notice that ellipses may overlap with each other. In addition, note that the person's face on the left is not labeled. This is because only faces with at least one eye visible are labeled. The annotators also only label faces that are at least 20 pixels in both height and width. Since these faces are annotated by different annotators, there could be slight differences in judgment on whether or not a face should be labeled. Moreover, different poses of the faces, occlusions of faces, and resolution all affect the annotations. Therefore, there are intrinsic difficulties in making perfect labels. However, these should not affect most annotations significantly, and the annotations should still be relatively accurate.

4. Method

The method we used is called Multi-task Cascaded Convolutional Networks (MTCNN) [7]. Unlike other algorithms, MTCNN cascades three CNNs with different structures for face detection. Figure 3 shows the pipeline of MTCNN, and Figure 4 shows the detailed structure of MTCNN. Before we feed a test image to the classifier, we resize the image to different scales and stack the results into an image pyramid. This presents the same face at different scales, which improves the network's ability to detect faces of different sizes. After that, a sliding window is applied to the pyramid, breaking the image into regions, which are the inputs to the network.

4.1. Stage-1

Stage-1 is called P-Net, which stands for Proposal Network. It is a shallow network that roughly decides which regions contain faces. For each proposed bounding box, three different tasks are applied: face classification, bounding box regression, and facial landmark localization. These three tasks are described in Section 4.4. The outputs of P-Net are proposed bounding boxes and their face classification scores. Non-maximum suppression is then applied to remove bounding boxes that overlap with each other. The remaining boxes are resized to 24x24x3 and used as the input to the next net.
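The non-maximum suppression step mentioned for P-Net can be sketched as follows. This is a minimal illustration of the common greedy formulation, not the code used in this project: the function names, the (x1, y1, x2, y2) box layout, and the 0.5 overlap threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hypothetical value, not taken from the report
) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    face_scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(candidates, face_scores))
```

In this toy input the second box overlaps the first heavily and has a lower score, so it is dropped, while the third box is far away and survives.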

4.2. Stage-2

Stage-2 is called R-Net, which stands for Refine Network. This net has convolutions with larger kernels and a fully connected layer, which makes it more powerful than the previous network. The goal of this net is to refine the results from the previous net. The bounding boxes that have low face classification scores are discarded. Again, for the regions with high classification scores, bounding box regression and facial landmark localization are applied. The outputs of this net are still bounding boxes and their classification scores. Those boxes are resized to 48x48x3 to be used as the input to the next net.

4.3. Stage-3

Stage-3 is called O-Net, which stands for Output Network. This network is deeper and has larger convolution kernels than the previous nets. This more powerful network makes the final decision about where the face is and what the size of the bounding box should be.

Figure 3. Pipeline of three stages of MTCNN

Figure 4. Three nets of MTCNN

4.4. Classification and Regression

At the end of each of the three stages, face classification, bounding box regression, and facial landmark localization are computed. Their loss functions are different and are introduced one by one.

4.4.1 Face classification

For face classification, cross entropy is used as the loss function:

$$L_i^{det} = -\left( y_i^{det} \log(p_i) + (1 - y_i^{det}) \log(1 - p_i) \right) \quad (1)$$

where $p_i$ is the probability produced by the network and $y_i^{det} \in \{0, 1\}$ is the ground-truth label of the face.

4.4.2 Bounding Box Regression

For bounding box regression, Euclidean distance (L2 loss) is used as the loss function:

$$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2 \quad (2)$$

where $\hat{y}_i^{box}$ is the regression target generated by the network and $y_i^{box}$ is the ground-truth location.

4.4.3 Facial Landmark Localization

L2 loss is also used as the loss function for this part:

$$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2 \quad (3)$$

where $\hat{y}_i^{landmark}$ is the location of the facial landmarks generated by the network and $y_i^{landmark}$ is the ground-truth location.

The weighted sum of these three loss values is used as the loss for back-propagation. The weights are fixed values.
In P-Net and R-Net, the weights for face classification, box regression, and landmark localization are 1.0, 0.5, and 0.5. In the last stage (O-Net), the weights are 1.0, 0.5, and 1.0.
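As a rough illustration of how the three losses and the per-stage weights combine, the sketch below computes the weighted sum for a single sample in plain Python. It assumes the standard binary cross-entropy form for the classification term; the function names, the `STAGE_WEIGHTS` table, and the clamping constant `eps` are hypothetical choices for this example rather than part of the project code.

```python
import math
from typing import Dict, Sequence, Tuple

# Per-stage weights (classification, box regression, landmark localization),
# taken from the values stated in the text.
STAGE_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "P-Net": (1.0, 0.5, 0.5),
    "R-Net": (1.0, 0.5, 0.5),
    "O-Net": (1.0, 0.5, 1.0),
}


def face_classification_loss(p: float, y_det: int, eps: float = 1e-12) -> float:
    """Binary cross entropy for one sample; p is the predicted face probability."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); eps is an assumption
    return -(y_det * math.log(p) + (1 - y_det) * math.log(1.0 - p))


def squared_l2(pred: Sequence[float], target: Sequence[float]) -> float:
    """Squared Euclidean distance, used for both box and landmark losses."""
    return sum((a - b) ** 2 for a, b in zip(pred, target))


def total_loss(
    stage: str,
    p: float,
    y_det: int,
    box_pred: Sequence[float],
    box_true: Sequence[float],
    landmark_pred: Sequence[float],
    landmark_true: Sequence[float],
) -> float:
    """Weighted sum of the three task losses for the given stage."""
    w_det, w_box, w_landmark = STAGE_WEIGHTS[stage]
    return (
        w_det * face_classification_loss(p, y_det)
        + w_box * squared_l2(box_pred, box_true)
        + w_landmark * squared_l2(landmark_pred, landmark_true)
    )


if __name__ == "__main__":
    args = (0.5, 1, [0, 0, 0, 0], [1, 0, 0, 0], [0, 0], [1, 1])
    print(total_loss("P-Net", *args))
    print(total_loss("O-Net", *args))
```

With identical predictions, the O-Net total is larger than the P-Net total only through the landmark term, which reflects the heavier landmark weight in the last stage.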

5. Experiments/Results/Discussion

5.1. Evaluation method

The algorithm was tested on the FDDB dataset. The evaluation method compares the center coordinate of the ellipse label to the center of the output rectangle. If the two coordinates are within 50 pixels of each other in both width and height, the rectangle is considered a correct detection. If there is a detection rectangle but no elliptical label near it, it is considered a false positive. An example of a false positive is shown in Figure 5. Notice that the person's face on the left was detected with a green bounding rectangle, but it was not labeled, since neither of the person's eyes was visible. On the other hand, if there is a labeled ellipse but the detector fails to find the face, it is considered a false negative. An example is shown in Figure 6. Notice that a red ellipse at the top center labels a man's face that is partially covered by the person in front. Since most of the person's face was covered, the detector was unable to detect the face. However, it was still labeled, since one of the person's eyes is visible.

Figure 5. Example of false positive

Figure 6. Example of false negative

In [2], the authors proposed evaluating by calculating the overlap between the bounding rectangle and the ellipse. Although this gives a better meaning to what a correct detection is, i.e. over 80% overlap between the rectangle and the ellipse, it would consider every detection a false detection if the rectangle and ellipse have relatively different sizes. Since the distance between center pixels is simpler to implement, it was used to evaluate the results. Note that the percentage overlap threshold is as arbitrary a number as the 50 pixels used, and thus either evaluation method is intrinsically imperfect, as are the labels. Nonetheless, the evaluation result should still be a meaningful metric to test the algorithm.

Fold   False positive   Correct detection   Total faces
1      23               480                 515
2      18               485                 519
3      25               483                 517
4      18               488                 517
5      17               486                 514
6      21               484                 518
7      30               494                 518
8      24               468                 518
9      15               483                 514
10     24               495                 521
All    215              4846                5171

Table 1. Detection results

5.2. Results

The results are recorded in Table 1 and Table 2. Since there are two kinds of errors, false positives and false negatives, there are two metrics to evaluate the detector. The accuracy is the number of correct detections divided by the total number of labeled faces. This measures how many of the labeled faces the detector can find (this quantity is commonly called recall). The true positive rate is the number of correct detections divided by the number of all detections, that is, correct detections plus false positives. This measures what percentage of the detections made by the detector are true detections (this quantity is commonly called precision). For example, over all folds the accuracy is 4846/5171 = 0.937 and the true positive rate is 4846/(4846 + 215) = 0.958.

Fold   Accuracy   True positive rate
1      0.932      0.954
2      0.934      0.964
3      0.934      0.951
4      0.944      0.964
5      0.946      0.966
6      0.934      0.958
7      0.954      0.943
8      0.903      0.951
9      0.940      0.970
10     0.950      0.954
All    0.937      0.958

Table 2. Accuracy and true positive rates

6. Conclusion/Future Work

Using the FDDB dataset for evaluation, our implementation of MTCNN achieved an average accuracy of 93.7% and a true positive rate of 95.8%. The roughly 4% false positive rate indicates that there are some differences between the MTCNN detections and FDDB's labels. We looked into the misprediction cases and found that some faces with neither eye visible are still detected by MTCNN. Since FDDB only labels faces with at least one eye visible, these faces are not labeled, which raised our false positive rate. Also, some labeled faces are under extremely low light or are covered by other objects. These kinds of faces are difficult for MTCNN to detect.

For future work, we can improve our model by training on more edge cases like the ones mentioned above. Also, doing more evaluations like the one we did on FDDB can help us find the weaknesses of our model. In addition, training on an even larger dataset should allow the network to learn more extreme cases and therefore improve its performance. We can also make our MTCNN implementation work on real-time face detection. Once the accuracy is high enough, we can work on other functionalities like face recognition or emotion recognition in the future.

References

[1] T. Ahonen, A. Hadid, and M. Pietikäinen. Face recognition with local binary patterns. In European Conference on Computer Vision, pages 469-481. Springer, 2004.
[2] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010.
[3] R. Meir and G. Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning, pages 118-183. Springer, 2003.
[4] H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23-38, 1998.
[5] K. Somashekar, C. Puttamadappa, and D. Chandrappa. Face detection by SMQT features and SNoW classifier using color information. International Journal of Engineering Science and Technology, 3(2), 2011.
[6] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, pages I-I. IEEE, 2001.
[7] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499-1503, Oct 2016.