Face Detection on OpenCV using Raspberry Pi

Narayan V. Naik, Aadhrasa Venunadan, Kumara K R
Department of ECE, GSIT, Karwar, Karnataka
Abstract
This paper describes a machine learning approach to face detection that processes images extremely rapidly and achieves high detection rates. The objective of face detection is to detect faces and their spatial locations in images or videos. The proposed system detects the faces present in a grey-scale image. The implementation has three main stages. In the first stage, the integral image representation is used to extract rectangular features very quickly. The intermediate stage uses the AdaBoost learning algorithm to train classifiers; it also selects a small number of visual features from a large set of features to yield extremely efficient classifiers. Finally, all classifiers are cascaded linearly: the initial classifiers in the cascade use fewer, simpler features, and more complex features are added at later stages, so that non-face regions are rejected very quickly and computation focuses on the face target, which reduces computation time. The classifiers are trained using the OpenCV traincascade utility, and a strong classifier is obtained after training with a large set of positive (face) and negative (non-face) images. The trained face detector is tested on a Raspberry Pi (model B+) with its five-megapixel camera. Experimental results show that the proposed system achieves a good face detection rate compared to conventional methods.
Keywords: Face detection, Rectangular features, Integral image, AdaBoost, OpenCV.
I. INTRODUCTION
Face detection is an ongoing research topic in the field of computer vision. The task is easy for human beings but very challenging for computers.
The difficulties associated with face detection are variations in scale, pose, orientation, lighting conditions, facial expression, etc. Many approaches have been implemented, but each has its own advantages and limitations. This paper describes a face detection technique based on the Viola and Jones algorithm [1] [2]. Face detection methods can be categorized into four groups [3]:
1. Knowledge-based methods use predefined rules based on the nature of human faces. Their limitation is the difficulty of handling variations such as illumination and pose.
2. Feature invariant approaches are mainly used to find structural features of the face, but they are time-consuming and suffer from a lack of accuracy.
3. Template matching methods store several face templates and compare an input image with the stored templates to detect faces, but they are time-consuming.
4. Appearance-based methods train models from a set of face images to perform detection. They provide higher detection accuracy than the other approaches. Machine learning algorithms are used to train the detector based on statistical properties and probability distribution functions.
The proposed system relies on the appearance-based approach. Face detection is done by extracting facial features, such as eye, nose-bridge and mouth features, that are present in a grey-scale image. These features encode the difference in contrast between adjacent groups of pixels rather than the intensity values of individual pixels. The features used in this system are called rectangular features and are reminiscent of Haar basis functions [1, 4, 5, 6]. The three main contributions of this face detection system are briefly described below. The first contribution is converting the grey-scale image into an integral image, which allows fast extraction of rectangular features.
The integral image concept is the same as the summed-area table used in computer graphics [2]. The second contribution is the AdaBoost learning algorithm, which is used to select a small number of important features and to train classifiers [2, 7]. Each classifier it produces separates faces from non-faces by comparing a feature value with a threshold. In each round of training, it selects the feature whose classifier performs better than chance, that is, correctly classifies more than fifty percent of the weighted training examples. The last contribution is cascading all the classifiers [2]. During training, AdaBoost generates a large number of classifiers that can be used to detect faces in an image. Because an image may also contain non-face objects, non-face sub-windows must be rejected very quickly so that computation can focus on the target object. To achieve this, the classifiers are combined in a cascade structure and applied to the input image one after another, so that the initial classifiers reject non-face regions very quickly and the later ones focus on face regions. The face detector is trained with the OpenCV (2.4.9) traincascade utility on the Ubuntu (14.04) platform using thousands of well-cropped face and non-face images. The output cascade XML file is imported into a Raspberry Pi (model B+). The detector achieves a good detection rate and a low false positive rate on still images and on real-time video captured with the five-megapixel Raspberry Pi camera.
1 Narayan V. Naik, Aadhrasa Venunadan, Kumara K R
II. RECTANGULAR FEATURES
Rectangular features are the key part of this face detection system. They are used because they encode domain knowledge and can be evaluated much faster than a pixel-based system. Each feature is a combination of white and black rectangles, and its value is obtained by subtracting the sum of the pixel values in the white area from the sum of the pixel values in the black area. The proposed system uses the three types of upright features shown in Fig. 1. The size of a feature is easily scaled by increasing or decreasing the number of pixels being examined. In Fig. 1, (a) and (b) are two-rectangle features, whose value is the difference between the sums of the pixel values under the white and black regions. Fig. 1(c) is a three-rectangle feature, whose value is likewise the difference between the sums of the pixel values of the white and black regions. Fig. 1(d) is a four-rectangle feature, whose value is the difference between the sum of the pixel values of the diagonal rectangles and that of the off-diagonal rectangles.
Fig. 1. Rectangular features.
The number of features derived from each prototype at different sizes and positions is quite large and can be calculated with equation (1):
$$X \cdot Y \cdot \left(W + 1 - w\,\frac{X + 1}{2}\right) \left(H + 1 - h\,\frac{Y + 1}{2}\right) \qquad (1)$$
where W × H is the size of the window in pixels, w × h is the size of one prototype inside the window, as shown in Fig. 2, and $X = \lfloor W/w \rfloor$ and $Y = \lfloor H/h \rfloor$ are the maximum scaling factors in the x and y directions. Fig. 2 shows how an upright rectangle appears within an image sub-window. The rectangle slides over the whole sub-window at different scales and positions, resulting in a large number of feature values. Table I gives the total number of features obtained within an image sub-window of size 24 × 24.
Fig. 2. Upright rectangle in a window.
TABLE I. TOTAL NUMBER OF FEATURES INSIDE A 24 × 24 WINDOW
Feature type                    w/h         Count
(a), (b)                        1/2, 2/1    86400
(c) and its vertical version    3/1, 1/3    55200
(d)                             2/2         20736
Total                                       162336
A. Integral Image
Even a small window yields a large number of features, and recomputing the value of every rectangle each time is computationally expensive, so the integral image is introduced. With it, the sum of the pixels in any rectangle can be calculated with only four array references. As Fig. 3 illustrates, the integral image at location (x, y) is the sum of the
pixels above and to the left of (x, y), inclusive:
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y') \qquad (2)$$
where i(x, y) is the original image and ii(x, y) is the integral image. The integral image can be computed in one pass over the original image using equations (3) and (4):
$$s(x, y) = s(x, y - 1) + i(x, y) \qquad (3)$$
$$ii(x, y) = ii(x - 1, y) + s(x, y) \qquad (4)$$
where s(x, y) is the cumulative row sum, with s(x, -1) = 0 and ii(-1, y) = 0.
Fig. 3. Value of the integral image at point (x, y) is the sum of all the pixels above and to the left.
Fig. 4 shows how the sum of the pixels within rectangle D is computed with only four array references. The value of the integral image at location 1 is the sum of the pixels within A, the value at location 2 is A + B, the value at location 3 is A + C, and the value at location 4 is A + B + C + D. The sum within D is therefore (4 + 1) - (2 + 3).
Fig. 4. Computation of rectangular sum with only four array references.
III. ADABOOST LEARNING ALGORITHM
Table I shows that even a 24 × 24 window yields a total of 162336 features, but only a small number of relevant features is needed for face detection. The AdaBoost algorithm is used both to select these features and to train the classifiers. AdaBoost uses a weak learning algorithm to select the single feature that best separates the positive and negative examples. This learning algorithm, called the weak learner, determines the optimal threshold classification function for each feature, such that the minimum number of examples is misclassified. A weak classifier $h_j(x)$, given in equation (5), consists of a feature $f_j$, a threshold $\theta_j$ and a parity $p_j$ indicating the direction of the inequality sign:
$$h_j(x) = \begin{cases} 1, & \text{if } p_j f_j(x) < p_j \theta_j \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
Here x is a 24 × 24 pixel sub-window of an image. The AdaBoost algorithm for classifier learning is given below. Each round of boosting selects one feature from the 162336 potential features.
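The feature machinery described so far can be sketched in pure Python. This is a minimal illustration, not the authors' code or OpenCV's implementation: it assumes a grey-scale image stored as a list of rows (img[y][x]), and all function names are hypothetical. It evaluates equation (1) to reproduce Table I, builds the integral image with recurrences (3) and (4), sums a rectangle with the four references of Fig. 4, and applies the weak classifier (5). The same rectangle sum taken over an integral image of squared pixels also gives the window standard deviation of equation (11) used for variance normalization. Which half of the two-rectangle feature counts as "white" is an assumption; the parity $p_j$ in (5) absorbs the sign.

```python
import math


def count_features(W, H, w, h):
    """Eq. (1): number of upright w x h prototype placements in a W x H window."""
    X, Y = W // w, H // h  # maximum scaling factors in x and y
    # X * (W + 1 - w * (X + 1) / 2), rewritten with exact integer arithmetic
    nx = X * (W + 1) - w * X * (X + 1) // 2
    ny = Y * (H + 1) - h * Y * (Y + 1) // 2
    return nx * ny


def integral_image(img):
    """Eqs. (3)-(4) in one pass over img, where img[y][x] is a grey level."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * cols for _ in range(rows)]
    for x in range(cols):
        s = 0  # cumulative sum s(x, y), starting from s(x, -1) = 0
        for y in range(rows):
            s += img[y][x]                                 # eq. (3)
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s  # eq. (4), ii(-1, y) = 0
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum over the w x h rectangle with top-left (x, y): D = (4 + 1) - (2 + 3)."""
    def at(cx, cy):
        return ii[cy][cx] if cx >= 0 and cy >= 0 else 0
    p1, p2 = at(x - 1, y - 1), at(x + w - 1, y - 1)
    p3, p4 = at(x - 1, y + h - 1), at(x + w - 1, y + h - 1)
    return p4 + p1 - (p2 + p3)


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def weak_classify(f_value, theta, parity):
    """Eq. (5): 1 (face) if p * f < p * theta, otherwise 0 (non-face)."""
    return 1 if parity * f_value < parity * theta else 0


def window_sigma(ii, ii_sq, x, y, w, h):
    """Eq. (11): sigma^2 = (1/N) * sum(x^2) - m^2, from two integral images."""
    n = float(w * h)
    m = rect_sum(ii, x, y, w, h) / n
    var = rect_sum(ii_sq, x, y, w, h) / n - m * m
    return math.sqrt(var) if var > 0 else 1.0  # flat window: leave unscaled


if __name__ == "__main__":
    table = {"(a),(b)": [(2, 1), (1, 2)], "(c)": [(3, 1), (1, 3)], "(d)": [(2, 2)]}
    counts = {k: sum(count_features(24, 24, w, h) for w, h in v)
              for k, v in table.items()}
    print(counts, sum(counts.values()))
```

Running the script reproduces the counts of Table I (86400, 55200 and 20736, for a total of 162336). Each rectangle sum costs four lookups regardless of its size, which is what makes evaluating thousands of features per sub-window affordable.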
1) Let $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$, where each $x_i$ is a face or non-face image and $y_i \in \{1, 0\}$ marks it as a positive or negative sample.
2) Initialize the weights $w_{1,i} = \frac{1}{2m}$ for negative samples and $w_{1,i} = \frac{1}{2l}$ for positive samples, where m and l are the numbers of negatives and positives, respectively.
3) For $t = 1, \dots, T$, find the classifier $h_t$ that minimizes the error with respect to the distribution $w_t$:
a) Normalize the weights so that $w_t$ is a probability distribution:
$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}} \qquad (6)$$
b) For each feature j, train a classifier $h_j$ (with its threshold $\theta_j$ and parity $p_j$) that is restricted to using that single feature. Its error is evaluated with respect to $w_t$:
$$\epsilon_j = \sum_{i=1}^{n} w_{t,i} \left| h_j(x_i) - y_i \right| \qquad (7)$$
c) Choose the classifier $h_t$ with the lowest error $\epsilon_t$.
4) At the end of each round, update the weights:
$$w_{t+1,i} = w_{t,i}\, \beta_t^{\,1 - e_i} \qquad (8)$$
where $e_i = 0$ if example $x_i$ is classified correctly and $e_i = 1$ otherwise, and
$$\beta_t = \frac{\epsilon_t}{1 - \epsilon_t} \qquad (9)$$
5) The final strong classifier is
$$h(x) = \begin{cases} 1, & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$
where $\alpha_t = \log \frac{1}{\beta_t}$.
In each round of learning, some examples are misclassified, so the examples are re-weighted in
order to emphasize those that were incorrectly classified by the previous classifier. Finally, a strong classifier is constructed.
B. Cascading of Classifiers
In each stage of classifier training, AdaBoost selects only the relevant features that best classify the positive and negative images. The key idea is to use a few features in the initial stage and to increase the number of features as the stages progress, so that a positive result from an early classifier triggers the next classifier, and this process continues up to the last classifier. Any sub-window that fails to pass a stage is rejected immediately, and no further processing takes place on it. This increases detection performance while radically reducing computation time. This form of detection process is called a cascade, and AdaBoost is used to train each of its stages. Fig. 5 shows that at the first stage a sub-window is evaluated with a few features; if that stage accepts the sub-window as a face, it is tested by the second stage, which uses more features, otherwise the sub-window is dropped at that stage. This process continues until the last stage, and the output of the last stage is a detected face. The training set consists of 2429 faces and 1215 non-faces, and the cascade has 14 stages. After a few hours of training, OpenCV generates an XML file, and this final detector is imported into the Raspberry Pi (model B+) for real-time face detection.
C. IMAGE PROCESSING
To eliminate the effect of different lighting conditions, all images used for training are variance normalized, and the same normalization is applied during real-time detection. The variance is given by
$$\sigma^2 = \frac{1}{N} \sum x^2 - m^2 \qquad (11)$$
where σ is the standard deviation, m is the mean, x is a pixel value within the sub-window, and N is the number of pixels in the sub-window. The mean can be calculated easily with the integral image, and the sum of squared pixels can be calculated with the integral image of the squared image.
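Steps 1) to 5) of the boosting procedure and the early-rejection cascade of Section B can be sketched as follows. This is a minimal sketch under stated assumptions, not OpenCV's traincascade implementation: every example is assumed to be already reduced to a list of feature values (for instance with rectangle sums over the integral image), labels are 1 for faces and 0 for non-faces, and all function names are hypothetical. Each stage keeps the 1/2 threshold of equation (10); per-stage threshold tuning is omitted.

```python
import math


def stump_predict(value, theta, parity):
    """Weak classifier of eq. (5) applied to one feature value."""
    return 1 if parity * value < parity * theta else 0


def best_stump(values, labels, weights):
    """Step 3b: lowest weighted error (7) over thresholds and parities of one feature."""
    best = (float("inf"), 0, 1)  # (error, theta, parity)
    for theta in sorted(set(values)):
        for parity in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if stump_predict(v, theta, parity) != y)
            if err < best[0]:
                best = (err, theta, parity)
    return best


def adaboost(features, labels, rounds):
    """features[j][i]: value of feature j on example i; labels[i]: 1 face, 0 non-face.
    Returns the strong classifier as a list of (alpha_t, j, theta, parity)."""
    l, m = labels.count(1), labels.count(0)  # positives, negatives
    weights = [1.0 / (2 * l) if y == 1 else 1.0 / (2 * m) for y in labels]  # step 2
    strong = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]  # eq. (6)
        candidates = [best_stump(values, labels, weights) + (j,)
                      for j, values in enumerate(features)]
        err, theta, parity, j = min(candidates)  # step 3c: lowest error
        err = max(err, 1e-10)  # keep beta > 0 when a stump is perfect
        beta = err / (1.0 - err)  # eq. (9)
        for i, v in enumerate(features[j]):
            e = 0 if stump_predict(v, theta, parity) == labels[i] else 1
            weights[i] *= beta ** (1 - e)  # eq. (8): shrink correctly classified
        strong.append((math.log(1.0 / beta), j, theta, parity))  # alpha_t
    return strong


def strong_classify(strong, x):
    """Eq. (10): x[j] is the value of feature j on the sub-window."""
    score = sum(a * stump_predict(x[j], t, p) for a, j, t, p in strong)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0


def cascade_classify(stages, x):
    """Section B: every stage must accept; the first rejection ends processing."""
    for stage in stages:
        if strong_classify(stage, x) == 0:
            return 0  # non-face: later, larger stages are never evaluated
    return 1
```

For example, `adaboost([[1, 2, 3, 4, 5, 6], [5, 1, 5, 1, 5, 1]], [1, 1, 1, 0, 0, 0], rounds=2)` picks feature 0 with threshold 4 and parity 1 in its first round, because that stump separates the toy faces from the non-faces perfectly. The threshold search here is a brute-force scan chosen for readability; it is quadratic in the number of examples and would be too slow for thousands of training images.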
During real-time detection, the normalization is done by post-multiplying the feature values (scaling them by 1/σ) rather than by operating on the pixels.
Fig. 5. Signal flow of the detection cascade.
D. OPENCV TOOL
The proposed face detection system is trained with OpenCV utilities. The opencv_createsamples tool is used to create the positive samples, and the detector is then trained with opencv_traincascade. The face detection script is written in Python (Python 2.7).
IV. PRACTICAL IMPLEMENTATION
The overall training process is done on Ubuntu 12.04 LTS with OpenCV 2.4.9. The training face and non-face images are obtained from the MIT face database, which consists of resized and normalized images. OpenCV provides the opencv_createsamples utility to create a vector file of positive samples and opencv_traincascade to train the detector.
V. RESULT
Fig. 6. Face detection on a 640x480 pixel image: (a) face detection on an image; (b) Python script output.
In Fig. 6(a), the detected face is marked by a rectangular box; the face is detected even though it has a moustache and beard. Fig. 6(b) shows the output of the Python script, where the values within the square brackets describe the detected face region in pixels. The face detector takes 0.175 seconds to detect the face, and this time also includes the execution time of the Python script.
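A detection script of the kind described above can be sketched with the OpenCV Python bindings (`cv2.CascadeClassifier` and its `detectMultiScale` method). This is a hedged sketch, not the authors' script: the file names `cascade.xml` and `detected.jpg` and the `scaleFactor`, `minNeighbors` and `minSize` values are assumptions chosen for illustration. `cv2` is imported inside the function so the module can be loaded where OpenCV is not installed. The timer covers loading, detection and drawing, in the same spirit as the reported times, which include script execution.

```python
import sys
import time


def detect_faces(image_path, cascade_path="cascade.xml", out_path="detected.jpg"):
    """Run a trained cascade on one image; return (faces, elapsed_seconds).

    cascade.xml and detected.jpg are hypothetical file names."""
    import cv2  # deferred so this module imports even where OpenCV is absent

    start = time.time()
    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load cascade: %s" % cascade_path)
    image = cv2.imread(image_path)
    if image is None:
        raise IOError("could not read image: %s" % image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Multi-scale sliding-window scan; these parameter values are assumptions.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(24, 24))
    for rect in faces:
        x, y, w, h = [int(v) for v in rect]
        # Mark each detection with a rectangular box.
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite(out_path, image)
    return faces, time.time() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    found, elapsed = detect_faces(sys.argv[1])
    print(found)  # one [x y w h] row per detected face
    print("detection time: %.3f s" % elapsed)
```

Invoked with an image path as its only argument, the script writes an annotated copy of the image and prints one bracketed `[x y w h]` row per detection followed by the elapsed time. Because `detectMultiScale` rescans the image at every scale, larger images such as 2560x1920 frames take proportionally longer than 640x480 ones.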

Fig. 7. Face detection on an image containing a structural component: (a) face detection on an image; (b) Python script output.
Fig. 7(a) shows a 640x480 pixel image of a person wearing spectacles. The result shows that the detector finds the face regardless of such structural components. Fig. 7(b) shows the detected face region and the approximate detection time, which is 0.15 seconds.
Fig. 8. Face detection in an image with multiple faces: (a) multiple face detection; (b) Python script output.
Fig. 8(a) shows face detection in a 2560x1920 pixel group photo. The faces are correctly detected even though they differ in color, pose, and expression. Some non-face regions are also detected as faces; these false positives could be reduced by training the classifier with more images and stages. Fig. 8(b) shows the corresponding face regions and the overall detection time, which is 5.859 seconds, including the execution time of the Python script.
VI. CONCLUSION
The proposed face detection system detects faces with a low false positive rate. Initially, the experiment was done with 10-stage and 12-stage classifiers, but they gave low accuracy and higher false positive rates. The 14-stage detector gives good detection accuracy, and the experimental results show that as the number of stages and training images increases, the false positive rate becomes lower while accuracy remains relatively high. The detector works well on the Raspberry Pi with a 5 MP camera and detects faces in captured images from the lowest resolution of 640x480 up to the highest resolution of 2560x1920 pixels, with detection times of 0.15 to 0.175 seconds at 640x480 and 5.859 seconds at 2560x1920, including script execution time. The system also detects multiple faces in images and videos.
REFERENCES
[1] Paul Viola and Michael Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features," IEEE Conference on Computer Vision and Pattern Recognition, 2001, vol. 1, pp. 511-518.
[2] Paul Viola and Michael J. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, Kluwer Academic Publishers, Netherlands, 2004, pp. 137-154.
[3] Ming-Hsuan Yang, David J. Kriegman and Narendra Ahuja, "Detecting Faces in Images: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, January 2002.
[4] C. Papageorgiou, M. Oren and T. Poggio, "A General Framework for Object Detection," International Conference on Computer Vision, Bombay, January 1998, pp. 555-562.
[5] Mohamed Oualla, Abdelalim Sadiq and Samir Mbarki, "A Survey of Haar-Like Feature Representation," International Conference on Multimedia Computing and Systems (ICMCS), 14-16 April 2014, pp. 1101-1106.
[6] Rainer Lienhart and Jochen Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection," IEEE ICIP 2002, vol. 1, September 2002.
[7] Yoav Freund and Robert E. Schapire, "A Short Introduction to Boosting," Journal of Japanese Society for Artificial Intelligence, 14(5), pp. 771-780, September 1999.