Eye Tracking System to Detect Driver Drowsiness

T. P. Nguyen
Centre of Technology
RMIT University, Saigon South Campus
Ho Chi Minh City, Vietnam
s3372654@rmit.edu.vn

M. T. Chew, S. Demidenko
School of Engineering and Advanced Technology
Massey University, Albany
Auckland, New Zealand
m.t.chew@massey.ac.nz, s.demidenko@massey.ac.nz

Abstract: This paper describes an eye tracking system for drowsiness detection of a driver. It is based on the application of the Viola-Jones algorithm and Percentage of Eyelid Closure (PERCLOS). The system alerts the driver if the drowsiness index exceeds a pre-specified level.

Keywords: eye tracking frequency, Viola-Jones algorithm, PERCLOS, drowsiness index

I. INTRODUCTION

Driver's fatigue is one of the main contributory factors in up to 20% of all traffic accidents and up to 25% of fatal and serious accidents [1]. In this context, it is very important to monitor the drowsiness of a driver. Among other approaches, the driver's fatigue level can be evaluated through face analysis using several typical visual cues on a human face [2]:
- Eye blinking frequency;
- Yawn frequency;
- Eye gaze movements;
- Facial expressions;
- Head movements.

II. DROWSINESS DETECTION TECHNIQUES

Face detection is a complex computer vision task due to the dynamic nature of human faces and their high degree of variability. Face detection techniques include, among others [3]:
- Top-down model-based approach (a face model is searched at different scale levels);
- Bottom-up feature-based approach (facial features are searched for in the image);
- Texture-based approach (examining the spatial distribution of the gray or color information);
- Neural network approach (sampling different regions for detecting faces and passing them to a neural network);
- Color-based approach (using the similarity of the face to skin color, and also to a face shape);
- Motion-based approach (using image subtraction to extract a moving region from a static background).
Different face detection techniques are characterized by different face detection rates. Analysis of a number of the most popular techniques in [3] led to the conclusion that the combination of the Viola-Jones technique and the PERCLOS method could yield an eye detection rate of up to 99% and a blinking detection rate of up to 97.8%, thus being superior compared to other approaches. This combination has been employed in the reported driver drowsiness detection system.

III. SYSTEM CONFIGURATION

The prototype Eye Tracking System for Drowsiness Detection (ETSDD) includes a dashboard-mounted commodity camera, a simple alarm board, and a processor (laptop) equipped with the developed software (Fig. 1).

Figure 1. ETSDD architecture

The system performs real-time processing of the input image stream so as to compute the level of fatigue of the driver. The analysis is based on counting the number of frames of the data stream in which the driver's eyes are closed. The result of the processing is sent to the alarm board, which activates an alarm signal when the drowsiness index exceeds a pre-specified parameter. Because the face and eye tracking depends on light intensity and face illumination, the background should not contain any other high-brightness objects or direct light sources. In order to effectively capture the face, the webcam is placed on the vehicle dashboard approximately 20 cm away from the driver's face. At this distance, the webcam captures most of the driver's face. The camera and processor positions in the car are shown in Fig. 2. The alarm board is installed in the car console close to the driver.

Figure 2. Webcam and computer on a car dashboard
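The frame-counting logic described above can be sketched as follows. This is a minimal Python illustration, not the actual system software; the function names, window size, and threshold are assumptions for the example.

```python
# Sketch of the drowsiness-index computation: the index is the fraction
# of recent frames in which the eyes were closed, and the alarm is
# activated when it exceeds a pre-specified threshold.
from collections import deque

def drowsiness_index(closed_flags):
    """Fraction of frames in the window where the eyes were closed."""
    return sum(closed_flags) / len(closed_flags)

def process_stream(eye_states, window_size=10, alarm_threshold=0.15):
    """eye_states: iterable of booleans (True = eyes closed in that frame).
    Yields (index, alarm_active) for each frame once the window is full."""
    window = deque(maxlen=window_size)
    for closed in eye_states:
        window.append(closed)
        if len(window) == window.maxlen:
            idx = drowsiness_index(window)
            yield idx, idx > alarm_threshold

# Example: ten frames with open eyes, then a long five-frame closure
states = [False] * 10 + [True] * 5
results = list(process_stream(states))
```

In a real deployment the boolean per-frame eye states would come from the image-processing pipeline described in Section IV.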
A commodity Logitech HD Pro C920 webcam [4] is employed for image acquisition. The camera uses Hi-Speed USB 2.0 and is connected to the processor, supplying a video stream with a resolution of 1920 x 1080 pixels. The processor (a commodity laptop or microprocessor board) converts the video signal into the IplImage format [5], grabs every frame of the input video, and performs the required image processing so as to determine in real time the state of the driver's eyes: open or closed. Based on the number of frames in which the eyes are open and closed, the processor calculates the drowsiness index and transfers the result to the alarm board. The board circuitry is connected to the processor via the serial communication port (Fig. 3).

Figure 3. Processor to alarm board communication

The alarm board (Fig. 4) is built around the PIC16F887 microcontroller starter kit [6].

Figure 4. Alarm board

The PIC receives the drowsiness level from the processor in an asynchronous mode in a frame with eight data bits and one stop bit, where the START bit is a 0 and the STOP bit is a 1 (Fig. 5).

Figure 5. Asynchronous communication format

The level of drowsiness is displayed to the driver by activating an appropriate number out of the five LED indicators available on the board. When the drowsiness reaches the highest, fifth level, all LEDs are lit and the PIC microcontroller activates the sound alarm.

IV. ALGORITHMS AND SOFTWARE

Fig. 6 shows the operation flow of the system.

Figure 6. Drowsiness detection flowchart

In the eye detection stage, the processor receives facial images and first adjusts their brightness and contrast. This helps to reduce the dependence of the system's accuracy on light sensitivity. In the next step, the top-down model approach is applied to detect the face region in order to narrow down the location of the eyes. If the input image does not contain the driver's face, the program continues to grab new input images from the webcam until a face is detected.
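The asynchronous frame format can be illustrated as follows. This is a Python sketch with a hypothetical helper name, not the actual firmware; it only shows how one byte is laid out on the serial line with the START and STOP bits described above.

```python
# Illustrative encoding of the asynchronous frame: one START bit (0),
# eight data bits (LSB first, as in standard UART), one STOP bit (1).
def uart_frame(byte_value):
    """Return the 10-bit frame for one byte, as a list of 0/1 bits."""
    data_bits = [(byte_value >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]  # START + data + STOP

# Drowsiness level 5 (binary 00000101) as it would appear on the line
frame = uart_frame(5)
```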
From there the eye region can be extracted. The system employs the Viola-Jones technique [7, 8] and the standard AdaBoost (Adaptive Boosting) training method [9] to perform fast and effective eye detection and extraction.
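The cascade structure underlying the Viola-Jones detector can be illustrated with a toy sketch. This is pure Python with hypothetical stand-in stage functions; the actual system uses OpenCV's trained cascades of boosted Haar-feature classifiers.

```python
# Toy attentional cascade: each stage is a (score_function, threshold) pair.
# A sub-window is accepted only if it passes every stage; most negative
# windows are rejected cheaply by the early stages.
def cascade_classify(window, stages):
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # rejected early, later stages never evaluated
    return True

# Hypothetical stages scoring the mean intensity of a window (a stand-in
# for real boosted Haar-feature classifiers).
mean = lambda w: sum(w) / len(w)
stages = [(mean, 0.2), (mean, 0.5)]

assert cascade_classify([0.9, 0.8, 0.7], stages) is True   # passes both stages
assert cascade_classify([0.1, 0.1, 0.1], stages) is False  # rejected at stage 1
```

The early-rejection property is what makes the real detector fast enough for real-time frame processing.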
In the first stage of the algorithm, Haar-like features (reminiscent of the Haar basis functions [10]) are applied to a sub-image to extract face features. Fig. 6 [8] shows some basic Haar features from the OpenCV [11] library and their application to the input image. The processor sums the values of the pixels under the black area(s); then the sum of all pixels in the white area(s) is calculated; finally, the sum of the white area(s) is subtracted from the sum of the black rectangle area(s), providing a single output value. For example, in Fig. 6, Type 2 and Type 3 Haar features are applied to face sub-images of 24x24 pixels to extract the eyes and nose areas.

A cascade of classifiers is constructed aiming to achieve increased detection performance while radically reducing the computation time. The key insight here is that smaller, more efficient boosted classifiers can be assembled in such a way as to reject many of the negative sub-windows while detecting almost all positive ones [8, 12]. It should be noted that the Viola-Jones method works well under different illumination conditions once the relevant images are available in the library and employed during the training. Fig. 8 shows an example of the successful extraction of the eye region from a face image.

Figure 6. Haar-type features and their application (Types 2 and 3)

When the features match the relevant areas, high output values (exceeding the specified threshold levels) are produced, thus indicating the detection of specific face parts. In order to increase the efficiency and speed of calculating the sums of pixels inside a rectangle, the Viola-Jones algorithm employs the so-called integral image technique. The integral image is a matrix where the value at a pixel (X, Y) is the sum of all pixels above and to the left of the coordinate (X, Y) (Fig. 7).
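The integral image and a two-rectangle feature can be sketched as follows. This is an illustrative pure-Python sketch with hypothetical helper names, not the OpenCV implementation used by the system.

```python
# Integral image: each cell holds the sum of all pixels above and to the
# left (inclusive), so any rectangular pixel sum costs four lookups.
def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]  # extra zero row/column
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w*h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)

# A two-rectangle (Type 2 style) feature: black column sum minus white
white = rect_sum(ii, 0, 0, 1, 3)  # left column:   1 + 4 + 7 = 12
black = rect_sum(ii, 1, 0, 1, 3)  # middle column: 2 + 5 + 8 = 15
feature_value = black - white     # 15 - 12 = 3
```

Any rectangular sum then costs four table lookups regardless of the rectangle size.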
This significantly reduces the time and effort needed to calculate the sums of pixels in the black and white regions when applying the Haar-like features.

Figure 7. Integral image coding

In a sub-window of 24x24 pixel base resolution, up to 160,000 Haar features may be required to detect the elements of interest in a face. However, only a few of them are actually useful for identifying the target facial areas. Therefore, it is important to choose the best among the 160,000+ features so as to improve the efficiency and reduce the processing time. This is where the AdaBoost classifier is applied. Effectively, it constructs a linear combination of weak classifiers to create a stronger classifier as:

H(x) = sign(α1·h1(x) + α2·h2(x) + ... + αT·hT(x)),

where h1, ..., hT are the weak classifiers and α1, ..., αT are their weights. During the classifier training stage, the weights α are initially given uniform values. All the features are then applied to sample images, both facial and non-facial. An error is recorded if a feature detects a wrong face object in a non-facial sample image, thus indicating a weaker classifier. The best features are then chosen for constructing a strong classifier from the weaker ones. AdaBoost combines these weak classifiers to improve the facial detection rate and reduce the recognition processing time; in addition, it also finds the best threshold values.

Figure 8. Eye region extraction

Percentage of Eyelid Closure (PERCLOS) is the most popular method for drowsiness detection. It mathematically defines the proportion of time during which the eyes are 80 to 100 percent closed [3, 13]. It monitors the slow closure of an eyelid rather than the fast blinking of the eyes. The driver's eye openness S is calculated as S = H/L, where H is the height and L is the length of the driver's eye. In the input video, each frame is classified as open or closed based on the measured S value. The PERCLOS value is then calculated as:

PERCLOS = (number of frames in which the eyes are at least 80% closed) / (total number of frames).

In this system, the recommended PERCLOS alarm threshold of 0.15 is used as the highest level of drowsiness [14]. Table 1 shows the drowsiness levels based on the PERCLOS thresholds (% of eye closure over a 3-minute interval).

TABLE 1.
DROWSINESS LEVELS BASED ON THE PERCLOS THRESHOLDS

Threshold 1 | S ≤ 3.75%          | Low drowsiness
Threshold 2 | 3.75% < S ≤ 7.5%   | Low drowsiness
Threshold 3 | 7.5% < S ≤ 11.25%  | Moderate drowsiness
Threshold 4 | 11.25% < S ≤ 15%   | Moderate drowsiness
Threshold 5 | 15% < S            | Severe drowsiness

When the fatigue level reaches the severe level, the system activates the alarm, thus alerting the driver to take appropriate action to avoid a potential accident.

1) Controlling the contrast and brightness of the input image. The processor receives an input image from the camera and adjusts its brightness and contrast so as to reduce the light
sensitivity and increase the accuracy of the system. This also helps to improve the efficiency of the face detection classification, which is sensitive to the brightness and contrast of the input image. When the input image is too bright or too dark, the processor program reduces or increases the brightness and contrast correspondingly; otherwise, it brings these two parameters to the pre-specified balance point. Fig. 9 shows an example of the result of the parameter adjustment.

Figure 9. Image before and after the brightness and contrast adjustment

2) Eye state detection using contour information. A contour defines an object's shape. When the eye region is extracted, the Contours function of OpenCV is employed to identify the shape of the eye iris area. The contour is derived using the color separation between the object of interest and the background. The image is converted to black and white to highlight the shape of the iris. Based on the extracted contour of the eye region, the eye height and width are calculated. Fig. 10 shows the results of calculating the eye width and height.

Figure 10. Detecting the eye state based on the contour information

3) Blinking time calculation. The eye blinking time is calculated and excluded from the PERCLOS computation. Thus the system distinguishes between closed eyes and blinking eyes on the basis of the duration of these actions (the time of a blink is relatively short, while that of a closed eye is significantly longer). The closed-eye and blinking-eye conditions are found by measuring the eye closure frequency. Fig. 11 shows the time-length difference between an eye blinking and being closed.

Figure 11. Distinguishing between closed and blinking eyes

4) Graphical User Interface (GUI). Fig. 12 shows the GUI with the following information displayed: the data history; the image captured from the webcam; the extracted information, including the level of drowsiness; the serial communication port setup; and the program setup.

Figure 12. System GUI

The data history area displays the captured process in real time as well as drowsiness-related data (Fig. 13).

Figure 13. Data history is displayed in real time and refreshed every 3 minutes

The image capture area displays the tracked face and eye, and also shows in its upper right corner the contour region of the tracked eye. The extracted information area shows the eye-related parameters and the number of the current frame shown in the captured image region (Fig. 14).

Figure 14. Extracted information area
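The combination of the PERCLOS measure with the blink exclusion of step 3 can be sketched as follows. This is a hypothetical Python illustration; the closure threshold and the blink-duration limit are assumed values for the example, not parameters taken from the paper.

```python
# Sketch of PERCLOS with blink exclusion: runs of closed frames no longer
# than max_blink_frames are treated as blinks and not counted, while
# longer closures contribute to the PERCLOS value.
def perclos(openness, closed_threshold=0.2, max_blink_frames=3):
    """openness: per-frame S = H/L values; a frame counts as 'closed'
    when S <= closed_threshold."""
    closed = [s <= closed_threshold for s in openness]
    total_closed = 0
    run = 0
    for flag in closed + [False]:  # trailing sentinel ends the final run
        if flag:
            run += 1
        else:
            if run > max_blink_frames:  # long closure: drowsiness, not a blink
                total_closed += run
            run = 0
    return total_closed / len(openness)

# A 2-frame blink (excluded) followed by a 5-frame closure (counted)
s_values = [0.5, 0.1, 0.1, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
value = perclos(s_values)  # 5 closed frames out of 10 -> 0.5
```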
When the system is initially switched on, the configuration is loaded into it, including the settings for the serial port, the data display time, the PERCLOS threshold, the lighting conditions, etc. If required, these parameters can easily be adjusted.

V. EXPERIMENTAL EVALUATION

The system was tested in various lighting conditions, including daylight, twilight, and nightlight, with different brightness and contrast parameters. In each lighting condition, four states of an eye (closed, 80% closed, 20% closed, and open) were tested (Tables 2, 3, and 4).

TABLE 2. DAYLIGHT CONDITION TESTING RESULTS

It can be seen from the results that the level of accuracy depends on the levels of brightness and contrast (Fig. 15). Indeed, in the daylight condition the ambient light strongly interferes with the image processing and pattern recognition. The worst case (zero accuracy) is caused by the ambient light interference as well as by inappropriate contrast and brightness parameters. It can be concluded from the experimental results that the standard (pre-set) values of the contrast and brightness for daylight operation of the system are to be in the vicinity of -10 to -20.

Fig. 15. Accuracy with respect to brightness and contrast in daylight

TABLE 3. TWILIGHT CONDITION TESTING RESULTS

The twilight accuracy plot for different contrast and brightness levels is shown in Fig. 16. It can be seen that the accuracy here is higher overall, since the illumination interference is reduced. Contrast and brightness values in the range of 0 to 5 can be chosen as the pre-set values.

Fig. 16. Accuracy with respect to brightness and contrast in twilight

TABLE 4. NIGHTLIGHT CONDITION TESTING RESULTS

The nightlight condition is the most challenging and the most important, as this is when drivers usually feel fatigued. It can be seen from Fig. 17 that the optimal contrast and brightness levels for nightlight are in the 10 to 20 range.

Fig. 17. Accuracy with respect to brightness and contrast in nightlight

Once the optimal values of the contrast and brightness were known, the experiments were repeated to evaluate the accuracy of detection of the various eye states (closed, 80% closed, 20% closed, and open). The results are shown in Table 5; it can be seen that the accuracy exceeds 95% for all eye states and lighting conditions.

TABLE 5. ACCURACY OF EYE DETECTION AT OPTIMAL BRIGHTNESS AND CONTRAST LEVELS

A test of the detection of eye blinking and the calculation of the blinking time showed the expected efficiency of the system: it correctly recognized eye blinking events and accurately counted the number of relevant frames (Fig. 18). The blinking time is then used in the PERCLOS algorithm as discussed above in Section IV.

Fig. 18. Detection of eye blinking and calculation of the blinking time

Finally, the complete system was tested for overall operation with drowsiness detection based on a pre-assigned PERCLOS threshold. Once the drowsiness level exceeds the threshold value, the alarm is activated. Fig. 19 shows various eye closure states and alarm activation/deactivation for various pre-assigned levels. The continuous red line is the real-time evaluation of the percentage of the driver's eye being open, while the black line shows the drowsiness levels from 1 to 5. Depending on the system settings and the level of drowsiness, the LED indicators and the sound alarm are activated when the drowsiness approaches a dangerous level. Once the level of drowsiness is reduced, the alarm is deactivated after a short programmable delay.

VI. CONCLUSION

The implemented prototype Eye Tracking System for Drowsiness Detection takes advantage of the Viola-Jones algorithm and the PERCLOS methodology for successful detection of drowsiness of a vehicle driver (or some other machine operator). The accuracy of the eye state detection is in excess of 95% for the analyzed lighting conditions. The system provides a user-friendly GUI. It is characterized by rather compact hardware requirements, thus making it possible to implement it on a mid-range microprocessor or FPGA board. Another considered option is to progress towards a complete implementation of the system on a standard commodity smartphone, where all the required software and hardware tools (camera, processor, memory, display, alarm, controls, communication facilities, operating system, various applications, etc.) are already available, thus making it easy to progress towards a new app that can be offered to the public.
REFERENCES

[1] "Driver fatigue and road accidents," The Royal Society for the Prevention of Accidents, UK, June 2011, 4 pp.
[2] M. J. Flores, J. M. Armingol and A. de la Escalera, "Real-time drowsiness detection system for an intelligent vehicle," IEEE Intelligent Vehicles Symposium, The Netherlands, June 4-6, 2008, pp. 637-642.
[3] J. Jimenez-Pinto and M. Torres-Torriti, "Driver alert state and fatigue detection by salient points analysis," IEEE International Conference on Systems, Man, and Cybernetics, USA, October 2009, pp. 455-461.
[4] Logitech HD Pro Webcam C920. [Online]. Available: http://www.logitech.com/en-nz/product/hd-pro-webcam-c920 (accessed Aug. 12, 2014).
[5] G. Agam, "Introduction to programming with OpenCV," Department of Computer Science, Illinois Institute of Technology, 2006. [Online]. Available: http://www.cs.iit.edu/~agam/cs512/lect-notes/opencvintro/opencv-intro.html (accessed Aug. 21, 2014).
[6] PIC16F887, Microchip. [Online]. Available: http://www.microchip.com/wwwproducts/devices.aspx?product=pic16F887 (accessed Aug. 21, 2014).
[7] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," IEEE Computer Vision and Pattern Recognition Conference, 2001, vol. 1, pp. 511-518.
[8] P. Viola and M. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, 2004, pp. 137-154.
[9] Y. Freund and R. E. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, 1999, pp. 771-780.
[10] Haar Function, Wolfram MathWorld. [Online]. Available: http://mathworld.wolfram.com/haarfunction.html (accessed Aug. 21, 2014).
[11] OpenCV open source computer vision. [Online]. Available: http://opencv.org/ (accessed Aug. 21, 2014).
[12] J. Ren, N. Kehtarnavaz and L. Estevez, "Real-time optimization of Viola-Jones face detection for mobile platforms," 7th IEEE Dallas Circuits and Systems Workshop, Oct. 2008, pp. 1-4.
[13] L. Tijerina, W. W. Wierwille, M. J. Goodman, S. Johnston, D. Stoltzfus and M. Gleckler, "A preliminary assessment of algorithms for drowsy and inattentive driver detection on the road," U.S. Department of Transportation, National Highway Traffic Safety Administration, 1998, 49 pp.
[14] J.-F. Xie, M. Xie and W. Zhu, "Driver fatigue detection based on head gesture and PERCLOS," International Conference on Wavelet Active Media Technology and Information Processing, China, Dec. 2012, pp. 128-131.

Figure 19. Alarm activation/deactivation for various levels of drowsiness