Topic: Facial Keypoints Detection


Project Proposal
Chongzhao Mao

1. Introduction

The objective of this task[1] is to predict keypoint positions on face images. This can be used as a building block in several applications, such as tracking faces in images and videos, analyzing facial expressions, detecting dysmorphic facial signs for medical diagnosis, and biometrics / face recognition.

Detecting facial keypoints is a very challenging problem. Facial features vary greatly from one individual to another, and even for a single individual there is a large amount of variation due to 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but there remain many opportunities for improvement.

The data set for this competition was graciously provided by Dr. Yoshua Bengio of the University of Montreal. The tutorial was developed by James Petterson.

2. Data

The input image is given in the last field of the data files and consists of a list of pixels (ordered by row), as integers in the range 0 to 255. The images are 96x96 pixels.

The data[2] are given in three parts:

1) training.csv: This file lists the 7049 training images. Each row contains the (x,y) coordinates of the 15 keypoints, followed by the image data as a row-ordered list of pixels.
2) test.csv: This file lists the 1783 test images. Each row contains an image ID and the image data as a row-ordered list of pixels.
3) SampleSubmission: This file lists the keypoints to predict. Each row contains a row ID, image ID, feature name, and location.

3. Features (Keypoints)

Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 keypoints, which represent the following elements of the face: left_eye_center, right_eye_center, left_eye_inner_corner, left_eye_outer_corner, right_eye_inner_corner, right_eye_outer_corner, left_eyebrow_inner_end, left_eyebrow_outer_end, right_eyebrow_inner_end, right_eyebrow_outer_end, nose_tip, mouth_left_corner, mouth_right_corner, mouth_center_top_lip, mouth_center_bottom_lip.

Left and right here refer to the point of view of the subject. In some examples, some of the target keypoint positions are missing (encoded as missing entries in the csv, i.e., with nothing between two commas).

4. Evaluation & Baseline

The evaluation is based on Root Mean Squared Error (RMSE). I can test my results on the Kaggle website, which reports a score and a rank.
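As a rough illustration of the data format and metric described above, the sketch below parses one row and scores a prediction in Python (used here only for illustration; the project itself plans to use Torch7 and Lua). It assumes that the pixel field is space-separated, that the image column is named Image, and that keypoint columns are named like left_eye_center_x. None of these details are stated above, and the helper names are hypothetical. Skipping missing coordinates when computing RMSE is also an assumption made for local validation, not a description of how Kaggle scores a submission.

```python
import csv
import io
import math

IMAGE_SIDE = 96  # images are 96x96 pixels, per the data description


def parse_image(pixel_field):
    """Turn the row-ordered pixel string into a 96x96 list of rows.

    Assumption: pixel values are separated by whitespace.
    """
    values = [int(v) for v in pixel_field.split()]
    expected = IMAGE_SIDE * IMAGE_SIDE
    if len(values) != expected:
        raise ValueError("expected %d pixels, got %d" % (expected, len(values)))
    return [values[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]


def parse_keypoints(row, image_column="Image"):
    """Read keypoint coordinates from a CSV row; empty fields become None (missing).

    Assumption: the image column is called "Image" and every other column
    holds one keypoint coordinate.
    """
    return {name: (float(text) if text.strip() else None)
            for name, text in row.items() if name != image_column}


def rmse(predicted, actual):
    """Root mean squared error over the coordinates that are present in `actual`."""
    squared = [(predicted[name] - value) ** 2
               for name, value in actual.items() if value is not None]
    if not squared:
        raise ValueError("no labelled coordinates to score")
    return math.sqrt(sum(squared) / len(squared))


# Toy CSV with three keypoint columns; nose_tip_x is missing
# (nothing between two commas), and the image is uniformly grey.
sample = io.StringIO(
    "left_eye_center_x,left_eye_center_y,nose_tip_x,Image\n"
    "66.0,39.0,," + " ".join(["128"] * (IMAGE_SIDE * IMAGE_SIDE)) + "\n"
)
row = next(csv.DictReader(sample))
image = parse_image(row["Image"])
truth = parse_keypoints(row)

# A made-up prediction; the missing nose_tip_x is ignored by rmse().
guess = {"left_eye_center_x": 63.0, "left_eye_center_y": 43.0, "nose_tip_x": 48.0}
score = rmse(guess, truth)
```

In this toy case the two labelled coordinates are off by 3 and 4 pixels, so the score is the square root of (9 + 16) / 2.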

A baseline is provided on Kaggle, which is written in R with a score.[3]

5. Topic Choosing Reason

1) This is an educational-purpose data set, and its format is clear.
2) The data set was uploaded to Kaggle in 2013, and the competition will end on Dec 31. The data is therefore fairly new, unlike data sets from around 2000 or earlier. On the other hand, the competition has lasted for several years, so there are many resources for me to learn from.
3) I have a strong interest in computer vision. The data set is open-ended: people can use different image analysis and computer vision techniques to preprocess the image data, and this is a good starting point for me to apply deep learning algorithms to computer vision.
4) The competition on Kaggle provides an easy way to test prediction results, giving me a score based on RMSE and a rank among all entries, so that I can better judge the performance of my algorithm.

6. Tools

The project will use Torch7[4], an open source framework. Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. I chose Torch because it is used by the Facebook AI Research group and Google DeepMind, which suggests that it is stable and robust.

I also considered Theano and Caffe. However, Theano was developed in a research lab, and it has so many bugs that I would have to deal with bug issues all the time; some of them have still not been solved. Caffe requires a GPU and an installation of CUDA, but my MacBook does not have a GPU. Without a GPU the framework would be a nightmare, because it was optimized specifically for GPU use.

In the end, I chose Torch. Its language, Lua, is easy to learn, and it is a mature framework with many resources and libraries to learn from.

7. Milestone

1) Progress Report: By the progress report, I will finish learning Lua, get familiar with Torch's deep learning libraries, preprocess the image data, and start choosing the deep learning algorithm.
2) Final Report: By the final report, I will finish the code implementation, network training, prediction, adjustments, and improvements.

8. Available Papers & Reports

1) Yue Wang and Yang Song. Stanford University, CS229 Machine Learning, class project report.
Title: Facial Keypoints Detection
Link: %20Song,Facial%20Keypoints%20Detection.pdf
2) Abheet Aggarwal. Indian Institute of Technology Kanpur, CS365 Artificial Intelligence, class project report.
Title: Facial Keypoint Detection
Link:
3) Mikko Haavisto. Lappeenranta University of Technology, Bachelor Thesis.
Title: Deep Generative Models For Facial Keypoints Detection
Link: Deep_Generative_Models_for_Facial_Keypoints_Detection.pdf?sequence=2

References:
[1]
[2]
[3]
[4]
