Facial Landmark Detection using Active Shape Models


Departament de Teoria del Senyal i les Comunicacions
Escola d'Enginyeria de Terrassa
Universitat Politècnica de Catalunya

Facial Landmark Detection using Active Shape Models

Author: Gonzalo Lopez Lillo
Professor: Josep Ramon Morros
Barcelona, October 2014

Acknowledgments

I would like to thank my brother, who encouraged me to study this degree and to do this project in English. A special appreciation to the blonde who lives in the attic. I am thankful to my team for their great human quality. Also a special appreciation to the coffee girl with a flower in her head. And last, but not least, a special mention to Ramon Morros, who gave me the opportunity to be here.

Contents

Illustrations
Chapter 1: Introduction
    1.1 Introduction
    1.2 Project objectives
    1.3 Some clarifications of the project
    1.4 Project description
Chapter 2: History and Literature Review
    2.1 Active Shape Model
    2.2 Active Shape Model with SIFT
    2.3 MARS
    2.4 Active Shape Model with SIFT and MARS
Chapter 3: Active Shape Model
    3.1 Aligning the Train Set
    3.2 The Shape Model
        3.2.1 PCA in the Shape Model
    3.3 Classic Active Shape Model
    3.4 Multi-resolution
    3.5 The New Active Shape Model
        3.5.1 HAT & SIFT
            Scale-Invariant Feature Transform (SIFT)
            Histogram Array Transform (HAT)
    3.6 Multivariate Adaptive Regression Splines (MARS)
Chapter 4: System Description
    4.1 Block Diagram
    4.2 Face Detector
    4.3 Database model ASM
    4.4 The ASM model
    4.5 Init Models and Detector
        4.5.1 Parameters bmax and neigs
    4.6 Align the Shape
        4.6.1 Align with the Rectangle
        4.6.2 Align with the Eyes
        4.6.3 Align with the Eyes and Mouth
    4.7 Select ASM Model
    4.8 ROI Shape to frame
    4.9 Mouth Identification
Chapter 5: Database
    5.1 Facial Dataset
    5.2 Open and closed mouth dataset
Chapter 6: Open or Closed Mouth State Detection
    6.1 Active Shape Model in Mouth
    6.2 Vector of features
    6.3 Classifier
        6.3.1 Training
        6.3.2 Classification
Chapter 7: Experimental Results
    7.1 NEW ASM
        7.1.1 The me17 measure
        7.1.2 Compare the different start shape models
        7.1.3 Graphs for comparing models
            Image error with me17
            Landmark error with me17
            Classic ASM vs New ASM
            Empirical cumulative distribution function
        7.1.4 Visual evaluation with empirical test
    7.2 Mouth State Detector
        7.2.1 Confusion Matrix
        7.2.2 The ROC curve
        7.2.3 The ROC curve for dataset validation
        7.2.4 Results
Chapter 8: Conclusions and Possible Future Research
    8.1 Conclusions
    8.2 Possible Future Research
Appendix A: Face landmarks
Bibliography

Illustrations

Figure 1: Face image with landmarks. Image from the BioID database.
Figure 2: Aligning the position, rotation and scale of the shape.
Figure 3: Algorithm for aligning the train set.
Figure 4: The mean shape.
Figure 5: Varying the first three shape parameters between -3\sqrt{\lambda_i} and 3\sqrt{\lambda_i}.
Figure 6: Applying an Active Shape Model to a face and iterating until convergence.
Figure 7: Three resolutions of the image.
Figure 8: Block diagram of the New Active Shape Model.
Figure 9: Block diagram of SIFT.
Figure 10: An area in the patch map used to extract a gradient.
Figure 11: Converting the gradient with 8 bins to a histogram.
Figure 12: General block diagram of the software.
Figure 13: Block diagram of the software in small modules.
Figure 14: Input image before face detection, and the ROI image after face detection.
Figure 15: Block diagram of the model training used by Stephen Milborrow and Fred Nicolls to create a frontal model.
Figure 16: Aligning the ROI with the rectangle model.
Figure 17: Aligning the ROI with the eye detector.
Figure 18: Aligning the ROI with the eye and mouth detectors.
Figure 19: Selecting and adjusting the size of the start shape by triangulating the eye positions with the mouth position.
Figure 20: Fitting the shape using HAT and MARS.
Figure 21: The BioID face database.
Figure 22: Frames from "Maria Konnikova: How to think like Sherlock Holmes", Big Think channel.
Figure 23: Frames from "The Importance of Knowing", Big Think channel.
Figure 24: Frames from "How to Revise like Sherlock (Mind Palace)", Maddie Moate channel.
Figure 25: Frames from "Sam Harris BigThink (ALL)", Big Think channel.
Figure 26: Telenotícies TV3 (TN).
Figure 27: Landmarks located in the mouth.
Figure 28: The distance between the two inner-lip landmarks and the mouth-width distance.
Figure 29: Histograms with Gaussian fits. The left histogram is for the closed mouth and the right histogram is for the open mouth.
Figure 30: The two histograms with their Gaussian fits in the same plane, used to find the intersection between the Gaussians and set the threshold.
Figure 31: Images with many landmark errors found in the graph results. Ordered left to right and top to bottom, these images correspond to numbers 259, 282, 392, 901, 423, 640, 740 and 803 of Graph 2.
Figure 32: BioID landmarks.
Figure 33: Classic ASM with 68 landmarks.
Figure 34: New ASM with 77 landmarks.


Chapter 1

1. Introduction

1.1 Introduction

In today's digital world the consumer electronics industry is undergoing a massive transformation, as consumers become more connected than ever to a growing array of digital products and devices, anywhere, anytime. Devices such as smartphones and tablets are commonplace, and the amount of visual data they generate, both video and images, is increasing exponentially. This creates the need to know how to deal with the information that data contains. One of the most important pre-processing issues is how to retrieve the information related to people from those multimedia devices. Learning how to use this information lets us interact with systems using cameras to automate processes.

One of the most relevant features of a person is their face. In the last couple of years there have been extensive efforts around facial detection and recognition; the key in these algorithms is the detection of facial features (eyes, mouth, nose, etc.). By effectively identifying these facial features we can improve facial recognition techniques, for example estimating the pose of the face or recognizing emotions.

This project focuses on one of the tools used to localise facial features, specifically on some variations of the Active Shape Model [1]. ASM is a method based on a model built from interest points; it uses algorithms that try to find the maximum coincidence between the model and the image data. The information obtained from ASM will be used to recognize expressions. This information has several applications; for example, eye detection could let you control your smartphone with your eyes. Here the focus will be on how to use this information to create an open or closed mouth state detector, which could be used to improve sound recording in videos.

1.2 Project objectives

The main objective is to adapt the Stasm code by Stephen Milborrow and Fred Nicolls [11] for facial recognition and extraction of facial features to ImagePlus (the new C++ development platform of the Video and Image Processing Group at UPC). In order to achieve the main objective the following tasks have been completed. All the objectives are focused on relating images to faces and facial features:

- Given an image with a face, find the positions of the face features. The set of local points describing each face, as shown in Figure 1, are called landmarks.

Figure 1: Face image with landmarks. Image from the BioID database.

- To compare the new Active Shape Model with the old one. The old model uses the SIFT [10] descriptor and PCA [16] and has 68 landmarks; the New Active Shape Model uses the SIFT descriptor and MARS [18] and has 77 landmarks.

- To recognize facial expressions, specifically open and closed mouth detection, using the Active Shape Model and a binary classifier.

1.3 Some clarifications of the project

The feature extraction process is extremely complex and there are many techniques, and obviously this project does not start from scratch. In order to avoid confusion or pitfalls, it is necessary to point out some aspects that are not included in this project:

- Face detection. The face position must be located before using the Active Shape Model. The Viola-Jones [17] detector of OpenCV [14] is used to check whether any faces are present in the image.
- Only frontal face images are accepted. Our ASM implementation does not contain any side-view models.
- Face recognition. This project does not identify the people present in an image.

1.4 Project description

The project is divided into seven main parts:

- State of the art, a brief review of the literature.
- ASM, SIFT, MARS and HAT theoretical contents.
- The software structure.
- Database validation set and test set.
- Open and closed mouth detection.
- Summary of results and a comparison with published results.
- Conclusions, discussion and possible future research.

Chapter 2

2. History and Literature Review

2.1 Active Shape Model

This statistical approach is used for shape modelling and feature extraction. ASM was introduced by Cootes [1]; in the same document he also presented AAM. The technique is able to estimate the shape of an object with a high level of accuracy. In addition, there are many variants of the algorithm with published results, which establishes a comparative framework that has allowed the development and improvement of different techniques.

In October 2004, Fei Zuo [2] and Peter H. N. de With created a face recognition system. They used a deformable shape model with Haar-wavelet based local texture attributes. In 2005, Kwok-Wai [3] proposed an ASM able to adapt to facial images under different orientations; to represent the face in different poses, one model is used per view. In 2006, Mohammad H. [4] proposed extracting the facial features with RGB information, using an enhanced active shape model.

2.2 Active Shape Model with SIFT

There are many proposals related to the usage of SIFT inside ASM; we mention just a few. In 2007, Kanaujia and Metaxas [7] used SIFT descriptors with multiple shape models clustered on different face poses. In 2009, Zhou [5] built an automatic landmarker with a SIFT descriptor and Mahalanobis distances; Zhou and Petrovska improved the eye and mouth distances. In the same year, Li Z. [6] searched for facial features using statistical models and SIFT descriptors. Finally, in 2011 Belhumeur used SIFT descriptors with SVM and RANSAC [8].

2.3 MARS

Multivariate Adaptive Regression Splines (MARS) is a general-purpose regression technique introduced by Jerome Friedman in 1991. Leathwick et al. in 2005 used MARS to predict the distributions of New Zealand's freshwater diadromous fish. Vogel et al. used MARS to show that mRNA concentration can explain two thirds of the protein abundance variation in a human cell line [9].

MARS has demonstrated good performance in different applications, although it is not well known in the image processing community.

2.4 Active Shape Model with SIFT and MARS

In 2014, Stephen Milborrow and Fred Nicolls used the ASM with a SIFT descriptor and MARS [11]. They used a HAT descriptor, a simplified version of SIFT: the HAT descriptor is essentially an unrotated SIFT descriptor with a fixed scale. This project is based on Milborrow's model. From now on, this model will be referred to as the New Active Shape Model [11].

Chapter 3

3. Active Shape Model

The Active Shape Model is used to represent objects in images. The shape of the object is represented by a set of n points; in the case of faces these points are two-dimensional. These points are called landmarks.

    x = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)^T        (Equation 3-1)

In order to choose the best landmark points it is necessary to be able to find the same points in other images. They could be placed at clear corners of object boundaries; in the case of faces, landmarks are usually located in places that lie on boundaries and resemble the shape of the face with the most precision. The objective is to build a model that allows us to create new shapes and to synthesise shapes similar to those in a training set. The training set comes from hand annotation, though there are automatic landmarking systems.

3.1 Aligning the Train Set

It is necessary to align the train set so that all shapes are in the same reference system. By aligning the position, rotation and scale of each shape, we achieve the minimum distance between the shapes.

Figure 2: Aligning the position, rotation and scale of the shape.

The procedure is as follows (a code sketch is given after the list):

1. Translate each example so that its centre of gravity is at the origin.
2. Choose one example as an initial estimate of the mean shape and scale it so that |x̄| = 1.
3. Record this first estimate as x̄_0 to define the default reference frame.
4. Align all the shapes with the current estimate of the mean shape.
5. Re-estimate the mean from the aligned shapes.
6. Apply constraints on the current estimate of the mean by aligning it with x̄_0 and scaling so that |x̄| = 1.
7. If not converged, return to step 4.

Figure 3: Algorithm for aligning the train set.
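As an illustration, the alignment loop above can be sketched in a few lines of NumPy (a minimal sketch, not the thesis's C++ implementation; shapes are assumed to be (n, 2) arrays of landmark coordinates, and a fixed number of iterations stands in for a convergence test):

import numpy as np

def align(shape, target):
    # Similarity-align `shape` to `target` (both centred on the origin):
    # closed-form Procrustes fit of the scale and rotation.
    denom = (shape ** 2).sum()
    a = (shape * target).sum() / denom                    # s*cos(theta)
    b = (shape[:, 0] * target[:, 1]
         - shape[:, 1] * target[:, 0]).sum() / denom      # s*sin(theta)
    rot = np.array([[a, -b], [b, a]])
    return shape @ rot.T

def align_training_set(shapes, n_iter=10):
    shapes = [s - s.mean(axis=0) for s in shapes]         # step 1: centre
    mean = shapes[0] / np.linalg.norm(shapes[0])          # step 2: |mean| = 1
    ref = mean.copy()                                     # step 3: reference frame
    for _ in range(n_iter):                               # step 7: iterate
        shapes = [align(s, mean) for s in shapes]         # step 4
        mean = np.mean(shapes, axis=0)                    # step 5
        mean = align(mean, ref)                           # step 6: constrain
        mean /= np.linalg.norm(mean)
    return shapes, mean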

3.2 The Shape Model

After aligning all the shapes and before building the shape model, we have vectors of 2n dimensions. With them we can generate new examples that are similar to those in the original training set, and we can examine new shapes to decide whether they are valid examples. In order to reduce the dimensionality of the data from 2n, we use Principal Component Analysis (PCA).

3.2.1 PCA in the Shape Model

In our case each shape has n = 77 landmarks in two dimensions:

    x_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, \ldots, x_{in}, y_{in})^T        (Equation 2)

We estimate the mean shape as the average of the n_shapes aligned training shapes:

    \bar{x} = \frac{1}{n_{shapes}} \sum_{i=1}^{n_{shapes}} x_i        (Equation 3)

Figure 4: The mean shape.

The covariance matrix S of the training shape points is computed:

    S = \frac{1}{n_{shapes} - 1} \sum_{i=1}^{n_{shapes}} (x_i - \bar{x})(x_i - \bar{x})^T        (Equation 4)

Then the eigenvectors \phi_i and corresponding eigenvalues \lambda_i of S are found. If \Phi contains the t eigenvectors corresponding to the largest eigenvalues, we can approximate any shape of the training set, x, by using

    x \approx \bar{x} + \Phi b        (Equation 5)

where b is a vector given by

    b = \Phi^T (x - \bar{x})        (Equation 6)

and t is the number of eigenvectors considered. By applying the limits -3\sqrt{\lambda_i} \le b_i \le 3\sqrt{\lambda_i} to the parameters b_i, we guarantee that the shape generated is similar to those in the original training set.

Figure 5 presents the effect of varying the first three shape parameters between -3\sqrt{\lambda_i} and 3\sqrt{\lambda_i} from the mean value, leaving all other parameters at zero.

Figure 5: Varying the first three shape parameters between -3\sqrt{\lambda_i} and 3\sqrt{\lambda_i}.
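A minimal sketch of building this PCA shape model and generating a plausible new shape, under the same assumptions as the alignment sketch above (aligned shapes flattened to vectors (x_1, y_1, ..., x_n, y_n); function names are illustrative):

import numpy as np

def build_shape_model(aligned_shapes, t=20):
    X = np.stack([s.ravel() for s in aligned_shapes])   # (n_shapes, 2n)
    x_mean = X.mean(axis=0)                             # Equation 3
    S = np.cov(X, rowvar=False)                         # Equation 4
    eigvals, eigvecs = np.linalg.eigh(S)                # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:t]               # keep the t largest
    return x_mean, eigvals[order], eigvecs[:, order]    # x_mean, lambda, Phi

def generate_shape(x_mean, eigvals, Phi, b):
    lim = 3 * np.sqrt(eigvals)                          # |b_i| <= 3*sqrt(lambda_i)
    b = np.clip(b, -lim, lim)
    return x_mean + Phi @ b                             # Equation 5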

3.3 Classic Active Shape Model

The ASM fits the model points to a new image using an iterative technique. A search is made around the current position of each point to find a nearby point which best matches the model. The parameters of the shape model controlling the point positions are then updated to move the model points closer to the points found in the image.

Before calculating b for the new image, we need to find T. T is a similarity transform which maps the model space into the image space:

    T(x, y) = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} + \begin{pmatrix} s\cos\theta & -s\sin\theta \\ s\sin\theta & s\cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}        (Equation 7)

The transform is needed because the face shape x could be anywhere in the image space, so the image has to be mapped into the model's reference frame in order to work from the same origin. Cootes and Taylor [1], section 4.8, describe an iterative algorithm for finding b and T in

    x \approx T(\bar{x} + \Phi b)        (Equation 8)

Instead of starting from scratch we start from an approximation. Once the parameters b have been chosen, the shape of the object is defined in object-centred coordinates, and we can create an instance X of the model in the image frame. Nevertheless, model points do not always settle on the strongest edge in the locality; they might represent a weaker secondary edge or another image structure.

The best approach is to use the training set to learn what to look for in the target image. For this we build statistical models of the image structure around each point and, during the search, simply find the points which best match the model.

During the training stage we build a model for each landmark: we sample along a profile k pixels on either side of the model point in the i-th training image. We have 2k+1 samples, which can be put in a vector g_i. We normalise the sample:

    g_i \rightarrow \frac{g_i}{\sum_j |g_{ij}|}        (Equation 9)

We create a mean profile \bar{g} and a covariance matrix S_g:

    \bar{g} = \frac{1}{n_{shapes}} \sum_{i=1}^{n_{shapes}} g_i        (Equation 10)

    S_g = \frac{1}{n_{shapes} - 1} \sum_{i=1}^{n_{shapes}} (g_i - \bar{g})(g_i - \bar{g})^T        (Equation 11)

The assumption is that the profiles are approximately distributed as a multivariate Gaussian, and thus can be described by their mean and covariance matrix. The distance between a search profile g and the model profile \bar{g} is calculated using the Mahalanobis distance:

    \mathrm{Distance} = (g - \bar{g})^T S_g^{-1} (g - \bar{g})        (Equation 12)

Minimising this distance is equivalent to maximising the probability that g comes from the distribution. This process is repeated for each landmark; Cootes and Taylor [1], section 4.8, describe an iterative algorithm.
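The profile model and the Mahalanobis matching can be sketched like this (illustrative only; each training profile is assumed to be a length-(2k+1) vector sampled along the whisker, and a pseudo-inverse guards against a singular covariance):

import numpy as np

def normalise(g):
    return g / np.abs(g).sum()                       # Equation 9

def train_profile_model(profiles):
    G = np.stack([normalise(g) for g in profiles])
    g_mean = G.mean(axis=0)                          # Equation 10
    S_g = np.cov(G, rowvar=False)                    # Equation 11
    return g_mean, np.linalg.pinv(S_g)

def profile_distance(g, g_mean, S_inv):
    d = normalise(g) - g_mean
    return d @ S_inv @ d                             # Equation 12

# During the search, the candidate position whose profile has the smallest
# distance is chosen:
# best = min(candidates, key=lambda g: profile_distance(g, g_mean, S_inv))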

Figure 6: Applying an Active Shape Model to a face and iterating until convergence.

3.4 Multi-resolution

In order to improve the efficiency and robustness of the algorithm, a multi-resolution framework is implemented. This means first searching in a coarse image and then refining the result on images of finer resolution.

Figure 7: Three resolutions of the image.

The search starts at the coarsest resolution; when the fit at the current resolution has converged, the search moves on to the next, finer resolution, and it stops after converging at the finest level. Techniques for testing convergence are described by Cootes and Taylor [1] in section 7.3.
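The coarse-to-fine loop can be sketched as below (an illustrative skeleton: search_at_level stands in for a full single-level ASM search to convergence, shapes are NumPy arrays, and the pyramid is built with OpenCV's pyrDown, which halves the resolution at each level):

import cv2

def multi_resolution_search(image, start_shape, search_at_level, n_levels=3):
    pyramid = [image]
    for _ in range(n_levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))     # coarser and coarser
    shape = start_shape / 2 ** (n_levels - 1)        # map shape to coarsest level
    for img in reversed(pyramid):                    # coarsest level first
        shape = search_at_level(img, shape)          # refine at this resolution
        shape = shape * 2                            # map to the next finer level
    return shape / 2                                 # undo the final doubling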

3.5 The New Active Shape Model

The New Active Shape Model uses the classic ASM machinery to locate landmarks. However, it uses a simplified form of SIFT descriptors for template matching, replacing the 2D gradient descriptor profiles. This simplified form of SIFT is called HAT. It also incorporates MARS to measure descriptor matches, replacing the Mahalanobis distance.

Figure 8: Block diagram of the New Active Shape Model (HAT, MARS, aligning the face, shape model, image resolution).

In the next section the new descriptor, HAT, is described, along with the MARS algorithm used to measure descriptor matches.

3.5.1 HAT & SIFT

Scale-Invariant Feature Transform (SIFT)

Figure 9: Block diagram of SIFT (scale-space extrema detection, keypoint localization, orientation assignment, keypoint descriptor, correspondence of points of interest).

SIFT is an algorithm to detect and describe local features of images. Its main applications are object detection, recognizing people, video tracking and 3D modelling. The algorithm locates points within images based on the amount of information surrounding each point. This local information can be edges, textures or stable transformation points. The original SIFT is made up of the following processes:

1. Scale-space extrema detection. The Difference of Gaussians (DoG) is searched over different sizes and regions for local extrema over space and scale. This produces a strong response at corners whose size fits the scale.

2. Keypoint localization. The points of interest are located, refined to sub-pixel accuracy, and a threshold is used to discard those which are not relevant.

3. Orientation assignment. An orientation is assigned to each point of interest to ensure invariance with respect to image rotation. To do this, the neighbouring points of each interest point are taken and the magnitude and direction of their gradients are calculated. A histogram of these directions is then built, weighted by the gradient magnitudes. The largest peak in the histogram indicates the orientation of the interest point. If there are other peaks above 80% of the highest one, they are used to create additional points of interest at the same position and scale but with different orientations.

4. Keypoint descriptor. For each point a neighbourhood of 16x16 points is taken. It is divided into 4x4 sub-blocks and for each sub-block an orientation histogram is created.

5. Correspondence between points of interest. The correspondence between the points of interest of two images is obtained through a search for the nearest points in the space of descriptors.

Histogram Array Transform (HAT)

HAT is the descriptor used in the new version of ASM (New ASM). It is a simplified form of SIFT. This version takes advantage of work already done in the ASM pipeline to simplify the descriptor's task; it does not repeat processes, and is therefore a smaller version of SIFT.

SIFT first makes a scale-space analysis of the image to discover which points in the image are keypoints, and the intrinsic scale of a descriptor is determined in an additional pre-processing step. In contrast, in ASM the keypoints are predetermined: they are the facial landmarks, and the face is scaled to a constant size before the ASM search begins, so this analysis is unnecessary. Likewise, SIFT analyses the local structure of the image around the point of interest to determine the gradient orientation; in ASM, before beginning the search, we rotate the entire image so the eyes are horizontal, so SIFT's automatic orientation assignment is also unnecessary.

When SIFT localizes each pixel in the patch, the pixel must be mapped to a position in the array of histograms. To do this mapping, SIFT scales and rotates patches. HAT does not use that scaling or rotation; its mapping depends only on the histogram array dimensions and the image patch width.

This step generates a descriptor for the local image region that is highly distinctive and as invariant as possible to the remaining variations, such as changes of illumination. The descriptor is created by computing the gradient magnitudes in a rectangular patch around the image point of interest.

Figure 10: An area in the patch map used to extract a gradient.

We use a 15x15 patch and an array of histograms of dimension 4x5; these dimensions were determined during training. We use 8 bins per histogram, that is, 360º/8 = 45 degrees per bin. The array of histograms is stored internally as a vector of 4x5x8 = 160 elements.

Figure 11: Converting the gradient with 8 bins to a histogram.

The gradient magnitude at a pixel is added to the histogram bin designated for its orientation, down-weighted for smoothness by the Gaussian distance of the pixel from the centre of the patch. Since a small change in the orientation at a pixel could otherwise cause an abrupt jump in assignment from one histogram bin to another, each gradient's weight is shared between neighbouring bins; for example, a gradient with a 45º orientation is shared equally between the bin for 0-45º and the bin for 45-90º.
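An illustrative HAT-style descriptor is sketched below (a simplification, not Milborrow's implementation: the 15x15 patch, 4x5 histogram grid and 8 bins follow the text, but the Gaussian down-weighting and the sharing of a gradient between neighbouring bins are omitted here):

import numpy as np

def hat_descriptor(patch, grid=(4, 5), n_bins=8):
    gy, gx = np.gradient(patch.astype(float))            # image gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)          # orientation in [0, 2*pi)
    h, w = patch.shape
    hist = np.zeros(grid + (n_bins,))
    for r in range(h):
        for c in range(w):
            cell = (min(r * grid[0] // h, grid[0] - 1),  # histogram cell
                    min(c * grid[1] // w, grid[1] - 1))
            b = int(ang[r, c] / (2 * np.pi) * n_bins) % n_bins
            hist[cell][b] += mag[r, c]                   # weight by magnitude
    return hist.ravel()                                  # 4 x 5 x 8 = 160 values

# hat_descriptor(np.random.rand(15, 15)) returns a length-160 vector.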

3.6 Multivariate Adaptive Regression Splines (MARS)

MARS is a type of regression analysis. It is a non-parametric regression technique that can be seen as an extension of linear models: it automatically builds non-linear models with interactions between the variables. Here MARS is used to measure how well a descriptor matches the facial feature of interest. A MARS model is a weighted sum of hinge functions of the form max(0, x - t) and max(0, t - x):

    match = c_0 + \sum_k c_k \max(0, \pm(b_{i_k} - t_k))        (Equation 13)

where the b_i are histogram bins of the descriptor and the coefficients c_k and knots t_k are generated by the MARS model-building algorithm from training data. The formula of Equation 13 estimates the descriptor match at the bottom-left eyelid at full scale; its fitted terms involve bins such as b_10 and b_13 with coefficients and knots such as 1.514, 0.111, 2.092, 1.255 and 1.574. There is a similar formula for each landmark at each pyramid level. The bins enter the formula via the max functions rather than directly, as they would in a linear model, because the innate structure of the image feature makes some bins more important than others.
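To make the structure concrete, the sketch below evaluates a MARS formula of this kind on a descriptor vector. The terms and coefficients here are hypothetical placeholders, not the fitted values from the thesis; only the hinge-function form is what MARS actually produces:

# Each term is (coefficient, bin index, knot, sign): sign=+1 gives
# coef * max(0, b[idx] - knot), sign=-1 gives coef * max(0, knot - b[idx]).
EXAMPLE_TERMS = [(0.111, 10, 2.092, -1),   # hypothetical term
                 (1.574, 13, 1.255, +1)]   # hypothetical term

def mars_match(b, intercept=0.0, terms=EXAMPLE_TERMS):
    match = intercept
    for coef, idx, knot, sign in terms:
        match += coef * max(0.0, sign * (b[idx] - knot))
    return match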

Chapter 4

4. System Description

In this chapter, the system used to carry out face feature extraction is described. The extraction is based on section 3.5 and the following modules.

Figure 12: General block diagram of the software (Input Frame -> Face Detector -> NEW ASM -> Mouth Classification, with the ASM model and the mouth model loaded from disk).

Input Frame: the image or set of images entered into the face detector in order to extract the associated features. These tools are intended for video input.

Face detection: to find the facial features, the faces must first be detected. When the images enter the system, the Viola-Jones technique is used to perform the face detection.

NEW ASM: this module is responsible for aligning the shape and extracting the face features. It looks for the initial shape, adjusts it to the face, and fits the landmarks to the shape of the face. This module is based on C++ code developed by Stephen Milborrow and Fred Nicolls [11], which has been adapted for use in ImagePlus.

Mouth identification: this module is responsible for recognizing whether the person has their mouth open or closed. It uses the features extracted in the previous module and the trained model to determine the state of the mouth.

4.1 Block Diagram

Figure 13: Block diagram of the software in small modules.

4.2 Face Detector

All the faces in an image are extracted by a face detection system. The face detector used in this project is the one found in the ImagePlus library; it is implemented by the OpenCV library [14], which is based on the algorithm developed by Viola-Jones.

Figure 14: Input image before face detection, and the ROI image after face detection.

Viola-Jones uses a method that accumulates the results of many weak classifiers, each of them based on very simple image features. It uses AdaBoost, a machine learning algorithm that combines different weak classifiers in order to create a very powerful one. It also uses an algorithm to construct a cascade of classifiers, which increases detection performance while radically reducing computation time. This implementation can work with frontal and side-profile faces, but our face feature detection only works with frontal faces, so we configure the face detector for frontal faces only.
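For reference, the same detection stage can be reproduced with OpenCV's stock Viola-Jones cascade (a sketch using the standard OpenCV Python API rather than the ImagePlus wrapper; the image path is a placeholder):

import cv2

img = cv2.imread("face.jpg")                       # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    roi = img[y:y + h, x:x + w]                    # ROI handed to the ASM stage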

4.3 Database model ASM

Figure 15: Block diagram of the model training used by Stephen Milborrow and Fred Nicolls to create a frontal model (manual annotation -> database shapes -> aligning position, rotation and scale -> reduce dimensionality using PCA -> create shape model).

4.4 The ASM model

The ASM model is based on training data. This data comes from a set of face images that have previously been manually labelled with facial feature interest points. These images are part of the XM2VTS database, which only includes frontal faces; therefore the model can only be used for frontal faces. The model was trained by Stephen Milborrow and Fred Nicolls [11]. All the models are built using all 77 landmarks; Appendix A describes all the landmarks and their positions. The annotation and model creation process is explained in chapter 3; in section 3.2.1, the algorithm used to build the model is described.

4.5 Init Models and Detector

This module belongs to the facial_features_detector class. In this module the detectors and the configuration files are loaded and initialized. In this case we only have one version of ASM; this model is named yaw00. The model class is configured and initialized with the following attributes:

- Estart: the detectors that will be used to align and adapt the shape to the face, described in section 4.6.
- The shape model, initialized as mentioned before.
- The parameters bmax and neigs, whose values come from empirical testing.

4.5.1 Parameters bmax and neigs

During the landmark search, a shape is generated by profile matching and the suggested shape is conformed to the shape model. For the best results, appropriate neigs and bmax parameters for the shape model must be chosen. neigs is the number of eigenvectors used by the shape model; bmax is a parameter controlling the maximum allowable value of the elements of b in Equation 8. For the face shape there are 154 eigenvectors for the 77 landmarks. The values chosen by Stephen Milborrow and Fred Nicolls after empirical testing are neigs = 20 and bmax = 1.5.
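Conforming a suggested shape to the model with these two parameters amounts to projecting onto the first neigs eigenvectors and clipping each coefficient (a minimal sketch reusing the names of Equations 5 and 6, not the Stasm source):

import numpy as np

def conform_to_model(x, x_mean, eigvals, Phi, neigs=20, bmax=1.5):
    b = Phi[:, :neigs].T @ (x - x_mean)          # Equation 6, truncated to neigs
    lim = bmax * np.sqrt(eigvals[:neigs])        # |b_i| <= bmax * sqrt(lambda_i)
    b = np.clip(b, -lim, lim)
    return x_mean + Phi[:, :neigs] @ b           # Equation 5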

4.6 Align the Shape

The goal of this module is to align the start shape with the ROI, given the face rectangle. Depending on the estart field in the model, we detect the eyes and mouth and use them to help fit the start shape.

4.6.1 Align with the Rectangle

With the model estart = estart_rect_only, the start shape is created by aligning the model's mean face shape to the rectangle (the face rectangle found by the face detector).

Figure 16: Aligning the ROI with the rectangle model.

This aligns the mean shape to the face detector rectangle and returns it as the start shape, ignoring the eyes and mouth.

4.6.2 Align with the Eyes

With the model estart = estart_eyes the start shape is created as follows. Using the face rectangle found by the face detector, Stasm [11] searches for the eyes (using the Viola-Jones algorithm for eye detection, OpenCV [14]) in the appropriate subregions within the rectangle.

Figure 17: Aligning with the eye detector.

If both eyes are found, the face is rotated so the eyes are horizontal. The start shape is then formed by aligning the mean training shape to the eyes. If either eye is not found, the start shape is aligned to the face detector rectangle.

4.6.3 Align with the Eyes and Mouth

With the model estart = estart_eye_and_mouth the start shape is generated as above, but it also searches for the mouth (using the Viola-Jones algorithms for eye and mouth detection, OpenCV) and uses it if it is detected.

Figure 18: Aligning with the eye and mouth detectors.

The central idea is to form a triangle from the eyes and bottom-of-mouth found with the face detector parameters, and to align the same triangle in the mean shape to this triangle.

4.7 Select ASM Model

The goal of this module is to provide a shape aligned with the face, to adjust the size of the shape and to choose the ASM model. Here the size of the face is estimated by triangulating the eye positions with the mouth position; the locations of the eyes and mouth were found in the previous module.

Figure 19: Selecting and adjusting the size of the start shape by triangulating the eye positions with the mouth position.

4.8 ROI Shape to frame

This is the final module of the ASM. In this module the optimal value of the parameter b is searched for: we iterate the shape model until convergence (see section 3.3). Then the HAT descriptors are calculated on the image resized by the previous module. Finally, we fit the shape using the MARS algorithm. These techniques are described in section 3.5.

Figure 20: Fitting the shape using HAT and MARS.

4.9 Mouth Identification

This module uses the features detected and extracted in the previous modules, namely those of the mouth. The goal is to determine whether the person has the mouth open or closed. This section has three important blocks:

- The characteristics necessary for detection.
- The classification algorithm.
- The evaluation system.

These techniques are described in chapter 6.

Chapter 5

5. Database

In this chapter we describe all the databases that are used: databases for training, validating and testing the facial features detector models, and others for detecting whether mouths are open or closed.

5.1 Facial Dataset

For the facial features detector two manually landmarked datasets are used. For the creation of the ASM model (created by Stephen Milborrow) the following database was used. We include this information because, although we do not work directly with this database, we work with the model, and it is important to know the characteristics that may influence the behaviour of the detector.

1. The University of Surrey XM2VTS database [15], frontal sets I and II. The XM2VTS frontal image sets contain 2360 colour images of 295 subjects. The pose and lighting are uniform, with a flat background. The faces are manually landmarked with 77 points. The XM2VTS data must be purchased and cannot in general be reproduced. Figure A.1 shows a landmarked XM2VTS face.

To test the facial features detector and compare results with other publications, we have used the following manually landmarked dataset.

2. The BioID face database with FGnet markup [12]. The BioID dataset consists of 1521 monochrome images (384 x 286 pixels) of 23 different subjects. The images are frontal views with a variety of backgrounds and face sizes; the background is an office interior. The faces are manually landmarked with 20 points. The dataset is freely downloadable and it seems that the faces may be reprinted without legal issues.

Figure 21: The BioID face database.

5.2 Open and closed mouth dataset

In this case several datasets have been used: two for training the classifier, two for testing it, and a different one for validation. For training the classifier, videos of two video bloggers from YouTube have been used.

1. "Maria Konnikova: How to think like Sherlock Holmes", from the Big Think channel. This video consists of 1280 x 720 colour frames of one subject. The pose and lighting are uniform, with a white background. The subject is a woman who speaks in front of the camera. Many frames are frontal faces with open and closed mouths.

Figure 22: Frames from "Maria Konnikova: How to think like Sherlock Holmes", Big Think channel.

2. "The Importance of Knowing: An Introduction to Epinets", from the Big Think channel. This video consists of 1280 x 720 colour frames of one subject. The pose and lighting are uniform, with a white background. The subject is a man who speaks in front of the camera, sometimes showing his better profile. Many frames are frontal faces with open and closed mouths; he does not open his mouth much.

Figure 23: Frames from "The Importance of Knowing", Big Think channel.

For validating the training set and regulating the sensitivity of the classifier, one dataset has been used.

3. The database of the Artificial Intelligence Laboratory of FEI. FEI contains 640 x 480 colour images of 200 subjects between 19 and 40 years old. The pose and lighting are uniform, with a white homogeneous background. The subjects are men and women in frontal positions with open and closed mouths. The number of male and female subjects is exactly the same and equal to 100.

Finally, two video bloggers from YouTube and another database have been used for the test set.

4. "How to Revise like Sherlock (Mind Palace)", from the Maddie Moate channel. This video consists of 1280 x 720 colour frames of one subject. The pose and lighting are not uniform and there is a shelf in the background. The subject is a woman who speaks in front of the camera.

Figure 24: Frames from "How to Revise like Sherlock (Mind Palace)", Maddie Moate channel.

5. "Sam Harris BigThink (ALL)", from the Big Think channel. This video consists of 640 x 480 colour frames of one subject. The pose and lighting are uniform, with a white homogeneous background. The subject is a man who speaks in front of the camera.

Figure 25: Frames from "Sam Harris BigThink (ALL)", Big Think channel.

6. Telenotícies TV3 (TN). TN contains 720 x 576 colour frames of 3 subjects: two women and a man, all TV presenters. The poses of the subjects are frontal and the lighting is uniform. The background is not homogeneous.

Figure 26: Telenotícies TV3 (TN).

The strategy is to use many frames of these videos. For training we use 162 frames with open mouths and 162 frames with closed mouths from the Big Think channel (databases 1 and 2). For validation we have used 200 images from the FEI dataset: 100 frames with closed mouths and 100 with open mouths. Finally, for testing we used, for open mouths, 23 frames from the TN database and 82 frames from the Big Think channel (databases 1 and 2); for closed mouths, 37 frames from the TN database and 93 frames from the Big Think and Maddie Moate channels (databases 4 and 5).

Chapter 6

6. Open or Closed Mouth State Detection

The information obtained from the Active Shape Model can be used for the recognition of expressions. In this chapter, we focus on how to use this information to create an open or closed mouth state detector.

6.1 Active Shape Model in Mouth

In the New Active Shape Model we have many landmarks on the mouth, 17 in particular. These landmarks are situated all over the mouth (upper lip, lower lip, inner lip...). With these landmarks, we can know the position of the lips and extract the distance between them. For this we focus on three groups:

- Outer lip (landmarks 62 and 74)
- Inner lip (landmarks 67 and 70)
- Mouth corners (landmarks 59 and 65)

Figure 27: Landmarks located in the mouth.

6.2 Vector of features

Once we have the positions of the landmarks, we need to calculate the distance between them. This distance will be used to determine whether the subject has the mouth open or closed. The initial strategy was to use the outer upper lip and outer lower lip; the problem with this approach is that lips vary widely, and if the subject has big lips we could get many false positives. Instead, we take a two-dimensional vector v_70 with the position of the inner upper lip and another vector v_67 with the position of the inner lower lip.

    v_70 = (x_70, y_70)
    v_67 = (x_67, y_67)

    d_{lip} = \sqrt{(x_{70} - x_{67})^2 + (y_{70} - y_{67})^2}

This is the height of the mouth opening. However, it is not enough on its own to determine whether the subject has an open or closed mouth, because it depends on the size of the mouth and of the face. To solve this, we use the width of the mouth. It is not a static distance, because the width of the mouth varies even for the same subject, and it has some proportionality with the height of the mouth. The mouth-corner landmarks are used to calculate this distance:

    v_65 = (x_65, y_65)
    v_59 = (x_59, y_59)

    d_{width\,mouth} = \sqrt{(x_{65} - x_{59})^2 + (y_{65} - y_{59})^2}

    D = \frac{d_{lip}}{d_{width\,mouth}}

This parameter D is used to classify whether the subject has the mouth open or closed.

Figure 28: The distance between the two inner-lip landmarks and the mouth-width distance.
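Computing D from the 77-point shape returned by the New ASM is then a few lines (a sketch; the landmark indices follow Figure 27 and the shape is assumed to be a (77, 2) array of (x, y) coordinates):

import numpy as np

def mouth_openness(shape):
    d_lip = np.linalg.norm(shape[70] - shape[67])           # inner-lip height
    d_width_mouth = np.linalg.norm(shape[65] - shape[59])   # corner-to-corner width
    return d_lip / d_width_mouth                            # the feature D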

6.3 Classifier

The classifier is a binary classifier: it uses a threshold to decide the state of the mouth. The training data set has been used to decide the value of the threshold.

6.3.1 Training

In this section the training data set is used to decide the value of the threshold. First, the frames are manually labelled as open or closed mouth. Then each frame goes through the face detector, its features are extracted using the New ASM, and the parameter D is calculated for every frame. Once all parameters have been calculated, the histogram of the D values of the closed-mouth frames is built; the same process is repeated for the open-mouth set. The histograms are modelled using Gaussian distributions.

Figure 29: Histograms with Gaussian fits. The left histogram is for the closed mouth and the right histogram is for the open mouth.

Once the two histograms have been calculated, they are placed with their Gaussian fits in the same plane. The intersection point between the two Gaussians is taken as the threshold; theoretically it is the point that accumulates the fewest mistakes.

Figure 30: The two histograms with their Gaussian fits in the same plane, used to find the intersection between the Gaussians and set the threshold.

The threshold value T was chosen using the training and validation sets. The validation process is described in chapter 7.

6.3.2 Classification

The parameter T calculated previously is compared with the parameter D; the comparison determines the mouth state of the subject:

    D \le T: the subject's mouth is closed
    D > T: the subject's mouth is open
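The training and classification described above can be sketched as follows (illustrative only: Gaussians are fitted to the D values of each class and the threshold is taken where the two densities intersect, assuming the closed-mouth mean is below the open-mouth mean):

import numpy as np

def fit_threshold(d_closed, d_open):
    m0, s0 = np.mean(d_closed), np.std(d_closed)
    m1, s1 = np.mean(d_open), np.std(d_open)
    # The two Gaussian densities are equal where a quadratic in T vanishes.
    a = 1 / s0**2 - 1 / s1**2
    b = 2 * (m1 / s1**2 - m0 / s0**2)
    c = m0**2 / s0**2 - m1**2 / s1**2 - 2 * np.log(s1 / s0)
    roots = np.roots([a, b, c])
    return roots[(roots > m0) & (roots < m1)][0]   # the root between the means

def classify_mouth(D, T):
    return "open" if D > T else "closed"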

Chapter 7

7. Experimental Results

In this chapter all the experimental results are presented and compared with others, and the measurement methods used to compare them are explained. For the New ASM the following points are analysed:

- Search for the best way to align the start shape with the ROI and look for a start shape model.
- Look for the worst and the best principal landmarks and compare the landmarks across the methods analysed in the previous step.
- Look at all the cases where the New ASM has failed.
- Finally, compare the classic ASM with the New ASM.

On the other hand, for the mouth state detection the following points are analysed:

- The method used to measure the classifier.
- Analysis of the histograms and the ROC curve.
- The validation process.
- The final test and results with the test dataset.

7.1 NEW ASM

7.1.1 The me17 measure

The me17 is a normalized measure of the error in a facial landmark fit. It is computed over the 17 landmarks common to all the shape models, which are the most important landmarks: the internal BioID landmarks (Appendix A). Following Cristinacce [13], the me17 is calculated as follows:

1. Calculate the distance between each of the 17 points located by the search and the corresponding manually landmarked point:

    v_{manual} = (x_i, y_i)
    v_{search} = (x_s, y_s)
    d_i = \sqrt{(x_i - x_s)^2 + (y_i - y_s)^2}

2. Calculate the distance between the eye pupils from the manually landmarked points.

    v_{eyeL} = (x_l, y_l)
    v_{eyeR} = (x_r, y_r)
    d_{eyes} = \sqrt{(x_r - x_l)^2 + (y_r - y_l)^2}

3. Divide the distance for each of the 17 points (step 1) by the distance between the eye pupils (step 2):

    D_i = \frac{d_i}{d_{eyes}}

The me17 of an image m is the mean over the 17 landmarks:

    I_m = \frac{1}{17} \sum_{i=1}^{17} D_i

If there is more than one image, we take the mean me17 over all images. With this measure we can compare this model with the other ones and draw conclusions; a code sketch of the computation is given below, after the list of start-shape methods.

7.1.2 Compare the different start shape models

In section 4.6 we described different ways to align the start shape with the detected face and look for a start shape model. The authors propose three methods:

- ASM0: align with the rectangle and find the initial shape by looking for the left eye, then find the right eye by symmetry.
- ASM1: align with the rectangle, searching for the eyes in the appropriate subregions within the rectangle, and find the initial shape by looking for the left and right eyes.
- ASM2: align with the rectangle, searching for the eyes and mouth in the appropriate subregions within the rectangle, and find the initial shape by triangulating the eyes and bottom-of-mouth from the face detector parameters.
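Returning to the me17 measure of section 7.1.1, it can be sketched as follows (assuming the 17 searched and manual landmarks are given as (17, 2) arrays and the pupils as manual 2-vectors; names are illustrative):

import numpy as np

def me17(found, manual, pupil_left, pupil_right):
    d = np.linalg.norm(found - manual, axis=1)         # per-landmark error d_i
    d_eyes = np.linalg.norm(pupil_right - pupil_left)  # inter-pupil normaliser
    return d.mean() / d_eyes                           # I_m for this image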

7.1.3 Graphs for comparing models

Image error with me17

In this step the me17 measure is calculated for every image of the BioID database, and ASM0, ASM1 and ASM2 are compared. BioID has 1521 images, but we only work with 1233 of them because the face detector did not detect the faces in the others. In the following graphs the X axis shows the identifier of the image and the Y axis the me17 error for that image. For ASM0 we have the following results:

Graph 1: All the images from the BioID data set with the me17 error, using ASM and aligning the start shape with the input rectangle.

In this case we generally have good results. However, we can differentiate three groups in this graph:

1. Between 0 and 0.2 we have very good results, with little error.
2. Between 0.2 and 0.4 we have good results but with some error. The majority of the images fall in this group.
3. Between 0.4 and 1 the outcome can be considered poor, with plenty of errors.

The images with large errors are displayed in section 7.1.4.

For ASM1 we have the following results:

Graph 2: All the images from the BioID data set with the me17 error, using ASM and aligning the start shape with the eye detector.

In this case we find better results. However, there are several cases where ASM1 has failed and shows a large error, and a few groups of images with some error. The failed cases are studied in the next step. ASM1 is better than ASM0.

For ASM2 we have the following results:

Graph 3: All the images from the BioID data set with the me17 error, using ASM and aligning the start shape with the eye and mouth detectors.

In this case we have generally good results. Nevertheless, there are several cases where ASM2 has failed and shows an error, and ASM2 is worse than ASM1 at several points.

Graph 4: All the images from the BioID data set with the me17 error. Blue is ASM0, green is ASM2 and red is ASM1.

If we compare the three ASMs we see that, although all give good results, ASM1 has a better fit and less error, whereas ASM0 is the worst of the three.

Table 1: Mean me17, ASM0 vs ASM1 vs ASM2 (ASM1: 0.0733).

Landmark error with me17

In this step the me17 measure is calculated for every landmark over the BioID database, comparing ASM0, ASM1 and ASM2. The work was done with the 1233 images, calculating the maximum, minimum and mean me17 for each landmark. Additionally, groups of landmarks have been separated, for example the landmarks of the right eye, the left eye, the mouth...

The following results are for ASM0:

Graph 5: Landmark distance error, aligning the start shape with the input rectangle. The red points are the maximum error, the green points the minimum error and the blue points the mean error of each landmark.

In this graph it is easy to see that there is little error in the eye landmarks; in contrast, the mouth landmarks have more error. The landmarks with the maximum error are those located on the edges of the face at eye level.

The following results are for ASM1:

Graph 6: Landmark distance error, aligning the start shape with the eye detector. The red points are the maximum error, the green points the minimum error and the blue points the mean error of each landmark.

If we compare this graph with the previous one, we can see that it is better: the mean error is smaller. The best landmarks are again those located on the eyes, and the mouth landmarks also improve. Despite this, the maximum error is now at one landmark on the mouth, and the worst mean is still at the landmarks located on the edge of the face.

The following results are for ASM2:

Graph 7: Landmark distance error, aligning the start shape with the eye and mouth detectors. The red points are the maximum error, the green points the minimum error and the blue points the mean error of each landmark.

The results of this graph are worse than those of ASM1, but ASM2 is better than ASM0. We see the same pattern as for ASM1, but with a higher error.

Table 2: Mean and maximum landmark me17, ASM0 vs ASM1 vs ASM2.

Classic ASM vs New ASM

Graph 8: All the images from the BioID data set with the me17 error, using the classic ASM from UPC.

This classic ASM is described in section 3.2. Its graph is the worst: we can observe many errors, even though there are some images without error.

    Model          Mean me17
    Classic ASM    0.3790
    ASM 1          0.0733

Table 3: Mean me17, ASM1 vs classic ASM.

Graph 9: Landmark distance error using the classic ASM. The red points are the maximum error, the green points the minimum error and the blue points the mean error of each landmark.

This graph follows the same pattern as the other ones. Paradoxically, the landmark located on the chin is better in this model than in the other models.

Table 4: Mean and maximum landmark me17, ASM1 vs classic ASM.

Empirical cumulative distribution function

Graph 10: The empirical cumulative distribution function. The red line represents ASM1, the blue line ASM0, the grey line ASM2 and the green line the classic ASM.

The empirical cumulative distribution function is a standard way to compare several models: the faster the curve rises and the sooner it reaches the top, the better the model. By this criterion the best model is ASM1.

Given a vector of me17 measurements vme17, MATLAB code to plot an ECDF is:

x = sort(vme17);
y = (1:length(vme17)) / length(vme17);
plot(x, y);

7.1.4 Visual evaluation with empirical test

Figure 31: Images with many landmark errors found in the graph results. Ordered from left to right and top to bottom, these images correspond to numbers 259, 282, 392, 901, 423, 640, 740 and 803 of Graph 2.

In these graphs we have seen some errors: specific errors that appear in only one image, and errors that appear in several images. Looking at the images, the worst is 259, where the ASM does not find the face features at all. A typical mistake, appearing for some subjects, is that the ASM confuses the nose with the mouth. Finally, in some images the ASM does not fit the landmarks located on the edge of the face.

7.2 Mouth State Detector

7.2.1 Confusion Matrix

The confusion matrix is a specific layout that allows visualization of the performance of an algorithm and evaluation of the classifier. Each column of the matrix represents the instances of an actual class and each row represents the instances of a predicted class.

                            Positive condition      Negative condition
    Test outcome positive   True Positive (TP)      False Positive (FP)
    Test outcome negative   False Negative (FN)     True Negative (TN)

Table 5: Confusion matrix.

- Condition positive: everything the ground truth identifies as having the positive condition (PC).
- Condition negative: everything the ground truth identifies as having the negative condition (NC).
- True positive: the prediction matches the actual class; in a binary case, the case we are searching for is detected.
- False positive: the test result has the positive condition, but the ground truth indicates that it is not positive.
- True negative: the prediction matches the actual class and it has the negative condition.
- False negative: the test result has the negative condition, but the ground truth indicates that it is not negative.

Once we have the confusion matrix we can calculate the statistical measures. These measures display the performance and allow us to compare the classifier with other classifiers.

Recall

Recall is the sensitivity: it measures the proportion of actual positives which are correctly identified as such.

    Recall = \frac{TP}{PC}

    Recall_{total} = \frac{1}{2} \sum_{i=0}^{1} Recall_i

Precision

Precision measures the proportion of cases identified as positive that really are positive:

    Precision = \frac{TP}{TP + FP}

    Precision_{total} = \frac{1}{2} \sum_{i=0}^{1} Precision_i

F_score

The F_score is a measure of the test's accuracy that combines precision and recall; it can be interpreted as a weighted average of the two:

    F_{score} = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

    F_{score,total} = \frac{1}{2} \sum_{i=0}^{1} F_{score,i}

Accuracy

Accuracy is the proportion of correct results, both true positives and true negatives, in the population:

    Accuracy = \frac{TP + TN}{TotalPopulation}

7.2.2 The ROC curve

The ROC curve positive likelihood ratio relates the true positives to the false positives; the ROC curve negative likelihood ratio relates the true negatives to the false negatives. In our case these curves display the probability that the mouth is open or closed. The faster the curve rises and the sooner it reaches the top, the better the classifier. The point (0, 1) would be perfect classification and the point (1, 0) the worst. A classifier is considered good if it lies above the random-guess line: points above the line are better than chance and points below it are worse.

7.2.3 The ROC curve for dataset validation

Graph 11: ROC curves. The left graph is the ROC curve positive likelihood ratio for the open mouth and the right one the ROC curve negative likelihood ratio for the closed mouth. The red line is the random-guess line.

To validate the classifier the ROC curve was analyzed, which shows the quality of the classifier and whether the threshold is good. True positives correspond to open mouths and true negatives to closed mouths. As can be seen, the ROC curve positive likelihood ratio is above the random-guess line: the classifier finds 60% of the open mouths without error. The ROC curve negative likelihood ratio is also above the random-guess line, but it is not as good as the positive one; that is, the recall for the open mouth is likely to be higher than the recall for the closed mouth.

Graph 12: The ROC curves with the validation runs (green points) and the final test (red points).

Plenty of runs were done with the validation database, and the results were better than expected. Consequently, a good threshold was chosen, and the final result is good because it lies above the random-guess line (red point).

7.2.4 Results

                        Actual open mouth    Actual closed mouth
    Predicted open              91                   25
    Predicted closed            13                   59

Table 6: Confusion matrix of the test database.

In this table the confusion matrix can be seen. The classifier produced 91 true positives and 25 false positives (positive in our case is open mouth). On the other hand, it produced 59 true negatives and 13 false negatives (negative in our case is closed mouth).

                    Recall    Precision    F_score
    Open mouth      0.875     0.784        0.827
    Closed mouth    0.702     0.819        0.756

Table 7: Statistical measures (computed from the counts in Table 6).

Here the measures of the classifier can be seen. As anticipated by the ROC curves, the recall for the open mouth is higher than the recall for the closed mouth. The F_score, which combines precision and recall, is better for the open mouth.

    Total: Accuracy 0.798, Precision 0.802, Recall 0.789, F_score 0.792

Table 8: Averages of the statistical measures (computed from the counts in Table 6).

Finally, we can see that the mean of the total measures is around 80%.
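The statistical measures of section 7.2.1 follow directly from the four counts of Table 6 (a sketch; the totals are unweighted averages over the two classes):

def binary_metrics(tp, fp, fn, tn):
    per_class = {}
    # For the "closed" class the roles of the counts are mirrored.
    for cls, (hit, false_alarm, miss) in {"open": (tp, fp, fn),
                                          "closed": (tn, fn, fp)}.items():
        recall = hit / (hit + miss)
        precision = hit / (hit + false_alarm)
        f_score = 2 * precision * recall / (precision + recall)
        per_class[cls] = (recall, precision, f_score)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return per_class, accuracy

# With the counts of Table 6: binary_metrics(91, 25, 13, 59) gives an
# accuracy of about 0.80.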

Chapter 8

8. Conclusions and Possible Future Research

8.1 Conclusions

The objective of this project was to study face features. This information enables interaction with systems using cameras and automated processes. To achieve this, the classic ASM was studied and we tried to improve on it by using a variation of ASM created by Stephen Milborrow and Fred Nicolls. It differs from the classic version in that the new ASM uses the HAT descriptor instead of 2D gradient descriptors, and MARS instead of the Mahalanobis distance.

For the new ASM, the three methods of aligning the rectangle and finding the initial shape were analyzed. The best one is ASM1, which aligns the rectangle by searching for the eyes in the appropriate subregions within it and finds the initial shape from the left and right eyes. In contrast, the worst one is ASM0. This was an unexpected result, since the initial assumption was that ASM2 would be better: it aligns the rectangle by searching for the eyes and the mouth in the appropriate subregions and finds the initial shape by triangulating the eyes and bottom-of-mouth from the face detector parameters. However, ASM2 performs poorly when it does not find the mouth, and the mouth is sometimes not well detected, for example when the subject has a moustache.

On the other hand, the Active Shape Model has been used to recognize facial expressions. We focused on how to use this information to create an open or closed mouth state detector. A future application could be to help with audio recording in video: it is well known that the people who record the audio for a video have problems knowing when the actor begins to speak when there is noise.

The classifier is a basic binary classifier based on a threshold. We obtained good results, as the positive and negative ROC curves show. This was surprising, taking into account that no more complex algorithm, such as SVM or KNN, was designed.

In conclusion, we have demonstrated that ASM is a good method for extracting facial features and that the information obtained from the Active Shape Model is useful for recognizing expressions.

8.2 Possible Future Research

ASM is a technique that gives good results, although some of its aspects still have to be improved. In this project only frontal faces were accepted, because no side-face model exists. A future project would be the creation of this side-face model; it would only be necessary to build one side, because the face is approximately symmetric. On the other hand, the open and closed mouth state detector could be built with other classification algorithms, for example SVM or KNN.

Appendix A

Face landmarks

Figure 32: BioID landmarks

    LEyeBottom         0        LEyeInner            10
    REyeBottom         1        REyeInner            11
    LMouthCorner       2        REyeOuter            12
    RMouthCorner       3        RJaw1                13
    LOuterEyeBrow      4        NoseTip              14
    LInnerEyeBrow      5        LNoseBot             15
    RInnerEyeBrow      6        RNoseBot             16
    ROuterEyeBrow      7        MouthTopOfTopLip     17
    LTemple            8        MouthBotOfBotLip     18
    LEyeOuter          9        TipOfChin            19

Table 9: Landmark descriptors from BioID
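For use in code, Table 9 can be transcribed as a simple name-to-index mapping. The snippet below is purely illustrative and not part of the project's software:

    # Hypothetical lookup table transcribing Table 9 (BioID landmark indices).
    BIOID_LANDMARKS = {
        "LEyeBottom": 0,    "REyeBottom": 1,    "LMouthCorner": 2,
        "RMouthCorner": 3,  "LOuterEyeBrow": 4, "LInnerEyeBrow": 5,
        "RInnerEyeBrow": 6, "ROuterEyeBrow": 7, "LTemple": 8,
        "LEyeOuter": 9,     "LEyeInner": 10,    "REyeInner": 11,
        "REyeOuter": 12,    "RJaw1": 13,        "NoseTip": 14,
        "LNoseBot": 15,     "RNoseBot": 16,     "MouthTopOfTopLip": 17,
        "MouthBotOfBotLip": 18, "TipOfChin": 19,
    }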

Figure 33: Classic ASM with 68 landmarks

Figure 34: New ASM with 77 landmarks
