Integral Channel Features with Random Forest for 3D Facial Landmark Detection


MSc Artificial Intelligence
Track: Computer Vision

Master Thesis

Integral Channel Features with Random Forest for 3D Facial Landmark Detection

by Arif Qodari

February EC

Supervisor/Examiner:
Prof. dr. Theo Gevers
Sezer Karaoglu

Assessor:
dr. Leo Dorst
dr. Jacco Vink

Informatics Institute
University of Amsterdam

Abstract

Detecting facial landmarks is important for understanding human faces. While 2D image-based approaches have been well studied in the literature, 3D-based detection remains challenging for several reasons, e.g. degraded performance on noisy data, long run times caused by model complexity, and limited robustness to pose variations. In this thesis, we investigate the performance of random forest based models combined with integral channel features for detecting 3D facial landmarks. We study the influence of heterogeneous information computed from multiple channel features on obtaining an accurate 3D landmark detector. A variant of the random forest algorithm that utilizes multiple channel features is proposed to localize 3D facial landmarks. Multiple channel features provide rich and diverse information. These features are efficiently computed using integral images from noisy RGB-Depth images. Finally, we present our experimental results evaluated on the Biwi Kinect dataset, which contains a large range of head pose angles. The results show that adding more channel features, specifically gray and gradient channels, improves the accuracy of our detector compared with using a single depth channel. Moreover, the additional gray and gradient channels also increase the robustness of the detector against head pose variations. We also demonstrate that our approach produces higher mean accuracy than a 2D-based state-of-the-art method.

Acknowledgements

I would like to thank Theo Gevers for giving me the opportunity to work on an interesting topic under his supervision. Many thanks to Sezer Karaoglu for his valuable advice and feedback, which improved the quality of this thesis. He helped me intensively with the project and the writing process. During the project, we had many meetings to discuss both the theoretical and technical aspects. Despite his busy schedule, I could always contact him and discuss the problems I faced. I also want to thank my wife and my parents for supporting me along the way.


Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables

Introduction
    Motivation
    Goal
    Thesis Outline
Random Forest for 3D Facial Landmark Detection
    Related Work
    3D Facial Landmark Detection
    Random Forest
    Training Forest
    Binary Test
    Objective Function
    Testing
    Vote Clustering
Integral Channel Features
    Related Work
    Integral Image
    Channel Features
    Evaluation on Computation Time
Experiments
    Experiment Setup
    Dataset Labelling
    Parameter Settings
    Evaluation Measure
    Results
    Number of Trees
    Accuracy
    Accuracy vs Efficiency
    Channel Features Comparison
    Frontal face dataset
    Full dataset
    Analysis
    Performance under Head Pose Variations
    Comparison with 2D-based State-of-the-Art
Conclusion
Bibliography

List of Figures

2.1 The pipeline of training process
2.2 Vote clustering
3.1 Integral image
3.2 Multiple registered image channels
Landmark annotation result
The influence of number of trees
Accuracy
Accuracy vs efficiency in frontal face test set
Accuracy vs efficiency in full test set
Channel features performance comparison in frontal face dataset
Examples of failure cases in frontal face test set
Channel features performance comparison in full test set
Examples of failure cases in full test set
Performance under head pose variations
Examples of failure cases in large head poses


List of Tables

3.1 Average time needed to generate image channel
Summary of the performance of channel features
Evaluation result on 4-fold cross validation
Performance comparison with 2D-based method


Chapter 1

Introduction

1.1 Motivation

Facial landmark detection is the problem of detecting points of interest (e.g. eyes, mouth corners, and nose tip) on human faces. Facial landmarks are important for many facial analysis tasks such as face recognition [1], facial expression recognition [2], and facial animation [3]. Therefore, detecting facial landmarks is essential for understanding faces.

Many approaches have been developed to detect facial landmarks robustly. These methods can be categorized into 2D image-based and 3D-based approaches. 2D image-based approaches operate in 2D and hence return the detected landmarks in 2D coordinates, while 3D-based approaches detect 3D facial landmarks from depth images.

Facial landmark detection from 2D images has been well studied in the literature. Recently, a number of real-time approaches have achieved high detection accuracy on face images collected in the wild [4][5][6][7][8]. However, the performance of 2D image-based methods usually deteriorates under varying illumination conditions (e.g. highlights, shadows and dim light). In addition, 2D image-based methods often require an initial face region obtained by a face detection algorithm. As a consequence, the performance of these methods is limited by the accuracy of the face detector.

Prior work on 3D-based approaches shows that they are more robust than 2D image-based approaches against lighting conditions and head pose variations [9][10]. For instance, the method proposed by Baltrušaitis et al. [9] integrates depth and intensity images to alleviate the problem caused by poor lighting conditions. Since depth information is independent of light, the appearance of objects in the depth channel is not affected by lighting conditions. The authors reported accurate results for detection and tracking of 3D facial landmarks.

Another relevant example is the method by Papazov et al. [10]. The method exploits depth features to detect 3D facial landmarks under varying head poses and obtains high detection accuracy in real time.

Detecting 3D facial landmarks remains a challenging problem for several reasons: degraded performance on noisy input data, long run times due to computational complexity, and limited robustness to pose variations and local deformations.

Another challenge of this topic is the lack of annotated 3D facial landmark datasets. Some of the prior work used a synthetic head model [9] or high-quality face scans [11][12], leading to a performance gap when tested on noisy depth data acquired by depth sensors. In our experiments, we used the Biwi Kinect dataset [13], which has more than 15K RGB-Depth frames covering various head poses. This dataset does not provide any landmark annotations. However, head rotation angles and head center locations are provided. In section 4.1.1, we present an algorithm to annotate rigid landmark points through all frames using the provided head rotation angles and center locations.

Random forests have been widely used in many computer vision tasks, including 2D [14][7] and 3D facial analysis [12][14][13]. A random forest consists of multiple decision trees. A single tree maps complex feature spaces into simpler decision spaces. Random forests can handle large inputs efficiently. Moreover, the randomness in their construction helps to avoid overfitting.

A number of random forest based models have been proposed for 2D facial landmark detection. Dantone et al. [14] proposed a conditional regression forest to detect 2D facial landmarks in 2D images. The proposed regression forest is conditional on global face properties, e.g. head pose. This method employs multiple channel features: raw and normalized gray values and Gabor filter banks [15] computed for varying parameters. The authors demonstrated the benefit of using a random forest based model to map complex high-dimensional features into a multi-class decision model.

A similar approach, namely Ensemble Regression Trees (ERT), was proposed by Kazemi et al. [7]. Unlike the method proposed by Dantone et al., ensemble trees work in a cascaded architecture and are trained by a gradient boosting algorithm. The method utilizes differences of intensity values at pairs of pixels as features. Combined with these features, the tree ensemble predicts 2D facial landmarks accurately and efficiently (about 1 ms to process one image).

In the context of 3D facial landmark detection, Fanelli et al. [12] proposed a random forest to localize 3D facial landmarks under various facial expressions from high-quality face scans. The method utilizes generalized Haar-like features [16] computed from the depth channel. The authors reported real-time and accurate detection results.

Today, advances in device technology allow us to record depth information as well as RGB information at low cost, e.g. with the MS Kinect and Asus Xtion. Inspired by the work of Fanelli et al. [12], we employed a similar approach to detect 3D facial landmarks from RGB-D images. The difference is that we exploited multiple sources of information, i.e. RGB and depth, and analyzed the influence of data diversity on the 3D facial landmark detection performance.

A number of methods have shown that integral channel features [17] are effective for many computer vision tasks, including object recognition [18], pedestrian detection [19], and local region matching [20]. Integral channel features capture rich information from the different and diverse channels in images. In addition, the features can be efficiently computed using integral images. For these reasons, we combined integral channel features with a random forest model to detect 3D facial landmarks.

Our main contribution is the combination of integral channel features with a random forest to detect 3D facial landmarks using RGB-D images. We study the influence of various channel features on the 3D facial landmark detection performance. We also investigate the robustness of our approach under varying head poses.

1.2 Goal

This thesis focuses on investigating the combination of different channel features for 3D facial landmark detection. We address the following research questions:

1. How to integrate multiple channel features into a random forest based model for 3D facial landmark detection?
2. What are the best performing channels for detecting 3D facial landmarks?
3. How does the landmark detector perform under various head poses?

1.3 Thesis Outline

In Chapter 2, we first summarize prior work related to 3D facial landmark detection and variants of random forest based approaches for facial analysis tasks. We then explain the details of the random forest algorithm specific to 3D facial landmark detection. Chapter 3 describes the integral channel features and approaches for integrating the features into a random forest. It discusses three different channel types: depth, gray and gradient histogram. This chapter also provides a discussion of the computational time. Implementation details (e.g. dataset annotation, parameter settings, and evaluation metric) and experiments are discussed in Chapter 4. Finally, we present our conclusions and possibilities for future research.

Chapter 2

Random Forest for 3D Facial Landmark Detection

Typical random forest algorithms work in a supervised way, i.e. the algorithm constructs trees from a set of training data annotated with the desired output labels. We call this the training process. A tree is constructed to maximize the information gain by mapping complex input spaces into simpler discrete (classification) or continuous (regression) output spaces. The mapping is done in every non-leaf node, while the leaf nodes store the information to be used for prediction.

Once the forest is constructed, a testing process is conducted to evaluate how well the trained forest generalizes to unseen data. A set of test samples is propagated down the trees, and each tree gives a prediction vote. The forest determines the final prediction by either averaging the votes or choosing the majority vote.

This chapter discusses a specific variant of the random forest algorithm for 3D facial landmark detection. Section 2.1 presents a literature review related to 3D facial landmark detection and random forest based solutions for facial analysis. Training and testing are discussed in sections 2.2 and 2.3, respectively.

2.1 Related Work

2.1.1 3D Facial Landmark Detection

A number of methods have been proposed in the literature for detecting 3D facial landmarks from noisy and high-quality input.

Baltrušaitis et al. [9] proposed a 3D Constrained Local Model (CLM-Z), which is an extension of the Constrained Local Model [21], for facial landmark tracking under varying pose. Depth and intensity channels were integrated to reduce missed detections caused by poor lighting conditions. This model has shown robust performance under varying lighting conditions and poses.

Ju et al. [22] combined a 3D shape descriptor with binary neural networks to detect the nose tip and eyes. The descriptor is invariant to illumination variations. The reported accuracy was over 99.6% in the presence of facial expressions.

Zhao et al. [23] introduced the Statistical Facial Model (SFAM), which combines local variations of texture and geometry around each landmark with global variations between landmarks. A robust fitting algorithm was proposed to localize landmarks under facial expressions and occlusions. Although highly accurate results were reported, the proposed algorithm is computationally expensive.

Papazov et al. [10] proposed Triangular Surface Patch (TSP) features extracted from 3D point clouds to jointly estimate the head pose and 3D facial landmarks. The authors demonstrated that these features are efficient to compute, viewpoint-independent and insensitive to pose changes. The proposed approach achieved high accuracy in real time.

2.1.2 Random Forest

Random forest, as introduced by Breiman [24], is an ensemble learning method that consists of multiple decision trees [25]. Each tree in the forest is constructed from a randomly sampled subset of the training data. Starting from the root node, every non-leaf node generates a number of candidate splits and finds the optimal split of the incoming data. The optimal split $\phi^{*}$ is defined as the one which maximizes the information gain:

$$\phi^{*} = \arg\max_{\phi} IG(\phi), \tag{2.1}$$

$$IG(\phi) = H(P) - \sum_{i \in \{L,R\}} w_i \, H(P_i(\phi)), \tag{2.2}$$

where $w_i$ is the fraction of the input data propagated to child node $i$ and $H(P)$ is the uncertainty measure of the input set $P$. After the split, the resulting subsets are sent to the left and right child nodes. The procedure is then repeated until all leaves are created.
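As a rough illustration of equations 2.1 and 2.2, the sketch below evaluates a handful of candidate thresholds on a single scalar feature and keeps the one with the largest information gain. It assumes Shannon entropy over class labels as the uncertainty measure H (the thesis makes this choice later, in equation 2.5); the function names and the one-dimensional feature are hypothetical simplifications, not the thesis implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def information_gain(parent, left, right):
    """IG = H(P) - sum_i w_i * H(P_i), with w_i = |P_i| / |P|."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in (left, right))
    return entropy(parent) - weighted


def best_split(values, labels, thresholds):
    """Return (gain, threshold) of the candidate split with maximal gain."""
    best = (-1.0, None)
    for tau in thresholds:
        # Samples whose feature value exceeds tau go to the right child.
        left = [c for v, c in zip(values, labels) if v <= tau]
        right = [c for v, c in zip(values, labels) if v > tau]
        if not left or not right:
            continue  # skip degenerate splits that leave one child empty
        gain = information_gain(labels, left, right)
        if gain > best[0]:
            best = (gain, tau)
    return best


gain, tau = best_split([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], [0.15, 0.5, 0.85])
```

In this toy input the threshold 0.5 separates the two classes perfectly, so it receives the highest gain among the three candidates.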

In the context of 3D face analysis, random forest based approaches have been applied to estimate head pose from high-quality head scans [11]. The authors achieved real-time performance without requiring a Graphics Processing Unit (GPU). The authors extended their work [26] to use noisy depth data obtained from a consumer depth camera and still managed to obtain low regression error. However, the result was not as accurate as that of the previous system because the input data were noisier. In their subsequent paper [12], the authors extended their work to facial landmark detection. The method was evaluated on high-quality face scans containing facial expressions and head pose rotations. Highly accurate results were reported.

In another relevant work, Fanelli et al. [13] proposed a random regression forest to steer the fitting of an Active Appearance Model (AAM) [27]. The authors achieved robust performance by integrating depth and intensity channels.

2.2 Training Forest

A forest is a collection of decision trees. To construct a decision tree $T_t$ in the forest $\mathcal{T} = \{T_t\}$, a set of randomly sampled training images is provided. Every image has multiple registered channels, which will be discussed in chapter 3. Next, a set of fixed-size image patches is extracted from each training image and each channel. The patches are extracted around the facial landmark points (positive samples) and outside the face region (negative samples). More specifically, a patch is considered a positive sample for a landmark point $k$ if the distance $d_k$ between the center of the patch and the landmark point is below a certain radius. We follow the parameter setting from [12], in which the radius is defined as one fifth of the radius of an average human face. In other words, $d_k \le 0.2r$, where $r$ is the radius of an average human face. Figure 2.1 illustrates the pipeline of the training process.

Each patch $P_i$ consists of multiple channel features $I_i = (I_i^1, \ldots, I_i^C)$ and is annotated with a class label $c_i \in \{0, 1, \ldots, K\}$ and an offset vector $\theta_i = (\theta_i^1, \ldots, \theta_i^K)$. $K$ is the number of landmark points, and $c_i = 0$ means that the patch is sampled from the background, e.g. hair or body. The offset vector $\theta^k = (\theta_x^k, \theta_y^k, \theta_z^k)$ represents the position of landmark point $k$ relative to the patch center.

Each tree is constructed using a different set of training patches to make sure that the trees are less correlated. Reducing the correlation between any two trees in the forest reduces the error rate [24]. This is because a single decision tree can be seen as a predictor with high variance. Adding more trees and averaging the results moves the final prediction closer to the actual value.
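The patch annotation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, a patch that lies within 0.2r of several landmarks is assigned to the nearest one (the text does not say how such ties are handled), and background patches (class 0), which are sampled outside the face region, are not covered.

```python
import math


def label_patch(center, landmarks, face_radius):
    """Annotate a patch with a class label and per-landmark offset vectors.

    center      -- (x, y, z) coordinates of the patch centre
    landmarks   -- list of K landmark positions, each (x, y, z)
    face_radius -- r, the radius of an average face (same units as above)

    Returns (label, offsets). label is in 1..K for a positive sample, or
    None when the patch is farther than 0.2 * r from every landmark.
    """
    # Offset of landmark k relative to the patch centre: landmark - centre.
    offsets = [tuple(p - c for p, c in zip(lm, center)) for lm in landmarks]
    dists = [math.sqrt(sum(d * d for d in off)) for off in offsets]
    # Assumption: assign the patch to the nearest landmark.
    nearest = min(range(len(dists)), key=lambda k: dists[k])
    label = nearest + 1 if dists[nearest] <= 0.2 * face_radius else None
    return label, offsets


landmarks = [(10.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
label, offsets = label_patch((0.0, 0.0, 0.0), landmarks, face_radius=100.0)
```

With r = 100 the positive radius is 20, so the patch at the origin is a positive sample for the first landmark and carries offsets to both landmarks.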

Figure 2.1: The pipeline of the training process: (1) RGB and depth images are aligned using the calibration matrix. (2) Multiple channels are generated from each image in the training set. (3) A set of positive and negative training patches is extracted from the registered image channels. (4) Training patches are used to construct trees.

A tree is grown from its root node until all leaf nodes are created, i.e. until either the maximum tree depth is reached or fewer than a certain number of patches are left. The algorithm for growing a tree in the forest is summarized as follows:

1. Sample N training images with replacement from the original training set.
2. Randomly extract a number of positive and negative samples from the training images.
3. Starting from the root node:
   (a) Generate different sets of parameters to perform binary tests $\{\phi = \{f, R_1, R_2, \tau\}\}$. The details of the binary test are described in section 2.2.1.
   (b) Perform binary tests for all generated parameters.
   (c) Select the optimum parameters, i.e. those which maximize the objective function. The optimum parameters are then stored in the current node. The details of the objective function are described in section 2.2.2.
   (d) Divide the incoming patches $P$ into two subsets $P_L$ and $P_R$ and send them to the appropriate child nodes.
4. Repeat step 3 until all leaves are created.

Once a leaf node $L$ is created, it stores two kinds of information:

(a) Probability of each class in that leaf, $p(c = k \mid L)$, computed as the fraction of the samples arriving at that leaf that are positive samples of class $k$.

(b) Distribution over offset vectors for each facial landmark. The distribution is simply modelled by a multivariate Gaussian, similar to [14]:

$$p(\theta^k \mid L) = \mathcal{N}(\theta^k; \bar{\theta}^k, \Sigma^k), \tag{2.3}$$

where $\bar{\theta}^k$ and $\Sigma^k$ are the mean and covariance matrix of the offset vectors of facial landmark $k$.

2.2.1 Binary Test

As described in the previous section, a binary test is performed to split incoming patches into two subsets. In order to find the optimum split, typically a large number of candidate splits are generated. This means generating a large number of candidate parameters and then evaluating them using a binary test. The binary test is defined as follows [12]:

$$\frac{1}{|R_1|} \sum_{q \in R_1} I^f(q) - \frac{1}{|R_2|} \sum_{q \in R_2} I^f(q) > \tau, \tag{2.4}$$

where $I$ is the image channel, $f$ is the channel's index, $R_1$ and $R_2$ are two rectangular sub-patches within the patch, and $\tau$ is a threshold. The parameters $f$, $R_1$, $R_2$, and $\tau$ are generated randomly, and the result of this test determines how to split the incoming image patches. A patch is sent to the right child node if the test returns true; otherwise it is sent to the left child node.

It follows from equation 2.4 that the test measure is the difference between the average values of two rectangular sub-patches. Using average pixel values reduces the effect of missing information in noisy data. Section 3.2 discusses how to compute the sum of pixel values over any rectangular region $R$ using integral images.

2.2.2 Objective Function

As mentioned in section 2.1.2, a forest is trained to maximize the information gain in every node of the tree, which results in a minimum uncertainty measure. In this particular case, we want to localize facial landmarks in image patches. Therefore, the term $H(P)$ in equation 2.2 can be replaced by a classification uncertainty measure, defined by:

$$H(P) = -\sum_{k=0}^{K} p(c = k \mid P) \log\big(p(c = k \mid P)\big), \tag{2.5}$$

where $K$ is the number of landmarks (so there are $K + 1$ classes including the background) and $p(c = k \mid P)$ is the probability of class $k$ in the patch set $P$. The probability $p(c = k \mid P)$ is approximated by the fraction of patches in the set $P$ that are positive for landmark $k$. The complete objective function is obtained by substituting equation 2.5 into equation 2.2. The optimum split is the one which maximizes this objective function.

2.3 Testing

Once a complete forest has been trained, we would like to test how well the trained forest detects 3D landmarks in unseen images. A dense set of patches is extracted from a test image with a predefined stride parameter, which controls the distance between patches. These patches are then sent to the trained trees. In each tree, the binary tests with the stored optimum parameters lead a patch from the root node down to a leaf node. The information in the leaf node is used to compute a prediction vote. So, for each patch $P$, we obtain a set of prediction votes from the trees.

However, not all votes are considered. A leaf node $L$ is allowed to vote for the location of landmark point $k$ if the following conditions are met:

1. The probability of class $k$ stored in the leaf node is higher than a threshold, i.e. $p(c = k \mid L) \ge tr_{prob}$.
2. The trace of the corresponding covariance matrix (equation 2.3) is below a maximum variance, i.e. $\mathrm{Tr}(\Sigma^k) < tr_{var}$.

The values used for $tr_{prob}$ and $tr_{var}$ are 0.75 and 300, respectively. These values were found by trial-and-error experiments. These criteria ensure that only votes with high confidence are considered for prediction.

After sending all patches to the trees, $K$ different sets of votes $\{v_i^k\}$ are obtained. Each set contains the location candidates for the corresponding landmark $k$. A location candidate is calculated by adding the mean offset vector $\bar{\theta}^k$ stored in the leaf node to the patch center coordinates. Finally, mean shift clustering [28] is performed for each vote set $k$ to get the final prediction. The next section describes the vote clustering algorithm.
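The voting rule of this section can be sketched as below. The leaf layout (a dict from landmark index to probability, mean offset and covariance) is a hypothetical data structure chosen for illustration, and the use of "greater than or equal" in the probability test is an assumption; only the two threshold values come from the text.

```python
TR_PROB = 0.75   # minimum class probability, value reported in the text
TR_VAR = 300.0   # maximum covariance trace, value reported in the text


def leaf_votes(patch_center, leaf):
    """Return {k: candidate location} for landmarks this leaf may vote for.

    leaf is a hypothetical dict mapping a landmark index k to a tuple
    (prob, mean_offset, cov), where cov is a 3x3 nested list.
    """
    votes = {}
    for k, (prob, mean_offset, cov) in leaf.items():
        trace = sum(cov[i][i] for i in range(3))
        # Both conditions must hold for the leaf to vote for landmark k.
        if prob >= TR_PROB and trace < TR_VAR:
            # Candidate location = patch centre + mean offset of the leaf.
            votes[k] = tuple(c + o for c, o in zip(patch_center, mean_offset))
    return votes


def diag(v):
    """Helper for the example: 3x3 diagonal covariance with variance v."""
    return [[v, 0.0, 0.0], [0.0, v, 0.0], [0.0, 0.0, v]]


leaf = {
    1: (0.9, (1.0, 2.0, 3.0), diag(50.0)),    # confident and compact: votes
    2: (0.9, (0.0, 0.0, 0.0), diag(200.0)),   # trace 600 is too large
    3: (0.5, (0.0, 0.0, 0.0), diag(10.0)),    # probability is too low
}
votes = leaf_votes((10.0, 20.0, 30.0), leaf)
```

Only landmark 1 passes both tests here, so the patch contributes a single candidate location for that landmark.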

2.3.1 Vote Clustering

Since our approach does not involve any face or head detection, a bottom-up clustering with a predefined radius (the radius of the average human face) is computed to localize head positions and to filter out outliers. A cluster is treated as an outlier if its number of votes is below a threshold that defines the minimum number of votes. We follow [12] to set the threshold value. Within each head cluster, mean shift clustering is performed for each landmark $k$.

Mean shift is a non-parametric iterative algorithm that can be used to find the mode of a density function. The algorithm assumes that the given data are sampled from a probability density function whose dense regions correspond to local maxima, or modes, of the density. Starting from an arbitrary location, mean shift defines a window around it and computes the weighted mean of the data within the window. The window size is defined by a kernel function. There are many choices of kernel function, e.g. a flat kernel or a Gaussian kernel. Next, the center of the window is shifted to the new weighted mean. This procedure is repeated until it converges or reaches a maximum number of iterations.

Given a set of landmark votes $\{v_i^k\}$ and a Gaussian kernel $K$, the clustering procedure is summarized as follows:

1. Set the initial estimate $m^k_{t=0}$ to the mean of the landmark votes.
2. Repeat until $m^k$ converges or the maximum number of iterations is reached:
   (a) Update the weighted mean $m^k$:

$$m^k_{t+1} = \frac{\sum_{i} K(v_i^k - m_t^k)\, v_i^k}{\sum_{i} K(v_i^k - m_t^k)}, \tag{2.6}$$

where $K(v_i^k - m_t^k) = \exp\left(-\frac{\lVert v_i^k - m_t^k \rVert^2}{2h^2}\right)$ and $h = 0.2r$.

The bandwidth parameter $h$ determines the size of the clustering window; we set its value to one fifth of the radius of the average face.

In each iteration, the window determined by the Gaussian kernel $K$ is shifted to a denser region, so that it eventually reaches a peak of the density function. The final prediction for landmark $k$ is given by the final value of the weighted mean $m^k$. Figure 2.2 illustrates the votes for all landmark points and the final prediction.

Figure 2.2: All votes for each landmark are represented as point clouds in different colors. The centers of the circles represent the final prediction of landmark positions.
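The update of equation 2.6 can be sketched in a few lines. The function below is an illustrative implementation for one landmark, not the code used in the thesis: the iteration cap, the convergence tolerance and the guard against vanishing weights are assumptions added for the example.

```python
import math


def mean_shift_mode(votes, h, max_iter=100, tol=1e-6):
    """Gaussian-kernel mean shift over a list of equal-length tuples.

    Starts from the mean of the votes and repeats the weighted-mean
    update until the shift is below tol or max_iter is reached.
    """
    dim = len(votes[0])
    m = [sum(v[d] for v in votes) / len(votes) for d in range(dim)]
    for _ in range(max_iter):
        # Kernel weight of each vote: exp(-||v - m||^2 / (2 h^2)).
        weights = [
            math.exp(-sum((v[d] - m[d]) ** 2 for d in range(dim)) / (2 * h * h))
            for v in votes
        ]
        total = sum(weights)
        if total == 0.0:
            break  # all votes are too far away for the chosen bandwidth
        new_m = [
            sum(w * v[d] for w, v in zip(weights, votes)) / total
            for d in range(dim)
        ]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_m, m)))
        m = new_m
        if shift < tol:
            break
    return m


# Five votes around (0.5, 0.5, 0) plus one far outlier.
votes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 0), (10, 10, 0)]
mode = mean_shift_mode(votes, h=1.0)
```

In this toy example the plain mean of the votes is pulled towards the outlier, while the mean shift estimate settles close to the dense group of votes, which is the behaviour the clustering step relies on.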

Chapter 3

Integral Channel Features

In the previous chapter, we explained random forest based models for 3D landmark detection. The performance of these models is determined not only by the learning algorithm itself, but also by the feature representation. Thus, the choice of features is an important aspect of developing a robust landmark detector. In this chapter, we discuss integral channel features and how these features are integrated into a random forest model.

The idea of integral channel features is simple but effective. A number of image channels are generated from a given image. These channels can be generated in many different ways. For instance, depth and color channels are obtained directly from an image. A channel can also be computed using a linear transformation (e.g. Gabor filters), a non-linear transformation (e.g. gradient) or even a pointwise transformation. Once the channels are generated, features such as local sums, histograms, and Haar features are computed for each channel. The features capture heterogeneous and rich information from different types of channels. Furthermore, these features can be computed efficiently using integral images.

In the first section, we summarize the related work on integral channel features that have been applied in different computer vision tasks. Section 3.2 describes integral images. Section 3.3 explains the different channel types, and how the channels and features are integrated into a random forest model. Lastly, in section 3.4, we present the evaluation of different channels in terms of computational time.

3.1 Related Work

The notion of integral channel features is inseparable from the concept of an integral image. The first work adopting integral images in the computer vision domain was the work of Viola and Jones [29]. Viola and Jones proposed cascaded AdaBoost classifiers with Haar-like features for object detection. They achieved real-time performance with high detection accuracy. Their work was a breakthrough in computer vision: the proposed feature representations proved both efficient and effective for object detection. Later, similar frameworks were adopted in many other applications.

Integral channel features, in particular, have proven to be effective for many computer vision tasks, e.g. object recognition [18], pedestrian detection [19] and local region matching [20]. In the medical imaging domain, Tu et al. [30] introduced a probabilistic boosting tree framework with various image channels for MRI brain segmentation. The authors computed Gabor filters and edge response channels at different scales, combined with 3D Haar filter channels on top of them.

Dollar et al. [31] trained an edge detector using a large number of channel features. These channels, which included gradients at various scales, Gabor filters and Gaussian filters, yielded high accuracy. In their subsequent paper [17], Dollar et al. explored different types of channel features and studied the performance of different channel types for pedestrian detection. Their proposed method successfully outperformed other features, including Histogram of Oriented Gradients (HoG).

A variant of integral channel features, named aggregate channel features, was proposed by Yang et al. [32] to train a multi-view face detector. The authors adopted the Viola-Jones learning framework and utilized different types of color and gradient channels to deal with different poses, ranging from frontal faces to profile faces. The algorithm achieved high detection accuracy for face images in the wild.

Although many methods have been reported that use integral channel features in many different tasks, only a few have utilized integral channel features in 3D-related problems. In this thesis, we are interested in exploiting multiple channel features computed from RGB-Depth images.

3.2 Integral Image

In image processing, an integral image is a representation that allows the sum of pixel values in a rectangular image area to be calculated efficiently. Figure 3.1 illustrates how the integral image works. At each location $(x, y)$, an integral image contains the sum of the pixel values above and to the left of $(x, y)$. It is formally defined by:

$$I(x, y) = \sum_{x' \le x} \sum_{y' \le y} i(x', y'), \tag{3.1}$$

where $i$ is the input image and $i(x', y')$ is the pixel value at location $(x', y')$ in the input image. Once the integral image has been computed, the sum of values over any rectangular area $(x_0, y_0, x_1, y_1)$ can be calculated in constant time $O(1)$ using four references:

$$\sum_{\substack{x_0 < x \le x_1 \\ y_0 < y \le y_1}} i(x, y) = I(x_0, y_0) + I(x_1, y_1) - I(x_0, y_1) - I(x_1, y_0). \tag{3.2}$$

Figure 3.1: (a) Input image and (b) computed integral image. The sum of values in region A can be computed using four references: $L_1 + L_4 - L_2 - L_3$.

The concept of integral images has been extended in various ways. For instance, integral images can also be used to compute the local product over any rectangular area within an image. This can be done by taking the logarithm of the pixel values and computing the sum, since $\exp\left(\sum_i \log(x_i)\right) = \prod_i x_i$. Lienhart and Maydt [33] extended the integral image representation to compute the sum of pixels in rotated rectangular regions. They proposed rotated Haar-like features with boosted classifiers for object detection. These features were reported to produce more robust and accurate detections.

Another variant is the integral volume representation, which is the three-dimensional generalization of the integral image. Ke et al. [34] exploited this kind of volumetric feature in the spatiotemporal domain for event detection in video sequences. The method achieved real-time performance with low errors.
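To make equations 3.1 and 3.2 concrete, the sketch below builds an integral image for a small single-channel image and uses it to evaluate the binary test of equation 2.4. It is a minimal illustration, not the thesis code: the table is padded with a leading row and column of zeros (a common convention that avoids special cases at the border, so rectangles are half-open), and all function names are hypothetical.

```python
def integral_image(img):
    """Padded summed-area table S with S[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S


def rect_sum(S, x0, y0, x1, y1):
    """Sum of img[y][x] for x0 <= x < x1 and y0 <= y < y1, in O(1)."""
    return S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]


def binary_test(S, r1, r2, tau):
    """Equation 2.4 on one channel: mean over R1 minus mean over R2 > tau.

    r1 and r2 are rectangles given as (x0, y0, x1, y1).
    """
    def mean(rect):
        x0, y0, x1, y1 = rect
        return rect_sum(S, x0, y0, x1, y1) / ((x1 - x0) * (y1 - y0))

    return mean(r1) - mean(r2) > tau


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = integral_image(img)
total = rect_sum(S, 0, 0, 3, 3)          # whole image
lower_right = rect_sum(S, 1, 1, 3, 3)    # 5 + 6 + 8 + 9
goes_right = binary_test(S, (1, 1, 3, 3), (0, 0, 2, 2), tau=1.0)
```

Each rectangle sum costs four table lookups regardless of the rectangle size, which is why one integral image per channel is enough to evaluate many random binary tests cheaply.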

26 Chapter 3. Integral Channel Features Channel Features In section 2.2.1, we have defined features such s average pixel values over two rectangular regions within a patch. This kind of features can be computed from any type of channel. The only prerequisite is that a channel C has to be translational invariant. That means that if two images I and I are related by a translation, the generated channels C and C are related by the same translation. This criteria allows us to efficiently compute features from any rectangular within the image channel. An image channel C is only generated once rather than for every image patch. Computing features in an image patch is done by using integral images. In this section, we study three different channel types: depth, gray and gradient histograms. These channels are illustrated in Figure 3.2. The rationale behind selecting these channels is that these channels capture local information about the face surface and its contours. In particular, depth values indicate how the face surface looks like. Gray values capture the texture of each face surface. The image gradients capture information about the rate of texture changes and edge responses along different angles. Figure 3.2: Examples of generated image channels: depth, gray, and gradient along 4 different angles. Sum over any rectangular region within the image is computed using integral image. 1. Depth channel It is the channel that is obtained directly from the RGB-D image. The sum over any rectangular region of the depth channel is computed directly by using integral images. 2. Gray channel The gray channel is generated from the RGB color channels. Normalized (raw) gray values are used to minimize the effect of illumination variations. 3. Gradient Histogram The algorithm to compute the histogram using integral images was first introduced by Porikli [35]. Gradient histograms are the most commonly used variants of

integral histograms. Gradient histograms are generated by quantizing the gray image into a number of gradient angles. Each value within a quantized image is weighted by its gradient magnitude:

\[ Q_\theta(x, y) = G(x, y) \, \mathbf{1}[\Theta(x, y) = \theta], \tag{3.3} \]

where $\mathbf{1}$ is the indicator function, $\theta$ is a gradient angle, and $G(x, y)$ and $\Theta(x, y)$ are the gradient magnitude and the quantized gradient angle at pixel location $(x, y)$, respectively. In our setting, instead of combining the quantized images into histograms, we adopt the quantized images themselves as multiple individual channels. The only parameter to be set is the number of quantized images that are computed. This parameter influences the performance of the model, and its impact will be discussed in chapter 4. This technique can also be applied to approximate HoG features, as in [19], by combining all quantized images into a histogram and normalizing it with the gradient image computed at a different scale.

Channel features are integrated into our random forest model as follows. Since RGB-D images are used, the RGB and depth images have to be aligned first. In the training process, each training image is transformed into multiple image channels. A set of patches extracted from these channels is then used to grow the trees. During the training stage, the learning algorithm selects, for every non-leaf node, the channel that maximizes the information gain. The same approach is applied in the testing phase. Intuitively, the more channels are used in the model, the richer the information the model collects to classify patches correctly. However, adding more channels also increases the complexity of the model and the computation time. Our experiments study which combination of channel features produces the best performance for detecting 3D facial landmarks.
The experimental results are reported in the experiments chapter.

Evaluation on Computation Time

To demonstrate the efficiency of integral channel features, we measured the average time needed to generate each individual channel and each combination of channels. All experiments were conducted on the same standard PC. Table 3.1 reports the average time needed to generate the different channel types, including the computation of integral images from each channel. Computing integral images from

depth and gray images takes around 2 ms. The time needed to compute the sum over any rectangular region within a channel is also negligible, since it is an O(1) operation.

Table 3.1: Average time needed to generate image channels (Channel Types against Time (ms)), covering gradient-only settings and the combinations Depth + Gray + 1 Gradient, Depth + Gray + 4 Gradients, and Depth + Gray + 9 Gradients.
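The quantized gradient channels of Equation 3.3 can be sketched as follows. This is a minimal sketch under stated assumptions: central differences via `np.gradient`, unsigned orientations in [0, π), and hard assignment of each pixel to one angle bin are illustrative choices, and the function name `gradient_channels` is hypothetical; the thesis does not specify these details.

```python
import numpy as np


def gradient_channels(gray, n_bins=4):
    """Return n_bins channels Q_theta, each holding the gradient magnitude of
    the pixels whose quantized gradient angle falls into that bin."""
    gray = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(gray)                  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)                # G(x, y)
    # Assumption: unsigned orientation, folded into [0, pi).
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    # Quantized angle index Theta(x, y); the clip guards the angle == pi edge case.
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    channels = np.zeros((n_bins,) + gray.shape, dtype=np.float64)
    for b in range(n_bins):
        channels[b] = magnitude * (bins == b)   # G(x, y) * 1[Theta(x, y) = theta]
    return channels


# A horizontal intensity ramp has a constant gradient along x (angle 0).
ramp = np.tile(np.arange(8.0), (6, 1))
channels = gradient_channels(ramp, n_bins=4)
print(channels.shape)
```

Because every pixel contributes to exactly one bin, the channels sum to the gradient magnitude, and each channel can be turned into its own integral image.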

Chapter 4 Experiments

4.1 Experiment Setup

Dataset Labelling

We evaluated our model on the Biwi Kinect head pose dataset¹. The dataset contains 24 sequences of 20 subjects (14 men and 6 women) with more than 15K frames in total. Each frame has both an RGB image and a depth image, as well as information about the head rotation and location. The head rotation angles vary within ±60° pitch, ±75° yaw, and ±50° roll. The dataset has no landmark annotation.

¹ Biwi Kinect dataset is available at:

To annotate landmarks for each subject, we used the following algorithm:

1. Annotate landmark points in the first frame. This can be done manually, or any facial landmark detector can be used to automate the step. In our setting, we annotated the first frames using the 2D landmark detector proposed in [6]. This step yields facial landmark annotations in 2D coordinates.
2. From the 2D landmarks, compute 3D landmarks using the corresponding depth image and the camera intrinsic matrix:
   \[ p = M^{-1} x \tag{4.1} \]
   where $x$ is a vector representing the 2D landmarks, $p$ denotes the landmarks in 3D, and $M$ is the camera intrinsic matrix.

3. Shift the location of the head center to the origin of the coordinate system. In other words, subtract the position of the head center from the landmark position.
4. Transform the landmarks with the inverse rotation matrix. This results in landmarks in 3D camera coordinates.
   \[ p_0 = R_1^{-1} p \tag{4.2} \]
   where $R_1$ is the rotation matrix at the first frame.
5. From frame 1 until frame N, transform the landmarks $p_0$ using the rotation matrix at each frame. The final landmark positions are obtained by translating the transformed landmark positions with the original head center location:
   \[ p_n = R_n p_0 + h \tag{4.3} \]
   where $p_n$ and $R_n$ are the final 3D landmarks and the rotation matrix at frame $n$, respectively, and $h$ is the head center location.

Figure 4.1 illustrates landmark annotation results for different head poses.

Figure 4.1: Examples of annotation results for different head poses. The landmarks (green dots) are visualized in 2D. The black dots represent landmarks that are not visible when projected in 2D.

We observed that for large head poses, several landmarks are not aligned with the face surface. In addition, these points are visible when the 3D image is projected onto a 2D image, as illustrated by the third and fourth images in Figure 4.1. Considering this, we performed an additional step to verify whether a landmark is located on the face surface. A landmark that has neighbouring cloud points within a certain radius is categorized as visible; otherwise it is not visible. This visibility information is used when evaluating the performance of the landmark detector: only visible landmarks are considered in the evaluation.

Once the dataset was annotated, we followed the settings of [12] to split the dataset into training and testing sets. The testing set contains only 2 subjects: a man and a woman with large pose variations (subject numbers 01 and 12). The remaining subjects are used in the

training set, except for subjects 06, 17 and 19. These subjects have facial expressions and missing depth data at one or more fiducial points, e.g. eye corners. As a consequence, the positions of rigid landmark points in these subjects are hard to approximate.

In order to analyze the robustness of our landmark detector in the presence of head pose variations, we conducted two experiments. First, we trained and evaluated the model on the subset of frames with less than 20° head rotation (frontal faces). This constraint ensures that all landmarks are visible and surrounded by sufficient facial surface for the features to be computed. In the second experiment, we relaxed the constraint and constructed a forest from the full training set. The trained trees are then evaluated on the unconstrained test set. In this experiment, the evaluation considers only visible landmarks.

We study the optimal parameters of the random forest, the error thresholds, and different combinations of channel features. Moreover, we compare the performance of the landmark detector with a 2D-based method to gain insight into the advantages of our method.

Parameter Settings

In order to fairly compare the performance of the channel features, we trained multiple forests using identical training images and patches. For training, we fixed the following parameters:

1. Number of image samples on each tree: 1000 (frontal face set), 3000 (full set).
2. Maximum tree depth:
3. Number of positive patch samples extracted from each image:
4. Number of negative patch samples extracted from each image:
5. Patch size: pixels.
6. Minimum number of patches required for a split:
7. Number of binary tests in each non-leaf node: different combinations of $R_1$, $R_2$ and $f$ in Equation 2.4, each with 25 different thresholds $\tau$.

In the testing phase, the following parameters are applied:

1. Threshold variance: 300

2. Threshold class probability:
3. Bandwidth parameter for mean shift clustering: 0.2r, where r is the radius of the average face.

Evaluation Measure

We measured the error for each landmark as the Euclidean distance between the predicted location and the ground truth (Equation 4.4). We also measured the ratio of correctly detected points, where a point counts as correctly detected if its error is less than an error threshold. The optimum error threshold is discussed in the Accuracy section.

\[ \mathrm{error}(y^k, t^k) = \sqrt{(y^k_x - t^k_x)^2 + (y^k_y - t^k_y)^2 + (y^k_z - t^k_z)^2} \tag{4.4} \]

where $y^k$ and $t^k$ are the predicted location and the ground truth location of landmark $k$ in 3D coordinates, respectively.

4.2 Results

Number of Trees

The experiment was conducted with three different channel combinations, for both the frontal face and the full dataset. The results are presented in Figures 4.2a and 4.2b. The graphs show the mean Euclidean error (in millimeters) as a function of the number of trees when the maximum tree depth is fixed to 20. Both graphs show that adding more trees reduces the error of the landmark detector. The same trend holds for all combinations of channel features in both sets. In Figure 4.2a, the accuracies for the Depth channel and the Depth + Gray channels stabilize at about 7 trees, while the accuracy for Depth + Gray + 4 Gradients converges even faster, after 3 trees. When 7 trees are used, the Depth + Gray channels and the Depth + Gray + 4 Gradients channels perform equally well. In Figure 4.2b, the combination of depth, gray and 4 gradient channels outperforms the other channel combinations. Using the additional gray and 4 gradient channels reduces the error especially for landmarks that have small variations in depth values, e.g. eye corners. Our following experiments are conducted with the optimal number of trees: 7 for the frontal face dataset and 20 for the full dataset.
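The mean Euclidean error reported in this experiment follows Equation 4.4. A minimal sketch of that measure is given below; the array layout (landmarks along the second-to-last axis, coordinates in mm along the last axis), the optional visibility mask, and the function names are assumptions made for illustration.

```python
import numpy as np


def landmark_errors(predicted, ground_truth):
    """Per-landmark Euclidean error (Equation 4.4); inputs have shape (..., 3)."""
    predicted = np.asarray(predicted, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    return np.sqrt(((predicted - ground_truth) ** 2).sum(axis=-1))


def mean_error(predicted, ground_truth, visible=None):
    """Mean error over all landmarks, or only over those flagged as visible."""
    errors = landmark_errors(predicted, ground_truth)
    if visible is None:
        return errors.mean()
    return errors[np.asarray(visible, dtype=bool)].mean()


# Two hypothetical landmarks: one off by (3, 4, 0) mm, one predicted exactly.
pred = np.array([[3.0, 4.0, 0.0], [10.0, 20.0, 30.0]])
truth = np.array([[0.0, 0.0, 0.0], [10.0, 20.0, 30.0]])
print(landmark_errors(pred, truth), mean_error(pred, truth))
```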

Figure 4.2: The influence of the number of trees, measured with the mean Euclidean error (mm) and averaged over all landmarks and all images in the test set, for the Depth, Depth + Gray, and Depth + Gray + 4 Gradients channels at tree depth 20. (a) Frontal face test set [−20°, 20°]. (b) Full test set.

Accuracy

In section 2.2, we defined positive samples for each landmark k by a certain radius. To preserve consistency, we also evaluated the accuracy of the detector with a certain error threshold. Any prediction that produces an error larger than the threshold is considered a missed detection. Figures 4.3a and 4.3b depict the accuracy as a function of the error threshold, evaluated on the frontal face test set and the full test set, respectively. Stable accuracy is achieved when the threshold is set to 20 mm. Once again, for the frontal face test set, the Depth + Gray channels and the Depth + Gray + 4 Gradients channels have similar performance, while for the full test set the Depth + Gray + 4 Gradients channels provide higher accuracy than the other combinations.
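The accuracy curves discussed above can be computed from per-landmark errors as sketched below. The strict comparison (a detection is correct when its error is below the threshold), the optional visibility mask, and the function name `accuracy_curve` are illustrative assumptions; the example errors are made-up numbers and not results from this thesis.

```python
import numpy as np


def accuracy_curve(errors_mm, thresholds_mm, visible=None):
    """Percentage of landmarks whose error is below each threshold."""
    errors = np.asarray(errors_mm, dtype=np.float64).ravel()
    if visible is not None:
        # Only landmarks flagged as visible take part in the evaluation.
        errors = errors[np.asarray(visible, dtype=bool).ravel()]
    return np.array([100.0 * np.mean(errors < t) for t in thresholds_mm])


# Hypothetical errors (mm) for four landmark predictions.
example_errors = [5.0, 12.0, 18.0, 25.0]
curve = accuracy_curve(example_errors, thresholds_mm=[10, 20, 30])
print(curve)
```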

Figure 4.3: Detection accuracy (%) as a function of the error threshold (mm), averaged over all landmarks and all images in the test set, for the Depth, Depth + Gray, and Depth + Gray + 4 Gradients channels. (a) Frontal face test set [−20°, 20°]. (b) Full test set.

Accuracy vs Efficiency

In this experiment we study the effect of the stride parameter on accuracy and efficiency. We measured the average time needed to test a single image after it has been loaded into memory, and compared it with the resulting accuracy. Figures 4.4a and 4.4b show the evaluation results on the frontal face test set. Figures 4.5a and 4.5b show the evaluation results on the full test set. The results illustrate that the value of the stride parameter is negatively correlated with the accuracy. Using a smaller stride value yields high accuracy (Figures 4.4b and 4.5b). However, this comes at the expense of processing time (Figures 4.4a and 4.5a). When we

use larger stride values, the process becomes faster but the accuracy decreases. By comparing the results, we can conclude that the choice of the stride parameter controls the trade-off between accuracy and efficiency. When execution time is not a constraint, a stride of 5 pixels can be used, since it maintains high accuracy with a computation time still under 1 second. For real-time applications, larger stride values can be considered. Our following experiments are conducted with a 5 pixel stride.

Figure 4.4: (a) Execution time (ms) as a function of the stride parameter (pixels). (b) Accuracy (%) as a function of the stride parameter. Time and accuracy are averaged over all landmarks and all images in the frontal face test set, for the Depth, Depth + Gray, and Depth + Gray + 4 Gradients channels.
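A rough way to see why the stride dominates the running time is to count the patches that have to be passed through the forest. The sketch below assumes that test patches are sampled on a regular grid with the given stride; the 640x480 image size and the 40 pixel patch size are hypothetical values used only for illustration, and the function name is our own.

```python
def patches_per_image(height, width, patch, stride):
    """Number of square patches of side `patch` on a regular grid with step `stride`."""
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows * cols


# Patch counts for a hypothetical 480 x 640 image and 40 pixel patches.
for stride in (5, 10, 20):
    print(stride, patches_per_image(480, 640, patch=40, stride=stride))
```

Doubling the stride cuts the number of evaluated patches roughly by a factor of four, which matches the qualitative trend that larger strides are faster but leave fewer votes per landmark.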

Figure 4.5: (a) Execution time (ms) as a function of the stride parameter (pixels). (b) Accuracy (%) as a function of the stride parameter. Time and accuracy are averaged over all landmarks and all images in the full test set, for the Depth, Depth + Gray, and Depth + Gray + 4 Gradients channels.

Channel Features Comparison

Our approach differs from other facial alignment approaches because we do not build a landmark or shape model beforehand to fit to the test image. This makes our detection results sensitive to the combination of landmarks. For this reason, we prefer to evaluate each individual landmark separately. We present the performance of different channel features evaluated on both the frontal face and the full test set. Table 4.1 summarizes the experimental results.

Figure 4.6: Channel feature performance comparison on the frontal face dataset (head pose range [−20°, 20°]), showing the accuracy (%) per landmark (Chin, Nose Tip, R Eye Out, R Eye Inn, L Eye Inn, L Eye Out) for Depth, Depth + Gray, Depth + Gray + 1 Gradient, Depth + Gray + 4 Gradients, and Depth + Gray + 9 Gradients. Note that in this dataset all landmarks are visible.

Frontal face dataset

Figure 4.6 illustrates the performance of different combinations of channel features. The nose tip is the most accurately predicted landmark, followed by the inner eye corners. This is not surprising, since for nearly frontal faces the nose area is the most distinctive area of the face. For these landmarks, using only the depth channel already achieves 100% or close to 100% accuracy. Adding more channels does not yield a performance improvement.

In contrast, the chin is often misplaced when only the depth channel is used. Since we use relatively small patches, this implies that the depth values around the chin are not distinctive enough compared to the other regions. Using an additional gray channel is effective in reducing the misdetection rate. Adding gradient channels also increases the accuracy but still cannot outperform the combination of depth and gray channels.

Another example of misdetection occurs between the outer eye corners and the mouth corners. For a number of subjects, the detector wrongly predicts the mouth corners as outer eye corners. We identified that this happens when the features of these regions show only small variations. Some examples of the failure cases are presented in Figure 4.7.

Overall, the best performing channel combination for detecting 3D landmarks on frontal faces is depth, gray and 4 gradient channels. This combination produces the highest mean accuracy and the lowest error, as shown in Table 4.1.

Figure 4.7: Examples of failure cases in the frontal face test set [−20°, 20°], randomly selected from all channel combinations. The chin and the outer eye corners are the most often misplaced landmarks.

Full dataset

Figure 4.8: Channel feature performance comparison for the full test set, showing the accuracy (%) per landmark (Chin, Nose Tip, R Eye Out, R Eye Inn, L Eye Inn, L Eye Out) for Depth, Depth + Gray, Depth + Gray + 1 Gradient, Depth + Gray + 4 Gradients, and Depth + Gray + 9 Gradients. The accuracy is computed using only the visible landmarks.

The performance results for large head pose variations are shown in Figure 4.8. Adding gray and gradient channels provides a significant improvement in the accuracies of the chin and the outer eye corners. For the nose tip and the inner eye corners, adding gray and gradient channels results in only a small improvement, since the depth channel alone already produces at least 85% accuracy.

However, the results also imply that the chin is still the most difficult landmark to detect. When only the depth channel is used, our detector achieves only 32% accuracy. Adding gray and 4 gradient channels raises the accuracy to 68%. Even with this improvement of 36 percentage points, its accuracy is still the lowest of all landmarks. The other landmarks have at least 80% accuracy when the gray and 4 gradient channels are used. A number of misdetection cases are shown in Figure 4.9, which compares landmarks obtained using only the depth channel with landmarks obtained using additional gray and gradient channels.

Figure 4.9: Examples of failure cases in the full test set. First row: examples of failure cases when only the depth channel is used. Second row: results for the same images when gray and gradient channels are added.

The comparison results are summarized in Table 4.1. These results lead us to the same conclusion as the previous experiment: the best performing channel combination for detecting 3D landmarks under large head pose variations is depth, gray and 4 gradient channels.

Table 4.1: Summary of the performance of different channel features, averaged over all landmarks and all test images in each dataset. Columns: Dataset, Channel Features, Mean Accuracy (%), Mean Error (mm). Rows, for both the Frontal face and the Full dataset: Depth, Depth + Gray, Depth + Gray + 1 Gradient, Depth + Gray + 4 Gradients, Depth + Gray + 9 Gradients.

Lastly, using the best performing setting, we ran a 4-fold subject-independent cross validation on the entire Biwi Kinect dataset. The result of this evaluation is presented in Table 4.2.

Table 4.2: Evaluation result of the 4-fold subject-independent cross validation performed with the best setting. The numbers represent the Euclidean error in mm. Columns: Mean Error, Chin, Nose Tip, R Eye Out, R Eye Inn, L Eye Inn, L Eye Out.

Analysis

Performance under Head Pose Variations

In this section, we further analyze the accuracy of our detector for different poses. To do this, we test our best performing model on a discretized test set. The test set was discretized into areas according to head pose, and the accuracy was computed for each range separately. Hence, the performance of the detector is known for each discretized head pose. The result of this experiment is presented as a heat map in Figure 4.10.

Figure 4.10: Evaluation result on the test set, discretized into areas depending on the head pose angles (yaw against pitch). The colors and numbers represent the success ratio of the detector, averaged over the visible landmarks and test images in each area. The optimal settings were applied (20 trees with Depth + Gray + 4 Gradient channels).

The graph shows that the detector achieves the highest accuracy for frontal faces (−20° ≤ head pose ≤ 20°). This result is consistent with our previous result (Table 4.1), where the success rate for frontal faces is 1 or close to 1. We can also see from the graph that the success rates naturally decrease as the head pose angles become larger, especially when the angle is larger than 40°.
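The per-pose analysis described above amounts to binning test samples by yaw and pitch and averaging the success indicator in each bin. The following sketch shows one way to do this; the bin edges, the sample values, and the function name `success_by_pose` are hypothetical, since the bin width used for Figure 4.10 is not stated here.

```python
import numpy as np


def success_by_pose(yaw, pitch, success, bin_edges):
    """Mean success ratio per (pitch bin, yaw bin); NaN where a bin has no samples."""
    yaw = np.asarray(yaw, dtype=np.float64)
    pitch = np.asarray(pitch, dtype=np.float64)
    success = np.asarray(success, dtype=np.float64)
    n = len(bin_edges) - 1
    # np.digitize returns i with edges[i-1] <= value < edges[i]; shift to 0-based bins.
    yaw_idx = np.digitize(yaw, bin_edges) - 1
    pitch_idx = np.digitize(pitch, bin_edges) - 1
    inside = (yaw_idx >= 0) & (yaw_idx < n) & (pitch_idx >= 0) & (pitch_idx < n)
    hits = np.zeros((n, n))
    total = np.zeros((n, n))
    np.add.at(hits, (pitch_idx[inside], yaw_idx[inside]), success[inside])
    np.add.at(total, (pitch_idx[inside], yaw_idx[inside]), 1.0)
    return np.divide(hits, total, out=np.full_like(hits, np.nan), where=total > 0)


# Four hypothetical samples: angles in degrees, success as 0 or 1.
edges = np.array([-60.0, -20.0, 20.0, 60.0])
grid = success_by_pose(yaw=[0, 0, 40, -40], pitch=[0, 0, 0, 0],
                       success=[1, 1, 1, 0], bin_edges=edges)
print(grid)
```

Keeping empty bins as NaN rather than 0 avoids confusing "no test images at this pose" with "the detector always fails at this pose".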

For large poses, the detector often wrongly predicts areas that have a texture or depth similar to the ground truth as landmarks, even areas that do not belong to the face region. For instance, ears and hair are misdetected as eye corners since they have similar textures as well as similar depth values. Another factor that contributes to the performance drop is the lack of training images for large poses. We noted that frontal faces have many more training images than faces with large poses. Adding more training images or oversampling the images for large poses would help to resolve this.

Figure 4.11: Examples of failure cases for large pitch (top row) and yaw (bottom row) angles. The detector often mistakenly predicts areas such as the ear, hair line, and neck as landmarks.

Comparison with 2D-based State-of-the-Art

For the last experiment, we compared the performance of our detector against a 2D-based method, namely Ensemble Regression Trees (ERT) [7]. We used the available source code from the DLIB library [36] and ran a 4-fold cross validation on the entire Biwi Kinect dataset. Since this method relies on face detection, we also provided a ground truth face boundary of 100x100 pixels as input. The trained model detects landmarks in 2D, and these landmarks are then converted into 3D coordinates to be evaluated. The result of this evaluation is presented in Table 4.3.

Table 4.3: Performance comparison with Ensemble Regression Trees [7]. The numbers represent the Euclidean distance in mm (lower is better). Columns: Method, Mean Error, Chin, Nose Tip, R Eye Out, R Eye Inn, L Eye Inn, L Eye Out. Rows: ERT [7], RF (ours).
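The conversion of 2D landmarks into 3D coordinates mentioned above can be sketched with the pinhole model of Equation 4.1. This is a sketch under assumptions: the pixel is written in homogeneous form and scaled by the depth measured at that pixel, and the intrinsic matrix values below are hypothetical placeholders rather than the calibration shipped with the dataset.

```python
import numpy as np


def backproject(landmarks_2d, depth, intrinsics):
    """Lift (u, v) pixel landmarks to 3D camera coordinates using a depth map."""
    uv = np.asarray(landmarks_2d, dtype=np.float64)
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    z = depth[rows, cols]                                   # depth at each landmark
    homogeneous = np.column_stack([uv, np.ones(len(uv))])   # rows of (u, v, 1)
    rays = homogeneous @ np.linalg.inv(intrinsics).T        # M^-1 x for every row
    return rays * z[:, None]                                # scale rays by depth


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
M = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
depth_map = np.full((480, 640), 800.0)                      # flat surface at 800 mm
points = backproject([(320, 240), (420, 240)], depth_map, M)
print(points)
```

Landmarks that fall on pixels with missing depth would need to be skipped or filled in before evaluation, which this sketch does not handle.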


More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

Face Detection Using Convolutional Neural Networks and Gabor Filters

Face Detection Using Convolutional Neural Networks and Gabor Filters Face Detection Using Convolutional Neural Networks and Gabor Filters Bogdan Kwolek Rzeszów University of Technology W. Pola 2, 35-959 Rzeszów, Poland bkwolek@prz.rzeszow.pl Abstract. This paper proposes

More information

Cross-pose Facial Expression Recognition

Cross-pose Facial Expression Recognition Cross-pose Facial Expression Recognition Abstract In real world facial expression recognition (FER) applications, it is not practical for a user to enroll his/her facial expressions under different pose

More information

Study of Viola-Jones Real Time Face Detector

Study of Viola-Jones Real Time Face Detector Study of Viola-Jones Real Time Face Detector Kaiqi Cen cenkaiqi@gmail.com Abstract Face detection has been one of the most studied topics in computer vision literature. Given an arbitrary image the goal

More information

Locating Facial Landmarks Using Probabilistic Random Forest

Locating Facial Landmarks Using Probabilistic Random Forest 2324 IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 12, DECEMBER 2015 Locating Facial Landmarks Using Probabilistic Random Forest Changwei Luo, Zengfu Wang, Shaobiao Wang, Juyong Zhang, and Jun Yu Abstract

More information

Machine Learning for Medical Image Analysis. A. Criminisi

Machine Learning for Medical Image Analysis. A. Criminisi Machine Learning for Medical Image Analysis A. Criminisi Overview Introduction to machine learning Decision forests Applications in medical image analysis Anatomy localization in CT Scans Spine Detection

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

AAM Based Facial Feature Tracking with Kinect

AAM Based Facial Feature Tracking with Kinect BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 3 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0046 AAM Based Facial Feature Tracking

More information

Computationally Efficient Serial Combination of Rotation-invariant and Rotation Compensating Iris Recognition Algorithms

Computationally Efficient Serial Combination of Rotation-invariant and Rotation Compensating Iris Recognition Algorithms Computationally Efficient Serial Combination of Rotation-invariant and Rotation Compensating Iris Recognition Algorithms Andreas Uhl Department of Computer Sciences University of Salzburg, Austria uhl@cosy.sbg.ac.at

More information

Learning to Recognize Faces in Realistic Conditions

Learning to Recognize Faces in Realistic Conditions 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Image processing and features

Image processing and features Image processing and features Gabriele Bleser gabriele.bleser@dfki.de Thanks to Harald Wuest, Folker Wientapper and Marc Pollefeys Introduction Previous lectures: geometry Pose estimation Epipolar geometry

More information

Occluded Facial Expression Tracking

Occluded Facial Expression Tracking Occluded Facial Expression Tracking Hugo Mercier 1, Julien Peyras 2, and Patrice Dalle 1 1 Institut de Recherche en Informatique de Toulouse 118, route de Narbonne, F-31062 Toulouse Cedex 9 2 Dipartimento

More information

Object detection using non-redundant local Binary Patterns

Object detection using non-redundant local Binary Patterns University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Object detection using non-redundant local Binary Patterns Duc Thanh

More information

Supervised Learning for Image Segmentation

Supervised Learning for Image Segmentation Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.

More information

MediaTek Video Face Beautify

MediaTek Video Face Beautify MediaTek Video Face Beautify November 2014 2014 MediaTek Inc. Table of Contents 1 Introduction... 3 2 The MediaTek Solution... 4 3 Overview of Video Face Beautify... 4 4 Face Detection... 6 5 Skin Detection...

More information

Exploring Bag of Words Architectures in the Facial Expression Domain

Exploring Bag of Words Architectures in the Facial Expression Domain Exploring Bag of Words Architectures in the Facial Expression Domain Karan Sikka, Tingfan Wu, Josh Susskind, and Marian Bartlett Machine Perception Laboratory, University of California San Diego {ksikka,ting,josh,marni}@mplab.ucsd.edu

More information

Deep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D

Deep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D Deep Learning for Virtual Shopping Dr. Jürgen Sturm Group Leader RGB-D metaio GmbH Augmented Reality with the Metaio SDK: IKEA Catalogue App Metaio: Augmented Reality Metaio SDK for ios, Android and Windows

More information

Face Detection and Alignment. Prof. Xin Yang HUST

Face Detection and Alignment. Prof. Xin Yang HUST Face Detection and Alignment Prof. Xin Yang HUST Many slides adapted from P. Viola Face detection Face detection Basic idea: slide a window across image and evaluate a face model at every location Challenges

More information

Computer Vision Group Prof. Daniel Cremers. 8. Boosting and Bagging

Computer Vision Group Prof. Daniel Cremers. 8. Boosting and Bagging Prof. Daniel Cremers 8. Boosting and Bagging Repetition: Regression We start with a set of basis functions (x) =( 0 (x), 1(x),..., M 1(x)) x 2 í d The goal is to fit a model into the data y(x, w) =w T

More information

Face Recognition using SURF Features and SVM Classifier

Face Recognition using SURF Features and SVM Classifier International Journal of Electronics Engineering Research. ISSN 0975-6450 Volume 8, Number 1 (016) pp. 1-8 Research India Publications http://www.ripublication.com Face Recognition using SURF Features

More information

A Keypoint Descriptor Inspired by Retinal Computation

A Keypoint Descriptor Inspired by Retinal Computation A Keypoint Descriptor Inspired by Retinal Computation Bongsoo Suh, Sungjoon Choi, Han Lee Stanford University {bssuh,sungjoonchoi,hanlee}@stanford.edu Abstract. The main goal of our project is to implement

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction Preprocessing The goal of pre-processing is to try to reduce unwanted variation in image due to lighting,

More information

Local Features: Detection, Description & Matching

Local Features: Detection, Description & Matching Local Features: Detection, Description & Matching Lecture 08 Computer Vision Material Citations Dr George Stockman Professor Emeritus, Michigan State University Dr David Lowe Professor, University of British

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels

Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIENCE, VOL.32, NO.9, SEPTEMBER 2010 Hae Jong Seo, Student Member,

More information

Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B

Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B 1 Supervised learning Catogarized / labeled data Objects in a picture: chair, desk, person, 2 Classification Fons van der Sommen

More information

Using Geometric Blur for Point Correspondence

Using Geometric Blur for Point Correspondence 1 Using Geometric Blur for Point Correspondence Nisarg Vyas Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA Abstract In computer vision applications, point correspondence

More information

Automatic Colorization of Grayscale Images

Automatic Colorization of Grayscale Images Automatic Colorization of Grayscale Images Austin Sousa Rasoul Kabirzadeh Patrick Blaes Department of Electrical Engineering, Stanford University 1 Introduction ere exists a wealth of photographic images,

More information

Eye Detection by Haar wavelets and cascaded Support Vector Machine

Eye Detection by Haar wavelets and cascaded Support Vector Machine Eye Detection by Haar wavelets and cascaded Support Vector Machine Vishal Agrawal B.Tech 4th Year Guide: Simant Dubey / Amitabha Mukherjee Dept of Computer Science and Engineering IIT Kanpur - 208 016

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

Object Detection Design challenges

Object Detection Design challenges Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should

More information

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

Categorization by Learning and Combining Object Parts

Categorization by Learning and Combining Object Parts Categorization by Learning and Combining Object Parts Bernd Heisele yz Thomas Serre y Massimiliano Pontil x Thomas Vetter Λ Tomaso Poggio y y Center for Biological and Computational Learning, M.I.T., Cambridge,

More information

Designing Applications that See Lecture 7: Object Recognition

Designing Applications that See Lecture 7: Object Recognition stanford hci group / cs377s Designing Applications that See Lecture 7: Object Recognition Dan Maynes-Aminzade 29 January 2008 Designing Applications that See http://cs377s.stanford.edu Reminders Pick up

More information

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1 Feature Detection Raul Queiroz Feitosa 3/30/2017 Feature Detection 1 Objetive This chapter discusses the correspondence problem and presents approaches to solve it. 3/30/2017 Feature Detection 2 Outline

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

IRIS SEGMENTATION OF NON-IDEAL IMAGES

IRIS SEGMENTATION OF NON-IDEAL IMAGES IRIS SEGMENTATION OF NON-IDEAL IMAGES William S. Weld St. Lawrence University Computer Science Department Canton, NY 13617 Xiaojun Qi, Ph.D Utah State University Computer Science Department Logan, UT 84322

More information

Motion illusion, rotating snakes

Motion illusion, rotating snakes Motion illusion, rotating snakes Local features: main components 1) Detection: Find a set of distinctive key points. 2) Description: Extract feature descriptor around each interest point as vector. x 1

More information

Epithelial rosette detection in microscopic images

Epithelial rosette detection in microscopic images Epithelial rosette detection in microscopic images Kun Liu,3, Sandra Ernst 2,3, Virginie Lecaudey 2,3 and Olaf Ronneberger,3 Department of Computer Science 2 Department of Developmental Biology 3 BIOSS

More information

Ensemble of Bayesian Filters for Loop Closure Detection

Ensemble of Bayesian Filters for Loop Closure Detection Ensemble of Bayesian Filters for Loop Closure Detection Mohammad Omar Salameh, Azizi Abdullah, Shahnorbanun Sahran Pattern Recognition Research Group Center for Artificial Intelligence Faculty of Information

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H.

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H. Nonrigid Surface Modelling and Fast Recovery Zhu Jianke Supervisor: Prof. Michael R. Lyu Committee: Prof. Leo J. Jia and Prof. K. H. Wong Department of Computer Science and Engineering May 11, 2007 1 2

More information

Criminal Identification System Using Face Detection and Recognition

Criminal Identification System Using Face Detection and Recognition Criminal Identification System Using Face Detection and Recognition Piyush Kakkar 1, Mr. Vibhor Sharma 2 Information Technology Department, Maharaja Agrasen Institute of Technology, Delhi 1 Assistant Professor,

More information

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Ham Rara, Shireen Elhabian, Asem Ali University of Louisville Louisville, KY {hmrara01,syelha01,amali003}@louisville.edu Mike Miller,

More information

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882 Matching features Building a Panorama Computational Photography, 6.88 Prof. Bill Freeman April 11, 006 Image and shape descriptors: Harris corner detectors and SIFT features. Suggested readings: Mikolajczyk

More information

Face recognition algorithms: performance evaluation

Face recognition algorithms: performance evaluation Face recognition algorithms: performance evaluation Project Report Marco Del Coco - Pierluigi Carcagnì Institute of Applied Sciences and Intelligent systems c/o Dhitech scarl Campus Universitario via Monteroni

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Selection of Location, Frequency and Orientation Parameters of 2D Gabor Wavelets for Face Recognition

Selection of Location, Frequency and Orientation Parameters of 2D Gabor Wavelets for Face Recognition Selection of Location, Frequency and Orientation Parameters of 2D Gabor Wavelets for Face Recognition Berk Gökberk, M.O. İrfanoğlu, Lale Akarun, and Ethem Alpaydın Boğaziçi University, Department of Computer

More information

Illumination invariant face detection

Illumination invariant face detection University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2009 Illumination invariant face detection Alister Cordiner University

More information

Gaze interaction (2): models and technologies

Gaze interaction (2): models and technologies Gaze interaction (2): models and technologies Corso di Interazione uomo-macchina II Prof. Giuseppe Boccignone Dipartimento di Scienze dell Informazione Università di Milano boccignone@dsi.unimi.it http://homes.dsi.unimi.it/~boccignone/l

More information

Multiple Kernel Learning for Emotion Recognition in the Wild

Multiple Kernel Learning for Emotion Recognition in the Wild Multiple Kernel Learning for Emotion Recognition in the Wild Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, Gwen Littlewort and Marian S. Bartlett Machine Perception Laboratory UCSD EmotiW Challenge,

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

Intensity-Depth Face Alignment Using Cascade Shape Regression

Intensity-Depth Face Alignment Using Cascade Shape Regression Intensity-Depth Face Alignment Using Cascade Shape Regression Yang Cao 1 and Bao-Liang Lu 1,2 1 Center for Brain-like Computing and Machine Intelligence Department of Computer Science and Engineering Shanghai

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 03/18/10 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Goal: Detect all instances of objects Influential Works in Detection Sung-Poggio

More information

Adaptive Feature Extraction with Haar-like Features for Visual Tracking

Adaptive Feature Extraction with Haar-like Features for Visual Tracking Adaptive Feature Extraction with Haar-like Features for Visual Tracking Seunghoon Park Adviser : Bohyung Han Pohang University of Science and Technology Department of Computer Science and Engineering pclove1@postech.ac.kr

More information

Automatic Initialization of the TLD Object Tracker: Milestone Update

Automatic Initialization of the TLD Object Tracker: Milestone Update Automatic Initialization of the TLD Object Tracker: Milestone Update Louis Buck May 08, 2012 1 Background TLD is a long-term, real-time tracker designed to be robust to partial and complete occlusions

More information

CAP 5415 Computer Vision Fall 2012

CAP 5415 Computer Vision Fall 2012 CAP 5415 Computer Vision Fall 01 Dr. Mubarak Shah Univ. of Central Florida Office 47-F HEC Lecture-5 SIFT: David Lowe, UBC SIFT - Key Point Extraction Stands for scale invariant feature transform Patented

More information

Obtaining Feature Correspondences

Obtaining Feature Correspondences Obtaining Feature Correspondences Neill Campbell May 9, 2008 A state-of-the-art system for finding objects in images has recently been developed by David Lowe. The algorithm is termed the Scale-Invariant

More information