Exploring Curve Fitting for Fingers in Egocentric Images


Akanksha Saran
Robotics Institute, Carnegie Mellon University
16-811: Math Fundamentals for Robotics, Final Project Report
Email: asaran@andrew.cmu.edu

Abstract—For the task of gesture recognition, or for an interaction system that detects when objects are picked up by the human hand, it is important to know where the fingers are and how they are oriented. Using images of human hands obtained from egocentric wearable cameras, a hand detector can provide a pixel-wise skin probability for every pixel in an image. Using the output of such a detector, we can fit curves, specifically ellipses, to fingers as a top-down cue for hand detection or gesture recognition. Such curves can be employed as features, or for ascertaining the plausibility of fingers given some detection results.

I. INTRODUCTION

For wearable cameras, or first-person vision, hand detectors provide pixel-level detection of the hands (a bottom-up approach): the output is a probability map for the input image in which each pixel is assigned the probability of being part of the hand or skin. Such detectors first need to be trained for a specific user. The training requires ground-truth masks for the hand/skin regions present in an image. The detector then learns a random forest regressor using local features commonly used in computer vision, such as SIFT, GIST, BRIEF and ORB, together with global features describing the illumination of the scene. A different regressor is learned for each illumination category, which helps produce good detection results across a variety of indoor and outdoor settings. Once training is complete, the detector can be used on new images captured by the wearable camera for the same user.

The output of the detector is a noisy grayscale image depicting the probability of each pixel belonging to the skin region: a bright white pixel is highly likely to be part of the skin, and a black pixel is highly unlikely to be. These noisy images are the input to our curve-fitting system for fingers.

The broad idea behind fitting curves to fingers is to formulate a new approach towards hand detection or gesture recognition. For such a system, important features are the finger positions, the number of fingers detected, and the fingertip orientations. This is advantageous compared to template matching, whose results vary heavily with the size and shape of the template used; here we are, in some sense, generating a template for the fingers on the fly. Such a technique would also be helpful for coarse-to-fine contour matching in general: for certain gestures, particular contours would be expected, and contour matching could be used to detect such gestures if the properties of these contours could be formulated.

Fig. 1. Input image frame used for training of the detector
Fig. 2. A hand mask used during training

II. DATA FROM HAND DETECTORS

The user-specific training of hand detectors is done on a video of hand image frames. Roughly every 25th or 50th frame of the video must be annotated with a hand mask, as shown in Fig. 1 and 2. After learning the random forest regressors, the detector can generate rough probability maps for a new video, as shown in Fig. 3 and 4.
These maps are not perfect and contain spurious detections in the form of noise. Parts of the hand that are very brightly illuminated can fail to be detected as skin. A limitation of any such detector is handling very rapid changes of illumination across the surface of the hand within a single image: for example, a part of the hand brightly lit by the sun and another part under the shadow of a tree will not give very clean results. The quality of the maps also depends heavily on how well the detector has been trained.

Fig. 3. A test image given to the detector post training
Fig. 4. The probability map generated by the skin detector
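The detector used in this work is the pixel-level hand detector of Li and Kitani. Purely as an illustration of the per-pixel regression idea (with hypothetical variable names, and scikit-learn's random forest standing in for the actual detector implementation; feature extraction is assumed to happen elsewhere), training and prediction could be sketched as:

    # Illustrative sketch only: a generic per-pixel skin-probability regressor.
    # Assumes X_train is an (N, d) array of per-pixel appearance features and
    # y_train is an (N,) array of skin labels in [0, 1] taken from the hand masks.
    # These names are hypothetical placeholders, not part of the actual detector.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def train_skin_regressor(X_train, y_train):
        reg = RandomForestRegressor(n_estimators=50, max_depth=15, n_jobs=-1)
        reg.fit(X_train, y_train)
        return reg

    def predict_probability_map(reg, features, image_shape):
        # features: (H*W, d) per-pixel features of a new frame, row-major order.
        probs = reg.predict(features)                    # values in [0, 1]
        return np.clip(probs, 0.0, 1.0).reshape(image_shape)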

III. METHODOLOGY

A. Denoising the probability map

To fit curves to the fingers from the noisy probability maps, the first step is to denoise the grayscale probability estimate and obtain a smoother region to investigate for fingers. Salt-and-pepper noise is removed with a median filter, which replaces each pixel with the median value of its neighborhood. Bilateral filtering is then used to smooth the map while preserving edges. After these two filtering operations, a cleaner map is obtained, as shown in Fig. 5.

The next step is to threshold the probability map and apply the Canny edge detector to obtain the contours of the hand, as shown in Fig. 6. Once the image outlining the hand is obtained, small patches that could possibly contain a finger are considered across the image, and curves are fit to them. Different patch sizes were tested, as shown in Fig. 7; a patch size of 21x21 pixels was finally chosen as it fits a fingertip well. Every 21x21 patch containing some edge points of the hand is then tested for the strength of curvature of the curve contained within it: a patch containing only the straight line of the arm, for example, is not needed. Different curvature metrics were tried; the principal curvature, which involves second derivatives, did not give very accurate results on the dataset involved. The Harris corner metric was finally chosen, as it retained the patches of interest when tried on a small dataset during experimentation.

Fig. 5. The original image, noisy probability map and denoised probability map generated through the skin detector
Fig. 6. The output of Canny edge filtering on the thresholded probability map
Fig. 7. Different sized patches explored for the ellipse fitting process: 100x100, 31x31, 21x21
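A minimal sketch of this preprocessing step, assuming OpenCV and a probability map already scaled to 8-bit grayscale (the filter sizes and thresholds below are illustrative assumptions, not the exact values used in the experiments):

    # Illustrative preprocessing sketch: denoise the skin-probability map,
    # threshold it, and extract a Canny edge map of the hand outline.
    import cv2
    import numpy as np

    def preprocess_probability_map(prob_map_u8):
        # prob_map_u8: HxW uint8 image, bright pixels = likely skin.
        denoised = cv2.medianBlur(prob_map_u8, 5)              # remove salt-and-pepper noise
        smoothed = cv2.bilateralFilter(denoised, 9, 75, 75)    # smooth while preserving edges
        _, mask = cv2.threshold(smoothed, 127, 255, cv2.THRESH_BINARY)
        edges = cv2.Canny(mask, 50, 150)                       # hand contour image
        return edges

    def edge_patches(edges, size=21):
        # Yield (top-left corner, patch) for every non-overlapping size x size
        # patch that contains at least one edge pixel.
        h, w = edges.shape
        for y in range(0, h - size, size):
            for x in range(0, w - size, size):
                patch = edges[y:y + size, x:x + size]
                if patch.any():
                    yield (x, y), patch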

Equation (1) gives the Harris matrix, computed over a small window of the image. Here w(u, v) is the weight assigned to the different pixels of the patch when taking the sum of squared differences between two nearby patches, and equation (1) approximates this sum using a Taylor series expansion. In our case we simply use uniform averaging.

A = \sum_{u,v} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = \begin{bmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{bmatrix}    (1)

Equation (2) gives the final metric, which measures the strength of curvature within the small window. Here \lambda_1 and \lambda_2 are the two eigenvalues of the matrix A in equation (1). If M_c is greater than a certain threshold, the patch is kept for curve fitting; otherwise it is discarded. The constant k is chosen between 0.04 and 0.15.

M_c = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2 = \det(A) - k \, \mathrm{trace}^2(A)    (2)

The patches retained after computing the Harris corner metric are used for curve fitting, and the fitted curves are finally superimposed on the original image frames for display.
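A small sketch of this patch test, implementing equations (1) and (2) with uniform weights (the acceptance threshold below is an illustrative assumption):

    # Illustrative sketch of the Harris corner measure of equations (1) and (2),
    # computed for a single patch with uniform weights w(u, v).
    import numpy as np

    def harris_measure(patch, k=0.06):
        # patch: small 2D array (e.g. a 21x21 window of the edge/intensity image).
        p = patch.astype(np.float64)
        Iy, Ix = np.gradient(p)                 # image derivatives along rows and columns
        A = np.array([[np.mean(Ix * Ix), np.mean(Ix * Iy)],
                      [np.mean(Ix * Iy), np.mean(Iy * Iy)]])
        return np.linalg.det(A) - k * np.trace(A) ** 2   # M_c = det(A) - k * trace(A)^2

    def keep_patch(patch, threshold=1e-3, k=0.06):
        # Keep the patch for ellipse fitting only if its curvature measure is strong enough.
        return harris_measure(patch, k) > threshold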
IV. DIRECT LEAST SQUARE FITTING OF ELLIPSES

To begin with, no prior curve was assumed to be beneficial. The first attempt was to take a set of polynomial bases and fit them to the chosen patches. However, the data points in our case are extremely noisy and lie at a very small scale, and the polynomial curves returned by the least squares approach were not stable: for very similar patches, the coefficients returned for low-order polynomials (degree 2 or 3) were neither consistent nor close, and small deviations in the data or small amounts of added noise perturbed the small-scale polynomial heavily. Moreover, the polynomial coefficients do not directly quantify a generic property of the curve. Hence a conic, specifically an ellipse, was seen to be a better choice for fitting to fingers. Many generic conic fitting algorithms existed previously, but they involve iterative procedures to force the fitted conic to be an ellipse for the given data points. The algorithm used here (the B2AC method) directly fits an ellipse to the given set of points using the least squares approach. This is possible because of a well-chosen normalization constraint, which reduces the problem to a generalized eigenvalue system having a single ellipse as its solution.

F(\mathbf{a}, \mathbf{x}) = a x^2 + b x y + c y^2 + d x + e y + f = 0    (3)
\mathbf{a} = (a, b, c, d, e, f)^T    (4)
\mathbf{x}_i = (x_i, y_i)    (5)

Equation (3) is the general equation of a conic, and equation (4) collects the conic parameters. We are given a set of n data points \mathbf{x}_i, i = 1, \dots, n. The least squares approach minimizes the algebraic distance between the data points and the conic, as shown in equation (6).

D_A(\mathbf{a}) = \sum_{i=1}^{n} F(\mathbf{a}, \mathbf{x}_i)^2    (6)

A. Minimization and Normalization Constraint

To avoid the trivial solution in which all conic parameters become zero, the minimization must also respect the ellipse constraint of equation (7).

b^2 - 4ac < 0    (7)

This inequality is turned into a normalization constraint by setting it to -1, as in equation (8). The constraint can then be written in matrix form as in equation (9), where C is the 6x6 symmetric constraint matrix of equation (10). A minimum of five points is needed to fit the ellipse.

b^2 - 4ac = -1    (8)
\mathbf{a}^T C \mathbf{a} = 1    (9)
C = \begin{bmatrix} 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}    (10)

B. Generalized Eigenvalue System

With this normalization constraint, the minimization of equation (6) reduces to equation (11), subject to the constraint of equation (9), where D is the design matrix of equation (12).

\min_{\mathbf{a}} E = \| D \mathbf{a} \|^2    (11)

D = \begin{bmatrix} x_1^2 & x_1 y_1 & y_1^2 & x_1 & y_1 & 1 \\ x_2^2 & x_2 y_2 & y_2^2 & x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n^2 & x_n y_n & y_n^2 & x_n & y_n & 1 \end{bmatrix}    (12)

Provided the gradient of the constraint (9) is not zero, we can introduce a Lagrange multiplier and minimize the expression in equation (13).

L(\mathbf{a}, \lambda) = (D\mathbf{a})^T (D\mathbf{a}) + \lambda (1 - \mathbf{a}^T C \mathbf{a})    (13)

Differentiating equation (13) with respect to \mathbf{a} and setting the derivative to zero gives the generalized eigenvalue system of equation (15) for the pair of matrices S and C, where S = D^T D is the scatter matrix formed from the data points and is positive definite.

D^T D \mathbf{a} = \lambda C \mathbf{a}    (14)
S \mathbf{a} = \lambda C \mathbf{a}    (15)
S = D^T D    (16)

Solving the generalized eigenvalue system of equation (15) relies on a lemma stating that the signs of the generalized eigenvalues of S\mathbf{u} = \lambda C \mathbf{u}, with S positive definite and C symmetric, are the same as the signs of the eigenvalues of the constraint matrix C, up to a permutation of the indices. The eigenvalues of C are -2, -1, 2, 0, 0, 0, so C has exactly one positive eigenvalue. By the lemma there is therefore exactly one positive generalized eigenvalue, and the corresponding generalized eigenvector gives the unique solution for the ellipse parameters \mathbf{a}:

\mathbf{a} = \mu_i \mathbf{u}_i    (17)

where \mathbf{u}_i is the generalized eigenvector associated with the positive generalized eigenvalue and

\mu_i = \sqrt{\frac{1}{\mathbf{u}_i^T C \mathbf{u}_i}} = \sqrt{\frac{\lambda_i}{\mathbf{u}_i^T S \mathbf{u}_i}}    (18)

Fig. 8. The solid line ellipses of the B2AC method are close to being a circle (low eccentricity) compared to other ellipse fitting methods
Fig. 9. The B2AC method shows robust results in the presence of noise compared to other ellipse fitting methods

C. Advantages Compared to Other General Conic Fitting Approaches

Compared to iterative ellipse fitting approaches, this method is more efficient and simpler to implement. It has a low eccentricity bias, as can be seen in Fig. 8, where the other methods fit more elongated ellipses to the same dataset than the B2AC approach. The method is also affine invariant: affine transformations of the data do not change the fitted ellipses drastically, unlike other approaches. Moreover, it is highly robust to noise, as can be seen in Fig. 9, where increasing the noise level in the data points degrades the fit only gradually compared to other methods. In addition, different instances of noise at the same level do not change the fitting result, whereas other methods are unstable across different runs with the same noise level applied to a given set of points.
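A compact sketch of the B2AC fit, transcribing equations (10), (12), (15) and (18) with numpy/scipy (a plain transcription for illustration, not a numerically hardened implementation):

    # Illustrative sketch of the B2AC direct ellipse fit: build the design,
    # scatter and constraint matrices and solve the generalized eigensystem
    # S a = lambda C a, keeping the eigenvector with the positive eigenvalue.
    import numpy as np
    from scipy.linalg import eig

    def fit_ellipse_b2ac(x, y):
        # x, y: 1D arrays of edge-point coordinates from one patch (at least 5 points).
        D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])  # eq. (12)
        S = D.T @ D                                                        # eq. (16)
        C = np.zeros((6, 6))
        C[0, 2] = C[2, 0] = 2.0
        C[1, 1] = -1.0                                                     # eq. (10)
        eigvals, eigvecs = eig(S, C)        # generalized eigenvalues of S a = lambda C a
        eigvals = np.real(eigvals)
        # Keep the unique finite, positive generalized eigenvalue (see the lemma above).
        idx = np.argmax(np.where(np.isfinite(eigvals), eigvals, -np.inf))
        a = np.real(eigvecs[:, idx])
        mu = np.sqrt(1.0 / abs(a @ C @ a))  # eq. (18): rescale so that a^T C a = 1
        return mu * a                       # conic parameters (a, b, c, d, e, f)

The returned parameter vector can then be converted to an ellipse center, axis lengths and orientation with the standard conic-to-ellipse formulas.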

V. RESULTS

The ellipse fitting approach with the methodology described above produces many outliers, as shown in Fig. 10; however, all the fingers are detected well. The results were collected as a video corresponding to the input video given to the detector. For removing outliers (very large ellipses), RANSAC was used with the major and minor axis lengths of the ellipses as input parameters; the sum of squared differences over these parameters was used as the error to be minimized in the model fitting step of RANSAC. The results after removing outliers are much more consistent, although a few fingers are missed in a small number of frames, as can be seen in Fig. 11.

Fig. 10. Ellipse fitting results with outliers
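As a minimal sketch of this outlier-removal idea, keeping the largest consensus set of ellipses under a sum-of-squared-differences error on the axis lengths (the iteration count and threshold below are placeholder assumptions, not the values used in the experiments):

    # Illustrative outlier-removal sketch: repeatedly propose a "typical" pair of
    # (major, minor) axis lengths and keep the largest consensus set of ellipses.
    import random
    import numpy as np

    def filter_ellipse_outliers(axes, n_iters=100, threshold=15.0):
        # axes: list of (major_axis, minor_axis) lengths, one pair per fitted ellipse.
        pts = np.asarray(axes, dtype=np.float64)
        best_inliers = np.zeros(len(pts), dtype=bool)
        for _ in range(n_iters):
            model = pts[random.randrange(len(pts))]        # sample one candidate model
            errors = np.sum((pts - model) ** 2, axis=1)    # sum of squared differences
            inliers = errors < threshold ** 2
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        return best_inliers                                # boolean mask of kept ellipses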

Fig. 11. RANSAC removes outlier ellipses; however, due to limitations of the detector, it misses certain fingers or keeps similarly sized ellipses that do not belong to fingers
Fig. 12. The table and some regions of the hand with curvature similar to fingers are also detected as fingers

VI. LIMITATIONS AND FUTURE WORK

The output of this approach depends heavily on the hand detector providing a near-perfect estimate of the skin regions. If a region of the image with a color similar to skin, such as a table or a door, is spuriously detected by the hand detector, then ellipses will be fit to these regions as well, as shown in Fig. 12. We also find that in some frames the approach becomes conservative after the application of RANSAC, and a few ellipses that should be present are missing.

Future work involves making the estimation of finger ellipses more accurate so that all fingers are detected correctly. Multiple ellipses are currently fit to nearby patches covering the same finger edge; removing this redundancy would speed up the algorithm. The results attained so far can also be used as new features for a gesture recognition system or a human-object interaction detection system. Another idea is to use the results of this approach as an initialization step for finger tracking with particle filters. Further properties, such as the aspect ratio (ratio of height to width) of the finger ellipses and a clustering of ellipses into individual fingers, could be extracted to develop more accurate features.

REFERENCES

[1] C. Li and K. Kitani, "Pixel-level Hand Detection in Ego-Centric Videos," CVPR, 2013.
[2] A. Fitzgibbon, M. Pilu, and R. Fisher, "Direct Least Square Fitting of Ellipses," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, May 1999.
[3] W. Gander, G. Golub, and R. Strebel, "Least Squares Fitting of Circles and Ellipses."
[4] T. Ellis, A. Abbood, and B. Brillault, "Ellipse Detection and Matching with Uncertainty," Image and Vision Computing, vol. 10, no. 2, pp. 271-276, 1992.