COMS W4735: Visual Interfaces To Computers Final Project (Finger Mouse)
Submitted by: Tarandeep Singh (Uni: ts2379)
FINGER MOUSE (Fingertip tracking to control the mouse pointer)

Abstract. This report discusses the design of a system that tracks the fingertip of the index finger using a single camera in order to control a mouse pointer on the screen. To support the left click, the system also visually recognizes a hand gesture. The underlying algorithm for tracking the fingertip is based on the CAMSHIFT algorithm [4]. To make tracking robust to cluttered backgrounds containing colors similar to those of the skin region, the algorithm is tuned for tracking hand regions [1]. The fingertip location is mapped to a location on the monitor using a method described in [1] and [3].

1. Introduction

In recent years, computer vision has started to play a significant role in Human Computer Interaction (HCI). With efficient object tracking algorithms, it is possible to track the motion of a human hand in real time using a simple web camera. This report discusses the design of a system that tracks the fingertip of the index finger for the purpose of controlling the mouse pointer on the screen. A single web camera is used to track the motion of the fingertip in real time. The camera is mounted on top of the computer monitor or hooked onto the laptop screen, as shown in Fig 1.

Fig 1: Physical layout

To move the mouse pointer, the user moves his index finger on a 2D plane (for example, on a table), with all other fingers folded to form a fist. To perform a left click, the user unfolds his thumb and then folds it back. Note that if the hand pose is not as described above (all fingers except the index finger folded to form a fist), the system simply tracks the hand but does not move the mouse pointer. Similarly, clicks are recognized only if the index finger is spotted in the scene. This is shown in Fig 2 and Fig 3.

Fig 2: Index finger as a mouse pointer
Fig 3: Thumb as mouse left button

2. System Overview

The system consists of three parts. First, a one-time on-line training of skin color makes hand tracking robust to changes in illumination, hand pose, and complex backgrounds. Second, the fingertip is tracked and the gesture for the left click is recognized continuously. Third, the mapping between camera coordinates and monitor coordinates is computed and the mouse pointer path is smoothed. This is shown in Fig 4.

Fig 4: System overview

Since the underlying tracking algorithm (CAMSHIFT) works by segmenting skin color from the background, efficient tracking of the fingertip requires that the user's other hand is not in the scene.

3.1 On-line training of skin colors

On-line training allows the system to learn
variations of skin color due to changes in illumination and hand pose. A 2D look-up table of Hue and Saturation values (probability density values) from the HSV color space is computed using the technique described in [1].

3.2 Detecting the hand region

As described in [1], the CAMSHIFT algorithm is used to detect the hand region; that is, the detection of the hand region is confined to a Region of Interest (ROI). Initially, the ROI is set to the entire captured frame. The hand region is segmented using the H and S values from the 2D look-up table computed during the training phase. A threshold operation gives the binary image B(x, y) of the hand region, where B(x, y) = 1 for pixels in the skin region. The center of the hand region (x_c, y_c) and its orientation θ are calculated using the 0th, 1st, and 2nd order image moments. The ROI of the next frame is then computed from this binary image: its center is set to the center of the hand region in the current frame, and its horizontal and vertical lengths (R_x, R_y) are computed as

R_x = s_x * M_00
R_y = s_y * M_00

where s_x = cos θ + 1, s_y = sin θ + 1, and M_00 is the 0th order image moment.

3.3 Computation of the fingertip location

To detect the fingertip, the system first determines whether the hand is in the mouse pointer pose (only the index finger unfolded, all other fingers folded to form a fist). This is done by cropping the image around the hand region, smoothing it with a Gaussian kernel to reduce noise, and then analyzing the shape of the hand. A simple method is implemented to analyze the shape: the image is converted into a binary image and scanned from top to bottom, row by row. The system counts the skin pixels in each row and tries to match the count with the width of the finger. If enough rows are found with a pixel count greater than or equal to the finger width, the system proceeds to check whether a fist follows the finger.
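The moment computations of Section 3.2 can be sketched as follows. This is a minimal sketch assuming the binary hand mask is a NumPy array; `hand_region_stats` is a hypothetical helper name, not the author's code:

```python
import numpy as np

def hand_region_stats(B):
    """Center, orientation and next-frame ROI size from a binary hand mask B,
    following the moment formulas of Section 3.2 (a sketch; names are mine)."""
    ys, xs = np.nonzero(B)
    M00 = len(xs)                        # 0th order moment (skin-pixel count)
    xc, yc = xs.mean(), ys.mean()        # centroid from the 1st order moments
    # central 2nd order moments for the orientation
    mu20 = ((xs - xc) ** 2).mean()
    mu02 = ((ys - yc) ** 2).mean()
    mu11 = ((xs - xc) * (ys - yc)).mean()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)   # orientation of the blob
    Rx = (np.cos(theta) + 1) * M00       # R_x = s_x * M_00
    Ry = (np.sin(theta) + 1) * M00       # R_y = s_y * M_00
    return (xc, yc), theta, (Rx, Ry)
```

In use, `B` would be the thresholded back-projection of the H-S look-up table, and the returned center and (Rx, Ry) would define the ROI searched in the next frame.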
If enough rows with a pixel count greater than the fist width are found, the system reports that the pointing pose is present. Essentially, a finite state machine detects the pointing pose in the image by analyzing the number of pixels in each row. The dimensions of the finger and the fist are determined during the on-line training process. A part of this finite state machine is shown in Fig 5, where Pix is the number of pixels in a row, fwidth is the average finger width, and flength is the average finger length.

Fig 5: Finite state machine for pointing pose detection in an image

The x coordinate of the fingertip is set to the first row having a pixel count greater than the finger width; the y coordinate is set to the center of the finger.

3.4 Computation of the mouse click

The system first determines whether the thumb is present in the image. A similar finite state machine is implemented, but this time the image is scanned column by column. Depending on whether the user is left-handed or right-handed, the scan begins from the left or the right side of the image. The system tries to find out if enough columns
with a pixel count greater than the thumb width are present in the image. If a sufficient number of such columns are found, the system checks whether a fist is also present in the image. Once the fist is also detected, the system declares that a thumb is in the image, which is equivalent to a mouse left-button-down event. When the user folds his thumb back, the system generates a button-up event. Note that if the pointing pose is not in the image, mouse clicks are not generated at all. The system maintains the status of the mouse button (left button only). This is shown in Fig 6: the transition from state 0 to state 1 (thumb detected) presses the mouse button, and the transition from state 1 back to state 0 (thumb no longer detected) releases it.

Fig 6: State of the mouse button corresponding to thumb detection in the image

4. Display of the mouse pointer

Once the fingertip is located, its coordinates must be mapped to the coordinates of the mouse pointer on the monitor. However, the fingertip locations cannot be used directly, for the following reasons: (i) noise from sources such as segmentation error makes it difficult to position the mouse pointer accurately [3]; (ii) due to the limitation on the tracking rate, the fingertip coordinates may be discontinuous [1]; (iii) the difference in resolution between the camera image and the monitor makes it difficult to position the mouse pointer accurately [1]. To circumvent these problems, a simple method is implemented: the displacement of the fingertip is averaged over a few frames, and this average displacement is used to displace the mouse cursor on the screen. If the displacement is less than a threshold value, the mouse cursor is not moved. To move the mouse cursor and to generate mouse button events, the Windows SendInput routine is used; its prototype is declared in the windows.h header file.

5. Applications

This application gives the user an easy way to move the mouse cursor on the screen.
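The displacement-averaging scheme of Section 4 can be sketched as follows. This is a minimal sketch: the jitter threshold is an assumed value, and `smooth_move` is a hypothetical helper name, not the author's code:

```python
import numpy as np

def smooth_move(fingertip_history, threshold=3.0):
    """Average the fingertip displacement over the last few frames (Section 4).

    fingertip_history: list of (x, y) fingertip positions, oldest first.
    Returns the (dx, dy) to apply to the cursor, or (0, 0) when the average
    displacement is below the jitter threshold (an assumed value).
    """
    pts = np.asarray(fingertip_history, dtype=float)
    if len(pts) < 2:
        return (0.0, 0.0)
    d = np.diff(pts, axis=0).mean(axis=0)    # average per-frame displacement
    if np.hypot(*d) < threshold:             # suppress jitter below threshold
        return (0.0, 0.0)
    return (d[0], d[1])
```

On Windows, the returned (dx, dy) would then be passed to the SendInput routine (declared in windows.h), as the report describes.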
Also, since the user is already using his hand, the homing time (the time to place the hand on the mouse) is greatly reduced. The click is implemented with a very easy gesture. With more robust fingertip detection, this application could replace the mouse.

6. Limitations of the system

The system was evaluated on hand detection, fingertip detection, click detection, and the effort required to place the mouse pointer at a specific location on the screen. The following observations were made:

i. Though the hand detection algorithm (CAMSHIFT) performs in real time, it works well only in an environment free from background noise. Specifically, if the background contains colors similar to skin, the algorithm will lose track of the hand or falsely report its location.

ii. The hand pose detection works well in the setup shown in Fig 1. However, when the camera's height is changed, the system reports false pose detections. A better way to detect the pointing pose would be a machine learning algorithm (for example, a neural network or a support vector machine).

iii. The mouse cursor movement on the
screen requires more smoothing. Also, the user is not able to cover the entire screen.

7. Learning

It was fun to implement this project, and along the way I learned a lot, some of which I hope to apply later to make this project more usable. My learnings are summarized below:

i. Skin segmentation. This is a hard problem, and a lot of work has been done in this area using various color spaces: RGB (red, green, blue), HSV (Hue, Saturation, Value), etc. It has been observed that the hue of skin is distinctive, and this forms the basis of most skin segmentation algorithms; using Saturation along with Hue makes skin detection more robust. However, segmenting skin from a cluttered background is still a challenging task, and varying illumination conditions make it even harder.

ii. Innovative ways to determine the illumination condition. People have tried various ways to automatically determine the illumination condition and its effect on skin color. For example, one can first detect the face using a face detector that does not depend on skin color, and then extract a skin-color probability distribution from it. For this approach to work properly, the face must be segmented into regions of mostly skin, ignoring areas such as the lips and eyes.

iii. Ways to track hands. People have been trying to use hands as an interface to computers. One way to track hands robustly is to use motion to segment the hand from the background; this can be done either by background subtraction or by using optical flow to determine motion. A very nice paper that I came across while searching for a robust hand tracking method combines optical flow and skin color: "Fast 2D Hand Tracking with Flocks of Features and Multi-Cue Integration" by Mathias Kolsch and Matthew Turk.

8. References

1. Jiyoung Park and Juneho Yi, Efficient Fingertip Tracking and Mouse Pointer Control for a Human Mouse, Computer Vision Systems, Third International Conference, ICVS 2003, Graz, Austria, April 1-3, 2003, Proceedings.
2. Anne-Marie Burns and Barbara Mazzarino, Finger Tracking Methods Using EyesWeb.
3. R. Kjeldsen and J. Kender, Interaction with On-Screen Objects using Visual Gesture Recognition, IEEE Conference on Computer Vision and Pattern Recognition, pp. 788-793, 1997.
4. Gary R. Bradski, Computer Vision Face Tracking For Use in a Perceptual User Interface, Intel Technology Journal Q2, 1998.
Code listing
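The listing itself is not reproduced in this excerpt. As an illustration only, the row-scan pointing-pose state machine of Section 3.3 might be sketched as below. This is my own reconstruction, not the author's listing; the width and length parameters stand in for the values learned during on-line training:

```python
import numpy as np

def detect_pointing_pose(B, finger_width, finger_length, fist_width):
    """Scan a binary hand image B row by row, top to bottom (Section 3.3).

    A small state machine first counts consecutive rows whose pixel count
    matches a finger (>= finger_width but < fist_width), then requires a row
    at least fist_width wide to follow.  Returns (found, tip_row), where
    tip_row is the first finger row (the fingertip's row coordinate), or
    (False, None) if the pointing pose is not present.
    """
    state, count, tip_row = "FINGER", 0, None
    for r, row in enumerate(B):
        pix = int(np.count_nonzero(row))
        if state == "FINGER":
            if fist_width > pix >= finger_width:
                if count == 0:
                    tip_row = r              # first finger row = fingertip row
                count += 1
                if count >= finger_length:
                    state = "FIST"           # enough finger rows; expect a fist
            else:
                count, tip_row = 0, None     # broken run: start over
        elif state == "FIST" and pix >= fist_width:
            return True, tip_row             # finger followed by a fist
    return False, None
```

The thumb detector of Section 3.4 follows the same pattern with the scan running column by column instead of row by row.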