Detection of hand grasping an object from complex background based on machine learning co-occurrence of local image feature

Shinya Morioka, Yasuhiro Hiramoto, Nobutaka Shimada, Tadashi Matsuo, Yoshiaki Shirai
Ritsumeikan University, Shiga, Japan

Abstract

This paper proposes a method for detecting a hand grasping an object in a single image based on machine learning. Learning proceeds as a bootstrap scheme, from hand-only detection to modeling of the object-hand relationship. In the first step, Randomized Trees are built that learn the positional and visual relationship between image features of the hand region and the wrist position, and that provide a probability distribution over wrist positions. The wrist position is then detected by voting the distributions for multiple features in newly captured images. In the second step, the suitability as hand is calculated for each feature by evaluating the voting result. Based on that, the object features specified by the grasping type are extracted, and the positional and visual relationship between the object and the hand is learned so that the object center can be predicted from the hand shape. In the local coordinate system constructed from the predicted object center and the wrist position, a One Class SVM learns where the object, the hand region and the background are located for the target grasping type. Finally, the grasping hand, the grasped object and the background are each extracted from the cluttered background.

I. INTRODUCTION

Some studies detect unknown objects in images based on machine learning, such as [1]. Other approaches rely on additional image cues such as human actions and the kind of object. To recognize various actions of human hands, such as holding or picking up something, the hands must be detected and their postures estimated from images. In some previous studies, the object category is recognized by using the functional relationship between an object and the hand action [2][3]. They simultaneously evaluate the image appearances of the hand and the object.
However, when a hand is holding something, it is difficult to detect the hand and precisely classify its posture because parts of the hand region may be occluded by the grasped object. This paper proposes a method that detects the position of a hand holding an object based on local hand features, in spite of the occlusion. By learning local features from hands grasping various tools in various ways, the proposed method estimates the position, scale and direction of the hand and the region of the grasped object. Experiments demonstrate that the proposed method can detect them in cluttered backgrounds.

II. WRIST POSITION DETECTION BASED ON RANDOMIZED TREES LEARNING RELATIVE POSITIONS OF THE LOCAL IMAGE FEATURES

A. Training of a Randomized Trees model providing the probability distribution of wrist positions

Randomized Trees (RT) [4] is a multi-class classifier that accepts multi-dimensional features and provides probability distributions over the class indices. Here we construct a classifier that provides the probability distribution of wrist positions when SURF [5] image features are input. The distribution must be estimated even if the hand appears at an image location, scale and orientation different from those in the training images. For this purpose, the wrist positions in the training data are first normalized based on the position, scale and orientation of each SURF feature of the grasping hands seen in the training images, and then the pairs of normalized wrist positions and features are input into the classifier for training.

However, since the probability distribution predicted by RT is discrete, a continuous distribution over the image space cannot be handled directly by RT. Therefore, the normalized image space is divided into discrete grid blocks, each of which has a unique class label, and the class label of the block in which the wrist position lies is determined. An RT is then constructed that takes a 128-dimensional SURF image feature as input and outputs the probability distribution of the wrist position over the grid blocks.
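The normalization and grid labeling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the normalization is a translation to the feature position, a rotation by the feature orientation and a division by the feature scale, and that the grid is a square of `n_blocks` by `n_blocks` cells centered on the feature. The names `to_local`, `grid_label`, `block_size` and `n_blocks` are hypothetical.

```python
import math

def to_local(x, y, xv, yv, theta, scale):
    """Express image point (x, y) in the frame of a feature located at
    (xv, yv) with orientation theta (radians) and scale `scale`."""
    dx, dy = x - xv, y - yv
    u = (dx * math.cos(theta) + dy * math.sin(theta)) / scale
    v = (-dx * math.sin(theta) + dy * math.cos(theta)) / scale
    return u, v

def grid_label(u, v, block_size=1.0, n_blocks=8):
    """Map local coordinates to a block index in [0, n_blocks**2).
    Returns None when the point falls outside the grid (an assumption
    of this sketch; the paper does not say how such cases are handled)."""
    half = block_size * n_blocks / 2.0
    col = math.floor((u + half) / block_size)
    row = math.floor((v + half) / block_size)
    if 0 <= col < n_blocks and 0 <= row < n_blocks:
        return row * n_blocks + col
    return None

# Training label for one feature: the block that contains the wrist.
u, v = to_local(13.0, 9.0, xv=10.0, yv=10.0, theta=0.0, scale=2.0)
label = grid_label(u, v)
```

Each (SURF descriptor, `label`) pair would then be one training sample for the multi-class classifier.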
A local coordinate space defined for each SURF vector is segmented into blocks by a grid (Fig. 1(c)). The number of blocks is finite and they are indexed. A label $j$ is an index that denotes either a block or the background. The $j$-th block $C_j$ is a region of the local coordinate space defined as

$$C_j = \left\{ \begin{bmatrix} u \\ v \end{bmatrix} \;\middle|\; |u - u_c(j)| < S_{block}/2,\ |v - v_c(j)| < S_{block}/2 \right\}, \tag{1}$$

where $(u_c(j), v_c(j))^t$ denotes the center of $C_j$ and $S_{block}$ denotes the size of a block. When the wrist position is $x_{wrist} = (x_{wrist}, y_{wrist})^t$ and a SURF vector $v$ is detected on a hand, the label $j$ of the SURF vector $v$ is defined as the block index $j$ such that $T_v(x_{wrist}) \in C_j$, where $T_v$ is the transformation into the local coordinate system, defined as

$$T_v(x, y) = \frac{1}{S_v} \begin{bmatrix} (x - x_v)\cos\theta_v + (y - y_v)\sin\theta_v \\ -(x - x_v)\sin\theta_v + (y - y_v)\cos\theta_v \end{bmatrix}, \tag{2}$$

and $(x_v, y_v)$ denotes the position where the SURF vector $v$ is detected, and $\theta_v$ and $S_v$ denote the direction and scale of $v$, respectively. The label of a SURF vector detected on the background is defined as a special index $j_{back}$, which is distinguished from the block indexes.

To learn the relation between a SURF vector $v$ and its label $j$, we collect many pairs of a SURF vector and its label from many teacher images in which the true wrist position is known. Then, we train the RT with these pairs so that we can calculate a probability distribution $P(j \mid v)$.

To find local maxima of the votes (Section II-B) without explicitly defining local regions, we use mean-shift clustering. Seeds for the clustering are placed at regular intervals in the image space, as shown in Fig. 2(a), where a blue point denotes a seed, a green point denotes a trajectory, a red point denotes a limit of convergence, and the green and blue circles denote the first and second wrist candidates, respectively. The image space is clustered by the limit positions of convergence (Fig. 2(b)). For each cluster, the position with the maximum votes is taken as a candidate (Fig. 2(c)).

Fig. 1. Conversion of wrist position and positions of features to relative positional relationship class labels

Fig. 2. Flow of wrist candidate detection

B. Wrist position detection based on votes from the probability distribution of wrist positions

A wrist position is estimated by voting on the image space, based on the probability distribution $P(j \mid v)$ learned with the RT. The vote function $V_{wrist}(x, y)$ is defined as

$$V_{wrist}(x, y) = \sum_{v\ \text{for all SURF vectors}} V_v(x, y), \tag{3}$$

where

$$V_v(x, y) = \begin{cases} P(j = \hat{j} \mid v) & \text{if } \exists\, C_{\hat{j}} \text{ s.t. } T_v(x, y) \in C_{\hat{j}}, \\ 0 & \text{(otherwise)}, \end{cases}$$

and $P(j \mid v)$ is the probability distribution calculated by the trained RT. The position with the maximum votes,

$$\arg\max_{x, y} V_{wrist}(x, y), \tag{4}$$

is considered a position suitable for a wrist. However, the global maximum may not be the true position, because the votes represent a local likelihood and comparing them globally is not meaningful.
Therefore, we allow multiple candidates of wrist positions that have locally maximum votes.

If the background region is larger than the hand region, the above voting is disturbed by SURF features on the background. To overcome this, we use an SVM [6] to roughly distinguish SURF features that seemingly originate from a hand from the others. We perform the above voting algorithm with all SURF features except those that apparently originate from non-hand regions.

An example of finding candidates is shown in Fig. 3. First, SURF features are extracted from an input image (Fig. 3(a)). Using the SVM, they are roughly classified, and those that apparently originate from non-hand regions are excluded (Fig. 3(b),(c)). For each of the remaining SURF vectors $v$, the local coordinate system is defined as described above and the conditional probability distribution $P(j \mid v)$ is predicted by the RT classifier (Fig. 3(d),(e)). By voting the probability distributions, candidates of wrist positions are determined (Fig. 3(f),(g)). The red rectangle in Fig. 3(g) is the detected wrist position, which shows that the wrist is successfully detected.
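The voting of equation (3) can be sketched with NumPy as below. This is a hedged illustration under stated assumptions rather than the authors' code: the classifier output $P(j \mid v)$ is stood in for by a plain array `probs`, the grid is assumed to be a square of `n_blocks` by `n_blocks` cells centered on each feature, and the SVM pre-filtering and the mean-shift search for local maxima are omitted. The function name `vote_wrist` and its parameters are hypothetical.

```python
import numpy as np

def vote_wrist(features, probs, shape, block_size=1.0, n_blocks=8):
    """Accumulate V_wrist over an image of size shape = (height, width).

    features: list of (xv, yv, theta, scale), one tuple per SURF feature.
    probs:    array (n_features, n_blocks**2); probs[i, j] stands in for
              P(j | v_i) as a trained classifier would output it.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    votes = np.zeros((h, w))
    half = block_size * n_blocks / 2.0
    for (xv, yv, theta, scale), p in zip(features, probs):
        dx, dy = xs - xv, ys - yv
        # Equation (2): every pixel expressed in the feature's local frame.
        u = (dx * np.cos(theta) + dy * np.sin(theta)) / scale
        v = (-dx * np.sin(theta) + dy * np.cos(theta)) / scale
        col = np.floor((u + half) / block_size).astype(int)
        row = np.floor((v + half) / block_size).astype(int)
        inside = (col >= 0) & (col < n_blocks) & (row >= 0) & (row < n_blocks)
        # Equation (3): add P(j | v) where the pixel lies in block j, else 0.
        votes[inside] += p[row[inside] * n_blocks + col[inside]]
    return votes

# Toy example: one feature with a 2 x 2 grid on a 10 x 10 image.
toy_probs = np.array([[0.1, 0.2, 0.3, 0.4]])
toy_votes = vote_wrist([(5.0, 5.0, 0.0, 1.0)], toy_probs, (10, 10), n_blocks=2)
best_y, best_x = np.unravel_index(np.argmax(toy_votes), toy_votes.shape)
```

In this toy case the global maximum is taken directly; the method in the text instead keeps several local maxima as candidates.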

Fig. 3. Flow of wrist position detection based on votes from the probability distribution of wrist positions

III. EXTRACTION OF HAND AND OBJECT REGION BASED ON LEARNING OF CO-OCCURRENCE OF HAND AND OBJECT FEATURES

To find a wrist position from features observed on a hand grasping an object, the features must be classified into those on the hand and those on the object. We introduce a One Class SVM [7] for this classification. It is trained with object positions expressed relative to a wrist-object coordinate system. First, we estimate the center of gravity of the object by RT and derive a wrist-object coordinate system from the estimated center. Then, we distinguish between feature points on the object and those on the hand with the One Class SVM.

A. Estimating gravity center of a hand by coarse classification

A wrist position can be found by the algorithm described in Section II. We can derive a wrist-object coordinate system by finding the center of gravity of the object. Here, we are looking at images showing object grasping states, and we need to classify the image features into those that belong to the hand and those that belong to the object. To this end, the suitability as hand $h$ represents how much each detected SURF feature contributed to the voting for the wrist position when that position was being determined (i.e. the probability of votes) (Fig. 4(b)). The position $(x, y)$ within the image is classified using $h$. To calculate the suitability as hand, let $i$ be the index of a SURF feature that voted for the maximum $v_{max}$, and let $P_{wrist}^{i}$ be the value voted by feature $i$ at that position, so that

$$v_{max} = \sum_i P_{wrist}^{i}. \tag{5}$$

The greater the value of $P_{wrist}^{i}$, the more strongly feature $i$ has contributed to determining the wrist position. In other words, the suitability as hand of each feature is

$$h_i = P_{wrist}^{i}. \tag{6}$$

K-means clustering is applied to the vectors $x = (x, y, h)$, which combine the position of each feature in the image with its suitability as hand, to determine whether the features are hand class features or object class features. The $h$ values of the cluster centers $x_{m_1}$ and $x_{m_2}$ are compared; the cluster with the larger value is judged to be the hand class $L_{hand}$ and the other the object class $L_{obj}$:

$$\begin{cases} m_1 = m_{L_{hand}},\ m_2 = m_{L_{obj}} & \text{if } h_{m_1} > h_{m_2}, \\ m_1 = m_{L_{obj}},\ m_2 = m_{L_{hand}} & \text{otherwise}. \end{cases} \tag{7}$$

From this, the Euclidean distance $d(x, x_m)$ between a key point $x$ and a center of gravity vector $x_m$ determines the class to which each key point belongs. The hand class $L_{hand}$ and the object class $L_{obj}$ are obtained from equation (8):

$$\begin{cases} x \in L_{hand} & \text{if } d(x, x_{m_{L_{hand}}}) < d(x, x_{m_{L_{obj}}}), \\ x \in L_{obj} & \text{otherwise}. \end{cases} \tag{8}$$

This yields the wrist position and the center of gravity of the object class features, as well as the wrist-object coordinate system (Fig. 4(c)). Fig. 4 shows the hand region based on the effective scope of the SURF features classified into the hand class (a darker color indicates greater suitability as hand). (Pink rectangle: hand feature class, blue rectangle: object feature class)

Fig. 4. Flow of classification of hand and object features

B. Learning of an RT to output the probability distribution of object positions

Using the process in the previous section, the hand and object areas can be segmented in the training images of object grasping. With these segmented training images, the relationship between the wrist position and the center position of the object can be newly learned for each of these grasping patterns. Thus, as in Section II-A, an RT is trained on the relationship between the center of gravity of the object in a training image of a grasping hand, extracted in the previous section, and the hand SURF features in the same image, which gives a predictor of the object's location relative to local features. After the wrist position is detected, the probability distribution output by this object location predictor is voted into the image space relative to the SURF features that have high suitability as hand ($h$), and the object's center of gravity is detected. Fig. 5 shows an example detection result. (Red rectangle: wrist position, blue rectangle: center of gravity of the object)

Fig. 5. Detection of center of gravity of objects.

C. Learning of a One Class SVM to identify hand and object features based on the wrist-object coordinate system

With the classifiers obtained in the previous section, given an image of a grasping hand, it becomes possible to predict where in the image the center of the object is likely to appear. Since the wrist-object coordinate system can be set from the detected wrist position and the predicted object center as explained above, it is possible to learn where the grasped object area appears relative to this reference frame, which makes it possible to separate the object area from the background area.

To estimate the extent of the hand and object regions in the image within the wrist-object coordinate system defined in Section III-A, both a hand feature One Class SVM and an object feature One Class SVM are built. A One Class SVM is a classifier that learns from positive instances only in order to classify items into two classes. For hand features, the classifier is trained on the position and the likeness to a hand, $(x', y', h)$, in the wrist-object coordinate system. The conversion of image coordinates $x = (x, y, h)$ to wrist-object coordinates $x'$ is expressed in equation (9):

$$x' = (x', y', h). \tag{9}$$

Specifically, the classifier $f_{hand}(x')$ of the hand class features is given by equation (10).
$$\begin{cases} x' \in L_{hand} & \text{if } f_{hand}(x') > 0, \\ x' \notin L_{hand} & \text{otherwise}. \end{cases} \tag{10}$$

For object features, a suitability as object class that is common to the appearance of all items that are normally grasped cannot be evaluated, so the classifier is trained with only the two-dimensional position $(x', y')$ in the wrist-object coordinate system. The object candidate class classifier $f_{cobj}(y')$, which uses $y' = (x', y')$ as the wrist-object coordinates converted from the image coordinates, is expressed in equation (11):

$$\begin{cases} y' \in L_{cobj} & \text{if } f_{cobj}(y') > 0, \\ y' \notin L_{cobj} & \text{otherwise}. \end{cases} \tag{11}$$

$x'$ and $y'$ are calculated from the input image. Because countless different objects may accompany any given shape of a grasping hand, $f_{hand}(x')$ is trusted more than $f_{cobj}(y')$. Thus, object candidate class recognition is expressed as in equation (12):

$$\begin{cases} y' \in L_{cobj} & \text{if } f_{cobj}(y') > 0 \wedge f_{hand}(x') \le 0, \\ y' \notin L_{cobj} & \text{otherwise}. \end{cases} \tag{12}$$

From equations (10), (11) and (12), the classifier function $f_l(x', y')$ for the hand class $L_{hand}$, the object candidate class $L_{cobj}$ and the background class $L_{back}$ is expressed in equation (13):

$$f_l(x', y') = \begin{cases} L_{hand} & \text{if } f_{hand}(x') > 0, \\ L_{cobj} & \text{if } f_{cobj}(y') > 0 \wedge f_{hand}(x') \le 0, \\ L_{back} & \text{otherwise}. \end{cases} \tag{13}$$

Fig. 6 shows an example result of classification into hand, object and background classes (the background class contains features that are neither hand nor object) using these two classifiers. (Pink rectangle: hand feature, blue rectangle: object feature, green rectangle: background feature, red rectangle: wrist position)

Fig. 6. Result of feature class classification

IV. CONCLUSION

We constructed an RT that outputs the probability distribution of wrist positions, trained on the position of the wrist relative to the SURF features of the hand class. We then searched for wrist positions in a complex background. From the wrist positions thus detected, we calculated the suitability as hand of the SURF features and classified the features into a hand class and an object class. From the results of this classification, we constructed an RT that outputs the probability distribution of the object position, trained on the relationship between the wrist and the location of the object's center of gravity, and then prepared the wrist-object coordinate system. To infer the hand and object areas in this wrist-object coordinate system, we constructed a One Class SVM for the hand features and a One Class SVM for the object features, and used the two classifiers to segment the hand, object and background areas under a complex background. In the future, we will study how various objects change the grip of the hand in order to extract objects based on that relationship.

REFERENCES

[1] S. Kawabata, S. Hiura, and K. Sato, "A Rapid Anomalous Region Extraction Method by Iterative Projection onto Kernel Eigen Space," The Institute of Electronics, Information and Communication Engineers, D, J91-D(11), pp. 2673-2683, November 2008.
[2] N. Kamakura, Shape of Hand and Hand Motion. Ishiyaku Publishers, 1989.
[3] H. Kasahara, J. Matsuzaki, N. Shimada, and H. Tanaka, "Object Recognition by Observing Grasping Scene from Image Sequence," Meeting on Image Recognition and Understanding 2008, IS2-5, pp. 623-628, 2008.
[4] V. Lepetit and P. Fua, "Keypoint recognition using randomized trees," Transactions on Pattern Analysis and Machine Intelligence, 28(9), pp. 1465-1479.
[5] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, "SURF: Speeded Up Robust Features," Computer Vision and Image Understanding, Vol. 110, No. 3, pp. 346-359, 2008.
[6] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995.
[7] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, 13(7):1443-1471, 2001.