Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Srinath Sridhar¹, Franziska Mueller¹, Michael Zollhöfer¹, Dan Casas¹, Antti Oulasvirta², Christian Theobalt¹
¹ Max Planck Institute for Informatics, ² Aalto University
{ssridhar,frmueller,mzollhoef,dcasas,theobalt}@mpi-inf.mpg.de, {antti.oulasvirta}@aalto.fi

Supplementary Document

Fig. 1. Live tracking results for three different subjects.

In this document, we take a deeper look at our articulated Gaussian mixture alignment strategy and show more qualitative results of our live capture setup, which tracks hand-object interactions at frame rate. In addition, we provide details on our benchmark dataset and the error metric used in the ground truth evaluation. Finally, we give the gradients of all components of our objective function. For further results, e.g., the influence of the different components and video footage of live tracking sessions, we refer to the supplemental video.

1 Alignment Objective

In this section, we take a deeper look at the design of our alignment objective $E_a$ and explore its connection to point set registration methods that are based on Gaussian mixtures [1]. Note that the alignment objective is just a small component of our complete energy function, which also includes novel contact and occlusion handling constraints. Let us assume the model as well as the input depth data
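are each represented as a Gaussian mixture:

\[
M(x) = \sum_{i \in \mathcal{M}} w_i \, G(x \mid \mu_i, \sigma_i), \qquad
I(x) = \sum_{j \in \mathcal{I}} w_j \, G(x \mid \mu_j, \sigma_j).
\]

Here, the set $\mathcal{M}$ contains the indices of all model Gaussians and the set $\mathcal{I}$ those of all image Gaussians. Each Gaussian is isotropic with standard deviation $\sigma_i \in \mathbb{R}$ and mean $\mu_i \in \mathbb{R}^3$. For simplicity, let us assume all mixing weights to be one ($w_i = 1$). We then define an $\ell_2$-dissimilarity measure between the two Gaussian mixtures, also see [1] for more details:

\[
E_a = \int \left[ M(x) - I(x) \right]^2 dx. \tag{1}
\]

The expansion of Equation 1 splits the objective into three distinct parts:

\[
E_a = \int \left[ M(x) - I(x) \right]^2 dx
    = \int \left[ M(x)^2 - 2 M(x) I(x) + I(x)^2 \right] dx
    = \underbrace{\int M(x)^2 \, dx}_{(a)} - 2 \underbrace{\int M(x) I(x) \, dx}_{(b)} + \underbrace{\int I(x)^2 \, dx}_{(c)}.
\]

Note that (c) is constant in the presented tracking scenario, since we only optimize for the positions of the model Gaussians. The terms (a) and (b) are integrals over products of Gaussian mixtures. Let us first consider (b):

\[
\int M(x) I(x) \, dx
  = \int \Big( \sum_{i \in \mathcal{M}} G(x \mid \mu_i, \sigma_i) \Big) \Big( \sum_{j \in \mathcal{I}} G(x \mid \mu_j, \sigma_j) \Big) dx
  = \int \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{I}} \left[ G(x \mid \mu_i, \sigma_i) \, G(x \mid \mu_j, \sigma_j) \right] dx
  = \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{I}} \underbrace{\int G(x \mid \mu_i, \sigma_i) \, G(x \mid \mu_j, \sigma_j) \, dx}_{S_{i,j}}.
\]

Since $S_{i,j}$ is the integral over a product of Gaussians, it has a closed-form expression [2]:

\[
S_{i,j} = \frac{(2\pi)^{3/2} \, (\sigma_i^2 \sigma_j^2)^{3/2}}{(\sigma_i^2 + \sigma_j^2)^{3/2}}
          \exp\!\left( - \frac{\| \mu_i - \mu_j \|_2^2}{2 (\sigma_i^2 + \sigma_j^2)} \right).
\]

Its gradient can be easily derived in closed form; the same holds for (a).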
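To make the closed-form overlap $S_{i,j}$ and its gradient concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes unnormalized isotropic Gaussians of the form $G(x \mid \mu, \sigma) = \exp(-\|x - \mu\|_2^2 / (2\sigma^2))$, which is the form consistent with the closed-form expression above. The function names `overlap`, `overlap_grad_mu_i` and `finite_difference_grad` are hypothetical and chosen only for illustration.

```python
import math


def overlap(mu_i, sigma_i, mu_j, sigma_j):
    """Closed-form S_ij: integral over R^3 of the product of two
    unnormalized isotropic Gaussians (assumed form, see lead-in)."""
    s2 = sigma_i ** 2 + sigma_j ** 2
    d2 = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    scale = ((2.0 * math.pi) ** 1.5
             * (sigma_i ** 2 * sigma_j ** 2) ** 1.5
             / s2 ** 1.5)
    return scale * math.exp(-d2 / (2.0 * s2))


def overlap_grad_mu_i(mu_i, sigma_i, mu_j, sigma_j):
    """Analytic gradient of S_ij with respect to mu_i:
    -S_ij * (mu_i - mu_j) / (sigma_i^2 + sigma_j^2)."""
    s2 = sigma_i ** 2 + sigma_j ** 2
    s = overlap(mu_i, sigma_i, mu_j, sigma_j)
    return [-s * (a - b) / s2 for a, b in zip(mu_i, mu_j)]


def finite_difference_grad(mu_i, sigma_i, mu_j, sigma_j, h=1e-6):
    """Central differences, used only to sanity-check the analytic gradient."""
    grad = []
    for k in range(3):
        plus = list(mu_i)
        minus = list(mu_i)
        plus[k] += h
        minus[k] -= h
        diff = (overlap(plus, sigma_i, mu_j, sigma_j)
                - overlap(minus, sigma_i, mu_j, sigma_j))
        grad.append(diff / (2.0 * h))
    return grad


if __name__ == "__main__":
    mu_a, mu_b = [0.1, -0.2, 0.3], [0.0, 0.1, 0.2]
    print(overlap(mu_a, 0.5, mu_b, 0.7))
    print(overlap_grad_mu_i(mu_a, 0.5, mu_b, 0.7))
    print(finite_difference_grad(mu_a, 0.5, mu_b, 0.7))
```

Under the stated assumption, two coincident Gaussians with $\sigma_i = \sigma_j = 1$ give $S_{i,j} = \pi^{3/2}$, and the overlap decays as the means move apart. The gradient with respect to $\mu_j$ is the negative of the one with respect to $\mu_i$, which is why both Jacobians appear when the two Gaussians belong to the model, as in term (a).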
Fig. 2. We are able to track complex shapes like a toy car. Our contact points term (contacts are circled in blue) makes fingers hold the car even in the presence of severe occlusion.

2 Live Tracking Results

Our real-time approach uses the color and depth data from a single Creative Senz3D time-of-flight (TOF) sensor. Note that we also support other depth sensors like the Intel RealSense, Kinect and Primesense Carmine. The color and depth resolutions used are 640 × 480 and 320 × 240, both captured at 30 Hz. We show compelling live tracking results for three different subjects in a close interaction range of 15 to 100 cm away from the camera, see Fig. 1. In addition, Fig. 2 presents a tracking result of a complex object (toy car). Tracking is robust even if hands closely interact with objects, due to the proposed contact and occlusion constraints. Our approach is robust even if a second hand is visible. This enables interesting and new interaction possibilities, as shown in Fig. 1. For additional live footage, we refer to the supplemental video.

3 Error Measure

We provide a new benchmark with 3014 frames (6 sequences) with ground truth annotations to evaluate hand-object tracking methods, see Fig. 3. For each frame, we annotated 8 distinct landmarks (5 fingertip positions and 3 corners of the object). If a location is not visible, the corresponding landmark is set to be invalid and is not considered in the error measure. For the object (cuboid), the 3 landmarks span a coordinate system along the cuboid's two dominant axes. This uniquely defines the cuboid with respect to an axis of symmetry. For evaluation, we employ the following error metric to compare our tracking results with the ground truth annotations:

\[
E = \frac{1}{|V| + \mathbb{1}_M} \left[ \sum_{i \in V} \| X_i - G_i \| + \frac{\mathbb{1}_M}{3} \sum_{m \in M} \| X_m - G_m \| \right],
\]

where $V$ denotes the set of all un-occluded fingertip positions in the ground truth, $M$ denotes matched cuboid corners, and $X$ and $G$ denote estimated and ground truth positions, respectively. The indicator function $\mathbb{1}_M$ is 1 if $|M| = 3$ and 0 otherwise. Fingertip positions are compared with the corresponding landmarks based on the distance in
3D Euclidean space. To this end, the 2D annotations are back-projected based on depth and inverse camera intrinsics. Matched cuboid
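corners refers to corners in the estimated cuboid that are closest to the ground truth. If one of the cuboid corners is occluded, then the set $M$ is empty, as the cuboid cannot be uniquely positioned.

Fig. 3. The six sequences of our novel ground-truth hand-object benchmark.

4 Gradients

Here, we give analytical expressions for the gradients of all energy terms. The mathematical notation used is defined in the main document.

Spatial Alignment Term $E_a$:

\[
\frac{\partial E_a}{\partial x} =
  - \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}} \left[ S_{i,j} \, \frac{\mu_i - \mu_j}{\sigma_i^2 + \sigma_j^2} \left( \frac{\partial \mu_i}{\partial x} - \frac{\partial \mu_j}{\partial x} \right) \right]
  + 2 \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{I}} \left[ S_{i,j} \, \frac{\mu_i - \mu_j}{\sigma_i^2 + \sigma_j^2} \, \frac{\partial \mu_i}{\partial x} \right]
\]

Semantic Alignment Term $E_s$:

\[
\frac{\partial E_s}{\partial x} = \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{I}} 2 \, \alpha_{i,j} \, (\mu_i - \mu_j) \, \frac{\partial \mu_i}{\partial x}
\]

Anatomical Plausibility Regularizer $E_p$:

\[
\frac{\partial E_p}{\partial x} =
\begin{cases}
0 & \text{if } x^l \le x \le x^u \\
2 (x - x^u) & \text{if } x > x^u \\
2 (x - x^l) & \text{if } x < x^l
\end{cases}
\]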
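The piecewise gradient of the anatomical plausibility regularizer can be illustrated with a short Python sketch. It is only a sketch under an assumption: the penalty is taken to be the squared distance of a single pose parameter to the joint limit it violates, which is the form whose derivative reproduces the expression above. The main document holds the authoritative definition of $E_p$. The names `limit_penalty` and `limit_penalty_grad` are hypothetical.

```python
def limit_penalty(x, lower, upper):
    """Assumed per-parameter penalty: squared distance to the violated limit,
    zero while the parameter stays inside [lower, upper]."""
    if x > upper:
        return (x - upper) ** 2
    if x < lower:
        return (lower - x) ** 2
    return 0.0


def limit_penalty_grad(x, lower, upper):
    """Piecewise derivative of the assumed penalty with respect to x."""
    if x > upper:
        return 2.0 * (x - upper)   # positive: pushes x back down
    if x < lower:
        return 2.0 * (x - lower)   # negative: pushes x back up
    return 0.0                     # inside the limits, no contribution


if __name__ == "__main__":
    for value in (-0.25, 0.5, 1.5):
        print(value, limit_penalty(value, 0.0, 1.0),
              limit_penalty_grad(value, 0.0, 1.0))
```

Because the penalty and its derivative both vanish at the limits, the gradient is continuous across the boundary, so a gradient-based optimizer sees no jump when a joint angle crosses into the valid range.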
Temporal Smoothness Regularizer $E_t$:

\[
\frac{\partial E_t}{\partial x} = 2 \left( x^{(t)} - 2 x^{(t-1)} + x^{(t-2)} \right)
\]

Contact Points Term $E_c$:

\[
\frac{\partial E_c}{\partial x} = \sum_{(j, l, t_d) \in T} 4 \left( \| \mu_j - \mu_l \|_2^2 - t_d^2 \right) (\mu_j - \mu_l) \left( \frac{\partial \mu_j}{\partial x} - \frac{\partial \mu_l}{\partial x} \right)
\]

Object Occlusion Term $E_o$:

\[
\frac{\partial E_o}{\partial x} = \sum_{i \in \mathcal{H}} 2 \, (1 - \hat{f}_i) \, (x - x^{\mathrm{old}})
\]

References

1. Jian, B., Vemuri, B.C.: Robust point set registration using Gaussian mixture models. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33(8), 1633-1645 (2011)
2. Stoll, C., Hasler, N., Gall, J., Seidel, H., Theobalt, C.: Fast articulated motion tracking using a sums of Gaussians body model. In: Proc. IEEE ICCV. pp. 951-958 (2011)