3D Point Cloud Video Segmentation Based on Interaction Analysis


Xiao Lin, Josep R. Casas and Montse Pardàs
Image Processing Group, Technical University of Catalonia (UPC)

Abstract. Given the widespread availability of point cloud data from consumer depth sensors, 3D segmentation becomes a promising building block for high level applications such as scene understanding and interaction analysis. It benefits from the richer information contained in actual world 3D data compared to apparent (projected) data in 2D images. This also implies that the classical color segmentation challenges have recently shifted to RGBD data, whereas new emerging challenges are added as the depth information is usually noisy, sparse and unorganized. In this paper, we present a novel segmentation approach for 3D point cloud video based on low level features and oriented to the analysis of object interactions. A hierarchical representation of the input point cloud is proposed to efficiently segment point clouds at the finer level, and to temporally establish the correspondence between segments while dynamically managing the object split and merge at the coarser level. Experiments illustrate promising results for our approach and its potential application in object interaction analysis.

Keywords: object segmentation, 3D point clouds, dynamic split and merge management, object interactions

1 Introduction

Segmentation is an essential task in computer vision. It usually serves as the foundation for solving higher level problems such as object recognition, interaction analysis and scene understanding. Traditionally, segmentation is defined as a process of grouping homogeneous pixels into multiple segments on a single image, which is also known as low level segmentation. The obtained segments are more homogeneous and more perceptually meaningful than raw pixels. Based on that, the concept of semantic segmentation/labeling is proposed. It aims to segment an image into regions which ideally correspond to meaningful objects in the scene.
To achieve this goal, high level knowledge is usually incorporated into the segmentation process, such as object models [2] exploited in constrained scenes, accurate object annotations required in the initialization [12, 16] and large databases containing fully annotated data in, for instance, label transfer approaches such as [9]. These approaches yield outstanding segmentation results; however, most computer vision applications involve large amounts of data with different types of scenes containing several objects, which makes it difficult to adapt methods based on manual initialization or predefined/learned object models to generic scenes.

To generalize the methodology beyond constrained situations, increasing attention has been paid to the spatial relation between segments and their temporal correspondences when temporal video (stream) data is available. A number of methods focusing on the spatio-temporal relation between segments have been proposed [4, 15, 3, 5, 6, 1]. These methods mainly focus on tackling two problems: a higher level representation, which abstracts the raw data from scratch, and a method to establish the spatio-temporal correspondences.

Several methods employ a generic model to represent the objects in the scene. Husain et al. [6] maintain a quadratic surface model to generally represent the object segments in the scene. The model is then updated along the sequence to obtain the final segmentation result. However, it is difficult for this model to handle objects with large displacement in successive frames. Similarly, a Gaussian Mixture Model (GMM) is used in [8] to represent the objects, while the model is incrementally updated for new frames in the sequence. However, it establishes the correspondence between the object model and the point cloud in the new frame by using the Iterative Closest Point (ICP) technique, which may lead to the accumulation of registration errors in the object model due to the deformation of the objects.

More generally in scene representation, Richtsfeld et al. [13] propose to represent the 3D point cloud with a graph of surface patches detected in the scene, such as planes and non-uniform rational B-splines (NURBS). An SVM based learning process is then employed to decide the relation between surface patches for a subsequent graph cut segmentation. Grundmann et al. [3] use a graph-based model to hierarchically construct a consistent video segmentation from over-segmented frames. Similarly, Hickson et al.
[5] extend the method to RGBD stream data. However, the over-segmentation for each frame in these two approaches is still calculated independently, without the temporal coherence constraint, which may lead to a temporal inconsistency problem due to changes of corresponding over-segments in different frames. Abramov et al. [1] perform label transfer at the pixel level between frames by using optical flow. Then, they minimize the label distribution energy in the Potts model to generate labels for objects in the scene. This establishes the temporal correspondences at the pixel level, which makes the approach rely heavily on the performance of optical flow estimation.

Motivated by the problems mentioned above, we propose a segmentation algorithm based on the definition of objects as compact point clouds in the 3D-space plus time domain. However, point clouds corresponding to an object can break into different compact sub-clouds due to occlusions, or can merge with compact point clouds corresponding to other objects, producing a single compact point cloud, when they become spatially close (object interaction). Our system aims to produce a robust spatio-temporal segmentation of the point clouds by analyzing their connectivity to define the objects according to the evidence observed up to a given temporal point. Our primary contributions are:

- We propose a novel tree structure representation for the point cloud of the scene, which allows us to temporally update the similarities between nodes in the tree.
- We propose to approach the temporal correspondence establishment task as a labelling assignment problem on the tree structure.
- A dynamic management of object splits and merges is exploited in our approach for generating a better segmentation result based on all low-level features available.
- An over-segmentation method based on the compactness of the connection between neighboring super voxels in the graph is proposed.

The rest of the paper is organized as follows. In Section 2 we explain how the 3D point cloud segmentation problem is modeled. Sections 3 and 4 present the framework of the proposed segmentation approach and show experimental results, respectively. Finally, Section 5 discusses the results and draws conclusions.

2 Problem modeling and definition

In this section, we explain how the 3D point cloud segmentation task is modeled by the proposed tree structure.

2.1 Tree structure representation of the point cloud

Given a stream of RGBD data, our goal is to segment the foreground point cloud in each frame into meaningful sub-clouds and associate these sub-clouds in consecutive frames to maintain their trajectories without explicit object models or accurate initialization. More precisely, we represent the input point cloud as a graph G (shown in Fig.1(b)) with a super-voxel approach [1]. Nodes in the graph are homogeneous sub-cloud patches and edges define the spatial connections among patches. In this manner, the connectivity of a point cloud is interpreted as the connectivity in the corresponding graph representation. Each set of connected nodes in the graph corresponds to a compact part of the input point cloud, which we call a blob (shown in Fig.1(c) and marked in different colors). Then a tree structure with 4 levels varying from coarse to fine is exploited to represent the input point cloud at different scales of object connectivity.
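As an illustration only, the extraction of blobs as connected components of the super-voxel graph can be sketched in Python as follows. The function name `find_blobs` and the edge-list input are assumptions made for this sketch; in the described system the graph would come from the super-voxel method of [1], which is not reproduced here.

```python
from collections import deque


def find_blobs(num_nodes, edges):
    """Group graph nodes (super voxels) into blobs, i.e. connected components.

    num_nodes: number of super-voxel nodes, labelled 0 .. num_nodes - 1.
    edges: iterable of (a, b) pairs, one per pair of adjacent super voxels.
    Returns a list of blobs, each a sorted list of node ids.
    """
    adjacency = {n: set() for n in range(num_nodes)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    blob_of = {}   # node id -> blob index
    blobs = []
    for start in range(num_nodes):
        if start in blob_of:
            continue
        # Breadth-first search collects every node reachable from `start`.
        blob_of[start] = len(blobs)
        members = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in blob_of:
                    blob_of[neighbor] = len(blobs)
                    queue.append(neighbor)
        blobs.append(sorted(members))
    return blobs


# Toy graph (hypothetical): super voxels 0-1-2 are adjacent, 3-4 form a second compact part.
blobs = find_blobs(5, [(0, 1), (1, 2), (3, 4)])
```

In this toy input the two groups of nodes share no edge, so they end up in separate blobs, which is the sense in which graph connectivity stands in for point cloud connectivity.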
Fig.1(a) shows the constructed tree structure for the point cloud data in the second row. The root of the tree represents the scene. The second level of the tree is the object level, in which each node stands for an object proposal. The next level, named component level, is employed to handle potential splits and merges of point clouds representing these objects. An object is represented by more than one component if it splits into different blobs in the graph. Components from different objects can be part of the same blob, because of the interactions between objects. Splits and merges of components are managed by maintaining the similarities among object components along time, which provides a temporally coherent way to obtain object proposals based on point cloud connectivity. The final level of the tree is the over-segmentation level. We over-segment components using normalized cut in their graphs in order to correctly establish correspondences between trees along time and update the tree structure, that is, the temporally coherent assignment of labels to the segmented objects. Note that three kinds of labels are used in Fig.1 to differentiate the nodes in the tree while showing their relationships: object label (color), component label (letter) and segment label (number). We aligned the colors used in Fig.1 with the real point cloud data plots (Fig.1(c) and Fig.1(e)). In Fig.1(e), we present the object segmentation result obtained in this tree structure, while Fig.1(d) shows the components of each object from the point cloud view. Fig.1(c) presents the blobs in the input point cloud, which are related to the ellipses marked with the same color in Fig.1.

Fig. 1. An example of tree structure representation. (a) Tree structure representation of the input point cloud data. (b) The graph built on the input point cloud. (c) Blobs in the input point cloud. (d) Components for each object. (e) Object segmentation obtained from the tree structure.

2.2 Tree structure creation

Taking the point cloud in frame t as input data, we abstract it with super voxels, using the method proposed in [1]. The graph representation simplifies the input data by grouping homogeneous points of the point cloud into super voxels while preserving the boundary information. Then, a graph G is constructed from the spatial connectivity between super voxels. We group the point cloud into blobs by detecting the connected components in the graph. The tree in the first frame is created by simply taking the detected blobs as the objects, as no prior information about the objects is provided. Accordingly, we create one component for each object and over-segment each component into segments.

Apart from the first frame, the tree is built in a bottom-up way, starting at the component level. First, a correspondence is made between the connected components of the graph (blobs) in the current frame and the segments at the over-segmentation level of the tree structure in the previous frame. This over-segmentation level is employed to avoid the temporal inconsistency problem. Fig.2 shows an example of it, where the component B of the blue object in frame t-1 splits into two blobs in frame t. In this case, no correct association can be found between components and blobs. The problem may be tackled by over-segmenting the component B of the blue object into segments B1 and B2 (shown in Fig.2) and associating the segments in frame t-1 with blobs in frame t.

Fig. 2. An example of the temporal inconsistency problem. (a) The problem when establishing the correspondence between components in the previous frame and blobs in the current frame. (b) Using the segments instead of components solves this problem.

Establishing the correspondence between the blob labels and the segments is a problem of assigning $M_b$ blob labels to $M_s$ segments. This is a nonlinear integer programming problem, which is solved using a Genetic Algorithm to minimize an energy function composed of three terms: $E_a$ for the appearance changes, $E_d$ for the displacements and $E_o$ for the penalty when objects move out of the scene. A further segmentation is needed when segments that correspond to different objects in the previous frame are assigned to the same blob.
A restricted graph cut method is employed to segment the graph of the blob by minimizing a segmentation energy function, in which we consider the degree to which a graph cut fits the current data while being coherent with the minimum cut in the previous frame. Once the current segmentation is done, the components and objects in the current tree are initially created from it with respect to the previous tree structure.

To dynamically manage object splits and merges along time, we maintain similarities between nodes at the component and object levels, respectively, and update the tree based on them. The component similarities are measured among components which belong to the same object, while the object similarities are measured among objects. These similarities are computed considering spatial distance and appearance difference, which reveal the likelihood of object splits and merges.

We accumulate them along time by averaging the current similarity and the previous accumulated similarity with respect to the established correspondences. Then object splits and merges are confirmed by thresholding the accumulated similarities.

Finally, an over-segmentation is performed at the component level to generate segments for correctly establishing the correspondence to the next frame. Specifically, a normalized cut is performed iteratively in the graph representing the component until the cut cost is larger than a threshold.

3 Graph based dynamic 3D point cloud segmentation

In this section, we present the framework of the proposed approach, including data acquisition and initialization, temporal correspondence establishment and segmentation, the proposed mechanism for dynamic management of object splits and merges, and the over-segmentation method.

3.1 Data acquisition and initialization

We can transform the per-pixel distances provided in an RGBD image into a 3D point cloud $C \subset \mathbb{R}^3$ using camera parameters. We focus on the interest area of the foreground cloud $C_{fg} \subset \mathbb{R}^3$ in 3D space. Taking the foreground point cloud at frame t as input data, a graph representation is constructed as $f(C_{fg}) \to G(v, e)$ via a graph building method f, where v is the set of vertices or nodes and e the edges of the graph. The super voxel method introduced in [1] is employed as the graph building method f in our approach. It aggregates points of the point cloud into homogeneous sub-cloud patches (super voxels) with respect to point proximity and appearance similarity while preserving the boundary information. Then a graph G is built for the super voxels from their adjacency. The connectivity of $C_{fg}$ is interpreted as the connectivity on G.

Our system is initialized by building the tree structure for the first frame. Nodes in the tree are denoted as $N_{level}^{i}$, where we specify which level the node belongs to and its node number i in this level. Each node is described by its related point cloud and graph, $N_{level}^{i}(C_{level}^{i}, G_{level}^{i})$, where $C_{level}^{i} \subset C_{fg}$ and $G_{level}^{i} \subset G$.
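The transformation from per-pixel distances to a 3D point cloud mentioned at the start of this subsection can be sketched as below. This is a minimal sketch under stated assumptions: the text only says "camera parameters", so the pinhole model with focal lengths `fx`, `fy` and principal point `cx`, `cy`, the axis-aligned box used as interest area, and all function names are assumptions made for illustration.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points with a pinhole camera model.

    depth: list of rows, depth[v][u] is the distance along the optical axis
           (0 or negative means "no measurement" and is skipped).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    Returns a list of (x, y, z) tuples.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # invalid depth reading
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def crop_interest_area(points, lower, upper):
    """Keep the points inside an axis-aligned box (assumed shape of the interest area)."""
    return [
        p for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, lower, upper))
    ]


# Toy 2x2 depth image (hypothetical values); one pixel has no measurement.
depth = [[1.0, 0.0],
         [2.0, 4.0]]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
foreground = crop_interest_area(cloud, lower=(-1.0, -1.0, 0.5), upper=(1.0, 3.0, 2.5))
```

The cropped list plays the role of $C_{fg}$; the super-voxel graph construction f would then be applied to it.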
A node for the tree root is created as $N_{sc}^{1}(C_{fg}, G)$. As mentioned in Section 2.2, we base the construction of the tree for the first frame only on the connectivity of the input point cloud. Thus, we extract blobs from $C_{fg}$ by detecting connected components on graph G. Each blob is treated as one object proposal, and accordingly we create one object node $N_o^i$ in the tree. For each object node, one component node $N_c^i$ is created. Components are over-segmented into $M_s$ segments via an over-segmentation method, $OSeg(N_c^i) \to \{N_s^1, \dots, N_s^{M_s}\}$.

3.2 Correspondences establishment and segmentation

Apart from the first frame, we create the current tree structure with respect to the tree in the previous frame, $T_r$. Similarly, a graph G is obtained for the current point cloud and blobs are detected on G. Then, the tree building process is started by establishing the correspondences between the current data and $T_r$. Associating a set of labels to another set of labels is treated as an assignment task, in which we optimize different assignment proposals with respect to an energy function. Since the problem scale increases exponentially with the number of labels, it is critical to limit the number of labels. Therefore, we propose to assign the blob labels in the current frame to the segments in the previous tree $T_r$. The blobs represent the few compact sub-clouds of the input point cloud. Assigning blob labels to segments reduces the problem scale while respecting the spatial disconnection between the compact sub-clouds, which may coincide with the object boundaries. The problem now becomes a task of assigning $M_b$ blob labels in the current frame to $M_s$ segments in the previous frame. To cope with the situation when objects go out of the scene, we employ a virtual blob $B_{out}$, which stands for the space out of the interest area and does not represent any point cloud. Then, the assignment energy function is defined as:

$$E_{ass}(A) = E_a(A) + E_d(A) + E_o(A) \tag{1}$$

where A is an assignment proposal which maps a segment label $l_s^j$ in the previous frame to a blob label $l_b^i$ in the current frame, $A(l_s^j) = l_b^i$. $E_a$ is the sum of the energies of the appearance difference between the set of point clouds $C_s^j$ related to the segments $N_s^j$ and the point cloud $C_b^i$ related to the blob $B^i$, where the segment labels $l_s^j$ are associated to the blob label $l_b^i$ by the assignment A. The appearance difference is measured by comparing the number of points $NoP(\cdot)$ of the point clouds:

$$E_a(A) = \sum_{i=1}^{M_b} \left| NoP\left(C_b^i\right) - \sum_{A(l_s^j) == l_b^i} NoP\left(C_s^j\right) \right| \tag{2}$$

$E_d$ is the sum of the energies of the displacement, which is calculated by measuring the Hausdorff distance $dist_h(\cdot)$ between the point cloud $C_s^j$ and the point cloud $C_b^i$, where $A(l_s^j) == l_b^i$:

$$E_d(A) = \sum_{i=1}^{M_b} \sum_{A(l_s^j) == l_b^i} dist_h\left(C_s^j, C_b^i\right) \tag{3}$$

$E_o$ is the sum of the distances between the point cloud of each segment $N_s^j$ that is associated with the blob $B_{out}$ and the closest boundary bd. The boundaries are predefined planes, which are also used in the data acquisition step in Sec.3.1. The distance is calculated as the Euclidean distance $dist_e(\cdot)$ from the centroid of the point cloud of the segment to the closest boundary plane:

$$E_o(A) = \sum_{A(l_s^j) == l_b^{out}} \min_{bd \in Boundary} dist_e\left(C_s^j, bd\right) \tag{4}$$
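To make Eqs. (1)-(4) concrete, the following Python sketch evaluates the assignment energy on a toy example. It is an illustration under stated assumptions, not the implementation used in the experiments: the three terms are summed without weights as in Eq. (1), the appearance term uses an absolute difference of point counts, boundary planes are given as a unit normal plus an offset, and an exhaustive search over all assignments replaces the Genetic Algorithm, which is only feasible for very few labels. All names (`assignment_energy`, `best_assignment`, `OUT`) are hypothetical.

```python
import itertools
import math

OUT = "out"  # label of the virtual blob B_out (no point cloud attached)


def hausdorff(cloud_a, cloud_b):
    """Symmetric Hausdorff distance between two lists of 3D points."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(cloud_a, cloud_b), directed(cloud_b, cloud_a))


def centroid(cloud):
    return tuple(sum(p[k] for p in cloud) / len(cloud) for k in range(3))


def plane_distance(point, plane):
    """Distance from a point to a plane given as (unit_normal, offset)."""
    normal, offset = plane
    return abs(sum(n * c for n, c in zip(normal, point)) + offset)


def assignment_energy(assignment, segments, blobs, planes):
    """E_ass = E_a + E_d + E_o for one proposal {segment label: blob label}."""
    e_a = e_d = e_o = 0.0
    for blob_label, blob_cloud in blobs.items():
        assigned = [s for s, b in assignment.items() if b == blob_label]
        # E_a: difference between the blob size and the total size of its segments.
        e_a += abs(len(blob_cloud) - sum(len(segments[s]) for s in assigned))
        # E_d: displacement of each assigned segment with respect to the blob.
        e_d += sum(hausdorff(segments[s], blob_cloud) for s in assigned)
    for s, b in assignment.items():
        if b == OUT:
            # E_o: distance from the segment centroid to the closest boundary plane.
            e_o += min(plane_distance(centroid(segments[s]), pl) for pl in planes)
    return e_a + e_d + e_o


def best_assignment(segments, blobs, planes):
    """Exhaustive search over all proposals (stand-in for the Genetic Algorithm)."""
    seg_labels = sorted(segments)
    blob_labels = sorted(blobs) + [OUT]
    best = None
    for choice in itertools.product(blob_labels, repeat=len(seg_labels)):
        proposal = dict(zip(seg_labels, choice))
        energy = assignment_energy(proposal, segments, blobs, planes)
        if best is None or energy < best[0]:
            best = (energy, proposal)
    return best


# Toy data (hypothetical): two segments that each moved by 0.1 along x.
segments = {"s1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
            "s2": [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]}
blobs = {"b1": [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
         "b2": [(5.1, 0.0, 0.0), (6.1, 0.0, 0.0)]}
planes = [((1.0, 0.0, 0.0), -10.0)]  # boundary plane x = 10
energy, proposal = best_assignment(segments, blobs, planes)
```

With $M_s$ segments and $M_b + 1$ blob labels the search visits $(M_b + 1)^{M_s}$ proposals, which illustrates why the number of labels has to be kept small and why a heuristic optimizer is used in practice.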

Optimizing this energy function is a nonlinear integer programming problem. We employ a Genetic Algorithm to solve it. After the best assignment is obtained, the blobs in the current frame are associated with segments in the previous frame.

A further segmentation is needed when segments that correspond to different objects in the previous frame (according to the previous tree $T_r$) are assigned to the same blob. Given a blob $B^i(C_b^i, G_b^i)$ and the M unique object labels related to the associated segment labels $\{l_s^j \mid A(l_s^j) == l_b^i\}$, our goal is to segment the graph $G_b^i$ into M parts. To this end, we employ the restricted graph cut approach proposed in [17] to seek the minimal cut on graph $G_b^i$, which further segments this blob considering both spatial/feature homogeneity and temporal consistency. The minimal cut is obtained by minimizing a segmentation energy function in the way introduced in [7]. The energy function for graph cut is usually defined as the sum of the data energy and the smoothness energy, $E(L) = E_{data} + E_{smooth}$, where L stands for the label proposal for the graph. The data energy is a unary energy term representing the degree to which the label proposal fits the current data. The smoothness energy is a pair-wise energy which manages the label smoothness between nodes in the graph. The method proposed in [17] introduces a novel smoothness energy term that considers the label smoothness with respect to not only the current data but also the minimal cut obtained in the previous frame.

The further segmentation performed on the blob with segment labels related to multiple object labels yields the object partitions for this blob while establishing their correspondences to the objects in the previous tree. As mentioned in Section 2.1, an object is represented by more than one component if it splits into different blobs, in the current or previous frames. Thus, for each object partition in the blobs, we create one component for the related object.
Accordingly, the correspondences between the current components and the components in the previous tree are made. Note that no correspondence is established for the newly generated components in the current tree. In this step, the first three levels of the current tree structure are initially built from the established correspondences to the previous tree at each of the three levels.

3.3 Dynamic management of merge and split

Given the current tree obtained in the last step, an object proposal is implicit at the object level. This object proposal for the current input data is temporally coherent with the object proposal in the previous frame. However, it may not be a correct object proposal, since no accurate initialization is guaranteed at the beginning of this process in our approach. That is to say, this object proposal needs to be further analyzed, in order to cope with the errors in the previous information. In this case, we exploit the established correspondences and analyze the behaviors of related nodes in trees along time. More specifically, we compute the similarities among the components corresponding to the same object, which form a similarity matrix for each object in the tree. The object similarities are computed among objects in the tree, producing an object similarity matrix. In our approach, the similarity between nodes $N^i$ and $N^j$ is defined from the Euclidean distance between their related point clouds in Eq.(5).

$$Sim\left(N^i, N^j\right) = \begin{cases} 0 & dist_e\left(C^i, C^j\right) > \psi \\ 1 - \dfrac{dist_e\left(C^i, C^j\right)}{\psi} & \text{otherwise} \end{cases} \tag{5}$$

Note that here the Euclidean distance between two point clouds is calculated as the distance between their closest pair of points, and $\psi$ is a normalizing factor. Afterwards, the similarities are accumulated along time by averaging them with the corresponding accumulated similarity in the previous frame, in order to dynamically manage the merges and splits of objects in the scene on the fly. The accumulated similarity reveals the likelihood of object splits and merges given the evidence observed up to the current frame. Thus, the decisions for split and merge are made by thresholding the accumulated similarity with two thresholds, $Th_{split}$ and $Th_{merge}$.

Fig. 3. Example of dynamic management of split and merge.

Fig.3 shows an example of how object merges and splits are dynamically managed. Specifically, a split for an object node is confirmed when a set of its child component nodes have all their accumulated similarities with respect to the rest of the child component nodes smaller than $Th_{split}$. Then a new object node is created as the parent node of the split component nodes. In Fig.3, the red component node splits from its parent object node and a new object node marked in blue is created as its new parent. A merge between object nodes is confirmed when they are physically connected and the accumulated similarity between them is larger than $Th_{merge}$. In Fig.3, a merge is confirmed between the blue object node and the green object node, which are physically connected with each other. The nodes are merged into the one with the larger number of points in the related point cloud (the green node).
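The similarity of Eq. (5), its accumulation by averaging and the two threshold tests can be sketched as follows. This is a minimal sketch: the values chosen for $\psi$, $Th_{split}$ and $Th_{merge}$, the initial accumulated similarity of 1.0 and all function names are assumptions for illustration only, since the text does not report them.

```python
import math


def closest_pair_distance(cloud_a, cloud_b):
    """Euclidean distance between the closest pair of points of two clouds."""
    return min(math.dist(p, q) for p in cloud_a for q in cloud_b)


def similarity(cloud_a, cloud_b, psi):
    """Eq. (5): 0 beyond psi, otherwise 1 - distance / psi."""
    d = closest_pair_distance(cloud_a, cloud_b)
    return 0.0 if d > psi else 1.0 - d / psi


def accumulate(previous_accumulated, current):
    """Average the current similarity with the previously accumulated one."""
    return 0.5 * (previous_accumulated + current)


def split_confirmed(accumulated, subset, rest, th_split):
    """A set of components splits when all its similarities to the rest are below th_split."""
    return all(accumulated[frozenset((a, b))] < th_split for a in subset for b in rest)


def merge_confirmed(accumulated_value, physically_connected, th_merge):
    """Two objects merge when they touch and their accumulated similarity exceeds th_merge."""
    return physically_connected and accumulated_value > th_merge


# Toy run (hypothetical values): an arm stays detached from the torso for 3 frames.
psi, th_split, th_merge = 0.5, 0.2, 0.8
arm = [(0.0, 0.0, 0.0)]
torso_far = [(2.0, 0.0, 0.0)]

acc = 1.0          # assumed accumulated similarity before the occlusion starts
history = []
for _ in range(3):
    acc = accumulate(acc, similarity(arm, torso_far, psi))
    history.append(acc)

accumulated = {frozenset(("arm", "torso")): acc}
arm_splits = split_confirmed(accumulated, ["arm"], ["torso"], th_split)
```

Because each frame halves the influence of older evidence, a single noisy frame does not trigger a split, while a persistent disconnection drives the accumulated value below the threshold after a few frames.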
Their child component nodes are all connected to the green object node in the tree. Then the blue object node is removed from the tree.

3.4 Over segmentation

The first three levels in the current tree structure are built and updated in the last two steps. In this section, we introduce an over-segmentation process in order to build the fourth level of the tree. At the over-segmentation level, we generate segments for each component in the current tree. This is treated as the preparation for establishing the correspondence between segments in the current tree and the blobs detected in the next frame. The more segments are generated in the current frame, the better they will respect the topology of the input point cloud in the next frame, which avoids the temporal inconsistency problem. However, the number of segments affects the problem scale of the assignment task in the correspondence establishment and segmentation process in the next frame. Therefore, instead of using a technique such as super voxels [1], which finely over-segments the point cloud, we propose a relatively coarser over-segmentation method to tackle this problem.

Practically, in the related graph $G_c^i(v, e)$ of a component node $N_c^i$, we define the touching points for connected nodes in $G_c^i$. The touching points $TP_i^j$ from the node $v_i$ to the node $v_j$ are computed as the points in $v_i$ whose closest Euclidean distance to $v_j$ is lower than a threshold $Th_{tp}$. The touching points $TP_j^i$ from $v_j$ to $v_i$ are defined in the same manner. The number of touching points reveals the compactness CC between connected nodes in the graph, which is defined as:

$$CC\left(v_i, v_j\right) = \frac{NoP\left(TP_i^j\right)}{NoP\left(v_i\right)} + \frac{NoP\left(TP_j^i\right)}{NoP\left(v_j\right)} \tag{6}$$

where $NoP(\cdot)$ stands for the number of points in a graph node. We believe that any split of the point cloud will gradually decrease the number of touching points at the splitting position on the graph. Then the edges in the graph are weighted by CC and a normalized cut approach [14] is performed on the graph iteratively, which yields one segment node in each iteration until the cut cost is larger than a threshold $Th_{mc}$. In this manner, the component is iteratively over-segmented into segments at the positions which are less compact in the graph, which may coincide with the splits in the next frame.

4 Experiments

4.1 Segmentation result evaluation

To evaluate our approach from the 3D point cloud segmentation perspective, we select 4 sequences with 3D point cloud ground truth labeling in the human manipulation data set [11].
Each of them contains 2 frames. These 4 sequences vary from single attachment to multi-attachments, low motion to higher motion, and double attached objects to multiple attached objects. The task of this experiment is to segment objects from the scene. The evaluation metric is the 3D segmentation accuracy (3D ACCU) proposed in [18], which computes the fraction of a ground truth segment that is correctly classified by our approach. Since the super voxel based graph representation method organizes the input point cloud with voxels in 3D, producing a down-sampled point cloud, while the ground truth is labeled in the original cloud, we find the K nearest neighbors of each point of the down-sampled point cloud in the ground truth labeling and use the majority-voted label among these K nearest neighbors as the ground truth label for this point.

Fig. 4. (a)-(d) present the segmentation results for sequences 1-4, shown as percentage of error points (vertical axis) per frame (horizontal axis). Red: GDS, Blue: GS.

Fig.4 shows the segmentation results of these 4 sequences. In each sub-figure, we plot the percentage of segmentation error against the frame number. We compare the segmentation performance of our graph based dynamic segmentation approach (GDS) with the graph based segmentation method (GS) proposed in [17], which provides a temporally coherent segmentation for RGBD stream data. The red lines in Fig.4 stand for the results of GDS while the blue lines stand for the results of GS. As mentioned in Section 3.2, GS is also integrated in our approach, producing temporally coherent segmentation for blobs with more than one unique object label. Thus, the comparison between them shows the importance of introducing the mechanism for dynamic management of object splits and merges. In particular, GDS outperforms GS in all the 4 sequences shown in Fig.4, which indicates that the dynamic management mechanism contributes at the low level to a better segmentation of actual objects in the scene. GDS achieves an overall mean segmentation error of 3.92% on the foreground 3D point cloud.

In Fig.4(c), we observe a dramatic increase (from 2.5% to 15%) in the segmentation error for GDS (red line). This is caused by an error in the dynamic management of object merges and splits.
Fg.5(d)-5(f) show 3 key frames n ths process, where objects are marked n dfferent colors. The left arm of the human body n blue s confrmed to splt from the torso n frame 81 due to the persstent self-occluson. The self-occluson breaks the human body nto two compact pont clouds (the left arm and the rest), whch gradually decreases the accumulated smlarty tll the splt s confrmed n frame 81. However, these two pont clouds reattach to each other n frame 13, whch contnuously ncreases the accumulated smlarty between them, tll they are confrmed to merge n frame 136. Fg.5(j)-5(l) n the second row present another example, n whch our system s ncorrectly ntalzed n frame 1 (part of the torso marked n red s treated as one object because of the spatal dsconnecton caused by occluson). They reattach to each other n frame 8 and get merged n frame 14 due to the dynamc management mechansm. These two examples llustrate the robustness of our system regardng the errors n prevous frames whle showng that our approach does not rely on an accurate ntalzaton. For comparson, we employ 3 more sequences proposed n [6] and perform our approach aganst the daptve Surface Models based 3D Segmentaton method (SMS) n [6]. SMS mantans a quadratc surface model to generally represent the object segments n the scene. The model s updated along the sequence by

12 X. Ln, J.R. Casas and M. Parda s (c) (d) (e) (f) (g) (h) () (j) (k) (l) Fg. 5. Examples of dynamc management of merge and splt.

(g)-() present the color mages of frame 1,8 and 14 n sequence 2, (j)-(l) show the segmentaton results n these frames. fndng and growng the overlappng area to obtan the fnal segmentaton result.

6 shows a quanttatve comparson between GDS and SMS n these 3 sequences. Sequence 1 contans a scenaro of a human hand rollng a green ball forward and then backward wth the fngers.

finding and growing the overlapping area to obtain the final segmentation result. To adapt our approach to the scenes in these 3 sequences, we remove the background point cloud by using a plane fitting technique to extract foreground point clouds as input data. Fig. 6 shows a quantitative comparison between GDS and SMS in these 3 sequences. Sequence 1 contains a scenario of a human hand rolling a green ball forward and then backward with the fingers. Sequence 2 involves a robot arm grasping a paper roll and moving it to a new position. Sequence 3 describes a scenario in which the human hand enters and leaves the scene, displacing the objects rapidly. The comparison results show that the proposed GDS approach outperforms SMS in all 3 sequences. Specifically, Fig. 6(c) shows an example of the drawback of SMS. The spikes in the blue curve are caused by rapid object movement, which leaves little or no overlap between corresponding segments for SMS. Our method, in contrast, is robust to rapid movements, since the correspondence establishment problem is treated as the optimization of an assignment energy.

Fig. 5. Examples of dynamic management of merge and split. (a)-(c) present the color images of frames 81, 13 and 136 in sequence 4; (d)-(f) show the segmentation results in these frames. (g)-(i) present the color images of frames 1, 8 and 14 in sequence 2; (j)-(l) show the segmentation results in these frames.

Fig. 6. Quantitative results of GDS (in red) and SMS (in blue) for the 3 sequences provided in [6], shown from left to right in panels (a)-(c). Each panel plots Error (%) against Frame.

Apart from the sequences used in the quantitative evaluation experiments, we employ 5 more sequences without ground truth labeling, recorded by ourselves and displaying interactions in human manipulation scenes. Fig. 7 shows some qualitative results of our approach from 4 sequences. For each sequence, we uniformly sample 4 frames and show the segmentation result from our approach. More visual results are available online.

Fig. 7. Qualitative results of the proposed method: (a)-(b) from the human manipulation dataset in [11], (c) from data recorded by ourselves and (d) from data in [6].

Interaction detection

Our approach is also capable of obtaining the interactions between objects, which are implicit in the tree structure. As mentioned in Section 1, an interaction between objects is defined as a state in which they become spatially close. In our method, an interaction is detected when a blob is related to more than one unique object label. An example of an object interaction is shown in Fig. 8(a), where an interaction between a human body and a box is detected and marked by the black line connecting them. Based on this definition, we manually label the object interactions occurring in the 4 sequences used in the first experiment and count the interactions between objects in the scene detected by the proposed method. Fig. 8(c) shows the interaction detection result for each sequence, where the top of each bar shows the number of interactions in the ground truth, the red line stands for the true positive detections of our approach and the blue line stands for the false positive detections. Our approach detected 742 true positive interactions over 98 labeled interactions in the ground truth. We notice that the number of false positive detections in sequence 3 is relatively high. This is mainly caused by errors in the dynamic management of the object split and merge process. Fig. 8(b) shows the segmented point cloud in a frame of sequence 3, where an interaction is detected between the arm in blue and the juice box in green. However, the arm and the torso are not correctly merged as one object in this frame, which makes the detected interaction a false detection.

Fig. 8. (a) An example of object interaction. (b) An example of false positive detections. (c) Interaction detection evaluation for 4 sequences from [11] (number of interactions for Seq. 1 to Seq. 4; red line: true positives, blue line: false positives).

4.3 Computational cost analysis

In our approach, there are two main parts where the computational power is spent: the optimization of the multi-label assignment for temporal correspondence establishment, and the graph cut technique used in either the further segmentation or the over-segmentation process. The main problem of approaching the temporal correspondence association as a multi-label assignment problem is the computational complexity. The problem scale increases exponentially as the number of labels grows. However, the number of labels is controlled in our approach by finding a suitable over-segmentation level, so that we can solve the assignment task at a small problem scale without introducing temporal inconsistency. In the experiments, generally 2 segments and 5 blobs are involved in the assignment task in each frame. The graph cut technique used in our approach has a reported computational complexity of $O(v^2 \sqrt{e})$, where $v$ stands for the number of vertices and $e$ for the number of edges of the graph.

5 Conclusion

In this paper, we have introduced a graph based dynamic 3D point cloud segmentation method, which works at low level with a tree structure representation for segmenting generic objects in RGBD stream data. We have evaluated the performance of the proposed approach on a human manipulation data set and also compared it with the method proposed in [6]. Our approach achieves an overall 3.92% 3D point cloud segmentation error while outperforming the method of [6] in the comparison experiment.
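The interaction rule evaluated in the interaction detection experiment (an interaction is detected when a blob is related to more than one unique object label) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary `blob_to_labels`, the function names and the label strings are hypothetical, and reporting every pair of labels inside a blob as one interaction is an assumption made here for illustration.

```python
from itertools import combinations


def detect_interactions(blob_to_labels):
    """Return the set of object-label pairs that are interacting.

    blob_to_labels maps a blob id to the object labels of the segments
    grouped in that blob (a hypothetical flat view of the tree structure).
    """
    interactions = set()
    for labels in blob_to_labels.values():
        unique_labels = sorted(set(labels))
        # A blob related to more than one unique object label is an interaction.
        if len(unique_labels) > 1:
            interactions.update(combinations(unique_labels, 2))
    return interactions


def score_detections(detected, ground_truth):
    """Count true positive and false positive interaction detections."""
    true_positives = len(detected & ground_truth)
    false_positives = len(detected - ground_truth)
    return true_positives, false_positives


# Hypothetical frame: blob 0 groups segments of a hand and a ball,
# blob 1 contains a single object and therefore yields no interaction.
frame = {0: ["hand", "ball", "hand"], 1: ["box"]}
detected = detect_interactions(frame)
```

Under this sketch, a split or merge error such as an arm that is not merged with its torso gives the arm its own label, so the pair it forms with a nearby object would be counted as a false positive by `score_detections`.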
Our contribution can be summarized in 3 points: firstly, we propose a novel over-segmentation method based on the compactness of the connection between neighboring supervoxels on the graph; secondly, we propose a novel tree structure representation for the scene, which allows the similarities between nodes in the tree to be updated temporally; thirdly, the temporal correspondence establishment task is approached as a labelling assignment problem that takes into account the appearance and displacement of the components. Based on the tree structure, a mechanism for the dynamic management of object splits and merges is proposed. Our approach generates a better segmentation result based on all the low-level features available. This makes it generic, as no explicit or learnt model of the objects or the scene is introduced in the proposed method.

Acknowledgement: This work has been developed in the framework of the project TEC R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).

References

1. Abramov, A., Pauwels, K., Papon, J., Wörgötter, F., Dellen, B.: Depth-supported real-time video segmentation with the Kinect. In: Applications of Computer Vision (WACV), 2012 IEEE Workshop on. IEEE (2012)
2. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on 32(9) (2010)
3. Grundmann, M., Kwatra, V., Han, M., Essa, I.: Efficient hierarchical graph-based video segmentation. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE (2010)
4. He, X., Zemel, R.S., Carreira-Perpiñán, M.Á.: Multiscale conditional random fields for image labeling. In: CVPR 2004. vol. 2, pp. II-695. IEEE (2004)
5. Hickson, S., Birchfield, S., Essa, I., Christensen, H.: Efficient hierarchical graph-based segmentation of RGBD videos. In: CVPR 2014. IEEE Computer Society (2014)
6. Husain, F., Dellen, B., Torras, C.: Consistent depth video segmentation using adaptive surface models. Cybernetics, IEEE Transactions on 45(2) (2015)
7. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? Pattern Analysis and Machine Intelligence, IEEE Transactions on 26(2) (2004)
8. Koo, S., Lee, D., Kwon, D.S.: Incremental object learning and robust tracking of multiple objects from RGB-D point set data. Journal of Visual Communication and Image Representation 25(1) (2014)
9. Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing via label transfer. Pattern Analysis and Machine Intelligence 33(12) (2011)
10. Papon, J., Abramov, A., Schoeler, M., Wörgötter, F.: Voxel cloud connectivity segmentation - supervoxels for point clouds. In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE (2013)
11. Pieropan, A., Salvi, G., Pauwels, K., Kjellström, H.: Audio-visual classification and detection of human manipulation actions. In: Intelligent Robots and Systems (IROS 2014), 2014 IEEE/RSJ International Conference on. IEEE (2014)
12. Ren, X., Malik, J.: Tracking as repeated figure/ground segmentation. In: Computer Vision and Pattern Recognition, 2007. IEEE (2007)
13. Richtsfeld, A., Mörwald, T., Prankl, J., Zillich, M., Vincze, M.: Segmentation of unknown objects in indoor environments. In: Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE (2012)
14. Shi, J., Malik, J.: Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 22(8) (2000)
15. Shotton, J., Winn, J., Rother, C., Criminisi, A.: TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Computer Vision - ECCV 2006. Springer (2006)
16. Tsai, D., Flagg, M., Nakazawa, A., Rehg, J.M.: Motion coherent tracking using multi-label MRF optimization. IJCV 100(2) (2012)
17. Xiao, L., Josep, C., Montse, P.: 3D point cloud segmentation oriented to the analysis of interactions. In: 24th European Signal Processing Conference (EUSIPCO), 2016. (Accepted and to be published). IEEE (2016)
18. Xu, C., Corso, J.J.: Evaluation of super-voxel methods for early video processing. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE (2012)
