Universidade Federal da Bahia, Universidade Salvador, Universidade Estadual de Feira de Santana
DOCTORAL THESIS


An Adaptive Approach to Real-Time 3D Non-Rigid Registration
Antonio Carlos dos Santos Souza
Programa Multiinstitucional de Pós-Graduação em Ciência da Computação (PMCC)
Salvador, December 19, 2014
PMCCDsc0000
ANTONIO CARLOS DOS SANTOS SOUZA

AN ADAPTIVE APPROACH TO REAL-TIME 3D NON-RIGID REGISTRATION

Thesis presented to the Programa Multiinstitucional de Pós-Graduação em Ciência da Computação of the Universidade Federal da Bahia, Universidade Estadual de Feira de Santana and Universidade Salvador, in partial fulfillment of the requirements for the degree of Doctor in Computer Science.

Advisor: Antônio Lopes Apolinário Júnior

Salvador, December 19, 2014
Cataloging record.
Souza, Antonio Carlos dos Santos
An Adaptive Approach to Real-Time 3D Non-Rigid Registration / Antonio Carlos dos Santos Souza. Salvador, December 19, 2014.
p.: ill.
Advisor: Antônio Lopes Apolinário Júnior.
Thesis (doctorate), Universidade Federal da Bahia, Instituto de Matemática, December 19, 2014.
1. Non-rigid registration 2. Adaptive algorithms 3. Augmented reality. I. Apolinario, Antônio Lopes. II. Universidade Federal da Bahia. Instituto de Matemática. III. Title.
CCD 20.ed
APPROVAL STATEMENT

ANTONIO CARLOS DOS SANTOS SOUZA

AN ADAPTIVE APPROACH TO REAL-TIME 3D NON-RIGID REGISTRATION

This thesis was judged adequate for obtaining the degree of Doctor in Computer Science and approved in its final form by the Programa Multiinstitucional de Pós-Graduação em Ciência da Computação of UFBA-UEFS-UNIFACS.

Salvador, December 19, 2014

Prof. Dr. Antônio Lopes Apolinário Júnior, Universidade Federal da Bahia
Prof. Dr. Vinicius Moreira Mello, Universidade Federal da Bahia
Prof. Dr. Thales Miranda de Almeida Vieira, Universidade Federal de Alagoas
Prof. Dr. Ricardo Farias, Universidade Federal do Rio de Janeiro
Prof. Dr. Luiz Marcos Garcia Gonçalves, Universidade Federal do Rio Grande do Norte
ACKNOWLEDGEMENTS

First, I would like to thank God for all the blessings given during my journey.

This work was made possible by the enthusiastic support, suggestions, encouragement, and guidance of many individuals. I am greatly indebted to my academic advisor, chair and director of this work, Prof. Dr. Antonio Lopes Apolinário Jr, for instilling in me the joy of conducting outstanding research in computer graphics. These six years have been intense in my career. Your support and vigilance have allowed me to achieve results that I couldn't have thought of.

Thank you so much to the Committee for the direction, feedback, and all the enlightening advice. Thank you Prof. Dr. Gilson Giraldi, Prof. Dr. Vinícius Mello and Prof. Dr. Perfilino Ferreira. Thank you Prof. Dra. Lynn Alves for the unforgettable times during my master's degree.

Furthermore, I would like to acknowledge my friend Márcio Cerqueira de Farias Macedo for the great partnership and his awesome markerless augmented reality environment for on-patient medical data visualization. Working with you has been a wonderful experience and a great source of inspiration. I really wonder how my thesis would be without this environment.

Many individuals also provided support in myriad ways. Special thanks go to Aline Machado, Sabrina, Osmar, Rosalba, Thalles Caribé, Prof. Dr. Eduardo Telmo, Prof. Dr. Lurimar, Prof. André, Prof. Jowaner, Prof. Cesar, Lilia, Prof. Dr. Marcelo Veras, Prof. Dr. Jairo Dantas, Bruno, Leo, Everton, Toninho, Dona Vilma, Marilene (Fortona), Dona Mary, Sr. Deja, Cita, Fabiana, Carol, Janio, Eliakin, Igor, Rodrigo, Jony, Katia, Anderson, Leandro, Edilson and Rita.

I am also much indebted to the insightful discussions and fun times with all the Labrasoft friends: Luiz Cláudio Machado, Valentim, Romilson, Ronaldo, Antonio Maurício, Simone, Josildo, Marcelo, Felipe, Amilton, Vanessa, Diego, Letícia, Luiz Henrique, Pedro, Jorge, Fabiano and Aderbal.

Finally, I would like to acknowledge my family for their constant support and encouragement during this graduate journey: Antonio Porfírio, Reinaldo, Ricardo, Maisa, Tissi, Lucia, Danilo, my uncles Zé Ribeiro, Manoel, Walter and Carlito, my aunts Belita, Esmera, Judith and Decinha, my cousins Fatim, Alva and Bel, Alan, Zeo, Caio, Vivian, Manoel, Jorge, Nandi, Renã, Luis and my family away from home Chicão, Rosinha, Juca, Leo, Mito and Kelly.

I wish to thank Aline Requião for everything. Aline, I appreciate your motherly love.

I dedicate this work to the loving memory of my mother Antonia, to my son Arthur and to my nieces Bruna and Eduarda.
RESUMO

3D non-rigid registration is fundamental for tracking and/or reconstruction of deformable three-dimensional models. However, most non-rigid registration algorithms are not as fast as those developed in the field of rigid registration. Fast methods for 3D non-rigid registration are particularly interesting for markerless augmented reality applications, in which an object used as a natural marker may undergo deformations while the application is running. This thesis presents an adaptive multi-frame algorithm, implemented on the GPU, for the non-rigid registration of deformable three-dimensional models captured by an RGB-D camera. Adaptive approaches tend to optimize algorithms by concentrating effort on the most relevant locations, producing a global improvement of the solution. The proposed method uses adaptivity in three steps of the algorithm. First, to guide the distribution of regions of influence based on the deformation intensity computed over the object. Second, during the selection of constraints, where the sampling of the object for the optimization phase is based on the currently measured deformation. Third, to apply the algorithm in a multi-frame scheme only when the rigid tracking error exceeds a certain threshold, indicating that a rigid transformation no longer produces a satisfactory alignment. Through the use of adaptivity and the parallelism of the GPU implementation, the results obtained demonstrate that the proposed method is able to run in real time with an approach as accurate as those existing in the literature.

Keywords: Non-rigid registration, Adaptive algorithms, Augmented reality.
ABSTRACT

3D non-rigid registration is fundamental for tracking or reconstruction of 3D deformable shapes. However, the majority of non-rigid registration methods are not as fast as the ones developed in the field of rigid registration. Fast methods for 3D non-rigid registration are particularly interesting for markerless augmented reality applications, in which the object being used as a natural marker may undergo non-rigid user interaction. Here, we present a multi-frame adaptive algorithm for 3D non-rigid registration, implemented on the GPU, where the 3D data is captured from an RGB-D camera. In general, adaptive algorithms optimize the solution by focusing on the more relevant aspects of the problem, causing a global improvement in the final solution. Our approach uses adaptivity in three stages of the process. First, to guide the distribution of regions of influence based on the deformation intensity in each region of the shape. Second, during the selection of constraints, where the sampling done over the object for the optimization is based on the current deformation. Third, to apply the algorithm in a multi-frame manner only when the rigid tracking error is above a predefined threshold, showing that a rigid transformation can no longer produce a satisfactory alignment. Taking advantage of this adaptivity and the parallelism of the GPU, the results obtained show that the proposed algorithm is capable of achieving real-time performance with an approach as accurate as the ones proposed in the literature.

Keywords: Non-Rigid Registration, Adaptive Algorithms, Augmented Reality.
CONTENTS

Chapter 1 Introduction
    Hypothesis
    Contributions
    Organization

Chapter 2 Fundamentals and Related Work
    Augmented Reality
    3D Registration
        Rigid Registration
        Non-Rigid Registration
    Summary

Chapter 3 Markerless Augmented Reality Environment
    Surface Acquisition
    3D Reference Model Reconstruction
    Tracking
    Summary

Chapter 4 GPU-Based Adaptive Non-Rigid Registration
    Deformation Model
    Matching of Points
    Selection of Nodes
    Weighting the Influence of Nodes
    Selection of Constraints
    Error Minimization
    Updating the Source Object
    Multi-Frame Non-Rigid Tracking
    Summary

Chapter 5 Non-Rigid Registration Evaluation
    Methodology
    Accuracy Evaluation
    Performance Evaluation
    Discussion
    Summary

Chapter 6 Non-Rigid Support Evaluation for a Markerless Augmented Reality Environment
    Methodology
    Evaluation
    Summary

Chapter 7 Conclusion and Future Work
    Conclusion
    Future Directions
LIST OF FIGURES

• Reality-Virtuality Continuum (Milgram; Kishino, 1994).
• Marker-based (left image) and markerless (right images) augmented reality. Left image is courtesy of ARToolKit library (Kato; Billinghurst, 1999) and right images are courtesy of KinectFusion (Izadi et al., 2011).
• Overview of the proposed approach from 3D reference model reconstruction to tracking solution. Adapted from (Souza; Macedo; Apolinario, 2014).
• Overview of KinectFusion's pipeline (Izadi et al., 2011).
• Left image: The user translated his face fast. A small number of points were at the same image coordinate and the ICP failed. Right image: By using the pose estimation algorithm, the problem can be solved (Macedo; Apolinario; Souza, 2013).
• Overview of the proposed approach from the depth map acquisition to the final non-rigid aligned surface.
• Building of the deformation graph (right) over the source object (left) based on the residual error measured (center).
• Refinement of the deformation graph (right) over the cheeks region of the source object (left) based on the residual error measured (center).
• Collapsing of the deformation graph (right) over the cheeks region of the source object (left) after updating on the residual error (center).
• Constraint selection based on the initial non-rigid error between source and target surfaces.
• Overview of the libraries used for each step of our approach.
• Datasets used for evaluation of the non-rigid registration algorithm. I: Synthetic dataset consisting of a deformed plane. II: Real dataset of a deforming hand. III-1: Real dataset of a user smiling. III-2: Real dataset of a user inflating his cheeks.
• The resulting color-coded error from the registration between source and target surfaces. In all situations the proposed algorithm AdNodes + AdCons obtained an averaged accuracy below 3mm and standard deviation below 3.5mm. I: Synthetic dataset consisting of a deformed plane. II: Real dataset of a deforming hand. III-1: Real dataset of a user smiling. III-2: Real dataset of a user inflating his cheeks.
• Accuracy (in mm) obtained by AdNodes and AdNodes + AdCons in comparison with the Embedded Deformation (ED) algorithm and the initial error for each one of the datasets used.
• Accuracy comparison between ED algorithm and our adaptive approach with respect to the node selection for the dataset II.
• Accuracy comparison between ED algorithm and our adaptive approach with respect to the node selection for the dataset III.
• Accuracy comparison between different sampling schemes used to select constraints for optimization for the dataset II.
• Accuracy comparison between different sampling schemes used to select constraints for optimization for the dataset III.
• Accuracy (in mm) related to the parameter k for each one of the datasets used.
• Accuracy (in mm) obtained for each level of the quadtree and for each one of the datasets used. The maximum number of nodes for a level l is 4^l.
• Performance (in FPS) obtained by AdNodes and AdNodes + AdCons in comparison with ED algorithm for each one of the datasets used.
• Performance (in ms) obtained by our approach for each one of the most computationally expensive methods. MM: Matrix Multiplication (A = J^T J); Jacobian: computation of J; Cholesky: LL^T decomposition; Solver: linear solver Strsm from CUBLAS library; ACS: Adaptive Constraint Selection; ANS: Adaptive Node Selection; Weights: computation of the influence of G on P_s; MV: Matrix-vector multiplication (b = J^T r).
• Neutral and deformed reference models based on user's facial expression.
• Neutral and deformed reference models for a different user.
• Neutral and deformed reference models based on challenging deformation scenarios.
• Cheeks tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded cheeks tracking error measured for both rigid and non-rigid solutions.
• Cheeks2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded cheeks2 tracking error measured for both rigid and non-rigid solutions.
• Smile tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded smile tracking error measured for both rigid and non-rigid solutions.
• Smile2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded smile2 tracking error measured for both rigid and non-rigid solutions.
• Kiss tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded kiss tracking error measured for both rigid and non-rigid solutions.
• Kiss2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded kiss2 tracking error measured for both rigid and non-rigid solutions.
• Open Mouth tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded open mouth tracking error measured for both rigid and non-rigid solutions.
• Angry tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded angry tracking error measured for both rigid and non-rigid solutions.
• Bag tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded bag tracking error measured for both rigid and non-rigid solutions.
• Limitation of the proposed method. User's body (A) is reconstructed (B) and the algorithm cannot track user's arms (C), integrating all the movement into the 3D reference model (D).
• Body tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.
• Color-coded body tracking error measured for both rigid and non-rigid solutions.
LIST OF TABLES

• Number of constraints (C), accuracy (A, given in mm), standard deviation (SD, given in mm) and performance (P, given in FPS) results according to the step size (from 1 to 32) or sampling scheme (Adap for adaptive) used to select constraints for optimization.
• Average accuracy (A, given in mm) and Standard Deviation (SD, given in mm) results according to the weight used to update the 3D reference model.
• Average accuracy (Avg., given in mm), Standard Deviation (Std. Dev., given in mm) and Performance (Perf., given in FPS) results for each one of the tracking algorithms tested in presence of specific user deformation. NRn: Non-Rigid Registration applied every n frames (independent of rigid tracking failure); NRAdaptive: Non-Rigid Registration applied whenever the rigid algorithm fails.
• Average accuracy (Avg., given in mm), Standard Deviation (Std. Dev., given in mm) and Performance (Perf., given in FPS) results for each one of the thresholds used to detect rigid tracking failure.
Chapter 1

INTRODUCTION

In this first chapter, we briefly contextualize the problem we want to solve and describe the objectives and contributions of the proposed work, as well as the organization of the thesis.

Augmented Reality (AR) is a technology in which the view of a real scene is augmented with additional virtual information. As stated by Azuma (1997), an AR application must have three basic characteristics:

1. Combination of virtual object(s) into a real scene;
2. Real-time performance;
3. 3D registration for accurate tracking of the augmented scene.

Since the beginning, tracking has been one of the main problems limiting the development of successful AR applications. Virtual and real worlds must be properly aligned so that they seem to coexist at the same location for the user. For some applications, such as the ones proposed for medical AR in surgical environments, accurate registration of the virtual medical data onto the patient is especially important; otherwise, a successful surgical operation may be compromised.

Tracking plays an important role not only in AR, but also in 3D reconstruction. Several viewpoints of the same object or scene of interest are captured by an appropriate sensor, and these must be registered and aligned to the same coordinate system. After this registration step, the different viewpoints must be integrated into a single 3D model. Therefore, if the viewpoints are incorrectly aligned, visible artifacts will appear in the final reconstructed model.

Computer vision techniques have been proposed to solve the problem of registration; however, they are not robust enough under some illumination conditions (Teichrieb et al., 2007). With the availability of depth sensors, 3D registration techniques have been proposed that use 3D information to improve tracking robustness. For low-cost depth sensors, however, noise may affect the accuracy of the registration.
In scenarios such as on-patient craniofacial medical data visualization (Lee et al., 2012; Macedo et al., 2014), it is especially important for a markerless AR (MAR) environment to support non-rigid tracking, which adds one level of interactivity for the user and improves the robustness of the tracking algorithm for rigid and non-rigid patient interactions. The main issue with this support is that AR requires real-time interactivity, and most of the current state-of-the-art works in the field of 3D non-rigid registration do not provide such performance. Here, we assume that an application runs in real time if its performance is equal to or above 15 frames per second (Akenine-Moller; Moller; Haines, 2002). This concept of real time is related to user interactivity: the user must interact with the application and receive feedback from it without too much delay.

Several approaches exist for accurate 3D non-rigid registration; however, only a few of them allow interactive registration. Apart from the real-time techniques that rely on strong priors about a specific scenario (Weise et al., 2011; Chen; Izadi; Fitzgibbon, 2012; Bouaziz; Wang; Pauly, 2013; Li et al., 2013), only a few methods have been proposed for fast general-purpose non-rigid registration (Sumner; Schmid; Pauly, 2007; Nutti et al., 2014). Their common characteristic is the way they represent the deformation of a given surface: using a deformation graph. Each node of this graph has a 3D affine transformation, which allows the source surface to be deformed into a target surface. Deformation is modelled in terms of an energy function and, by using a non-linear optimization algorithm, the energy is minimized and the best affine transformation for each node of the graph can be found.

In this doctoral work, we address the problem of fast 3D non-rigid registration by applying adaptive techniques to reduce the computational cost of the registration while keeping it accurate.
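To make the deformation-graph idea above concrete, the following is a minimal sketch of how a single surface point could be moved by blending the affine transformations of nearby graph nodes. It follows the blending form commonly given for embedded deformation (Sumner; Schmid; Pauly, 2007); the function names, the data layout and the assumption that the weights sum to 1 are illustrative choices, not details taken from this thesis.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def deform_vertex(v, nodes, weights):
    """Blend the affine transformations of the nodes that influence v.

    nodes   -- list of (g, A, t): node position, 3x3 matrix, translation
    weights -- one weight per node, assumed here to sum to 1
    """
    out = [0.0, 0.0, 0.0]
    for (g, A, t), w in zip(nodes, weights):
        local = [v[i] - g[i] for i in range(3)]  # v relative to the node
        moved = mat_vec(A, local)                # apply the node's matrix
        for i in range(3):
            out[i] += w * (moved[i] + g[i] + t[i])
    return out


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two hypothetical nodes: one stays put, the other shifts by 1 along x.
nodes = [
    ([0.0, 0.0, 0.0], IDENTITY, [0.0, 0.0, 0.0]),
    ([2.0, 0.0, 0.0], IDENTITY, [1.0, 0.0, 0.0]),
]
deformed = deform_vertex([1.0, 0.0, 0.0], nodes, [0.5, 0.5])
```

With equal weights, the point halfway between the two nodes receives half of the second node's translation, which is the smooth interpolation a deformation graph is meant to provide.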
1.1 HYPOTHESIS

Our main research question is: is it possible to track, interactively and with sufficient accuracy, deformable objects that undergo deformation over sequential frames in a markerless augmented reality application? To answer this question, we build upon an adaptive approach for fast non-rigid registration in scenarios where real, noisy surfaces are captured from a low-cost depth sensor.

This thesis aims to solve the problem of fast, interactive 3D non-rigid registration for MAR environments. In this sense, the proposed approach must be as accurate as state-of-the-art solutions, while supporting real-time performance and being robust to noisy and missing data.

1.2 CONTRIBUTIONS

The main contributions of this thesis are:

• A markerless augmented reality environment based on a low-cost RGB-D sensor;
• A dynamic subdivision approach for node selection on the source object;
• An adaptive algorithm to select, for each iteration, samples from the source object to be used as constraints for optimization;
• A multi-frame adaptive approach in which non-rigid registration is applied only when the rigid tracking error is above a certain threshold, and a 3D rigid representation of the object is updated to take into account the current deformation;
• A full framework for non-rigid registration implemented entirely on the Graphics Processing Unit (GPU).

1.3 ORGANIZATION

This thesis is organized as follows:

Chapter 2, Fundamentals and Related Work. This chapter formalizes the concepts of augmented reality and 3D registration, together with their challenges. It also provides an extensive review of related work in the fields of rigid and non-rigid registration, focusing on the interactive methods developed so far.

Chapter 3, Markerless Augmented Reality Environment. The focus of this thesis is to add support for non-rigid tracking to a markerless augmented reality environment. Therefore, in this chapter we present the environment in which the proposed non-rigid registration was applied and validated.

Chapter 4, GPU-Based Adaptive Non-Rigid Registration. In this chapter we present the proposed adaptive non-rigid registration algorithm and its adaptation to take advantage of the parallelism of the GPU, as well as the multi-frame scheme adopted to improve the algorithm's performance.

Chapter 5, Non-Rigid Registration Evaluation. In this chapter, non-rigid registration is evaluated in terms of accuracy and performance for several datasets.

Chapter 6, Non-Rigid Support Evaluation for a Markerless Augmented Reality Environment. In this chapter, non-rigid tracking is evaluated in the context of the markerless augmented reality environment in terms of accuracy, performance and tracking robustness for several datasets.

Chapter 7, Conclusion and Future Work. The thesis is concluded with a summary and a discussion of future directions.
Chapter 2

FUNDAMENTALS AND RELATED WORK

This chapter formalizes the concepts of augmented reality and 3D registration, together with their challenges. It also provides a review of related work in the fields of rigid and non-rigid registration, focusing on the interactive methods developed so far.

2.1 AUGMENTED REALITY

The concept of virtual environments has been proposed since the 1990s. They can be defined as environments in which only virtual objects are present. Milgram and Kishino proposed a taxonomy to identify at which point applications are located inside the so-called Reality-Virtuality Continuum (Milgram; Kishino, 1994) (Figure 2.1). The extremes of this taxonomy are the real world and Virtual Reality (VR). At the center are Augmented Reality and Augmented Virtuality. In the former, the real world predominates over the virtual one, while in the latter the virtual world prevails over the real one.

Figure 2.1 Reality-Virtuality Continuum (Milgram; Kishino, 1994).

Both AR and VR use virtual objects, but they have some differences. AR changes the real world by adding virtual elements. Thus, it is fundamental for an application to maintain contact with the view of the real world, which is the basis of an AR application. Although authors such as Vallino and Azuma state that the main goal of AR is the seamless integration of the virtual objects into the real scene (Vallino, 1998; Azuma et al., 2001), it is not mandatory for such systems to be realistic.

Another central distinction between AR and VR is the registration or tracking problem. This process is crucial in AR: the combination of real and virtual objects in the augmented scene requires accurate positioning of the virtual objects over the real world.
The motivation for developing applications and research in the field of AR comes from the potential benefits that such techniques may bring to several other fields. In the specific field of registration, AR methods have been attracting a lot of attention in Medicine, because they extend the possibilities of study and practice for many techniques and medical procedures related to the medical images generated from the patient's current condition, such as angiographic visualization (Wang et al., 2012), liver surgery (Haouchine et al., 2013, 2014) and uterine laparosurgery (Collins et al., 2014).

However, registration is a crucial problem in AR applications. Objects misplaced in the scene appear to float over the real scene. Accurate registration becomes even more crucial in applications that demand high precision, such as surgeries.

Tracking in AR is performed based on the color or depth intensity of the object being tracked by the application. For color-based tracking, features are computed from the color image of the scene captured by the sensor and tracked during the application's live stream (Horn; Schunck, 1981; Lucas; Kanade, 1981). The first solution proposed for color-based tracking relied on fiducial markers, used as points of reference positioned in the real scene (Figure 2.2, left image). Because markers are intrusive (i.e. the marker is artificial content introduced into the scene), methods for color-based tracking without markers were proposed. However, the main drawback of this kind of registration remains the same: its susceptibility to illumination conditions. To overcome this problem, depth-based tracking was proposed, which registers two surfaces of the real scene captured by a real-time 3D depth sensor (Besl; Mckay, 1992; Chen; Medioni, 1992). This kind of tracking has grown in popularity due to its accuracy, its robustness to illumination conditions and the recent availability of low-cost depth sensors.
In general, AR applications can be divided into two groups: marker-based and markerless. Marker-based AR uses a fiducial marker as a point of reference in the field of view to help the system estimate the camera pose (Figure 2.2, left image) (Kato; Billinghurst, 1999). Markerless AR (MAR) uses a part of the real scene as a natural marker (Figure 2.2, right images) (Izadi et al., 2011). When it is used as a point of reference for tracking, one can expect non-rigid motion of the marker if it consists of a deformable object (e.g. face, body, hand).

Figure 2.2 Marker-based (left image) and markerless (right images) augmented reality. Left image is courtesy of ARToolKit library (Kato; Billinghurst, 1999) and right images are courtesy of KinectFusion (Izadi et al., 2011).

2.2 3D REGISTRATION

3D registration is a fundamental problem in fields such as 3D reconstruction and augmented reality. Most depth sensors provide partial surface data (i.e. acquired from one viewpoint) that must be aligned, to allow camera pose estimation, and merged, to obtain a complete digital representation of the object or scene of interest. Some functional models have been proposed in the literature to solve the problem of registration with good performance:

1. Rigid Registration: In this kind of registration, a single Euclidean transformation is used to align two objects (Rusinkiewicz; Levoy, 2001). This transformation has the following properties: (1) it is global (i.e. it remains the same for every point); (2) it can be uniquely defined by three non-collinear pairs of correspondences; (3) it is low-dimensional (i.e. only six degrees of freedom). Real-time performance is easily achieved due to the low number of parameters required to solve the rigid registration;

2. Articulated Deformation: For surfaces which are mainly characterized by articulations, a skeleton is typically used as the basis for deformation. In this representation, a skeleton is defined by a combination of bones and joints. Each joint is associated with some degrees of freedom (i.e. joint angles) and is related to other joints by rigid transformations (Allen; Curless; Popović, 2002). In an alternative representation, joint deformation is obtained by blending the transformations of two adjacent bones in the overlap regions (Chang; Zwicker, 2008, 2011). The advantage of this representation is that it requires a low number of parameters to be estimated, which depends on the number of available bones or joints;

3. Local Affine Deformation: For several real-world datasets, it is desirable for the non-rigid registration algorithm to support general deformations, without prior knowledge about the objects or the kind of deformation they undergo. To achieve real-time performance, models such as articulated registration rely on prior knowledge about the scenario (e.g. skeleton tracking), losing generality. To solve this issue while keeping non-rigid registration fast, accurate and general, solutions based on local affine transformations are frequently employed. They allow the preservation of fine surface details while decoupling the complexity of the geometry from the complexity of the deformation, by using a deformation graph as the basis representation (Sumner; Schmid; Pauly, 2007).

Other functional models, such as rigid registration with non-rigid correctives (Brown; Rusinkiewicz, 2007) and isometric deformation (Lipman; Funkhouser, 2009), have also been proposed in the literature; however, their computational cost is too high, making them inadequate for our approach.
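The "six degrees of freedom" of the rigid model listed above can be illustrated with a short sketch: three parameters encode a rotation and three a translation, and the same transformation is applied to every point. The axis-angle parameterization (converted with the Rodrigues formula) and all function names are assumptions made for illustration; the thesis does not state which parameterization it uses.

```python
import math


def rotation_from_axis_angle(rx, ry, rz):
    """Rodrigues formula: rotation matrix from an axis-angle vector."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def apply_rigid(params, p):
    """params = (rx, ry, rz, tx, ty, tz): the six degrees of freedom."""
    R = rotation_from_axis_angle(*params[:3])
    t = params[3:]
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Quarter turn about the z axis followed by a translation.
params = (0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0)
a, b = [1.0, 0.0, 0.0], [0.0, 2.0, 5.0]
a2, b2 = apply_rigid(params, a), apply_rigid(params, b)
```

Because the transformation is global and Euclidean, the distance between `a` and `b` is unchanged after the transformation, which is the property that separates the rigid model from the deformation models in items 2 and 3.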
Rigid Registration

Rigid registration estimates a single transformation, composed of a rotation and a translation, to align two different viewpoints of the same object. Rigid registration is a challenging problem because, in real-world scenarios, it must deal with noise, outliers and non-overlapping regions between two surfaces captured from commodity depth sensors. Noise refers to the presence of unwanted points near the captured surface. Outliers are noisy points far from the surface, which must be rejected; otherwise, they may affect the optimization phase. As the object is captured from a single view of the camera, non-overlapping regions between two surfaces are already expected; however, holes and other artifacts may significantly decrease the region of overlap (Tam et al., 2013).

To limit the search space for optimization and correspondence estimation, constraints must be defined. In the field of rigid registration, transformation-induced constraints such as the closest point criterion are commonly employed. This criterion constrains potential correspondences by computing and matching closest points at every iteration of the registration algorithm. It is used in the standard Iterative Closest Point (ICP) algorithm (Besl; Mckay, 1992; Chen; Medioni, 1992). To reduce the search space for correspondences, specific approaches have been proposed: the project and project-and-walk methods (Rusinkiewicz; Levoy, 2001) restrict the search for a new closest point to the same 2D projection (i.e. pixel) and to its local neighbourhood, respectively, avoiding a global exhaustive search. Other constraints, such as features (Johnson; Hebert, 1999) and saliency (Gelfand et al., 2005), have also been proposed in the literature and provide more reliable correspondences, and consequently more accurate convergence to the final result; however, they require high processing time, making them inadequate for our proposal.
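The closest point criterion described above can be sketched as an exhaustive search that pairs each source point with its nearest target point. This brute-force version is only meant to show the criterion itself; the optional distance cut-off used to discard outliers, and the function and parameter names, are illustrative assumptions rather than the method used in this thesis, and the project and project-and-walk variants would replace the inner loop with a per-pixel lookup.

```python
def closest_point_matches(source, target, max_dist=None):
    """Pair each source point with its nearest target point.

    Returns a list of (source_index, target_index). When max_dist is
    given, pairs farther apart than max_dist are discarded as outliers.
    """
    matches = []
    for i, p in enumerate(source):
        best_j, best_d2 = -1, float("inf")
        for j, q in enumerate(target):
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 < best_d2:
                best_j, best_d2 = j, d2
        if best_j < 0:
            continue  # empty target: nothing to match against
        if max_dist is None or best_d2 <= max_dist ** 2:
            matches.append((i, best_j))
    return matches


source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
target = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]

all_pairs = closest_point_matches(source, target)
kept_pairs = closest_point_matches(source, target, max_dist=0.5)
```

The exhaustive search costs one distance evaluation per source-target pair, which is why restricting the search to a pixel or a small neighbourhood matters for real-time use.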
Rigid registration has been researched for many years and is now a well-defined problem with a small number of parameters to be estimated. Consequently, real-time, high-quality methods have already been proposed in the literature. The most popular algorithm for 3D rigid registration is ICP, which consists of six steps:

• Selection of Points: Points from the source and target objects are selected as samples for the algorithm;
• Matching of Points: Corresponding points from the source and target objects are associated;
• Weighting of Correspondences: Correspondences are weighted according to their reliability, so that the most reliable ones have more weight;
• Rejecting of Correspondences: Outliers are rejected from the pairs of corresponding points;
• Error Association: A point-to-point or point-to-plane error metric is defined for the optimization step;
• Error Minimization: The energy function built in the previous step is (commonly) minimized by solving a linear system.
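The six steps above can be illustrated with a deliberately simplified sketch. It is not the variant used in this thesis: it works in 2D instead of 3D, matches points by exhaustive nearest-neighbour search, uses constant weights and minimizes the point-to-point error in closed form. The function names (`fit_rigid_2d`, `transform`, `icp_2d`) and the rejection threshold `max_dist` are hypothetical and chosen only for illustration.

```python
import math


def fit_rigid_2d(src, dst, weights):
    """Steps 5-6: closed-form weighted point-to-point fit (rotation + translation)."""
    total = sum(weights)
    cs = [sum(w * p[i] for w, p in zip(weights, src)) / total for i in (0, 1)]
    cd = [sum(w * q[i] for w, q in zip(weights, dst)) / total for i in (0, 1)]
    dot = cross = 0.0
    for w, p, q in zip(weights, src, dst):
        ax, ay = p[0] - cs[0], p[1] - cs[1]   # centred source point
        bx, by = q[0] - cd[0], q[1] - cd[1]   # centred target point
        dot += w * (ax * bx + ay * by)
        cross += w * (ax * by - ay * bx)
    theta = math.atan2(cross, dot)            # angle minimizing the squared error
    c, s = math.cos(theta), math.sin(theta)
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    return theta, tx, ty


def transform(theta, tx, ty, points):
    """Apply a 2D rotation by theta followed by a translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(source, target, iterations=10, max_dist=1.0):
    current = list(source)                                   # 1. selection: keep every point
    for _ in range(iterations):
        pairs = []
        for p in current:
            q = min(target, key=lambda t: math.dist(p, t))   # 2. matching: closest point
            if math.dist(p, q) <= max_dist:                  # 4. rejection: drop far pairs
                pairs.append((p, q))
        if len(pairs) < 2:
            break
        weights = [1.0] * len(pairs)                         # 3. weighting: constant
        theta, tx, ty = fit_rigid_2d([p for p, _ in pairs],
                                     [q for _, q in pairs], weights)
        current = transform(theta, tx, ty, current)          # update the source
    return current
```

When the initial misalignment is small compared with the spacing between points, the closest-point matches coincide with the true correspondences and a single iteration already recovers the motion; larger misalignments are where the constraints discussed above become important.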
As the ICP algorithm provides high accuracy and real-time performance for rigid registration (Rusinkiewicz; Levoy, 2001), it is used in our approach.

Non-Rigid Registration

Non-rigid registration requires more attention because it faces the issues of rigid registration and also the problem of deformation, which increases the number of parameters to be estimated and the space of possible solutions. Unlike the rigid scenario, where every point of a given source object is moved by a single transformation estimated by the algorithm, in the non-rigid scenario every point may undergo a different, interconnected deformation. Therefore, more reliable correspondences must be computed for every region of the source object so that the registration is sufficiently accurate and realistic (Tam et al., 2013).

Traditionally, commercial systems have used markers to provide sparse, reliable correspondences for non-rigid registration; however, markers are intrusive in the scene (Bermano et al., 2014). Templates have been used for applications based on part-to-whole alignment, where they provide strong priors for the shape and help to handle noise and missing data (Li et al., 2009). For scenarios such as facial non-rigid registration, blendshapes can be applied to capture a basis set of user expressions (Weise et al., 2011; Bouaziz; Wang; Pauly, 2013; Li et al., 2013). Other constraints induced by deformation, features, signature and saliency require too much processing time.

The closest point criterion can be used for rigid and non-rigid registration in a similar way. However, regularization constraints are commonly employed to improve the optimization phase, using a priori information to avoid local minima. Orthonormality (Sumner; Schmid; Pauly, 2007) and handling of holes (Li; Sumner; Pauly, 2008) are some of the most used regularization schemes for non-rigid shapes.
In general, a non-linear optimization solver is employed for rigid and non-rigid registration. Many techniques have been used to find the best transformations and correspondences. Local deterministic optimization methods compute a solution that maximizes or minimizes an energy function locally. These techniques may not produce the most accurate solutions, but they are widely used due to their low processing time. Gradient descent, Newton, Gauss-Newton, quasi-Newton and Levenberg-Marquardt are often employed for non-rigid registration (Madsen; Bruun; Tingleff, 2004). Singular Value Decomposition, quaternions, orthonormal matrices and dual quaternions are the most frequently used for rigid registration (Lorusso; Eggert; Fisher, 1995). As local optimization techniques may find only a local minimum, global optimization addresses this problem by searching for a global solution. Alternatively, stochastic optimization addresses it by using statistics and probabilistic models. While stochastic and global deterministic optimization tend to be more accurate, in this thesis we use a technique based on local optimization because of its low running time. Moreover, as we assume spatial and temporal coherence between the sequential frames used for registration, local optimization converges after a few iterations (Sumner; Schmid; Pauly, 2007).

Surfaces such as the face, hands and body may undergo deformation during a process of 3D reconstruction, for instance, and rigid registration is not able to handle it. A solution
for this issue is to apply non-rigid registration to align those deformable objects.

One of the first works in the field of fast non-rigid registration applied to computer graphics is Embedded Deformation (ED), a real-time deformation algorithm for object manipulation and creation of 3D animation (Sumner; Schmid; Pauly, 2007). The goal of this technique is to allow the user to edit a surface intuitively while preserving its features. Deformation is represented by a graph. Each node of this graph is associated with an affine transformation that influences the deformation of the nearby space. The great advantage of this approach is that it can be applied to a wide range of objects, articulated or not. Although its main goal is user-driven object manipulation, the algorithm proposed by Sumner et al. can also be seen as a non-rigid registration algorithm in which the source and target surfaces are the objects before and after user manipulation. In this sense, many other works have used or improved this approach for the specific problem of non-rigid surface registration.

Li et al. adapted Sumner's algorithm to the registration of partial range scans acquired from a 3D scanner (Li; Sumner; Pauly, 2008). They augmented the ED algorithm with a rigid registration and designed an energy function to penalize unreliable correspondences. Later on, Li et al. extended this approach with an algorithm for high-quality template-based non-rigid surface registration and reconstruction using dynamic graph refinement and multi-frame stabilization (Li et al., 2009). Li et al. also presented a method for temporally coherent completion of surfaces captured from real-time dynamic performances (Li et al., 2012), which extends the non-rigid registration of their previous work (Li et al., 2009) by adding texture constraints to the optimization.
Dou and colleagues proposed an algorithm to track dynamic objects acquired from real-time commodity depth cameras, such as the Microsoft Kinect sensor (Dou; Fuchs; Frahm, 2013). Basically, they extended the KinectFusion algorithm (Izadi et al., 2011) to deal with non-rigid registration. Their non-rigid registration is based on the ED algorithm; however, color consistency and dense point cloud alignment were added to the original energy function.

All these approaches improve the accuracy of the ED algorithm, but they require execution times in the order of minutes to register two point clouds. Thus, they are not suitable for an AR application.

Few methods have achieved real-time performance in 3D non-rigid registration. Chen et al. proposed a method for non-rigid registration of skeletons captured from the user's body (Chen; Izadi; Fitzgibbon, 2012). Their approach runs at 30 frames per second (FPS), but it uses a small number of constraints for registration and depends on a skeleton definition. Nutti et al. proposed a method to track tumors based on the position of the patient's body, which presumes prior knowledge about the scenario (Nutti et al., 2014). Their algorithm runs at 10 FPS by using a multi-threaded CPU implementation of (Li et al., 2009). Zollhöfer et al. proposed a method for real-time non-rigid registration of arbitrary meshes captured from the real scene (Zollhöfer et al., 2014). Based on hardware specialized for high-quality surface acquisition, their approach generates a 3D template model of the object of interest and uses a hierarchical non-rigid registration algorithm fully
implemented on the GPU. The implementation runs at 30 FPS with high accuracy.

In this work, we present an approach also based on the ED algorithm which shares some characteristics of (Zollhöfer et al., 2014), such as requiring no special configuration or prior knowledge of the object and using GPU parallelism to achieve real-time performance. However, no special hardware is required: our approach is based on a simple off-the-shelf RGB-D sensor, which provides noisy, low-accuracy data. As proposed in related work (Li et al., 2009), we use an adaptive graph refinement to improve non-rigid registration accuracy. Differently from other approaches, the algorithm proposed here runs entirely on the GPU and is based on a quadtree which operates over the 2D projection of the object to be registered. Also, the main goal of our algorithm is to be incorporated in a MAR environment, as a tool to improve tracking of the deformable object.

2.3 SUMMARY

Augmented reality is a technology which has been used in several fields, such as medicine and entertainment, among others. For some applications, markerless technology is useful to remove the intrusiveness of traditional marker-based approaches. When the object used as the basis for markerless tracking is deformable, it is desirable for the application to support non-rigid motion to improve tracking robustness. Many methods inspired by the ED algorithm have been proposed for accurate 3D non-rigid registration; however, only a few of them achieve real-time performance, and those still require prior knowledge about the scenario. To overcome this situation, in this thesis we propose an alternative method for fast 3D non-rigid registration which extends the ED algorithm by using a three-level adaptive approach implemented entirely on the GPU.
Chapter 3

The focus of this thesis is to add support for non-rigid tracking in a markerless augmented reality environment. Therefore, in this chapter we present the environment in which the proposed non-rigid registration was applied and validated.

MARKERLESS AUGMENTED REALITY ENVIRONMENT

In this chapter we present the MAR environment on which this work is based. An overview of the proposed MAR environment can be seen in Figure 3.1. An RGB-D sensor is used to capture color and depth information of the scene. The object of interest is located, segmented from the scene and reconstructed in real time. Then, real-time tracking is performed by using the previously reconstructed 3D reference model and the current 3D object captured by the sensor. The final registered 3D object is integrated into the 3D reference model to account for new viewpoints or changes in the object's shape due to deformations. A detailed explanation of the environment is given in the next sections of this chapter.

3.1 SURFACE ACQUISITION

In this environment, an RGB-D sensor is used to capture color and depth information from the real scene for every input frame (Figure 3.1). Color information is encoded as a color map, an image which stores for each pixel the red, green and blue intensities of the captured scene. Depth information is encoded as a depth map ($D$), an image which stores for each pixel the measured distance (i.e. depth) from the corresponding 3D point in the scene to the depth sensor.

Our approach is based on a low-cost RGB-D sensor which provides noisy depth data. As described in Section 2.2, unwanted points on the captured surface may reduce registration accuracy. To minimize this problem, a bilateral filter is applied over $D$ (Tomasi; Manduchi, 1998), as shown in Equation 3.1. To reduce noise while preserving features (i.e.
discontinuities) of the raw depth data, this technique uses a non-linear combination of nearby image intensities based on geometric proximity and photometric similarity.
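The filter of Equation 3.1 can be illustrated with a direct, unoptimized CPU sketch. Several choices here are assumptions made only for illustration: the depth map is a list of rows with depth values in millimetres, a value of zero marks a pixel without measurement and is skipped, and the neighbourhood $S$ is a square window whose `radius` is hypothetical. The default standard deviations are the ones reported in this chapter.

```python
import math


def bilateral_filter(depth, sigma_d=4.5, sigma_c=30.0, radius=2):
    """Return a filtered copy of `depth`; pixels without measurement stay at 0."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = depth[y][x]
            if centre <= 0:
                continue  # hole: nothing to filter (assumption of this sketch)
            num = den = 0.0
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = depth[j][i]
                    if d <= 0:
                        continue  # ignore missing neighbours
                    # Spatial Gaussian: penalizes pixels far from p in the image.
                    g_space = math.exp(-((x - i) ** 2 + (y - j) ** 2) / (2 * sigma_d ** 2))
                    # Range Gaussian: penalizes neighbours with a very different depth.
                    g_range = math.exp(-((centre - d) ** 2) / (2 * sigma_c ** 2))
                    num += g_space * g_range * d
                    den += g_space * g_range
            out[y][x] = num / den  # den >= 1 because the centre pixel contributes 1
    return out
```

Because the range term vanishes for neighbours whose depth differs strongly from the centre pixel, a depth discontinuity such as an object silhouette is left essentially untouched, while small fluctuations on a smooth surface are averaged out.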
Figure 3.1 Overview of the proposed approach from 3D reference model reconstruction to tracking solution. Adapted from (Souza; Macedo; Apolinario, 2014).

$$D_f(p) = \frac{1}{W(p)} \sum_{q \in S} G_{\sigma_d}(\lVert p - q \rVert)\, G_{\sigma_c}(\lvert D(p) - D(q) \rvert)\, D(q)$$

$$W(p) = \sum_{q \in S} G_{\sigma_d}(\lVert p - q \rVert)\, G_{\sigma_c}(\lvert D(p) - D(q) \rvert) \tag{3.1}$$

where $D(p)$ and $D(q)$ correspond to the pixel values at positions $p$ and $q$ in image $D$. $\sigma_d$ and $\sigma_c$ are the standard deviations of the Gaussian functions $G$ for the space (i.e. distance) and range (i.e. depth value) domains, respectively. $W(p)$ is a normalization factor, $S$ is the neighbourhood of pixel $p$ and $D_f$ is the filtered depth map. From empirical tests, we have set $\sigma_d = 4.5$ and $\sigma_c = 30$.

Unwanted points are also located in the background of the scene, and they can be removed from $D_f$ by using a depth threshold. In the experiments conducted, we used a threshold of 1.3 meters, considering that the object of interest is somewhere near the depth sensor.

To detect and segment the object of interest in the scene (Figure 3.1), two methods can be used. The first method relies on a classifier to detect the object on the appropriate map. If it is applied on the color map, intrinsic and extrinsic calibrations must be performed to allow the mapping of the segmented region from the color map to the depth map. In practice, we have tested the approach in scenarios where the object is the user's head. In these cases, the Viola-Jones face detector (Viola; Jones, 2004) implemented
on the GPU is used to locate and segment the face in the color map (Figure 3.1). This detector takes advantage of a representation called integral image to compute Haar-like features quickly. In an integral image, each pixel contains the sum of the pixels above and to the left of the original position. After the computation of the Haar-like features, a combination of simple classifiers built using the AdaBoost learning algorithm is employed to detect faces in color images (Freund; Schapire, 1995).

If the classifier is not available, an alternative method can be used. A 2D bounding box that contains the foreground object is computed from $D$. Every position outside the bounding box is then discarded from memory.

By applying the process of intrinsic calibration, a point cloud $P$ is computed from $D$. The normal vector ($n$) of each point is the eigenvector associated with the smallest eigenvalue of a covariance matrix built for every point $p \in P$ (Holzer et al., 2012).

Once the 3D object is obtained for every frame, markerless rigid registration is performed based on the interactive alignment of two consecutive source ($P_s$) and target ($P_t$) point clouds captured from the real scene. In fact, $P_s$ is represented by a 3D reference model generated from the object of interest in a previous pose and $P_t$ is the current point cloud acquired by the depth sensor.

To achieve real-time performance, all the steps of this MAR environment must run on the GPU. Therefore, all the algorithms were carefully designed and implemented in a parallel way to exploit the full parallelism provided by the hardware.

3.2 3D REFERENCE MODEL RECONSTRUCTION

To reconstruct the 3D reference model of the object of interest in real time (Figure 3.1), the KinectFusion algorithm is employed (Izadi et al., 2011; Newcombe et al., 2011). An overview of this algorithm can be seen in Figure 3.2.

Figure 3.2 Overview of KinectFusion's pipeline (Izadi et al., 2011).
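The integral image mentioned in Section 3.1 for the Viola-Jones detector can be illustrated with a short CPU sketch. It is not the GPU detector used in the environment; the helper names (`integral_image`, `box_sum`, `two_rect_feature`) are hypothetical, and the table is padded with one extra row and column of zeros, an implementation convenience assumed here so that any rectangle sum needs exactly four look-ups.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of the pixels with x0 <= x < x1 and y0 <= y < y1, in constant time."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def two_rect_feature(ii, x, y, w, h):
    """A simple Haar-like feature: left half of the window minus right half."""
    half = w // 2
    left = box_sum(ii, x, y, x + half, y + h)
    right = box_sum(ii, x + half, y, x + 2 * half, y + h)
    return left - right
```

Once the table is built, the cost of a feature no longer depends on the size of its rectangles, which is what makes evaluating a large number of Haar-like features per window affordable.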
Once the object is detected in the scene, the region that contains it is fixed. The object is then constrained to move only inside this region. From the different viewpoints captured of the same object, a single 3D reference model can be generated. To do so, KinectFusion integrates raw depth data captured from an RGB-D sensor into a 3D grid to produce a high-quality 3D reconstruction of the object of interest. The grid
stores for each voxel the signed distance to the closest surface around a narrow region (i.e. TSDF, Truncated Signed Distance Function) and a weight that indicates the uncertainty of the surface measurement. This volumetric representation and integration are based on the VRIP algorithm (Curless; Levoy, 1996). To extract the implicit surface of the 3D reconstructed model, zero-crossings (i.e. positions where the TSDF sign changes) are detected on the grid through the raycasting algorithm.

By extracting the reference model in a previous pose and aligning it to the current 3D model captured by the depth sensor, the incremental motion ($T_{rigid}$) between frames can be estimated. This solution allows accurate markerless tracking without error accumulation, as the high-quality 3D reference model is used as the basis for tracking.

3.3 TRACKING

Rigid motion is estimated by the ICP algorithm described in Section 2.2. Each of the ICP steps was designed to achieve real-time performance while providing good accuracy for the rigid registration. This real-time variant of the algorithm is described as follows:

• Selection of Points: All the points from $P_s$ and $P_t$ are selected for optimization;
• Matching of Points: Corresponding points between $P_s$ and $P_t$ are associated by using projective data association (i.e. reverse calibration) (Rusinkiewicz; Levoy, 2001), which matches the points that are located at the same 2D projection position (i.e. the same pixel in $D_s$ and $D_t$);
• Weighting of Pairs: A constant weight is assigned to each association;
• Rejection of Pairs: Pairs are rejected if the Euclidean distance between corresponding points is greater than 10 mm or the angle between corresponding normals is greater than 20 degrees;
• Error Metric: The point-to-plane metric (Equation 3.2) is used to guide the optimization;

$$\operatorname*{arg\,min}_{T_{rigid}} \sum_{p \,\in\, \text{selected}} \left\lVert (T_{rigid}\, p_s - p_t) \cdot n_t \right\rVert^2 \tag{3.2}$$

• Error Minimization: The error metric is minimized by applying the Cholesky decomposition to the linear system derived from Equation 3.2 (Chen; Medioni, 1992).

Because the real-time variant of the ICP algorithm uses projective data association to find correspondences, it fails, or does not converge to a correct registration, when there is a high pose variation between consecutive frames. To improve tracking robustness, a real-time pose estimator is used to give a new initial guess to the tracking algorithm when it fails (Figure 3.3). For the situations where the object is the user's head, the head pose estimation algorithm proposed by Fanelli et al. was used (Fanelli et al., 2011).

However, even with this algorithm, the tracking may fail if the user interacts non-rigidly with the application. Non-rigid tracking support can be added by applying a real-time non-rigid surface registration algorithm to align the 3D reference model and the current model captured, as will be discussed in the next chapter.
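The pair rejection rule and the point-to-plane metric of the real-time ICP variant described in this section can be sketched on the CPU as follows. This is only an illustration: coordinates are assumed to be in millimetres, the helper names are hypothetical, and the source points passed to `point_to_plane_error` are assumed to be already transformed by the current estimate of $T_{rigid}$.

```python
import math

MAX_DISTANCE_MM = 10.0   # rejection threshold on the distance between points
MAX_ANGLE_DEG = 20.0     # rejection threshold on the angle between normals


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def keep_pair(p_s, n_s, p_t, n_t):
    """Return True if the correspondence passes both rejection tests."""
    if norm(sub(p_s, p_t)) > MAX_DISTANCE_MM:
        return False
    cos_angle = dot(n_s, n_t) / (norm(n_s) * norm(n_t))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return math.degrees(math.acos(cos_angle)) <= MAX_ANGLE_DEG


def point_to_plane_error(pairs):
    """Sum of squared distances from each source point to its target tangent plane.

    `pairs` is a list of (p_s, p_t, n_t) with unit-length target normals n_t.
    """
    return sum(dot(sub(p_s, p_t), n_t) ** 2 for p_s, p_t, n_t in pairs)
```

Note that a source point which slides along the tangent plane of its target contributes no error; this is the property that distinguishes the point-to-plane metric from the point-to-point one.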
Figure 3.3 Left image: The user translated his face quickly. Only a small number of points remained at the same image coordinates and the ICP failed. Right image: By using the pose estimation algorithm, the problem can be solved (Macedo; Apolinario; Souza, 2013).

3.4 SUMMARY

One solution to provide accurate markerless tracking for an augmented reality environment is to generate a 3D reference model of the object of interest and track it in real time. The KinectFusion algorithm is used to reconstruct such a model in real time, and the ICP algorithm is used to track it in the scene by registering the 3D reference model in a previous pose to the current 3D model captured by a depth sensor. To add support for non-rigid tracking, a real-time non-rigid registration algorithm is necessary to maintain user interaction with the application.
Chapter 4

In this chapter we present the proposed adaptive non-rigid registration algorithm and its adaptation to take advantage of the parallelism of the GPU. Our approach is evaluated in terms of accuracy and performance for several datasets. Some of the content of this chapter appears in our publication (Souza; Macedo; Apolinario, 2014).

GPU-BASED ADAPTIVE NON-RIGID REGISTRATION

In this chapter we present the adaptive non-rigid registration algorithm. An overview of the full process to register two point clouds can be seen in Figure 4.1. The non-rigid algorithm builds a deformation graph ($G$) on $P_s$ so that $P_s$ can be iteratively deformed towards $P_t$. Each node $g \in G$ consists of a point of $P_s$ associated with a 3D affine transformation (i.e. a $3 \times 3$ rotation matrix $R$ and a 3D translation vector $t$) which influences the deformation of the nearby space. The current deformation between $P_s$ and $P_t$ is modelled in terms of an energy function, and a non-linear optimization algorithm is applied to minimize this energy with respect to the affine transformations of $G$. To reduce the computational cost of the non-linear solver, a subsample of $P_s$ is selected as the set of constraints used during optimization. Next, the algorithm iteratively refines $G$ according to the energy function measured previously. This refinement is based on a quadtree. The registration stops when the residual error between the deformed $P_s$ and $P_t$ is sufficiently low. To achieve good performance, the full pipeline runs entirely on the GPU, and the non-rigid registration algorithm is applied in a multi-frame manner only when rigid tracking fails.

Our deformation model is inspired by the ED algorithm (Sumner; Schmid; Pauly, 2007). However, we have added a three-level adaptive approach to improve the accuracy and performance of the original solution. Moreover, we have implemented it on the GPU to boost performance even more.
The proposed algorithm consists of several stages (Figure 4.1), which are described in the next sections of this chapter.

4.1 DEFORMATION MODEL

By using the deformation graph, a point $p$ can be deformed by $G$ according to the following equation:
Figure 4.1 Overview of the proposed approach from the depth map acquisition to the final non-rigid aligned surface. (Stage labels in the figure: source and target depth maps, cropped source and target depth maps, source and target surfaces, matching of points, selection of nodes, building of the quadtree, weighting the influence of nodes, selection of constraints, error minimization, adapting the quadtree, updating the source object, deformed source surface.)
$$\tilde{p} = \sum_{j=1}^{k} w_j(p)\,\left[ R_j (p - g_j) + g_j + t_j \right] \tag{4.1}$$

where $k$ is the number of nearest nodes of $p$ that are considered and $w_j$ is a weight that measures the influence of each node on the point.

To solve the problem of non-rigid registration using this representation, we use three energy functions, $E_{rot}$, $E_{reg}$ and $E_{con}$ (Sumner; Schmid; Pauly, 2007):

• Energy function for rotation ($E_{rot}$): In order for a $3 \times 3$ matrix to represent a rotation in SO(3), it must satisfy six conditions: each of its three columns must have unit length, and all columns must be orthogonal to one another (Grassia, 1998). The squared deviation from these conditions is given by the function $Rot(R)$:

$$Rot(R) = (c_1 \cdot c_2)^2 + (c_1 \cdot c_3)^2 + (c_2 \cdot c_3)^2 + (c_1 \cdot c_1 - 1)^2 + (c_2 \cdot c_2 - 1)^2 + (c_3 \cdot c_3 - 1)^2 \tag{4.2}$$

where $c_1$, $c_2$ and $c_3$ are the column vectors of a given matrix $R$. The term $E_{rot}$ is defined as the sum of the rotation error over all $m$ affine transformations of $G$:

$$E_{rot} = \sum_{j=1}^{m} Rot(R_j) \tag{4.3}$$

• Energy function for regularization ($E_{reg}$): In order to apply a sufficiently smooth deformation, we must ensure that the affine transformations of adjacent nodes in $G$ are consistent. $E_{reg}$ is the sum of the squared distances between each node's transformation applied to its neighbours and the actual transformed neighbour positions:

$$E_{reg} = \sum_{j=1}^{m} \sum_{k \in N(j)} \left\lVert R_j (g_k - g_j) + g_j + t_j - (g_k + t_k) \right\rVert_2^2 \tag{4.4}$$

where $N(j)$ consists of all nodes connected to the node $g_j$.

• Energy function for constraints ($E_{con}$): This energy function deals directly with $P_s$ and $P_t$. It measures how distant they are from each other. $E_{con}$ is the sum of the squared Euclidean distances between the deformed source points and their correspondents on the target object:

$$E_{con} = \sum_{i=1}^{n} \left\lVert \tilde{p}_i - q_i \right\rVert_2^2 \tag{4.5}$$

where $q_i$ is the target point corresponding to $p_i$, $\tilde{p}_i$ is $p_i$ after deformation (Equation 4.1) and $n$ is the number of points in $P_s$.

The total energy function $E_{tot}$ is defined by the following equation:

$$E_{tot} = w_{rot} E_{rot} + w_{reg} E_{reg} + w_{con} E_{con} \tag{4.6}$$

We used $w_{rot} = 1$, $w_{reg} = 10$ and $w_{con} = 100$ in all our experiments, as suggested in related work (Sumner; Schmid; Pauly, 2007). We tested other weights and alternative strategies for relaxing them during each iteration; however, we did not obtain better results.

4.2 MATCHING OF POINTS

After object detection and segmentation, points from $P_s$ and $P_t$ are associated. By using the MAR environment described in the previous chapter, it is assumed that there is temporal and spatial coherence between frames, as the rigid registration was already applied and, as a result, $P_s$ and $P_t$ are relatively close to each other. Hence, projective data association (Section 3.3) is used to match the points. As an adaptation for GPU processing, each GPU thread transforms a single point $p_s$ into image coordinates and associates it with the point $p_t$ at the same image coordinates.

4.3 SELECTION OF NODES

After the matching of points, the nodes of $G$ are selected. A quadtree is built on the GPU to perform the selection of nodes based on the 2D projection of $G$. As the nodes of $G$ are also points in $P_s$, we can convert them from world to image coordinates by using the same process used to reproject $P_s$ into $D_s$.

$P_s$ may be an object with holes distributed along its surface. In this case, selecting nodes based only on the 2D space may cause nodes to be selected in regions where there is no depth data. To solve this problem, we use what we call virtual nodes to represent the space where there is no depth data. Virtual nodes favor the expansion of the quadtree in regions where depth data is naturally available, although not at the specific position of the node.
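To make the deformation model of Section 4.1 concrete, the following CPU sketch evaluates Equations 4.1, 4.2, 4.5 and 4.6 for small inputs. It is an illustration under stated assumptions rather than the GPU implementation: nodes are represented as hypothetical `(g, R, t)` tuples, the weights passed to `deform` are assumed to be already normalized to sum to one, and the regularization term of Equation 4.4 is taken as a precomputed number.

```python
def mat_vec(R, v):
    """Product of a 3x3 matrix (list of rows) and a 3D vector."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def deform(p, nodes, weights):
    """Equation 4.1: blend the affine transformations of the nearest nodes."""
    out = [0.0, 0.0, 0.0]
    for (g, R, t), w in zip(nodes, weights):
        local = mat_vec(R, tuple(p[i] - g[i] for i in range(3)))
        for i in range(3):
            out[i] += w * (local[i] + g[i] + t[i])
    return tuple(out)


def rot_error(R):
    """Equation 4.2: squared deviation of R from an orthonormal matrix."""
    c1, c2, c3 = (tuple(R[r][c] for r in range(3)) for c in range(3))

    def d(a, b):
        return sum(x * y for x, y in zip(a, b))

    return (d(c1, c2) ** 2 + d(c1, c3) ** 2 + d(c2, c3) ** 2
            + (d(c1, c1) - 1) ** 2 + (d(c2, c2) - 1) ** 2 + (d(c3, c3) - 1) ** 2)


def constraint_energy(deformed, targets):
    """Equation 4.5: sum of squared distances between deformed and target points."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in zip(deformed, targets))


def total_energy(e_rot, e_reg, e_con, w_rot=1.0, w_reg=10.0, w_con=100.0):
    """Equation 4.6 with the weights used in the experiments."""
    return w_rot * e_rot + w_reg * e_reg + w_con * e_con
```

For example, a point influenced equally by two nodes with identity matrices and translations (2, 0, 0) and (0, 2, 0) is moved by the average translation (1, 1, 0), and a matrix that scales space uniformly by 2 is penalized by `rot_error` even though its columns remain orthogonal.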
It is worth mentioning that virtual nodes do not have an affine transformation; they are just leaves of the quadtree that can be refined to generate real leaf nodes if necessary. Therefore, we restrict the use of virtual nodes to the first two levels of the quadtree.

To build the quadtree, some information must be stored in GPU memory, such as: the level of each node in $G$, whether a node of $G$ exists at a given position, whether that node has children (i.e. is a parent node) and whether a virtual node exists at that position. The algorithm can be divided into three steps: the building of the quadtree (Algorithm 1), the adaptive refinement (Algorithm 2) and the collapse (Algorithm 3) of nodes in $G$.
Algorithm 1 Building a quadtree
 1: for each thread of index idx in parallel do
 2:     u ← getPixel(idx, currentLevel)
 3:     if depth(v(u)) > 0 then
 4:         insertNodeInGraph(u)
 5:         setLevel(u, currentLevel)
 6:     else if currentLevel ≤ 2 then
 7:         insertVirtualNodeInGraph(u)
 8:         setLevel(u, currentLevel)
 9:     end if
10:     if currentLevel > 1 and hasNode(u) then
11:         parentIdx ← idx/4
12:         u ← getPixel(parentIdx, currentLevel − 1)
13:         removeNodeFromGraph(u)
14:         removeVirtualNodeFromGraph(u)
15:         insertNodeInParentList(u)
16:     end if
17: end for

Figure 4.2 Building of the deformation graph (right) over the source object (left) based on the residual error measured (center).

We build the quadtree in the first iteration of our algorithm. This step is shown as pseudocode in Algorithm 1 and one result is illustrated in Figure 4.2. First, we iteratively call the GPU kernel that selects the nodes, iterating from the first level to the level required by the user. Each GPU thread computes, in parallel, the position u at which a node is selected (line 2). To compute u, we need the thread id and the current level of the quadtree being iterated. The method getPixel shifts the position of the thread id to the center of the 2D space that will be represented by the node. If the point is visible, it becomes a new node in $G$ (lines 3-5). Otherwise, it can become a new virtual node (lines 6-9). Therefore, we allow the quadtree to be refined even in regions where there are just a few points. If the node was selected but it is not in the first level (line 10), the thread removes the parent node from $G$, be it a real or a virtual node, and
inserts it into a parent list, indicating that it has already been expanded (lines 13-15). In this case, getPixel computes the position of the parent node based on the previous level in the quadtree hierarchy and the parent thread id (as the parent is expanded into four children, we simply divide the current thread id by 4 to obtain the parent id).

Algorithm 2 Refinement of nodes
 1: for each thread of index idx in parallel do
 2:     u ← getPixel(idx, currentLevel)
 3:     if (hasNode(u) or hasVirtualNode(u)) and getLevel(u) = currentLevel then
 4:         evaluate E_con(u)
 5:         if region around u must be refined then
 6:             for each child node at pixel u_c do
 7:                 if depth(v(u_c)) > 0 then
 8:                     insertNodeInGraph(u_c)
 9:                     setLevel(u_c, currentLevel + 1)
10:                 end if
11:             end for
12:             removeNodeFromGraph(u)
13:             removeVirtualNodeFromGraph(u)
14:             insertNodeInParentList(u)
15:         end if
16:     end if
17: end for

Figure 4.3 Refinement of the deformation graph (right) over the cheeks region of the source object (left) based on the residual error measured (center).

After the building of the quadtree, the nodes of $G$ can be refined or collapsed according to the residual error measured in the previous iteration. The refinement of nodes is shown as pseudocode in Algorithm 2 and one result is illustrated in Figure 4.3. Again, we iteratively call the GPU kernel that refines the nodes. We iterate from the first level of the quadtree to the maximum level in order to refine the
nodes in a top-down fashion. For each GPU thread in parallel, we compute the position of the thread in the 2D space and check whether there is a node at this position and whether it is at the current level being iterated (lines 2-3). If the thread passes this condition, we compute the average of the error $E_{con}$ (Equation 4.5) over a region $C$ around u (line 4). If the average is above a certain threshold, the node must be refined. For each child node computed from the node position (line 6), we check whether there is a point at the child position (line 7). If there is, it becomes a new child node in $G$ (line 8). After the children are created, the thread removes the refined node from $G$ (lines 12-13) and inserts it into a parent list, indicating that it has already been expanded (line 14).

The collapsing of nodes is shown as pseudocode in Algorithm 3 and one result is illustrated in Figure 4.4. Again, we iteratively call the GPU kernel that collapses the nodes. We iterate from the maximum level of the quadtree to the root node in order to collapse the nodes in a bottom-up fashion. For each GPU thread in parallel, we compute the position of the thread in the 2D space and check whether the node has children and whether it is at the current level being iterated (lines 2-3). If the thread passes these conditions, given a region $C$ around u, we compute the average of the error $E_{con}$ (Equation 4.5) over every $p_s \in C$ (line 4). If the average is below a certain threshold, the child nodes in $C$ must be collapsed. To collapse the nodes, we check whether child nodes exist and whether they are leaf nodes (line 6). In this case, they are collapsed (lines 7-9) and $C$ is represented by the old parent node (lines 10-11).
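The error-driven adaptation described above can be illustrated with a simplified, sequential sketch. It is not a transcription of Algorithms 1 to 3: instead of updating the quadtree in place with separate refinement and collapse kernels, it rebuilds the set of leaf nodes recursively from a per-pixel error map, so regions whose error has dropped simply come out coarser on the next call. Virtual nodes are omitted, the image is assumed to be square with a power-of-two side, and the names `mean_error` and `select_leaf_nodes` are hypothetical.

```python
def mean_error(error, depth, x0, y0, size):
    """Average error over the pixels of a square cell that have valid depth."""
    values = [error[y][x]
              for y in range(y0, y0 + size)
              for x in range(x0, x0 + size)
              if depth[y][x] > 0]
    return sum(values) / len(values) if values else 0.0


def select_leaf_nodes(error, depth, threshold, max_level,
                      x0=0, y0=0, size=None, level=1):
    """Return (x, y, level) for every leaf node of the adaptive quadtree."""
    if size is None:
        size = len(error)
    cx, cy = x0 + size // 2, y0 + size // 2       # node sits at the cell centre
    refine = (level < max_level and size > 1
              and mean_error(error, depth, x0, y0, size) > threshold)
    if not refine:
        # Low error (or finest level reached): one node represents the cell,
        # provided that there is depth data at its position.
        return [(cx, cy, level)] if depth[cy][cx] > 0 else []
    half = size // 2
    nodes = []
    for dy in (0, half):                          # high error: split into four
        for dx in (0, half):
            nodes += select_leaf_nodes(error, depth, threshold, max_level,
                                       x0 + dx, y0 + dy, half, level + 1)
    return nodes
```

On a 4 x 4 error map whose error is concentrated in the top-left quadrant, only that quadrant is split down to the finest level, while each of the other three quadrants keeps a single node.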
Algorithm 3 Collapsing of nodes
 1: for each thread of index idx in parallel do
 2:     u ← getPixel(idx, currentLevel)
 3:     if hasChildren(u) and getLevel(u) = currentLevel then
 4:         evaluate E_con(u)
 5:         if region around u must be collapsed then
 6:             if exist child nodes and they are leaves then
 7:                 for each child node at pixel u_c do
 8:                     removeNodeFromGraph(u_c)
 9:                 end for
10:                 insertNodeInGraph(u)
11:                 removeNodeFromParentList(u)
12:             end if
13:         end if
14:     end if
15: end for

4.4 WEIGHTING THE INFLUENCE OF NODES

In this step, the influence of the k-nearest nodes on each $p_s$ is computed. The weight $w_j$ can be computed by:

$$w_j(p) = \left(1 - \lVert p - g_j \rVert / dist_{max}\right)^2 \tag{4.7}$$
and then normalized to sum to one. $dist_{max}$ is the distance to the (k+1)-nearest node with respect to p. Equation 4.7 guarantees that the nearest nodes have more influence on the deformation of p. Also, as the nodes are points of $P_s$, they are deformed by other nodes of G. To compute the weights efficiently on the GPU, we create an array that contains only the selected nodes. Direct access to this array spares us from checking explicitly on the surface whether a point is also a node. Then, each GPU thread computes the influence for a specific node in G.

Figure 4.4 Collapsing of the deformation graph (right) over the cheeks region of the source object (left) after updating on the residual error (center).

4.5 SELECTION OF CONSTRAINTS

To compute the best affine transformations that align $P_s$ and $P_t$ we must:
1. Select the constraints (i.e. points from $P_s$ that will be used during the optimization phase);
2. Convert the affine rotations from Euler to quaternion representation;
3. Compute the energy function $E_{tot}$ (Equation 4.6) that models the constraints to guide the proper registration of the objects;
4. Use a non-linear solver to minimize $E_{tot}$.

Instead of using the full dense point cloud as constraints for the optimization, or asking the user to select the constraints, we use an adaptive algorithm that selects the constraints based on the residual error previously measured (Equation 4.6). Given a region on the source surface, the higher the error, the higher the number of points selected as constraints for the optimization, as can be seen in Figure 4.5. In the first iteration of the optimization algorithm, where the residual error has not yet been measured, a uniform sampling is used to select the constraints. To do that, an n × n mask, with step n, is scanned through the 2D projection of $P_s$ on the xy coordinates.
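As a rough illustration of this mask-based scan, the following Python sketch selects constraints on a small 2D grid. It is a CPU sketch under stated assumptions, not the GPU implementation of the thesis: the names `select_constraints`, `valid`, `error` and `th_c` are hypothetical, the first iteration is modelled by passing no error map, and the error-driven rule follows the three cases detailed in the remainder of this section. The placement of the n points of the intermediate case along the mask diagonal is an assumption, since the text only states that they are uniformly distributed over the mask.

```python
def select_constraints(valid, n, error=None, th_c=None):
    """Return the (row, col) pixels chosen as constraints.

    valid[r][c] is True where the 2D projection of the source cloud has a point.
    error[r][c] is the per-pixel residual error; None models the first iteration.
    """
    rows, cols = len(valid), len(valid[0])
    selected = []
    for r0 in range(0, rows - n + 1, n):            # n x n mask scanned with step n
        for c0 in range(0, cols - n + 1, n):
            cells = [(r0 + i, c0 + j) for i in range(n) for j in range(n)]
            centre = (r0 + n // 2, c0 + n // 2)
            if error is None:
                chosen = [centre]                   # uniform sampling, first iteration
            else:
                errs = [error[r][c] for r, c in cells if valid[r][c]]
                e_avg = sum(errs) / len(errs) if errs else 0.0
                if e_avg > th_c:
                    chosen = cells                  # high error: all n^2 points
                elif e_avg >= th_c / 2:
                    # medium error: n points (assumed layout: mask diagonal)
                    chosen = [(r0 + i, c0 + i) for i in range(n)]
                else:
                    chosen = [centre]               # low error: centre only
            # points that fall in holes of the projection are skipped
            selected.extend(p for p in chosen if valid[p[0]][p[1]])
    return selected


if __name__ == "__main__":
    valid = [[True] * 8 for _ in range(4)]
    err = [[10.0] * 4 + [1.0] * 4 for _ in range(4)]   # made-up error map
    print(len(select_constraints(valid, 4, err, 4.0)))
```

In this toy grid the left mask has a high average error and contributes all of its points, while the right mask contributes only its centre, which is the behaviour the adaptive selection is meant to produce.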
The point at the center of this mask is selected as a constraint if it exists in $P_s$ (i.e. it is not in a hole). From empirical tests, n = 4 produced the best results. A discussion about the most appropriate value for n is given in Chapter 5, Section 5.2.

Figure 4.5 Constraint selection based on the initial non-rigid error between source and target surfaces. Panels: source surface, target surface, initial error (color scale from 0 to max) and constraints.

In the remaining iterations of the optimization, we use the same n × n mask to scan the 2D projection of $P_s$ and its residual error $E_{tot}$ (Equation 4.6). First, the algorithm evaluates the average residual error in the n × n region being scanned. The number of points selected in that region is defined by this average error measured from $E_{tot}$, which we call $E_{avg}$, and a predefined threshold $th_c$. There are three situations:
1. $E_{avg} > th_c$: all the $n^2$ points are selected;
2. $th_c/2 \le E_{avg} \le th_c$: n points uniformly distributed over the mask are selected;
3. $E_{avg} < th_c/2$: only the point at the center of the mask is selected.

Therefore, we select more constraints in the regions where the deformation is high and must be minimized, but we still consider the regions where the deformation is small or absent by selecting a small number of constraints to represent them. From empirical tests, setting $th_c$ to half of the averaged root mean squared error measured for the dataset produced the best results.

4.6 ERROR MINIMIZATION

In this stage, the affine transformation A = [R t], where R is a 3 × 3 rotation matrix and t is a 3D translation vector, is estimated for each node by a non-linear Gauss-Newton
solver using the constraints selected previously. After the selection of constraints, we need to convert the affine rotations from Euler to quaternion representation. The motivation is related to our non-linear solver, which operates faster with quaternions (three unknowns, assuming the component w equals 1) than with the Euler-form rotation matrix (nine unknowns). To store the affine transformations that will be estimated, we create two arrays: one array that stores six parameters for each node (i.e. three for the quaternion and three for the translation), and another array that is a hash relating a node in G to the location of its parameters in the first array. We compute the array and the hash elements using atomic operations on the GPU. Once $E_{tot}$ is defined, we must solve the optimization step to obtain the affine transformations that align $P_s$ to $P_t$. To achieve this goal we use the Gauss-Newton solver (Madsen; Bruun; Tingleff, 2004). Our objective is to solve the normal equation $J^T J \delta = J^T r$. We compute the residual r, which consists of the evaluation of $E_{tot}$ for each coordinate x, y and z of each constraint, and the Jacobian J, which holds the first-order partial derivatives of $E_{tot}$ with respect to each one of the parameters. $\delta$ represents the unknown parameters that we want to find to minimize $E_{tot}$. To compute J efficiently, we compute only the partial derivatives for the parameters that affect the constraint for which the derivation is being computed. Once J and r are available, we reduce the normal equation to the linear system $A\delta = b$ and compute the products $A = J^T J$ and $b = J^T r$. After solving the linear system, we add $\delta$ to the array of parameters (i.e. quaternions + translation vectors) and reiterate the optimization algorithm until the maximum number of iterations is reached or the error stabilizes (does not change by more than 5%). Regarding GPU processing, r and J are computed in parallel.
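The iteration described above can be sketched on the CPU as follows. This is a minimal pure-Python illustration under stated assumptions, not the CUDA/CUBLAS implementation of the thesis: the function names (`cholesky`, `solve_llt`, `gauss_newton`) and the callbacks `residual` and `jacobian` are hypothetical, and the sign convention is chosen so that the update solves the normal equation without a minus sign (the residual is taken as observation minus model and J as the derivative of the model). The stopping rule mirrors the 5% stabilization criterion mentioned in the text.

```python
import math


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_llt(L, b):
    """Solve (L L^T) x = b with two triangular solves."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gauss_newton(residual, jacobian, x, max_iters=10, tol=0.05):
    """Iterate x <- x + delta, where (J^T J) delta = J^T r."""
    prev = None
    for _ in range(max_iters):
        r = residual(x)
        err = sum(v * v for v in r)
        # stop when the squared error changes by no more than tol (5%)
        if prev is not None and abs(prev - err) <= tol * prev:
            break
        prev = err
        J = jacobian(x)
        m, n = len(J), len(x)
        A = [[sum(J[k][i] * J[k][j] for k in range(m)) for j in range(n)]
             for i in range(n)]                                   # A = J^T J
        b = [sum(J[k][i] * r[k] for k in range(m)) for i in range(n)]  # b = J^T r
        delta = solve_llt(cholesky(A), b)
        x = [xi + di for xi, di in zip(x, delta)]
    return x


if __name__ == "__main__":
    # Toy problem (made up): fit y = a*t + b to four samples.
    ts = [0.0, 1.0, 2.0, 3.0]
    ys = [2.0 * t + 1.0 for t in ts]
    res = lambda x: [y - (x[0] * t + x[1]) for t, y in zip(ts, ys)]
    jac = lambda x: [[t, 1.0] for t in ts]
    print(gauss_newton(res, jac, [0.0, 0.0]))
```

The dense products used to form A and b are the part that grows with the number of constraints and parameters, which is why reducing both is expected to pay off in the optimization phase.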
A and b are computed using the matrix-matrix and matrix-vector multiplication routines from the CUBLAS library (Nvidia, 2008). The linear system is solved using a GPU implementation of the $LL^T$ decomposition proposed by Henry (2009) together with the triangular linear solver Strsm from the CUBLAS library.

4.7 UPDATING THE SOURCE OBJECT

The affine transformations computed in the previous step are applied to $P_s$ based on Equation 4.1 and the algorithm returns to the second step until the maximum number of iterations is reached (we use three iterations to limit processing time). Each GPU thread applies Equation 4.1 to one $p_s$ of the source object.

4.8 MULTI-FRAME NON-RIGID TRACKING

To add support for non-rigid tracking, one solution is to apply it whenever the rigid tracking fails, enhancing the robustness of the MAR environment. However, applying non-rigid tracking for every frame has a computational cost that makes it unsuitable for real-time applications. Therefore, if rigid tracking keeps failing consecutively, non-rigid tracking will be used more frequently, reducing user interactivity. To solve this problem, we take advantage of the volumetric representation of the
KinectFusion algorithm to update the 3D reference model in real time based on the current deformation measured. When the rigid tracking fails (i.e. the error measured is above a certain threshold), non-rigid registration is applied and the deformed surface of the 3D reference model is sent to KinectFusion's grid with a high weight. The 3D reference model is updated in the grid representation by the TSDF computation and then the grid is ray cast to generate a new source surface for the next iteration. A high weight is used for fast adaptation of the previously stored 3D reference model into a new deformed one. As a consequence, by deforming the 3D reference model, non-rigid tracking converges faster and with higher accuracy in the next iterations than the rigid-only solution (i.e. in which only rigid tracking is applied and KinectFusion's volume is not updated).

4.9 SUMMARY

In this chapter we have presented a fast method for non-rigid registration which is able to register two noisy point clouds captured from a depth sensor with high accuracy. We have proposed an adaptive strategy for node distribution and constraint selection. In this context, it is fundamental to validate the algorithm in a real MAR environment in order to assess tracking robustness over many frames as well as average accuracy and performance, which is what will be done in the next chapter.
Chapter 5

In this chapter, non-rigid registration is evaluated in terms of accuracy and performance for several datasets.

NON-RIGID REGISTRATION EVALUATION

In this section we describe the experimental setup used and analyse the accuracy and performance of the proposed algorithm. In the tests, we compare the results obtained with our algorithm against related work, such as the ED algorithm.

5.1 METHODOLOGY

For all tests, we ran our algorithm on an Intel Core TM i GHZ, with 8GB of RAM and an NVIDIA GeForce GTX 660. Kinect is used as the RGB-D sensor due to its accessibility and low cost (Cruz; Lucio; Velho, 2012). It consists of a structured light depth sensor (IR emitter and camera), an RGB camera, an accelerometer, a motor and a multi-array microphone. Both cameras operate at 30 Hz, delivering images of 640x480 pixels. While the sensor provides depth maps in real time, the depth data is noisy and inaccurate.

To implement the approach proposed in this thesis, we have used some libraries and toolkits. The configuration of these libraries in the context of our approach is illustrated in Figure 5.1. OpenNI was used to capture the depth and color streams provided by the Kinect sensor (Occipital, 2015). Object detection and segmentation were done using the OpenCV library (Bradski; Kaehler, 2008). We have implemented 3D reference model reconstruction and non-rigid registration on the GPU using the NVIDIA CUDA architecture (Kirk; Hwu, 2010). Also, we used the open source C++ implementation of KinectFusion released by the PCL project (Rusu; Cousins, 2011).

The 3D reference model was reconstructed with KinectFusion using a grid with a volume size of 70cm × 70cm × 140cm and a resolution of $512^3$, as suggested in related work (Macedo et al., 2014). The non-rigid registration optimization takes 20ms for each iteration. Therefore, to achieve real-time performance (≈ 15 frames per second), we have chosen to use only three iterations of the optimization.
As the optimization converges fast,
such a small number of iterations still produces a good balance between accuracy and performance. As each dataset has its own minimum and maximum errors, we set the thresholds for adaptive node and constraint selection to half of the averaged root mean squared error measured.

Figure 5.1 Overview of the libraries used for each step of our approach (image processing, reference model reconstruction, Kinect live stream and non-rigid registration).

Outside the MAR environment, we have tested the proposed non-rigid registration algorithm on four different datasets, which can be seen in Figure 5.2.

I Synthetic dataset: to perform a ground-truth evaluation of the non-rigid registration on objects free from noise and holes. This dataset contains models with 10k points;

II Real dataset with high precision and low noise: to evaluate the non-rigid registration on objects with a low level of noise. This dataset was used by Weise et al. and consists of a deforming hand with 80k points (Weise; Leibe; Gool, 2007). Although this is not the kind of data we will find in the markerless AR environment, it is a real dataset that is common in the literature. Therefore, it was used to compare our approach with ED using a common model;

III Real datasets with medium precision and noise: to evaluate the non-rigid registration on objects with noise and holes. The source and target surfaces were captured by our markerless AR environment. This scenario contains two different datasets: a user deforming his face by smiling (III1) and by inflating his cheeks (III2). These two scenarios have objects with 30k and 40k points respectively;
Figure 5.2 Datasets used for evaluation of the non-rigid registration algorithm, showing the source and target surfaces of each dataset. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III1 - Real dataset of a user smiling. III2 - Real dataset of a user inflating his cheeks.

In the tests comparing our approach with the ED algorithm, both were run with the same number of nodes in the first iteration. While the number of nodes did not change in the ED algorithm, in our approach it changed according to the error reduction. We have used $E_{tot}$ as the measure for refinement/collapse of nodes. The following evaluations compare three algorithms: ED implemented on the GPU (GPU-ED), our approach based only on adaptive refinement of nodes (AdNodes), in which all the points are selected as constraints, and our full approach based on adaptive refinement of nodes and constraints (AdNodes + AdCons).

5.2 ACCURACY EVALUATION

The final error distribution for the different datasets shown in Figure 5.2 can be seen in Figure 5.3 for our algorithm AdNodes + AdCons. For the synthetic dataset (at the top of the figure), the only deformation is a semi-sphere located at the center of the object. In this situation, our algorithm achieved a high accuracy of 1mm. For the hand surface in Figure 5.3, the deformation starts at the fingers, where the error is high. The algorithm reduced the average error to below 2mm. The surfaces at the bottom were captured with the Kinect. For the first surface, the user was asked to deform his face by smiling in front of the Kinect. Moreover, the user translated his face to move slightly away from the camera. Therefore, there is a high error in the model, as it was both deformed and rigidly translated. For the bottom surface, the user was asked to deform his face by inflating his cheeks.
Therefore, the main deformation error is present in the region of the cheeks. In both cases, the algorithm achieved an accuracy of 2.6mm.
Figure 5.3 The resulting color-coded error (scale from 0mm to 10mm) from the registration between source and target surfaces. For each dataset, the panels show the source surface, target surface, initial error and final error. In all situations the proposed algorithm AdNodes + AdCons obtained an averaged accuracy below 3mm and a standard deviation below 3.5mm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III1 - Real dataset of a user smiling. III2 - Real dataset of a user inflating his cheeks.

Figure 5.4 Accuracy (in mm) obtained by AdNodes and AdNodes + AdCons in comparison with the Embedded Deformation (ED) algorithm and the initial error for each one of the datasets used.

The improvement in accuracy of AdNodes + AdCons with respect to the ED algorithm can be seen in Figure 5.4. AdNodes + AdCons obtained better accuracy than ED because of the adaptive selection of nodes, which redistributes the nodes in the deformation space, increasing them in the regions where the residual error is high and decreasing them otherwise. To improve accuracy, one solution is to select more constraints to be used by the non-linear solver. Obviously, this decreases the performance of the algorithm. This situation can be seen in Figures 5.4 and 5.11 for the algorithm AdNodes.

Figure 5.5 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset II. Panels: target object, source object, and registered objects for Embedded Deformation with 7 nodes, Embedded Deformation with 33 nodes and adaptive node selection with 19 nodes (error scale from 0mm to 10mm).

Figure 5.6 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset III1. Panels: target object, source object, and registered objects for Embedded Deformation with 16 nodes, Embedded Deformation with 64 nodes and adaptive node selection with 20 nodes (error scale from 0mm to 10mm).

A visual comparison between AdNodes and ED can be seen in Figures 5.5 and 5.6. The accuracy obtained with adaptivity is comparable to that of the ED algorithm using double or triple the number of nodes. An accuracy evaluation with respect to AdCons can be seen in Table 5.1 and in Figures 5.7 and 5.8. By using adaptivity instead of uniform sampling with a fixed step size, non-rigid registration achieves results as accurate as the ones obtained by using all the points of the source object as constraints (i.e. step size 1), while remaining as fast as the approaches which achieve good performance and poor accuracy (i.e. step size 4 and 8). However, for the adaptivity to perform properly, we still must define a value for n of the n × n mask used to scan the 2D projection of $P_s$. Based on Table 5.1, step size 4 produces good results for uniform sampling with a fixed step size. Therefore, we use this step size for n in order to improve on the accuracy and performance of the fixed step size.

Figure 5.7 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset II. Panels: source surface, target surface, and registered surfaces for constraint sampling factors 1, 2, 4, 8, 16 and 32 and for adaptive constraint sampling.

An accuracy evaluation with respect to the number of nodes which influence the deformation (k) for a given point is illustrated in Figure 5.9. As stated in previous work (Sumner; Schmid; Pauly, 2007), k = 4 is a good option to solve the problem of deformation. Higher values for k may restrict the deformation space for G due to the oversampling of nodes influencing a specific region of $P_s$. An accuracy evaluation with respect to the influence of each quadtree level on AdNodes
can be seen in Figure 5.10.

Figure 5.8 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset III1. Panels: source surface, target surface, and registered surfaces for constraint sampling factors 1, 2, 4, 8, 16 and 32 and for adaptive constraint sampling.

Table 5.1 Number of constraints (C), accuracy (A, given in mm), standard deviation (SD, given in mm) and performance (P, given in FPS) results according to the step size (from 1 to 32) or sampling scheme (Adap for adaptive) used to select constraints for optimization. Columns: C, A, SD and P for each of the datasets I, II, III1 and III2. Rows: one per step size and one for Adap. [Most cell values are missing from this copy; the surviving entries are 10K constraints for step size 1 and 1.7K constraints for Adap, both in the first column.]

As the number of levels (l) increases, more nodes (maximum
$4^l$) are selected and the accuracy is improved. From the tests conducted, we need three levels for quadtree building and refinement to register two objects accurately.

Figure 5.9 Accuracy (in mm) related to the parameter k for each one of the datasets used.

Figure 5.10 Accuracy (in mm) obtained for each level of the quadtree and for each one of the datasets used. The maximum number of nodes for a level l is $4^l$.

5.3 PERFORMANCE EVALUATION

In terms of performance, a comparison between the algorithms can be seen in Figure 5.11. As the graph shows, AdNodes + AdCons does not run in full real time, but achieves 15 FPS in the real cases, half of the frame rate considered ideal for a real-time application in computer graphics. Nevertheless, it is up to three times faster than the ED algorithm.
Figure 5.11 Performance (in FPS) obtained by AdNodes and AdNodes + AdCons in comparison with the ED algorithm for each one of the datasets used.

The use of adaptivity for constraint selection greatly reduces the processing time originally demanded by the ED algorithm (Table 5.1, step size 1). Optimization is a common bottleneck in non-rigid registration algorithms (Sumner; Schmid; Pauly, 2007; Li et al., 2009). The number of constraints selected is directly related to the time required by the optimization phase. Therefore, by adaptively reducing the number of constraints used, we can achieve good performance even for the optimization phase. Moreover, as the error is minimized over the surface, the number of nodes in G is dynamically decreased. With fewer parameters to be estimated, the optimization algorithm converges faster. On dataset II, the performance of the ED algorithm is better than that of AdNodes. This can be explained by the number of nodes used. In this case, AdNodes did not change the initial number of nodes much. Thus, with almost the same number of nodes, ED is faster than the AdNodes approach because it neither builds nor refines the quadtree.

An analysis of the performance cost of each step of AdNodes + AdCons was also performed. The average processing time of each step over the four datasets was measured. The performance results can be seen in Figure 5.12. The step which takes the most time is the non-linear optimization algorithm, which requires 30ms (10ms per iteration). In fact, it consists of several steps: matrix-matrix and matrix-vector multiplication, computation of J, $LL^T$ decomposition and linear solving. Adaptive selection of nodes and constraints requires only 5ms. Therefore, the performance gain of our approach is explained by the reduced dimensionality of the optimization problem, which is directly related to the size of G and the number of constraints selected.
As J is a sparse matrix, one way to improve the performance of the matrix product would be to use a sparse matrix product on the GPU, for example from the CUSPARSE library (Nvidia, 2014). However, from the tests conducted, the level of sparsity in J is not
sufficiently high (< 90%), and the CUBLAS-based dense matrix product ran faster than the CUSPARSE-based matrix product.

Figure 5.12 Performance (in ms) obtained by our approach for each one of the most computationally expensive methods. MM - Matrix multiplication ($A = J^T J$); Jacobian - computation of J; Cholesky - $LL^T$ decomposition; Solver - linear solver Strsm from the CUBLAS library; ACS - Adaptive Constraint Selection; ANS - Adaptive Node Selection; Weights - computation of the influence of G on $P_s$; MV - Matrix-vector multiplication ($b = J^T r$).

5.4 DISCUSSION

Based on Table 5.1 and Figure 5.11, we can verify that our algorithm is up to three times faster and about 1.5 to 2 times more accurate than the traditional ED algorithm implemented on the GPU. Adaptive node and constraint selection has proven useful in this context, improving the performance of the original ED by a factor of 2 to 6 while keeping the registration accurate.

Also based on Table 5.1, we highlight that our algorithm achieved promising results on real noisy datasets. Performance is improved 2 to 3 times over the scenario where all points are selected as constraints, while there is minimal (for dataset III1) or no (for dataset III2) loss in accuracy. The stability of the algorithm is reinforced by the low standard deviation measured in comparison to the other scenarios evaluated.

The focus of our approach is to add non-rigid tracking support to a MAR environment. Taking advantage of this scenario, where we have temporal/spatial coherence and the deformation is expected to be small between consecutive frames, we use a simple projection algorithm to find correspondences. This matching algorithm does not affect our results, since we want to ensure that the algorithm minimizes the deformation between consecutive frames, which we assume will be predominantly small for every input frame.
Moreover, to boost the application's performance and achieve full real-time performance, the algorithm does not need to be applied for every frame when the current error is sufficiently
low.

5.5 SUMMARY

In this chapter we have evaluated the non-rigid registration algorithm and compared it against related work. Four different datasets were used and, from the tests performed, we have shown that the proposed adaptive non-rigid registration outperforms existing methods in terms of accuracy and performance.
Chapter 6

In this chapter, non-rigid tracking is evaluated in the context of the markerless augmented reality environment in terms of accuracy, performance and tracking robustness for several datasets.

NON-RIGID SUPPORT EVALUATION FOR A MARKERLESS AUGMENTED REALITY ENVIRONMENT

In this section we describe the datasets used and analyse the accuracy, performance and tracking robustness of the proposed algorithm in the markerless augmented reality environment.

6.1 METHODOLOGY

The same hardware described in Chapter 5, Section 5.1 is used in the following tests. To test our algorithm in the MAR environment, we used a scenario where the user's head is the natural marker. As simple non-rigid interactions, we asked the user to perform three different facial expressions after 3D reconstruction: inflating his cheeks, smiling and simulating a kiss, as shown in Figure 6.1.

Moreover, to evaluate the proposed environment in challenging deformation scenarios, we have tested the algorithm with different objects and under different deformation conditions. First, we have tested the same expressions with a different user (Figure 6.2) to evaluate the proposed approach on different faces. We use the terms Cheeks2, Smile2 and Kiss2 to denote these expressions and differentiate them from the ones in Figure 6.1. Next, we have tested different deformations: open mouth and angry facial expressions, and a deformation applied to a bag, as can be seen in Figure 6.3. Compared to the scenarios presented in Figure 6.1, these deformations present additional challenges for the non-rigid registration algorithm:

• The open mouth expression poses a challenging scenario for the matching of points, because the corresponding points become too distant during the deformation. Also, there is a big hole in the deformed model, which makes the matching process even more difficult;
• The angry expression has a higher tracking error than the open mouth expression, but it introduces fewer holes in the deformed model. In this case, the user not only performed the facial expression, but also rigidly rotated his head in front of the sensor. Therefore, the environment must deal not only with the rigid tracking required to solve the rigid motion, but also with the non-rigid tracking required to solve the user's non-rigid facial expression;

• The deformed bag presents a high error and is an object different from a face. Therefore, this dataset is fundamental to evaluate the robustness of the algorithm for distinct objects;

Figure 6.1 Neutral and deformed reference models based on the user's facial expression. Panels: as-rigid-as-possible expression, cheeks inflated, smile and kiss.

6.2 EVALUATION

In this section, the deformation scenarios presented in Figures 6.1, 6.2 and 6.3 are evaluated in the context of the MAR environment. As explained in Chapter 4, we need to update the 3D reference model to minimize the use of the non-rigid registration algorithm. To accomplish that, one solution is to resend the deformed 3D reference model into the grid with a high weight. As explained in Section 3.2, the KinectFusion algorithm integrates raw depth data into a grid based on the TSDF computation and a weight that indicates uncertainty. The higher the weight, the faster the shape of the 3D reference model is updated based on the current measurement. Therefore, to accommodate the current deformation and to stabilize the tracking faster, a high weight must be used to update the 3D reference model. We have tested the influence of this update on the tracking accuracy. This test can be seen in Table 6.1. While weight 1 does not result in a fast update of the 3D reference model shape, stabilization in terms of accuracy is achieved with a weight between 8 and 16.
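The effect of the integration weight can be illustrated with the weighted running average commonly used for TSDF fusion. The Python sketch below assumes this standard per-voxel update; the function name `integrate_tsdf`, the weight cap `max_weight` and the numbers in the demo are hypothetical and are not taken from the implementation evaluated here.

```python
def integrate_tsdf(tsdf, weight, new_tsdf, new_weight, max_weight=128):
    """Fuse one new TSDF sample into a voxel; returns (tsdf, weight).

    Assumed update rule: weighted running average, with the accumulated
    weight capped so that the voxel can still adapt later on.
    """
    fused = (tsdf * weight + new_tsdf * new_weight) / (weight + new_weight)
    return fused, min(weight + new_weight, max_weight)


if __name__ == "__main__":
    # A voxel that has accumulated weight 16 at value 0.0 receives a
    # deformed-surface sample of value 1.0 with different weights.
    for w in (1, 8, 16):
        value, _ = integrate_tsdf(0.0, 16, 1.0, w)
        print(w, round(value, 3))
```

With a weight of 1 the stored value barely moves towards the new sample, whereas weights of 8 or 16 pull it a third or a half of the way in a single update, which matches the qualitative behaviour described above.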
We have used weight 8 for all the other tests performed in this section because it provides more stable results than
weight 16 (see the standard deviation measurements in Table 6.1). The exception occurs in the scenario where the accumulated error is too high (in our tests, the bag deformation). In this case, a weight higher than 8 is required to minimize the estimated error.

Figure 6.2 Neutral and deformed reference models for a different user. Panels: as-rigid-as-possible expression, cheeks inflated, smile and kiss.

From the tests conducted on all the cases mentioned at the beginning of this section in which the 3D reference model is a face, we estimated an average accuracy of 1.5mm for rigid tracking during the rigid reconstruction of the 3D reference model. For the bag, the average accuracy was 2mm for the same step. As can be seen in Table 6.2, when non-rigid user interaction is present, the average accuracy decreases for rigid tracking. We have tested different scenarios for non-rigid registration in order to find the multi-frame strategy that best balances accuracy and performance. While skipping a specific number of frames (i.e. NR4, NR8) is a good strategy, applying non-rigid registration for almost every frame (i.e. NR1, NR2) reduces performance and is sometimes unnecessary. Likewise, applying it only once over a large number of frames (i.e. NR16, NR32, NR64) slightly improves the average tracking accuracy, while the application's performance remains almost the same as that of the rigid solution. However, if a high deformation occurs in between these frames, the tracking will fail (i.e. the error measured will be above the predefined threshold used to detect rigid tracking failure). Applying non-rigid registration whenever the rigid tracking fails (i.e. NRAdaptive) is a good way to handle every deformation which occurs between frames, while maintaining good accuracy even for the bag scenario, considering the relative error reduction when compared to the rigid solution.
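The difference between the fixed-interval strategies and the adaptive one can be summarised by the per-frame decision below. This is a simplified Python sketch: the function `should_run_nonrigid`, its parameters and the error sequence in the demo are hypothetical, the interpretation of NRn as "every frame whose index is a multiple of n" is an assumption, and the sketch ignores the fact that applying non-rigid registration changes the error of the following frames.

```python
def should_run_nonrigid(frame_index, rigid_error_mm, every_n=None, threshold_mm=None):
    """Decide whether non-rigid registration runs on this frame.

    threshold_mm set -> adaptive strategy: run only when rigid tracking fails,
                        i.e. when its error exceeds the threshold.
    every_n set      -> fixed-interval strategy: run every n frames,
                        regardless of the rigid tracking error.
    neither set      -> rigid-only tracking.
    """
    if threshold_mm is not None:
        return rigid_error_mm > threshold_mm
    if every_n is not None:
        return frame_index % every_n == 0
    return False


if __name__ == "__main__":
    errors = [1.2, 1.4, 2.6, 3.1, 1.6, 1.5, 2.8, 1.3]   # made-up rigid errors (mm)
    adaptive = [i for i, e in enumerate(errors)
                if should_run_nonrigid(i, e, threshold_mm=2.0)]
    fixed = [i for i, e in enumerate(errors)
             if should_run_nonrigid(i, e, every_n=4)]
    print(adaptive, fixed)
```

In this made-up sequence the fixed-interval policy runs on frames where the rigid error is already low and misses the frames where it is high, while the adaptive policy runs exactly on the frames that exceed the threshold.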
When the 3D reference model is continuously updated in a case where there is only a small region of deformation, it becomes increasingly smooth with each frame. In this case, this solution may not be the most accurate, as the 3D reference model loses information in regions where there is no deformation. In Table 6.2, we can see this scenario in the tests conducted on the kiss, cheeks2 and open mouth expressions, where
non-rigid registration applied every 1 or 2 frames did not produce the best results. This issue can be minimized by using the adaptive approach.

Figure 6.3 Neutral and deformed reference models based on challenging deformation scenarios. Panels: rigid object, open mouth, angry and deformed bag.

Figure 6.4 Cheeks tracking error (in mm, per frame) measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Tracking error evolution can be seen in Figures 6.4, 6.6, 6.8, 6.10, 6.12, 6.14, 6.16, 6.18 and 6.20. When there is sufficient non-rigid user interaction, the error grows considerably and the non-rigid solution minimizes it. The 3D reference model is updated to stabilize the tracking based on the current deformation. Non-rigid registration and 3D reference model updating are done only when the deformation changes in intensity (i.e. error above the threshold, shown as a dashed line) and the rigid tracking fails.

A test to determine the best threshold to detect rigid tracking failure was performed for the simple deformation scenarios shown in Figures 6.1 and 6.2, and the results can be seen in Table 6.3. As mentioned before, rigid tracking has an average accuracy of 1.5mm. Therefore, by using this value as threshold, the algorithm applies non-rigid tracking for almost every new frame. In the opposite case, by using a threshold of 3mm, the algorithm
67 6.2 EVALUATION 47 User Deformation Cheeks Smile Kiss TSDF Weight A SD A SD A SD User Deformation Cheeks2 Smile2 Kiss2 TSDF Weight A SD A SD A SD User Deformation Open Mouth Angry Bag TSDF Weight A SD A SD A SD Table 6.1 Average accuracy (A, given in mm) and Standard Deviation (SD, given in mm) results according to the weight used to update the 3D reference model. uses almost rigid tracking only. In this sense, the best threshold is between 2mm and 2.5mm, which provides fast and accurate tracking. For the challenging deformation scenarios shown in Figure 6.3, tracking error coming from deformation is too high when compared to the simple scenarios. Therefore, by using the threshold of 2mm, the nonrigid registration algorithm would be applied for almost every input frame. In this sense, for each one of the datasets, we have chosen an appropriate threshold to validate our approach. From the tests conducted, 2mm for open mouth expression, 3mm for angry expression and 7mm for bag deformation allowed our approach to achieve the best results. In terms of visual quality and accuracy, from Figures 6.5, 6.7, 6.9, 6.11, 6.13, 6.15, 6.17, 6.19 and 6.21, it is visible that the algorithm captures the main deformation present on the deformed expressions through the sequence of frames, improving accuracy in regions where only rigid registration cannot solve the tracking. In this context, our main contribution is that the nonrigid registration algorithm runs in realtime, allowing its application for an AR environment. As a preprocessing step, 3D reference model reconstruction is performed at 30 frames per second (FPS). When applied, nonrigid registration requires 60ms per frame. The step which takes most time to be completed for every frame is the nonlinear optimization, which demands on average 45ms per frame.
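The timing figures above suggest a rough way to reason about the frame rate of a multi-frame schedule. The sketch below is a back-of-the-envelope estimate, not a measurement from the thesis. It assumes (hypothetically) that a rigid-only frame costs about 1000/30 ms, borrowing the 30 FPS reconstruction rate, and that the 60 ms of non-rigid registration is added on top of that cost whenever it is applied. The function name `average_fps` is likewise illustrative.

```python
def average_fps(rigid_ms, nonrigid_ms, frames_per_application):
    """Estimate the mean frame rate when non-rigid registration is applied
    once every `frames_per_application` frames on average.

    Assumption: the non-rigid cost is added to the rigid cost of the frame
    in which it runs, so its amortised cost is nonrigid_ms / k per frame.
    """
    if frames_per_application <= 0:
        raise ValueError("frames_per_application must be positive")
    mean_frame_ms = rigid_ms + nonrigid_ms / frames_per_application
    return 1000.0 / mean_frame_ms


if __name__ == "__main__":
    RIGID_MS = 1000.0 / 30.0  # assumed rigid-only frame time (hypothetical)
    NONRIGID_MS = 60.0        # non-rigid cost per applied frame, from the text
    for k in (1, 2, 8, 16):
        print(f"every {k} frames: {average_fps(RIGID_MS, NONRIGID_MS, k):.1f} FPS")
```

Under these assumptions the amortised cost falls quickly as the interval between applications grows, which is why applying the algorithm only when rigid tracking fails can keep the average frame rate high.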
Table 6.2 Average accuracy (Avg., given in mm), Standard Deviation (Std. Dev., given in mm) and Performance (Perf., given in FPS) results for each of the tracking algorithms tested in the presence of specific user deformation. NRn: Non-Rigid Registration applied every n frames (independently of rigid tracking failure); NRAdaptive: Non-Rigid Registration applied whenever the rigid algorithm fails. Layout: one row per tracking algorithm (Rigid, the NRn variants and NRAdaptive); columns Avg., Std. Dev. and Perf. for each user deformation (Cheeks, Smile, Kiss; Cheeks2, Smile2, Kiss2; Open Mouth, Angry, Bag).
Figure 6.5 Color-coded cheeks tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 8, 20, 50, 80, 101 and 143; color scale from 0 mm to 10 mm.

Figure 6.6 Cheeks2 tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

Table 6.3 Average accuracy (Avg., given in mm), Standard Deviation (Std. Dev., given in mm) and Performance (Perf., given in FPS) results for each of the thresholds used to detect rigid tracking failure. Layout: one row per threshold; columns Avg., Std. Dev. and Perf. for each user deformation (Cheeks, Smile, Kiss; Cheeks2, Smile2, Kiss2).

From Table 6.2, it is visible that the NRAdaptive approach allows real-time performance (above 20 FPS) for almost all the deformations. The exception is the bag, an object with many more points sent for optimization than the human head, so non-rigid registration runs slower for this scenario. It is worth mentioning that, in this case, the algorithm is not applied for almost every frame, because the 3D reference model is updated based on the current deformation, which reduces the chances of rigid tracking failure in the next iterations, as can be seen in the plots of Figures 6.4, 6.6, 6.8, 6.10, 6.12, 6.14, 6.16, 6.18 and 6.20. The algorithm is applied 21 times (roughly once every 8 frames) for the cheeks deformation, 29 times (every 8 frames) for cheeks2, 66 times (every 2.5 frames) for smile, 8 times (every 15 frames) for smile2, 16 times (every 10 frames) for kiss, 25 times (every 6 frames) for kiss2, 73 times (every 2.2 frames) for open mouth, 71 times (every 2.2 frames) for angry and 80 times (every 1.5 frames) for the bag deformation.

Figure 6.7 Color-coded cheeks2 tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 20, 32, 60, 100, 160 and 192; color scale from 0 mm to 10 mm.

Figure 6.8 Smile tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

A limitation of this adaptive algorithm is that it does not track non-rigid motions in which the 2D projections of the corresponding parts of the object are not near each other. An example of this situation can be seen in Figure 6.22. Looking at the 2D position of the arms, if they undergo large motion between sequential frames (Figures 6.22A and C), the projective data association matching algorithm will not match them, because their corresponding pixels are not close enough. In this case, the 3D rigid reference model is reconstructed from the user's body (Figure 6.22B). As the user moves his arms in front of the sensor (Figure 6.22C), the movement cannot be tracked properly because of the projective data association algorithm, and the whole trajectory of the movement performed by the user is integrated into the 3D reference model (Figure 6.22D).

Figure 6.9 Color-coded smile tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 14, 22, 50, 94, 134 and 154; color scale from 0 mm to 10 mm.

Figure 6.10 Smile2 tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

As stated before, our multi-frame adaptive non-rigid registration solution integrates the current depth data into the 3D reference model when deformation occurs. Therefore, when the algorithm cannot register the object's movement, its residual error is integrated into the reference model through the update of the TSDF representation, which averages the current 3D reference model implicitly stored on the grid and the current depth data captured by the sensor. Because the topology of the 3D reference model is not updated, the genus (i.e. hole) present in between the body and the arms, when the arms open, is not transferred to the 3D reference model. Even in this case, the adaptive approach produces better results than those obtained with rigid registration only (Figures 6.23 and 6.24).
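The averaging behaviour described above can be sketched with the weighted running average commonly used for TSDF fusion in KinectFusion-style systems. This is an illustrative sketch rather than the thesis code: the function `update_tsdf` and its parameters are hypothetical, and how the TSDF weight reported in Table 6.1 maps onto them is not specified in this section. Here `new_weight` is assumed to be the weight given to the incoming depth data, and voxels not observed in the current frame are marked with NaN.

```python
import numpy as np


def update_tsdf(tsdf, weights, new_tsdf, new_weight=1.0, max_weight=16.0):
    """Fuse a new TSDF observation into the stored grid by weighted averaging.

    tsdf, weights: stored signed distances and accumulated weights
                   (tsdf is assumed to hold finite values everywhere).
    new_tsdf:      signed distances from the current depth frame, NaN where
                   a voxel was not observed.
    new_weight:    weight of the incoming data (assumed meaning).
    max_weight:    cap on the accumulated weight, so old data can be replaced.
    """
    observed = ~np.isnan(new_tsdf)
    w_new = np.where(observed, new_weight, 0.0)
    obs = np.where(observed, new_tsdf, 0.0)

    total = weights + w_new
    safe_total = np.where(total > 0, total, 1.0)  # avoid division by zero
    fused = np.where(total > 0, (weights * tsdf + w_new * obs) / safe_total, tsdf)
    return fused, np.minimum(total, max_weight)


if __name__ == "__main__":
    stored = np.array([0.0, 1.0])
    stored_w = np.array([1.0, 1.0])
    frame = np.array([1.0, np.nan])  # second voxel not observed this frame
    print(update_tsdf(stored, stored_w, frame, new_weight=1.0))
    print(update_tsdf(stored, stored_w, frame, new_weight=3.0))
```

Because every observed voxel is blended toward the new measurement, any residual that registration fails to explain is averaged into the stored surface, and a larger weight on the new data makes the model follow the current frame more quickly. Nothing in this update changes the grid's connectivity by itself, which is consistent with the topology limitation noted above.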
Figure 6.11 Color-coded smile2 tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 30, 40, 60, 80, 100 and 112; color scale from 0 mm to 10 mm.

Figure 6.12 Kiss tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

6.3 SUMMARY

In this chapter we have evaluated the multi-frame adaptive non-rigid registration algorithm in a MAR environment. To validate our approach, tests were conducted using mainly the user's face as a natural marker and the user's facial expressions as non-rigid interactions. From these tests, we have shown that non-rigid registration, applied in a multi-frame manner, is capable of running in real time on consumer hardware. Moreover, it improves the tracking accuracy of the MAR environment when compared to the rigid-only solution or to other real-time non-rigid registration techniques, such as the ED algorithm.
Figure 6.13 Color-coded kiss tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 18, 44, 82, 106, 134 and 150; color scale from 0 mm to 10 mm.

Figure 6.14 Kiss2 tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

Figure 6.15 Color-coded kiss2 tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 15, 30, 60, 75, 90 and 102; color scale from 0 mm to 10 mm.

Figure 6.16 Open Mouth tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

Figure 6.17 Color-coded open mouth tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 20, 40, 60, 100, 120 and 140; color scale from 0 mm to 10 mm.

Figure 6.18 Angry tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

Figure 6.19 Color-coded angry tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 12, 24, 40, 80, 120 and 140; color scale from 0 mm to 10 mm.

Figure 6.20 Bag tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

Figure 6.21 Color-coded bag tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 42, 54, 62, 82, 94 and 110; color scale from 0 mm to 10 mm.

Figure 6.22 Limitation of the proposed method. The user's body (A) is reconstructed (B) and the algorithm cannot track the user's arms (C), integrating all the movement into the 3D reference model (D).

Figure 6.23 Body tracking error (mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red: rigid tracking. Plot in blue: non-rigid adaptive tracking. Dashed line: threshold.

Figure 6.24 Color-coded body tracking error measured for both rigid and non-rigid solutions. Rows: rigid tracking, non-rigid tracking; columns: frames 50, 150, 300 and 500; color scale from 0 mm to 10 mm.
More informationThreeDimensional Sensors Lecture 6: PointCloud Registration
ThreeDimensional Sensors Lecture 6: PointCloud Registration Radu Horaud INRIA Grenoble RhoneAlpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/ PointCloud Registration Methods Fuse data
More informationCS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching
Stereo Matching Fundamental matrix Let p be a point in left image, p in right image l l Epipolar relation p maps to epipolar line l p maps to epipolar line l p p Epipolar mapping described by a 3x3 matrix
More informationVisual Recognition: Image Formation
Visual Recognition: Image Formation Raquel Urtasun TTI Chicago Jan 5, 2012 Raquel Urtasun (TTIC) Visual Recognition Jan 5, 2012 1 / 61 Today s lecture... Fundamentals of image formation You should know
More informationColour Segmentationbased Computation of Dense Optical Flow with Application to Video Object Segmentation
ÖGAI Journal 24/1 11 Colour Segmentationbased Computation of Dense Optical Flow with Application to Video Object Segmentation Michael Bleyer, Margrit Gelautz, Christoph Rhemann Vienna University of Technology
More information2D vs. 3D Deformable Face Models: Representational Power, Construction, and RealTime Fitting
2D vs. 3D Deformable Face Models: Representational Power, Construction, and RealTime Fitting Iain Matthews, Jing Xiao, and Simon Baker The Robotics Institute, Carnegie Mellon University Epsom PAL, Epsom
More informationCS 664 Segmentation. Daniel Huttenlocher
CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical
More informationOutdoor Scene Reconstruction from Multiple Image Sequences Captured by a Handheld Video Camera
Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Handheld Video Camera Tomokazu Sato, Masayuki Kanbara and Naokazu Yokoya Graduate School of Information Science, Nara Institute
More informationFacial Expression Recognition Using Nonnegative Matrix Factorization
Facial Expression Recognition Using Nonnegative Matrix Factorization Symeon Nikitidis, Anastasios Tefas and Ioannis Pitas Artificial Intelligence & Information Analysis Lab Department of Informatics Aristotle,
More informationMultimedia Technology CHAPTER 4. Video and Animation
CHAPTER 4 Video and Animation  Both video and animation give us a sense of motion. They exploit some properties of human eye s ability of viewing pictures.  Motion video is the element of multimedia
More informationAUTOMATED 4 AXIS ADAYfIVE SCANNING WITH THE DIGIBOTICS LASER DIGITIZER
AUTOMATED 4 AXIS ADAYfIVE SCANNING WITH THE DIGIBOTICS LASER DIGITIZER INTRODUCTION The DIGIBOT 3D Laser Digitizer is a high performance 3D input device which combines laser ranging technology, personal
More informationStructure from Motion. Prof. Marco Marcon
Structure from Motion Prof. Marco Marcon Summingup 2 Stereo is the most powerful clue for determining the structure of a scene Another important clue is the relative motion between the scene and (mono)
More informationBuilding a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882
Matching features Building a Panorama Computational Photography, 6.88 Prof. Bill Freeman April 11, 006 Image and shape descriptors: Harris corner detectors and SIFT features. Suggested readings: Mikolajczyk
More informationFast and robust techniques for 3D/2D registration and photo blending on massive point clouds
www.crs4.it/vic/ vcg.isti.cnr.it/ Fast and robust techniques for 3D/2D registration and photo blending on massive point clouds R. Pintus, E. Gobbetti, M.Agus, R. Combet CRS4 Visual Computing M. Callieri
More informationMobile Human Detection Systems based on Sliding Windows ApproachA Review
Mobile Human Detection Systems based on Sliding Windows ApproachA Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg
More informationModelBased Human Motion Capture from Monocular Video Sequences
ModelBased Human Motion Capture from Monocular Video Sequences Jihun Park 1, Sangho Park 2, and J.K. Aggarwal 2 1 Department of Computer Engineering Hongik University Seoul, Korea jhpark@hongik.ac.kr
More informationAdaptive MultiStage 2D Image Motion Field Estimation
Adaptive MultiStage 2D Image Motion Field Estimation Ulrich Neumann and Suya You Computer Science Department Integrated Media Systems Center University of Southern California, CA 900890781 ABSRAC his
More informationAdvanced Computer Graphics
G22.2274 001, Fall 2009 Advanced Computer Graphics Project details and tools 1 Project Topics Computer Animation Geometric Modeling Computational Photography Image processing 2 Optimization All projects
More information