Universidade Federal da Bahia
Universidade Salvador
Universidade Estadual de Feira de Santana

TESE DE DOUTORADO


Universidade Federal da Bahia
Universidade Salvador
Universidade Estadual de Feira de Santana

TESE DE DOUTORADO

An Adaptive Approach to Real-Time 3D Non-Rigid Registration

Antonio Carlos dos Santos Souza

Programa Multiinstitucional de Pós-Graduação em Ciência da Computação (PMCC)

Salvador, 19 de Dezembro de 2014
PMCC-Dsc-0000


ANTONIO CARLOS DOS SANTOS SOUZA

AN ADAPTIVE APPROACH TO REAL-TIME 3D NON-RIGID REGISTRATION

Tese apresentada ao Programa Multiinstitucional de Pós-Graduação em Ciência da Computação da Universidade Federal da Bahia, Universidade Estadual de Feira de Santana e Universidade Salvador, como requisito parcial para obtenção do grau de Doutor em Ciência da Computação.

Orientador: Antônio Lopes Apolinário Júnior

Salvador
19 de Dezembro de 2014

Ficha catalográfica

Souza, Antonio Carlos dos Santos
  An Adaptive Approach to Real-Time 3D Non-Rigid Registration / Antonio Carlos dos Santos Souza. Salvador, 2014.
  p.: il.
  Orientador: Antônio Lopes Apolinário Júnior.
  Tese (doutorado), Universidade Federal da Bahia, Instituto de Matemática, 2014.
  1. Alinhamento não-rígido. 2. Algoritmos Adaptativos. 3. Realidade Aumentada. I. Apolinario, Antônio Lopes. II. Universidade Federal da Bahia. Instituto de Matemática. III. Título.
  CDD 20.ed.

TERMO DE APROVAÇÃO

ANTONIO CARLOS DOS SANTOS SOUZA

AN ADAPTIVE APPROACH TO REAL-TIME 3D NON-RIGID REGISTRATION

Esta tese foi julgada adequada à obtenção do título de Doutor em Ciência da Computação e aprovada em sua forma final pelo Programa Multiinstitucional de Pós-Graduação em Ciência da Computação da UFBA-UEFS-UNIFACS.

Salvador, 19 de Dezembro de 2014

Prof. Dr. Antônio Lopes Apolinário Júnior (Universidade Federal da Bahia)
Prof. Dr. Vinicius Moreira Mello (Universidade Federal da Bahia)
Prof. Dr. Thales Miranda de Almeida Vieira (Universidade Federal de Alagoas)
Prof. Dr. Ricardo Farias (Universidade Federal do Rio de Janeiro)
Prof. Dr. Luiz Marcos Garcia Gonçalves (Universidade Federal do Rio Grande do Norte)


ACKNOWLEDGEMENTS

First, I would like to thank God for all the blessings given during my journey. This work was made possible by the enthusiastic support, suggestions, encouragement, and guidance of many individuals. I am greatly indebted to my academic advisor, chair and director of this work, Prof. Dr. Antonio Lopes Apolinário Jr, for instilling in me the joy of conducting outstanding research in computer graphics. These six years have been intense in my career. Your support and vigilance have allowed me to achieve results that I couldn't have thought of. Thank you so much to the committee for the direction, feedback, and all the enlightening advice. Thank you Prof. Dr. Gilson Giraldi, Prof. Dr. Vinícius Mello and Prof. Dr. Perfilino Ferreira. Thank you Prof. Dra. Lynn Alves for the unforgettable times during my master's degree. Furthermore, I would like to acknowledge my friend Márcio Cerqueira de Farias Macedo for the great partnership and his awesome markerless augmented reality environment for on-patient medical data visualization. Working with you has been a wonderful experience and a great source of inspiration. I really wonder how my thesis would be without this environment. Many individuals also provided support in myriad ways. Special thanks go to Aline Machado, Sabrina, Osmar, Rosalba, Thalles Caribé, Prof. Dr. Eduardo Telmo, Prof. Dr. Lurimar, Prof. André, Prof. Jowaner, Prof. Cesar, Lilia, Prof. Dr. Marcelo Veras, Prof. Dr. Jairo Dantas, Bruno, Leo, Everton, Toninho, Dona Vilma, Marilene (Fortona), Dona Mary, Sr. Deja, Cita, Fabiana, Carol, Janio, Eliakin, Igor, Rodrigo, Jony, Katia, Anderson, Leandro, Edilson and Rita. I am also much indebted for the insightful discussions and fun times with all the Labrasoft friends: Luiz Cláudio Machado, Valentim, Romilson, Ronaldo, Antonio Maurício, Simone, Josildo, Marcelo, Felipe, Amilton, Vanessa, Diego, Letícia, Luiz Henrique, Pedro, Jorge, Fabiano and Aderbal.
Finally, I would like to acknowledge my family for their constant support and encouragement during this graduate journey: Antonio Porfírio, Reinaldo, Ricardo, Maisa, Tissi, Lucia, Danilo, my uncle Zé Ribeiro, Manoel, Walter and Carlito, my aunt Belita, Esmera, Judith and Decinha, my cousins Fatim, Alva and Bel, Alan, Zeo, Caio, Vivian, Manoel, Jorge, Nandi, Renã, Luis and my family away from home Chicão, Rosinha, Juca, Leo, Mito and Kelly. I wish to thank Aline Requião for everything. Aline, I appreciate your motherly love. I dedicate this work to the loving memory of my mother Antonia, to my son Arthur and to my nieces Bruna and Eduarda.


RESUMO

Alinhamento não-rígido 3D é fundamental para o rastreamento e/ou reconstrução de modelos tridimensionais deformáveis. Contudo, a maioria dos algoritmos de alinhamento não-rígido não são tão rápidos quanto aqueles desenvolvidos no campo do alinhamento rígido. Métodos rápidos para alinhamento não-rígido 3D são particularmente interessantes para aplicações de realidade aumentada sem marcadores, em que um objeto sendo utilizado como marcador natural pode sofrer deformações ao longo do tempo de execução da aplicação. Nesta tese é apresentado um algoritmo adaptativo multi-frame implementado em GPU para o alinhamento não-rígido de modelos tridimensionais deformáveis capturados por uma câmera RGB-D. Abordagens adaptativas tendem a otimizar algoritmos, concentrando esforços nos locais mais relevantes, causando um efeito global de melhoria da solução. O método proposto utiliza adaptatividade em três passos do algoritmo. Primeiro, para guiar a distribuição de regiões de influência com base na intensidade de deformação calculada sobre o objeto. Segundo, durante a seleção de restrições, em que a amostragem feita sobre o objeto para a fase de otimização é baseada na deformação atual medida. Terceiro, para aplicar o algoritmo em um esquema multi-frame apenas quando o erro do rastreamento rígido ultrapassar um certo limiar, indicando que uma transformação rígida já não produz um alinhamento satisfatório. A partir do uso da adaptatividade e do paralelismo da implementação em GPU, foram obtidos resultados que demonstram que o método proposto é capaz de executar em tempo real com uma abordagem tão precisa quanto aquelas existentes na literatura.

Palavras-chave: Alinhamento não-rígido, Algoritmos Adaptativos, Realidade Aumentada.


ABSTRACT

3D non-rigid registration is fundamental for the tracking or reconstruction of 3D deformable shapes. However, the majority of non-rigid registration methods are not as fast as the ones developed in the field of rigid registration. Fast methods for 3D non-rigid registration are particularly interesting for markerless augmented reality applications, in which the object being used as a natural marker may undergo non-rigid user interaction. Here, we present a multi-frame adaptive algorithm for 3D non-rigid registration, implemented on the GPU, where the 3D data is captured from an RGB-D camera. In general, adaptive algorithms optimize the solution by focusing on the more relevant aspects of the problem, causing a global improvement in the final solution. Our approach uses adaptivity in three stages of the process. First, to guide the distribution of regions of influence based on the deformation intensity on some region of the shape. Second, during the selection of constraints, where the sampling done over the object for the optimization is based on the current deformation. Third, to apply the algorithm in a multi-frame manner only when the rigid tracking error is above a pre-defined threshold, indicating that a rigid transformation can no longer produce a satisfactory alignment. Taking advantage of this adaptivity and of the parallelism of the GPU, the results obtained show that the proposed algorithm is capable of achieving real-time performance while being as accurate as the approaches proposed in the literature.

Keywords: Non-Rigid Registration, Adaptive Algorithms, Augmented Reality.


CONTENTS

Chapter 1  Introduction
  1.1 Hypothesis
  1.2 Contributions
  1.3 Organization

Chapter 2  Fundamentals and Related Work
  2.1 Augmented Reality
  2.2 3D Registration
    2.2.1 Rigid Registration
    2.2.2 Non-Rigid Registration
  2.3 Summary

Chapter 3  Markerless Augmented Reality Environment
  3.1 Surface Acquisition
  3.2 3D Reference Model Reconstruction
  3.3 Tracking
  3.4 Summary

Chapter 4  GPU-Based Adaptive Non-Rigid Registration
  4.1 Deformation Model
  4.2 Matching of Points
  4.3 Selection of Nodes
  4.4 Weighting the Influence of Nodes
  4.5 Selection of Constraints
  4.6 Error Minimization
  4.7 Updating the Source Object
  4.8 Multi-Frame Non-Rigid Tracking
  4.9 Summary

Chapter 5  Non-Rigid Registration Evaluation
  5.1 Methodology
  5.2 Accuracy Evaluation
  5.3 Performance Evaluation
  5.4 Discussion
  5.5 Summary

Chapter 6  Non-Rigid Support Evaluation for a Markerless Augmented Reality Environment
  6.1 Methodology
  6.2 Evaluation
  6.3 Summary

Chapter 7  Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Directions

LIST OF FIGURES

2.1 Reality-Virtuality Continuum (Milgram; Kishino, 1994)
2.2 Marker-based (left image) and markerless (right images) augmented reality. Left image is courtesy of the ARToolKit library (Kato; Billinghurst, 1999) and right images are courtesy of KinectFusion (Izadi et al., 2011)
3.1 Overview of the proposed approach from 3D reference model reconstruction to tracking solution. Adapted from (Souza; Macedo; Apolinario, 2014)
3.2 Overview of KinectFusion's pipeline (Izadi et al., 2011)
3.3 Left image: the user translated his face fast. A small number of points were at the same image coordinate and the ICP failed. Right image: by using the pose estimation algorithm, the problem can be solved (Macedo; Apolinario; Souza, 2013)
4.1 Overview of the proposed approach from the depth map acquisition to the final non-rigid aligned surface
4.2 Building of the deformation graph (right) over the source object (left) based on the residual error measured (center)
4.3 Refinement of the deformation graph (right) over the cheeks region of the source object (left) based on the residual error measured (center)
4.4 Collapsing of the deformation graph (right) over the cheeks region of the source object (left) after updating the residual error (center)
4.5 Constraint selection based on the initial non-rigid error between source and target surfaces
5.1 Overview of the libraries used for each step of our approach
5.2 Datasets used for evaluation of the non-rigid registration algorithm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks
5.3 The resulting color-coded error from the registration between source and target surfaces. In all situations the proposed algorithm AdNodes + AdCons obtained an average accuracy below 3 mm and standard deviation below 3.5 mm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks
5.4 Accuracy (in mm) obtained by AdNodes and AdNodes + AdCons in comparison with the Embedded Deformation (ED) algorithm and the initial error for each one of the datasets used
5.5 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset II
5.6 Accuracy comparison between the ED algorithm and our adaptive approach with respect to node selection for dataset III
5.7 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset II
5.8 Accuracy comparison between different sampling schemes used to select constraints for optimization for dataset III
5.9 Accuracy (in mm) related to the parameter k for each one of the datasets used
5.10 Accuracy (in mm) obtained for each level of the quadtree and for each one of the datasets used. The maximum number of nodes for a level l is 4^l
5.11 Performance (in FPS) obtained by AdNodes and AdNodes + AdCons in comparison with the ED algorithm for each one of the datasets used
5.12 Performance (in ms) obtained by our approach for each one of the most computationally expensive methods. MM - matrix multiplication (A = J^T J); Jacobian - computation of J; Cholesky - LLT decomposition; Solver - linear solver Strsm from the CUBLAS library; ACS - Adaptive Constraint Selection; ANS - Adaptive Node Selection; Weights - computation of the influence of G on P_s; MV - matrix-vector multiplication (b = J^T r)
6.1 Neutral and deformed reference models based on the user's facial expression
6.2 Neutral and deformed reference models for a different user
6.3 Neutral and deformed reference models based on challenging deformation scenarios
6.4 Cheeks tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.5 Color-coded cheeks tracking error measured for both rigid and non-rigid solutions
6.6 Cheeks-2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.7 Color-coded cheeks-2 tracking error measured for both rigid and non-rigid solutions
6.8 Smile tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.9 Color-coded smile tracking error measured for both rigid and non-rigid solutions
6.10 Smile-2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.11 Color-coded smile-2 tracking error measured for both rigid and non-rigid solutions
6.12 Kiss tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.13 Color-coded kiss tracking error measured for both rigid and non-rigid solutions
6.14 Kiss-2 tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.15 Color-coded kiss-2 tracking error measured for both rigid and non-rigid solutions
6.16 Open mouth tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.17 Color-coded open mouth tracking error measured for both rigid and non-rigid solutions
6.18 Angry tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.19 Color-coded angry tracking error measured for both rigid and non-rigid solutions
6.20 Bag tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.21 Color-coded bag tracking error measured for both rigid and non-rigid solutions
6.22 Limitation of the proposed method. User's body (A) is reconstructed (B) and the algorithm cannot track the user's arms (C), integrating all the movement into the 3D reference model (D)
6.23 Body tracking error measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold
6.24 Color-coded body tracking error measured for both rigid and non-rigid solutions


LIST OF TABLES

5.1 Number of constraints (C), accuracy (A, given in mm), standard deviation (SD, given in mm) and performance (P, given in FPS) results according to the step size (from 1 to 32) or sampling scheme (Adap for adaptive) used to select constraints for optimization
5.2 Average accuracy (A, given in mm) and standard deviation (SD, given in mm) results according to the weight used to update the 3D reference model
6.1 Average accuracy (Avg., given in mm), standard deviation (Std. Dev., given in mm) and performance (Perf., given in FPS) results for each one of the tracking algorithms tested in the presence of specific user deformation. NRn: non-rigid registration applied every n frames (independent of rigid tracking failure); NRAdaptive: non-rigid registration applied whenever the rigid algorithm fails
6.2 Average accuracy (Avg., given in mm), standard deviation (Std. Dev., given in mm) and performance (Perf., given in FPS) results for each one of the thresholds used to detect rigid tracking failure


Chapter 1

INTRODUCTION

In this first chapter, a brief contextualization of the problem we want to solve, the objectives and contributions of the proposed work, and the thesis organization are described.

Augmented Reality (AR) is a technology in which the view of a real scene is augmented with additional virtual information. As stated by Azuma (1997), an AR application must have three basic characteristics:

1. Combination of virtual object(s) with a real scene;
2. Real-time performance;
3. 3D registration for accurate tracking of the augmented scene.

Since the beginning, tracking has been one of the main problems limiting the development of successful AR applications. Virtual and real worlds must be properly aligned so that they seem to coexist at the same location for the user. For some applications, such as the ones proposed for medical AR in surgery environments, accurate registration of the virtual medical data onto the patient is especially important; otherwise, a successful surgery may be compromised.

Tracking plays an important role not only in AR, but also in 3D reconstruction. Several viewpoints of the same object/scene of interest are captured by an appropriate sensor, and these must be registered and aligned to the same coordinate system. After this registration step, the different viewpoints must be integrated into a single 3D model. Therefore, if the viewpoints are incorrectly aligned, visible artifacts will appear in the final reconstructed model.

Computer vision techniques have been proposed to solve the problem of registration; however, they are not robust enough under some illumination conditions (Teichrieb et al., 2007). With the availability of depth sensors, 3D registration techniques have been proposed using 3D information to improve tracking robustness. However, for low-cost depth sensors, noise may affect the accuracy of the registration.

In scenarios such as on-patient craniofacial medical data visualization (Lee et al., 2012; Macedo et al., 2014), it is especially important for a markerless AR environment (MAR) to provide support for non-rigid tracking, which adds one level of interactivity for the user and improves the robustness of the tracking algorithm for rigid and non-rigid patient interactions. The main issue related to this support is that AR requires real-time interactivity, and most of the current state-of-the-art works in the field of 3D non-rigid registration do not provide such performance. Here, we assume that an application runs in real-time if its performance is equal to or above 15 frames per second (Akenine-Möller; Haines, 2002). This concept of real-time is more related to user interactivity, because the user must interact with the application and receive fast feedback from it without too much delay.

Several approaches exist for accurate 3D non-rigid registration; however, few of them allow interactive registration. Apart from the real-time techniques which rely on strong priors about a specific scenario (Weise et al., 2011; Chen; Izadi; Fitzgibbon, 2012; Bouaziz; Wang; Pauly, 2013; Li et al., 2013), a few methods have been proposed for fast general-purpose non-rigid registration (Sumner; Schmid; Pauly, 2007; Nutti et al., 2014). Their common characteristic is the way they represent the deformation of a given surface: using a deformation graph. Each node of this graph holds a 3D affine transformation which allows the source surface to be deformed towards a target surface. Deformation is modelled in terms of an energy function and, by using a non-linear optimization algorithm, the energy is minimized and the best affine transformations for each node of the graph can be found.

In this doctoral work, we address the problem of fast 3D non-rigid registration by applying adaptive techniques to reduce the computational cost of the registration while keeping it accurate.
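To make the deformation-graph idea concrete, the sketch below (Python with NumPy, with illustrative names; not the thesis implementation) deforms a single point by blending the affine transforms (R_j, t_j) stored at nearby graph nodes g_j, in the spirit of the embedded deformation model of Sumner, Schmid and Pauly (2007):

```python
import numpy as np

def deform_point(v, nodes, weights):
    """Blend per-node affine transforms to deform a single point v.

    nodes   : list of (g_j, R_j, t_j) -- node position, 3x3 affine matrix,
              translation vector
    weights : per-node influence weights, assumed to sum to 1
    """
    v_new = np.zeros(3)
    for w, (g, R, t) in zip(weights, nodes):
        # Each node maps v into its local frame, applies its affine
        # transform, and maps back; the results are blended by weight.
        v_new += w * (R @ (v - g) + g + t)
    return v_new
```

With identity matrices and zero translations at every node, a point is left unchanged; the optimization searches for the per-node R_j and t_j that pull the source surface onto the target.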
1.1 HYPOTHESIS

Our main research question is: Is it possible to interactively track, with sufficient accuracy, deformable objects which undergo deformation across sequential frames in a markerless augmented reality application? To answer this question, we build upon an adaptive approach for fast non-rigid registration in scenarios where real, noisy surfaces are captured from a low-cost depth sensor. This thesis aims to solve the problem of fast, interactive 3D non-rigid registration for MAR environments. In this sense, the proposed approach must be as accurate as state-of-the-art solutions, while supporting real-time performance and being robust under noisy and missing data.

1.2 CONTRIBUTIONS

The main contributions of this thesis are:

- A markerless augmented reality environment based on a low-cost RGB-D sensor;
- A dynamic subdivision approach for node selection on the source object;

- An adaptive algorithm to select, at each iteration, samples from the source object to be used as constraints for optimization;
- A multi-frame adaptive approach in which non-rigid registration is applied only when the rigid tracking error is above a certain threshold, and a 3D rigid representation of the object is updated to take into account the current deformation;
- A full framework for non-rigid registration implemented entirely on the Graphics Processing Unit (GPU).

1.3 ORGANIZATION

This thesis is organized as follows:

Chapter 2, Fundamentals and Related Work. This chapter formalizes the concepts of augmented reality and 3D registration and their challenges. It also provides an extensive review of related work in the fields of rigid and non-rigid registration, focusing on the interactive methods developed so far.

Chapter 3, Markerless Augmented Reality Environment. The focus of this thesis is to add support for non-rigid tracking in a markerless augmented reality environment. Therefore, in this chapter we present the environment in which the proposed non-rigid registration was applied and validated.

Chapter 4, GPU-Based Adaptive Non-Rigid Registration. In this chapter we present the proposed adaptive non-rigid registration algorithm and its adaptation to take advantage of the parallelism of the GPU, as well as the multi-frame scheme adopted to improve the algorithm's performance.

Chapter 5, Non-Rigid Registration Evaluation. In this chapter, non-rigid registration is evaluated in terms of accuracy and performance on several datasets.

Chapter 6, Non-Rigid Support Evaluation for a Markerless Augmented Reality Environment. In this chapter, non-rigid tracking is evaluated in the context of the markerless augmented reality environment in terms of accuracy, performance and tracking robustness on several datasets.

Chapter 7, Conclusion and Future Work. The thesis is concluded with a summary and a discussion of future directions.


Chapter 2

FUNDAMENTALS AND RELATED WORK

This chapter formalizes the concepts of augmented reality and 3D registration and their challenges. It also provides a review of related work in the fields of rigid and non-rigid registration, focusing on the interactive methods developed so far.

2.1 AUGMENTED REALITY

The concept of virtual environments has been around since the 1990s. They can be defined as environments in which only virtual objects are present. Milgram and Kishino proposed a taxonomy to identify where applications are located inside the so-called Reality-Virtuality Continuum (Milgram; Kishino, 1994) (Figure 2.1). The extremes of this taxonomy are the real world and Virtual Reality (VR). At the center are Augmented Reality and Augmented Virtuality. In the former, the real world predominates over the virtual one, while in the latter the virtual world prevails over the real one.

Figure 2.1 Reality-Virtuality Continuum (Milgram; Kishino, 1994).

Both AR and VR use virtual objects, but they have some differences. AR changes the real world by adding virtual elements. Thus, it is fundamental for an application to maintain contact with the view of the real world, which is the basis of an AR application. Although authors such as Vallino and Azuma state that the main goal of AR is the seamless integration of virtual objects into the real scene (Vallino, 1998; Azuma et al., 2001), it is not mandatory for such systems to be realistic. Another central distinction between AR and VR is the registration, or tracking, problem. This process is crucial in AR: the combination of real and virtual objects into the augmented scene requires an accurate positioning of the virtual objects over the real world.

The motivation for the development of applications and research in the field of AR comes from the potential benefits that such techniques may bring to several other fields. In the specific field of registration, AR methods have been attracting a lot of attention in Medicine, because they extend the possibilities of study and practice for many techniques and medical procedures related to the medical images generated from the patient's current condition, such as angiographic visualization (Wang et al., 2012), liver surgery (Haouchine et al., 2013, 2014) and uterine laparosurgery (Collins et al., 2014). Registration, however, is a crucial problem in AR applications. Misplaced objects appear to be floating over the real scene. Accurate registration becomes even more crucial in applications which demand high precision, such as surgeries.

Tracking in AR is performed based on the color or depth intensity of the object being tracked by the application. For color-based tracking, features are computed from the color image of the scene captured by the sensor and tracked during the application's live stream (Horn; Schunck, 1981; Lucas; Kanade, 1981). The first solution proposed to solve this issue was based on fiducial markers, used as points of reference positioned in the real scene for tracking (Figure 2.2, left image). Due to their intrusiveness (i.e. the marker is an artificial content introduced in the scene), methods for color-based tracking without markers were proposed. However, the main drawback of this kind of registration is still the same: the susceptibility to illumination conditions. To overcome this problem, depth-based tracking was proposed, registering two surfaces captured from the real scene by a real-time 3D depth sensor (Besl; Mckay, 1992; Chen; Medioni, 1992). This kind of tracking has grown in popularity due to its accuracy, its robustness to illumination conditions and the recent availability of low-cost depth sensors.
In general, AR applications can be divided into two groups: marker-based and markerless. Marker-based AR uses a fiducial marker as a point of reference in the field of view to help the system estimate the camera pose (Figure 2.2, left image) (Kato; Billinghurst, 1999). Markerless AR (MAR) uses a part of the real scene as a natural marker (Figure 2.2, right images) (Izadi et al., 2011). By using it as a point of reference for tracking, one can expect non-rigid motion of the marker if it consists of a deformable object (e.g. face, body, hand).

2.2 3D REGISTRATION

3D registration is a fundamental problem in fields such as 3D reconstruction and augmented reality. Most depth sensors provide partial surface data (i.e. acquired from one viewpoint) that must be aligned, to allow camera pose estimation, and merged, to obtain a complete digital representation of the object or scene of interest. Some functional models have been proposed in the literature to solve the problem of registration with good performance:

1. Rigid Registration: In this kind of registration, a single Euclidean transformation is used to align two objects (Rusinkiewicz; Levoy, 2001). This transformation has the following properties: (1) it is global (i.e. remains the same for every point); (2) it can be uniquely defined by three non-collinear pairs of correspondences; (3) it is low-dimensional (i.e. only six degrees of freedom). Real-time performance is

easily achieved due to the low number of parameters required to solve the rigid registration;

Figure 2.2 Marker-based (left image) and markerless (right images) augmented reality. Left image is courtesy of the ARToolKit library (Kato; Billinghurst, 1999) and right images are courtesy of KinectFusion (Izadi et al., 2011).

2. Articulated Deformation: For surfaces which are mainly characterized by articulations, a skeleton is typically used as the basis for deformation. In this representation, a skeleton is defined by a combination of bones and joints. Each joint is associated with some degrees of freedom (i.e. joint angles) and is related to other joints by rigid transformations (Allen; Curless; Popović, 2002). In an alternative representation, joint deformation is obtained by blending the transformations of two adjacent bones in the overlap regions (Chang; Zwicker, 2008, 2011). The advantage of this representation is that it requires a low number of parameters to be estimated, which depends on the number of available bones or joints;

3. Local Affine Deformation: For several real-world datasets, it is desirable for the non-rigid registration algorithm to support general deformations, without prior knowledge about the objects or the kind of deformation they have undergone. To achieve real-time performance, models such as articulated registration rely on prior knowledge about the scenario (e.g. skeleton tracking), losing generality.
To solve this issue, keeping non-rigid registration fast, accurate and general, solutions which use local affine transformations are frequently employed, as they allow the preservation of fine surface details while decoupling the complexity of the geometry from the complexity of the deformation by using a deformation graph as the base representation (Sumner; Schmid; Pauly, 2007).

Other functional models, such as rigid registration with non-rigid correctives (Brown; Rusinkiewicz, 2007) and isometric deformation (Lipman; Funkhouser, 2009), have also been proposed in the literature; however, their computational cost is too high for them to be used in our approach.
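The low dimensionality of the rigid model (item 1 above) is what makes it cheap: given at least three non-collinear correspondences, the single Euclidean transformation can even be recovered in closed form. A minimal Kabsch-style sketch (Python with NumPy; an illustration of the functional model, not part of the thesis pipeline):

```python
import numpy as np

def rigid_fit(src, dst):
    """Closed-form least-squares rigid transform (Kabsch/Umeyama style):
    finds R, t minimizing sum ||R @ src_i + t - dst_i||^2.
    src, dst: (N, 3) arrays of corresponding points, N >= 3 non-collinear.
    """
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper (reflecting) solution, if any.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t
```

Only six degrees of freedom are estimated here, regardless of how many points the surfaces contain, which is why rigid registration reaches real-time rates so easily.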

2.2.1 Rigid Registration

Rigid registration estimates a single transformation, composed of a rotation and a translation, to align two different viewpoints of the same object. Rigid registration is a challenging problem because, in real-world scenarios, it must deal with noise, outliers and non-overlapping regions between two surfaces captured from commodity depth sensors. Noise refers to the presence of unwanted points near the captured surface. Outliers are noisy points far from the surface that must be rejected, otherwise they may affect the optimization phase. As the object is captured from a single view of the camera, the presence of non-overlapping regions between two surfaces is already expected; however, holes and other artifacts may significantly decrease the region of overlap (Tam et al., 2013).

To limit the search space for optimization and correspondence estimation, constraints must be defined. In the field of rigid registration, transformation-induced constraints such as the closest-point criterion are commonly employed. It constrains potential correspondences by computing and matching closest points at every iteration of the registration algorithm, and is used in the standard Iterative Closest Point (ICP) algorithm (Besl; Mckay, 1992; Chen; Medioni, 1992). To reduce the search space for correspondence, specific approaches have been proposed: the project and project-and-walk methods (Rusinkiewicz; Levoy, 2001) restrict the search for a new closest point to the same 2D projection (i.e. pixel) and to a local neighbourhood, respectively, avoiding a global exhaustive search. Other constraints, such as features (Johnson; Hebert, 1999) and saliency (Gelfand et al., 2005), have also been proposed in the literature and provide more reliable correspondences, and consequently more accurate convergence to the final result; however, they require high processing time, being inadequate for our proposal.
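The projective shortcut mentioned above can be sketched as follows (Python with NumPy; the intrinsics matrix K and the organized target map are illustrative assumptions): instead of a global closest-point search, a source vertex is projected into the target depth map and matched with the point stored at the resulting pixel:

```python
import numpy as np

def projective_match(v, target_points, K):
    """Projective data association: project vertex v through the camera
    intrinsics K and take the target point stored at that pixel as its
    correspondence, avoiding an exhaustive nearest-neighbour search.

    target_points: (H, W, 3) array of back-projected target vertices.
    Returns the matched point, or None if v projects outside the map.
    """
    p = K @ v                                   # pinhole projection
    col = int(round(p[0] / p[2]))               # image column (u)
    row = int(round(p[1] / p[2]))               # image row (v)
    H, W = target_points.shape[:2]
    if 0 <= row < H and 0 <= col < W:
        return target_points[row, col]
    return None
```

This costs O(1) per vertex, which is one reason projective ICP variants run at camera frame rates.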
In fact, rigid registration has been researched for several years and is now a well-defined problem with a small number of parameters to estimate, for which real-time high-quality methods have already been proposed in the literature. The most popular algorithm for 3D rigid registration is the ICP. It consists of six steps:

- Selection of Points: points from the source and target objects are selected as samples for the algorithm;
- Matching of Points: corresponding points from the source and target objects are associated;
- Weighting of Correspondences: correspondences are weighted such that the most reliable ones receive more weight;
- Rejection of Correspondences: outliers are rejected from the pairs of corresponding points;
- Error Association: a point-to-point or point-to-plane error metric is defined for the optimization step;
- Error Minimization: the energy function built in the previous step is (commonly) minimized by solving a linear system.
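As an illustration, one iteration of these steps can be sketched as follows. This is a brute-force CPU sketch, not the thesis implementation: it uses closest-point matching, a distance-based rejection threshold, constant weights, and the closed-form SVD (Kabsch) solution of the point-to-point metric; all names and parameters are hypothetical.

```python
import numpy as np

def icp_iteration(source, target, max_dist=0.05):
    """One ICP iteration sketch: brute-force closest-point matching,
    distance-based rejection, constant weights, and the closed-form
    point-to-point solution (R, t) via SVD (Kabsch)."""
    # Matching: closest target point for every source point
    d2 = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    # Rejection: drop pairs farther apart than max_dist
    keep = np.sqrt(d2[np.arange(len(source)), idx]) <= max_dist
    src, tgt = source[keep], target[idx][keep]
    # Minimization: align centroids, then recover R from the SVD of the
    # cross-covariance matrix (guarding against reflections)
    cs, ct = src.mean(0), tgt.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (tgt - ct))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = ct - R @ cs
    return R, t  # tgt ~= (R @ src.T).T + t
```

In practice these steps are iterated until the alignment error stops decreasing; fast variants replace the brute-force matching with restricted searches such as projective matching.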

As the ICP algorithm provides high accuracy and real-time performance for rigid registration (Rusinkiewicz; Levoy, 2001), it is used in our approach.

Non-Rigid Registration

Non-rigid registration requires more attention because it faces the same issues as rigid registration plus the problem of deformation, which increases both the number of parameters to be estimated and the space of possible solutions. Unlike the rigid scenario, where every point of a given source object is moved by a single transformation estimated by the algorithm, in the non-rigid scenario every point may undergo a different, interconnected deformation. Therefore, reliable correspondences must be computed for every region of the source object so that the registration is sufficiently accurate and realistic (Tam et al., 2013). Traditionally, commercial systems have used markers to provide sparse reliable correspondences for non-rigid registration; however, markers are intrusive in the scene (Bermano et al., 2014). Templates have been used for applications based on part-to-whole alignment, where they provide strong shape priors that help to handle noise and missing data (Li et al., 2009). For scenarios such as facial non-rigid registration, blendshapes can be applied to capture a basis set of user expressions (Weise et al., 2011; Bouaziz; Wang; Pauly, 2013; Li et al., 2013). Other constraints induced by deformation, features, signatures and saliency require too much processing time. The closest-point criterion can be used for rigid and non-rigid registration in a similar way. Regularization constraints, however, are commonly employed to improve the optimization phase, avoiding local minima by taking advantage of a priori information. Orthonormality (Sumner; Schmid; Pauly, 2007) and hole handling (Li; Sumner; Pauly, 2008) are among the most used regularization schemes for non-rigid shapes.
In general, a non-linear optimization solver is employed for both rigid and non-rigid registration, and many techniques have been used to find the best transformations and correspondences. Local deterministic optimization methods compute a solution that maximizes/minimizes an energy function locally. These techniques do not produce the most accurate solutions, but are widely used due to their low processing time. Gradient descent, Newton, Gauss-Newton, quasi-Newton and Levenberg-Marquardt methods are often employed for non-rigid registration (Madsen; Bruun; Tingleff, 2004). Singular Value Decomposition, quaternions, orthonormal matrices and dual quaternions are the most frequently used for rigid registration (Lorusso; Eggert; Fisher, 1995). As local optimization techniques may find only a local minimum, global optimization can address this problem by searching for a global solution; alternatively, stochastic optimization can address it by using statistics and probabilistic models. While stochastic and global deterministic optimization tend to be more accurate, in this thesis we use a technique based on local optimization because of its low running time. Moreover, as we assume spatial and temporal coherence between the sequential frames used for registration, local optimization converges after a few iterations (Sumner; Schmid; Pauly, 2007). Surfaces such as faces, hands and bodies may undergo deformation during a 3D reconstruction process, for instance, and rigid registration is not able to handle it. A solution

for this issue is to apply non-rigid registration to align those deformable objects. One of the first works in the field of fast non-rigid registration applied to computer graphics is Embedded Deformation (ED), a real-time deformation algorithm for object manipulation and creation of 3D animation (Sumner; Schmid; Pauly, 2007). The goal of this technique is to allow a user to edit a surface intuitively while preserving its features. Deformation is represented by a graph: each node is associated with an affine transformation that influences the deformation of the nearby space. The great advantage of this approach is that it can be applied to a wide range of objects, articulated or not. Although its main goal is user-driven object manipulation, the algorithm proposed by Sumner et al. can also be seen as a non-rigid registration algorithm in which the source and target surfaces are the object before and after user manipulation. In this sense, many other works have used or improved this approach for the specific problem of non-rigid surface registration. Li et al. adapted Sumner's algorithm to the registration of partial range scans acquired from a 3D scanner (Li; Sumner; Pauly, 2008). They augmented the ED algorithm with a rigid registration and designed an energy function to penalize unreliable correspondences. Later on, Li et al. extended this approach (Li; Sumner; Pauly, 2008) into an algorithm for high-quality template-based non-rigid surface registration and reconstruction using dynamic graph refinement and multi-frame stabilization (Li et al., 2009). Li et al. also presented a method for temporally coherent completion of surfaces captured from real-time dynamic performances (Li et al., 2012), extending the non-rigid registration proposed in their previous work (Li et al., 2009) by adding texture constraints to the optimization.
Dou and colleagues proposed an algorithm to track dynamic objects acquired from real-time commodity depth cameras, such as the Microsoft Kinect sensor (Dou; Fuchs; Frahm, 2013). Basically, they extended the KinectFusion algorithm (Izadi et al., 2011) to deal with non-rigid registration. Their non-rigid registration is based on the ED algorithm, but color consistency and dense point cloud alignment were added to the original energy function. All these approaches improve the accuracy of the ED algorithm, but they require execution times in the order of minutes to register two point clouds. Thus, they are not suitable for an AR application. Few methods have been capable of achieving real-time performance in 3D non-rigid registration. Chen et al. proposed a method for non-rigid registration of skeletons captured from the user's body (Chen; Izadi; Fitzgibbon, 2012). Their approach runs at 30 frames per second (FPS) but uses a small number of constraints for registration and depends on a skeleton definition. Nutti et al. proposed a method to track tumors based on the patient's body position, which presumes prior knowledge about the scenario (Nutti et al., 2014). Their algorithm runs at 10 FPS by using a multi-threaded CPU implementation of (Li et al., 2009). Zollhöfer et al. proposed a method for real-time non-rigid registration of arbitrary meshes captured from the real scene (Zollhöfer et al., 2014). Based on hardware specialized for high-quality surface acquisition, their approach generates a 3D template model of the object of interest and uses a hierarchical non-rigid registration algorithm fully

implemented on the GPU. The implementation runs at 30 FPS with high accuracy. In this work, we present an approach also based on the ED algorithm which shares some characteristics with (Zollhöfer et al., 2014), such as requiring no special configuration or prior knowledge of the object and using GPU parallelism to achieve real-time performance. However, no special hardware is assumed; on the contrary, our approach relies on a simple off-the-shelf RGB-D sensor, with noisy, low-accuracy data. As proposed in related work (Li et al., 2009), we use adaptive graph refinement to improve non-rigid registration accuracy. Differently from other approaches, the algorithm proposed here runs entirely on the GPU and is based on a quadtree which operates over the 2D projection of the object to be registered. Also, the main goal of our algorithm is to be incorporated into a MAR environment, as a tool to improve tracking of the deformable object.

2.3 SUMMARY

Augmented reality is a technology which has been used in several fields, such as medicine and entertainment, among others. For some applications, markerless technology is useful to remove the intrusiveness of traditional marker-based approaches. When the object used as basis for markerless tracking is deformable, it is desirable for the application to support non-rigid motion to improve tracking robustness. Many methods inspired by the ED algorithm have been proposed for accurate 3D non-rigid registration, but few of them achieve real-time performance, and those that do still require prior knowledge about the scenario. To overcome this situation, in this thesis we propose an alternative method for fast 3D non-rigid registration which extends the ED algorithm by using a three-level adaptive approach implemented entirely on the GPU.


Chapter 3

MARKERLESS AUGMENTED REALITY ENVIRONMENT

The focus of this thesis is to add support for non-rigid tracking in a markerless augmented reality environment. Therefore, in this chapter we present the environment in which the proposed non-rigid registration was applied and validated. An overview of the proposed MAR environment can be seen in Figure 3.1. An RGB-D sensor is used to capture color and depth information of the scene. The object of interest is localized, segmented from the scene and reconstructed in real time. Then, real-time tracking is performed by using the 3D reference model previously reconstructed and the current 3D object captured by the sensor. The final registered 3D object is integrated into the 3D reference model to account for new viewpoints or changes in the object's shape due to deformations. A detailed explanation of the environment is given in the next sections of this chapter.

3.1 SURFACE ACQUISITION

In this environment, an RGB-D sensor is used to capture color and depth information from the real scene for every input frame (Figure 3.1). Color information is encoded as a color map, an image which stores for each pixel the red, green and blue intensities of the captured scene. Depth information is encoded as a depth map (D), an image which stores for each pixel the distance (i.e. depth) from the corresponding 3D point in the scene to the depth sensor. Our approach is based on a low-cost RGB-D sensor which provides noisy depth data. As described in Section 2.2, unwanted points on the captured surface may reduce registration accuracy. To minimize this problem, a bilateral filter is applied over D (Tomasi; Manduchi, 1998), as shown in Equation 3.1. To reduce noise while preserving features (i.e.
discontinuities) of the raw depth data, this technique uses a non-linear combination of nearby image intensities based on geometric proximity and photometric similarity.
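As an illustration, such a bilateral filter can be sketched on the CPU as follows. This is a naive, unoptimized sketch (the actual system runs it per-pixel on the GPU); Gaussian kernels G are assumed, with defaults mirroring the empirically chosen values σ_d = 4.5 and σ_c = 30, and the window radius is a hypothetical parameter.

```python
import numpy as np

def bilateral_filter(D, sigma_d=4.5, sigma_c=30.0, radius=3):
    """Naive sketch of the bilateral filter of Equation 3.1: each output
    depth is a normalized sum of neighbouring depths, weighted by a spatial
    Gaussian (sigma_d) and a range Gaussian on depth difference (sigma_c)."""
    H, W = D.shape
    out = np.zeros_like(D, dtype=np.float64)
    for y in range(H):
        for x in range(W):
            acc = norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # spatial weight * range weight
                        g = (np.exp(-(dx * dx + dy * dy) / (2 * sigma_d ** 2))
                             * np.exp(-(D[y, x] - D[ny, nx]) ** 2
                                      / (2 * sigma_c ** 2)))
                        acc += g * D[ny, nx]
                        norm += g
            out[y, x] = acc / norm  # norm > 0: the centre pixel has weight 1
    return out
```

On a flat region the filter leaves the depth untouched, while across a large depth discontinuity the range weight vanishes and the edge is preserved, which is exactly the behaviour required before surface reconstruction.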

Figure 3.1 Overview of the proposed approach from 3D reference model reconstruction to tracking solution. Adapted from (Souza; Macedo; Apolinario, 2014).

D_f(p) = \frac{1}{W(p)} \sum_{q \in S} G_{\sigma_d}(\|p - q\|) \, G_{\sigma_c}(|D(p) - D(q)|) \, D(q), \qquad W(p) = \sum_{q \in S} G_{\sigma_d}(\|p - q\|) \, G_{\sigma_c}(|D(p) - D(q)|) \quad (3.1)

where D(p) and D(q) correspond to the pixel values at positions p and q in image D, σ_d and σ_c are the standard deviations of the Gaussian functions G for the spatial (i.e. distance) and range (i.e. intensity) domains, respectively, W(p) is a normalization factor, S is the neighbourhood of pixel p and D_f is the filtered depth map. From empirical tests, we have set σ_d = 4.5 and σ_c = 30. Unwanted points are also located in the background of the scene; they can be removed from D_f by using a depth threshold. In the experiments conducted, we used a value of 1.3 meters for this task, considering that the object of interest is somewhere near the depth sensor. To detect and segment the object of interest in the scene (Figure 3.1), two methods can be used. The first method relies on a classifier to detect the object in the appropriate map. If it is applied to the color map, intrinsic and extrinsic calibrations must be performed to allow the mapping of the segmented region from the color map to the depth map. In practice, we have tested the approach in scenarios where the object consists of the user's head. In these cases, the Viola-Jones face detector (Viola; Jones, 2004) implemented

in GPU is used to locate and segment the face in the color map (Figure 3.1). This detector takes advantage of a representation called the integral image to compute Haar-like features quickly. In an integral image, each pixel contains the sum of the pixels above and to the left of its position in the original image. After the computation of the Haar-like features, a combination of simple classifiers built with the AdaBoost learning algorithm is employed to detect faces in color images (Freund; Schapire, 1995). If a classifier is not available, an alternative method can be used: a 2D bounding box that contains the foreground object is computed from D, and every position outside the bounding box is discarded from memory. Through intrinsic calibration, a point cloud P is computed from D. The normal vector (n) of each point is the eigenvector associated with the smallest eigenvalue of a covariance matrix built for every point p ∈ P (Holzer et al., 2012). Once the 3D object is obtained for every frame, markerless rigid registration is performed based on the iterative alignment of two consecutive source (P_s) and target (P_t) point clouds captured from the real scene. In fact, P_s is represented by a 3D reference model generated from the object of interest in a previous pose and P_t is the current point cloud acquired by the depth sensor. To achieve real-time performance, all the steps of this MAR environment must run on the GPU; hence, all the algorithms were carefully designed and implemented in a parallel way to exploit the full parallelism provided by the hardware.

3.2 3D REFERENCE MODEL RECONSTRUCTION

To reconstruct the 3D reference model of the object of interest in real time (Figure 3.1), the KinectFusion algorithm is employed (Izadi et al., 2011; Newcombe et al., 2011). An overview of this algorithm can be seen in Figure 3.2.

Figure 3.2 Overview of KinectFusion's pipeline (Izadi et al., 2011).
Once the object is detected in the scene, the region that contains it is fixed; the object is then constrained to move only inside this region. From the different viewpoints captured of the same object, a single 3D reference model can be generated. To do so, KinectFusion integrates the raw depth data captured from an RGB-D sensor into a 3D grid to produce a high-quality 3D reconstruction of the object of interest. The grid

stores, for each voxel, the signed distance to the closest surface within a narrow region (i.e. a TSDF - Truncated Signed Distance Function) and a weight that indicates the uncertainty of the surface measurement. This volumetric representation and integration are based on the VRIP algorithm (Curless; Levoy, 1996). To extract the implicit surface of the 3D reconstructed model, zero-crossings (i.e. positions where the TSDF sign changes) are detected in the grid through the raycasting algorithm. By extracting the reference model in a previous pose and aligning it to the current 3D model captured by the depth sensor, the incremental motion (T_rigid) between frames can be estimated. This solution allows accurate markerless tracking without error accumulation, as the high-quality 3D reference model is used as the basis for tracking.

3.3 TRACKING

Rigid motion is estimated by the ICP algorithm described in Section 2.2. Each one of the ICP steps was designed to achieve real-time performance while providing good accuracy for the rigid registration. This real-time variant of the algorithm is described as follows:

- Selection of Points: all the points from P_s and P_t are selected for optimization;
- Matching of Points: corresponding points between P_s and P_t are associated by using projective data association (i.e. reverse calibration) (Rusinkiewicz; Levoy, 2001), which matches the points located at the same 2D projection position (i.e.
the same pixel in D_s and D_t);
- Weighting of Pairs: a constant weight is assigned to each association;
- Rejection of Pairs: pairs are rejected if the Euclidean distance between corresponding points is greater than 10 mm or the angle between corresponding normals is greater than 20 degrees;
- Error Metric: the point-to-plane metric (Equation 3.2) is used to guide the optimization;

\arg\min \sum_{p \in \mathrm{selected}} ((T_{rigid} \, p_s - p_t) \cdot n_t)^2 \quad (3.2)

- Error Minimization: the error metric is minimized by applying the Cholesky decomposition to the linear system derived from Equation 3.2 (Chen; Medioni, 1992).

The real-time variant of the ICP algorithm uses projective data association to find correspondences. The ICP fails, or does not converge to a correct registration, when there is a high pose variation between frames in sequence. To improve tracking robustness, a real-time pose estimator is used to give a new initial guess to the tracking algorithm when it fails (Figure 3.3). For the situations where the object consists of the user's head, the head pose estimation algorithm proposed by Fanelli et al. is used (Fanelli et al., 2011). However, even with this algorithm, tracking may fail if the user interacts non-rigidly with the application. Non-rigid tracking support can be added by applying a real-time non-rigid surface registration algorithm to align the 3D reference model and the current captured model, as will be discussed in the next chapter.
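A common way to minimize Equation 3.2 in real time is to linearize T_rigid under a small-angle assumption, which reduces one Gauss-Newton step to a 6 × 6 normal-equation system solvable by Cholesky decomposition. The sketch below illustrates that step under this assumption; it is not the GPU implementation, and all names are hypothetical.

```python
import numpy as np

def point_to_plane_step(ps, pt, nt):
    """One Gauss-Newton step for the point-to-plane metric of Equation 3.2.
    With the small-angle linearization T_rigid p ~= p + r x p + t, the
    residual of a pair (p, q, n) is (p - q).n + [p x n, n].[r, t], giving
    6x6 normal equations solved here by Cholesky decomposition."""
    A, b = np.zeros((6, 6)), np.zeros(6)
    for p, q, n in zip(ps, pt, nt):
        J = np.concatenate([np.cross(p, n), n])  # Jacobian row w.r.t. (r, t)
        e = np.dot(q - p, n)                     # signed plane distance
        A += np.outer(J, J)
        b += J * e
    L = np.linalg.cholesky(A)                    # A = L L^T
    x = np.linalg.solve(L.T, np.linalg.solve(L, b))
    return x[:3], x[3:]                          # rotation vector r, translation t
```

The returned rotation vector and translation form the incremental update applied to T_rigid; iterating the step refines the alignment while correspondences are re-estimated.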

Figure 3.3 Left image: the user translated his face quickly; few points shared the same image coordinate and the ICP failed. Right image: by using the pose estimation algorithm, the problem is solved (Macedo; Apolinario; Souza, 2013).

3.4 SUMMARY

One solution to provide accurate markerless tracking for an augmented reality environment is to generate a 3D reference model of the object of interest and track it in real time. The KinectFusion algorithm is used to reconstruct such a model in real time, and the ICP algorithm is used to track it in the scene by registering the 3D reference model in a previous pose to the current 3D model captured from a depth sensor. To add support for non-rigid tracking, a real-time non-rigid registration algorithm is necessary to keep up with the user's interaction with the application.


Chapter 4

GPU-BASED ADAPTIVE NON-RIGID REGISTRATION

In this chapter we present the proposed adaptive non-rigid registration algorithm and its adaptation to take advantage of the parallelism of the GPU. Our approach is evaluated in terms of accuracy and performance on several datasets. Part of the content described in this chapter is present in our publication (Souza; Macedo; Apolinario, 2014). An overview of the full process to register two point clouds can be seen in Figure 4.1. The non-rigid algorithm builds a deformation graph (G) on P_s to deform it towards P_t iteratively. Each node g ∈ G consists of a point of P_s associated with a 3D affine rigid transformation (i.e. a 3D rotation matrix R and a 3D translation vector t) which influences the deformation of the nearby space. The current deformation between P_s and P_t is modelled in terms of an energy function, and a non-linear optimization algorithm is applied to minimize this energy with respect to the affine transformations of G. To reduce the computational cost of the non-linear solver, a sub-sample of P_s is selected as constraints to be used during optimization. Next, the algorithm iteratively refines G according to the energy function measured previously. This refinement is based on a quadtree. The registration stops when the residual error between the deformed P_s and P_t is sufficiently low. To achieve good performance, the full pipeline runs entirely on the GPU, and the non-rigid registration algorithm is applied in a multi-frame manner only when rigid tracking fails. Our deformation model is inspired by the ED algorithm (Sumner; Schmid; Pauly, 2007). However, we have added a three-level adaptive approach to improve the accuracy and performance of the original solution. Moreover, we have implemented it on the GPU to boost performance even more.
The proposed algorithm consists of several stages (Figure 4.1), which are described in the next sections of this chapter.

4.1 DEFORMATION MODEL

By using the deformation graph, a point p can be deformed by G according to the following equation:

Figure 4.1 Overview of the proposed approach, from the depth map acquisition to the final non-rigidly aligned surface (stages: matching of points, building of the quadtree, selection of nodes, weighting the influence of nodes, selection of constraints, error minimization, and quadtree adaptation with update of the source object, iterated until the error falls below a threshold).

\tilde{p} = \sum_{j=1}^{k} w_j(p) \, [R_j (p - g_j) + g_j + t_j] \quad (4.1)

where k represents the k nearest nodes of p and w_j is a weight that measures the influence of each node on the point. To solve the problem of non-rigid registration using this representation, we use three energy functions, E_rot, E_reg and E_con (Sumner; Schmid; Pauly, 2007):

- Energy function for rotation (E_rot): for a 3 × 3 matrix to represent a rotation in SO(3), it must satisfy six conditions: each of its three columns must have unit length, and all columns must be orthogonal to one another (Grassia, 1998). The squared deviation from these conditions is given by the function Rot(R):

\mathrm{Rot}(R) = (c_1 \cdot c_2)^2 + (c_1 \cdot c_3)^2 + (c_2 \cdot c_3)^2 + (c_1 \cdot c_1 - 1)^2 + (c_2 \cdot c_2 - 1)^2 + (c_3 \cdot c_3 - 1)^2 \quad (4.2)

where c_1, c_2 and c_3 are the column vectors of a given rotation matrix. The term E_rot is defined as the sum of the rotation error over all affine transformations of G:

E_{rot} = \sum_{j=1}^{m} \mathrm{Rot}(R_j) \quad (4.3)

- Energy function for regularization (E_reg): in order to apply a sufficiently smooth deformation, we must ensure that the affine transformations of adjacent nodes in G are consistent. E_reg is the sum of the squared distances between each node's transformation applied to its neighbours and the actual transformed neighbour positions:

E_{reg} = \sum_{j=1}^{m} \sum_{k \in N(j)} \|R_j (g_k - g_j) + g_j + t_j - (g_k + t_k)\|_2^2 \quad (4.4)

where N(j) consists of all nodes connected to the node g_j.

- Energy function for constraints (E_con): this energy function deals directly with P_s and P_t, measuring how distant they are from each other. E_con is the sum of the squared Euclidean distances between the deformed source points and their correspondents on the target object:

E_{con} = \sum_{i=1}^{n} \|\tilde{p}_i - q_i\|_2^2 \quad (4.5)

where q_i is the target point corresponding to p_i, \tilde{p}_i is p_i after deformation (Equation 4.1) and n is the total number of points in P_s. The total energy function E_tot is defined by the following equation:

E_{tot} = w_{rot} E_{rot} + w_{reg} E_{reg} + w_{con} E_{con} \quad (4.6)

We used w_rot = 1, w_reg = 10 and w_con = 100 in all our experiments, as suggested in related work (Sumner; Schmid; Pauly, 2007). We tested other weights and alternative strategies for relaxing them during each iteration, but did not obtain better results.

4.2 MATCHING OF POINTS

After object detection and segmentation, points from P_s and P_t are associated. In the MAR environment described in the previous chapter, temporal/spatial coherence between frames is assumed, as rigid registration has already been applied and, as a result, P_s and P_t are relatively near each other. Hence, projective data association (Section 3.3) is used to match the points. As an adaptation for GPU processing, each GPU thread transforms a single point p_s into image coordinates and associates it with the point p_t at the same image coordinate.

4.3 SELECTION OF NODES

After the matching of points, the nodes of G are selected. A quadtree is built on the GPU to perform the selection of nodes based on the 2D projection of G. As the nodes of G are also points in P_s, we can convert them from world to image coordinates by using the same process used to reproject P_s onto D_s. P_s may be an object with holes distributed along the surface. In this case, selecting nodes based only on the 2D space may cause nodes to be selected in regions where there is no depth data. To solve this problem, we take advantage of what we call virtual nodes to represent the space where there is no depth data. Virtual nodes favor the expansion of the quadtree in regions which naturally contain depth data, although not at the specific position of the node.
It is worth mentioning that virtual nodes do not have an affine transformation; they are just leaves of the quadtree that can be refined to generate real leaf nodes if necessary. Therefore, we restrict the use of virtual nodes to the first two levels of the quadtree. To build the quadtree, some information must be stored in GPU memory, such as: the level of each node in G, and whether, at a given position, there exists a node of G, the node has children (i.e. is a parent node), or there exists a virtual node. The algorithm can be divided into three steps: the building of the quadtree (Algorithm 1), and the adaptive refinement (Algorithm 2) and collapse (Algorithm 3) of nodes in G.
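As an illustration of how such level-indexed addressing can work, the sketch below assumes a Morton (Z-order) layout of the cells, under which the parent of a child index is simply idx // 4, as used in Algorithm 1. Both the layout and the name get_pixel are assumptions for illustration, not the thesis implementation.

```python
def get_pixel(idx, level, width, height):
    """Hypothetical sketch of the getPixel mapping: thread idx addresses one
    of the 2^level x 2^level cells of the 2D projection in Morton (Z) order,
    so the parent of a child index is idx // 4, and the returned pixel is
    the centre of the addressed cell."""
    cx = cy = 0
    for bit in range(level):                # de-interleave idx into cell coords
        cx |= ((idx >> (2 * bit)) & 1) << bit
        cy |= ((idx >> (2 * bit + 1)) & 1) << bit
    cells = 1 << level                      # cells per axis at this level
    cw, ch = width // cells, height // cells
    return (cx * cw + cw // 2, cy * ch + ch // 2)
```

With this layout, the four children 4p .. 4p+3 at one level tile exactly the cell of parent p at the previous level, which is consistent with the parentIdx = idx/4 step of Algorithm 1.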

Algorithm 1 Building a quadtree
1: for each thread of index idx in parallel do
2:   u <- getPixel(idx, currentLevel)
3:   if depth(v(u)) > 0 then
4:     insertNodeInGraph(u)
5:     setLevel(u, currentLevel)
6:   else if currentLevel <= 2 then
7:     insertVirtualNodeInGraph(u)
8:     setLevel(u, currentLevel)
9:   end if
10:  if currentLevel > 1 and hasNode(u) then
11:    parentIdx <- idx/4
12:    u <- getPixel(parentIdx, currentLevel - 1)
13:    removeNodeFromGraph(u)
14:    removeVirtualNodeFromGraph(u)
15:    insertNodeInParentList(u)
16:  end if
17: end for

Figure 4.2 Building of the deformation graph (right) over the source object (left) based on the residual error measured (center).

We build the quadtree in the first iteration of our algorithm. The procedure is shown as pseudocode in Algorithm 1 and one result is illustrated in Figure 4.2. First, we iteratively call the GPU kernel that will select the nodes, iterating from the first level up to the level required by the user to build the quadtree. Each GPU thread in parallel computes the position u at which to select the node (line 2). To compute u, we need the thread id and the current level of the quadtree being iterated; the method getPixel shifts the position of the thread id to the center of the 2D region that will be represented by the node. If the point is visible, it becomes a new node in G (lines 3-5). Otherwise, it can become a new virtual node (lines 6-9); therefore, we allow the quadtree to be refined even in regions where there are just a few points. If the node was selected and it is not at the first level (line 10), the thread removes the parent node from G, be it a real or a virtual node, and

inserts it into a parent list, indicating that it has already been expanded (lines 13-15). In this case, getPixel computes the position of the parent node based on the previous level of the quadtree hierarchy and the parent thread id (as a parent is expanded into four children, we simply divide the current thread id by 4 to obtain the parent id).

Algorithm 2 Refinement of nodes
1: for each thread of index idx in parallel do
2:   u <- getPixel(idx, currentLevel)
3:   if (hasNode(u) or hasVirtualNode(u)) and getLevel(u) = currentLevel then
4:     evaluateEcon(u)
5:     if region around u must be refined then
6:       for each child node at pixel u_c do
7:         if depth(v(u_c)) > 0 then
8:           insertNodeInGraph(u_c)
9:           setLevel(u_c, currentLevel + 1)
10:        end if
11:      end for
12:      removeNodeFromGraph(u)
13:      removeVirtualNodeFromGraph(u)
14:      insertNodeInParentList(u)
15:    end if
16:  end if
17: end for

Figure 4.3 Refinement of the deformation graph (right) over the cheeks region of the source object (left) based on the residual error measured (center).

After the building of the quadtree, the nodes of G can be refined or collapsed according to the residual error measured in the previous iteration. The refinement of nodes is shown as pseudocode in Algorithm 2 and one result is illustrated in Figure 4.3. Again, we iteratively call the GPU kernel that will refine the nodes, iterating from the first level of the quadtree to the maximum level in order to refine the

nodes in a top-down fashion. Each GPU thread in parallel computes its position in the 2D space and checks whether there is a node at this position at the current level being iterated (lines 2-3). If the thread passes this condition, we compute the average error over a region C around u (line 4). If the average is above a certain threshold, the node must be refined. For each child node computed from the node position (line 6), we check whether there is a point at the child position (line 7). If there is, it becomes a new child node in G (line 8). In this case, the thread removes the node from G (lines 12-13) and inserts it into a parent list, indicating that it has already been expanded (line 14). The collapsing of nodes is shown as pseudocode in Algorithm 3 and one result is illustrated in Figure 4.4. Again, we iteratively call the GPU kernel that will collapse the nodes, iterating from the maximum level of the quadtree down to the root node in order to collapse the nodes in a bottom-up fashion. Each GPU thread in parallel computes its position in the 2D space and checks whether the node has children at the current level being iterated (lines 2-3). If the thread passes these conditions, given a region C around u, we compute the average of the error E_con (Equation 4.5) for each p_s ∈ C (line 4). If the average is below a certain threshold, the child nodes in C must be collapsed. To collapse the nodes, we check whether child nodes exist and they are leaf nodes (line 6). In this case, they are collapsed (lines 7-9) and C becomes represented by the old parent node (lines 10-11).
Algorithm 3 Collapsing of nodes
1: for each thread of index idx in parallel do
2:   u <- getPixel(idx, currentLevel)
3:   if hasChildren(u) and getLevel(u) = currentLevel then
4:     evaluateEcon(u)
5:     if region around u must be collapsed then
6:       if child nodes exist and they are leaves then
7:         for each child node at pixel u_c do
8:           removeNodeFromGraph(u_c)
9:         end for
10:        insertNodeInGraph(u)
11:        removeNodeFromParentList(u)
12:      end if
13:    end if
14:  end if
15: end for

4.4 WEIGHTING THE INFLUENCE OF NODES

In this step, the influence of the k nearest nodes on each p_s is computed. The weight w_j can be computed by:

w_j(p) = (1 - \|p - g_j\| / \mathrm{dist}_{max})^2 \quad (4.7)

Figure 4.4 Collapsing of the deformation graph (right) over the cheeks region of the source object (left) after updating on the residual error (center).

The weights are then normalized to sum to one. dist_max is the distance to the (k+1)-nearest node with respect to p. From Equation 4.7, it is guaranteed that the nearest nodes have more influence on the deformation of p. Also, since the nodes are points of P_s, they are themselves deformed by the other nodes of G. To compute the weights efficiently on the GPU, we create an array that contains only the selected nodes. Direct access to this array avoids explicitly checking on the surface whether a point is also a node. Each GPU thread then computes the influence of a specific node in G.

4.5 SELECTION OF CONSTRAINTS

To compute the best affine transformations that align P_s and P_t we must:

1. Select the constraints (i.e. points from P_s that will be used during the optimization phase);
2. Convert the affine rotations from Euler to quaternion representation;
3. Compute the energy function E_tot (Equation 4.6) that models the constraints to guide the proper registration of the objects;
4. Use a non-linear solver to minimize E_tot.

Instead of using the full dense point cloud as constraints for the optimization, or asking the user to perform the constraint selection, we use an adaptive algorithm that selects constraints based on the residual error previously measured (Equation 4.6). Given a region on the source surface, the higher the error, the more points are selected as constraints for the optimization, as can be seen in Figure 4.5. In the first iteration of the optimization algorithm, when the residual error has not yet been measured, uniform sampling is used to select the constraints. To do that, an n×n mask, with step n, is scanned through the 2D projection of P_s at the xy coordinates.
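The error-driven selection scheme can be sketched in Python as follows. This is a sketch under assumptions: the threshold value, the diagonal placement of the medium-error samples, and all names are illustrative rather than taken from the thesis code.

```python
import numpy as np

def select_constraints(error_map, valid, n=4, th_c=1.0):
    """Adaptive constraint selection over the 2D projection of P_s (a sketch).

    For each n x n block of the error map: if the average error exceeds th_c,
    select all n^2 valid pixels; if it lies in [th_c/2, th_c], select n pixels
    spread over the block; below th_c/2, select only the block center.
    `valid` marks pixels where a surface point exists (i.e. not a hole).
    """
    h, w = error_map.shape
    selected = []
    for y in range(0, h - n + 1, n):
        for x in range(0, w - n + 1, n):
            block_ok = valid[y:y+n, x:x+n]
            if not block_ok.any():
                continue
            e_avg = error_map[y:y+n, x:x+n][block_ok].mean()
            if e_avg > th_c:                 # high error: every valid pixel
                picks = [(y+i, x+j) for i in range(n) for j in range(n)]
            elif e_avg >= th_c / 2:          # medium error: n spread-out pixels
                picks = [(y+i, x+i) for i in range(n)]
            else:                            # low error: block center only
                picks = [(y + n // 2, x + n // 2)]
            selected += [p for p in picks if valid[p]]
    return selected

ok = np.ones((8, 8), bool)
cons_low = select_constraints(np.full((8, 8), 0.1), ok, n=4, th_c=1.0)   # 1 per block
cons_high = select_constraints(np.full((8, 8), 2.0), ok, n=4, th_c=1.0)  # 16 per block
```

On a flat low-error 8×8 map the sketch keeps one constraint per 4×4 block, while a uniformly high-error map keeps all of them, mirroring the behaviour described in the text.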

Figure 4.5 Constraint selection based on the initial non-rigid error between source and target surfaces.

The point at the center of this mask is selected as a constraint if it exists in P_s (i.e. it is not in a hole). From empirical tests, n = 4 produced the best results. A discussion about the most appropriate value for n is given in Chapter 5, Section 5.2. In the remaining iterations of the optimization, we use the same n×n mask to scan the 2D projection of P_s and its residual error E_tot (Equation 4.6). First, the algorithm evaluates the average residual error over the n×n region being scanned. Based on this average error measured from E_tot, which we call E_avg, and a pre-defined threshold th_c, the number of points selected at that region is defined. There are three situations:

1. E_avg > th_c: all n² points are selected;
2. th_c/2 ≤ E_avg ≤ th_c: n points uniformly distributed over the mask are selected;
3. E_avg < th_c/2: only the point at the center of the mask is selected.

Therefore, we select more constraints in the regions where the deformation is high and must be minimized, while still representing the regions where the deformation is small or absent with a small number of constraints. From empirical tests, setting th_c to half of the averaged root mean squared error measured for the dataset produced the best results.

4.6 ERROR MINIMIZATION

In this stage, the affine transformation A = [R t], where R is a 3×3 rotation matrix and t is a 3D translation vector, is estimated for each node by a non-linear Gauss-Newton

solver using the constraints selected previously. After the selection of constraints, we need to convert the affine rotations from Euler to quaternion representation. The motivation is related to our non-linear solver, which operates faster with quaternions (three unknowns, assuming the component w equal to 1) than with the Euler-form rotation matrix (nine unknowns). To store the affine transformations to be estimated, we create two arrays: one array storing six parameters per node (three from the quaternion and three for the translation), and another array that is a hash relating each node in G to the location of its parameters in the first array. We compute the array and the hash elements using atomic operations on the GPU. Given E_tot, we must solve the optimization step to obtain the affine transformations that align P_s to P_t. To achieve this goal we use the Gauss-Newton solver (Madsen; Bruun; Tingleff, 2004). Our objective is to solve the normal equation J^T J Δ = J^T r. We compute the residual r, which consists of the computation of E_tot for each coordinate x, y and z of each constraint, and the Jacobian J, which is the first partial derivative of E_tot with respect to each of the parameters. Δ represents the unknown parameter update that minimizes E_tot. To compute J efficiently, we compute only the partial derivatives for the parameters that affect the constraint for which the derivative is being computed. Given J and r, we reduce the normal equation to the linear system AΔ = b and compute the products A = J^T J and b = J^T r. After solving the linear system, we add Δ to the array of parameters (i.e. quaternions + translation vectors) and reiterate the optimization algorithm until the maximum number of iterations is reached or the error stabilizes (does not change by more than 5%). Regarding GPU processing, r and J are computed in parallel.
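The normal-equation pipeline above can be sketched on the CPU with NumPy; the thesis performs the same products and factorization on the GPU with CUBLAS. The toy problem below is linear in its parameters, so a single step reaches the optimum; the residual is defined as target minus prediction so that the solved update Δ is added to the parameters, consistent with the text.

```python
import numpy as np

def gauss_newton_step(J, r):
    """One Gauss-Newton update: solve the normal equation (J^T J) delta = J^T r.

    Forms A = J^T J and b = J^T r (dense products), factors A = L L^T
    (the role of the LLT decomposition in the text), then solves the two
    triangular systems (the role of Strsm on the GPU). CPU sketch only.
    """
    A = J.T @ J                     # A = J^T J
    b = J.T @ r                     # b = J^T r
    L = np.linalg.cholesky(A)       # A = L L^T
    y = np.linalg.solve(L, b)       # forward substitution
    return np.linalg.solve(L.T, y)  # back substitution -> delta

# Toy problem: fit y = theta0 + theta1 * x starting from theta = (0, 0).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])          # exactly y = 1 + 2x
J = np.stack([np.ones_like(x), x], axis=1)  # d(prediction_i)/d(theta)
r = y                                       # residual at theta = (0, 0)
delta = gauss_newton_step(J, r)             # theta + delta is the optimum
```

Because the residual is linear in the parameters here, delta recovers the least-squares solution (1, 2) in one step; in the thesis the residual is non-linear, so the step is repeated until the iteration limit or error stabilization.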
A and b are computed using the matrix-matrix and matrix-vector multiplication routines from the CUBLAS library (Nvidia, 2008). The linear system is solved using a GPU implementation of the LLT decomposition proposed by Henry (2009) together with the Strsm triangular solver from the CUBLAS library.

4.7 UPDATING THE SOURCE OBJECT

The affine transformations computed in the previous step are applied to P_s based on Equation 4.1 and the algorithm is reiterated from the second step until the maximum number of iterations is reached (we use three iterations to limit processing time). Each GPU thread applies Equation 4.1 to one p_s of the source object.

4.8 MULTI-FRAME NON-RIGID TRACKING

To add support for non-rigid tracking, one solution is to apply it whenever the rigid tracking fails, enhancing the robustness of the MAR environment. However, applying non-rigid tracking at every frame has a computational cost that makes it unsuitable for real-time applications: if rigid tracking keeps failing consecutively, non-rigid tracking is invoked more and more frequently, reducing user interactivity. To solve this problem, we take advantage of the volumetric representation of the

KinectFusion algorithm to update the 3D reference model in real-time based on the current deformation measured. When the rigid tracking fails (i.e. the error measured is above a certain threshold), non-rigid registration is applied and the deformed 3D reference model surface is sent to KinectFusion's grid with a high weight. The 3D reference model is updated in the grid representation by the TSDF computation and the grid is then ray cast to generate a new source surface for the next iteration. A high weight is used for fast adaptation of the previously stored 3D reference model into the new deformed one. As a consequence, by deforming the 3D reference model, non-rigid tracking converges faster and with higher accuracy in the next iterations than the rigid-only solution (i.e. in which only rigid tracking is applied and KinectFusion's volume is not updated).

4.9 SUMMARY

In this chapter we have presented a fast method for non-rigid registration, able to register with high accuracy two noisy point clouds captured from a depth sensor. We have proposed an adaptive strategy for node distribution and constraint selection. In this context, it is fundamental to validate the algorithm in a real MAR environment in order to assess tracking robustness over many frames as well as averaged accuracy and performance, which is what is done in the next chapter.


Chapter 5

NON-RIGID REGISTRATION EVALUATION

In this chapter, non-rigid registration is evaluated in terms of accuracy and performance for several datasets. We describe the experimental setup used and analyse the accuracy and performance of the proposed algorithm. In the tests, we compare the results obtained with our algorithm against related work, such as the ED algorithm.

5.1 METHODOLOGY

For all tests, we ran our algorithm on an Intel Core™ CPU with 8GB of RAM and an NVIDIA GeForce GTX 660. The Kinect is used as RGB-D sensor due to its accessibility and low cost (Cruz; Lucio; Velho, 2012). It consists of a structured light depth sensor (IR emitter and camera), an RGB camera, an accelerometer, a motor and a multi-array microphone. Both cameras operate at 30 Hz, delivering images at 640x480 pixels. While the sensor provides depth maps in real-time, the depth data is noisy and inaccurate. To implement the approach proposed in this thesis, we used some libraries and toolkits to ease the implementation. The configuration of these libraries in the context of our approach is illustrated in Figure 5.1. OpenNI was used to capture the depth and color streams provided by the Kinect sensor (Occipital, 2015). Object detection and segmentation were done using the OpenCV library (Bradski; Kaehler, 2008). We implemented the 3D reference model reconstruction and the non-rigid registration on the GPU using the NVIDIA CUDA architecture (Kirk; Hwu, 2010). Also, we used the open source C++ implementation of KinectFusion released by the PCL project (Rusu; Cousins, 2011). The 3D reference model was reconstructed with KinectFusion using a grid with volume size of 70cm × 70cm × 140cm and resolution of 512³, as suggested in related work (Macedo et al., 2014). The non-rigid registration optimization takes 20ms per iteration. Therefore, to achieve real-time performance (~15 frames per second), we have chosen to use only three iterations of the optimization.
As the optimization converges faster,

such a small number of iterations still produces a good balance between accuracy and performance. As each dataset has its own minimum and maximum errors, we set the thresholds for the adaptive node and constraint selections to half of the averaged root mean squared error measured.

Figure 5.1 Overview of the libraries used for each step of our approach (Kinect live stream, image processing, reference model reconstruction and non-rigid registration).

Outside of the MAR environment, we have tested the proposed non-rigid registration algorithm on four different datasets, which can be seen in Figure 5.2.

I. Synthetic dataset: to perform a ground-truth evaluation of the non-rigid registration on objects free from noise and holes. This dataset contains models with 10k points;

II. Real dataset with high precision and low noise: to evaluate the non-rigid registration on objects with a low level of noise. This dataset was used by Weise et al. and consists of a deforming hand with 80k points (Weise; Leibe; Gool, 2007). Although this is not the kind of data found in the markerless AR environment, it is a real dataset common in the literature. Therefore, it was used to compare our approach with ED on a common model;

III. Real datasets with medium precision and noise: to evaluate the non-rigid registration on objects with noise and holes. The source and target surfaces were captured by our markerless AR environment. This scenario contains two different datasets: a user deforming his face by smiling (III-1) and by inflating his cheeks (III-2). These two scenarios have objects with 30k and 40k points, respectively;
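The per-dataset threshold rule mentioned above (half of the averaged root mean squared error) can be sketched as follows; the layout of the residuals and the names are illustrative assumptions, not the thesis code.

```python
import numpy as np

def adaptive_threshold(residuals):
    """Per-dataset threshold for node/constraint adaptivity (a sketch).

    Each row of `residuals` holds the per-point registration errors of one
    frame; the threshold is half of the RMSE averaged over frames.
    """
    rmse_per_frame = np.sqrt((residuals ** 2).mean(axis=1))
    return rmse_per_frame.mean() / 2.0

errs = np.array([[3.0, 4.0], [0.0, 0.0]])  # two frames of per-point errors
th = adaptive_threshold(errs)
```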

Figure 5.2 Datasets used for evaluation of the non-rigid registration algorithm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks.

In the tests comparing our approach with the ED algorithm, both were run with the same number of nodes in the first iteration. While in the ED algorithm the number of nodes did not change, in our approach the number of nodes changed according to the error reduction. We used E_tot as the measure for the refinement/collapse of nodes. The following evaluations compare three algorithms: ED implemented on the GPU (GPU-ED), our approach based only on adaptive refinement of nodes (AdNodes), in which all the points are selected as constraints, and our full approach based on adaptive refinement of nodes and constraints (AdNodes + AdCons).

5.2 ACCURACY EVALUATION

The final error distribution for the different datasets shown in Figure 5.2 can be seen in Figure 5.3 for our algorithm AdNodes + AdCons. For the synthetic dataset (at the top of the figure), the only deformation is a semi-sphere located at the center of the object. In this situation, our algorithm achieved a high accuracy of 1mm. For the second surface in Figure 5.3, a hand deforms starting at the fingers, where the error is highest. The algorithm reduced the average error to below 2mm. The surfaces at the bottom were captured from the Kinect. For the first of them, the user was asked to deform his face by smiling in front of the Kinect. Moreover, the user translated his face slightly away from the camera. Therefore, there is a high error in the model as it was both deformed and rigidly translated. For the bottom surface, the user was asked to deform his face by inflating his cheeks.
Therefore, the main deformation error is present in the region of the cheeks. In both cases, the algorithm achieved an accuracy of 2.6mm.

Figure 5.3 The resulting color-coded error (scale 0-10mm) from the registration between source and target surfaces. In all situations the proposed algorithm AdNodes + AdCons obtained an averaged accuracy below 3mm and a standard deviation below 3.5mm. I - Synthetic dataset consisting of a deformed plane. II - Real dataset of a deforming hand. III-1 - Real dataset of a user smiling. III-2 - Real dataset of a user inflating his cheeks.

Figure 5.4 Accuracy (in mm) obtained by AdNodes and AdNodes + AdCons in comparison with the Embedded Deformation (ED) algorithm and the initial error for each one of the datasets used.

The improvement in accuracy of AdNodes + AdCons with respect to the ED algorithm can be seen in Figure 5.4. AdNodes + AdCons obtained better accuracy than ED because of the adaptive selection of nodes, which redistributed the nodes in the deformation space, increasing them in the regions where the residual error is high and decreasing them otherwise. To improve accuracy, one solution is to select more constraints for the non-linear solver; obviously, this decreases the performance of the algorithm. This trade-off can be seen in Figures 5.4 and 5.11 for the algorithm AdNodes.

Figure 5.5 Accuracy comparison (scale 0-10mm) between the ED algorithm (with 7 and 33 nodes) and our adaptive approach (19 nodes) with respect to node selection for dataset II.

Figure 5.6 Accuracy comparison (scale 0-10mm) between the ED algorithm (with 16 and 64 nodes) and our adaptive approach (20 nodes) with respect to node selection for dataset III-1.

A visual comparison between AdNodes and ED can be seen in Figures 5.5 and 5.6. The accuracy obtained using adaptivity is comparable to that of the ED algorithm using double or triple the number of nodes. An accuracy evaluation with respect to AdCons can be seen in Table 5.1 and in Figures 5.7 and 5.8. By using adaptivity instead of uniform sampling with a fixed step size, non-

rigid registration achieves results as accurate as those obtained by using all the points from the source object as constraints (i.e. step size 1), while maintaining performance as fast as that of the approaches which achieve good performance but poor accuracy (i.e. step sizes 4 and 8). However, for the adaptivity to perform properly, we still must define a value for n of the n×n mask used to scan the 2D projection of P_s. Based on Table 5.1, step size 4 produces good results for uniform sampling with a fixed step size. Therefore, we use this step size for n in order to improve the accuracy and performance of the fixed step size.

Figure 5.7 Accuracy comparison between different sampling schemes (constraint sampling factors 1, 2, 4, 8, 16 and 32, and adaptive constraint sampling) used to select constraints for optimization for dataset II.

An accuracy evaluation with respect to the number of nodes which influence the deformation of a given point (k) is illustrated in Figure 5.9. As stated in previous work (Sumner; Schmid; Pauly, 2007), k = 4 is a good option to solve the deformation problem. Higher values of k may restrict the deformation space of G due to the oversampling of nodes influencing a specific region of P_s. An accuracy evaluation with respect to the influence of each quadtree level on AdNodes

Figure 5.8 Accuracy comparison between different sampling schemes (constraint sampling factors 1, 2, 4, 8, 16 and 32, and adaptive constraint sampling) used to select constraints for optimization for dataset III-1.

[Table 5.1: columns C, A, SD and P for each dataset (I, II, III-1, III-2); rows for step sizes 1, 2, 4, 8, 16 and 32 and the adaptive scheme. For dataset I, step size 1 selects 10K constraints and the adaptive scheme selects 1.7K; the remaining numeric entries are not legible in this copy.]

Table 5.1 Number of constraints (C), accuracy (A, given in mm), standard deviation (SD, given in mm) and performance (P, given in FPS) results according to the step size (from 1 to 32) or sampling scheme (Adap for adaptive) used to select constraints for optimization.

can be seen in Figure 5.10. As the number of levels (l) increases, more nodes (maximum

4^l) are selected and the accuracy is improved. From the tests conducted, three levels of quadtree building and refinement are needed to register two objects accurately.

Figure 5.9 Accuracy (in mm) related to the parameter k for each one of the datasets used.

Figure 5.10 Accuracy (in mm) obtained for each level of the quadtree and for each one of the datasets used. The maximum number of nodes for a level l is 4^l.

5.3 PERFORMANCE EVALUATION

In terms of performance, a comparison between the algorithms can be seen in Figure 5.11. As the graph shows, AdNodes + AdCons does not run in full real-time, but achieves 15 FPS in the real cases, half of the frame rate considered ideal for a real-time application in computer graphics. Nevertheless, it is up to three times faster than the ED algorithm.

Figure 5.11 Performance (in FPS) obtained by AdNodes and AdNodes + AdCons in comparison with the ED algorithm for each one of the datasets used.

The use of adaptivity for constraint selection greatly reduces the processing time originally demanded by the ED algorithm (Table 5.1, step size 1). Optimization is a common bottleneck in non-rigid registration algorithms (Sumner; Schmid; Pauly, 2007; Li et al., 2009). The number of constraints selected is directly related to the time required by the optimization phase. Therefore, by adaptively reducing the number of constraints used, we achieve good performance even for the optimization phase. Moreover, as the error is minimized over the surface, the number of nodes in G is dynamically decreased. With fewer parameters to be estimated, the optimization algorithm converges faster. On dataset II, the performance of the ED algorithm is better than that of AdNodes. This is explained by the number of nodes used: in this case, AdNodes did not change the initial number of nodes much. Thus, with almost the same number of nodes, ED is faster than AdNodes because it neither builds nor refines the quadtree. An analysis of the performance cost of each step of AdNodes + AdCons was also performed, measuring the average processing time of each step over the four datasets. The performance results can be seen in Figure 5.12. The step which takes the most time is the non-linear optimization algorithm, which requires 30ms (10ms per iteration). In fact, it consists of several steps: matrix-matrix and matrix-vector multiplication, computation of J, LLT decomposition and linear solving. Adaptive selection of nodes and constraints requires only 5ms. Therefore, the performance gain of our approach is explained by the reduction of dimensionality for the optimization algorithm, directly related to the size of G and the number of constraints selected.
As J is a sparse matrix, one way to improve the performance of the matrix product would be to use a sparse matrix product on the GPU, for example from the CUSPARSE library (Nvidia, 2014). However, from the tests conducted, the level of sparsity in J is not

sufficiently high (< 90%), and the CUBLAS dense matrix product ran faster than the CUSPARSE-based matrix product.

Figure 5.12 Performance (in ms) obtained by our approach for each one of the most computationally expensive methods. MM - matrix multiplication (A = J^T J); Jacobian - computation of J; Cholesky - LLT decomposition; Solver - linear solver Strsm from the CUBLAS library; ACS - Adaptive Constraint Selection; ANS - Adaptive Node Selection; Weights - computation of the influence of G on P_s; MV - matrix-vector multiplication (b = J^T r).

5.4 DISCUSSION

Based on Table 5.1 and Figure 5.11, we can verify that our algorithm is up to three times faster and about 1.5 to 2 times more accurate than the traditional ED algorithm implemented on the GPU. Adaptivity for node and constraint selection has shown to be useful in this context, improving the performance of the original ED by 2 to 6 times while keeping the registration accurate. Also based on Table 5.1, we highlight that our algorithm achieved promising results on real noisy datasets. Performance is improved by 2 to 3 times over the scenario where all points are selected as constraints, with minimal (dataset III-1) or no (dataset III-2) loss in accuracy. The stability of the algorithm is reinforced by the low standard deviation measured in comparison to the other scenarios evaluated. The focus of our approach is to add non-rigid tracking support to a MAR environment. Taking advantage of this scenario, where we have temporal/spatial coherence and deformation is expected to be small between consecutive frames, we use a simple projection algorithm to find correspondences. This matching algorithm does not affect our results, since we want to ensure that the algorithm minimizes the deformation between consecutive frames, which we assume to be predominantly small for every input frame.
Moreover, to boost the application's performance and achieve full real-time performance, the algorithm does not need to be applied at every frame when the current error is sufficiently

low.

5.5 SUMMARY

In this chapter we have evaluated the non-rigid registration algorithm and compared it against related work. Four different datasets were used and, from the tests performed, we have shown that the proposed adaptive non-rigid registration outperforms current existing methods in terms of accuracy and performance.


Chapter 6

NON-RIGID SUPPORT EVALUATION FOR A MARKERLESS AUGMENTED REALITY ENVIRONMENT

In this chapter, non-rigid tracking is evaluated in the context of the markerless augmented reality environment in terms of accuracy, performance and tracking robustness for several datasets. We describe the datasets used and analyse the accuracy, performance and tracking robustness of the proposed algorithm in the markerless augmented reality environment.

6.1 METHODOLOGY

The same hardware described in Chapter 5, Section 5.1 is used in the following tests. In the tests of our algorithm in the MAR environment, the user's head is the natural marker. As simple non-rigid interactions, we asked the user to perform three different facial expressions after 3D reconstruction: inflating his cheeks, smiling and simulating a kiss expression, as shown in Figure 6.1. Moreover, to evaluate the proposed environment with respect to challenging deformation scenarios, we have tested the algorithm with different objects and under different deformation conditions. First, we tested the same expressions with a different user (Figure 6.2) to evaluate the proposed approach on different faces. We use the terms Cheeks-2, Smile-2 and Kiss-2 to denote these expressions and differentiate them from the ones in Figure 6.1. Next, we tested different deformations: open mouth and angry facial expressions, and a deformation applied to a bag, as can be seen in Figure 6.3. Compared to the scenarios presented in Figure 6.1, these deformations pose additional challenges for the non-rigid registration algorithm:

- The open mouth expression poses a challenging scenario for the matching of points, because the corresponding points become too distant during the deformation motion. Also, there is a big hole in the deformed model which makes the matching process even more difficult;

Figure 6.1 Neutral and deformed reference models (cheeks inflated, as-rigid-as-possible expression, smile and kiss) based on the user's facial expressions.

- The angry expression has a higher tracking error than the open mouth expression, but it introduces fewer holes in the deformed model. In this case, the user not only performed the facial expression, but also rigidly rotated his head in front of the sensor. Therefore, the environment must deal not only with the rigid tracking required to solve the rigid motion, but also with the non-rigid tracking required to solve the user's non-rigid facial expression;

- The deformed bag presents a high error and is an object different from a face. Therefore, this dataset is fundamental to evaluate the robustness of the algorithm on distinct objects.

6.2 EVALUATION

In this section, the deformation scenarios presented in Figures 6.1, 6.2 and 6.3 are evaluated in the context of the MAR environment. As explained in Chapter 4, we need to update the 3D reference model to minimize the use of the non-rigid registration algorithm. To accomplish that, one solution is to re-send the deformed 3D reference model into the grid with a high weight. As explained in Section 3.2, the KinectFusion algorithm integrates raw depth data into a grid based on the TSDF computation and a weight that indicates uncertainty. The higher the weight, the faster the 3D reference model shape is updated based on the current measurement. Therefore, to accommodate the current deformation and stabilize the tracking faster, a high weight must be used to update the 3D reference model. We have tested the influence of this updating on the tracking accuracy; the results can be seen in Table 6.1. While weight 1 does not result in a fast update of the 3D reference model shape, stabilization in terms of accuracy is achieved with weights between 8 and 16.
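The role of the weight can be seen in the standard KinectFusion-style running-average TSDF update, sketched below. This is an illustration only; the thesis relies on the PCL KinectFusion implementation for the actual fusion.

```python
def fuse_tsdf(d_old, w_old, d_new, w_new):
    """Weighted running-average TSDF update for one voxel (a sketch).

    The stored signed distance is the weight-blended average of the old
    and new measurements; a high w_new makes the grid adopt the new
    (deformed) surface quickly, which is why the deformed reference model
    is re-sent with a high weight.
    """
    w = w_old + w_new
    return (w_old * d_old + w_new * d_new) / w, w

# With weight 8 the stored value moves most of the way to the new surface
# in one update; with weight 1 it moves only halfway.
d_fast, _ = fuse_tsdf(d_old=0.0, w_old=1.0, d_new=1.0, w_new=8.0)
d_slow, _ = fuse_tsdf(d_old=0.0, w_old=1.0, d_new=1.0, w_new=1.0)
```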
We have used weight 8 for all the other tests performed in this section because it provides more stable results than

weight 16 (see the standard deviation measurements in Table 6.1).

Figure 6.2 Neutral and deformed reference models (cheeks inflated, as-rigid-as-possible expression, smile and kiss) for a different user.

The exception to this statement occurs in the scenario where the accumulated error is too high (in our tests, the bag deformation). In this case, a weight higher than 8 is required to minimize the estimated error. From the tests conducted on all the cases mentioned at the beginning of this section in which the 3D reference model is a face, we estimated an average accuracy of 1.5mm for rigid tracking during the rigid 3D reference model reconstruction. For the bag, the average accuracy was 2mm for the same step. As can be seen in Table 6.2, when non-rigid user interaction is present, the average accuracy of rigid tracking decreases. We have tested different scenarios for non-rigid registration in order to evaluate the best multi-frame strategy to balance accuracy and performance. While skipping a fixed number of frames (e.g. NR4, NR8) is a good strategy, applying non-rigid registration for almost every frame (e.g. NR1, NR2) reduces performance while sometimes being unnecessary. Likewise, applying it at large frame intervals (e.g. NR16, NR32, NR64) improves average tracking accuracy only slightly, while the application's performance remains almost the same as that of the rigid solution. However, if a high deformation occurs in-between these frames, the tracking will fail (i.e. the error measured will be above the pre-defined threshold used to detect rigid tracking failure). Applying the non-rigid registration whenever the rigid tracking fails (i.e. NRAdaptive) is a good strategy to solve every deformation which occurs between frames, while maintaining good accuracy even for the bag scenario, considering the relative error reduction when compared to the rigid solution.
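The NRAdaptive strategy amounts to a simple per-frame control loop, sketched below. All stage functions are hypothetical stand-ins for the pipeline stages; the threshold and fusion weight mirror the values discussed in this section.

```python
def track_frame(frame, rigid_track, nonrigid_register, fuse_into_tsdf,
                threshold=2.0, fusion_weight=8):
    """One frame of adaptive multi-frame tracking (a sketch).

    Rigid tracking runs every frame; only when its error exceeds the
    failure threshold is non-rigid registration applied, and the deformed
    reference model is fused back into the grid with a high TSDF weight
    so the next frames track against the updated shape.
    """
    pose, error = rigid_track(frame)
    used_nonrigid = False
    if error > threshold:                     # rigid tracking failed
        deformed = nonrigid_register(frame)   # non-rigid registration
        fuse_into_tsdf(deformed, weight=fusion_weight)
        used_nonrigid = True
    return pose, used_nonrigid

# Stub stages to exercise the control flow on three frames.
calls = []
res = [track_frame(f,
                   rigid_track=lambda f: (None, f["err"]),
                   nonrigid_register=lambda f: "deformed",
                   fuse_into_tsdf=lambda m, weight: calls.append(weight))
       for f in [{"err": 1.0}, {"err": 3.5}, {"err": 1.8}]]
```

Only the middle frame, whose rigid error exceeds the 2mm threshold, triggers the non-rigid path and the high-weight fusion.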
When the 3D reference model is continuously updated in a case where there is only a small region of deformation, it becomes increasingly smooth at each frame. In this case, this solution may not be the most accurate, as the 3D reference model loses information in regions where there is no deformation. In Table 6.2, we can see this scenario in the tests conducted on the kiss, cheeks-2 and open mouth expressions, where

non-rigid registration applied every 1 or 2 frames did not produce the best results. This issue can be minimized by using the adaptive approach.

Figure 6.3 Neutral and deformed reference models (rigid object, open mouth, angry and deformed bag) for the challenging deformation scenarios.

Figure 6.4 Cheeks tracking error (in mm per frame) measured for both the rigid and the rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

The tracking error evolution can be seen in Figures 6.4, 6.6, 6.8, 6.10, 6.12, 6.14, 6.16, 6.18 and 6.20. When there is sufficient non-rigid user interaction, the error grows considerably and the non-rigid solution minimizes it. The 3D reference model is updated to stabilize the tracking based on the current deformation. Non-rigid registration and 3D reference model updating are done only when the deformation changes in intensity (i.e. the error is above the threshold, shown as a dashed line) and the rigid tracking fails. A test to analyse the best threshold to detect rigid tracking failure was performed for the simple deformation scenarios shown in Figures 6.1 and 6.2, and the results can be seen in Table 6.3. As mentioned before, rigid tracking has an average accuracy of 1.5mm. Therefore, by using this value as threshold, the algorithm applies non-rigid tracking for almost every new frame. In the opposite case, by using a threshold of 3mm, the algorithm

uses almost rigid tracking only. In this sense, the best threshold is between 2mm and 2.5mm, which provides fast and accurate tracking. For the challenging deformation scenarios shown in Figure 6.3, the tracking error caused by the deformation is much higher than in the simple scenarios; by using the threshold of 2mm, the non-rigid registration algorithm would be applied on almost every input frame. Therefore, for each one of these datasets, we have chosen an appropriate threshold to validate our approach. From the tests conducted, 2mm for the open mouth expression, 3mm for the angry expression and 7mm for the bag deformation allowed our approach to achieve the best results.

Table 6.1 Average accuracy (A, given in mm) and Standard Deviation (SD, given in mm) results according to the TSDF weight used to update the 3D reference model, for the user deformations Cheeks, Smile, Kiss, Cheeks-2, Smile-2, Kiss-2, Open Mouth, Angry and Bag.

In terms of visual quality and accuracy, Figures 6.5, 6.7, 6.9, 6.11, 6.13, 6.15, 6.17, 6.19 and 6.21 show that the algorithm captures the main deformation present in the deformed expressions through the sequence of frames, improving accuracy in regions where rigid registration alone cannot solve the tracking. In this context, our main contribution is that the non-rigid registration algorithm runs in real-time, allowing its application in an AR environment. As a pre-processing step, 3D reference model reconstruction is performed at 30 frames per second (FPS). When applied, non-rigid registration requires 60ms per frame. The step that takes the most time for every frame is the non-linear optimization, which demands on average 45ms per frame.
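The effect of the TSDF weight evaluated in Table 6.1 can be illustrated with the usual weighted running-average update for a single voxel. This is a sketch under assumed names and values, not the thesis code: a low weight cap lets new depth data reshape the reference model quickly, while a high cap keeps the model stable but slow to absorb deformation.

```python
# Illustrative single-voxel sketch of a capped-weight TSDF update: each
# voxel keeps a weighted running average of the observed signed distances,
# and the accumulated weight is capped (the weight studied in Table 6.1).
# Names and the sample values are assumptions, not the thesis code.

def tsdf_update(d_old, w_old, d_new, w_cap, w_new=1.0):
    """Fuse one new signed-distance observation into a voxel."""
    d = (w_old * d_old + w_new * d_new) / (w_old + w_new)  # weighted average
    w = min(w_old + w_new, w_cap)                          # cap the weight
    return d, w

# A voxel observed at distance 0mm for 16 frames, after which the surface
# deforms to 5mm away: with a cap of 16, the voxel absorbs the change slowly.
d, w = 0.0, 0.0
for _ in range(16):
    d, w = tsdf_update(d, w, 0.0, w_cap=16)
for _ in range(4):
    d, w = tsdf_update(d, w, 5.0, w_cap=16)
print(round(d, 2), w)  # the voxel has drifted only part of the way toward 5mm
```

Rerunning the same loop with a lower cap (e.g. 4) makes the voxel converge toward the new surface in far fewer frames, which is the accuracy trade-off the table measures.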

Table 6.2 Average accuracy (Avg., given in mm), Standard Deviation (Std. Dev., given in mm) and Performance (Perf., given in FPS) results for each one of the tracking algorithms tested in the presence of each user deformation (Cheeks, Smile, Kiss, Cheeks-2, Smile-2, Kiss-2, Open Mouth, Angry, Bag). NRn: Non-Rigid Registration applied every n frames (independent of rigid tracking failure); NRAdaptive: Non-Rigid Registration applied whenever the rigid algorithm fails.
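The failure test that decides when the rigid algorithm has failed, and hence when NRAdaptive fires, can be sketched as a mean residual compared against the millimetre threshold. A point-to-plane residual is assumed here, as is common in KinectFusion-style ICP; the thesis metric may differ in detail, and all names and values are illustrative:

```python
# Sketch of a rigid-tracking failure test: average the point-to-plane
# residuals over matched model/frame point pairs and compare the mean
# against the millimetre threshold.  The point-to-plane metric is an
# assumption (usual for KinectFusion-style ICP), not confirmed thesis code.

def point_to_plane(model_pt, frame_pt, frame_normal):
    """Distance from the model point to the tangent plane at the frame point."""
    return abs(sum((f - m) * n for m, f, n in zip(model_pt, frame_pt, frame_normal)))

def rigid_tracking_failed(matches, threshold_mm):
    """True if the mean residual over all matched pairs exceeds the threshold."""
    residuals = [point_to_plane(m, f, n) for m, f, n in matches]
    return sum(residuals) / len(residuals) > threshold_mm

# Matched pairs 3mm apart along the surface normal (the z axis):
matches = [((0.0, 0.0, 0.0), (0.0, 0.0, 3.0), (0.0, 0.0, 1.0))] * 100
print(rigid_tracking_failed(matches, threshold_mm=2.5))  # deformation detected
print(rigid_tracking_failed(matches, threshold_mm=3.5))  # within rigid tolerance
```

A 3mm residual trips a 2.5mm threshold but not a 3.5mm one, which is exactly the sensitivity the threshold sweep in Table 6.3 explores.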

Figure 6.5 Color-coded cheeks tracking error measured for both rigid and non-rigid solutions (frames 8, 20, 50, 80, 101 and 143; color scale from 0mm to 10mm).

Figure 6.6 Cheeks-2 tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Table 6.3 Average accuracy (Avg., given in mm), Standard Deviation (Std. Dev., given in mm) and Performance (Perf., given in FPS) results for each one of the thresholds used to detect rigid tracking failure, for the user deformations Cheeks, Smile, Kiss, Cheeks-2, Smile-2 and Kiss-2.

From Table 6.2, it is visible that the NRAdaptive approach allows real-time performance

(above 20 FPS) for almost all the deformations, with the exception of the bag, an object with many more points sent for optimization than the human head, so the non-rigid registration runs slower in this scenario. It is worth mentioning that, in this case, the algorithm is not applied on almost every frame, as the 3D reference model is updated based on the present deformation, reducing the chances of rigid tracking failure in the next iterations. As can be seen in the plots of Figures 6.4, 6.6, 6.8, 6.10, 6.12, 6.14, 6.16, 6.18 and 6.20, the algorithm is applied 21 times (once every 8 frames, on average) for the cheeks deformation, 29 times (once every 8 frames) for cheeks-2, 66 times (once every 2.5 frames) for smile, 8 times (once every 15 frames) for smile-2, 16 times (once every 10 frames) for kiss, 25 times (once every 6 frames) for kiss-2, 73 times (once every 2.2 frames) for open mouth, 71 times (once every 2.2 frames) for angry and 80 times (once every 1.5 frames) for the bag deformation.

Figure 6.7 Color-coded cheeks-2 tracking error measured for both rigid and non-rigid solutions (frames 20, 32, 60, 100, 160 and 192; color scale from 0mm to 10mm).

Figure 6.8 Smile tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

A limitation of this adaptive algorithm is that it does not track non-rigid motions in which the 2D projections of the corresponding parts of the object are not close to each other. An example of this situation can be seen in Figure 6.22. Looking at the 2D position of the arms, if they undergo a large motion between sequential frames (Figures 6.22-A and

C), the projective data association matching algorithm will not match them, because their corresponding pixels are not close enough. In this case, the 3D rigid reference model is reconstructed from the user's body (Figure 6.22-B). As the user moves his arms in front of the sensor (Figure 6.22-C), they cannot be tracked properly due to the use of the projective data association algorithm, and the whole trajectory of the movement performed by the user is integrated into the 3D reference model (Figure 6.22-D). As stated before, our multi-frame adaptive non-rigid registration solution integrates the current depth data into the 3D reference model when deformation occurs. Therefore, when the algorithm cannot register the object's movement, its residual error is integrated into the reference model through the updating of the TSDF representation, which averages the current 3D reference model implicitly stored on the grid with the current depth data captured by the sensor. Because the 3D reference model's topology is not updated, the genus (i.e. the hole) that appears between the body and the arms as the arms open is not transferred to the 3D reference model. Even in this case, the adaptive approach produces better results than rigid registration alone (Figures 6.23 and 6.24).

Figure 6.9 Color-coded smile tracking error measured for both rigid and non-rigid solutions (frames 14, 22, 50, 94, 134 and 154; color scale from 0mm to 10mm).

Figure 6.10 Smile-2 tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.
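The projective data association step whose locality causes this limitation can be sketched as follows. The pinhole intrinsics, the 20mm rejection radius and the helper names are assumptions for illustration, not values from the thesis:

```python
import math

# Sketch of projective data association: a model point is paired only with
# the depth pixel it projects onto, so a part that moved far between frames
# projects away from its true correspondence and the match is rejected.
# Intrinsics and the rejection radius are illustrative assumptions.

def project(p, fx=525.0, fy=525.0, cx=320.0, cy=240.0):
    """Pinhole projection of a camera-space point (mm) to pixel coordinates."""
    x, y, z = p
    return round(fx * x / z + cx), round(fy * y / z + cy)

def associate(model_pt, frame_points, max_dist_mm=20.0):
    """Match a model point to the frame point stored at its own projection."""
    candidate = frame_points.get(project(model_pt))  # only one pixel is tried
    if candidate is None:
        return None                                  # projected onto empty space
    if math.dist(candidate, model_pt) > max_dist_mm:
        return None                                  # too far apart: rejected
    return candidate

# Depth frame with a single valid point at the image centre.
frame = {(320, 240): (0.0, 0.0, 1000.0)}
print(associate((0.0, 0.0, 1000.0), frame) is not None)  # small motion: matched
print(associate((200.0, 0.0, 1000.0), frame))            # large motion: lost
```

Because only the single pixel under the model point's own projection is examined, an arm that moved far between frames finds either an empty pixel or a candidate beyond the rejection radius, so its motion is never registered and ends up averaged into the TSDF as described above.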

Figure 6.11 Color-coded smile-2 tracking error measured for both rigid and non-rigid solutions (frames 30, 40, 60, 80, 100 and 112; color scale from 0mm to 10mm).

Figure 6.12 Kiss tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

6.3 SUMMARY

In this chapter we have evaluated the multi-frame adaptive non-rigid registration algorithm in a MAR environment. To validate our approach, tests were performed mainly using the user's face as a natural marker and the user's facial expressions as non-rigid interactions. From the tests conducted, we have shown that the non-rigid registration, applied in a multi-frame manner, is capable of running in real-time on consumer hardware. Moreover, it improves the tracking accuracy of the MAR environment when compared to the rigid-only solution or to other real-time non-rigid registration techniques, such as the ED algorithm.

Figure 6.13 Color-coded kiss tracking error measured for both rigid and non-rigid solutions (frames 18, 44, 82, 106, 134 and 150; color scale from 0mm to 10mm).

Figure 6.14 Kiss-2 tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.15 Color-coded kiss-2 tracking error measured for both rigid and non-rigid solutions (frames 15, 30, 60, 75, 90 and 102; color scale from 0mm to 10mm).

Figure 6.16 Open mouth tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.17 Color-coded open mouth tracking error measured for both rigid and non-rigid solutions (frames 20, 40, 60, 100, 120 and 140; color scale from 0mm to 10mm).

Figure 6.18 Angry tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.19 Color-coded angry tracking error measured for both rigid and non-rigid solutions (frames 12, 24, 40, 80, 120 and 140; color scale from 0mm to 10mm).

Figure 6.20 Bag tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.21 Color-coded bag tracking error measured for both rigid and non-rigid solutions (frames 42, 54, 62, 82, 94 and 110; color scale from 0mm to 10mm).

Figure 6.22 Limitation of the proposed method (panels A-D). The user's body (A) is reconstructed (B); the algorithm cannot track the user's arms (C), integrating all the movement into the 3D reference model (D).

Figure 6.23 Body tracking error (in mm) per frame, measured for both rigid and rigid + non-rigid solutions. Plot in red - rigid tracking. Plot in blue - non-rigid adaptive tracking. Dashed line - threshold.

Figure 6.24 Color-coded body tracking error measured for both rigid and non-rigid solutions (frames 50, 150, 300 and 500; color scale from 0mm to 10mm).


More information

Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza

Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza Lecture 10 Multi-view Stereo (3D Dense Reconstruction) Davide Scaramuzza REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time, ICRA 14, by Pizzoli, Forster, Scaramuzza [M. Pizzoli, C. Forster,

More information

Low Cost Motion Capture

Low Cost Motion Capture Low Cost Motion Capture R. Budiman M. Bennamoun D.Q. Huynh School of Computer Science and Software Engineering The University of Western Australia Crawley WA 6009 AUSTRALIA Email: budimr01@tartarus.uwa.edu.au,

More information

Connected Component Analysis and Change Detection for Images

Connected Component Analysis and Change Detection for Images Connected Component Analysis and Change Detection for Images Prasad S.Halgaonkar Department of Computer Engg, MITCOE Pune University, India Abstract Detection of the region of change in images of a particular

More information

CITS 4402 Computer Vision

CITS 4402 Computer Vision CITS 4402 Computer Vision Prof Ajmal Mian Lecture 12 3D Shape Analysis & Matching Overview of this lecture Revision of 3D shape acquisition techniques Representation of 3D data Applying 2D image techniques

More information

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images 1 Introduction - Steve Chuang and Eric Shan - Determining object orientation in images is a well-established topic

More information

PART IV: RS & the Kinect

PART IV: RS & the Kinect Computer Vision on Rolling Shutter Cameras PART IV: RS & the Kinect Per-Erik Forssén, Erik Ringaby, Johan Hedborg Computer Vision Laboratory Dept. of Electrical Engineering Linköping University Tutorial

More information

VIDEO FACE BEAUTIFICATION

VIDEO FACE BEAUTIFICATION VIDEO FACE BEAUTIFICATION Yajie Zhao 1, Xinyu Huang 2, Jizhou Gao 1, Alade Tokuta 2, Cha Zhang 3, Ruigang Yang 1 University of Kentucky 1 North Carolina Central University 2 Microsoft Research 3 Lexington,

More information

Programmable Shaders for Deformation Rendering

Programmable Shaders for Deformation Rendering Programmable Shaders for Deformation Rendering Carlos D. Correa, Deborah Silver Rutgers, The State University of New Jersey Motivation We present a different way of obtaining mesh deformation. Not a modeling,

More information

Face Tracking. Synonyms. Definition. Main Body Text. Amit K. Roy-Chowdhury and Yilei Xu. Facial Motion Estimation

Face Tracking. Synonyms. Definition. Main Body Text. Amit K. Roy-Chowdhury and Yilei Xu. Facial Motion Estimation Face Tracking Amit K. Roy-Chowdhury and Yilei Xu Department of Electrical Engineering, University of California, Riverside, CA 92521, USA {amitrc,yxu}@ee.ucr.edu Synonyms Facial Motion Estimation Definition

More information

CS 532: 3D Computer Vision 7 th Set of Notes

CS 532: 3D Computer Vision 7 th Set of Notes 1 CS 532: 3D Computer Vision 7 th Set of Notes Instructor: Philippos Mordohai Webpage: www.cs.stevens.edu/~mordohai E-mail: Philippos.Mordohai@stevens.edu Office: Lieb 215 Logistics No class on October

More information

3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller

3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller 3D Computer Vision Depth Cameras Prof. Didier Stricker Oliver Wasenmüller Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de

More information

Lecture 13 Theory of Registration. ch. 10 of Insight into Images edited by Terry Yoo, et al. Spring (CMU RI) : BioE 2630 (Pitt)

Lecture 13 Theory of Registration. ch. 10 of Insight into Images edited by Terry Yoo, et al. Spring (CMU RI) : BioE 2630 (Pitt) Lecture 13 Theory of Registration ch. 10 of Insight into Images edited by Terry Yoo, et al. Spring 2018 16-725 (CMU RI) : BioE 2630 (Pitt) Dr. John Galeotti The content of these slides by John Galeotti,

More information

A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods

A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.5, May 2009 181 A Hybrid Face Detection System using combination of Appearance-based and Feature-based methods Zahra Sadri

More information

IMPLEMENTATION OF THE CONTRAST ENHANCEMENT AND WEIGHTED GUIDED IMAGE FILTERING ALGORITHM FOR EDGE PRESERVATION FOR BETTER PERCEPTION

IMPLEMENTATION OF THE CONTRAST ENHANCEMENT AND WEIGHTED GUIDED IMAGE FILTERING ALGORITHM FOR EDGE PRESERVATION FOR BETTER PERCEPTION IMPLEMENTATION OF THE CONTRAST ENHANCEMENT AND WEIGHTED GUIDED IMAGE FILTERING ALGORITHM FOR EDGE PRESERVATION FOR BETTER PERCEPTION Chiruvella Suresh Assistant professor, Department of Electronics & Communication

More information

Dynamic Human Surface Reconstruction Using a Single Kinect

Dynamic Human Surface Reconstruction Using a Single Kinect 2013 13th International Conference on Computer-Aided Design and Computer Graphics Dynamic Human Surface Reconstruction Using a Single Kinect Ming Zeng Jiaxiang Zheng Xuan Cheng Bo Jiang Xinguo Liu Software

More information

Registration D.A. Forsyth, UIUC

Registration D.A. Forsyth, UIUC Registration D.A. Forsyth, UIUC Registration Place a geometric model in correspondence with an image could be 2D or 3D model up to some transformations possibly up to deformation Applications very important

More information

Three-Dimensional Sensors Lecture 6: Point-Cloud Registration

Three-Dimensional Sensors Lecture 6: Point-Cloud Registration Three-Dimensional Sensors Lecture 6: Point-Cloud Registration Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inria.fr http://perception.inrialpes.fr/ Point-Cloud Registration Methods Fuse data

More information

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching Stereo Matching Fundamental matrix Let p be a point in left image, p in right image l l Epipolar relation p maps to epipolar line l p maps to epipolar line l p p Epipolar mapping described by a 3x3 matrix

More information

Visual Recognition: Image Formation

Visual Recognition: Image Formation Visual Recognition: Image Formation Raquel Urtasun TTI Chicago Jan 5, 2012 Raquel Urtasun (TTI-C) Visual Recognition Jan 5, 2012 1 / 61 Today s lecture... Fundamentals of image formation You should know

More information

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation

Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation ÖGAI Journal 24/1 11 Colour Segmentation-based Computation of Dense Optical Flow with Application to Video Object Segmentation Michael Bleyer, Margrit Gelautz, Christoph Rhemann Vienna University of Technology

More information

2D vs. 3D Deformable Face Models: Representational Power, Construction, and Real-Time Fitting

2D vs. 3D Deformable Face Models: Representational Power, Construction, and Real-Time Fitting 2D vs. 3D Deformable Face Models: Representational Power, Construction, and Real-Time Fitting Iain Matthews, Jing Xiao, and Simon Baker The Robotics Institute, Carnegie Mellon University Epsom PAL, Epsom

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera

Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera Tomokazu Sato, Masayuki Kanbara and Naokazu Yokoya Graduate School of Information Science, Nara Institute

More information

Facial Expression Recognition Using Non-negative Matrix Factorization

Facial Expression Recognition Using Non-negative Matrix Factorization Facial Expression Recognition Using Non-negative Matrix Factorization Symeon Nikitidis, Anastasios Tefas and Ioannis Pitas Artificial Intelligence & Information Analysis Lab Department of Informatics Aristotle,

More information

Multimedia Technology CHAPTER 4. Video and Animation

Multimedia Technology CHAPTER 4. Video and Animation CHAPTER 4 Video and Animation - Both video and animation give us a sense of motion. They exploit some properties of human eye s ability of viewing pictures. - Motion video is the element of multimedia

More information

AUTOMATED 4 AXIS ADAYfIVE SCANNING WITH THE DIGIBOTICS LASER DIGITIZER

AUTOMATED 4 AXIS ADAYfIVE SCANNING WITH THE DIGIBOTICS LASER DIGITIZER AUTOMATED 4 AXIS ADAYfIVE SCANNING WITH THE DIGIBOTICS LASER DIGITIZER INTRODUCTION The DIGIBOT 3D Laser Digitizer is a high performance 3D input device which combines laser ranging technology, personal

More information

Structure from Motion. Prof. Marco Marcon

Structure from Motion. Prof. Marco Marcon Structure from Motion Prof. Marco Marcon Summing-up 2 Stereo is the most powerful clue for determining the structure of a scene Another important clue is the relative motion between the scene and (mono)

More information

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882

Building a Panorama. Matching features. Matching with Features. How do we build a panorama? Computational Photography, 6.882 Matching features Building a Panorama Computational Photography, 6.88 Prof. Bill Freeman April 11, 006 Image and shape descriptors: Harris corner detectors and SIFT features. Suggested readings: Mikolajczyk

More information

Fast and robust techniques for 3D/2D registration and photo blending on massive point clouds

Fast and robust techniques for 3D/2D registration and photo blending on massive point clouds www.crs4.it/vic/ vcg.isti.cnr.it/ Fast and robust techniques for 3D/2D registration and photo blending on massive point clouds R. Pintus, E. Gobbetti, M.Agus, R. Combet CRS4 Visual Computing M. Callieri

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Model-Based Human Motion Capture from Monocular Video Sequences

Model-Based Human Motion Capture from Monocular Video Sequences Model-Based Human Motion Capture from Monocular Video Sequences Jihun Park 1, Sangho Park 2, and J.K. Aggarwal 2 1 Department of Computer Engineering Hongik University Seoul, Korea jhpark@hongik.ac.kr

More information

Adaptive Multi-Stage 2D Image Motion Field Estimation

Adaptive Multi-Stage 2D Image Motion Field Estimation Adaptive Multi-Stage 2D Image Motion Field Estimation Ulrich Neumann and Suya You Computer Science Department Integrated Media Systems Center University of Southern California, CA 90089-0781 ABSRAC his

More information

Advanced Computer Graphics

Advanced Computer Graphics G22.2274 001, Fall 2009 Advanced Computer Graphics Project details and tools 1 Project Topics Computer Animation Geometric Modeling Computational Photography Image processing 2 Optimization All projects

More information