Eindhoven University of Technology MASTER. Canonical skeletons for shape matching. van Eede, M.C. Award date: 2008


Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain.

Take down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 23. Aug. 2018

Canonical Skeletons for Shape Matching

by Matthijs Christiaan van Eede

A thesis submitted in conformity with the requirements for the degree of Master's in Computer Science
Department of Mathematics and Computer Science
Technische Universiteit Eindhoven
June 2008

Supervisors: Alex Telea (TU/e), Sven Dickinson (UofT)

Abstract

Canonical Skeletons for Shape Matching
Matthijs Christiaan van Eede
Master's in Computer Science
Graduate Department of Mathematics and Computer Science
Technical University Eindhoven
2008

Skeletal representations of 2-D shapes, including shock graphs, have become increasingly popular for shape matching and object recognition. However, it is well known that skeletal structure can be unstable under minor boundary deformations, part articulations, and minor shape deformations (due to, for example, small changes in viewpoint). As a result, two very similar shapes may yield two significantly different skeletal representations which, in turn, will induce a large matching distance, which is undesirable for recognition. Such instability occurs at both external and internal branches of the skeleton. We present a framework for the structural simplification of a shape's skeleton which balances, in an optimization framework, the desire to reduce a skeleton's complexity by minimizing the number of branches, with the desire to maximize the skeleton's ability to accurately reconstruct the original shape. The result of the optimization is a canonical skeleton whose increased stability significantly improves recognition performance.

To my family.

Acknowledgements

I would like to express my gratitude to my supervisors, Sven Dickinson and Alex Telea, for their great support, guidance and dedication. I thank Diego Macrini for many fruitful discussions on skeletons, and for his help and support during the work on this thesis. The support and encouragement of my family, especially my mother, are a wonderful treasure for which I am most grateful. Thank you all! I thank my good friend and fellow computer scientist Fernando for his support, his thorough review of what follows, and many good discussions on computer vision related issues.

Contents

1 Introduction
    Object recognition
    Shape representations
        Introduction
        Boundary based approaches
        Region based approaches
        Skeletons
        Shock graphs
    Instabilities
        Sources of instabilities
    Summary
2 Related Work
    External branches
    Internal branches
    Summary
3 Skeletal Simplification
    Introduction
    Skeletal simplification as optimization
    External branch pruning
        3.3.1 Saliency measure
    Internal branch pruning
        Candidate internal branches
        Internal branch pruning
    Cost function
    Summary
4 Experiments and Results
    Experiments set-up
        Object recognition
        Pose estimation
    Experiments on a clean database
    Experiments on a noisy database
    Summary
5 Conclusions
    Limitations and further work
Bibliography

Chapter 1 Introduction

This chapter describes what object recognition is and where object recognition systems can be found. It discusses the different kinds of shape representations that can be employed for object recognition, in particular skeletons. Shock graphs, an abstraction of skeletons which allows for efficient indexing and matching of shapes, are presented next. Finally, the instabilities related to skeletons and shock graphs are introduced, followed by a summary stating the goal of this work.

1.1 Object recognition

Object recognition is one of the fundamental tasks in computer vision. Several problems in this field rely on the ability to determine the similarity between two objects or to find the closest match between a query object and the objects residing in a database. For example, one can think of visual surveillance in an airport. Additionally, object recognition systems are highly useful in 21st-century manufacturing assembly. A robot arm can use this ability to retrieve the right items from a conveyor belt. Another example would be the automated monitoring of a satellite docking to a space station, where it is essential that the satellite and space station docking apparatus are correctly identified. For humans, these sorts of tasks seem to be innate. As soon as we know what an

object of a certain class looks like, we are able to recognize other objects from that class, even when we have never seen them before. Remarkably, human recognition skills cope with large variability as well as severe occlusion.

In order to perform shape-based object recognition using a computer vision system, a suitable shape representation must be chosen. Using this shape representation, appropriate object features can then be extracted which will be used for matching in the object database.

1.2 Shape representations

This section analyzes boundary based and region based shape representations, focussing on skeletons, a shape representation that is both region based and boundary based. Several ways to build skeletons are included, and the Augmented Fast Marching Method [33], a simple and robust method for computing skeletons, is described in detail. Finally, shock graphs are introduced. These are an abstraction of skeletons which can be used to index and match shapes.

1.2.1 Introduction

Object recognition is used in many different areas. Sometimes the objective is to separate grass from sky, which can be done by looking at the color of pixels in an image. When a zebra is to be located, the black and white texture of its hide could be a salient feature. However, in general, the objects we are looking for will not all have distinct colors or textures, which is why a suitable shape representation is needed to capture a wide range of objects.

Many different representations of any given object exist. Their diversity is one of the indications that the answer to the question "Which is the best shape representation?" will likely depend on the task at hand. Marr et al. [15] propose a number of criteria that a shape representation should satisfy

in order to account for the efficiency with which the human visual system recognizes 3D objects.

Accessibility. The representation should be computed from the image using reasonable resources.
Scope. It should be able to represent a large class of shapes.
Uniqueness. It should provide a description that is unique from any point of view. Otherwise, at some point, the problem can arise whether two descriptions describe the same shape.
Stability. The representation should reflect the similarity between two similar shapes.
Sensitivity. The representation should preserve differences between shapes.

In addition to Marr's criteria, we define the following.

Hierarchy. The representation should allow for an easy extraction of parts. It is very intuitive to speak of shapes in terms of their parts. Searching for human shapes can be done by identifying that the shape has four relatively long extremities, representing the arms and legs, and an ellipsoid-like shape for the head, all of which are subordinate to a trunk. The matching problem can now be broken into two subproblems: matching parts, and testing for group membership.
Invariance. When identifying objects we often do not care whether the object is situated in the upper right or lower left corner, nor whether it is rotated in a certain way. In general we can say that the representation should be invariant to Euclidean transformations, where rotations become particularly important. The shape of an object remains unchanged when rotated along the normal to the plane

of view; however, when rotated, for instance, about an axis perpendicular to the viewing direction, the shape of the object can change drastically. Therefore, the representation should be invariant to translations and in-plane rotations of the shape.
Noise. It should be able to deal with noise, in that it should be possible to tell apart major and minor features of the shape.

As mentioned above, many different shape representations have been formulated, all having different strengths and weaknesses. Among these are two main classes of direct shape representation, one being boundary based and the other region based.

1.2.2 Boundary based approaches

Boundary based shape representations are descriptions of shape based on the contour or outline curve of the shape. Many such representations exist, two of which are described next: curves and point sets.

Curves

Curves [11, 34] belong to the class of boundary based representations, where shapes are represented by their outline curve. Typically, outline curves do not represent a notion of the interior of the shapes. Matching curves involves finding a mapping from one curve to another that minimizes an elastic performance functional, where the cost of the deformation is defined as the sum of stretching and bending energies [34]. Curve matching has computational efficiency as a major advantage, and this technique has been used in several applications such as handwritten character recognition, signature verification, prototype formation and morphing [23]. However, there are some fundamental limitations in using curves for general purpose object recognition. Because they typically do not represent the interior of the shapes, problems arise in shapes which are globally distinct,

but which have conflicting local curve-based features. They can also suffer from one or more of the following drawbacks: asymmetric treatment of two curves, lack of rotation and scaling invariance, and sensitivity to articulations and deformations of parts.

Point sets

The shape outline can also be modelled by point sets, where recognition can be performed by matching the point sets using an assignment algorithm. A point set is created by taking point samples from the shape contours without any particular order. No special points, such as landmarks or curvature extrema, are required. Belongie et al. [3] use the Hungarian method to match the boundary points, using a coarse histogram of the relative location of the remaining points as features. Point sets have the advantage of not requiring ordered boundary points, and the approach presented in [3] is simple and easy to apply. However, the coherence of shapes is not guaranteed to be preserved, in the sense that the relationship among portions of the shape in the process of matching can become ambiguous.

1.2.3 Region based approaches

Region based shape representations exploit the notion of sub-shape and symmetry. Descriptions that belong to this class are Euler's number, projections, eccentricity, elongatedness and moments. An advantage that region based approaches have with respect to boundary based ones is that they capture topological information about the shape as well as inter-point relations, such as (relative) distance. However, some region based methods can have large memory requirements.

1.2.4 Skeletons

A shape representation that combines the advantages of both boundary based and region based approaches is the skeleton, which aims to capture the part structure of a shape.

One of the first formal skeleton definitions is that of Blum [4]. To describe the skeleton, Blum gave a definition in terms of a grassfire front. The idea is to imagine that the shape is given as a perfectly dry and flat grass region surrounded by a wet area. Fire is set simultaneously to all boundary points and the front will start propagating inward at a constant speed. When opposing fronts meet, the fire will extinguish, forming quench points. This set of quench points constitutes the skeleton or the medial axis of the shape. If the time of formation of the quench points is stored, the resulting structure is called the medial axis transform (MAT), because this information allows us to regenerate the original shape by an inverse grassfire.

A more formal way to define the skeleton or MAT of a shape is through the loci of centers of all the maximal disks inscribed within the shape boundary. A disk is maximal when there is no other disk which strictly contains it and is completely inside the shape. The distance of any skeleton point $s \in S$ to its closest boundary points is given by the radius $R(s)$ of the maximal inscribed disk at $s$, and corresponds to the time of formation of the equivalent quench point in the grassfire analogy. The original shape can be reconstructed by the union of all disks of radius $R(s)$ centered at $s$.

Since the skeleton was first introduced, a large number of algorithms that compute the skeleton have been published. Three main ways to create the skeleton are morphological thinning, Voronoi diagram approaches and distance transform methods.

Morphological thinning

One of the properties of a skeleton is that it is topologically equivalent to the original shape. The number of connected components and holes is the same in both and there is a natural mapping between the components or holes in the shape and the components and holes in the skeleton.
Morphological thinning methods ensure that the result of skeletonization has this property. In these approaches, the boundary is iteratively peeled off layer by layer, identifying points whose removal does not affect the object's topology [35]. They are relatively straightforward to implement and have short execution times, but the produced skeletons are not guaranteed to be thin, and thinning approaches need intricate heuristics to ensure skeletal connectivity. Several thinning methods also fail to produce a skeleton in the maximal disk sense as proposed by Blum [4].

Voronoi diagram

A Voronoi diagram is a partitioning of the plane based on proximity to a discrete set of points. Kirkpatrick first observed that the skeleton of a polygonal shape is a subgraph of the Voronoi diagram of the shape's contour, i.e., the Voronoi diagram contains the boundary's medial axis [17, 18]. Brandt and Algazi [7] show that a good approximation of the skeleton can be obtained using a discrete point sampling of the shape's boundary. For a finite sampling, however, the Voronoi diagram will contain many edges that are not part of the skeleton and will need to be pruned. Voronoi diagram approaches compute connected skeletons, but they are fairly complex and computationally expensive, and their main problem is that pruning may yield different combinatorial structures.

Distance transform

A third approach uses the distance transform (DT) of the object's boundary. One of the definitions Blum gave for the skeleton was the loci of maximal disks inscribed in the shape. Computing the skeleton thus involves finding the centers of those disks, which can be identified with the observation that centers of maximal disks are local maxima of the DT. To compute the DT, Sethian [24] introduced a robust and simple-to-implement Fast Marching Method for the evolution of boundaries in the normal direction at constant speed. The evolving front can be seen as the fire front in the grassfire analogy, and the skeleton lies along the creases or curvature discontinuities of the DT. However, the detection of the curvature discontinuities of the DT is difficult.
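The two facts used in this section, that the DT value at a skeleton point is the radius of its maximal disk and that the shape is the union of those disks, can be illustrated with a small brute-force sketch. This is not the Fast Marching Method and not the thesis implementation: the function names `distance_transform` and `reconstruct`, the pixel-grid representation, and the strict "less than" disk test are all assumptions made for illustration.

```python
from math import hypot


def distance_transform(mask):
    """Brute-force DT: distance from each foreground pixel to the nearest
    background pixel. Assumes the mask has at least one background pixel."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols)
                  if not mask[r][c]]
    dt = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dt[r][c] = min(hypot(r - br, c - bc) for br, bc in background)
    return dt


def reconstruct(disks, rows, cols):
    """Inverse grassfire on a grid: union of the discrete disks given as
    ((row, col), radius) pairs. A pixel belongs to a disk when its distance
    to the centre is strictly smaller than the radius (illustrative choice)."""
    shape = set()
    for (cr, cc), radius in disks:
        for r in range(rows):
            for c in range(cols):
                if hypot(r - cr, c - cc) < radius:
                    shape.add((r, c))
    return shape


# A 5x5 square of foreground pixels inside a 7x7 grid.
square = [[1 <= r <= 5 and 1 <= c <= 5 for c in range(7)] for r in range(7)]
dt = distance_transform(square)
centre_disk = [((3, 3), dt[3][3])]
rebuilt = reconstruct(centre_disk, 7, 7)
```

For this tiny square a single disk at the DT maximum already covers every foreground pixel; general shapes need the disks of all skeleton points.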
Direct computation of these singularities is a numerically unstable process [6, 16, 27], and usually cannot guarantee connected and thin skeletons [20, 35]. Telea et al. [33] present a method that overcomes these difficulties, which is simple to implement, delivers connected skeletons, behaves robustly with respect to noisy boundaries and works in near real time.

Figure 1.1: Visualization of the DT field for a number of objects. Distances range from blue (close) to red (far away) with respect to the object boundary.

Augmented Fast Marching Method

The Augmented Fast Marching Method (AFMM) is a simple and robust method for computing skeletons for arbitrary planar objects proposed by Telea et al. [33] (for a mathematical description of the FMM, see [24, 25]). They integrate the notion that skeleton points are generated by the collapse of compact boundary segments during the FMM algorithm's front evolution. The FMM algorithm calculates a scalar field $T$ by solving the equation

$$\|\nabla T\| = 1 \tag{1.1}$$

(which is a form of the well-known Eikonal equation) with $T = 0$ on the object's boundary. The field T is a good approximation of the distances of points in the object to the object boundary. Figure 1.1 shows three color coded images where blue represents a small distance to the object boundary and red a large one. The FMM algorithm calculates the


field T outwards, starting with the set of smallest known T values. These are the values that lie on the object boundary, where T is trivially known to be zero. Starting with the boundary points, which are used as a narrow band or evolving front, the algorithm marches these points forward, i.e., towards points with unknown T values in the shape, freezing the computed T values and adding new points to the band. The resulting T field is a distance transform (DT) field which can be used to extract the skeleton of the object, as noted earlier.

Figure 1.2: The top row, (a) through (c), shows the visualization of the U values. The bottom row, (d) through (f), shows the thresholded field. The color of the skeleton points in the bottom row varies from blue to red, where red points have the highest value, meaning that the collapsed boundary length at these points is highest.

As a way to determine the importance of a skeleton point, one can use the length

of the boundary segment that collapsed into that point. This is what the augmentation to the FMM algorithm consists of. The points along the boundary are each given a monotonically increasing real value U, which reflects the distances between boundary points along the boundary. Due to front evolution, every pixel inside the object is marked with the U value of the boundary point that arrived there. The U value of a pixel that lies between initial boundary pixels is averaged between these values. However, if the U values around a pixel differ by more than 2, then the boundary points that account for this pixel were not neighbours, and instead of averaging the U values, the U value of this pixel's neighbour is propagated. Figure 1.2 (a) through (c) shows visualizations of this field of U values. Blue points have a small U value and red ones have a large U value.

Given this U field, one can obtain the skeleton by detecting significant discontinuities in the field. This is done by retaining all the points which have a difference in their U values greater than a given threshold (see Figure 1.2 (d) through (f)). This threshold can be used to determine the amount of accuracy the skeleton is required to have with respect to the original object (see Figure 1.3). In fact, the threshold expresses the collapsed boundary length for a given skeleton point. Thus, using 20 as the threshold means one is only interested in skeleton points that account for a collapsed boundary length of at least 20 pixels. We see in the figure that using higher values for the threshold results in a skeleton that represents less detail on the boundary.

The AFMM produces skeletons that are very similar to the ones delivered by more complex methods [33]. It is simple to implement and runs quickly and reliably on large 2D datasets. This is why the AFMM is used to compute the skeleton for the shapes in this work.
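The thresholding step described above can be sketched as follows, assuming the U field has already been computed by the front propagation. The function name `skeleton_from_u`, the choice of comparing each pixel only with its right and lower neighbour, and the wrap-around correction using the total boundary length are illustrative assumptions, not details taken from [33] or from this thesis.

```python
def skeleton_from_u(u, inside, threshold, boundary_length):
    """Return the set of (row, col) pixels where the U field jumps by more
    than `threshold` between 4-connected neighbours inside the object.

    u               -- 2-D list of U values (boundary arc-length labels)
    inside          -- 2-D list of booleans marking object pixels
    threshold       -- minimum collapsed boundary length to keep a point
    boundary_length -- total boundary length; used so that labels near the
                       start and the end of the boundary count as close
                       (an assumption of this sketch)
    """
    rows, cols = len(u), len(u[0])
    skeleton = set()
    for r in range(rows):
        for c in range(cols):
            if not inside[r][c]:
                continue
            # Compare with the right and the lower neighbour only, so each
            # neighbouring pair is inspected exactly once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols and inside[nr][nc]:
                    jump = abs(u[r][c] - u[nr][nc])
                    jump = min(jump, boundary_length - jump)
                    if jump > threshold:
                        skeleton.add((r, c))
    return skeleton


# Hand-made U field: the two upper rows were reached from one part of the
# boundary, the bottom row from a part that lies far away along the boundary.
u_field = [[0, 1, 2, 3],
           [0, 1, 2, 3],
           [30, 29, 28, 27]]
all_inside = [[True] * 4 for _ in range(3)]
coarse = skeleton_from_u(u_field, all_inside, threshold=20, boundary_length=100)
```

Raising the threshold keeps fewer points, which mirrors the observation that higher thresholds yield skeletons representing less boundary detail.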

Figure 1.3: A variety of values for thresholding the U field produces several skeletons (panels: b = 2, b = 20, b = 50, b = 100, b = 150). In light grey the original object is shown. The skeleton for a given threshold, along with the boundary represented by that skeleton, is shown in black. When using higher values for the threshold, branches that account for small detail on the boundary disappear, resulting in a smoother object boundary.

1.2.5 Shock graphs

The skeleton is a shape transformation that highlights the symmetries of the boundary. Hence, it does not provide an abstraction of the shape, but it can be used to compute such an abstraction. The main argument for using the skeleton for the task of recognizing objects is that the skeleton simplifies the decomposition of a shape into parts, providing the means to deal with partial occlusion. Additionally, skeletons are invariant to scaling, rotation and translation.

A shock graph is an abstraction of the skeleton which allows for efficient indexing and matching of shapes. It is a directed acyclic graph representing the decomposition of a skeleton into primitive parts.


The shock graph is inspired by Blum's classical work [4], in which he explored using directed graphs to define equivalence classes of shapes. The concept of a shock graph was defined by Siddiqi et al. [28] as an abstraction of the skeleton of a shape onto a directed acyclic graph (DAG). They recognize that skeleton points, or shocks, provide additional information apart from their radius to the object boundary, and introduce a coloring of the skeleton points based on the local variation of the radius function at each point. The colored description of the skeleton provides a much richer basis for recognition than when an unlabelled skeleton is used [30].

Figure 1.4: The four types of shocks. A type 1 shock describes protrusions. A type 2 shock occurs at a neck, and is followed by two type 1 shocks flowing away from it. A part with parallel sides is defined by a type 3 shock, and a type 4 occurs at a local maximum in the radius function.

The coloring of the skeleton points can be illustrated by simulating a walk along the skeleton. Type 1 shocks form a segment of skeleton points in which the radius function varies monotonically, as is the case for a protrusion. At a type 2 shock the radius function achieves a strict local minimum, i.e., it arises at a neck, and is immediately followed by two type 1 branches flowing away from it in opposite directions. Type 3 shocks belong to an interval of skeleton points in which the radius function is constant, i.e., for a part with

parallel sides. Finally, a type 4 arises when the radius function achieves a strict local maximum, as is the case when the boundary collapses to a single point. An example of each shock type is shown in Figure 1.4.

Shocks can be formalized as follows [30]. Let $X$ be the object's shape and $S$ its set of skeleton points. Let $N(s, \epsilon)$ be the set of skeleton points in $S \setminus \{s\}$ within a distance $\epsilon$ from skeleton point $s$, and let $N(s)$ be the set of the largest connected sets of points in $S \setminus \{s\}$. Let $R(s)$ be the radius function $R : S \to \mathbb{R}^+$. Let $l(s)$ be the shock labelling function $l : S \to \{1, 2, 3, 4\}$. Then $l(s)$ for $s \in S$ is defined as:

$$
l(s) =
\begin{cases}
4, & \text{if } \exists \epsilon > 0 \text{ s.t. } R(s) > R(s') \ \forall s' \in N(s, \epsilon), \\
3, & \text{if } \exists \epsilon > 0 \text{ s.t. } R(s) = R(s') \ \forall s' \in N(s, \epsilon) \text{ and } N(s, \epsilon) \neq \emptyset, \\
2, & \text{if } \exists \epsilon > 0 \text{ s.t. } R(s) < R(s') \ \forall s' \in N(s, \epsilon) \text{ and } N(s, \epsilon) \neq \emptyset \text{ and } N(s, \epsilon) \text{ is not connected}, \\
1, & \text{otherwise.}
\end{cases}
\tag{1.2}
$$

From this definition we can also see a relationship between the velocity $dR/ds$ and acceleration $d^2R/ds^2$ along the skeleton and the assignment of shock labels.

After identifying the existing shock types, shock groups are formed. They consist of skeleton segments in which all the shocks have the same label and belong to the same branch. Let $B_1, \ldots, B_n$ be the largest groups of connected shocks in $S$ such that $\forall s, s' \in B_i$, $l(s) = l(s')$ and $\forall s \in B_i$, either $|N(s)| \leq 2$ or, if $|N(s)| > 2$, then $s$ must be a terminal point of $B_i$, for $1 \leq i \leq n$. Let the group's label, $l(B_i)$, be the label of the shocks in $B_i$, and let the time of formation of the group, $t(B_i)$, be the interval $[\min_s R(s), \max_s R(s)]$ defined by the time of formation $R(s)$ of all $s \in B_i$.

The shock graph of a 2-D shape is a labelled graph $G = (V, E, \gamma)$ such that:

vertices $V = \{0, \ldots, n\}$, corresponding to the groups $B_1, \ldots, B_n$, and 0 denoting the root node;
edges $(i, j) \in E \subseteq V \times V$ directed from vertex $i$ to vertex $j$ if and only if $i \neq j$, and either ($i \neq 0 \wedge t(B_i) \geq t(B_j) \wedge B_i \cup B_j$ is a connected set of shocks) or ($i = 0 \wedge \forall k \neq 0, (k, j) \notin E$);
labels $\gamma : V \to \{\#, 1, 2, 3, 4, \Phi\}$ such that $\gamma(i) = \#$ if $i = 0$ and $\gamma(i) = l(B_i)$ otherwise.

Any shock graph can be described by a grammar consisting of the following symbols: {# (the start symbol), 1, 2, 3, 4, Φ (the terminal symbol)} and ten rewrite rules which can be seen in Figure 1.5. Examples of skeletons with their shock graphs can be seen in Figure 1.6.

Figure 1.5: The shock graph grammar, image from Macrini [14]. For details, please refer to [30].
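The labelling rule (1.2) and the grouping of equally labelled shocks can be illustrated on a single, unbranched skeleton branch that has been sampled into a list of radii. This is a simplified sketch and not the shock graph construction of [28, 30]: the function names `label_shocks` and `shock_groups` are hypothetical, the $\epsilon$-neighbourhood is replaced by the previous and next sample, radii are compared exactly (real data would need a tolerance), and branch points and graph edges are not handled.

```python
from itertools import groupby


def label_shocks(radii):
    """Label each sample of an unbranched branch with a shock type 1-4,
    using the previous and next sample as its neighbourhood."""
    labels = []
    n = len(radii)
    for i, r in enumerate(radii):
        neighbours = [radii[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if all(r > q for q in neighbours):
            labels.append(4)  # strict local maximum of the radius function
        elif neighbours and all(r == q for q in neighbours):
            labels.append(3)  # constant radius, e.g. parallel sides
        elif len(neighbours) == 2 and all(r < q for q in neighbours):
            labels.append(2)  # strict local minimum with two sides: a neck
        else:
            labels.append(1)  # radius varies monotonically: a protrusion
    return labels


def shock_groups(radii):
    """Group consecutive samples with equal labels and attach the time of
    formation interval (min radius, max radius) of each group."""
    groups = []
    start = 0
    for label, run in groupby(label_shocks(radii)):
        size = len(list(run))
        segment = radii[start:start + size]
        groups.append((label, (min(segment), max(segment))))
        start += size
    return groups


bump = [1, 2, 3, 2, 1]       # radius rises to a maximum and falls again
dumbbell = [3, 2, 1, 2, 3]   # two thick ends joined by a neck
bump_groups = shock_groups(bump)
```

Under these assumptions the dumbbell profile is labelled with a type 2 shock at the neck flanked by type 1 samples, in line with the description of type 2 shocks above.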


Figure 1.6: Figures from Siddiqi et al. [29]. The left shows two shapes with their medial axis colored (or labelled) according to the shock definition. The labels for the skeleton segments (shocks) are attributed with their shock type and identification number. The right shows the corresponding shock graphs.

1.3 Instabilities

Skeletons are a powerful representation of shape, and with the shock graph abstraction of skeletons, we have an efficient means for the indexing and matching of shapes. However, skeletons, and in turn their computed shock graphs, suffer from three forms of instability. The skeleton of a shape is defined by points on its boundary, and changes on the boundary result in an alteration of the skeleton. An instability occurs when a relatively small modification of a shape's boundary leads to a significant change in the underlying skeleton.

Figure 1.7: Instability related to small perturbations on the shape's boundary. On the left we see two very similar shapes, with the original shape on top and the shape with two small protrusions on the boundary on the bottom. The significant skeletal and shock graph changes do not proportionally reflect the minor changes in shape.

1.3.1 Sources of instabilities

This section describes the sources of instability in a structured way. First, contour noise is discussed, and then instabilities related to articulation of parts and instabilities caused by small viewpoint changes are described.

Contour noise

The first instability is caused by small perturbations on the shape's boundary. These can be small protrusions (bumps) or indentations (notches). The result of these kinds of artifacts is a set of spurious branches explaining the perturbations. An example of this is

illustrated in Figure 1.7. The upper row shows the original shape. The bottom row shows the same shape with two small protrusions added to the shape's boundary. The result of these protrusions is that the long branch delineating the blade of the knife is broken up into three parts by the two new branches depicting the bumps. In turn, this shows up in the corresponding shock graphs. Such a significant change in the skeleton, and consequently in the shock graph, does not proportionally reflect the small change in the shape's appearance and severely complicates the matching process.

Articulation instability

An articulation instability occurs when parts of an object articulate, and the visible structure remains, but the skeletal representation changes. This instability is caused by the so-called ligature branches, which were introduced by Blum [4]. Ligature branches are related to concave corners on the boundary of the shape and have very little boundary support, i.e., the position of only a few boundary points controls the position of these branches. Small changes in the position of these concave corners can have a great effect on the ligature branches related to them. Figure 1.8 presents two hand shapes. The difference between the two shapes is the articulation of the fingers, which also means an alteration in the concave corners between them. The internal structure of the skeleton has shifted around, and just as we saw with the first instability, the underlying shock graph is modified as a result.

Viewpoint instability

There are two forms of instability related to small viewpoint changes. In the first one, structure becomes visible as the object rotates, such as the handle of a cup. In this case new structure is created in the skeleton. In the second one, as was the case with the articulation instabilities, the visible structure remains, but the skeletal representation changes as the object rotates.
This is the case where the branches that represent the limbs of a horse attach to the torso in a different way as it rotates. An example of this is shown in Figure 1.9, where the object on the right side shows a rotated version of the object on the left side.

Figure 1.8: Instability related to ligature branches. Small changes in concave corners on the shape's boundary can lead to a different internal structure of the skeleton, which also affects the underlying shock graph.

Note the effect that the rotation has on the skeleton representing the objects. The internal structure differs where the branches of the front legs, and the main torso and head meet. This instability can be signalled by ligature branches as well, since the viewpoint change in the object caused the concavity between the front legs to change.

Figure 1.9: Viewpoint instability. A small change in the viewpoint of an object can alter the internal structure of the skeleton.

These instabilities pose a major obstacle to effective object recognition, in general, and generic object recognition, in particular, where representational invariance to part articulation, minor shape deformation, and minor changes in viewpoint is essential. If such changes in an object's shape induce major changes in its underlying skeletal (or shock) graph structure, the distance between two graphs (as computed by a skeletal graph matcher, such as [29, 22]) will not reflect the distance between the two shapes.

The difficulty associated with dealing with these instabilities varies. The ones caused by contour noise are the most common, but also the easiest to deal with. A number of researchers have proposed methods to alleviate this problem, some of which are described in Chapter 2. Viewpoint and articulation instabilities are much harder to deal with, because of the existing ambiguity related to the hierarchical structure of the object. However, some techniques have been proposed in order to address this issue, two of which are mentioned in Chapter 2. This work presents a new technique where both these instabilities and the contour instabilities are dealt with simultaneously. Finally, rotation instabilities that create or remove structure from the skeleton are extremely difficult to deal with, and the community is still looking for ways to represent this variability, particularly in the context of shape matching.

1.4 Summary

Skeletons are a powerful shape representation, and shock graphs, an abstraction of skeletons, can be used for efficient indexing and matching of shapes. However, it is well known that skeletal structures can be unstable under minor boundary deformation. Small protrusions on the boundary can cause significant skeletal changes that do not proportionally reflect the minor changes in the global shape.
Furthermore, ligature branches, which are related to concave corners on the boundary of the shape, are unstable under articulation of parts and, for example, under small viewpoint changes. As a result, two very similar shapes may yield two significantly different skeletal representations which, in turn, induce a large matching distance. Such instability occurs at both external and internal branches of the skeleton.

Our challenge is to eliminate these types of instability by structurally simplifying a shape's skeleton, so that non-salient branches, both internal and external, are removed, leaving a canonical skeleton that captures only the salient part structure of the shape. We present a framework for the structural simplification of a shape's skeleton which balances, in an optimization framework, the desire to reduce a skeleton's complexity by minimizing the number of branches, with the desire to maximize the skeleton's ability to accurately reconstruct the original shape. This optimization yields a canonical skeleton whose increased stability significantly improves recognition performance.

Chapter 2 Related Work

This chapter describes a number of methods that researchers have proposed to address the instabilities related to skeletons. The computer vision community has addressed the two instabilities separately, working both on the skeleton and on the shock graph side. First, approaches dealing with spurious external branches are considered. Then approaches that deal with spurious internal branches are described, and finally a technique that deals with both the external and the internal branches is outlined.

2.1 External branches

In the previous chapter we identified two instabilities that can occur in the skeleton: one related to protrusions on the shape's boundary, and the other related to ligature branches. Protrusions on the contour of the shape can create spurious external branches. A branch is external if it has exactly one terminal endpoint in the skeleton tree, and internal otherwise. This section describes a number of methods proposed to deal with spurious external branches.
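The external/internal distinction defined above can be illustrated with a short sketch. It assumes a hypothetical representation in which each branch is a pair of endpoint node names, and it treats a node as terminal when exactly one branch is attached to it; neither the function name nor the data layout comes from the thesis.

```python
from collections import Counter


def classify_branches(branches):
    """Label each branch of a skeleton tree as 'external' or 'internal'.

    branches: list of (u, v) pairs naming the two endpoint nodes of a branch
    (a hypothetical representation chosen for this sketch).
    """
    # A node is terminal when exactly one branch is attached to it.
    degree = Counter()
    for u, v in branches:
        degree[u] += 1
        degree[v] += 1

    labels = {}
    for u, v in branches:
        terminals = (degree[u] == 1) + (degree[v] == 1)
        # External: exactly one terminal endpoint; internal otherwise.
        labels[(u, v)] = "external" if terminals == 1 else "internal"
    return labels


# Toy hand-like skeleton: two junctions joined by one internal branch.
hand = [("tip1", "j1"), ("tip2", "j1"), ("j1", "j2"),
        ("tip3", "j2"), ("wrist", "j2")]
labels = classify_branches(hand)
```

In this toy tree only the branch between the two junctions is labelled internal, which is the kind of branch the later sections treat as a ligature candidate.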

Early algorithms for computing the discrete, pixel-sampled MAT tended to create skeletons that were extremely sensitive to boundary noise, spatial sampling rate, and small perturbations of the shape boundary. Several researchers, including Blum himself, proposed methods for computing the MAT robustly. The generalized MAT [4] considers only skeleton points $s$ with radius $R(s) > r_0$ for some threshold $r_0$; however, this may result in a disconnected skeleton. Another approach to dealing with spurious external branches is to use branch pruning or multiscale representations [8, 10, 13, 17, 18, 19, 33]. The idea behind multiscale representations is that, among the different levels of scale, one representation will be sufficiently close in structure to the representation in the database. Dill et al. [10] propose to smooth the shape and compute the MAT at different levels of smoothing. However, smoothing the shape can cause substantial changes in the structure of the skeleton, and determining the correct correspondences between the skeletons at distinct levels of smoothing becomes a difficult problem.

Figure 2.1: The chord residual, figure from [19]. The chord residual $R_H(e)$ of branch $e$ is given by the distance along the boundary between the generators of $e$, $p_A$ and $p_B$, minus the chord length between $p_A$ and $p_B$.

Instead of using several levels of smoothed shapes, Ogniewicz et al. [17] propose a

method that starts from the MAT of the original, non-smoothed shape. This method is based on Voronoi diagrams and uses the chord residual $R_H(e)$ of a branch $e$ in the Voronoi diagram to determine its saliency (see Figure 2.1). Branches in a Voronoi diagram are related to the so-called generators, i.e., sampled points on the boundary that were used to create the Voronoi diagram. Let $p_A$ and $p_B$ be the two generators for branch $e$. Then $w_{AB}$ denotes the minimal distance along the shape's boundary from $p_A$ to $p_B$, and $s_{AB}$ denotes the length of the chord $p_A p_B$. The chord residual of the branch is now given by:

$$R_H(e) = w_{AB} - s_{AB} \tag{2.1}$$

The chord residual of a branch gives a notion of saliency and can be used to order the branches of the skeleton according to different levels of importance. Figure 2.2 shows, on the left, the Voronoi skeleton of the object thresholded using the chord residual $R_H$ at threshold $T = 3.0$; in the middle, a 3D visualization of the residual values; and on the right, the multiscale skeleton that results from assigning significance levels to the branches.

Figure 2.2: The multi-scale skeleton, figure from [19]. In (a) the Voronoi skeleton is pruned with the chord residual, $T = 3.0$; (b) shows the height of the chord residuals for the different branches. Higher ridges correspond to higher-level branches in the hierarchy of the skeleton. (c) The result of assigning the different hierarchy levels to the branches.

Telea et al. [33] propose a similar multiscale approach for raster objects. As detailed

in Chapter 1, every skeleton point is attributed with the length of the collapsed boundary associated with that point. Branches can now be pruned using this metric as a global threshold to discard branches that are related to boundary detail up to a certain length $r_0$. Potentially, this threshold could be used to remove all branches that are associated with unwanted detail on the boundary. However, the size of the largest piece of noise that needs to be removed is not known a priori, which means that a fairly large threshold must be chosen. Since the threshold has a global effect on the skeleton, a high threshold will remove not only unwanted detail, but essential shape detail as well. Furthermore, these approaches work only for external branches.

Figure 2.3: The Instability of a Shape's Skeleton. Considering only skeleton points with radius greater than some threshold $r$ does not eliminate all spurious branches in the presence of bumps (a) or notches (b). Minor deformations in shape due to viewpoint change or articulation (c) and (d) may result in major changes in the topology of ligature segments (darkened) which, if spanning an entire branch, are called ligature branches. Changes in branch color reflect qualitative changes in the branch's radius function.

Spurious external branches occur when noise is added to the contour of a shape, as illustrated in Figure 2.3(a,b), where random bumps and notches are added to a hand shape. Overall, most methods encounter problems in eliminating spurious internal branches while retaining important descriptive branches.
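The chord residual of Equation 2.1, introduced earlier in this section, can be illustrated with a minimal sketch. It assumes the boundary is given as a closed polygon (a list of vertices) and that the two generators are vertices of that polygon, identified by index; the function name and this representation are choices made for the sketch, not part of the cited method.

```python
import math


def chord_residual(boundary, a, b):
    """Chord residual R_H = w_AB - s_AB for two generators on a closed polygon.

    boundary: list of (x, y) vertices of a closed polygon (assumed layout).
    a, b: indices of the generators p_A and p_B in `boundary`.
    """
    n = len(boundary)
    # Length of each boundary edge boundary[i] -> boundary[(i + 1) % n].
    edge = [math.dist(boundary[i], boundary[(i + 1) % n]) for i in range(n)]
    perimeter = sum(edge)

    lo, hi = sorted((a, b))
    one_way = sum(edge[lo:hi])                 # walk from lo to hi
    w_ab = min(one_way, perimeter - one_way)   # shorter of the two walks
    s_ab = math.dist(boundary[a], boundary[b]) # chord length
    return w_ab - s_ab


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
adjacent = chord_residual(square, 0, 1)   # generators on the same edge
opposite = chord_residual(square, 0, 2)   # generators on opposite corners
```

Generators that are close along the boundary give a residual near zero, so thresholding the residual discards branches that explain only a small stretch of boundary.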

2.2 Internal branches

Another source of skeletal instability has been studied by August et al. [1], who have shown that shape boundary concavities produce so-called ligature branch segments whose points are related only to the concave boundary points; when a ligature segment spans the entire branch, it is called a ligature branch [4, 1]. Small positional changes of such concavities can cause significant structural changes in ligature branches (see Fig. 2.3(c,d)), which ultimately give rise to significant differences in their corresponding shock graphs [28]. August et al. [2] show that the internal skeleton instabilities cannot be removed by boundary smoothing alone. Giblin and Kimia [12] have catalogued all the generic transitions of the medial axis and showed that the above types of MAT instabilities (one related to spurious external branches, the other to ligature branches) are the only cases where small boundary changes produce large representation (skeleton) changes.¹

August et al. [1] attempt to deal with the instability of the internal structure of the skeleton (the branching topology) by looking at the regions known as ligature. They view ligature points as the glue that connects parts together, e.g., fingers of a hand to its palm, and argue that this is the only information needed from ligature in the shape description. They do not touch the medial axis itself, but instead address the ligature instability in the corresponding shock graph. There, the shocks related to ligature are identified and collapsed, i.e., they are deleted, but the connection information contained in the ligature shocks is retained because the resulting shock graph is still connected by structural links where the ligature shocks used to be. This is illustrated in Figure 2.4. On the left we see the two shock graphs related to the two hand shapes from Figure 1.8. The ligature segments of the fingers are denoted by colored nodes in the

¹ In fact, these two forms of instability are characterized by transitions of the symmetry set [12]. Another form of instability addresses the movement of a branch point as a function of boundary deformation (see, for example, Bouix et al. [5]). We do not address this form of instability since its effect on skeletal structure is minimal.

graph. On the right we see the shock graph resulting from the two graphs on the left when the ligature nodes have been removed and replaced by structural links (the dashed lines). However, this method requires the robust detection of concave corners in order to identify the ligature, which is a challenging problem for discrete images.

Figure 2.4: Excluding nodes in the shock graph related to ligature. (a) The two shock graphs related to the hand shapes from Figure 1.8, and (b) the shock graph to which both graphs in (a) reduce when ligature nodes are deleted. The dashed lines represent structural links.

Telea et al. [32] address both the instability at the external branches and the instability related to the internal structure of the skeleton. They propose a principled framework that generates a simplified, abstracted skeleton hierarchy by analyzing the quasi-stable points of a Bayesian-inspired energy function. The resulting model is parameterized by both boundary and internal structure variations, corresponding to object scale and abstraction dimensions, and trades off reconstruction accuracy against representation parsimony. The hierarchical skeleton abstraction is formulated over a two-parameter family $(b, s)$ representing boundary $b$ and internal structure $s$, respectively. To accommodate the boundary simplification, the skeleton is computed using the AFMM method [33]. The

skeleton produced by this algorithm has just one threshold to vary the boundary detail it accounts for, which is based on the collapsed boundary length of each skeleton point. This threshold is used to control the boundary simplification $b$ of the objects. To account for the internal instability, the structural simplification $M(s)$ is parameterized by $s$. The neighbours of a point $x$ of the skeleton $M$ are denoted by $n(x, M) = n(x)$. Given a point $x^s$ in the simplified skeleton $M^s$, the corresponding point in the original skeleton $M^d$ is given by $c(x^s, M^s, M^d) = c(x^s)$. The set of all endpoints of $M$ is given by $e(M)$ and the two endpoints of the $i$-th branch $b_i$ of $M$ by $b_i(M)$.

When applying the structural simplification $M(s)$, only internal branches that are shorter than $s$ will be removed from $M$. Removing an internal branch disconnects the skeleton, generating two sub-skeletons. These sub-skeletons are reconnected by translating one of them to join the other. This translation displaces the points of the sub-skeleton and causes large changes to the object represented by the simplified skeleton compared to the original one. To minimize the impact of the structural simplification and preserve the consistency of the skeleton, the positions of the points in $M^s$ are optimized by minimizing the following cost function:

$$F(M^s \mid M^d) = \sum_{x^s \in M^s} \sum_{y^s \in n(x^s)} \left( \lVert x^s - y^s \rVert - \lVert x^d_{c(x^s)} - y^d_{c(y^s)} \rVert \right)^2 \tag{2.2}$$

with two hard constraints

$$x^s_i = x^d_{c(x^s_i)}, \quad c(x^s_i) \in e(M^d) \tag{2.3}$$

$$x^s_k = x^s_p, \quad \text{if } b_i(M^d) = \{c(x^s_k), c(x^s_p)\} \text{ and } b_i \text{ is eliminated} \tag{2.4}$$

The idea of the structural simplification is to decrease the complexity of the structure of the skeleton without drastically changing the meaning of the skeleton. Thus, the object represented by the skeleton should be as close to the original object as possible. The first constraint, given by Equation 2.3, enforces a boundary condition which keeps the

endpoints of the skeleton fixed. The second constraint, given by Equation 2.4, ensures that the removed branch will not reappear after the relaxation. A spring embedder approach is used to minimize the energy function under the given constraints. Figure 2.5 shows some results of the structural simplification proposed in [32] applied to the skeleton of a hand. In the top row only boundary simplification is used, and thus the instability at the internal branches remains. In the bottom row, both simplifications are applied. We see that in the latter case, the internal instability has been overcome. However, the method does not generate a true axis of symmetry, in the sense of the MAT definition, and thus cannot be directly used with existing skeleton-based shape matching techniques, such as [21, 29].

2.3 Summary

The instabilities in the skeleton occur at external and internal branches. Several methods have been proposed to prune spurious external branches. These work well in general, although the process of pruning external branches often has a global, as opposed to a local, effect on the skeleton. Instabilities related to ligature branches are much harder to deal with. The method proposed by August et al. [1] requires that concave corners are accurately identified, which is a challenging problem for discrete images.

Figure 2.5: Structural simplification as proposed in [32]. In the top row only boundary simplification is used, addressing the instability related to spurious external branches. In the bottom row both instabilities are tackled by applying both boundary and structural simplification. Panels: (a) b = 25, s = 0; (b) b = 50, s = 0; (c) b = 100, s = 0; (d) b = 25, s = 20; (e) b = 50, s = 50; (f) b = 100, s = 50.

Chapter 3 Skeletal Simplification

This chapter introduces an optimization framework for structural simplification that balances, on the one hand, our desire to abstract or simplify a shape's skeletal representation and, on the other hand, our desire to obtain a representation that corresponds to the original shape, i.e., a skeleton whose reconstruction error is minimal. Such a simplified skeleton leads to an underlying shock graph containing fewer nodes. As a consequence, the task of matching this less complex graph is simplified.

3.1 Introduction

The trade-off between abstraction (complexity) and faithfulness (reconstruction error) is task dependent. Do we want to recognize a particular instance of an object class, or any object of the class? Different applications require different levels of specificity. Still, the complexity of the skeleton and the faithfulness of the reconstruction provide a pair of opposing forces that enables us to converge on a canonical skeleton for shape matching. In order to do this, we prune those skeletal branches that do not contribute to the salient shape structure of the object. The questions that need to be answered, then, are how to measure the saliency of a branch and when to stop pruning branches. Our structural simplification procedure is divided into two stages, both of which

balance reconstruction error with branch complexity. The first stage removes unstable external branches, while the second stage removes unstable internal branches. Removing external branches first is motivated by the fact that an external branch may separate two internal branches that merge naturally once the external branch is removed, thereby simplifying the internal structure. Removing internal branches involves first identifying ligature branches, which become candidates for removal. Removing a candidate ligature branch requires modifying neighbouring branches, subject to all of them obeying the properties of a MAT while keeping the reconstruction error small.

3.2 Skeletal simplification as optimization

Skeletons can be simplified at different levels, and Figure 3.1 shows an example of this. The right column shows the corresponding shock graph for each shape on the left. It is clear that simplifying the skeleton, i.e., pruning branches from the skeleton, decreases the complexity of the shock graph. On the other hand, it can also be seen that shape detail is being lost. This is denoted by the dark grey areas, which we refer to as reconstruction error. For the proposed optimization strategy, both the decreasing complexity of the underlying shock graph and the amount of reconstruction error induced by pruning the skeleton are taken into account. The effect that pruning a branch has on the shock graph is related to the number of shocks the branch contains. This means that the gain of removing a branch, in terms of decreased graph complexity, can be computed locally for each branch. The cost of removing a branch with respect to reconstruction error is, however, dependent on the saliency of that feature within the original image, and therefore not so easily available. To this end, it is necessary to define the saliency of a branch, i.e., to define what the contribution of a branch means in terms of shape detail.

Figure 3.1: Different levels of structural simplification. On the left, three shapes with their skeletons can be seen, where an increasing amount of simplification is performed on the external edges. Reconstruction error (darker grey areas) increases with increased simplification. On the right, shock graphs corresponding to the skeletons on the left are shown. Shock graph complexity decreases as external branch simplification increases.

3.3 External branch pruning

A powerful property of skeletons is that they provide a part decomposition of a shape. Consider, for example, the bull in the middle row in Figure 3.1. The skeleton describing this shape shows branches for the legs, the tail, the torso, the neck and two branches

describing each horn. However, in the bull on the first row we see that small perturbations on the boundary of the bull's shape are being explained through spurious branches. These branches in fact account for only a very small portion of the shape and do not provide any global shape information. When pruning external branches from a skeleton, it would be desirable to start with these branches, i.e., those branches with small contributions to the shape of an object. To do this, we need a measure for the saliency of a branch, and we want the formulation to favor thick and elongated parts.

Figure 3.2: Saliency of a branch. The area of the shape that is solely related to the front leg (red branch) as well as the area related to the nose (blue branch) are depicted by the regions on the left. The coloring of the shape on the left is according to the normalized distance transform of each pixel, where a pixel on the skeleton is attributed a value of 1 (white) and a pixel on the boundary the value 0 (black).

Figure 3.2 shows the relationship between branches and the regions of the shape that are associated with them. At first glance, taking this area seems like a good measure for the saliency of that branch. However, the problem with using a simple area difference is that it fails to capture salient shape differences due to, for example, the removal of a long, thin part (e.g., the leg of a horse) whose area relative to the entire object is small, but whose contribution to salient part structure is large. This can be seen in Figure 3.2

where the regions associated with the nose and the leg of the horse appear to be similar, even though the leg represents an important global part of the shape, whereas the nose, in terms of part structure, is less significant.

3.3.1 Saliency measure

To account for part structure, we weight each pixel's contribution to the area by its normalized distance transform [31], in which the skeleton receives value 1 and the boundary receives value 0. In this manner, the skeleton of a long, thin part is weighted the same as that of a long, thick part, as are their respective boundaries. However, the larger area of the thick part results in a larger integration of normalized distance transform values, and hence in a larger reconstruction error. This way, it is possible to balance salient part structures using the weighted part mass, yielding an effective reconstruction error. In Figure 3.2 the normalized distance transform is visualized on the left. High-intensity pixels (white) represent the pixels with the highest reconstruction error, whereas the low-intensity pixels (black) contribute little to the reconstruction error.

It is now possible to identify candidate external branches for pruning by rank-ordering them with respect to their saliency. Branches that contribute less to the shape, and whose removal yields a small reconstruction error, are ranked higher than branches that contribute more to the shape, and whose removal yields a large reconstruction error. In the case of external branches, all external branches are considered as candidates for pruning. The result of the rank-ordering of the external branches of the horse is shown in Figure 3.3.
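The rank-ordering described above can be sketched as follows. The sketch assumes that the pixels solely associated with each external branch are already known, together with their normalized distance transform values; the idealized rectangular parts produced by `strip_weights`, and all function and part names, are hypothetical and serve only to produce plausible input.

```python
def strip_weights(length, half_width):
    """Normalized distance-transform values for an idealized straight part.

    Hypothetical toy model: the skeleton runs along the centre line (value 1)
    and the boundary lies half_width pixels to either side (value 0).
    """
    column = [1.0 - abs(t) / half_width
              for t in range(-half_width, half_width + 1)]
    return column * length


def branch_saliency(weights):
    """Weighted part mass: sum of normalized distance-transform values."""
    return sum(weights)


def rank_branches(parts):
    """Order branch names from least to most salient (pruning order).

    parts: dict mapping a branch name to the weights of its pixels.
    """
    return sorted(parts, key=lambda name: branch_saliency(parts[name]))


parts = {
    "leg": strip_weights(length=30, half_width=3),
    "bump": strip_weights(length=2, half_width=1),
    "ear": strip_weights(length=6, half_width=2),
}
order = rank_branches(parts)
```

The first entries of `order` are the branches whose removal would introduce the least weighted reconstruction error, so they are the first candidates for pruning.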

Figure 3.3: Rank-ordering of external branches. Branches that have a low reconstruction error associated with them are ranked before branches with a larger reconstruction error.

3.4 Internal branch pruning

When parts of an object articulate or the viewpoint of the object changes slightly, the internal structure of the skeleton representing that shape can change significantly. In Figure 3.4 we see two hand shapes whose fingers are articulated differently. The effect can be seen at the location where the branches of the fingers meet in the middle of the hand. The internal branches that cause this effect are called ligature branches. These are related to concave corners on the boundary and have very little boundary support. The position of concave corners can have significant effects on the underlying skeleton. A ligature branch can shorten and eventually disappear during boundary smoothing. If the smoothing continues, the ligature branch can reappear and produce a different internal structure from that of the initial version [2] (see Figure 3.5). The moment when a ligature branch is contracted represents a canonical state, one where different configurations meet.

Figure 3.4: Internal skeletal differences due to articulation of parts. The articulation of the fingers causes the internal structure of the skeleton to change.

Intuitively, it is desirable to create this canonical state (similar topologies) in the skeleton by pruning short (low saliency) ligature segments and branches. Since this instability is caused only by ligature branches, it is first necessary to identify these branches as candidates for pruning.

3.4.1 Candidate internal branches

To identify the ligature branches, the radius function of each internal branch is analyzed. Specifically, let a skeleton $S$ be a discrete, connected set of points in $\mathbb{N}^2$, and let the local neighbourhood of a discrete point be its 8-neighbourhood. The radii of a branch's skeleton points are approximated as a function of the cumulative piecewise linear distance, $d_i$, along the branch $\{s_i\}$ with endpoints $s_0$ and $s_n$, where $s_i = [x_i, y_i]$, for $0 \le i \le n$. This distance is given by $d_i = \sum_{k=0}^{i-1} \lVert s_{k+1} - s_k \rVert_2$, and the radius $\hat{R}(d_i)$ of the skeleton point $s_i$ at distance $d_i$ from $s_0$ is equal to $R(s_i)$. A least-squares fitting error is used for each line segment. Since outliers are not expected, an unweighted least-squares method provides a good approximation.

Figure 3.5: An example of the evolution of the skeleton under boundary smoothing [2]. Internal skeleton instabilities cannot be removed by simply applying boundary smoothing.

To compute the $n+1$ indices of endpoints for $n$ line segments that minimize the fitting error, we define the following function:

$$\hat{E}(n, i, k) = \begin{cases} \mathrm{LSF}(i, k) & \text{if } n = 1, \\ \min\limits_{i<j<k} \left\{ \hat{E}(\lceil n/2 \rceil, i, j) + \hat{E}(\lfloor n/2 \rfloor, j, k) \right\} & \text{otherwise;} \end{cases} \tag{3.1}$$

where $\mathrm{LSF}(i, k)$ is the line, and its associated error, that best fits, in the least-squares sense, the data between the endpoints indexed by $i$ and $k$, i.e., $\mathrm{LSF}(i, k) = e(m_{ik}, b_{ik})$, for $e(m, b) = \sum_{j=i}^{k} \left( \hat{R}(d_j) - (m d_j + b) \right)^2$ and $(m_{ik}, b_{ik}) = \arg\min_{m, b \in \mathbb{R}} \{ e(m, b) \}$. In turn, $\hat{E}(n, i, k)$ is the minimum error that can be achieved when fitting points $i$ to $k$ with $n$ segments. Note that the segments are constrained to be continuous on $s$ but not on $R(s)$. We implement the function $\hat{E}(n, i, k)$ using dynamic programming and use it to find

the smallest value of $n$ whose minimum error is smaller than half the number of skeleton points in a branch. This piecewise linear representation of the radius function of a skeleton branch allows us to identify the ligature segments within a branch. Since ligature segments are associated with concave boundary corners, it follows that they must start at a branch junction point, have decreasing radii, and end at the first abrupt change in the slope of $R(s)$ (see Figure 3.6).

Figure 3.6: Approximating a Branch's Radius Function for Ligature Segment Identification. (a) The radii of maximally inscribed circles rapidly decrease as we move toward the concave corner between the fingers. (b) We compute a piecewise linear approximation to the radius function.

3.4.2 Internal branch pruning

The identification of ligature segments allows us to reconnect a skeleton when removing internal branches without significantly affecting the original shape boundary. We locate the endpoints of a ligature segment within a branch by detecting significant accelerations in the branch's radius function, i.e., differences between the slopes of two adjacent line segments that exceed a threshold. Let $m_0$ and $m_1$ be the slopes of adjacent line segments with equal sign. We group together the points associated with these segments if

$$\frac{|m_0 - m_1|}{\max(|m_0|, |m_1|)} \le \tau_l,$$

where $\tau_l$ is the ligature segment threshold. Note that $\max(|m_0|, |m_1|) > 0$ because in this step we have only decreasing branches. Hence, the ligature detection does not depend on a precise detection of local concave boundary corners, but rather on a more robust, global measure of relative slope change.

In the proposed approach, ligature branches are not simply removed (as in [1]), but rather are marked as potential removals during the optimization procedure that balances reconstruction error with branch complexity. When a ligature branch is removed, the skeleton branches that were connected to it are re-attached in order to preserve skeleton connectedness, as illustrated in Figure 3.7. Consider the removal of the small ligature branch in Figure 3.7 (left) below the junction of the index and middle fingers. If the concavity between the index and middle finger were deepened, as shown in Figure 3.7 (middle), the small target ligature branch would eventually disappear. The proposed strategy, therefore, is to approximate this deepening of the concavity by modifying the branches adjacent to the ligature branch. To do this, only those branches attached to the smaller (in terms of radius) end of the ligature branch are altered; in Figure 3.7, this corresponds to the endpoint that leads to the ring finger. The target ligature branch is then removed, and the adjoining branches are modified to connect to the larger end of the removed ligature branch.
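The piecewise linear fit of Equation 3.1 and the relative slope test described above can be sketched as follows. This is a minimal sketch under several assumptions: the function names are hypothetical, the recursion is memoized with `functools.lru_cache` rather than tabulated, the split into $\lceil n/2 \rceil$ and $\lfloor n/2 \rfloor$ segments is taken as written in Equation 3.1, index ranges too short for the requested number of segments are given infinite error, and the slope test is only applied to pairs of decreasing segments.

```python
import math
from functools import lru_cache


def fit_segments(d, r, n):
    """Best fit of n line segments to the points (d[i], r[i]).

    Returns (error, segments), each segment being (start, end, slope).
    Adjacent segments share their breakpoint index.
    """
    def lsf(i, k):
        # Unweighted least-squares line through points i..k (inclusive).
        xs, ys = d[i:k + 1], r[i:k + 1]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        icpt = my - slope * mx
        err = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys))
        return err, slope

    @lru_cache(maxsize=None)
    def best(n, i, k):
        if k - i < n:                 # not enough points for n segments
            return math.inf, ()
        if n == 1:
            err, slope = lsf(i, k)
            return err, ((i, k, slope),)
        options = []
        for j in range(i + 1, k):     # breakpoint strictly between i and k
            e1, s1 = best(math.ceil(n / 2), i, j)
            e2, s2 = best(n // 2, j, k)
            options.append((e1 + e2, s1 + s2))
        return min(options, key=lambda option: option[0])

    return best(n, 0, len(d) - 1)


def approximate_radius(d, r):
    """Smallest n whose fitting error is below half the number of points."""
    n = 1
    while True:
        err, segments = fit_segments(d, r, n)
        if err < len(d) / 2 or n >= len(d) - 1:
            return segments
        n += 1


def same_ligature_group(m0, m1, tau):
    """Relative slope-change test for two adjacent decreasing segments."""
    if m0 >= 0 or m1 >= 0:            # assumed: only decreasing segments
        return False
    return abs(m0 - m1) / max(abs(m0), abs(m1)) <= tau


# Toy radius function: steep decrease, then an abrupt change of slope.
d = [float(i) for i in range(10)]
r = [20.0, 18.0, 16.0, 14.0, 12.0, 11.5, 11.0, 10.5, 10.0, 9.5]
segments = approximate_radius(d, r)
```

On this toy branch the fit places the breakpoint at the abrupt slope change, and the slope test with a threshold such as 0.5 keeps the two segments apart, which is how the end of a ligature segment would be detected.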
The branches to be modified may consist of both non-ligature and ligature segments, as shown in Figure 3.7 (left), where the index finger consists of a non-ligature segment (red) at its extremity and a ligature segment (brown) attached to the ligature branch to be removed. The first step in the adjoining branch modification is to replace the adjoining ligature segments with linear approximations from their smaller endpoints to

the larger endpoint of the removed ligature branch.¹ This effectively bridges the gap left by the removed ligature branch. However, the correct radius values must be assigned to the skeleton approximation. The new approximations to the two adjoining ligature segments of our target ligature branch may effectively deepen the concavity between the two fingers, and as a result, part of the ligature approximation may become non-ligature. For this portion of the ligature approximation, the radius values are assigned based on a linear extrapolation of the adjoining non-ligature segment's radius function. For the portion that remains a ligature segment, a linear interpolation between the two endpoints is used to assign the

Figure 3.7: Removing ligature branches: (left) original skeleton in preparation for removal of target ligature branch below index and middle fingers; (middle) slight deepening of the concavity between the two fingers. Notice that the skeleton no longer has the proposed target ligature branch; (right) overlay of the left and middle figures, motivating the proposed approximation method that replaces the adjoining ligature segments and the target ligature branch with straight-line approximations.

¹ Ligature segments correspond to maximal circles that share two (in the case of full ligature) boundary concavity points, or one (in the case of semi-ligature) boundary concavity point [1]. These constraints lead to ligature segments with low curvature, facilitating our straight-line approximation. This approximation can be improved by considering the constraints on the gradient of radius values of skeleton points, as defined by Damon [9].

remaining radius values.

Figure 3.8: Different levels of internal skeletal simplification, with no simplification (a), mild simplification (b) and strong simplification (c). When mild smoothing is applied to internal branches, the branches representing the fingers of the hand intuitively meet in a single point (b). Strong smoothing introduces additional reconstruction error (c).

Similar to pruning external branches, simplifying the internal skeletal structure introduces reconstruction error. In Figure 3.8 we see, from left to right, three hand shapes where an increasing amount of smoothing has been applied to the internal branches. In the unsmoothed skeleton, two small ligature branches connect the four branches representing the fingers. When mild internal smoothing is applied, these two ligature branches are pruned and the four branches now intuitively meet in a single point. Some reconstruction error can be seen between the two rightmost fingers. This is the effect of the changes to the concave corners which effectively remove the ligature branches. An even stronger smoothing prunes the remaining ligature branch, further reducing shock graph complexity, but the star-shaped skeleton fails to capture the more natural separation of the thumb and the other fingers. As can be seen when mild smoothing is applied, displacements of the concave corners result in an altered overall

49 Chapter 3. Skeletal Simplification 42 shape. Furthermore, additional reconstruction error appears on the right side of the hand, which is an artifact of the linear piece-wise fitting during the re-attachment of the branches after the ligature branch was removed. 3.5 Cost function As mentioned above, the proposed simplified skeleton balances reconstruction error with shape (branch) complexity, and during the branch selection process, both candidate internal and external branches for pruning are rank-ordered by increasing reconstruction error. The reconstruction error is area-based, and therefore reflects the area difference between the reconstructed shape from the skeleton with the branch and the reconstructed shape without it. The reconstruction error associated with a shape, as described above, is constructed by the weight each pixel contributes to the area by its normalized distance transform. This metric assigns a value of 1 to the skeleton, and the boundary receives value 0. Thus, the skeleton of a long, thin part is weighted similarly to that of a long, thick part. However, the larger area of the thick part results in a larger integration of normalized distance transform values, and hence in a larger reconstruction error. This way, we can balance the saliency of a part structure with its mass, yielding an effective reconstruction error. Specifically, for each shape point p, we associate the closest skeleton point s p S (See reference Figure 3.9): The reconstruction error of a point p, E(p), is now given by: s p = min s p (3.2) s S E(p) = 1 sp p R(s p ) R(s p ) (3.3) where R(s p ) is the radius of s p. The reconstruction error R(S) for a shape S with respect

50 Chapter 3. Skeletal Simplification 43 Figure 3.9: The importance of a pixel p in a shape is related to the distance towards the closest skeleton point S p and the radius of that skeleton point R(S p ). to the original shape S O is: R(S) = p S O S E(p) p S O E(p) (3.4) The cost function C(S) for a skeleton S with branch complexity B(S) and reconstruction error R(S) has the form: C(S) = B(S) + ωr(s), (3.5) where B(S) is the number of branches (nodes) of skeleton (graph) S. The constant ω weights the contribution of each term. A high ω strongly penalizes the reconstruction error which, in practice, yields no simplification at all. Decreasing ω puts less emphasis on exact reconstruction and favors skeletons with lower branch complexity, obtained by removing less salient external branches and internal ligature branches. The employed optimization procedure requires two stages. First, the cost function is optimized by considering external branch removal candidates only. Next, we fix the remaining external branches, and optimize in a second pass by considering only candidate internal branches. Figure 3.10 shows the results of simplifying the four skeletons shown in Figure 2.3, with ω = 600. Note that the four divergent skeletons in Figure 2.3 have converged toward a canonical skeleton structure that is invariant to noise and articulation.
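The quantities defined in this section can be illustrated on a small pixel grid. The following Python sketch is not the thesis implementation: it assumes that E(p) is one minus the distance to the closest skeleton point divided by that point's radius (matching the description of a value of 1 on the skeleton and 0 on the boundary), that the numerator of the reconstruction error sums over pixels of the original shape that the simplified shape no longer covers, and that weights are clipped to [0, 1]. The function names `point_weights`, `reconstruction_error` and `cost`, and the toy rectangle, are hypothetical.

```python
import numpy as np

def point_weights(shape_mask, skel_points, skel_radii):
    """Per-pixel weight E(p) = 1 - ||s_p - p|| / R(s_p); zero outside the shape."""
    pixels = np.argwhere(shape_mask)                      # (N, 2) array of (row, col)
    diffs = pixels[:, None, :] - skel_points[None, :, :]  # pixel-to-skeleton offsets
    dists = np.linalg.norm(diffs, axis=2)                 # (N, M) distances
    nearest = dists.argmin(axis=1)                        # index of closest skeleton point s_p
    d = dists[np.arange(len(pixels)), nearest]
    # Clipping is an assumption of this sketch, guarding against pixels
    # that lie farther from s_p than its radius.
    e = np.clip(1.0 - d / skel_radii[nearest], 0.0, 1.0)
    weights = np.zeros(shape_mask.shape, dtype=float)
    weights[tuple(pixels.T)] = e
    return weights

def reconstruction_error(weights, original_mask, simplified_mask):
    """Weighted area lost by the simplified shape, normalized by the original."""
    lost = original_mask & ~simplified_mask
    return weights[lost].sum() / weights[original_mask].sum()

def cost(num_branches, recon_error, omega):
    """C(S) = B(S) + omega * R(S)."""
    return num_branches + omega * recon_error

# Toy example: a 3x5 rectangle whose skeleton is the middle row, radius 2.
original = np.ones((3, 5), dtype=bool)
skel_points = np.array([[1.0, c] for c in range(5)])
skel_radii = np.full(5, 2.0)
weights = point_weights(original, skel_points, skel_radii)

simplified = original.copy()
simplified[:, 4] = False                                  # pruning loses the last column
r = reconstruction_error(weights, original, simplified)
c = cost(num_branches=3, recon_error=r, omega=600.0)
```

In this toy case the lost column carries a weight of 2 out of a total of 10, so the reconstruction error is 0.2; with ω = 600 the error term dominates the branch count, which illustrates why only branches with very small error are worth pruning at that setting.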


Figure 3.10: Structural simplification applied to the objects in Figure 2.3. Whereas several images with various levels of articulation and noise led to four different skeletal topologies in Figure 2.3, their structural simplifications are almost identical.

3.6 Summary

Skeletons, and in turn their shock graphs, suffer from instabilities related to both external and internal branches. We introduce a framework for the structural simplification of a shape's skeleton. First, external branches are rank-ordered with respect to increasing reconstruction error, and simplified in an optimization procedure that trades off skeletal simplicity and representation accuracy. Next, candidate internal branches are identified, and the same optimization procedure is applied. This optimization results in a canonical skeleton.
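The two-stage procedure summarized above can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the optimization used in the thesis: it assumes each candidate's reconstruction error is known in advance and is additive, that removing a candidate lowers the branch count by exactly one, and that a stage stops at the first candidate whose removal would not lower the cost C(S) = B(S) + ωR(S). The `Candidate` class, the function names and the numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    error: float      # assumed increase in R(S) if this branch is removed
    internal: bool    # True for internal (ligature) branches

def prune_stage(candidates, num_branches, recon_error, omega):
    """Greedily remove candidates, in order of increasing error, while cost drops."""
    removed = []
    for cand in sorted(candidates, key=lambda c: c.error):
        current = num_branches + omega * recon_error
        proposed = (num_branches - 1) + omega * (recon_error + cand.error)
        if proposed >= current:
            break     # candidates are sorted, so the rest cost at least as much
        num_branches -= 1
        recon_error += cand.error
        removed.append(cand.name)
    return removed, num_branches, recon_error

def simplify(candidates, num_branches, omega):
    """Stage 1: external branches only. Stage 2: internal branches only."""
    external = [c for c in candidates if not c.internal]
    internal = [c for c in candidates if c.internal]
    removed_ext, b, r = prune_stage(external, num_branches, 0.0, omega)
    removed_int, b, r = prune_stage(internal, b, r, omega)
    return removed_ext + removed_int, b, r

candidates = [
    Candidate("spur_a", 0.0005, internal=False),
    Candidate("spur_b", 0.0010, internal=False),
    Candidate("finger", 0.0500, internal=False),
    Candidate("ligature_1", 0.0012, internal=True),
    Candidate("ligature_2", 0.0040, internal=True),
]
removed, branches, error = simplify(candidates, num_branches=9, omega=600.0)
removed_strict, branches_strict, _ = simplify(candidates, num_branches=9, omega=10000.0)
```

Under these assumptions a removal pays off only when its error is below 1/ω. With ω = 600 the two spurs and the first ligature branch are removed while the salient finger branch is kept, and with ω = 10,000 nothing is removed.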

Chapter 4. Experiments and Results

This chapter describes the experiments that were performed to test our simplification framework. The experimental set-up is explained in terms of the objects that were used and the way in which object recognition and pose estimation performance is measured. Finally, the results of the tests are presented.

4.1 Experimental set-up

We have evaluated our framework on the tasks of object recognition and pose estimation. For our tests we used both a set of clean queries, serving as perfectly segmented objects, and a set of noisy queries, representing inaccurately segmented objects. Since we are interested in obtaining stable skeletons for improved object recognition, we adopt the shock graph (see Section 1.2.5) as an abstracted skeletal representation, allowing us to utilize a powerful shock graph matcher [14]. Our view-based 3-D object database consists of 120 views (8 objects at 15 views each). The views are computed from 3-D graphics models obtained from the public domain. Sample views from the database can be seen in Figure 4.1. For each of the silhouettes an initial skeleton is computed using the Augmented Fast Marching Method (AFMM) (See Section and [33]). The AFMM is chosen for the initial skeletonization of the objects because it runs quickly and reliably on large 2D datasets, and because it is simple to implement. Then a varying amount of simplification, ranging from no simplification to strong simplification, is applied to the skeleton and its shock graph is generated. Each shock graph is added to the model database.

Figure 4.1: Example views from the database without noise. The database contains 120 views (8 objects at 15 views each).

Object recognition

Using the object database containing 120 views, each view is removed from the database (without replacement) and used as a query against the remaining views. Matching a query starts with indexing it into the database to retrieve a small subset of candidates (using the indexing framework described in [26]). Then, the query is matched to each candidate to yield a distance, and the candidates are rank-ordered by increasing distance (decreasing similarity). The success of the object recognition task is measured according to the highest ranked view that is returned by the matcher. If the model of that view is the same as the model of the query view, object recognition is said to be successful.

Pose estimation

If recognition is successful and the closest candidate is a neighbouring view (on the viewsphere of the object) of the query, then pose estimation is also successful. The small changes in viewpoint between samples on the viewsphere can introduce significant changes in shock graph structure. Hence, we expect our structural simplification to reduce the structural changes between neighbouring views, leading to improved recognition and pose estimation performance.

4.2 Experiments on a clean database

In the first set of experiments, our framework is tested using the silhouettes obtained directly from the models. In practice this means that the database consists of perfectly segmented objects. The previous chapter introduced the cost function $C(S)$ for a skeleton $S$ with branch complexity $B(S)$ and reconstruction error $R(S)$:

$$C(S) = B(S) + \omega R(S) \qquad (4.1)$$

where $B(S)$ is the number of branches (nodes) of skeleton (graph) $S$. The constant $\omega$ weights the contribution of each term. We evaluated object recognition and pose estimation performance using different levels of simplification, i.e., we varied the weighting factor $\omega$ between tests. The results for both object recognition and pose estimation are shown in Figure 4.3. Each plot shows performance (% trials correct) as a function of the weighting parameter $\omega$, which varies from 10,000 (no smoothing) to 350 (maximum smoothing). The optimum recognition performance was achieved with $\omega = 600$. As can be seen from the two plots, structural simplification results in a 4% improvement in recognition performance and a 17% improvement in pose estimation performance. Note that overly large values of $\omega$ (very mild smoothing) can lead to structural inconsistencies across the views of an object, resulting in a dip in recognition performance. This can happen when the weighting factor $\omega$ is such that an internal branch is pruned for a particular view of an object, but not for its neighbouring view.

Figure 4.2: Example views from the databases where noise (bumps and notches) was added to the boundary of the objects. The random noise that was added to the shape's contour varied in size. The following notation is used: bumps with a radius of 5 pixels are denoted by b5, and notches with a radius of 3 pixels by n3. The objects above correspond to the different noisy databases as follows: bulls (b6,n3), dinos (n4), dogs (b5), eagles (b6), horses (b5,n4), kangaroos (n4), ladybug (b5), camels (b5,n4).
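The leave-one-out protocol used for object recognition and pose estimation can be sketched as follows. This is a minimal illustration, not the experimental code: the `View` record, the scalar `descriptor` with an absolute-difference distance (standing in for the shock graph matcher), and the neighbour test based on adjacent view indices are all assumptions made for the example, and the indexing step that preselects candidates is omitted.

```python
from typing import Callable, List, NamedTuple, Tuple

class View(NamedTuple):
    obj: str           # object (model) label
    index: int         # position of the view on the viewsphere (assumed ordering)
    descriptor: float  # hypothetical stand-in for a shock graph

def evaluate(views: List[View],
             distance: Callable[[View, View], float],
             are_neighbours: Callable[[View, View], bool]) -> Tuple[float, float]:
    """Return (recognition rate, pose estimation rate) under leave-one-out."""
    recognised = posed = 0
    for i, query in enumerate(views):
        # Remove the query and match it against all remaining views.
        others = [v for j, v in enumerate(views) if j != i]
        best = min(others, key=lambda v: distance(query, v))
        if best.obj == query.obj:
            recognised += 1
            # Pose estimation additionally requires a neighbouring view.
            if are_neighbours(query, best):
                posed += 1
    return recognised / len(views), posed / len(views)

def toy_distance(a: View, b: View) -> float:
    return abs(a.descriptor - b.descriptor)

def toy_neighbours(a: View, b: View) -> bool:
    return a.obj == b.obj and abs(a.index - b.index) == 1

views = [
    View("A", 0, 0.0), View("A", 1, 4.0), View("A", 2, 0.8),
    View("B", 0, 10.0), View("B", 1, 11.0), View("B", 2, 4.3),
]
recognition_rate, pose_rate = evaluate(views, toy_distance, toy_neighbours)
```

In this toy database four of the six queries retrieve a view of the same object, but only two of those retrievals are neighbouring views, so pose estimation succeeds less often than recognition, as the definitions above imply it must.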
