Contour Completion Around a Fixation Point



Toshiro Kubota
Mathematical Sciences, Susquehanna University, Selinsgrove PA 18970, USA

Abstract. The paper presents three edge grouping algorithms for finding a closed contour that starts from a selected edge point and encloses a fixation point. The algorithms search for a shortest simple cycle in a graph derived from an edge image, where a vertex is an end point of a contour fragment and an undirected arc is drawn between every pair of end-points whose visual angle from the fixation point is less than a threshold value (set to π/2 in our experiments). The first algorithm restricts the search space to shapes where no contour point seen from the fixation point is occluded by other contour points, and finds the shortest simple cycle. The second algorithm restricts the search space to shapes where the starting edge point neither occludes nor is occluded by other contour points, and finds a shortest simple cycle. The third algorithm is free from any constraints, but does not guarantee that the solution is a shortest cycle. It does, however, guarantee a solution no worse than that of the second algorithm. The paper demonstrates the effectiveness of these algorithms on a number of natural images. Finally, the paper proposes a way to automate the placement of the fixation point and the starting point so that the procedure runs in a fully automated manner.

1 Introduction

The goal of contour completion or edge grouping is to link a set of edges and form a salient closed contour. This is an important problem for many computer vision applications, as the solution can provide a hypothesis for an object present in the image. There are O(N!) possible realizations of contours, where N is the number of edge elements. Therefore, a brute-force method is computationally infeasible.
Past research has reduced the computational complexity by introducing constraints on the search space [1][2] and/or special optimality conditions [3][4][5]. A related problem is deriving a saliency measure for contour fragments, which need not be closed [6][7].

In this paper, we introduce a new constraint on the search space by incorporating visual fixation. The roles of visual fixation and its oculomotor behavior in feature grouping have been studied extensively, and many theories and hypotheses have been proposed [8][9][10][11]. A common view is that localized attention is needed to group multiple features. In this paper, we argue for one role it may play in contour completion. In particular, we show that by providing a fixation point in the image, visual features can be ordered by their directions from the fixation point, which in turn facilitates their grouping.

We assume that a visual fixation point is given and try to find a salient closed contour that surrounds the fixation point. A typical scene contains multiple salient contours, whose optimality ranks may depend on, for example, attention and context. With methods that seek solutions whose optimality is defined globally, incorporating such dynamic optimality criteria is not straightforward. With the proposed method, the optimality becomes a function of the fixation point. Thus, we can explore the visual scene in a manner driven by attention, context, and specific goals.

This approach is also useful for human-in-the-loop applications such as image editing [12][13] and measurement [14]. The user can manually click a point inside the region of interest, and the software can provide a set of closed contours found around that fixation point. This frees the user from laborious tracing of the object boundary.

Although color or appearance information is rich, essential for our perception, and can work jointly with contour information [15][16], we concentrate our study on edge information only. The reason is four-fold. First, our vision system can extract a great deal of information from an edge map, with a capability far exceeding that of current state-of-the-art edge grouping algorithms. Thus, there is still room to improve contour integration using edge information only; color and texture information can then be incorporated into the algorithm to improve performance further. Second, edges provide a sparse representation of the image. The amount of data required for an edge map is times less than the original RGB bitmap. Thus, an edge-based algorithm can be advantageous under stringent memory, computation, and/or communication bandwidth requirements.
Third, we can relate our study to various psycho-visual studies, which often treat edges and colors separately [17][18]. Fourth, we can make fair comparisons with many other edge grouping and saliency algorithms [19][3][20][7][5][21]. Incorporating color information makes comparisons of different algorithms difficult, as there are far more free parameters to consider.

We cast the contour integration problem as a graph search problem and ask the following question: find a shortest cycle that starts at a chosen vertex and encloses a chosen fixation point. We present three contour integration algorithms in this paper. The first one is based on our previous work [22]. It finds a shortest path forming a star shape, where every point on the contour is directly visible from the fixation point. It searches for a path in the forward-looking direction (or clockwise direction) only and finds a solution efficiently. The second algorithm extends the first by searching for a shortest path among more general shapes, where the start point is directly visible from the fixation point and does not occlude other contour points. The third algorithm extends the second by searching for a shortest path among generic shapes without any constraints. It improves on the solution of the second algorithm by extending the search space; however, it does not guarantee a shortest path in that search space.

Although a fixation point and a starting point must be provided, we argue that their placement can be automated. To support this claim, we present a simple method that places these points algorithmically.

The paper is organized as follows. Section 2 reviews relevant works. Section 3 formulates the edge grouping problem as a graph search problem. Section 4 presents the contour completion algorithms. Section 5 provides empirical results. Section 6 presents an algorithm that places a fixation point and starting points automatically, and shows empirical results of that algorithm. Section 7 discusses issues and future enhancements of the proposed algorithms. Section 8 concludes the paper.

2 Related Works

This section provides a non-exhaustive review of relevant works. We first describe edge grouping works, and then interactive segmentation works.

Edge Grouping

Various contour integration algorithms have been proposed in the past. Many of them formulate the problem on a graph and derive a solution via efficient graph search algorithms. A graph consists of a set of vertices and a set of arcs. (We use arc instead of edge for both undirected and directed graphs, to avoid confusion with edges of an image.) In this paper, a sequence of connected edge pixels is called a contour fragment.

In [3], the problem was formulated as a shortest path problem, with arc weights encoding tangential information of contour fragments and color information surrounding the fragments. Starting from a salient contour fragment, Dijkstra's algorithm was applied to find a shortest cycle.
Since the path cost based on their arc weights increases with the length of the contour, the method tends to extract short closed contours.

In [4], stochastic completion fields proposed in [23] and [21] were used to derive the transition probability between a pair of contour fragments, and the saliency of each transition was derived using the eigenvector of the transition matrix corresponding to the largest eigenvalue. Using the transition saliency, a sparse directed graph was constructed in which each vertex was a contour fragment, and a strongly connected component algorithm was used to partition the vertices. Each component represents a group of contours in the image. Since strongly connected components are disjoint, two contours derived from the algorithm cannot share fragments. Thus, the algorithm cannot handle multiple objects joined together in the edge map. It also fails to separate parts from a larger object.

In [5], elastica was used to define arc weights, and a minimum perfect matching was used to derive the closed contour with the smallest ratio of the total arc weight to the length of the contour. By using the ratio form, the algorithm avoided favoring short contours. The algorithm, however, was restricted to extracting only the most salient closed contour. Extracting secondary salient contours required suppressing arcs in the most salient solution.

In [24], symmetry information aided the grouping process. Symmetry is often a strong cue for man-made objects. However, it is difficult to incorporate into arc weights, as it is a non-local property. (In contrast, proximity and continuity are local properties.) The authors devised an ingenious way to incorporate symmetry by introducing symmetric trapezoids, derived from pairs of contour fragments, as grouping tokens. This work used the same graph search method as [5]. Hence, it faced the same difficulty in extracting secondary contours as [5].

In these algorithms, a saliency condition is encoded in the arc weights of the graph and is fixed. However, the condition often depends on the goal and focus of the system. A shape deemed most salient in one application may not be the most desired one in another. Even within the same application, the saliency criteria can change dynamically, for example, while parsing a scene for navigation. Many existing algorithms, including those mentioned above, provide no mechanism to change the focus or the saliency criteria in an intuitive manner. Compounding the fixed arc weight issue, the algorithms impose constraints on secondary solutions, which makes it difficult to provide multiple salient regions. An exception is the algorithm of [3], in which a user can select a starting point of the closed contour. However, the search is still driven toward a short contour.

In [22], a fixation point and a starting point were introduced, and the algorithm derived a cyclic path starting from the chosen starting point and surrounding the fixation point. The starting point gives the user precise control over where the solution begins. Together with the starting point, the fixation point also gives the user control over the size of the object.
The first step of the algorithm was to divide the 2π field of view from the fixation point into M equally spaced bins, and to place each edge pixel, except the one at the starting position, in one of the bins. Two additional bins were attached before and after the M bins, and the starting point was placed in both. Edge pixels in the same bin that were 8-connected in the edge image were aggregated into a super-edgel. For a super-edgel x in the ith bin, allowed transitions were restricted to super-edgels in bins i to i + m, where m > 0 was a parameter that controlled the size of a gap allowed in the solution. Hence, the maximum gap allowed was capped at (m/M)2π. Using this set-up, the algorithm found a shortest path from the starting point in the first bin to the duplicate of the starting point in the last bin. The first algorithm presented in this paper is closely related to this algorithm.

The work of [25] also treated a fixation point as a parameter of the segmentation algorithm. It transformed the edge image into polar coordinates and used the graph cut of [27] to separate inside and outside regions with respect to the fixation point. The graph was constructed by connecting four neighbors in the image grid of the polar domain and assigning weights that encode dissimilarity measures of the pixels.
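The angular binning and transition rule of [22] summarized above can be sketched in Python as follows. This is only an illustration under our own assumptions: the function names are hypothetical, angles are measured counter-clockwise from the direction of the starting point, and bins are indexed 1..M with 0 and M + 1 reserved for the two copies of the starting point. It is not the implementation of [22].

```python
import math


def angular_bin(p, o, start, M):
    """Index (1..M) of the angular bin holding pixel p, seen from fixation
    point o. Angles are measured counter-clockwise from the direction of the
    starting point (an assumed convention). Bins 0 and M + 1 are the two
    extra bins that hold the starting point itself."""
    ref = math.atan2(start[1] - o[1], start[0] - o[0])
    ang = (math.atan2(p[1] - o[1], p[0] - o[0]) - ref) % (2 * math.pi)
    # min() guards against float round-off pushing ang up to exactly 2*pi
    return 1 + min(int(ang / (2 * math.pi / M)), M - 1)


def transition_allowed(i, j, m):
    """A super-edgel in bin i may link only to super-edgels in bins i..i+m."""
    return i <= j <= i + m


def max_gap(m, M):
    """Largest angular gap a path may jump: (m / M) * 2*pi."""
    return m / M * 2 * math.pi
```

For example, with M = 8 bins and the starting point straight to the right of the fixation point, a pixel just above the negative x axis falls in bin 4, and m = 2 caps the gap at π/2.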

Interactive Segmentation

Although our future goal is to fully automate the process of successively extracting salient objects from an edge map, the current algorithms require a fixation point from the user. In this sense, they belong to the class of interactive segmentation methods.

A number of practical implementations of interactive segmentation have been developed recently. Most of them use color and edge information and separate the image into foreground and background, with user input in the form of a bounding box and/or markers specifying the foreground and background.

LazySnapping [26] accepted two sets of free-form lines from the user, one for the foreground and the other for the background. The color information of the marked pixels was used to formulate the graph weights, and the graph cut algorithm of [27] was used to perform figure-ground separation. To speed up the process, watershed segmentation was applied first to generate over-segmented super-pixels, which were used instead of individual pixels for the figure-ground separation.

GrabCut [13] accepted a bounding box enclosing the object of interest. It used Gaussian mixture models (GMMs) to characterize the color distributions of the foreground and the background, the latter initially set to the region around the bounding box, and the graph cut of [27] to separate them. The two processes (GMM fitting and graph cut) were run iteratively until convergence. After the iterative segmentation, the visual appearance of the result was improved by alpha blending around the segmentation border. The blending function was a spatially slowly varying soft step function, obtained by regularizing the alpha estimates of [28].

When using GrabCut, a user tends to draw a bounding box tightly around the object of interest. Thus, the work of Lempitsky et al.
[29] actively integrated this behavior into the segmentation algorithm by incorporating tightness into an optimization framework. This was done by adding constraints that the foreground had to intersect every path that went from one side of the bounding box to the other (a crossing path). The constrained optimization problem was relaxed into a series of LP problems, with constraints added incrementally, from no constraint toward the full set of crossing paths, by adding the paths that were missed by large margins in the previous iteration. The projection from the solution of the LP problems to the original integer problem was again done by the graph cut of [27].

These algorithms have proven effective in reducing the laborious user interaction needed to cut out an object from a complex natural image, and have found places in some commercial applications. However, the typical amount of user interaction is still prohibitively high. The algorithms intrinsically rely on color information and do not work well on gray-scale images, let alone edge images. Thus, they do not provide any insight into how humans perform such a segmentation task.

3 Formulation

Definitions

We introduce some definitions and notation that will facilitate our discussion. We use shape to describe a connected and closed chain of 2D points. We do not allow any holes. Hence, each shape is represented by one such chain of points, which we call a contour. The following mapping is called the visual angle of p from o:

$$\psi_o(p) = \arctan(p - o) \qquad (1)$$

where arctan(p) gives the angle formed by the vector from the origin to p and a reference coordinate. We assume that the range of the arc-tangent is [0, 2π). We will determine how the reference coordinate is oriented later. We assume that the angle increases in the counter-clockwise direction, but this orientation is arbitrary. Let ϕ_o(p, q) ∈ [0, π] be the angle formed by p-o-q.

We say that a shape is fully-visible from o if {ψ_o(p)}_p is [0, 2π) and the mapping is one-to-one. In other words, the visual angles of the shape cover 360 degrees (thus o is inside the shape) and every p on the shape has a unique visual angle. We say that a shape is partially-visible from o if the visual angles of the shape cover 360 degrees (thus o is inside the shape) and some points on the shape have unique visual angles. Obviously, if a shape is fully-visible from o, it is partially-visible from o.

Although our algorithms depend on o, it is informative to classify shapes in a view-independent manner. Such a classification gives us a hierarchical view of shapes in terms of their complexity. We say a shape is fully-visible if it is fully-visible from some point. Similarly, we say a shape is partially-visible if it is partially-visible from some point. However, every shape is partially-visible, since we can move o closer to some boundary point until that boundary point has a unique visual angle from o. A stronger condition is for a shape to be partially-visible from everywhere inside the shape. We call such a shape partially-visible everywhere. A fully-visible shape is also partially-visible everywhere.
This is because every line drawn from o, a point from which the shape is fully-visible, to any other point intersects the shape only once. Hence, every point inside the shape has a contour point with a unique visual angle. When a shape is fully-visible from a and b, it is fully-visible from any point between a and b. Thus, the set of points from which the shape is fully-visible is convex. A convex shape is fully-visible from anywhere inside the shape. Figure 1 summarizes this view-independent categorization of shapes. We think that most shapes in nature are partially-visible everywhere.

Figure 2 shows (a) a fully-visible shape, (b) a partially-visible everywhere shape, and (c) a partially-visible shape (but not partially-visible everywhere). In Figure 2(a), the shape is fully-visible from the location marked by the red dot, as we can connect every contour point to the red dot without touching another contour point. It is not fully-visible from the location marked by the blue dot, however, as the blue horizontal line drawn from that location intersects the shape more than once. Nevertheless, the existence of the red-dot point puts the shape into the fully-visible category. The area enclosed by the dashed line is the convex set of points from which the shape is fully-visible.

Fig. 1. A pictorial representation of the categorization of shapes (nested categories: convex, fully visible, partially visible everywhere, partially visible).

In Figure 2(b), the shape is partially-visible everywhere: from every point, we can draw a line that intersects the contour once, like the red line drawn from the red dot. It is not fully-visible: from every point, we can also draw a line that intersects the contour more than once, like the blue line drawn from the red dot. In Figure 2(c), the shape is partially-visible but not partially-visible everywhere. It is partially-visible from the location marked by the red dot. It is not partially-visible from the location marked by the blue dot: from that location, we cannot draw a line in any direction without intersecting the contour more than once.

Fig. 2. Examples of three shape types: (a) fully visible, (b) partially visible everywhere, (c) partially visible.

Pre-processing

This section describes the preparatory steps needed before applying a contour integration algorithm. First, we apply the Canny edge detector to the input image. Junctions and end-points of connected edge pixels are detected, and contour fragments are formed by tracing every end-point to either another end-point or a junction. We impose a maximum length of L pixels on contour fragments.

Hence, each contour fragment whose length exceeds L is split into multiple fragments. This splitting step reduces the risk of merging two contour fragments that belong to different objects in the image, and is not a critical part of the overall algorithm. We chose L = 10 in our experiments.

An undirected graph is constructed by treating each end-point of each contour fragment as a vertex. For a vertex u, we use ū to refer to the end-point represented by the vertex. Two inputs are provided: a fixation point (o) and an interesting point in the image. Given the interesting point, we find the closest edge pixel, split the fragment there, and designate one of the resulting end-points as the starting vertex, denoted s. Once we identify s, or equivalently s̄, we fix the reference coordinate of (1) such that ψ_o(s̄) = 0.

Every pair of vertices is connected with an undirected arc whose weight is computed as follows. Let u and v be the two vertices of an arc. If u and v are on the same contour fragment, the weight w(u, v) is set to 0. If they are on different contour fragments, w(u, v) is set to the square of the Euclidean distance between ū and v̄. We square the distance so that a contour containing one large gap is penalized more than a contour containing many small gaps. We use neither differential information, such as tangent and curvature, nor color information, to keep the pre-processing stage as elementary as possible.

Now we can state the problem as follows. Given o, s, and the fully connected graph constructed as above, find a shortest simple cycle starting from s that encloses o. We need to be precise about the meaning of enclosure. We say a cycle is a θ-enclosure of o if, for every adjacent pair of vertices u and v in the cycle that come from different contour fragments, ϕ_o(ū, v̄) is less than θ. In other words, every gap in the cycle subtends an angle less than θ as seen from o.
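The visual angle (1), the angle ϕ_o, the arc weights, and the θ rule for gaps can be sketched in Python as follows. The representation is our own assumption: each fragment is given as a pair of end-points, vertices 2k and 2k + 1 are the end-points of fragment k, and `math.atan2` plays the role of the [0, 2π) arc-tangent after a modulo. Following the θ-enclosure definition, which constrains only gaps, same-fragment arcs are kept regardless of angle; that choice, the default θ = π/2 (the value the abstract reports), and all function names are illustrative.

```python
import math
from itertools import combinations


def visual_angle(p, o, ref=0.0):
    """psi_o(p) in [0, 2*pi): direction of p - o, counter-clockwise from the
    reference direction `ref` (chosen in the text so that psi_o(s) = 0)."""
    return (math.atan2(p[1] - o[1], p[0] - o[0]) - ref) % (2 * math.pi)


def phi(p, q, o):
    """phi_o(p, q) in [0, pi]: the angle p-o-q."""
    diff = abs(visual_angle(p, o) - visual_angle(q, o))
    return min(diff, 2 * math.pi - diff)


def build_graph(fragments, o, theta=math.pi / 2):
    """Angularly annotated graph from contour fragments.

    fragments: list of (end_a, end_b) end-point pairs; vertices 2k and 2k + 1
    are the two end-points of fragment k. Returns {(u, v): w(u, v)} with u < v.
    """
    points = [pt for frag in fragments for pt in frag]
    arcs = {}
    for u, v in combinations(range(len(points)), 2):
        if u // 2 == v // 2:
            arcs[(u, v)] = 0.0                     # same fragment: no gap cost
        elif phi(points[u], points[v], o) < theta:  # keep only gaps below theta
            (x1, y1), (x2, y2) = points[u], points[v]
            arcs[(u, v)] = (x1 - x2) ** 2 + (y1 - y2) ** 2  # squared distance
    return arcs
```

With the fixation point at the origin, the fragments ((2, 0), (2, 1)) and ((1, 2), (−2, 1)) give four arcs: the two zero-cost fragment arcs plus gaps of weight 5 and 2; the two gaps toward (−2, 1) subtend more than π/2 and are dropped.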
We require θ < π so that the fixation point always lies physically inside the θ-enclosure when gaps are bridged by straight line segments. We can safely remove arcs whose visual angles are not less than θ, as these arcs cannot be part of a θ-enclosing cycle. We call the graph obtained after the arc removal the angularly annotated graph and denote it by G. The problem is well-defined, as there are a finite number of θ-enclosing cycles in G and each cycle has a finite path distance.

For each cycle in G, we can associate a shape by tracing each adjacent pair of vertices in the cycle, u and v, with a straight line segment connecting ū and v̄. If the associated shape is fully-visible from the fixation point, we call the cycle fully-visible. If it is partially-visible from the fixation point, we call the cycle partially-visible. We are not concerned with the precise trace of the contour fragment between ū and v̄, which may not be straight. As the length of such a trace is short (≤ L), the straight line segment is an accurate enough approximation of the contour fragment.

See Figure 3 for an illustration of these definitions. In (a), an edge image with hypothetical fixation and interesting points is shown. In (b), the set of vertices obtained from the edge image of (a) is shown after splitting of long contour fragments. Vertices from the same contour fragment are connected by a solid line. In the figure, α denotes ϕ_o(ū, v̄). When θ > α, a shortest θ-enclosing cycle in (b) is the one delineated with dashed lines. The half line starting from the fixation point and extending through the start vertex plays an important role in our algorithms, and is called the critical line. In (c), the corresponding angularly annotated graph with θ = π/2 is shown.

Fig. 3. Formulation of an angularly annotated graph from an edge image. (a) Edge image, with the locations of a fixation point, an interesting point, and the critical line. (b) Vertices; a pair of vertices from the same contour fragment is connected with a solid line, and the shortest θ-enclosing cycle with θ > α = ϕ_o(ū, v̄) is shown with dashed and solid lines. (c) The angularly annotated graph G with θ = π/2; arcs with visual angles not less than θ are removed from the fully connected graph.

4 Algorithms

In this section, we describe three contour integration algorithms, which we simply enumerate as Algorithms I, II, and III.

Algorithm I

We transform G into another graph Ĝ by taking the following steps. First, we add a target vertex, t, at the same location as s but with ψ_o(t̄) = 2π instead of 0. Second, we duplicate all arcs incident on s and make them incident on t instead of s. Finally, we remove the arc between u and v if |ψ_o(ū) − ψ_o(v̄)| ≥ θ. Note that arcs with |ψ_o(ū) − ψ_o(v̄)| ≥ θ are those that cross the critical line, where the angle jumps between 2π and 0; all other arcs satisfying this condition were already removed when G was constructed. Thus, this step eliminates all arcs, and only those arcs, that cross the critical line.
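The three steps that turn G into Ĝ can be sketched in Python as below. We assume, for illustration only, that G is a dict of undirected arcs {(u, v): weight} that already satisfies the θ rule, that visual angles are held in a separate dict with ψ(s) = 0, and that the new target vertex gets the hypothetical label "t".

```python
import math


def cut_at_critical_line(arcs, psi, s, theta, t="t"):
    """Build G-hat from the angularly annotated graph G.

    arcs:  {(u, v): weight}, undirected arcs of G (all with phi_o < theta)
    psi:   {vertex: visual angle in [0, 2*pi)}, with psi[s] == 0
    t:     label for the new target vertex (a hypothetical choice)
    Returns (arcs_hat, psi_hat).
    """
    psi_hat = dict(psi)
    psi_hat[t] = 2 * math.pi                 # t sits on s but at angle 2*pi
    candidates = dict(arcs)
    for (u, v), w in arcs.items():           # duplicate every arc incident on s
        if u == s:
            candidates[(t, v)] = w
        elif v == s:
            candidates[(u, t)] = w
    # Surviving arcs spanning |psi(u) - psi(v)| >= theta must cross the
    # critical line, so dropping them cuts G open along that line.
    arcs_hat = {(u, v): w for (u, v), w in candidates.items()
                if abs(psi_hat[u] - psi_hat[v]) < theta}
    return arcs_hat, psi_hat
```

For a vertex just left of the critical line (angle 0.3) and one just right of it (angle 2π − 0.3), only the arc from s to the first and the duplicated arc from t to the second survive; the arc between the two vertices crosses the critical line and is removed.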
Arcs removed include those that are incident on s from the right side of the critical line and those that are incident on t from the left side of the critical line.

We can visualize these steps as cutting G at the critical line, as shown in Figure 4, where (a) is the same G shown in Figure 3(c), (b) is Ĝ derived from it, and (c) is Ĝ unwrapped onto ψ-r axes, where ψ is the visual angle and r is the distance from the fixation point. The representation of Figure 4(c) suggests dynamic programming as a way to find a solution efficiently. More formally, we present our first algorithm in Algorithm I (Fig. 5). Indeed, a shortest path from s to t that moves only in the forward-looking direction can be found in O(|E|), where |E| is the number of arcs in Ĝ (or G).

Fig. 4. (a) G, the same graph as in Figure 3(c). (b) Ĝ, obtained by introducing t and removing arcs that cross the critical line. (c) Ĝ unwrapped to a ψ-r graph.

Input: Ĝ, s, t, o
Output: s ⇝ t, a path from s to t
  d(s) = 0; s_π = None;
  Sort vertices by their visual angles;
  foreach u ≠ s in the sorted order do
    d(u) = ∞;
    foreach v adjacent to u where ψ_o(v̄) < ψ_o(ū) do
      if d(v) + w(v, u) < d(u) then
        d(u) = d(v) + w(v, u); u_π = v;

Fig. 5. Algorithm I. d(u) and u_π are the distance and the predecessor of u, respectively.

A cycle found by Algorithm I is fully-visible, since the path visits vertices in strictly increasing order of their visual angles, from 0 to 2π. It is a shortest one among the fully-visible cycles that start from s. Since the path distance of a cycle is invariant to circular shifts of its vertices, Algorithm I finds a shortest fully-visible cycle that is incident on s.

Algorithm II

Algorithm I cannot handle partially-visible cycles, as it updates the vertices in the forward direction only. Instead of treating the problem as a dynamic programming one, we can treat it as a shortest-path problem, since every path from s to t in Ĝ is θ-enclosing. (See Figure 4(b) for a pictorial example.) We can apply a shortest path algorithm on Ĝ to find a shortest partially-visible cycle. This is essentially Algorithm II.

Ĝ may not contain every θ-enclosing cycle in G. The missing ones are those with arcs over the critical line. Therefore, a shortest θ-enclosing cycle in G is a shortest path in Ĝ if and only if the cycle does not cross the critical line more than once.

We can apply Dijkstra's algorithm to find a solution efficiently in O(|E| log |V|), where |V| is the number of vertices in G (or Ĝ). Another approach is to apply the dynamic programming idea of Algorithm I in both directions repeatedly. The advantage of this second approach is that it quickly extracts, in O(|E|), a rough sketch of the underlying object approximated by a fully-visible shape. Subsequent processes in a processing chain can start as soon as this approximate shape is extracted. The approach can be viewed as the Bellman-Ford algorithm with a specific visitation schedule (i.e., alternating forward and backward sweeps in terms of the visual angle). Thus, it still extracts the optimal path in Ĝ. More specifically, it takes O(|E|K) to find the shortest path, where K ≥ 1 is the number of monotonic runs of the visual angle as we trace the path (one more than the number of changes in the angular direction). A fully-visible shape has K = 1. The shape shown in Figure 6 has K = 5.

Fig. 6. A shape with K = 5. Left: the locations where the visual angle changes direction are marked by four tangent lines. Right: the same shape graphed as arc length vs. visual angle (0 to 2π); changes in the angular direction appear as changes in the sign of the slope.

Algorithm III

Algorithm II effectively extends the search space for a θ-enclosing cycle from fully-visible cycles to partially-visible ones.
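Before turning to that limitation, the two searches of Algorithms I and II described above can be sketched in Python. We assume Ĝ is given as a dict of undirected arcs {(u, v): weight} together with a dict of visual angles in which t has angle 2π; the function names are hypothetical. Algorithm II is shown with Dijkstra's algorithm from the standard-library `heapq` module (the alternating-sweep variant is not sketched), and a counter breaks ties so that vertex labels are never compared.

```python
import heapq
import itertools
import math
from collections import defaultdict


def _adjacency(arcs):
    """Adjacency lists for an undirected arc dict {(u, v): weight}."""
    adj = defaultdict(list)
    for (u, v), w in arcs.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def _trace(pred, s, t):
    """Follow predecessor links back from t to s."""
    path = [t]
    while path[-1] != s:
        path.append(pred[path[-1]])
    return path[::-1]


def algorithm_one(arcs, psi, s, t):
    """Forward-only dynamic programming over G-hat (cf. Fig. 5)."""
    adj = _adjacency(arcs)
    d, pred = {s: 0.0}, {s: None}
    for u in sorted(psi, key=psi.get):        # s has angle 0, t has 2*pi
        if u == s:
            continue
        d[u] = math.inf
        for v, w in adj[u]:
            # only predecessors with a strictly smaller visual angle
            if psi[v] < psi[u] and d[v] + w < d[u]:
                d[u], pred[u] = d[v] + w, v
    return (d[t], _trace(pred, s, t)) if d[t] < math.inf else (math.inf, None)


def algorithm_two(arcs, s, t):
    """Dijkstra on G-hat: the path may move in either angular direction."""
    adj = _adjacency(arcs)
    d, pred, done = {s: 0.0}, {s: None}, set()
    tie = itertools.count()                   # avoids comparing vertex labels
    heap = [(0.0, next(tie), s)]
    while heap:
        du, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            return du, _trace(pred, s, t)
        for v, w in adj[u]:
            if du + w < d.get(v, math.inf):
                d[v], pred[v] = du + w, u
                heapq.heappush(heap, (d[v], next(tie), v))
    return math.inf, None
```

Because it sorts the vertices, the Algorithm I sketch costs O(|V| log |V| + |E|) rather than O(|E|). On a graph where the cheap route steps backward in visual angle along a zero-weight fragment, `algorithm_one` is forced onto a costlier monotone path, while `algorithm_two` follows the backward step, mirroring the difference between fully-visible and partially-visible cycles.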
A restriction is that the cycle cannot cross the critical line more than once; in other words, s must have a unique visual angle across the derived shape. Thus, not every partially-visible cycle can be extracted by Algorithm II. Consider the shape shown in Figure 6, which is reproduced in Figure 7. This is a partially-visible everywhere shape, and the corresponding cycle in an angularly annotated graph is partially-visible from the fixation point o. Algorithm II will extract the cycle if s is placed as shown in Figure 7(a). However, it fails to do so if s is placed as shown in Figure 7(b), as the construction of Ĝ eliminates arcs in the cycle.

Figure 7 also indicates that Algorithm II can extract every partially-visible cycle if s is placed at a point with a unique visual angle from o. Thus, one way to alleviate the limitation may be to place s more intelligently. Such an approach can be acceptable in a user-interactive scenario. However, since our future goal is to automate the entire contour integration process, and it will be difficult to make that process responsible for intelligent placement of s, we instead explore a way to alleviate the limitation by extending the graph search algorithm. This subsection describes our third algorithm, which extends Algorithm II and circumvents its limitation to some extent.

Fig. 7. An example of a shape that is partially-visible from a fixation point o. (a) With this choice of s, Algorithm II will be able to extract the corresponding shape. (b) With this choice, Algorithm II will fail to extract the shape.

A quick investigation suggests two ways to extend Algorithm II. One is to keep arcs whose visual angles are less than θ regardless of whether they cross the critical line. This approach is flawed, as the search space then includes non-θ-enclosing paths from s to t. See Figure 8(a), where a set of blue contours forms a shortest cycle that is not θ-enclosing. The other approach is to extend Ĝ by circular replication of vertices. Denote the resulting graph G̃. In G̃, all paths from s to t are θ-enclosing. Along a path, the visual angle can become negative or exceed 2π. Such instances accommodate arcs crossing the critical line.
However, this approach is also flawed, as G̃ permits a path that visits the same contour fragment more than once, even though the path in G̃ is simple. The shape associated with such a cycle is therefore invalid. See Figures 8(b) and 8(c). The former shows a shape represented by extracted vertices. The latter shows G̃ derived from the shape of (b) in polar coordinates. The blue path in (c) is the shortest one from s to t and corresponds to the blue boundary shown in (b). The two arcs indicated by the two arrows in (c) correspond to the same contour fragment in (b).

Fig. 8. (a) Example 1. By allowing arcs crossing the critical line, a non-enclosing cycle, shown in blue, can be admitted. (b) Example 2. By replicating vertices circularly, a non-simple shape, shown in blue, can be admitted. (c) G̃. The path is actually simple in G̃; the two arrows point to arcs that correspond to the same contour fragment in (b).

As observed above, we cannot simply apply a shortest path algorithm to G̃, as it may result in a non-simple shape. Instead, we use a greedy approach to improve the solution of Algorithm II. The new algorithm (Algorithm III) first runs Algorithm II to find an optimum path in Ĝ. It then extends the graph from Ĝ to G̃, and replaces the existing path from s to t whenever a better alternative is found. Thus, the approach does not guarantee the shortest cycle in G̃, but does guarantee that the solution is no worse than that of Algorithm II.

Before describing the details, some definitions and notation are in order. We use P to denote the current path from s to t. Let Ṽ be the vertex set of G̃. We call u ∈ Ṽ \ {s, t} a replica of v if u ≠ v but ū = v̄. We call u dormant if one of its replicas is currently in P. The dormant set of G̃ is the set of dormant vertices, which we denote by D. A path from u to v is consistent if no pair of vertices on the path are replicas of each other. The exclusion of s and t from the replica definition is a minor technicality: since s̄ = t̄, including them would make P inconsistent.

Roughly speaking, Algorithm III extends the shortest path tree found by Algorithm II to one in G̃, excluding all vertices in the dormant set. When we find a shorter path from s to t, we check whether it is consistent. If so, we replace the existing path with the better alternative.
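The replica, dormancy, and consistency definitions above can be made concrete with a small Python sketch. We assume, hypothetically, that a dict `base` maps each vertex of G̃ to the end-point it stands for, so ū = v̄ becomes `base[u] == base[v]`; the function names are our own. The pair {s, t} is exempted exactly as in the definition, while any other vertex sharing s̄ still counts as a replica of s.

```python
from itertools import combinations


def are_replicas(u, v, base, s, t):
    """u and v are replicas: distinct vertices of G-tilde standing for the same
    end-point. The pair {s, t} is exempt because s and t share a location."""
    return u != v and base[u] == base[v] and {u, v} != {s, t}


def is_consistent(path, base, s, t):
    """A path is consistent if no two of its vertices are replicas."""
    return not any(are_replicas(u, v, base, s, t)
                   for u, v in combinations(path, 2))


def dormant_set(path, vertices, base, s, t):
    """D: vertices having a replica on the current path P."""
    return {u for u in vertices
            if any(are_replicas(u, v, base, s, t) for v in path)}
```

For instance, if "a+" is a circular replica of "a", then any path through both "a" and "a+" is inconsistent, and while "a" lies on P, "a+" is dormant. The pairwise check is quadratic in the path length; a set of used end-points would do the same job in linear time.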
Consistency is an absolute requirement for the resulting shape to be valid. Why, then, do we consider both dormancy and consistency? We use dormancy to prevent many inconsistent branches from forming, which helps consistent branches grow and reach P. Algorithm III is shown in Fig. 9. As before, d(u) is the current path distance of u from s, w(u, v) is the weight of arc (u, v), and v.π is the parent of v in the tree. Line 8 checks whether the update will alter the current P; we allow such an update only if s ⇝ u is consistent (Lines 9 and 10). Since dormant vertices are excluded, s ⇝ u being consistent implies that s ⇝ u → v ⇝ t is consistent. When the dormant set is updated in Line 11, we need to set d of all descendants of each vertex in the new dormant set to ∞ to prevent any illegal path from forming from those descendants.

Input: Ĝ, s, t
Output: s ⇝ t, a path from s to t
1  Apply Algorithm II on Ĝ
2  Extend Ĝ to G
3  Find D, the dormant set of vertices given P = s ⇝ t
4  repeat
5    foreach (u, v) in G do
6      if u ∉ D and v ∉ D then
7        if d(u) + w(u, v) < d(v) then
8          if u ∉ P and v ∈ P then
9            if s ⇝ u is NOT consistent then
10             continue
11           Update D given P = s ⇝ u → v ⇝ t
12         v.π = u
13         d(v) = d(u) + w(u, v)
14 until there is no change
Fig. 9. Algorithm III

Note that the algorithm is guaranteed to terminate: the cost of P = s ⇝ t can be reduced only a finite number of times, and for a fixed P the algorithm is the Bellman-Ford algorithm, which terminates in O(|V||E|). The number of possible updates of P is bounded above by O(2^|V|), although the actual number is much smaller. With a tree data structure that maintains the shortest path tree, the dormant set can be maintained in O(|V|), which does not contribute to the overall complexity.

Since the algorithm is greedy, it may find only a locally optimal solution. This can happen when the result of Algorithm II uses a replica of a vertex in an optimal path. For such an example, see Figure 10, where (a) shows contour fragments, with thick ones delineating the optimal solution, and (b) shows the result of Algorithm II. In (b), the fragment indicated by the arrow has a visual angle between 3π/2 and 2π, while the same fragment appears in (a) with an angle between −π/2 and 0. This means that a vertex of the fragment in (b) (call it x) is a replica of the corresponding vertex in (a) (call it y). While x is in P, y remains dormant, so Algorithm III cannot use y to improve the current solution. It would first need to remove x from P so that y leaves the dormant set; however, this step will likely increase the path cost.
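The termination argument rests on the fact that, for a fixed P, the loop of Fig. 9 is Bellman-Ford relaxation on G with the dormant vertices removed. The minimal Python sketch below covers only that inner loop. It omits the P-related logic of Lines 8 to 11 (the consistency test and the dormant-set update), assumes non-negative arc weights, and uses the hypothetical names `bellman_ford_excluding` and `trace`.

```python
import math
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple

Arc = Tuple[Hashable, Hashable, float]  # (u, v, w(u, v)); list undirected arcs both ways


def bellman_ford_excluding(vertices: Iterable[Hashable], arcs: List[Arc],
                           s: Hashable, dormant: Set[Hashable]):
    """Bellman-Ford from s on the subgraph that omits every dormant vertex.

    Assumes non-negative arc weights, so the repeat-until-no-change loop terminates.
    """
    d: Dict[Hashable, float] = {v: math.inf for v in vertices}
    parent: Dict[Hashable, Optional[Hashable]] = {v: None for v in d}
    d[s] = 0.0
    changed = True
    while changed:                          # repeat ... until there is no change
        changed = False
        for u, v, w in arcs:                # foreach (u, v) in G
            if u in dormant or v in dormant:
                continue                    # dormant vertices are excluded
            if d[u] + w < d[v]:             # relaxation: v.π = u, d(v) = d(u) + w(u, v)
                d[v] = d[u] + w
                parent[v] = u
                changed = True
    return d, parent


def trace(parent: Dict[Hashable, Optional[Hashable]], v: Hashable) -> List[Hashable]:
    """Follow parent pointers from v back to the root and return the root-to-v path."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]
```

Making a vertex dormant can only lengthen or cut off paths through it. In the usage test, for instance, marking the vertex c dormant forces the route s, a, b, t instead of the shorter s, c, t.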

[Figure 10 panels: (a) Contour Fragments; (b) Algorithm II Solution]

Fig. 10. (a) An example graph. Thick lines show the optimal θ-enclosing cycle. (b) The solution after Algorithm II. The contour fragment indicated by the arrow has an angle in [3π/2, 2π], while the optimal configuration of (a) has the same fragment in [−π/2, 0].

Experiments

The algorithms were tested on 692 pairs of a fixation point and a starting point, placed manually, on 520 images, of which 300 were taken from the Berkeley Segmentation Dataset and 220 were collected from other sources. For each image, the Canny edge detector in MATLAB with the default settings was applied, connected edge pixels were sequenced, and those contour fragments that were less than pixels were removed. The set of contour fragments obtained by these steps, a fixation point, and a starting point were the inputs to the algorithms. We used L = 10 and θ = π/2. The number of vertices in the resulting graph ranged from to 2666 with the average of For comparison, we used the implementation of the Ratio-Contour algorithm of [30], available at [31]. Note that this algorithm does not take a fixation point and start point pair as our algorithms do. It does, however, use the original gray-scale image in addition to the Canny edge map, to incorporate region information. In addition, we implemented an Elder-Zucker type approach that uses a shortest path tree on G to find a shortest cycle. A direct application of the Elder-Zucker algorithm [3] is not appropriate here, however. That algorithm relies on arc weights that are carefully set using tangential and edge polarity information to prevent short cycles from forming.
Even with such an elaborate scheme, their results showed that the algorithm is biased toward short cycles. When we applied it directly to our graph, with the start vertex as the root of the shortest path tree, the result was almost always a self loop connecting the two end points of the starting fragment. Instead, we computed shortest path trees in two phases: one rooted at the start vertex, and the other rooted at an intermediate vertex. For the second phase, the shortest path from the start vertex to the intermediate vertex found in the first phase was removed. The resulting cycle was the concatenation of the shortest path from the start vertex to the intermediate vertex in the first phase and the shortest path from the intermediate vertex back to the start vertex in the second phase. We tried multiple vertices as candidates for the intermediate vertex and chose the one that resulted in the smallest path distance. The candidate vertices were chosen from those lying on the opposite side of the fixation point from the start vertex. More specifically, a vertex u was considered a candidate if $\psi_o(u, s) > \pi - \eta/2$, where η = 0.4 in our experiments. Figure 11 illustrates this method. There are two candidates, u and v; panels (b) and (c) show the paths through u and v, respectively. We select v as the intermediate vertex because the path cost through v is smaller than that through u, which contains a large gap.

[Figure 11 panels (a), (b), (c): each shows the fixation point o, the start vertex s, the wedge angle η, and the candidate vertices u and v]

Fig. 11. An illustration of the Elder-Zucker type algorithm. (a) The original graph, where o is the fixation point, s is the start vertex, and u and v are two candidate vertices. A vertex is considered a candidate if it lies within a wedge (the shaded area) centered at o with angle η along the line directly away from s. (b) A solution derived from u by concatenating a shortest path from s to u (green) and a shortest path from u to s (blue), where all arcs in the green path were excluded. (c) A solution derived from v.

Table 1 and Table 2 summarize the performance in terms of path cost and computation time in seconds, respectively. The path cost for Ratio-Contour is not given since it uses a different optimization measure, namely the ratio of the path cost to the arc length of the contour. The computation time does not include the time taken by Canny edge detection and sequencing of edge pixels; however, it does include construction of the angularly annotated graph. The data were collected on a PC with Windows 7, a 2.40 GHz Intel Core2 Quad CPU, and 4 GB of memory.
The three proposed algorithms and the Elder-Zucker type approach were written in C++ and built with Visual Studio. The Ratio-Contour algorithm was written in MATLAB and C/C++; the C/C++ portion was built with the GNU compiler at the O2 optimization level. All programs were run on a single thread. Among the three proposed algorithms, Algorithm I always has the largest d and the smallest t, Algorithm II always has the second largest d and the second largest t, and Algorithm III always has the smallest d and the largest t. On average, Algorithm II took about 50% more time than Algorithm I, and Algorithm III took about 6 times as long as Algorithm II. In the experiment, G is expanded to cover the angular range from −π to 3π. On average, the path cost of the Elder-Zucker type approach is smaller than that of Algorithm III. This is expected, as its search space is less constrained than those of Algorithms I, II, and III, which require θ-enclosure.

Table 1. Comparison of four methods in terms of path distance: Max, Mean, and Min for Elder-Zucker, Algorithm I, Algorithm II, and Algorithm III.

Table 2. Comparison of five methods in terms of computation time in seconds: Max, Mean, and Min for Ratio-Contour, Elder-Zucker, Algorithm I, Algorithm II, and Algorithm III.

Figures 12 and 13 show some results from the experiment. A small circle marks the fixation point, a small cross marks the starting point, and a contour shows the result of the corresponding algorithm. They are color coded so that multiple results can appear when multiple sets of fixation and start points are provided on the same image. No such markers are shown in the Ratio-Contour results, as that method does not use them. Instead, we sequentially extracted as many cycles as there are fixation points by eliminating arcs used in detected cycles from the graph [30]. Ratio-Contour tends to favor a large cycle, as it finds the cycle with the smallest ratio cost. The Elder-Zucker type approach provides tight contours, often identical to those of Algorithms II and III. Since it does not guarantee a cycle enclosing the fixation point, however, its outcome can be unpredictable, as seen in a few cases here. Algorithm I delineates fully-visible shapes and thus cannot follow highly articulated structures. Algorithm II tends to delineate a tighter boundary than Algorithm I. There are some cases where it failed to extract an accurate contour because the contour crossed the critical line more than once.
Algorithm III tends to delineate a tighter shape than Algorithm II, and in some cases it succeeded in extracting an accurate contour where Algorithm II failed.
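The modified Elder-Zucker type baseline described earlier in this section can be sketched in Python as follows. This is an illustrative reconstruction, not the code used in the experiments. It assumes that ψ_o(u, s) is the visual angle between u and s seen from o. It also assumes that G is given as an undirected adjacency list with vertex coordinates, and it uses Dijkstra's algorithm (via `heapq`) for each shortest path tree. All function names are hypothetical.

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, List, Tuple

Point = Tuple[float, float]
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # undirected adjacency lists


def visual_angle(o: Point, a: Point, b: Point) -> float:
    """Angle in [0, π] between the rays from the fixation point o to a and to b."""
    ang = abs(math.atan2(a[1] - o[1], a[0] - o[0]) - math.atan2(b[1] - o[1], b[0] - o[0]))
    return min(ang, 2 * math.pi - ang)


def is_candidate(o: Point, u: Point, s: Point, eta: float = 0.4) -> bool:
    """u lies in the wedge of angle eta centred on the ray from o directly away from s."""
    return visual_angle(o, u, s) > math.pi - eta / 2


def dijkstra(graph: Graph, src, banned=frozenset()):
    """Shortest path tree from src, skipping undirected arcs listed in `banned`."""
    dist, parent = {src: 0.0}, {src: None}
    tie = itertools.count()                 # tie-breaker so vertices are never compared
    heap, done = [(0.0, next(tie), src)], set()
    while heap:
        du, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            if frozenset((u, v)) in banned:
                continue
            if du + w < dist.get(v, math.inf):
                dist[v], parent[v] = du + w, u
                heapq.heappush(heap, (du + w, next(tie), v))
    return dist, parent


def trace(parent, v) -> list:
    """Root-to-v path obtained by following parent pointers."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


def elder_zucker_type(graph: Graph, coords: Dict[Hashable, Point], o: Point, s, eta: float = 0.4):
    """Return (cost, cycle) of the best two-phase cycle over all candidate vertices."""
    d1, p1 = dijkstra(graph, s)                       # phase 1: tree rooted at s
    best = (math.inf, [])
    for u in graph:
        if u == s or u not in d1 or not is_candidate(o, coords[u], coords[s], eta):
            continue
        path1 = trace(p1, u)                          # s -> u
        banned = {frozenset(e) for e in zip(path1, path1[1:])}
        d2, p2 = dijkstra(graph, u, banned)           # phase 2: rooted at u, path1's arcs removed
        if s in d2 and d1[u] + d2[s] < best[0]:
            best = (d1[u] + d2[s], path1 + trace(p2, s)[1:])
    return best
```

Only arcs of the phase-1 path are removed, following the caption of Fig. 11. Phase 2 may therefore still revisit phase-1 vertices, which reflects the lack of guarantees noted for this baseline.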

Fig. 12. Contour integration results. Shown are, from left to right, a Canny edge image, the result of Ratio-Contour [30], the result of the modified Elder-Zucker type approach, the result of Algorithm I, the result of Algorithm II, and the result of Algorithm III. In each image, a hollow circle is the fixation point and a cross mark is the starting point.

Fig. 13. Contour integration results. Shown are, from left to right, a Canny edge image, the result of Ratio-Contour [30], the result of the modified Elder-Zucker type approach, the result of Algorithm I, the result of Algorithm II, and the result of Algorithm III. In each image, a hollow circle is the fixation point and a cross mark is the starting point.

Fully automated approach

Recently, various interactive foreground-background separation algorithms have been developed. Typically, these algorithms require user input in the form of a bounding box that encloses the foreground of interest. The bounding box can be specified fully by two opposite corners. Thus, the amount of information required by these algorithms is comparable to what our proposed algorithms require. However, we claim that bounding box based algorithms are highly sensitive to the input, for at least two reasons. First, the algorithms often treat the bounding box as a hard constraint and extract a foreground only within the box. The algorithm of [29] goes even further and assumes that the bounding box is tightly drawn around the foreground. Second, bounding box approaches are region based. As such, they use color and texture information extensively, so a slight change to the bounding box alters the input information and influences the final outcome in an unpredictable manner. By contrast, our algorithms are less sensitive to the user input, and the points can be placed loosely without affecting the outcome. Consider a convex shape. The fixation point can be placed anywhere inside the shape and the starting point anywhere in the vicinity of the contour; all three algorithms will be able to extract the shape. For a fully-visible shape, the set of points from which the shape is fully visible forms a convex region. The fixation point can be placed anywhere inside this convex region, and the starting point anywhere in the vicinity of the object boundary. Again, all three algorithms will be able to extract the shape. For a partially-visible shape, the fixation point can be anywhere inside the object and the start point anywhere near the object boundary, as long as the resulting critical line cuts the shape only once. Algorithms II and III will then be able to extract the shape.
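The claim that the fixation points from which a shape is fully visible form a convex region can be checked concretely when the shape is a simple polygon. In that case the set is the polygon's kernel: the intersection of the inner half-planes of its edges. The Python sketch below assumes a polygonal shape with vertices listed in counter-clockwise order. The function names are hypothetical, and the sketch is not part of the proposed algorithms.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(a: Point, b: Point, c: Point) -> float:
    """z-component of (b - a) x (c - a); positive when c is left of the edge a -> b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def fully_visible_from(polygon: List[Point], o: Point) -> bool:
    """True if o lies in the kernel of a simple counter-clockwise polygon.

    The kernel is the intersection of the inner half-planes of all edges, hence convex.
    Points on a supporting line are counted as inside.
    """
    n = len(polygon)
    return all(cross(polygon[i], polygon[(i + 1) % n], o) >= 0 for i in range(n))
```

For an L-shaped polygon, a point in the corner of the L sees the whole boundary. A point near the tip of one arm does not, because the reflex corner hides part of the other arm.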
Of course, the final outcome can be affected by noise, spurious edges, and edges from other objects. The point here is that a small perturbation of the fixation point or the start point will not alter the final outcome. Hence, our algorithms can be extended more easily to a fully automated procedure by placing fixation and starting points automatically based on the image content. In this section, we describe a simple approach to automated placement of these points and demonstrate its effectiveness. The purpose of the demonstration is to illustrate a possible extension of the cycle finding algorithms to a fully automated procedure, not to propose a robust segmentation system. Indeed, the proposed click point selection may not be effective for objects with highly structured textures, like zebras and tigers. Nevertheless, it works fairly well for many other objects with weak or random textures.

The algorithm runs strictly on the same set of contour fragments; thus, we still do not use color information. Algorithm 3 summarizes the procedure. The first step removes small fragments whose number of contour points is less than a threshold (Line 1). The remaining contour fragments are projected onto an image lattice (Line 2). A distance map is computed from the projection image (Line 3); for each pixel, the distance map holds the Euclidean distance to the closest contour point. Next, local maxima of the distance map are computed (Line 4). A pixel is a local maximum if its distance value is not less than that of any of its 8-neighbors. These local maxima are then clustered hierarchically (Line 5). Initially, each local maximum forms its own cluster, and two clusters are merged if there exists a pair of points, one from each cluster, whose Euclidean distance is less than the value of the distance map at either point. Within each cluster, the point with the largest distance value is chosen as a fixation point (Line 6). For each fixation point, we find the contour fragment containing the point closest to the fixation point; the mid-point of that fragment is chosen as the start point for the fixation point (Line 7).

Input: {C}: a set of contour fragments
Input: η_L: contour length threshold
Output: {o}: a set of fixation points
Output: {s}: a set of start points
1 Remove contours in {C} with fewer than η_L points.
2 Lay out the remaining contour points on an edge image, F.
3 From F, compute the Euclidean distance map D. Each pixel in D stores the distance to the closest contour point in F.
4 Find local maxima points in D.
5 Hierarchically cluster local maxima points a and b if D(a) > ‖a − b‖ or D(b) > ‖a − b‖.
6 For each cluster, find the point with the largest distance value and use it as a fixation point. The set of these points constitutes {o}.
7 For each fixation point, find the closest contour point in F, and then find the mid-point of its contour fragment. This point serves as the starting point for the fixation point. The set of starting points constitutes {s}.
Fig. 14. Algorithm 3: Automated Fixation and Start Points Placement

The number of fixation and start points can be large, and some can result in very similar contours. This can be problematic when we want to select a small number of salient regions, as similar regions can appear multiple times.
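Algorithm 3 can be illustrated with a toy Python sketch. It is not the author's implementation, and it rests on several assumptions:

- fragments are given as sequences of (row, col) pixels;
- the Euclidean distance map is computed by brute force, which is only practical for small images;
- the merge rule of Line 5 is implemented with a union-find structure;
- the mid-point of a fragment is taken to be the middle element of its sequenced pixels.

The function name `place_points` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def place_points(fragments: Sequence[Sequence[Pixel]], shape: Tuple[int, int],
                 eta_l: int) -> Tuple[List[Pixel], List[Pixel]]:
    """Toy version of Algorithm 3; returns (fixation points, start points)."""
    h, w = shape
    kept = [f for f in fragments if len(f) >= eta_l]              # Line 1
    edge = [p for f in kept for p in f]                           # Line 2
    # Line 3: Euclidean distance to the closest contour point (brute force)
    dmap = [[min(math.dist((r, c), p) for p in edge) for c in range(w)] for r in range(h)]
    # Line 4: local maxima, i.e. not less than any 8-neighbour
    maxima = [(r, c) for r in range(h) for c in range(w)
              if all(dmap[r][c] >= dmap[i][j]
                     for i in range(max(r - 1, 0), min(r + 2, h))
                     for j in range(max(c - 1, 0), min(c + 2, w)))]
    # Line 5: merge a and b if D(a) > |a - b| or D(b) > |a - b| (union-find)
    root = list(range(len(maxima)))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, a in enumerate(maxima):
        for j in range(i + 1, len(maxima)):
            b = maxima[j]
            gap = math.dist(a, b)
            if dmap[a[0]][a[1]] > gap or dmap[b[0]][b[1]] > gap:
                root[find(i)] = find(j)
    # Line 6: the point with the largest distance value in each cluster
    best = {}
    for i, a in enumerate(maxima):
        k = find(i)
        if k not in best or dmap[a[0]][a[1]] > dmap[best[k][0]][best[k][1]]:
            best[k] = a
    fixations = list(best.values())
    # Line 7: mid-point of the fragment holding the contour point closest to o
    starts = []
    for o in fixations:
        frag = min(kept, key=lambda f: min(math.dist(o, p) for p in f))
        starts.append(frag[len(frag) // 2])
    return fixations, starts
```

On a 9 × 9 image containing a single square contour, the centre of the square emerges as a fixation point. Pixels near the image border also become local maxima, because nothing suppresses them. This matches the observation above that many candidate pairs are produced and later need filtering.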
To alleviate the issue, we cluster them using the Jaccard/Tanimoto coefficient between pairs of regions delineated by the contours. For each cluster, we use the contour with the smallest ratio cost as the representative region; the ratio cost is defined as the path distance in the graph divided by the actual arc length of the contour. The Jaccard/Tanimoto coefficient of regions A and B is defined as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \qquad (2)$$

Two regions are clustered if the coefficient is larger than 0.7 in the experiment shown below. Figure 15 illustrates the fixation/start point selection procedure and the representative shape selection procedure with η_L = 20. See the caption for details.
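The region based filtering can be sketched in Python as follows. Two details are assumptions rather than statements from the text. First, regions are represented as sets of enclosed pixels. Second, "clustered" is read as single-link grouping: any pair with J above the threshold joins their clusters. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Region = Set[Tuple[int, int]]  # pixels enclosed by an extracted contour


def jaccard(a: Region, b: Region) -> float:
    """Equation (2): J(A, B) = |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def ratio_cost(path_distance: float, arc_length: float) -> float:
    """Path distance in the graph divided by the arc length of the contour."""
    return path_distance / arc_length


def representatives(regions: List[Region], costs: List[float], thresh: float = 0.7) -> List[int]:
    """Indices of the lowest ratio-cost region in each cluster of similar regions."""
    root = list(range(len(regions)))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if jaccard(regions[i], regions[j]) > thresh:
                root[find(i)] = find(j)
    best = {}
    for i, c in enumerate(costs):
        k = find(i)
        if k not in best or c < costs[best[k]]:
            best[k] = i
    return sorted(best.values())
```

For example, a 10 × 10 block and the same block minus one column have J = 90/100 = 0.9 > 0.7. They fall into one cluster, and the one with the lower ratio cost is kept.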

We applied the algorithm to the same set of 520 images used in Section 5, with η_L = 5 throughout. Table 3 shows descriptive statistics of the number of fixation/start point pairs before and after the region based filtering. On average, 180 pairs are selected initially. The filtering reduces this number by about 30%, leaving 128 regions on average at the end.

Table 3. Statistics (Max, Mean, Min) of automatically selected fixation/start pairs before and after the region based filtering.

Figures 16 and 17 show some results of applying this automated approach to the same images as in Figures 12 and 13. For each case, four images are shown side by side. From left to right, they are:

- the input image;
- the Canny edges of the input image, with pairs of fixation (red) and start (green) points linked by a blue line;
- all cycles extracted by Algorithm II from each pair of fixation and start points;
- the best five cycles, in terms of ratio cost, after the region based filtering.

The best five cycles are color coded red (first), green (second), blue (third), magenta (fourth), and cyan (fifth). They are drawn from the fifth to the first; thus, a contour with a lower ratio cost may be drawn over a contour with a higher ratio cost.

Discussion

First, we summarize the three algorithms in terms of their capabilities and limitations. Assume that an angularly annotated graph is derived from a single complete shape, so that there are no distracters and no gaps between fragments. Table 4 characterizes the algorithms. We divide shapes into three categories: fully visible from o, partially visible from o (but not fully visible from o), and not partially visible from o.
These three cases are shown under the column labeled o. For shapes partially visible from o, the outcome depends on s: the resulting critical line cuts the shape either once (single) or more than once (multiple). For the other two cases, s does not affect the characterization. Under each combination of o and s, an algorithm is in one of three states: guaranteed to find the shortest path solution (Yes), not capable of finding the solution (No), or capable of finding it without a guarantee (Possible). As the table shows, applicability increases from Algorithm I to II to III. However, there is no guarantee that Algorithm III will reach an optimal solution when Algorithm II fails. The first part of the experiments, presented in Section 5, showed the effectiveness of the proposed algorithms. One interesting finding was that the variant of the Elder-Zucker algorithm, which was designed for comparison with the three algorithms proposed in this paper, worked surprisingly well. The results suggest


Fig. 15. An illustration of the automated selection of fixation and start points, and the selection of representative shapes. (a) The original gray scale image. (b) Traced contour fragments. (c) A Euclidean distance map computed after removing contours with fewer than 5 points, with local maxima of the distance map superimposed. (d) Detected fixation and starting point pairs: red circles are the fixation points, green circles are the starting points, and each pair is linked by a magenta line. (e) Shapes extracted from each pair of fixation and start points. (f) Ten shapes with the smallest ratio cost after region based filtering with a threshold of 0.7.
