1 Introduction

In this programming project, we are going to do a simple image segmentation task. Given a grayscale image with a bright object against a dark background, we make a binary decision for each pixel: does it belong to the foreground or to the background? You will work with both synthetic data and real data, and you are supposed to implement both the thresholding algorithm and the graph cut algorithm we learned in lecture, so that you understand them better.

Again, this assignment can be done individually or in pairs, though we strongly encourage you to work in pairs. You can use any language. You are supposed to implement your own algorithms, so you CANNOT use any built-in functions or third-party code for these algorithms directly, EXCEPT that you can use or translate the code we provide for solving the min-cut problem. If you have any questions about this requirement, please ask the TA (chenwang@cs.cornell.edu).

2 Assignment Submission

Please submit an archive file through CMS containing your report and source code. Include a brief description of the technique choices (parameters, energy functions, etc.) you made in your algorithms. For each image segmentation task, provide at least one mask image and its corresponding extracted foreground (see Sec. 3.1 for more details and examples). It's better if you can provide multiple results using different techniques and compare them.

3 Assignment

3.1 Task Definition

Definition 3.1 (Grayscale Image). In a computer system, a grayscale image can be represented by a 2-D matrix. Suppose we have $H \times W$ pixels in the image; we represent it as an $H \times W$ matrix $I$ where each entry $I_{ij}$ is the intensity of the pixel in the $i$-th row and $j$-th column. Typically, the intensity is an integer in the range [0, 255], where 0 indicates black and 255 indicates white.

Definition 3.2 (Binary Image Segmentation).
In a binary image segmentation task, we would like to classify each pixel into one of 2 categories (i.e., foreground and background). In other words, we would like to work out a function $f : [0, 255]^{H \times W} \to \{0, 1\}^{H \times W}$ which takes an image as input and outputs a 0/1 matrix of the same size, where 0 indicates a background pixel and 1 indicates a foreground pixel. We also call the output the mask matrix, and we can visualize it as an image.

Definition 3.3 (Extracted Foreground). We define the extracted foreground as an image which keeps the original foreground and sets all the background pixels to black. Formally, suppose $Y = f(I)$; we define the extracted foreground $O$ by:

\[ O_{ij} = Y_{ij} \cdot I_{ij} \tag{1} \]

The following figures illustrate the original image, the mask image and the extracted foreground. You are also supposed to include the mask image and extracted foreground for each segmentation in your experiments.

Figure 1: Input Image
Figure 2: Mask Image (Binary Segmentation Result)
Figure 3: Extracted Foreground

3.2 Threshold-based Segmentation

3.2.1 Simple Thresholding Algorithm

Since in our simple binary image segmentation task we assume that the object is always bright while the background is always dark, a naive way to do the segmentation is to set up a threshold $t$ and apply the thresholding function to each pixel. That is, given an image $I_{H \times W}$, we obtain our result (mask matrix) $Y_{H \times W}$ as:

\[ Y_{ij} = \begin{cases} 1, & I_{ij} \ge t \\ 0, & I_{ij} < t \end{cases} \tag{2} \]

However, real images are not ideal: the simple thresholding algorithm is vulnerable to noise, and the boundary we get may not be smooth. Therefore, we always perform some pre-processing and post-processing to enhance our result. I will introduce two typical filters, for pre-processing and post-processing respectively.

3.2.2 Mean Filter Denoising

We can generalize our 1-D mean filter to 2-D to do the denoising before we do the segmentation. The idea is still to replace the intensity of each pixel by its local average. Typically, given any pixel, we define all the pixels within Manhattan distance $k$ as its neighbors, i.e., a diamond-shaped region that fits inside the square of side $2k + 1$ centered at the given pixel. Here, $k$ is a parameter. Note that we also treat the boundary pixels differently this time: neighbors that would fall outside the image are simply left out of the average; see the formal definition below. We define the neighbor set $N(x, y)$ and the denoised image $I'$ in the following way:

\[ N(x, y) = \{ (i, j) \mid 1 \le i \le H,\ 1 \le j \le W,\ |x - i| + |y - j| \le k \} \tag{3} \]

\[ I'_{xy} = \frac{\sum_{(i,j) \in N(x,y)} I_{ij}}{|N(x, y)|} \tag{4} \]

3.2.3 Morphological Filters

To refine the segmentation result, we would like to do some post-processing. Morphological filters are classic tools for this. The two fundamental operations are dilation and erosion. In dilation, a pixel becomes foreground if any pixel in its neighborhood is foreground. In erosion, a pixel becomes background if any pixel in its neighborhood is background. A common morphological filter applies one dilation first, followed by one erosion. It can fill in some holes in the segmentation result and make the boundary smoother. You also need to define the neighbor set for the morphological filter.
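The filters of Sec. 3.2 can be sketched in a few lines. The following is a minimal pure-Python sketch, not a reference solution: it assumes the image is a list of rows, uses 0-indexed coordinates instead of the 1-indexed ones in Eq. (3), and reuses the same Manhattan neighbor set for the mean filter and for dilation and erosion. All function names and the toy image are made up for illustration.

```python
def neighbors(x, y, h, w, k):
    """Eq. (3), 0-indexed: in-image pixels within Manhattan distance k of (x, y)."""
    return [(i, j)
            for i in range(max(0, x - k), min(h, x + k + 1))
            for j in range(max(0, y - k), min(w, y + k + 1))
            if abs(x - i) + abs(y - j) <= k]


def mean_filter(img, k):
    """Eq. (4): replace each intensity by the average over its neighbor set."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            nb = neighbors(x, y, h, w, k)
            out[x][y] = sum(img[i][j] for i, j in nb) / len(nb)
    return out


def threshold(img, t):
    """Eq. (2): mask is 1 where the intensity is >= t, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in img]


def dilate(mask, k):
    """A pixel becomes foreground if any pixel in its neighborhood is foreground."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[i][j] for i, j in neighbors(x, y, h, w, k)) else 0
             for y in range(w)] for x in range(h)]


def erode(mask, k):
    """A pixel becomes background if any pixel in its neighborhood is background."""
    h, w = len(mask), len(mask[0])
    return [[1 if all(mask[i][j] for i, j in neighbors(x, y, h, w, k)) else 0
             for y in range(w)] for x in range(h)]


def extract_foreground(img, mask):
    """Eq. (1): O_ij = Y_ij * I_ij."""
    return [[m * v for m, v in zip(mrow, irow)] for mrow, irow in zip(mask, img)]


# Toy 7x7 image: a bright 3x3 block with one dark "noise" pixel in its center.
img = [[10] * 7 for _ in range(7)]
for x in range(2, 5):
    for y in range(2, 5):
        img[x][y] = 200
img[3][3] = 10

mask = threshold(img, 128)            # the dark center pixel leaves a hole
closed = erode(dilate(mask, 1), 1)    # dilation followed by erosion
foreground = extract_foreground(img, closed)
```

On this toy image, dilation followed by erosion with $k = 1$ fills the one-pixel hole and leaves the rest of the block unchanged. Note that the neighbor set here includes the pixel itself, so dilation never removes foreground and erosion never adds it.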
In practice, the neighbor set for a morphological filter is typically very small, e.g., $k \le 3$ for the Manhattan distance we defined above.

3.2.4 Your Task

You are supposed to do the following tasks in this part:

- Implement the thresholding algorithm and 2-D mean filters. (Optional) Implement morphological filters.
- Choose a proper threshold $t$ and apply the simple thresholding algorithm on groundtruth.png in the synthetic data.
- Apply the simple thresholding algorithm on noisy.png in the synthetic data. Analyze why a simple thresholding strategy fails in this case.
- Choose a proper size $k$, apply the mean filter on noisy.png, then apply the simple thresholding algorithm to do the segmentation. (Optional: apply morphological filters to refine the result.)
- With the experience from the synthetic data, conduct segmentation for each image in the real data using the thresholding algorithm. (Optional: you can add more fun things beyond the mean filter and morphological filters to make the results better.)

3.3 Graph Cut-based Segmentation

3.3.1 Preliminary

This part introduces some basic knowledge of graph theory, including directed graphs, flows, cuts, etc. If you already know these things or you are not interested in reading too much math, you can skip the formal definitions. But PLEASE try to understand all the examples here, which may help you understand the high-level ideas behind these mathematical tools and build our own models for the image segmentation task. Some material in this part comes from Dexter Kozen's textbook [1]. If you are interested in this stuff, you can take CS 4820 or CS 6820 for more details.

Definition 3.4 (Directed Graph). A directed graph $G$ is defined by a tuple $G = (V, E)$. Here $V$ is a vertex set which represents the nodes in the graph, and $E \subseteq V^2$ is an edge set which represents the directed links between pairs of nodes.

Figure 4: Example for a graph

Example of graph: Fig. 4 illustrates an example of a directed graph. There are 3 nodes and 3 directed edges in the graph. Here our vertex set is $V = \{1, 2, 3\}$ and our edge set is $E = \{(1, 2), (2, 3), (3, 1)\}$. Note that we use an ordered pair to represent a directed edge, i.e., (1, 2) is an edge pointing from node 1 to node 2 (see the arrow in the example), which is different from the edge (2, 1).
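As a small illustration, the graph of Fig. 4 can be written down directly as a pair of sets. This is only a sketch of one possible representation; the helper name `successors` is made up here.

```python
# The directed graph of Fig. 4 as a vertex set and a set of ordered pairs.
V = {1, 2, 3}
E = {(1, 2), (2, 3), (3, 1)}


def successors(edges):
    """Adjacency lists: for each node u, the nodes v such that (u, v) is an edge."""
    adj = {}
    for u, v in sorted(edges):
        adj.setdefault(u, []).append(v)
    return adj


adj = successors(E)
```

Because edges are ordered pairs, `(1, 2) in E` holds while `(2, 1) in E` does not.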
Suppose we are given a tuple $G = (V, c, s, t)$, where $V$ is a set of vertices, $s, t \in V$ are distinguished vertices called the source and sink respectively, and $c$ is a function $c : V^2 \to \mathbb{R}_+$ assigning a nonnegative real capacity to each pair of vertices. We make $G$ into a directed graph by defining the set of directed edges:

\[ E = \{ (u, v) \mid c(u, v) > 0 \} \tag{5} \]

Definition 3.5 (Flow). A function $f : V^2 \to \mathbb{R}$ is called a flow if the following three conditions are satisfied:

- capacity constraints: for all vertices $u, v \in V$, we have $f(u, v) \le c(u, v)$
- skew symmetry: for all $u, v \in V$, we have $f(u, v) = -f(v, u)$
- conservation of flow: for all vertices $u \in V$ except $s$ and $t$, we have $\sum_{v \in V} f(u, v) = 0$

Definition 3.6 (Max Flow Problem). We define the value of a flow $f$ as the total amount of flow out of the source, i.e.,

\[ |f| = \sum_{v \in V} f(s, v) \tag{6} \]

The max flow problem is to find the flow $f$ with the maximum value $|f|$.

Figure 5: Example for a flow

Example of flow: The intuition for a flow comes from water flowing in pipes. We use a graph to describe how the pipes are connected, the capacity function $c$ to describe the capacity of each pipe, and the flow function $f$ to describe the amount of water going through each pipe per unit time. An interpretation of the 3 properties in our definition of flow is:

1) Capacity constraints say we cannot push more water into any pipe per unit time than its capacity allows.
2) Skew symmetry is somewhat counter-intuitive. We can understand "I give you 5 gallons of water" as meaning "you receive 5 gallons of water from me". An awkward way to express the same thing is that "I give you 5 gallons of water" means "you give me -5 gallons of water".
3) Combined with the second property, conservation of flow says that for each internal node, the amount of water it receives from other nodes is equal to the amount of water it pushes to other nodes. In other words, internal nodes cannot store water. Only the source node $s$ can generate water and only the sink node $t$ can store water in our water system.

Fig. 5 illustrates an example of a flow network and its flow. The numbers f/c on each edge are its corresponding flow and capacity. To keep the figure clean, I don't draw lines with 0 capacity or negative flow. But we should know that $f(s, 1) = 2$ implies $f(1, s) = -2$. We can verify that the capacity constraints and conservation of flow hold in this example. For example, node 4 receives 1 gallon of water from node 1 and 3 gallons of water from node 2, and it pushes 4 gallons of water towards the sink node $t$. By the way, the flow illustrated in the figure is also the max flow, with value $|f| = 5$.
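The three conditions of Definition 3.5 are easy to check mechanically. The sketch below does so on a small hypothetical network (it is not the network of Fig. 5, whose edges are not listed in the text). Capacities and flows are stored as dictionaries keyed by ordered pairs, with missing pairs meaning 0; the function names are made up for illustration.

```python
def net_flow(pos_flow):
    """Extend positive per-edge flows to a skew-symmetric f with f(u, v) = -f(v, u)."""
    f = {}
    for (u, v), x in pos_flow.items():
        f[(u, v)] = f.get((u, v), 0) + x
        f[(v, u)] = f.get((v, u), 0) - x
    return f


def is_flow(V, c, f, s, t):
    """Check capacity constraints, skew symmetry and conservation of flow."""
    cap_ok = all(f.get((u, v), 0) <= c.get((u, v), 0) for u in V for v in V)
    skew_ok = all(f.get((u, v), 0) == -f.get((v, u), 0) for u in V for v in V)
    cons_ok = all(sum(f.get((u, v), 0) for v in V) == 0
                  for u in V if u not in (s, t))
    return cap_ok and skew_ok and cons_ok


def flow_value(V, f, s):
    """Eq. (6): total flow out of the source."""
    return sum(f.get((s, v), 0) for v in V)


# Hypothetical toy network (an assumption for this example).
V = ["s", "a", "b", "t"]
c = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}

good = net_flow({("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
                 ("a", "t"): 2, ("b", "t"): 3})
bad = net_flow({("s", "a"): 4, ("a", "t"): 2})  # exceeds c(s, a), and node a stores water
```

Here `good` satisfies all three conditions and has value 5, while `bad` violates both the capacity constraint on (s, a) and conservation at node a.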
Definition 3.7 (s,t-cut). An s,t-cut is a pair $A, B$ of disjoint subsets of $V$ whose union is $V$ such that $s \in A$ and $t \in B$. The capacity of the cut $A, B$, denoted $c(A, B)$, is:

\[ c(A, B) = \sum_{u \in A, v \in B} c(u, v) \tag{7} \]

Definition 3.8 (Min Cut Problem). Given the tuple $G = (V, c, s, t)$, find the s,t-cut $A, B$ with the minimal capacity.

Example of cut: In Fig. 6, the dotted line separates the vertex set into $A = \{s, 1, 2, 4\}$ and $B = \{3, t\}$. The capacity of this cut is $c(A, B) = 5$. By the way, this is also the min-cut of this graph.

Theorem 3.9 (Max Flow-Min Cut Theorem). Suppose $f$ is the max flow in $G = (V, c, s, t)$ and $A, B$ is the min cut in $G = (V, c, s, t)$; then we have $|f| = c(A, B)$.

An intuitive interpretation of this theorem is that the maximum flow in a pipe system is bounded by the bottleneck pipes in the system. We can also see from the examples above that the value of the max flow agrees with the capacity of the min-cut.
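For very small graphs, Eq. (7) and the min cut problem can be checked by brute force. The sketch below enumerates every s,t-cut of a hypothetical 4-node network (an assumption for this example, not the graph of Fig. 6) and compares the smallest capacity with the value of one particular flow. Enumeration is exponential in the number of nodes, so this is only a sanity check for toy inputs, not a way to segment images.

```python
from itertools import combinations


def cut_capacity(c, A, B):
    """Eq. (7): sum of capacities of the edges going from A to B."""
    return sum(cap for (u, v), cap in c.items() if u in A and v in B)


def brute_force_min_cut(V, c, s, t):
    """Try every subset A that contains s but not t; return (capacity, A, B)."""
    inner = [v for v in V if v not in (s, t)]
    best = None
    for r in range(len(inner) + 1):
        for subset in combinations(inner, r):
            A = {s, *subset}
            B = set(V) - A
            cap = cut_capacity(c, A, B)
            if best is None or cap < best[0]:
                best = (cap, A, B)
    return best


# Hypothetical toy network (an assumption for this example).
V = ["s", "a", "b", "t"]
c = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}

min_capacity, A, B = brute_force_min_cut(V, c, "s", "t")

# One valid flow on the same network, given by its positive per-edge amounts.
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
value = sum(x for (u, v), x in flow.items() if u == "s")
```

On this network both `min_capacity` and `value` come out as 5. Since no flow can exceed the capacity of any cut, finding a flow and a cut with equal values shows that both are optimal, which is the content of Theorem 3.9 in miniature.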
Figure 6: Example for a cut

3.3.2 Graph Cut for Image Segmentation

Just as we learned in lecture, our energy function is a combination of a data term and a prior term:

\[ E(L) = \sum_{p} D_p(L_p) + \lambda \sum_{pq \in N} P_{pq} \cdot I(L_p \ne L_q) \tag{8} \]

where both $D_p$ and $P_{pq}$ are penalty functions and $I(\cdot)$ is the indicator function (1 if its argument is true, 0 otherwise). $D_p(L_p)$ is the penalty for labeling pixel $p$ as $L_p$. $P_{pq}$ is the penalty for two neighboring pixels $p$ and $q$ having different labels. There are many choices for these penalty functions; most of them depend on the intensities of the pixels.

To convert this energy minimization problem into a min-cut problem, we define the flow network in the following way:

- We treat each pixel $p$ we need to label as one node $n_p$ in the graph.
- We add one special source node $s$ and one special sink node $t$ to the graph.
- We add a directed edge from the source $s$ to each internal node $n_p$ with capacity $D_p(1)$, and a directed edge from each internal node $n_p$ to the sink $t$ with capacity $D_p(0)$.
- For each pair of pixels $p$ and $q$ that appears in our neighborhood system $N$, we add two directed edges $(n_p, n_q)$ and $(n_q, n_p)$, both with capacity $\lambda P_{pq}$.

Fig. 7 [2] is a very good example illustrating the procedure described above.

Suppose $(A, B)$ is an arbitrary cut in the graph we defined above. We label each pixel $p \in A$ as 0 (background) and each pixel $q \in B$ as 1 (foreground). The claim is that $c(A, B) = E(L)$. We can prove this by some computation:

\[
\begin{aligned}
c(A, B) &= \sum_{p \in A, q \in B} c(p, q) && \text{(definition of cut)} \\
&= \sum_{q \in B \setminus \{t\}} c(s, q) + \sum_{p \in A \setminus \{s\}} c(p, t) + \sum_{p \in A \setminus \{s\}, q \in B \setminus \{t\}} c(p, q) && \text{(note that } c(s, t) = 0 \text{)} \\
&= \sum_{q \in B \setminus \{t\}} D_q(1) + \sum_{p \in A \setminus \{s\}} D_p(0) + \sum_{p \in A \setminus \{s\}, q \in B \setminus \{t\}, pq \in N} \lambda P_{pq} \\
&= \sum_{p} D_p(L_p) + \lambda \sum_{pq \in N, L_p \ne L_q} P_{pq} \\
&= \sum_{p} D_p(L_p) + \lambda \sum_{pq \in N} P_{pq} \cdot I(L_p \ne L_q) \\
&= E(L)
\end{aligned}
\tag{9}
\]

Therefore, finding the optimal labels that minimize the energy function in our binary segmentation task is equivalent to finding the min-cut in the graph we defined.
For more details about the formulation of the problem and the choices of penalty functions, please refer to Boykov and Funka-Lea's paper [2], especially Sections 2.1, 2.2, 2.3 and 2.5.
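The identity $c(A, B) = E(L)$ can be verified numerically on a tiny image by trying every labeling. The sketch below builds the edge list exactly as described in Sec. 3.3.2 and compares cut capacity with energy. The penalty functions are assumptions chosen only to keep the arithmetic simple ($D_p(1) = 255 - I_p$, $D_p(0) = I_p$, and a constant $P_{pq} = 1$); they are not the ones recommended in [2], and all names are made up for illustration.

```python
from itertools import product


def D(img, p, label):
    """Assumed data penalty: dark pixels are costly as foreground (1),
    bright pixels are costly as background (0)."""
    v = img[p[0]][p[1]]
    return 255 - v if label == 1 else v


def P(img, p, q):
    """Assumed prior penalty: constant cost for any label change between neighbors."""
    return 1


def neighbor_pairs(h, w):
    """4-neighbor system N: each pair of adjacent pixels listed once."""
    for x in range(h):
        for y in range(w):
            if x + 1 < h:
                yield (x, y), (x + 1, y)
            if y + 1 < w:
                yield (x, y), (x, y + 1)


def build_edges(img, lam):
    """Directed edges (u, v, capacity) of the flow network of Sec. 3.3.2."""
    h, w = len(img), len(img[0])
    edges = []
    for x in range(h):
        for y in range(w):
            p = (x, y)
            edges.append(("s", p, D(img, p, 1)))   # source -> pixel
            edges.append((p, "t", D(img, p, 0)))   # pixel -> sink
    for p, q in neighbor_pairs(h, w):
        cap = lam * P(img, p, q)
        edges += [(p, q, cap), (q, p, cap)]        # both directions
    return edges


def energy(img, labels, lam):
    """Eq. (8) for a dict mapping each pixel to its label."""
    h, w = len(img), len(img[0])
    data = sum(D(img, p, labels[p]) for p in labels)
    prior = sum(P(img, p, q) for p, q in neighbor_pairs(h, w)
                if labels[p] != labels[q])
    return data + lam * prior


def cut_capacity(edges, A):
    """Eq. (7), with B taken as every node not in A."""
    return sum(c for u, v, c in edges if u in A and v not in A)


def check_all_labelings(img, lam):
    """Assert c(A, B) = E(L) for every labeling; return (min energy, labels)."""
    h, w = len(img), len(img[0])
    pixels = [(x, y) for x in range(h) for y in range(w)]
    edges = build_edges(img, lam)
    best = None
    for values in product((0, 1), repeat=len(pixels)):
        labels = dict(zip(pixels, values))
        A = {"s"} | {p for p in pixels if labels[p] == 0}  # background side
        e = energy(img, labels, lam)
        assert cut_capacity(edges, A) == e
        if best is None or e < best[0]:
            best = (e, values)
    return best


img = [[200, 220], [30, 210]]  # toy 2x2 image with one dark pixel
```

With these assumed penalties, `check_all_labelings(img, 50)` keeps the dark pixel as background, while `check_all_labelings(img, 100)` labels all four pixels as foreground: a larger $\lambda$ makes the prior term dominate. In a real run, each `(u, v, capacity)` tuple would become one edge of the min-cut solver after mapping nodes to integer ids, and the foreground pixels are the ones that do not end up in set $A$.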
Figure 7: Example of image segmentation using graph cut [2]

3.3.3 Usage of Dinic Class

The algorithm for solving a max-flow (min-cut) problem is not trivial and goes well beyond the requirements of this course. Therefore, we provide an implementation of the Dinic algorithm to solve the max-flow and min-cut problem. You can use this code as a subroutine in your assignment. The code is written in C++, but you can easily translate it into your own programming language since there are only 144 lines of code. You don't need to understand how this code works. Follow these instructions to use it:

Step 1: Initialize an instance of the Dinic class. Specify the number of vertices and the number of edges of your graph as the first and second parameters of the constructor. They are used to allocate arrays in memory, so please make sure that they are big enough.

Step 2: Call AddEdge(s, t, c) to add the edges of the graph one by one. The parameters mean that we add a directed edge from node s to node t with capacity c. This operation is additive, i.e., AddEdge(1,2,1);AddEdge(1,2,2); is equivalent to AddEdge(1,2,3). By the way, AddEdge(s,t,c,true); is syntactic sugar for AddEdge(s,t,c);AddEdge(t,s,c);. You may find this useful when you define prior terms.

Step 3: Call MaxFlow() to calculate the max flow of the graph.

Step 4: Call MinCut() to get the min cut set of the graph. This function returns a boolean array which indicates whether each vertex belongs to set A. Please note that the min-cut is computed from the max flow, so make sure that you call MaxFlow() before MinCut().

The main function in the provided code illustrates how to get the results in Fig. 5 and Fig. 6. It should produce a max flow of 5 and the min cut sets A = {0, 1, 2, 4} and B = {3, 5}. You can use this or some other toy problems as a test case for your own translated version. Please contact the TA if you have any questions about the usage of this code.
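When translating the provided code, it can help to have an independent toy solver to compare against. The sketch below is not the provided Dinic class and does not reproduce its interface: it is a hypothetical stand-in, `ToyFlow`, that uses plain BFS augmenting paths (Edmonds-Karp) on a dense capacity matrix, and its constructor and method signatures are assumptions. It only mirrors the call order of Steps 1 to 4 (create, add edges, max flow, then min cut) and is far too slow for real images.

```python
from collections import deque


class ToyFlow:
    """Hypothetical reference solver for toy graphs (Edmonds-Karp, not Dinic)."""

    def __init__(self, n):
        self.n = n
        self.cap = [[0] * n for _ in range(n)]  # residual capacities

    def add_edge(self, u, v, c, both=False):
        self.cap[u][v] += c        # additive, like repeated AddEdge calls
        if both:
            self.cap[v][u] += c    # also add the reverse edge

    def _bfs(self, s):
        """BFS over edges with positive residual capacity; parent[v] == -1 means unreached."""
        parent = [-1] * self.n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(self.n):
                if parent[v] == -1 and self.cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    def max_flow(self, s, t):
        total = 0
        while True:
            parent = self._bfs(s)
            if parent[t] == -1:
                return total
            # Find the bottleneck along the path, then push that much flow.
            push, v = float("inf"), t
            while v != s:
                push = min(push, self.cap[parent[v]][v])
                v = parent[v]
            v = t
            while v != s:
                u = parent[v]
                self.cap[u][v] -= push
                self.cap[v][u] += push
                v = u
            total += push

    def min_cut(self, s):
        """After max_flow: True for nodes still reachable from s (the set A)."""
        return [p != -1 for p in self._bfs(s)]


# Hypothetical toy problem: node 0 is the source, node 3 is the sink.
g = ToyFlow(4)
g.add_edge(0, 1, 2)
g.add_edge(0, 1, 3)            # same edge again: capacity becomes 5
g.add_edge(0, 2, 5)
g.add_edge(1, 2, 1, both=True)
g.add_edge(1, 3, 2)
g.add_edge(2, 3, 3)
flow = g.max_flow(0, 3)
in_A = g.min_cut(0)
```

On this toy problem the two edges into the sink are the bottleneck, so the max flow is 5 and set A contains every node except the sink. Running the same edges through your own translation should give the same max flow value.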
3.3.4 Your Task

You are supposed to do the following things in this part:

- Implement the graph-cut algorithm for image segmentation, using (or translating) the program we provide to solve the min-cut problem as a subroutine.
- Choose proper parameters, define the penalty functions used in the energy function of the graph cut, and conduct graph-cut based segmentation for each image in both the synthetic and the real dataset. You may choose your own way to do the pre-processing and post-processing. Note that these parameters (including the parameters in the penalty functions) are task-specific, which means you may need to tune them for each image. The quality of the final segmentation result relies heavily on these parameters and functions.
- Compare the results of threshold-based segmentation and graph-cut based segmentation.

Please also note that for some big images, it may take several minutes for the Dinic algorithm to optimize the energy function (according to my own experiments). So please try to use an efficient programming language this time. The speed of the algorithm also depends on the size of your neighbor system; please try a smaller neighbor system (like the 4-neighbor system) when it takes too much time. If the program is still very slow, you can downsample the original image to get a smaller image (which means a smaller graph). But remember, this is the last resort. Only use it when necessary.

4 Academic Integrity

Academic integrity is important in this course. You must follow the school's code (http://cuinfo.cornell.edu/academic/aic.html). Since this is a programming project, we would like to emphasize the following rules:

- Having discussions with other people, using open-source code and public tools, and getting ideas from research papers are allowed, but proper citations and acknowledgements are required. Otherwise, any direct or indirect copying from others' work, the Internet, etc. is strictly forbidden.
- All the results you report in this project must be generated by your submitted programs.

Violations of academic integrity are taken very seriously. Please feel free to contact Professor Zabih if you have any questions or concerns about this topic, or if you feel there is any possibility that you may be violating the code of academic integrity.

References

[1] D. C. Kozen, The Design and Analysis of Algorithms, Springer, 1991.
[2] Y. Boykov and G. Funka-Lea, Graph cuts and efficient N-D image segmentation, International Journal of Computer Vision 70 (2006), pp. 109-131.