Advanced Digital Image Processing and Others
Xiaojun Qi -- REU Site Program in CVIP (7 Summer)

Outline: Segmentation Strategies and Data Structures; Algorithms Overview (K-Means Algorithm, Hidden Markov Model, Linear Discriminant Analysis).

Importance of Segmentation
Segmentation is generally the first stage in any attempt to analyze or interpret an image automatically. It bridges the gap between low-level and high-level image processing. Some kind of segmentation technique will be found in any application involving the detection, recognition, and measurement of objects in images.

Importance of Segmentation (Cont.)
The role of segmentation is crucial in most tasks requiring image analysis: the success or failure of the task is often a direct consequence of the success or failure of the segmentation. However, a reliable and accurate segmentation of an image is, in general, very difficult to achieve by purely automatic means.

Image Segmentation Example (example images omitted)

Image Segmentation -- Descriptive Definition
Segmentation subdivides an image into its constituent regions or objects. That is, it partitions an image into distinct regions that are meant to correlate strongly with objects or features of interest in the image. Segmentation can also be regarded as a process of grouping together pixels that have similar attributes.

The level to which the subdivision is carried depends on the problem being solved: segmentation should stop when the objects of interest in an application have been isolated. There is no point in carrying segmentation past the level of detail required to identify those elements.

Image Segmentation -- A Math-Oriented Descriptive Definition
Segmentation is the process that partitions the image pixels into non-overlapping regions such that:
1. Each region is homogeneous (i.e., uniform in terms of pixel attributes such as intensity, color, range, or texture) and connected.
2. The union of adjacent regions is not homogeneous.

Image Segmentation -- Explanation
1. All pixels must be assigned to regions.
2. Each pixel must belong to a single region only.
3. Each region must be uniform.
4. Any merged pair of adjacent regions must be non-uniform.
5. Each region must be a connected set of pixels.

How many regions does each of these two pictures have? (Example pictures omitted.)

Image Segmentation Strategies
Image segmentation algorithms generally are based on one of two basic properties of intensity values: discontinuity and similarity. Discontinuity-based approach: partition an image based on abrupt changes in intensity. Similarity-based approach: partition an image into regions that are similar according to a set of predefined criteria; examples include thresholding, region splitting and merging, and region growing.

Discontinuity vs. Similarity
Techniques based on discontinuity attempt to partition the image by detecting abrupt changes in gray level (point, line, and edge detectors). Techniques based on similarity attempt to create uniform regions by grouping together connected pixels that satisfy predefined similarity criteria. The results of segmentation may therefore depend critically on these criteria and on the definition of connectivity. The two families mirror one another in the sense that completing a boundary is equivalent to breaking one region into two.

Similarity-Based Approach -- Thresholding
This histogram-based approach assumes that different features in an image give rise to distinct peaks in its histogram. Case A: the image is composed of one light object on a dark background, such that object and background pixels have gray levels grouped into two dominant modes. Case B: two types of light objects on a dark background; how should the threshold be chosen in this case?

Example: importance of accurate threshold selection. (a) Input image, (b) correct choice of threshold, (c) threshold too low, (d) threshold too high. (Figure omitted.)

Determination of Threshold T -- An Iterative Approach
1. Select an initial estimate for T.
2. Segment the image using T. This produces two groups of pixels: G1, consisting of all pixels with gray-level values > T, and G2, consisting of pixels with gray-level values <= T.
3. Compute the average gray levels $\mu_1$ and $\mu_2$ of the pixels in G1 and G2.
4. Compute a new threshold value: $T = (\mu_1 + \mu_2)/2$.
5. Repeat steps 2 through 4 until the change in T between successive iterations is smaller than a predefined parameter $\Delta T$.
How do you select an initial estimate of T? (A sketch is shown below.)
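The iterative procedure above maps directly to a few lines of code. The following is a minimal sketch, assuming a grayscale image stored as a NumPy array; using the global mean as the initial estimate of T is one common answer to the question above, and the function name and convergence tolerance are illustrative choices, not from the slides.

```python
import numpy as np

def iterative_threshold(image, delta_t=0.5):
    """Iterative global threshold selection (steps 1-5 above)."""
    t = float(image.mean())                 # step 1: initial estimate of T
    while True:
        g1 = image[image > t]               # step 2: pixels above T
        g2 = image[image <= t]              #         pixels at or below T
        mu1 = g1.mean() if g1.size else t   # step 3: group means
        mu2 = g2.mean() if g2.size else t
        t_new = 0.5 * (mu1 + mu2)           # step 4: updated threshold
        if abs(t_new - t) < delta_t:        # step 5: convergence test
            return t_new
        t = t_new
```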

Similarity-Based Approach -- Summary of Thresholding Method
Thresholding groups together pixels according to some global attribute, such as gray level. Two pixels at opposite corners of an image will both be detected if they both have gray levels above the threshold, even though they are probably not related in any meaningful way. It is possible to distinguish between these two pixels if we additionally take into account the fact that pixels belonging to a single object are close to one another (connectivity consideration).

Similarity-Based Approach -- Region Splitting
Basic idea: divide an image into smaller and smaller regions until all the pixels in the different regions satisfy the predefined uniformity predicate or uniformity measure for that region. Steps: begin with the entire image as one region; the regions are then split to form sub-regions which satisfy the basic segmentation criterion, using a suitable uniformity predicate or uniformity measure.

Special Data Structures -- Picture Tree in Region Splitting
A picture tree is a tree-shaped graph structure with an arc between two nodes R1 and R2 (each representing a region) if R2 is contained in R1, i.e., R1 is the parent node of R2. The arc indicates that R2 is a sub-region of region R1. For example, region R1 may include R2, R3, and R4, with these sets forming a partition of R1. The picture tree structure is useful in implementing and describing region splitting and region merging segmentation methods.

Special Data Structures -- Quad Picture Tree (QPT) in Region Splitting
A quad picture tree (QPT) is a picture tree that takes the original image as the start region and progressively divides each region into four (square) sub-regions with an equal number of pixels. The QPT can be used to guide the search for regions with uniform gray levels by developing a measure of uniformity (homogeneity) to test the regions as candidates to be split. (QPT example figure: the image data, its partition into uniform regions, and the corresponding quad picture tree; omitted.) A recursive splitting sketch follows.
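As an illustration of how a QPT can drive region splitting, here is a minimal recursive sketch. The predicate `is_uniform`, the variance threshold in the example, and the minimum block size are illustrative assumptions, not values from the slides.

```python
import numpy as np

def quadtree_split(img, r0, c0, h, w, is_uniform, min_size=2):
    """Recursively split region (r0, c0, h, w) of `img` into four quadrants
    until every leaf satisfies the uniformity predicate."""
    block = img[r0:r0 + h, c0:c0 + w]
    if is_uniform(block) or h <= min_size or w <= min_size:
        return [(r0, c0, h, w)]                    # leaf node of the QPT
    h2, w2 = h // 2, w // 2
    leaves = []
    for dr, dc, hh, ww in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                           (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
        leaves += quadtree_split(img, r0 + dr, c0 + dc, hh, ww,
                                 is_uniform, min_size)
    return leaves

# Example predicate: a block is uniform if its gray-level variance is small.
is_flat = lambda block: block.var() < 25.0
# leaves = quadtree_split(image, 0, 0, image.shape[0], image.shape[1], is_flat)
```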

Uniformity (Similarity) Measures
To determine whether a region should be split, consider a region R split into the sub-regions R1, R2, R3, and R4. Let $\mu$ be the mean of R and $\mu_i$ the mean of $R_i$.

The simplest measure: if $|\mu - \mu_i| < \varepsilon$, then $R_i$ is uniform. If all the sub-regions are uniform, then R is not split into the sub-regions.

A second measure weights the gray-level variance of each sub-region by its size:
$U_1(R) = \sum_i \frac{\#(R_i)}{\#(R)} \sigma_i^2$,
where $\#(R)$ is the number of pixels in region R and the variance is computed from the normalized gray-level histogram as $m = \sum_{i} r_i\, p(r_i)$ and $\sigma^2 = \sum_{i} (r_i - m)^2 p(r_i)$. A large variance indicates that a sub-region is less uniform; therefore, a smaller $U_1(R)$ corresponds to a more uniform region R. If the region is non-uniform, it is split.

A third measure is
$U_2(R) = \sum_i \frac{(\mu_i - \mu)^2}{\sigma_i^2}$.
If $U_2(R)$ is large, then R is not uniform and should be split into the four sub-regions: the sub-region means $\mu_i$ differ from the mean of R, but the sub-regions are themselves uniform.

A normalized variant is
$U_3(R) = 1 - \frac{\sum_j w_j \sigma_j^2}{\sigma_{\max}^2}$, with $\sigma_{\max} = \frac{g_{\max} - g_{\min}}{2}$,
where $g_{\max}$ and $g_{\min}$ are the maximum and minimum gray-level values in the region R and $w_j$ is a weight associated with sub-region $R_j$.

(The slides close this part with a small numeric example computing the sub-region means, variances, and the three measures for a sample region; the values are omitted here.) A computational sketch of $U_1$ and $U_2$ follows.
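For illustration, the first two measures can be evaluated for a region and its four quadrants as follows. This sketch assumes the formulations of $U_1$ and $U_2$ given above; the guard against zero variance in $U_2$ and the function name are added assumptions.

```python
import numpy as np

def uniformity_measures(region):
    """Compute U1 and U2 for a rectangular region split into 4 quadrants.
    A smaller U1 corresponds to a more uniform region; a large U2 indicates
    that the quadrant means differ from the region mean while each quadrant
    is itself fairly uniform, so the region should be split."""
    h2, w2 = region.shape[0] // 2, region.shape[1] // 2
    quads = [region[:h2, :w2], region[:h2, w2:],
             region[h2:, :w2], region[h2:, w2:]]
    mu, n = region.mean(), region.size
    u1 = sum(q.size / n * q.var() for q in quads)
    u2 = sum((q.mean() - mu) ** 2 / max(q.var(), 1e-12) for q in quads)
    return u1, u2
```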

Similarity-Based Approach -- Region Merging
1. Define a splitting method to segment the image into small atomic regions satisfying the basic segmentation criterion, using a suitable uniformity predicate.
2. Define a method for merging adjacent regions; that is, merge two adjacent regions which satisfy the merging conditions and the basic segmentation criterion (the uniformity predicate for the union of the two adjacent regions is true).
3. Repeat the merging procedure. If no regions can be merged, stop.

Special Data Structures -- Region Adjacency Graph (RAG) in Region Merging
The region adjacency graph records the adjacency relations between regions; it is a structure often used together with the QPT. Region R1 is adjacent to region R2 if there is a pixel in R1 with a 4-neighbor, 8-neighbor, or m-neighbor in R2. The degree of a node is the number of nodes connected to it by an arc. A cut-node of a graph is a node c such that there are two other nodes a and b in the graph with the property that all paths from a to b go through c. Transition regions have a small degree and should be connected to two nodes of high degree.

RAG Example: for a given segmentation and its region adjacency graph, what is the degree of each node? Which nodes are cut-nodes? Which regions are transition regions? (Example figures for the RAG and for region-merging-based segmentation are omitted.) A small sketch for building a RAG from a label image follows.
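For illustration, a region adjacency graph can be built from a label image in which every pixel stores the index of its region. The sketch below considers 4-adjacency only; extending it to 8- or m-adjacency would add further neighbor comparisons. The function name and the dictionary representation are assumptions.

```python
import numpy as np

def region_adjacency_graph(labels):
    """Map each region label to the set of labels 4-adjacent to it."""
    rag = {int(l): set() for l in np.unique(labels)}
    # Compare each pixel with its right neighbor and with its lower neighbor.
    for a, b in ((labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])):
        boundary = a != b
        for l1, l2 in zip(a[boundary], b[boundary]):
            rag[int(l1)].add(int(l2))
            rag[int(l2)].add(int(l1))
    return rag

# The degree of a node r is then len(rag[r]); cut-nodes and transition
# regions can be read off the resulting graph.
```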

Region-Merging-Based Segmentation Using Color Features
Divide the image into non-overlapping blocks. Extract a color feature vector (i.e., the mean color of the block) for each block, using the Luv color space. Apply an unsupervised K-means algorithm to cluster the color features. The segmentation process adaptively increases the number of regions C from its initial value until a termination criterion is satisfied.

Cluster Seeking -- K-Means Algorithm
The K-means algorithm is based on the minimization of a performance index, defined as the sum of the squared distances from all points in a cluster domain to the cluster center.

Cluster Seeking -- K-Means Algorithm (Cont.)
Step 1: Choose K initial cluster centers $z_1(1), z_2(1), \ldots, z_K(1)$. These are arbitrary and are usually selected as the first K samples of the given sample set.
Step 2: At the k-th iterative step, distribute the samples $\{x\}$ among the K cluster domains using the relation
$x \in S_j(k)$ if $\|x - z_j(k)\| < \|x - z_i(k)\|$ for all $i = 1, 2, \ldots, K$, $i \neq j$, (1)
where $S_j(k)$ denotes the set of samples whose cluster center is $z_j(k)$. Ties in expression (1) are resolved arbitrarily.
Step 3: From the results of Step 2, compute the new cluster centers $z_j(k+1)$, $j = 1, 2, \ldots, K$, such that the sum of the squared distances from all points in $S_j(k)$ to the new cluster center is minimized. In other words, the new cluster center $z_j(k+1)$ is computed so that the performance index
$J_j = \sum_{x \in S_j(k)} \|x - z_j(k+1)\|^2$, $j = 1, 2, \ldots, K$,
is minimized. The $z_j(k+1)$ which minimizes this performance index is simply the sample mean of $S_j(k)$. Therefore, the new cluster center is given by
$z_j(k+1) = \frac{1}{N_j} \sum_{x \in S_j(k)} x$, $j = 1, 2, \ldots, K$,
where $N_j$ is the number of samples in $S_j(k)$. The name K-means is obviously derived from the manner in which cluster centers are sequentially updated.
Step 4: If $z_j(k+1) = z_j(k)$ for $j = 1, 2, \ldots, K$, the algorithm has converged and the procedure is terminated. Otherwise go to Step 2.

The behavior of the K-means algorithm is influenced by: the number of cluster centers specified; the choice of initial cluster centers; the order in which the samples are taken; and the geometrical properties of the data. In most practical cases, applying this algorithm requires experimenting with various values of K as well as different choices of starting configurations. A compact implementation of the four steps is sketched below, followed by a worked example.
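The four steps translate directly into code. This is a minimal sketch, assuming Euclidean distance and initialization from the first K samples as suggested above; the iteration cap is an added safeguard, not part of the original algorithm.

```python
import numpy as np

def k_means(samples, k, max_iter=100):
    """K-means clustering following Steps 1-4 above."""
    x = np.asarray(samples, dtype=float)
    centers = x[:k].copy()                                   # Step 1
    for _ in range(max_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)                        # Step 2
        new_centers = np.array([x[assign == j].mean(axis=0)
                                if np.any(assign == j) else centers[j]
                                for j in range(k)])          # Step 3: sample means
        if np.allclose(new_centers, centers):                # Step 4: converged
            return new_centers, assign
        centers = new_centers
    return centers, assign
```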

Step. Since zj() zj(), for j,, we return to Step. Step. With the new cluster centers we obtain z () < z () for l,, K,8 and z () < z () Therefore: for l 9,, K, and l l S () {, S () {,, K, } 9 l l 8, K, } Step. Update the cluster centers: z() N S () ( L8) 8.5. z () N S () ( 9 7.67 7. L ) Step. Since zj() zj(), for j,, we return to Step. Step yields the same results as in the previous iteration: S ) S () and S () () ( S Step also yields the same results. Step. Since zj() zj() for j,, the algorithm has converged, yielding these cluster centers:.5 7.67 z, z. 7. These results agree with what we would epect from inspection of the given data. 5 Similarity-Based Approach -- Region Growing Region growing is a bottom-up procedure that starts with a set of seed piels. The aim is to grow a uniform, connected region from each seed. A piel is added to a region if and only if It has not been assigned to any other region It is a neighbor of that region The new region created by addition of the piel is still uniform. In general, it starts with a single piel (seed) and add new piels slowly. 6 Seeded Region Growing -- Non-Math Perspective Basic Ideas:. Choose the seed piels.. Check the neighboring piels and add them to the region if they are similar to the seed by using a certain predicate.. Repeat step for each of the newly added piels; stop if no more piels can be added. Similarity-Based Approach -- Neighborhood Region Growing All the piels are considered as the nodes forming a graph. If p and p are neighboring piels, then connect them with an arc if g(p) g(p ) < ε where ε is a parameter. 7 8 8

Hidden Markov Model Introduction
The Hidden Markov Model (HMM) is a popular statistical tool for modeling a wide range of time-series data. In the context of natural language processing (NLP), HMMs have been applied with great success to problems such as part-of-speech tagging and noun-phrase chunking. HMMs also provide useful techniques for object tracking and have been used quite widely in computer vision. Their recursive nature makes them well suited to real-time applications, and their ability to predict provides further computational efficiency by reducing image search.

Markov Processes
The model presented here is a simple model for a stock market index. It has three states (Bull, Bear, and Even) and three index observations: up, down, and unchanged. The model is a finite state automaton with probabilistic transitions between states. Given a sequence of observations, e.g., up-down-down, we can easily verify that the state sequence that produced those observations was Bull-Bear-Bear, and the probability of the sequence is simply the product of the corresponding transition probabilities. (A toy sketch of this computation appears at the end of this section.)

Hidden Markov Models (HMM)
The new model now allows all observation symbols to be emitted from each state with a finite probability. This change makes the model much more expressive and better able to represent our intuition; for example, a bull market has both good days and bad days, but more good ones. The key difference is that if we now observe the sequence up-down-down, we cannot say exactly which state sequence produced these observations: the state sequence is hidden. We can, however, calculate the probability that the model produced the sequence, as well as which state sequence was most likely to have produced the observations.
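To make the "product of transition probabilities" concrete, here is a toy sketch of the fully observable market model. The transition values below are invented placeholders (the numbers from the slide figure did not survive transcription); only the structure of the computation is meant to match.

```python
import numpy as np

states = ["Bull", "Bear", "Even"]
# Placeholder transition matrix: row i gives P(next state | current state i).
A = np.array([[0.6, 0.2, 0.2],   # from Bull
              [0.3, 0.5, 0.2],   # from Bear
              [0.4, 0.3, 0.3]])  # from Even

def sequence_probability(state_seq):
    """Probability of a state sequence = product of its transitions."""
    idx = [states.index(s) for s in state_seq]
    p = 1.0
    for i, j in zip(idx[:-1], idx[1:]):
        p *= A[i, j]
    return p

print(sequence_probability(["Bull", "Bear", "Bear"]))  # 0.2 * 0.5 = 0.1 here
```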

Formal Definition of HMM
An HMM is specified by $\gamma = (A, B, \pi)$ (Eq. 1), where S is the state alphabet set and V is the observation alphabet set:
$S = (s_1, s_2, \ldots, s_N)$ (Eq. 2)
$V = (v_1, v_2, \ldots, v_M)$ (Eq. 3)
Let Q be a fixed state sequence of length T, with corresponding observations O:
$Q = q_1, q_2, \ldots, q_T$ (Eq. 4)
$O = o_1, o_2, \ldots, o_T$ (Eq. 5)
where $q_i \in S$ and $o_i \in V$.

Formal Definition of HMM (Cont.)
A is the transition array, storing the probability of state j following state i; the state transition probabilities are independent of time:
$A = [a_{ij}]$, $a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i)$. (Eq. 6)
B is the observation array, storing the probability of observation k being produced from state i, independent of t:
$B = [b_i(k)]$, $b_i(k) = P(o_t = v_k \mid q_t = s_i)$. (Eq. 7)
$\pi$ is the initial probability array:
$\pi = [\pi_i]$, $\pi_i = P(q_1 = s_i)$. (Eq. 8)

Two Assumptions
Markov assumption: the current state depends only on the previous state; this represents the memory of the model:
$P(q_t \mid q_1^{t-1}) = P(q_t \mid q_{t-1})$. (Eq. 9)
Independence assumption: the output observation at time t depends only on the current state; it is independent of previous observations and states:
$P(o_t \mid o_1^{t-1}, q_1^{t}) = P(o_t \mid q_t)$. (Eq. 10)

The Power of HMM
An HMM can perform a number of tasks based on sequences of observations:
1) Learning: given an observation sequence $O = \{o_1, o_2, \ldots, o_T\}$ and a model $\gamma$, the model parameters can be adjusted such that $P(O \mid \gamma)$ is maximized.
2) Prediction: an HMM $\gamma$ can predict observation sequences and their associated state sequences, in which the probabilistic characteristics of the model are inherently reflected.
3) Sequence classification: for a given observation sequence $O = \{o_1, o_2, \ldots, o_T\}$, computing $P(O \mid \gamma_i)$ for a set of known models $\gamma_i$ allows the sequence to be classified as belonging to the class i for which $P(O \mid \gamma_i)$ is maximized.
4) Sequence interpretation: given $O = \{o_1, o_2, \ldots, o_T\}$ and an HMM $\gamma$, applying the Viterbi algorithm gives the single most likely state sequence $Q = \{q_1, q_2, \ldots, q_T\}$ (a sketch of the recursion follows this list).
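Task 4 relies on the Viterbi algorithm. The sketch below is a standard dynamic-programming formulation consistent with the definitions in Eqs. 6-8; the function name and the use of integer observation indices are illustrative assumptions.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state sequence for observation indices `obs`,
    given transition matrix A, emission matrix B, and initial vector pi."""
    n_states, T = A.shape[0], len(obs)
    delta = np.zeros((T, n_states))           # best path score ending in state j
    psi = np.zeros((T, n_states), dtype=int)  # best predecessor of state j
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A     # score of i -> j at time t
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):            # backtrack through psi
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```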

HMM Example -- Shadow Detection
There are three states: Background (B), Shadow (S), and Foreground (F), each with its own observation model. A good assumption is that when a pixel belongs to group B, S, or F at the current time, it is more likely that it still belongs to that group at the next time. The groups of pixels are not totally separated in the color domain.

HMM Structure
The shadow-detection HMM therefore consists of the three states B, S, and F, with probabilistic transitions among them.

Linear Discriminant Analysis (LDA) -- Face Identification
Suppose a data set X exists; it might be a set of face images, each of which is labeled with an identity. All data points with the same identity form a class, and in total there are C classes:
$X = \{X_1, X_2, \ldots, X_C\}$.

LDA
The sample covariance matrix for the entire data set is the $N \times N$ symmetric matrix
$\Sigma = \frac{1}{M} \sum_{x \in X} (x - \mu)(x - \mu)^T$,
where M is the total number of faces. This matrix characterizes the scatter of the entire data set, irrespective of class membership.

LDA
However, a within-class scatter matrix W and a between-class scatter matrix B are also estimated:
$W = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{M_c} \sum_{x \in X_c} (x - \mu_c)(x - \mu_c)^T$,
$B = \frac{1}{C} \sum_{c=1}^{C} (\mu_c - \mu)(\mu_c - \mu)^T$,
where $M_c$ is the number of samples of class c, $\mu_c$ is the sample mean for class c, and $\mu$ is the sample mean for the entire data set X.

LDA
The goal is to find a linear transform U which maximizes the between-class scatter while minimizing the within-class scatter. Such a transformation should retain class separability while reducing the variation due to sources other than identity, for example illumination and facial expression.

LDA
An appropriate transformation is given by the matrix $U = [u_1\ u_2\ \ldots\ u_K]$ whose columns are the eigenvectors of $W^{-1}B$; in other words, the generalized eigenvectors corresponding to the K largest eigenvalues of
$B u_k = \lambda_k W u_k$.
There are at most $C - 1$ non-zero generalized eigenvalues, so $K \le C - 1$. (A short computational sketch follows.)
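The scatter matrices and discriminant directions above can be computed in a few lines. This is a sketch under assumptions: it uses a pseudo-inverse of W as a stand-in for the PCA preprocessing described next, and the function name and arguments are illustrative.

```python
import numpy as np

def lda_transform(X, y, k):
    """X: (M, N) data matrix, y: class labels, k <= C-1 directions kept.
    Returns the projected data and the discriminant matrix U."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    N = X.shape[1]
    W = np.zeros((N, N))
    B = np.zeros((N, N))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += np.cov(Xc, rowvar=False, bias=True)   # within-class scatter term
        B += np.outer(mc - mu, mc - mu)            # between-class scatter term
    W /= len(classes)
    B /= len(classes)
    # generalized eigenvectors of B u = lambda W u, via pinv(W) @ B
    vals, vecs = np.linalg.eig(np.linalg.pinv(W) @ B)
    order = np.argsort(vals.real)[::-1][:k]
    U = vecs[:, order].real
    return (X - mu) @ U, U
```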

LDA
The data are transformed as
$\alpha = U^T (x - \mu)$.
After this transformation, the data have between-class scatter matrix $U^T B U$ and within-class scatter matrix $U^T W U$. The matrix U is chosen such that the determinant of the new between-class scatter is maximized while the determinant of the new within-class scatter is minimized. This implies that the following ratio is to be maximized:
$|U^T B U| \,/\, |U^T W U|$.

LDA
In practice, the within-class scatter matrix W is often singular. This is nearly always the case when the data are image vectors of large dimensionality, since the size of the data set is usually small in comparison (M < N). For this reason, PCA is first applied to the data set to reduce its dimensionality; the discriminant transformation is then applied to further reduce the dimensionality to C - 1.

PCA vs. LDA
PCA seeks directions that are efficient for representation; LDA seeks directions that are efficient for discrimination. To obtain good separation of the projected data, we really want the difference between the class means to be large relative to some measure of the standard deviation of each class. (The slides illustrate this with projections onto a direction w giving bad separation, a direction giving good separation, and a multi-class case; figures omitted.)

Principal Components Analysis (PCA) Algorithm: Review
Consider a data set $D = \{x_1, x_2, \ldots, x_M\}$ of N-dimensional vectors; this data set can be a set of M face images. The mean and the covariance matrix are given by
$\mu = \frac{1}{M} \sum_{m=1}^{M} x_m$ and $\Sigma = \frac{1}{M} \sum_{m=1}^{M} (x_m - \mu)(x_m - \mu)^T$,
where the covariance matrix is an $N \times N$ symmetric matrix. This matrix characterizes the scatter of the data set.

PCA Algorithm (Cont.)
A non-zero vector $u_k$ is an eigenvector of the covariance matrix if $\Sigma u_k = \lambda_k u_k$, with corresponding eigenvalue $\lambda_k$. If $\lambda_1, \ldots, \lambda_K$ are the K largest and distinct eigenvalues, the matrix $U = [u_1\ u_2\ \ldots\ u_K]$ represents the K dominant eigenvectors.

PCA Algorithm (Cont.)
The eigenvectors are mutually orthogonal and span a K-dimensional subspace called the principal subspace. When the data are face images, these eigenvectors are often referred to as eigenfaces.

PCA Algorithm (Cont.)
If U is the matrix of dominant eigenvectors, an N-dimensional input x can be linearly transformed into a K-dimensional vector $\alpha$ by
$\alpha = U^T (x - \mu)$.
After applying the linear transform $U^T$, the set of transformed vectors $\{\alpha_1, \alpha_2, \ldots, \alpha_M\}$ has scatter $U^T \Sigma U = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$. PCA chooses U so as to maximize the determinant of this scatter matrix.

PCA Algorithm (Cont.)
An original vector x can be approximately reconstructed from its transform $\alpha$ as
$\tilde{x} = \mu + \sum_{k=1}^{K} \alpha_k u_k$.
In fact, PCA enables the training data to be reconstructed in a way that minimizes the squared reconstruction error over the data set,
$\varepsilon = \sum_{m=1}^{M} \|x_m - \tilde{x}_m\|^2$.
(A compact sketch follows.)
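A compact sketch of the review above: mean, covariance, dominant eigenvectors, projection, and reconstruction. The use of `numpy.linalg.eigh` (valid because the covariance matrix is symmetric) and the function name are implementation choices, not from the slides.

```python
import numpy as np

def pca(X, k):
    """X: (M, N) data matrix, one sample per row; keep k principal directions."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = (Xc.T @ Xc) / X.shape[0]        # (1/M) sum (x - mu)(x - mu)^T
    vals, vecs = np.linalg.eigh(cov)      # symmetric matrix -> eigh
    order = np.argsort(vals)[::-1][:k]
    U = vecs[:, order]                    # K dominant eigenvectors (eigenfaces)
    alpha = Xc @ U                        # alpha = U^T (x - mu) for each sample
    X_rec = mu + alpha @ U.T              # approximate reconstruction
    err = np.sum((X - X_rec) ** 2)        # squared reconstruction error
    return alpha, U, X_rec, err
```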