New Fuzzy Object Segmentation Algorithm for Video Sequences *


JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 521-537 (2008)

KUO-LIANG CHUNG, SHIH-WEI YU, HSUEH-JU YEH, YONG-HUAI HUANG AND TA-JEN YAO
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, 106 Taiwan
E-mail: k.l.chung@mail.ntust.edu.tw

A new fuzzy moving object segmentation algorithm for video sequences is presented in this paper. The proposed object segmentation algorithm consists of three steps: a spatial segmentation step, a temporal tracking step, and a step that identifies the moving objects in each frame in a fuzzy way. In particular, the algorithm can robustly distinguish from the background a foreground part that is a nearly stationary region surrounded by regions with moving variation. Experimental results on several real video sequences demonstrate that the object segmentation accuracy of the proposed fuzzy-based algorithm is encouraging when compared with the recently published object segmentation algorithms by Chien et al. and Kim et al.

Keywords: fuzzy approach, moving object segmentation, MPEG-4, spatial segmentation, temporal tracking

1. INTRODUCTION

Object segmentation is a kernel issue in content-based video coding standards such as MPEG-4 [12]. The main goal of object segmentation is to identify the moving objects in each frame of a video sequence. Among the developed object segmentation algorithms, there are two main approaches: the change detection mask (CDM) approach and the spatio-temporal segmentation approach. Following the CDM-based approach, many efficient object segmentation algorithms [1-3, 8, 10] have been presented in recent years. A CDM-based method first uses the frame difference between two consecutive frames to extract the position and shape of the moving objects; a fine-tuning process is then applied to enhance the boundary accuracy of each moving object.
Since the CDM-based approach mainly involves simple frame-difference calculations, it is computation-saving. Recently, Chien, Ma, and Chen [2] presented an efficient background registration method that accumulates enough frame-difference information to construct a reliable background image; by comparison with this reliable background image, the moving object can then be extracted from the current frame. For convenience, the object segmentation algorithm presented in [2] is called the BR algorithm.

Following the spatio-temporal segmentation approach, many efficient object segmentation algorithms [4-6, 9, 13-15, 18] have been developed. Using spatial homogeneity as the primary segmentation criterion, the gradient of each pixel in the current frame is first calculated; then a segmentation algorithm, such as the watershed algorithm [16], is applied to obtain the initial region boundaries. Next, the motion vector of each region is predicted in order to track the estimated location of the corresponding region in the next frame. If some adjacent regions have similar motion vectors, they are merged together to form a moving object. Recently, based on the spatio-temporal approach, Kim et al. [4] presented an efficient object segmentation algorithm that represents video sequences in terms of video object planes (VOPs). For convenience, the object segmentation algorithm presented in [4] is called the VOP algorithm. The VOP algorithm takes several novel and effective considerations into account: (1) using the affine transform and the least-squares method to estimate the camera motion and then removing its influence; (2) skipping the temporal segmentation when there is a scene change; (3) employing a statistical hypothesis testing technique to detect intensity changes; and (4) incorporating color information into the morphological gradient operator.

Among these object segmentation algorithms, whether CDM-based or spatio-temporal, determining whether a nearly stationary region surrounded by adjacent regions with moving variation is a foreground part or a background part remains a challenging problem, and it is the motivation of this research. Once each region has been classified as foreground or background, identifying the moving object follows easily. A new fuzzy moving object segmentation algorithm for video sequences is presented in this paper.

Received March 17, 2006; revised September 8 & December 14, 2006; accepted January 16, 2007. Communicated by Ja-Ling Wu.
* This work was partially supported by the National Science Council of Taiwan, R.O.C. under contract No. 95-2221-E011-152.
The proposed object segmentation algorithm consists of three steps: the spatial segmentation step, the temporal tracking step, and the step for identifying the moving objects in each frame in an efficient, fuzzy way. In particular, the proposed algorithm can robustly distinguish from the background a foreground part that is a nearly stationary region surrounded by regions with moving variation; this is the main contribution of this paper. Based on several real video sequences, experimental results demonstrate that the object segmentation accuracy of the proposed fuzzy-based algorithm is encouraging when compared with the two previously published object segmentation algorithms, the BR algorithm [2] and the VOP algorithm [4].

The remainder of this paper is organized as follows. Section 2 presents the proposed fuzzy approach for identifying the moving objects in each frame. Section 3 presents the complete object segmentation algorithm. Experimental results are demonstrated in section 4. Finally, some concluding remarks are given in section 5.

2. FUZZY-BASED APPROACH FOR IDENTIFYING MOVING OBJECTS

In the earlier VOP algorithm, the foreground regions in the first frame of the video sequence must be identified by the user in advance, since the foreground regions in the reference frame must be known; these foreground regions are then projected onto the current frame to assist the segmentation of the moving objects. In this section, a novel fuzzy-based approach is presented that segments the moving objects automatically, without using any foreground/background information about the reference frame. The block diagram of the proposed fuzzy-based approach is shown in Fig. 1. First, by applying
the significance test method [1] to the current frame and the reference frame, the ratio of pixels with CDMs in the current region, namely the change ratio (CR), is obtained; the CDM is defined in detail in subsection 2.1. Next, using a motion estimation technique, we compute the ratio of moving pixels in the neighboring regions of the current region, namely the moving ratio (MR). Finally, according to the CR and MR of the current region, the proposed fuzzy-based scheme determines whether the current region is a foreground region or a background region.

Fig. 1. The block diagram of the proposed fuzzy-based approach for moving object identification.

2.1 The Computation of Change Ratio

For exposition, assume the reference frame has been partitioned into r' regions in advance. Using the temporal tracking step, for each region R_j^c in the current frame, 1 ≤ j ≤ r_c', where r_c' denotes the number of partitioned regions in the current frame, we can find the corresponding region in the reference frame. The detailed temporal tracking step is described in step 3 of the proposed object segmentation algorithm (see section 3). We now discuss the change between the region R_j^c and the region R_j^r, where the region R_j^r in the reference frame occupies the same positions as those of the region R_j^c in the current frame.

We apply the significance test method [1] to determine whether each pixel in the current region R_j^c has changed enough compared with the pixel at the same position in the region R_j^r. If a pixel in the region R_j^c has changed enough, that pixel is marked with a CDM. First, we calculate the frame difference D(x, y) between the pixel at location (x, y) in the reference frame, say F_r(x, y), and the one in the current frame, say F_c(x, y):

D(x, y) = F_r(x, y) - F_c(x, y).    (1)

By Eq. (1), if |D(x, y)| ≥ T_s, where the threshold T_s is determined according to Eq. (3), the pixel F_c(x, y) is marked with a CDM. The random variable D(x, y) obeys a zero-mean Laplacian distribution [11] defined by
p_D(d) = (√2 / (2σ)) exp(-√2 |d| / σ),    (2)

where the random variable D stands for D(x, y) with the position argument (x, y) dropped, and σ^2 is the variance of the frame difference. Fig. 2 shows three histograms of D(x, y) for the Mother and Daughter video sequence, and it can be observed that the three frame differences fit the Laplacian distribution.

Fig. 2. The histograms of D(x, y) between the 1st and 2nd frames, the 4th and 5th frames, and the 7th and 8th frames of the Mother and Daughter video sequence.

According to a predefined significance level α, the threshold T_s is computed from

α = p(D > T_s).    (3)

Based on the calculated threshold T_s, we can determine whether each pixel in the current region R_j^c should be marked with a CDM when compared with the pixel at the same position in the region R_j^r of the reference frame. The change ratio (CR) is defined as the number of pixels with CDMs over the size of the region R_j^c:

CR_j = (number of pixels with CDMs in R_j^c) / |R_j^c|,    (4)

where |R_j^c| denotes the number of pixels in the region R_j^c.

2.2 The Computation of Moving Ratio

Using the CR described in subsection 2.1, a region with a high CR is detected as a foreground part of the current frame; a foreground region with a small CR, however, would wrongly be regarded as a background region. For example, as shown in Fig. 3, there are five regions in the current frame, namely R_1, R_2, R_3, R_4, and R_5. Take R_2, the scarf region, as the current region. If the CR value of R_2 is rather small, we may favor classifying the current region R_2 as the
background.

Fig. 3. The current region R_2 and its three neighboring regions R_1, R_3, and R_4, in the reference frame and the current frame.

Thus, besides utilizing the value of CR in Eq. (4) to identify the moving objects, we further need to consider the status of the neighboring regions of R_j^c. Let the k neighboring regions of R_j^c be denoted by the set R_j^n = {R_1, R_2, ..., R_k}. The three neighboring regions of the current region R_2 are then denoted by the set R_2^n = {R_1, R_3, R_4}. In fact, it is a challenging problem to determine whether the current region R_2 is a foreground part when R_2 is nearly stationary while its neighboring regions have some moving variation. Because the moving variation of the neighboring regions is also an important factor in judging whether the current region belongs to the foreground or the background, the motion vectors found in each neighboring region are naturally used to define the second measure. Besides the first measure CR defined in Eq. (4), we now define the second measure, called the moving ratio of R_j^n and abbreviated MR.

Suppose we have performed the motion estimation algorithm [13] between the reference frame F_r and the current frame F_c. In the motion estimation algorithm, we first divide the current frame F_c into many fixed-size blocks. Under a predefined matching criterion, such as the mean square error (MSE), the motion vector of the current block B is determined by finding the best matching reference block within a search window in the reference frame. Let the determined motion vector of B be denoted by (V_x, V_y). If the maximal absolute value of V_x and V_y is equal to or less than the specified threshold T_m, (0, 0) is chosen as the motion vector for B, that is, we assign (0, 0) to (V_x, V_y); otherwise, we retain the original motion vector (V_x, V_y). Empirically, the threshold T_m is set to 0.
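The significance test and change ratio of subsection 2.1 (Eqs. (1)-(4)) can be condensed into a short sketch. This is a minimal NumPy illustration, not the authors' code: the function name, the default significance level, and the closed-form inversion of the Laplacian tail probability used to obtain T_s from α are assumptions.

```python
import numpy as np

def change_ratio(ref, cur, region_mask, alpha=0.05):
    """Change ratio CR of one region, following Eqs. (1)-(4).

    ref, cur    -- grey-level frames as 2-D arrays
    region_mask -- boolean mask of the current region R_j^c
    alpha       -- significance level used to derive the threshold T_s
    """
    d = ref.astype(float) - cur.astype(float)        # frame difference, Eq. (1)
    sigma = np.sqrt(np.mean(d ** 2))                 # sigma^2: variance of d (zero mean)
    # For a zero-mean Laplacian with variance sigma^2,
    # P(D > T) = 0.5 * exp(-sqrt(2) * T / sigma), so Eq. (3) inverts to:
    t_s = -(sigma / np.sqrt(2)) * np.log(2 * alpha)
    cdm = np.abs(d) >= t_s                           # change detection mask
    # Eq. (4): CDM pixels inside the region over the region size |R_j^c|
    return cdm[region_mask].sum() / region_mask.sum()
```

A CR near 1 means that almost every pixel of the region changed significantly between the two frames.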
Moreover, the motion vector of each pixel in the current block B is set to (V_x, V_y). If the motion vector of a pixel in B is not equal to zero, that pixel is said to be a moving pixel. The measure MR is defined by

MR_j = (number of moving pixels in R_j^n) / |R_j^n|,    (5)

where |R_j^n| denotes the total number of pixels in the neighboring regions of R_j^c.

2.3 Fuzzy-based Approach for Foreground/Background Identification

Employing the CR and the MR defined in Eqs. (4) and (5), respectively, we now present a novel fuzzy-based approach, depicted in Fig. 4, to determine whether the current region R_j^c is a foreground part. First, in order to use the proposed fuzzy rules to infer whether the current region R_j^c is a foreground region, two input functions μ1(CR') and μ2(MR') are employed to perform the fuzzification process, which can
transform CR and MR from the crisp domain to the fuzzy domain. Next, in the fuzzy domain, the proposed fuzzy rules, represented by a fuzzy associative memory matrix (FAMM), connect the two input functions μ1(CR') and μ2(MR') to the output function μ3(G). Then, from the defuzzification process, we obtain the crisp value G, which is the basis for foreground region identification. Finally, a threshold on G, namely T_f, is used to determine whether R_j^c is a foreground region.

To obtain good μ1(CR'), μ2(MR'), μ3(G), and FAMM, a set of video sequences is used as the training set in a training process. The foreground regions of the training set must be identified manually before the training process is performed. For the four functions μ1(CR'), μ2(MR'), μ3(G), and FAMM, the training process tries different cases of the fuzzy-based approach on the training set (see Figs. 5-7), and each result is compared with the manual segmentation. Among the tested cases of the above four functions, the case whose segmentation result best approximates the manual one is selected for use in the proposed fuzzy-based approach.

In what follows, we explain how the scheme of Fig. 4 works, and then present how its output is used to identify the moving objects in the frame. The related parameters, such as G, CR', MR', μ1(CR'), μ2(MR'), μ3(G), and T_f, are defined in the next two paragraphs.

Fig. 4. The fuzzy-based scheme to determine whether the current region is a foreground part or not.

In Fig. 4, there are two input variables, the CR variable and the MR variable. For the CR variable, we first normalize CR to CR' = CR - 0.5. Using the new variable CR' as the input, Fig. 5 (a) depicts the membership function μ1(CR'), which contains five triangular sub-membership functions, namely LN_1(CR'), SN_1(CR'), ZE_1(CR'), SP_1(CR'), and LP_1(CR'), where LN, SN, ZE, SP, and LP abbreviate Large Negative, Small Negative, Zero, Small Positive, and Large Positive, respectively. The x-coordinate of Fig. 5 (a) denotes the value of CR', whose range [-0.5, 0.5] is partitioned into five sub-intervals uniformly. For example, when CR' = 0.2, i.e. CR = 0.7, from Fig. 5 (a) we have SP_1(0.2) = 0.8125, LP_1(0.2) = 0.1875, and LN_1(0.2) = SN_1(0.2) = ZE_1(0.2) = 0. Equivalently, μ1(CR' = 0.2) = {SP_1(0.2), LP_1(0.2)} = {0.8125, 0.1875}. That is, when the change ratio CR is 0.7, i.e. CR' = 0.2, the possibility of Small Positive is 0.8125 and the possibility of Large Positive is 0.1875, while the possibilities of the other three outputs are zero.

Similarly, we normalize the second variable MR to MR' = MR - 0.5. Fig. 5 (b) depicts the membership function μ2(MR'), which contains five triangular sub-membership functions, namely LN_2(MR'), SN_2(MR'), ZE_2(MR'), SP_2(MR'), and LP_2(MR'). For example, when MR' = -0.25, i.e. MR = 0.25, from Fig. 5 (b) we have LN_2(-0.25) = 0.5, SN_2(-0.25) = 0.5, and ZE_2(-0.25) = SP_2(-0.25) = LP_2(-0.25) = 0. We thus have μ2(MR' = -0.25) = {LN_2(-0.25), SN_2(-0.25)} = {0.5, 0.5}: when the moving ratio MR is 0.25, i.e. MR' = -0.25, the possibility of Large Negative is 0.5 and the possibility of Small Negative is 0.5, while the possibilities of the other three outputs are again zero.

Fig. 5. The two membership functions: (a) μ1(CR'); (b) μ2(MR').

The inverse of the membership function μ3(G) (see Fig. 6) is used to obtain the value of G, which determines whether the current region is a foreground part. The fuzzy-rules block in Fig. 4 connects the relationship between μ1(CR'), μ2(MR'), and μ3(G). As shown in Fig. 7, a fuzzy associative memory matrix (FAMM) represents these twenty-five fuzzy rules. Given the input membership functions μ1(CR') and μ2(MR'), suppose A and B are activated sub-membership functions of μ1(CR') and μ2(MR'), respectively. From the empirical data, it is observed that the value of A influences the output sub-membership function of the pair (A, B) more than B does. This observation leads to the arrangement of the twenty-five fuzzy rules shown in Fig. 7. For example, assume the activated sub-membership function of μ1(CR') is LP_1 and that of μ2(MR') is LN_2. According to the above observation, the output sub-membership function of (LP_1, LN_2) is set to SP_3 instead of ZE_3. In the same way, the output sub-membership function of (LP_1, SN_2) is set to LP_3 instead of SP_3.
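From the worked numbers above (SP_1(0.2) = 0.8125, LP_1(0.2) = 0.1875, LN_2(-0.25) = SN_2(-0.25) = 0.5), the five triangles behave as if their peaks sit at 0, ±0.17, and ±0.33 with a half-base of 0.16. The following is a minimal sketch under that inferred assumption; the paper's trained membership functions may differ slightly.

```python
# Peaks of the five triangular sub-membership functions; these values are
# inferred from the worked examples in the text, not quoted from Fig. 5.
CENTERS = {'LN': -0.33, 'SN': -0.17, 'ZE': 0.0, 'SP': 0.17, 'LP': 0.33}
HALF_BASE = 0.16  # distance from a triangle's peak to its zero crossing

def fuzzify(x):
    """Map a normalized input (CR' or MR') to its five membership degrees."""
    return {name: max(0.0, 1.0 - abs(x - c) / HALF_BASE)
            for name, c in CENTERS.items()}
```

With these peaks, `fuzzify(0.2)` reproduces SP = 0.8125 and LP = 0.1875, and `fuzzify(-0.25)` reproduces LN = SN = 0.5, matching the two examples in the text.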

Fig. 6. The membership function μ3(G). Fig. 7. The twenty-five fuzzy rules.

As shown in Fig. 7, there are twenty-five fuzzy rules in the FAMM, where each fuzzy rule is realized by the AND operator applied to the two input membership functions μ1(CR') and μ2(MR'). Returning to the same example, i.e. CR' = 0.2 and MR' = -0.25, we have SP_1(0.2) = 0.8125, LP_1(0.2) = 0.1875, LN_2(-0.25) = 0.5, and SN_2(-0.25) = 0.5. Since the four sub-membership functions SP_1, LP_1, LN_2, and SN_2 are activated, the four fuzzy rules with outputs SN_3, SP_3, ZE_3, and LP_3 are fired; they are depicted in the four shaded entries of Fig. 7. To each fired entry of the FAMM we assign a weight equal to the minimum of the two corresponding inputs. For example, the weight of the entry SN_3 is W_SN3 = min{SP_1(0.2) = 0.8125, LN_2(-0.25) = 0.5} = 0.5. By the same arguments, W_SP3 = min{LP_1(0.2) = 0.1875, LN_2(-0.25) = 0.5} = 0.1875, W_ZE3 = min{SP_1(0.2) = 0.8125, SN_2(-0.25) = 0.5} = 0.5, and W_LP3 = min{LP_1(0.2) = 0.1875, SN_2(-0.25) = 0.5} = 0.1875.

As shown in Fig. 6, the output membership function μ3(G) also has five sub-membership functions, namely LN_3, SN_3, ZE_3, SP_3, and LP_3. The defuzzification block in Fig. 4 transforms the output membership function μ3(G) into a decision value G, which determines whether the current region R_j^c is a foreground part or a background part. For exposition, let us follow the above example through the defuzzification process. As shown in Fig. 8, after performing the AND operation on SP_1(0.2) = 0.8125 and LN_2(-0.25) = 0.5, we obtain the resulting membership function denoted by the shaded area of SN_3. By the same arguments, after performing the AND operation on SP_1(0.2) = 0.8125 and SN_2(-0.25) = 0.5, we obtain the resulting membership function denoted by the shaded area of ZE_3. The shaded area of SP_3 (LP_3) is determined by performing the AND operation on LP_1(0.2) = 0.1875 and LN_2(-0.25) = 0.5 (LP_1(0.2) = 0.1875 and SN_2(-0.25) = 0.5). The four shaded membership functions shown in Fig. 8 coincide with the corresponding four membership functions in Fig. 6. Based on the center-of-gravity technique, the decision value G is given by

G = [W_SN3 SN_3^{-1}(1) + W_SP3 SP_3^{-1}(1) + W_ZE3 ZE_3^{-1}(1) + W_LP3 LP_3^{-1}(1)] / [W_SN3 + W_SP3 + W_ZE3 + W_LP3],    (6)

where SN_3^{-1}(1), SP_3^{-1}(1), ZE_3^{-1}(1), and LP_3^{-1}(1) denote the inverse values of the membership functions SN_3, SP_3, ZE_3, and LP_3 at the central (peak) value of the argument, i.e. 1.
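The minimum-AND rule weighting and the center-of-gravity defuzzification of Eq. (6) reduce to a few lines. In this sketch the peak positions are the inverse values quoted from Fig. 6, the four rule weights are those of the worked example, and the helper name is an assumption.

```python
# Peak positions (the "inverse values" X_3^{-1}(1) in Eq. (6)) of the output
# sub-membership functions, as quoted from Fig. 6 in the text.
OUT_CENTER = {'LN3': -0.33, 'SN3': -0.17, 'ZE3': 0.0, 'SP3': 0.17, 'LP3': 0.33}

def defuzzify(weights):
    """Center-of-gravity defuzzification, Eq. (6).

    weights: dict mapping fired output sub-membership functions (e.g. 'SN3')
             to their rule weights W, each the minimum of the two activated
             input memberships.
    """
    num = sum(w * OUT_CENTER[name] for name, w in weights.items())
    den = sum(weights.values())
    return num / den

# Worked example from the text: CR' = 0.2, MR' = -0.25 fires four rules.
weights = {'SN3': 0.5, 'ZE3': 0.5, 'SP3': 0.1875, 'LP3': 0.1875}
g = defuzzify(weights)  # about 0.0064; positive, so foreground (G > T_f = 0)
```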

Fig. 8. Defuzzification process.

Following the previous example, W_SN3 = 0.5, W_SP3 = 0.1875, W_ZE3 = 0.5, and W_LP3 = 0.1875. From Fig. 6, the four inverse values are SN_3^{-1}(1) = -0.17, ZE_3^{-1}(1) = 0, SP_3^{-1}(1) = 0.17, and LP_3^{-1}(1) = 0.33. By Eq. (6), the decision value G equals 0.0064 (= [0.5 (-0.17) + 0.5 (0) + 0.1875 (0.17) + 0.1875 (0.33)] / [0.5 + 0.5 + 0.1875 + 0.1875]). For the current region R_j^c, if the obtained decision value G is larger than the predefined threshold T_f, which is set to T_f = 0, the current region R_j^c is declared a foreground part and we set M(R_j^c) = 1; otherwise it is declared a background part and we set M(R_j^c) = 0. For our example, since the decision value G = 0.0064 is larger than T_f = 0, the current region R_j^c is a foreground part. In fact, declaring the current region R_j^c a foreground part is reasonable, because it has a rather large change ratio, CR = 0.7, even though its neighboring regions have a rather small moving ratio, MR = 0.25.

We now take another example to demonstrate that the proposed algorithm can robustly distinguish from the background a foreground part that is a nearly stationary region surrounded by regions with moving variation. Suppose the current region R_k^c has the change ratio CR = 0.25 but the moving ratio of its neighboring regions is MR = 0.7; that is, CR' = -0.25 and MR' = 0.2. It is easy to verify that the decision value of R_k^c equals 0.0085, i.e. G_k = 0.0085. Since G_k = 0.0085 > T_f = 0, the nearly stationary current region is still determined to be a foreground part, i.e. M(R_k^c) = 1, because the neighboring regions of R_k^c have large moving variation. After determining all the values of M(R_j^c) for 1 ≤ j ≤ r_c' in the current frame, we
merge all the regions that are foreground parts into the whole foreground, which constitutes the so-called moving object.

3. THE PROPOSED OBJECT SEGMENTATION ALGORITHM

Having described the proposed fuzzy approach for identifying the moving objects in the current frame, this section presents the complete object segmentation algorithm, which consists of the following four steps.

Step 1: Preprocessing the First Frame
1.1: For the first frame, we perform noise removal using a 5 x 5 Gaussian smoothing filter.
1.2: Let the smoothed first frame be denoted by F_1 and its grey level at location (x, y) by F_1(x, y). Using the multilevel gradient algorithm [17], the gradient value of F_1(x, y) is calculated by

∇F_1(x, y) = (1/3) Σ_{t=1}^{3} [((F_1(x, y) ⊕ B_t) - (F_1(x, y) ⊖ B_t)) ⊖ B_{t-1}],

where ⊕ and ⊖ denote the dilation operator and the erosion operator, respectively, and B_t denotes the structuring element of (2t + 1) x (2t + 1) pixels for 1 ≤ t ≤ 3. If the value of ∇F_1(x, y) is less than the threshold T_g, ∇F_1(x, y) is set to zero; otherwise it is left unchanged. In our experiments, the threshold T_g is set to 15.

Step 2: Spatial Segmentation for the First Frame
We first apply the watershed segmentation algorithm [16] to split the first frame F_1 into r' homogeneous regions. To relieve the over-segmentation problem, we then apply the hybrid image segmentation algorithm [7] to merge the obtained r' regions into r_1 regions, denoted by the set R^1 = {R_1^1, R_2^1, ..., R_{r_1}^1}, r_1 ≤ r'.

Step 3: Temporal Tracking between Two Consecutive Frames
3.1: Under the mean square error (MSE) criterion, we apply the full search-based motion estimation technique [13] to obtain the motion vector of each block, say (V_x, V_y), in the current frame.
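Full-search block matching under MSE with a zero-motion threshold T_m can be sketched as follows. This is a simplified NumPy illustration; the block size, search range, and function name are assumptions, not values from the paper.

```python
import numpy as np

def block_motion(ref, cur, block=8, search=4, t_m=0):
    """Full-search block matching under MSE with zero-motion suppression.

    Returns an array of motion vectors (v_x, v_y), one per block of `cur`,
    pointing from the current block to its best match in `ref`.
    """
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            cur_blk = cur[y0:y0 + block, x0:x0 + block].astype(float)
            best, best_mv = None, (0, 0)
            for vy in range(-search, search + 1):
                for vx in range(-search, search + 1):
                    ry, rx = y0 + vy, x0 + vx
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue  # candidate block falls outside the frame
                    ref_blk = ref[ry:ry + block, rx:rx + block].astype(float)
                    mse = np.mean((cur_blk - ref_blk) ** 2)
                    if best is None or mse < best:
                        best, best_mv = mse, (vx, vy)
            # Zero-motion suppression: if max(|v_x|, |v_y|) <= T_m, use (0, 0).
            if max(abs(best_mv[0]), abs(best_mv[1])) <= t_m:
                best_mv = (0, 0)
            mvs[by, bx] = best_mv
    return mvs
```

For a frame pair whose content shifts two pixels to the left, the block covering the pattern receives the vector (2, 0), which points from a current pixel to its reference position via x' = x + V_x.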
If Max(|V_x|, |V_y|) is equal to or less than the specified threshold T_m, the vector (0, 0) is chosen as the motion vector of that block, i.e. (V_x, V_y) = (0, 0); otherwise, we retain the original motion vector (V_x, V_y). Since the region label of each pixel in the current frame is inherited from that of the corresponding pixel in the reference frame, the motion vector of the current block is assigned to each pixel in that block in order to find the corresponding pixel in the reference frame, i.e. the motion vector of each pixel F_c(x, y) in the current block is set to (V_x, V_y). Thus, after the motion estimation technique has been applied to each block in the current frame, we obtain the motion vector of each pixel. For the pixel F_c(x, y) with motion vector (V_x, V_y), its corresponding pixel F_r(x', y') in the reference frame can be obtained by x' = x + V_x and y' = y + V_y, and then we set the region label of F_c(x, y) to that of F_r(x', y'). After all

pixels in the current frame inherit the region labels from the reference frame, the pixels with the same region label are collected to form a region. Therefore, in the current frame, we obtain r'' regions, denoted by {R_j for j = 1, 2, ..., r''}, where within each region all the pixels have the same region label.

3.2: In the current frame, for each region R_j ∈ {R_j for j = 1, 2, ..., r''}, we perform the erosion operator with a structuring element of 5 × 5 pixels to obtain the corresponding shrunken region, called the marker. Naturally, we have r'' markers, namely M_j for j = 1, 2, ..., r''. These r'' markers can be viewed as the initial catchment basins for performing the watershed segmentation algorithm on the current frame. Therefore, based on these r'' markers, the modified watershed segmentation algorithm [18] is applied to the current frame. After the segmentation process and the merging process in [18] have been completed, we obtain r' homogeneous regions R_j's for 1 ≤ j ≤ r' with r' < r''.

Step 4: Fuzzy-based Moving Object Determination

For each region R_j in the current frame, by Eqs. (4) and (5), we calculate the CR value and the MR value. As shown in Fig. 4, according to the CR value and the MR value of region R_j, we apply the fuzzy-based approach described in section 2 to determine whether the region R_j is a foreground part or a background part. As shown in Fig. 9, in order to handle a doughnut-type object, for each foreground part, we further compare its mean gray value and its variance with those of the background part. If both differences are less than the specified thresholds, we re-assign the concerned foreground part to be a background part. We then merge all the connected foreground regions into a moving object; usually, we may obtain several separate moving objects. If the current frame is not the last frame in the video sequence, go to step 3; otherwise, the object segmentation algorithm terminates.

Fig. 9. Flowchart of doughnut-type object handling.
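The per-region fuzzy decision of step 4 ends with the defuzzification of Eq. (6), a weighted average of the inverse membership values. A minimal Python sketch using the numbers of the worked example from section 2 (the function name and list layout are ours):

```python
def defuzzify(weights, inverse_values):
    """Eq. (6)-style defuzzification: G = sum(W * v) / sum(W)."""
    numerator = sum(w * v for w, v in zip(weights, inverse_values))
    return numerator / sum(weights)

# Firing weights for SN, ZE, SP, LP and their inverse membership
# values, taken from the worked example in the text.
W = [0.5, 0.5, 0.1875, 0.1875]
v = [-0.17, 0.0, 0.17, 0.33]

G = defuzzify(W, v)          # ≈ 0.0064
T_f = 0.0                    # predefined threshold
M_R = 1 if G > T_f else 0    # M(R) = 1: region declared foreground
```

Since G > T_f, the region is declared a foreground part, matching the example.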

(a) (b) (c) (d)

Fig. 10. One object segmentation example; (a) first frame in the Claire video sequence; (b) the multilevel gradient of (a); (c) segmented frame; (d) final moving object in the second frame.

We give a real example to demonstrate the result of each step of our proposed object segmentation algorithm. Fig. 10 (a) depicts the first frame of the Claire video sequence. After performing steps 1.1 and 1.2, the multilevel gradient of the first frame is shown in Fig. 10 (b). After step 2 has been performed, Fig. 10 (c) depicts the frame segmented using the watershed segmentation method; at this point, the number of regions in Fig. 10 (c) is seven. Fig. 10 (d) depicts the final obtained moving object after performing steps 3 and 4 between the first frame and the second frame. Note that although the scarf of Miss Claire is nearly stationary, the obtained moving object also includes the scarf, since the scarf's neighboring regions have enough variation.

4. EXPERIMENTAL RESULTS

In this section, several experiments are carried out to compare the performance of the VOP algorithm [4], the BR algorithm [2], and our proposed fuzzy-based object segmentation algorithm. All the concerned object segmentation algorithms are implemented using Borland C++ Builder on a Pentium 4 1.4 GHz personal computer with 256 MB RAM. In our experiments, four different test video sequences, namely the Claire video sequence, the Mother and Daughter video sequence, the Akiyo video sequence, and one video sequence including a doughnut-type object, are used for evaluating the segmentation accuracy performance. The size of each frame in the video sequences is 352 × 288. In step 3.1 of our proposed object segmentation algorithm, the size of each block and the size of the search window are selected as 16 × 16 and 33 × 33, respectively, to obtain the motion vector of each block. The significance level α in Eq. (3) for the significance test is set to 1 × 10⁻² empirically.
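The motion-estimation setup above (16 × 16 blocks, the MSE criterion, and a threshold T_m that snaps small vectors to (0, 0)) can be sketched as follows. This is a simplified illustrative implementation with our own variable names, not the exact code used in the experiments:

```python
import numpy as np

def full_search(cur, ref, block=16, search=16, T_m=1):
    """Full-search block motion estimation under the MSE criterion.

    Vectors whose largest component does not exceed T_m are snapped
    to (0, 0), as in step 3.1. Returns {(bx, by): (Vx, Vy)}.
    """
    h, w = cur.shape
    cur = cur.astype(np.float64)
    ref = ref.astype(np.float64)
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur_blk = cur[by:by + block, bx:bx + block]
            best_mse, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    mse = np.mean((cur_blk - ref[y:y + block, x:x + block]) ** 2)
                    if mse < best_mse:
                        best_mse, best_v = mse, (dx, dy)
            if max(abs(best_v[0]), abs(best_v[1])) <= T_m:
                best_v = (0, 0)  # near-zero motion treated as no motion
            vectors[(bx, by)] = best_v
    return vectors
```

With the paper's settings, block=16 and search=16 correspond to the 33 × 33 search window.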
Using the same video sequences as in the VOP algorithm [4] and considering the (i-1)th and ith frames of the Claire video sequence for i = 5, 8, 11, and 14, the moving objects segmented by the VOP algorithm are shown in Fig. 11. From the six segmented moving objects obtained for the Claire video sequence, it is observed that each scarf region is identified as a foreground part. Fig. 12 demonstrates the segmented moving objects for the Mother and Daughter video sequence. From the six segmented moving objects obtained for the Mother and Daughter video sequence, it is observed that a small upper-left portion of the mother's neck region is identified as a background part. The main reason for this

Fig. 11. The segmented moving objects in the Claire video sequence using the VOP algorithm.

Fig. 12. The segmented moving objects in the Mother and Daughter video sequence using the VOP algorithm.

small misidentification is that the change of the small upper-left portion of the mother's neck region between two consecutive frames is too small for the region to be determined as a foreground part, since the VOP algorithm does not consider the influence of the neck region's neighboring regions. However, since some effective considerations (mentioned in the introduction) are taken into account, the overall segmentation result is satisfactory. In the BR algorithm [2], by comparison with a reliable background image, the moving object of the current frame can be identified from the accumulated background information. Using the same video sequence as in the BR algorithm, Fig. 13 demonstrates the segmented moving objects for the Akiyo video sequence. From the six segmented moving objects obtained for the Akiyo video sequence, it is observed that most of each scarf region is identified as a foreground part after accumulating foreground information. However, in Fig. 13, a small lower portion of the scarf region may be identified as a background part when the accumulated background information is insufficient. The BR algorithm has good segmentation accuracy for a nearly stationary region once it has accumulated enough background information; in other words, if the BR algorithm can accumulate longer-term background information, a more reliable background can be constructed, and the BR algorithm can then identify more reliable segmented moving objects in the frame.

Fig. 13. The segmented moving objects in the Akiyo video sequence using the BR algorithm.
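For contrast with our fuzzy approach, the background-registration principle can be paraphrased in a few lines. This is only a heavily simplified sketch of the idea of accumulating stable pixels into a background model, with our own names and thresholds; it is not the actual BR algorithm of [2]:

```python
import numpy as np

def br_step(cur, prev, bg, stable_count, T_d=10.0, T_s=3):
    """One step of a background-registration-style segmenter.

    A pixel that stays unchanged for T_s consecutive frames is
    registered into the background model; the foreground mask is then
    the set of pixels differing sufficiently from that background.
    """
    still = np.abs(cur - prev) < T_d
    stable_count = np.where(still, stable_count + 1, 0)
    bg = np.where(stable_count >= T_s, cur, bg)   # register stable pixels
    fg_mask = np.abs(cur - bg) >= T_d             # differs from background
    return fg_mask, bg, stable_count
```

The longer the accumulation runs, the more of the scene is registered into the background, which mirrors the behavior of the BR algorithm observed above.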

For the Claire, Mother and Daughter, and Akiyo video sequences, the sequences of moving objects segmented by our proposed algorithm are depicted in Figs. 14-16, respectively. In these sequences of segmented moving objects, the segmented contours are rather encouraging. We notice that the scarf or neck regions in all the video sequences are nearly stationary, but each scarf or neck region is surrounded by regions with enough moving variation. Our fuzzy approach integrates the two influencing factors, the current region and its neighboring regions, to obtain a positive decision value (see Eq. (6)) and hence determine that the scarf or neck region is a foreground part; consequently, each scarf or neck region belongs to a subset of the whole segmented moving object. When compared to the VOP algorithm and the BR algorithm, our proposed object segmentation algorithm has encouraging segmentation accuracy for nearly stationary regions whose neighboring regions have enough variation.

Fig. 14. The segmented moving objects in the Claire video sequence using our proposed algorithm.

Fig. 15. The segmented moving objects in the Mother and Daughter video sequence using our proposed algorithm.

Fig. 16. The segmented moving objects in the Akiyo video sequence using our proposed algorithm.

Finally, one doughnut-type video sequence is further used to examine the segmentation accuracy performance of our proposed algorithm. Fig. 17 demonstrates the segmentation results of our proposed algorithm for the Hsuehju video sequence. The doughnut-type region encircled by the left arm can be identified as a background part with the help of step 4 of our proposed algorithm.
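The doughnut-handling test of step 4, which re-assigns a foreground region to the background when both its mean gray value and its variance are close to those of the background, can be sketched as follows (the helper name and the threshold values are our own illustrative choices; the paper does not specify its thresholds here):

```python
import numpy as np

def reassign_doughnut(fg_pixels, bg_pixels, T_mean=10.0, T_var=50.0):
    """Return True if the foreground region should become background.

    Compares the region's mean gray value and variance with those of
    the background part, as in step 4's doughnut handling (Fig. 9).
    """
    fg = np.asarray(fg_pixels, dtype=np.float64)
    bg = np.asarray(bg_pixels, dtype=np.float64)
    mean_diff = abs(fg.mean() - bg.mean())
    var_diff = abs(fg.var() - bg.var())
    return bool(mean_diff < T_mean and var_diff < T_var)
```

A region such as the hole encircled by the arm, whose statistics match the background, passes this test and is re-assigned; a genuinely moving region does not.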

Fig. 17. The segmented moving objects in the doughnut-type video sequence using our proposed algorithm.

5. CONCLUSION

This paper has presented a novel fuzzy-based object segmentation algorithm for identifying moving objects in video sequences. Our proposed object segmentation algorithm consists of three major steps, namely the spatial segmentation step, the temporal tracking step, and the step for identifying the moving objects in each frame in an efficient, fuzzy way. In particular, our proposed algorithm can robustly distinguish from the background a foreground part which is a stationary region surrounded by regions with moving variation. In the previous VOP algorithm, the foreground regions in the first frame of the video sequence must be identified by the user in advance, since the foreground regions in the reference frame must be known before they are projected onto the current frame to assist the segmentation of the moving objects. The main contribution of our proposed fuzzy-based approach is that, without requiring the foreground and background information of the reference frame, our approach can segment the moving objects automatically. On several different real video sequences, experimental results demonstrate that the object segmentation accuracy of our proposed fuzzy-based algorithm is encouraging when compared to the two previously published object segmentation algorithms, the VOP algorithm [4] and the BR algorithm [2].

REFERENCES

1. T. Aach, A. Kaup, and R. Mester, "Statistical model-based change detection in moving video," Signal Processing, Vol. 31, 1993, pp. 165-180.
2. S. Y. Chien, S. Y. Ma, and L. G. Chen, "Efficient moving object segmentation algorithm using background registration technique," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, 2002, pp. 577-586.
3. J. Guo, J. W. Kim, and C. C. J. Kuo, "Fast and accurate moving object extraction technique for MPEG-4 object-based video coding," SPIE, Vol. 3653, 1999, pp.
1210-1221.
4. M. Kim, J. G. Choi, D. Kim, H. Lee, M. H. Lee, C. Ahn, and Y. S. Ho, "A VOP generation tool: automatic segmentation of moving objects in image sequences based on spatio-temporal information," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, 1999, pp. 1216-1226.
5. C. Kim and J. N. Hwang, "Fast and automatic video segmentation and tracking for content-based applications," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, 2002, pp. 122-129.
6. I. Kompatsiaris and M. G. Strintzis, "Spatiotemporal segmentation and tracking of objects for visualization of videoconference image sequences," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, 2000, pp. 1388-1402.
7. K. Haris, S. N. Efstratiadis, N. Maglaveras, and A. K. Katsaggelos, "Hybrid image segmentation using watersheds and fast region merging," IEEE Transactions on Image Processing, Vol. 7, 1998, pp. 1684-1699.
8. R. Mech and M. Wollborn, "A noise robust method for 2D shape estimation of moving objects in video sequences considering a moving camera," Signal Processing, Vol. 66, 1998, pp. 203-217.
9. T. Meier and K. N. Ngan, "Video segmentation for content-based coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, 1999, pp. 1190-1203.
10. A. Neri, S. Colonnese, G. Russo, and P. Talone, "Automatic moving object and background separation," Signal Processing, Vol. 66, 1998, pp. 219-232.
11. E. Parzen, Modern Probability Theory and Its Applications, John Wiley and Sons, New York, 1960.
12. T. Sikora, "The MPEG-4 video standard verification model," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, 1997, pp. 19-31.
13. A. M. Tekalp, Digital Video Processing, Prentice-Hall, Englewood Cliffs, NJ, 1995.
14. C. Toklu, A. M. Tekalp, and A. T. Erdem, "Semi-automatic video object segmentation in the presence of occlusion," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, 2000, pp. 624-629.
15. Y. Tsaig and A. Averbuch, "Automatic segmentation of moving objects in video sequences: a region labeling approach," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, 2002, pp. 597-611.
16. L. Vincent and P. Soille, "Watersheds in digital spaces: an efficient algorithm based on immersion simulations," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, 1991, pp. 583-598.
17. D. Wang, "A multiscale gradient algorithm for image segmentation using watersheds," Pattern Recognition, Vol. 30, 1997, pp. 2043-2052.
18. D. Wang, "Unsupervised video segmentation based on watersheds and temporal tracking," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, 1998, pp. 539-546.

Kuo-Liang Chung received the Ph.D. degree from National Taiwan University. Prof. Chung received the Distinguished Research Award (2004 to 2007) from the National Science Council, Taiwan. His research interests include image/video compression, image/video processing, and multimedia applications.

Shih-Wei Yu received the M.S. degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology. His research interests include image processing and image compression.

Hsueh-Ju Yeh received the M.S. degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology. His research interests include image processing and image compression.

Yong-Huai Huang received the M.S. degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology. He is now pursuing the Ph.D. degree in the same department. His research interests include image processing, image/video compression, and multimedia applications.

Ta-Jen Yao received the M.S. degree in Computer Science and Information Engineering from National Taiwan University of Science and Technology. His research interests include image processing and image compression.