Object based Pseudo-3D Conversion of 2D Videos

J. Jiang 1,2 and G. Xiao 1
1 Southwest University; 2 University of Bradford

ABSTRACT: In this paper, we describe a new algorithm to construct pseudo-3D videos out of conventional 2D videos at the viewing end, where no additional 3D information is attached at the source of 2D video production. We name such constructed videos pseudo-3D on the ground that the converted video is not true 3D but presents a perceptual 3D effect when viewed with a pair of polarized glasses. The proposed algorithm consists of two video processing stages: (i) semantic video object segmentation; and (ii) estimation of disparities. In the first stage, we propose a constrained region-growing and filtering approach to improve existing segmentation techniques based on change detection. Such a processing stage ensures that disparities are estimated in terms of semantic video objects rather than textured regions, and thus improves the pseudo-3D effect in terms of human visual perception. In the second stage, we propose a VO-size (video object size) based disparity estimation to construct additional video frames for the proposed pseudo-3D video conversion. Experiments demonstrate that the proposed algorithm effectively produces perceptually harmonious pseudo-3D videos with the advantages of simplicity and low computing cost.

Indexing Terms: Pseudo-3D videos, stereo video processing, and semantic video object segmentation

1. Introduction

Following the on-going research upon high resolution digital TV, 3D TV is believed to be the next major revolution in the history of TV technologies. Several companies have exhibited their glasses-free 3D display systems [1-2], which illustrate a promising potential that customers would not need to wear a pair of 3D glasses in order to enjoy visual content in 3D. However, another key issue is the need for 3D video content generation. It is clear that the success of any future 3D TV system will greatly depend on the availability of sufficient 3D video materials. While it is obvious that the demand for 3D video content can be partially met by new 3D productions at higher cost, stereo conversion of the existing rich source of 2D videos would remain an excellent choice [3]. The problem with such a choice is that it is extremely challenging to automatically convert 2D videos into 3D, since no true stereo information is generally available inside conventional 2D videos. As a result, people often rely on manual 3D conversion. In the commercial sector, there exists a range of companies who provide 3D conversion services, which are both time consuming and expensive. In the research community, attempts have been made to develop computing algorithms for automatic 3D conversion of 2D videos, which are mainly represented by a European IST project ATTEST [4]. Approaches investigated in this project include: (i) a prior knowledge based approach, where research is focused on human objects only to exploit those pre-known features of human bodies [4]; (ii) a camera calibration approach, where camera motion is detected from 2D videos to provide cues for 3D conversion [5]; and (iii) a motion based approach, where motion information is primarily targeted to enable on-line depth augmentation to be added into received 2D videos [6-7].

Dynamic Digital Depth Research in Australia developed a series of technical solutions to the 2D/3D conversion problem [8]. Their approaches can be characterised by two directions. One is on the production side and the other is on the viewer's side. On the viewer's side, the 2D-to-3D conversion is challenging. Their current solution is based on a semi-automatic approach, where a software package called DeepSee Studio is developed to enable an operator to lasso an object and segment it out from the rest of the image.
The operator is then required to assign an artistic depth to the object (an apparent distance from the viewer). Following that, the software (DeepSee Studio) can fill the object with the appropriate shade of grey corresponding to this distance and complete the 2D/3D conversion.

In 2001, Motorola Australia Research Centre reported their work on 2D to 3D conversion for head and shoulder photos [9]. Their approach is characterized by five steps: (i) using a snake modelling algorithm to locate the head and shoulder boundary and remove the existing background; (ii) parametric representation of the head contour using multiple ellipsoids; (iii) locating the main facial features (eyes, nose etc.); (iv) creating a parametric depth map based on facial features; and (v) estimation/composition of left and right views based on the original photo and the parameterised depth map. To automatically implement the technology, three constraints must be satisfied: (1) the original photo must contain a human head and shoulder; (2) the head orientation is such that the subject is looking directly towards the camera; and (3) the photo should be taken with an uncluttered background. Otherwise, user intervention is required.

Given the challenge and the difficulties illustrated in existing research towards 3D conversion of 2D videos, we propose a pseudo-3D approach to provide alternative solutions, where the converted videos are not true 3D but achieve a perceptual 3D effect. Such a specification provides enormous scope and flexibility for the development of video conversion and processing algorithms. To this end, we report our investigations along this route to produce automatic pseudo-3D conversion of 2D videos. Our algorithm design features a semantic object segmentation and a simple disparity estimation based on the size of the segmented objects, where operational elements can be implemented on a real-time basis since the computing cost involved is low and the algorithm is simple.

The rest of the paper is organized into three sections. Section 2 describes the semantic object segmentation, section 3 describes disparity estimation and pseudo-3D video generation, and section 4 reports experimental results and provides discussions on experimental analysis. Final concluding remarks are also included in that section.

2. Semantic Video Object Segmentation

Existing research on video object segmentation is built upon detection of changes assisted by other sideline information including spatial segmentation, edge detection and background registration etc. [10-19]. In [10], Kim et al. described a spatio-temporal approach for automatic segmentation of video objects, where a hypothesis test based on the estimated variances within a window is proposed to exploit the temporal information, and spatial segmentation is included to assist with detection of object boundaries. The final decision on foreground and background objects is made by combining the spatially segmented object mask with the temporally segmented object mask, in which a two-stage process is designed to consider both the change detection and the historical attributes. In [11], another similar approach was described towards a robust or noise-insensitive video object segmentation, which follows the idea of combining spatial edge information with motion-based edge detection. The described algorithm starts with edge detection to derive an edge map by the Canny edge detector [11], in which a gradient operation on the Gaussian convoluted image is performed. Given the nth video frame $I_n$, the Canny edge detecting operation can be represented as:

$$ \Phi(I_n) = \theta(\nabla G * I_n) \qquad (1) $$

where $G * I_n$ stands for the Gaussian convoluted image, $\nabla$ for the gradient operation, and $\theta$ for the application of non-maximum suppression and the thresholding operation with hysteresis to detect and link the edges.

To segment the video objects, the algorithm requires three edge maps to be extracted: the difference edge map $DE_n = \Phi(I_{n-1} - I_n)$, the current edge map $E_n = \Phi(I_n)$, and the background edge map $E_b$, which contains background edges defined either by a manual process or by counting the number of edge occurrences for each pixel through the first several frames [11]. The three edge maps are then used to produce a currently moving edge map, $ME_n^{change}$, by selecting all edge pixels within a small distance of $DE_n$, and the temporarily still moving edge map, $ME_n^{still}$, by considering the previous frame's moving edges. These operations can be written as follows:

$$ ME_n^{change} = \{ e \in E_n \mid \min_{x \in DE_n} \| e - x \| \le T_{change} \} \qquad (2) $$

$$ ME_n^{still} = \{ e \in E_n \mid e \notin E_b,\ \min_{x \in ME_{n-1}} \| e - x \| \le T_{still} \} \qquad (3) $$

where $e$ stands for an edge point detected from the current frame $I_n$, and $x$ stands for an edge pixel of the map being compared against: the difference edge map $DE_n$, detected from the difference frame $(I_{n-1} - I_n)$, in (2), and the previous moving edge map $ME_{n-1}$ in (3). $T_{change}$ and $T_{still}$ are thresholds empirically determined to define the small distance. Ref. [11] selected $T_{change} = T_{still} = 1$ for their experiments and simulations.

After the identification of those moving edges by (2) and (3), the remaining operation for extracting video objects is to combine the two edge maps into a final moving edge map, $ME_n = ME_n^{change} \cup ME_n^{still}$, and then select the object pixels via a logic AND operation of those pixels between the first and the last edge pixel in both rows and columns [11].
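As an illustration of equations (2) and (3) and the final union, the following is a minimal Python sketch. It assumes each edge map is already available as a set of (row, column) pixel coordinates; the function names `near` and `moving_edge_map` are hypothetical and are not taken from [11], and the brute-force distance search is chosen for readability rather than speed.

```python
from math import hypot


def near(point, others, threshold):
    """Return True if `point` lies within `threshold` (Euclidean distance)
    of at least one pixel in `others`."""
    py, px = point
    return any(hypot(py - oy, px - ox) <= threshold for oy, ox in others)


def moving_edge_map(current_edges, difference_edges, background_edges,
                    previous_moving_edges, t_change=1.0, t_still=1.0):
    """Combine the 'change' and 'still' moving edges of one frame.

    All arguments are sets of (row, col) tuples (an assumption of this sketch):
      current_edges          E_n
      difference_edges       DE_n
      background_edges       E_b
      previous_moving_edges  ME_{n-1}
    """
    # Equation (2): current edges close to an edge of the difference frame.
    me_change = {e for e in current_edges
                 if near(e, difference_edges, t_change)}
    # Equation (3): current non-background edges close to a moving edge
    # of the previous frame.
    me_still = {e for e in current_edges
                if e not in background_edges
                and near(e, previous_moving_edges, t_still)}
    # Final moving edge map: union of the two sets.
    return me_change | me_still
```

With both thresholds set to 1, as reported for [11], an edge pixel is kept only if it coincides with, or is a direct horizontal or vertical neighbour of, a pixel in the reference map.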

Our assessment of the above algorithm reveals that, while the algorithm performs well generally, there exists an under-segmentation problem, where parts of the object region are missing or there exist holes inside the semantic object being segmented. When the grey level difference between the object and the background is small, part of the object at its boundary will have a similar intensity level to that of the background. In this circumstance, the edge detector fails to detect all the edges of the object, and thus some parts inside the objects go missing. To reduce such an effect, we propose a simple linear transformation to enhance the contrast of the luminance component of the video frame before the Canny edge detection is applied. Although there exist many contrast enhancement algorithms that may provide better performance, our primary aim here is not only improving the segmentation accuracy, but also maintaining the necessary simplicity for real-time applications. Considering the fact that an increase of contrast will inevitably introduce additional noise, we also designed a simple 2D filter to remove the noise.

Given an input video frame $I(x, y)$, and assuming that its intensity values are limited to the range $[a, b]$, its transformed video frame can be generated as:

$$ g(x, y) = a' + \frac{b' - a'}{b - a} \left( I(x, y) - a \right) \qquad (4) $$

where $[a', b']$ denotes the intensity range of the transformed frame. The above linear transform is applied to the whole video frame before the edge detection and mappings. To remove the additional noise introduced by the linear transformation, we apply a filter based on a so-called 8-connected linking region sign technique to both the moving edge map $ME_n$ and the extracted object sequences.
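The linear transform of equation (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame is a list of rows of luminance values, the function name `stretch_contrast` is hypothetical, and the default output range of 0 to 255 is an assumption for 8-bit video rather than a value given in the paper.

```python
def stretch_contrast(frame, a, b, a_new=0, b_new=255):
    """Map intensities from the input range [a, b] to [a_new, b_new].

    Implements g = a' + (b' - a') / (b - a) * (I - a) for every pixel.
    `frame` is a list of rows of numeric luminance values.
    """
    if b <= a:
        raise ValueError("input range must satisfy a < b")
    scale = (b_new - a_new) / (b - a)
    return [[a_new + scale * (pixel - a) for pixel in row] for row in frame]
```

One plausible choice, again an assumption, is to take `a` and `b` as the minimum and maximum luminance found in the frame, so that a low-contrast frame is spread over the full output range before edge detection.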

Given a pixel at (x, y) whose value is 1 in the binary map, we examine its 3×3 neighborhood to produce a set of all connected points $N = \{A_1, A_2, \ldots, A_k\}$. If k is smaller than T, a threshold for noise removal, the set of points N will be regarded as noise and removed.

When part of the moving object is relatively still across a few frames, the edge maps in both (2) and (3) will fail to include those points, thus creating large holes inside the segmented object. In this case, existing post-processing such as morphological operations will not be able to recover those missing parts inside or at the boundary of the segmented object. To provide a solution, we propose a constrained region-growing technique to recover those missing parts. Unlike the normal region growing used by spatial segmentation techniques for still images [20], our proposed growing is constrained in two ways: (i) the seed selection is fixed at those edge points at the boundary of the final edge map; (ii) the number of pixels outside the first and the last edge points must be smaller than a certain limit. In other words, if the majority of the pixels on any row are outside the boundary of the edge map, the constrained region growing will not be applied.

Given the final edge map $ME_n$, we examine those pixels outside the first and the last edge pixel in each row to see if any further growing can be facilitated by using the pixels at the boundary of the edge map as seeds. Specifically, given the set of pixels outside the first and the last edge points in the ith row, $PO_i = \{PO_1, PO_2, \ldots, PO_k\}$, we decide whether the region of those edge points should be grown into any of the points inside $PO_i$ by the test given below:

$$ PO_i = \begin{cases} e_i & \text{if } |PO_i - e_i| \le T_e \\ PO_i & \text{else} \end{cases} \qquad (5) $$

where $T_e$ is a threshold indicating that the pixel tested is very similar to $e_i$, which is the first or the last edge pixel in the row, depending on which of these two edge points is closer to the position of $PO_i$.
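The two operations just described, noise removal by 8-connected components and constrained region growing along a row, might be sketched in Python as below. Several details are assumptions of this sketch rather than statements from the paper: the function names, the strict "smaller than" comparison against the noise threshold, the default limit of half the row width for the outside-pixel constraint, and the choice to grow outward from each seed and stop at the first dissimilar pixel so that the grown region stays connected.

```python
def remove_small_components(binary, min_size):
    """Clear every 8-connected group of 1-pixels with fewer than
    `min_size` members. `binary` is a list of rows of 0/1 values."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one 8-connected component starting at (r, c).
            component, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


def grow_row(intensity, edge_mask, t_e, max_outside_fraction=0.5):
    """Constrained region growing along one row.

    `intensity` holds the row's luminance values and `edge_mask` its 0/1
    edge map. Seeds are the first and last edge pixels of the row.
    """
    edge_cols = [c for c, v in enumerate(edge_mask) if v == 1]
    grown = edge_mask[:]
    if not edge_cols:
        return grown
    first, last = edge_cols[0], edge_cols[-1]
    width = len(edge_mask)
    # Constraint (ii): skip rows where too many pixels lie outside the edges.
    outside = first + (width - 1 - last)
    if outside > max_outside_fraction * width:
        return grown
    # Grow leftwards from the first edge pixel while pixels stay similar.
    c = first - 1
    while c >= 0 and abs(intensity[c] - intensity[first]) <= t_e:
        grown[c] = 1
        c -= 1
    # Grow rightwards from the last edge pixel while pixels stay similar.
    c = last + 1
    while c < width and abs(intensity[c] - intensity[last]) <= t_e:
        grown[c] = 1
        c += 1
    return grown
```

The same `grow_row` routine could be applied to columns by passing a column of the frame and of the edge map instead of a row.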

If the condition is satisfied, then $PO_i$ will be grown into the edge points. Otherwise, they will stay as they are outside the edge map. As the final segmented object is produced by considering both columns and rows inside the video frame [11], such constrained region growing also applies to those points along the columns. After both rows and columns are processed via constrained region growing, the final semantic VO is extracted by a logic AND operation of both row and column edge pixels as described in [11].

In summary, we proposed three simple operations on top of the moving edge-based object segmentation algorithm [11], which can be highlighted as: (i) linear transform to highlight and strengthen the edges inside the video frames; (ii) noise filtering via the 8-connected pixel approach; and (iii) constrained region growing.

3. Pseudo-3D Video Generation

By analyzing the stereo vision system [21-22], it can be seen that the disparity value is inversely proportional to the distance between the human eyes and the object point (depth), which can be summarized as follows:

$$ D = B \frac{f}{Z} \qquad (6) $$

where D stands for the disparity value, f the focal length, Z the depth, and B the distance between the left and right optical centers.

As human visual perception is tolerant to a certain level of pixel distortion, which is the principle of lossy video compression, we may not need true disparity values to construct stereo versions of those 2D videos and achieve perceptual 3D effects. As explained in the introduction, we refer to such converted 3D videos as pseudo-3D videos, indicating the fact that the proposed 3D conversion is not true 3D. To this end, automatic conversion of 2D videos into their pseudo-3D version becomes less challenging. The remaining issue becomes how to maximize its pseudo-3D effectiveness.

Although general 2D videos do contain some information leading to their true disparity estimation [3], which forms the foundation for those multi-view algorithms, such pixel-wise estimation of disparities could involve high computing cost and lengthy processing. To speed up the process and reduce the computational cost, we could just segment the scenes into a number of meaningful content layers, such as object layers and a background layer, and then estimate disparity values for those layers rather than each individual pixel. As a result, semantic object segmentation becomes crucial for such pseudo-3D conversion. Although segmentation has been researched for decades, semantic object segmentation did not start until recent years, when image processing started to move from the low level to the semantic level. The advantage of such semantic object segmentation lies in the fact that segmented objects are no longer merely texture-consistent regions, but represent individual objects, matching human visual content understanding. In our pseudo-3D video conversion, such semantic object based disparity estimation and allocation will significantly enhance the 3D viewing experience since the disparity allocated matches human visual content understanding. Consequently, such an approach will also provide additional tolerance for inaccuracy in disparity estimation.

To estimate the disparity for each segmented object, equation (6) can be further exploited with the fact that the disparity is inversely proportional to the magnification (Z/f), and the nature of such magnification can be characterized by:

$$ \frac{f}{Z} \approx \frac{O}{S} \qquad (7) $$

where O stands for the object mask area, and S for the surface area of the object.

Thus equation (6) can be rearranged as:

$$ D = B \frac{f}{Z} \approx B \frac{O}{S} = \lambda O \qquad (8) $$

As B is fixed once the stereoscopic geometry (such as parallax) is determined, the only varying factor is S. As a matter of fact, S can be practically regarded as fixed with respect to the segmented object. As an example, once a car is segmented as the foreground object, its surface area is a constant value relative to the object area. Therefore, it can be inferred that S is a parameter dependent on the area of the segmented object. Such analysis provides a practical platform for us to estimate the disparity value for each segmented object. In other words, the disparity should be estimated as adaptive to the area of the segmented object.

To facilitate the estimation of S on a real-time basis, we replace B/S by a single scaling factor. Further, to take into consideration the total frame size, we consider such a single scaling factor in relation to both the segmented object area and the total frame size rather than the object area alone. To this end, we can arrange (8) into the following:

$$ D = \lambda \frac{O}{T} \qquad (9) $$

where $\lambda$ is the scaling factor and T is the total frame size.

By so doing, the proposed algorithm achieves pseudo-3D conversion of 2D videos via the fact that the frame background T provides a reference for all objects to create the pseudo-3D effect. In other words, each object is allocated a disparity value against the frame background. In the case of multiple objects, as long as they stay apart, their pseudo-3D effect is mainly perceived by viewers with respect to the frame background rather than individual objects. From equation (9), it is seen that the disparity estimated for each separate object depends on the object size and the frame size. Therefore, large objects will have large disparity and thus produce a stronger pseudo-3D effect against the frame background. Since larger objects attract more attention from viewers, visual perception of the proposed pseudo-3D conversion will mainly be established between those large objects and the frame background. For some small objects, even if their disparity allocation may not be convincing in relation to large objects, viewers can hardly perceive any discomfort due to the fact that: (i) the multiple objects stay apart; (ii) small objects attract less attention from viewers; and (iii) all objects have their disparity estimated with reference to the frame background rather than against each other. Therefore, large disparity may exist between each object and the frame background, but the disparity among the objects themselves remains small, depending on their relative sizes.

Essentially, the performance of the proposed algorithm depends to a large extent on the accuracy of object segmentation. Therefore, there exist a few constraints for the proposed algorithm to achieve effective pseudo-3D conversion and ensure that such conversion is perceptually convincing. These constraints include: (i) temporal segmentation can effectively divide input videos into sections, where all video frames contain consistent scenes and moving objects are smooth and continuous. As a result, segmented objects would provide continuous disparity estimation and thus the converted pseudo-3D video presents a smooth 3D effect among objects against the frame background; (ii) there is no camera movement inside the video sequences, to ensure that the background can be effectively separated by the proposed segmentation algorithm.
For those video frames with camera movement, an additional tool is needed to detect such movement [5, 25] and a dedicated disparity estimation has to be designed; (iii) in the case of multiple objects, they do not continuously move together and overlap. Otherwise, occlusion will occur and the proposed algorithm will segment the overlapped objects into one single object. As a result, the disparity estimated could suddenly become much larger in comparison with its previous values and thus create an abrupt effect, which may cause discomfort for viewers. However, the proposed algorithm will not produce a contradicting pseudo-3D effect in any circumstances. This is because the proposed algorithm always estimates disparity for each object against the background frame rather than against the other objects. Therefore, whether different objects stay apart or are occluded, the pseudo-3D effect created by the proposed algorithm will not contradict their original 2D view, since the worst case is that the occluded objects are allocated the same disparity and thus become 2D among themselves, but still pseudo-3D with respect to the background.

To view such converted pseudo-stereo videos and prove the concept illustrated in (9), we arranged the stereo display such that objects in the left eye channel are moved toward the right by a few pixels, and objects in the right eye channel are moved toward the left by a few pixels. The visual effect becomes that the objects are behind the display plane, and the background layers for the two channels are generated as an intermediate object layer that is moved by a few more pixels. As a result, the background layer is displayed behind the object layer. Such an arrangement is equivalent to the scenario that a real-world 3D scene is presented behind a window (the window glass is the display plane). In addition, to complete the final pseudo-3D video conversion, we take the existing 2D video as the left channel and use the estimated disparity to construct a right channel by shifting pixels as explained. To save the cost of viewing polarized images, we simply put the left frame into the green component and the right frame into the red component of the original coloured video frames. Hence, with a pair of red and green glasses (red on the right), people can view the converted pseudo-stereo videos and assess their 3D effect.
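To make the chain from equation (9) to the red/green frame concrete, here is a minimal Python sketch for a single segmented object on a greyscale frame. It is not the authors' C++ implementation. The function names are hypothetical, rounding the disparity to whole pixels is an assumption, pixels vacated by the shifted object simply keep their original values (the paper does not describe hole filling), and the blue component is set to 0 for simplicity.

```python
def object_disparity(object_area, frame_area, scale):
    """Equation (9): D = lambda * O / T, rounded to whole pixels."""
    return round(scale * object_area / frame_area)


def make_right_view(left, mask, disparity):
    """Build the right channel by moving object pixels `disparity`
    columns to the left. `left` is a list of rows of luminance values
    and `mask` a same-sized 0/1 object mask."""
    cols = len(left[0])
    right = [row[:] for row in left]  # start from a copy of the left view
    for r, row in enumerate(left):
        for c, value in enumerate(row):
            target = c - disparity
            if mask[r][c] == 1 and 0 <= target < cols:
                right[r][target] = value
    return right


def anaglyph(left, right):
    """Pack the right view into red and the left view into green,
    returning rows of (R, G, B) tuples."""
    return [[(rv, lv, 0) for lv, rv in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]
```

For several objects that stay apart, the same steps could be repeated per object mask, each with its own disparity from `object_disparity`, before the two views are combined.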

4. Experiments Design and Concluding Remarks

In this section, we design two experimental phases to evaluate the proposed pseudo-3D video algorithm. The first phase is designed to evaluate the proposed object segmentation and the second phase is to evaluate the 3D effect of the converted pseudo-3D videos. Throughout the experiments, we use a set of three video clips: Hall-monitor, Mother&daughter, and Girl, all of which are publicly available, and Hall-monitor is the same as that used in [11]. This arrangement has the advantages that (i) such an open test data set enables any other developed algorithm to be benchmarked against our proposed algorithm without repeating our work; (ii) comparative assessment of our proposed algorithm can be easily implemented by using this open test data set.

For the first phase evaluation, our design aims at enabling detailed analysis of how each element of our proposed object segmentation actually contributes to the effect of semantic video object segmentation. To this end, we implemented the VO segmentation algorithm described in [11] as our benchmark, and carried out experiments in which one element of the proposed algorithm is added at a time. These elements include: (i) linear transform for contrast enhancement; (ii) filtering for removal of noise; and (iii) constrained region growing. To highlight the advantages achieved by the above elements in semantic video object segmentation, we select Hall-monitor and Mother&daughter to illustrate our experimental results.

Figure-1 illustrates the segmented results by adding the linear transformation into the benchmark, where parts (a) and (c) are the segmented objects by the benchmark and parts (b) and (d) are the segmented objects by the benchmark with the proposed linear transform. As seen, the proposed linear transform introduces additional noise while the segmentation accuracy is improved.

Figure-2 illustrates the segmented results by adding both the linear transform and the filter into the benchmark, which show that the noise introduced is effectively removed. Figure-3 illustrates the segmented results by adding the element of constrained region growing into the benchmark, where parts (a) and (c) represent the results of the benchmark and parts (b) and (d) the results of the proposed technique. Although the proposed region growing cannot recover all the missing parts, it is seen that the proposed technique does improve on the benchmark. By putting all the elements together, the segmented video objects by the proposed algorithm are illustrated in Figure-4. Note that all the figures illustrated here are much larger than those illustrated in references [10-19]. If we make the pictures smaller, the segmentation results will look better as those boundaries will look smoother. Correspondingly, a sequence of the segmented objects is illustrated in Figure-5 to provide a continuous visual inspection.

For the second phase experiment, we implemented the proposed algorithm by adding the segmentation and disparity estimation together on a PC computing platform and ran the software simulation in C++ to convert the 2D videos inside the test data set into their pseudo-3D videos. Figure-6 illustrates such converted pseudo-3D video for Hall-monitor and Girl, in which parts (a~c) show the segmented foreground objects, parts (d~f) show the converted Hall-monitor in pseudo-3D, and parts (g~i) show the converted Girl in pseudo-3D. To visually inspect the 3D effect of these converted pseudo-3D video frames, we need to wear paper glasses with green on the left and red on the right. Such 3D display is the cheapest option suitable for academic research laboratories, where facilities are often under-funded.

From all the illustrations, it is seen that the proposed algorithm achieves pseudo-3D conversion of 2D videos with a certain level of perceptual 3D effect.
Apart from being able to improve the existing semantic video object segmentation, the proposed work explores a new approach in generating pseudo-3D video content rather than true 3D out of conventional 2D videos, where the criterion of 3D conversion is changed to relax the whole process and thus make the conversion less challenging. Compared with existing approaches, our proposed algorithm features a pseudo-3D criterion, where only a perceptual 3D effect is targeted instead of trying to be close to the original true 3D. Under this criterion, a range of new possibilities can be explored for further research and the existing efforts can also be further extended. Examples of such directions could include: (i) 3D modeling inside videos [23]; (ii) depth estimation [24]; (iii) multi-view imaging [26, 27]; and (iv) further processing tools to overcome the constraints [5, 25].

Finally, the authors wish to acknowledge financial support from both the HERMES project, funded under the European Framework-7 programme, and the LIVE project, funded under the European Framework-6 programme.

References

[1] http://www.seereal.com/default.en.htm
[2] http://www.cl.cam.ac.uk/research/rainbow/projects/asd.html
[3] http://www.3d-conversion.com/exama.html
[4] http://www.cordis.lu/ist/projects/projects.htm
[5] Marc Pollefeys et al. Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision, Kluwer Academic Publishers, 1998
[6] F. Ernst, P. Wilinski and K. van Overveld Dense structure from motion: an approach based on segment matching, Lecture Notes in Computer Science, Springer-Verlag GmbH, Vol 2351, 2002
[7] P. Wilinski and K. van Overveld Depth from motion using confidence based block matching, Proceedings of Image and Multidimensional Signal Processing Workshop, Alpbach, Austria, 1998, pp159-162
[8] Phil Harman Home based 3D entertainment: an overview, ICIP 2000, Vancouver, September 10-13th, 2000
[9] C. Weerasinghe, P. Ogunbona and W. Li 2D to Pseudo-3D conversion of head and shoulder images using feature based parametric disparity maps, IEEE 0-7803-6725-1/01, pp963-966

[10] Kim M. et al. A VOP generation tool: automatic segmentation of moving objects in image sequences based on spatio-temporal information, IEEE Trans. Circuits and Systems for Video Technology, Vol 9, No 8, 1999, pp1216-1226
[11] Kim C. and Hwang J.N. Fast and automatic video object segmentation and tracking for content-based applications, IEEE Trans. Circuits and Systems for Video Technology, Vol 12, No 2, 2002, pp122-128
[12] Salembier P. and Pardas M. Hierarchical morphological segmentation for image sequence coding, IEEE Trans. Image Processing, Vol 3, No 5, 1994, pp639-648
[13] Chien S.Y. et al. Efficient moving object segmentation algorithm using background registration technique, IEEE Trans. Circuits and Systems for Video Technology, Vol 12, No 7, 2002, pp577-586
[14] Shamim A. and Robinson A. Object-based video coding by global-to-local motion segmentation, IEEE Trans. Circuits and Systems for Video Technology, Vol 12, No 12, 2002, pp1106-1116
[15] Feng G.C. and Jiang J. Image segmentation in compressed domain, Journal of Electronic Imaging, Vol 12, No 3, SPIE, 2003, pp390-397
[16] Toklu C. et al. Semi-automatic video object segmentation in the presence of occlusion, IEEE Trans. Circuits and Systems for Video Technology, Vol 10, No 4, 2000, pp624-629
[17] Kervrann C. and Heitz F. Statistical deformable model-based segmentation of image motion, IEEE Trans. Image Processing, Vol 8, No 4, 1999, pp583-588
[18] Meier T. and Ngan K. Automatic segmentation of moving objects for video object plane generation, IEEE Trans. Circuits and Systems for Video Technology, Vol 8, No 5, 1998, pp525-538
[19] Xu Y. et al. Object-based image labeling through learning by example and multi-level segmentation, Pattern Recognition, Vol 36, 2003, pp1407-1423
[20] Adams R. and Bischof L. Seeded region growing, IEEE Trans. Pattern Anal. Machine Intell., Vol 16, No 6, 1994, pp641-647
[21] J. Jiang and E. Edirisinghe A hybrid scheme for low bit rate coding of stereo images, IEEE Trans. on Image Processing, Vol 11, No 2, 2002, pp123-134
[22] S. Barnard and W. Thompson Disparity analysis of images, IEEE Trans. Pattern Anal. Mach. Intell., Vol 2, July 1980, pp333-340
[23] D. Hoiem, A.A. Efros and M. Hebert Automatic Photo Pop-up, ACM SIGGRAPH, 2005
[24] Torralba A. and Oliva A. Depth estimation from image structure, IEEE Trans. Pattern Anal. Mach. Intell., Vol 24, No 9, 2002, pp1226-1238
[25] Sato T., Kanbara M. and Yokoya N. 3-D modeling of an outdoor scene from multiple image sequences by estimating camera motion parameters, Lecture Notes in Computer Science, Vol 2749, 2003, pp717-724
[26] Sato T., Kanbara M., Yokoya N. et al. Dense 3-D reconstruction of an outdoor scene by hundreds-baseline stereo using a hand-held video camera, International Journal of Computer Vision, Vol 47, No 1-3, 2002, pp119-129
[27] Fan J. et al. MultiView: multilevel video content representation and retrieval, Journal of Electronic Imaging, Vol 10, No 4, 2001, pp895-908

Fig. 1. Illustration of semantic object segmentation: (a) Hall-monitor frame 71 by the benchmark; (b) frame 71 by benchmark + linear transform; (c) Mother&daughter segmented by the benchmark; (d) Mother&daughter by benchmark + linear transform.

Fig. 2. Semantic object segmentation by benchmark + linear transform + filtering: (a) Hall-monitor (frame 71); (b) Mother&daughter (frame 304).

Fig. 3. (a) Hall-monitor (frame 73) segmentation by the benchmark; (b) the same frame segmented by benchmark + constrained region growing.

Fig. 4. Final segmentation results by the proposed algorithm: (a)~(b) are originals, and (c)~(d) are the segmented video objects.

Fig. 5. Final segmentation comparison with sequential frames for Hall-monitor: (a) segmented objects in sequential frames by the benchmark (frames 45~51); (b) segmented objects in sequential frames by the proposed algorithm (frames 45~51); (c) segmented objects in sequential frames by the benchmark (frames 55~61); (d) segmented objects in sequential frames by the proposed algorithm (frames 55~61); (e) segmented objects in sequential frames by the benchmark (frames 65~71); (f) segmented objects in sequential frames by the proposed algorithm (frames 65~71).

Fig. 6. Illustration of experimental results: (a) segmented object in frame 35 of Hall-monitor; (b) segmented object in frame 145 of Hall-monitor; (c) segmented object in frame 655 of Girl; (d) pseudo-3D converted frame 37 of Hall-monitor; (e) pseudo-3D converted frame 55 of Hall-monitor; (f) pseudo-3D converted frame 78 of Hall-monitor; (g) pseudo-3D converted frame 655 of Girl; (h) pseudo-3D converted frame 663 of Girl; (i) pseudo-3D converted frame 684 of Girl.