Sprite Generation and Coding in Multiview Image Sequences

302 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 2, MARCH 2000

Sprite Generation and Coding in Multiview Image Sequences

Nikos Grammalidis, Student Member, IEEE, Dimitris Beletsiotis, and Michael G. Strintzis, Senior Member, IEEE

Abstract: A novel algorithm for the generation of background sprite images from multiview image sequences is presented. A dynamic programming algorithm, first proposed in [1], using a multiview matching cost as well as pure geometrical constraints, is used to provide an estimate of the disparity field and to identify occluded areas. By combining motion, disparity, and occlusion information, a sprite image corresponding to the first (main) view at the first time instant is generated. Image pixels from other views that are occluded in the main view are also added to the sprite. Finally, the sprite coding method defined by MPEG-4 is extended for multiview image sequences based on the generated sprite. Experimental results are presented, demonstrating the performance of the proposed technique and comparing it with standard MPEG-4 coding methods applied independently to each view.

Index Terms: Dynamic programming, MPEG-4, multiview video applications, sprites.

I. INTRODUCTION

A BACKGROUND sprite is an image composed of pixels belonging to a video object visible throughout a video segment. For instance, a sprite generated from a panning sequence will contain all visible pixels of the background object throughout the sequence. Certain portions of this background may not be visible in certain frames, due to occlusion by the foreground objects or to the camera motion. Since the sprite contains all parts of the background that were visible at least once in the image sequence, it can be used for the reconstruction or the predictive coding of the background. Sprites for background representation are also commonly referred to in the literature as salient stills [2], [3] or background mosaics [4]-[10].
The procedure for generating background sprite images from a video sequence typically starts by detecting scene cuts (changes) and thus dividing the video sequence into subsequences containing similar content. A background mosaic (sprite) is then generated for each subsequence by warping (aligning) different instances of the background region to a fixed coordinate system, after estimating their motion using a two-dimensional (2-D) or three-dimensional (3-D) motion model. Finally, information from all warped images is combined into the sprite image by using median filtering or averaging operations.

Manuscript received March 15, 1999; revised October 20. This work was supported by the EU IST INTERFACE and the GSRT PAVE and PANORAMA projects. This paper was recommended by Guest Editor Y. Wang. The authors are with the Information Processing Laboratory, Department of Electrical and Computer Engineering, University of Thessaloniki, Greece (ngramm@dion.ee.auth.gr; dbel@panorama.ee.auth.gr; strintzi@eng.auth.gr).

A method for encoding sprite images has been included in the emerging MPEG-4 standard [11], [12]. This method is based on describing simple camera motion models (e.g., translational, affine, or perspective) by the 2-D motion of a number of points, called reference points. Since the sprite images are often much larger than the initial images, their coding is complicated by the significant delay (latency) incurred when they are coded and decoded as I-frames. Since the frames following the first are coded and decoded based on the sprite image, such delays may hinder real-time implementation. However, in MPEG-4, the sprite coding syntax allows large static sprite images to be transmitted piece by piece as well as hierarchically, so that the latency incurred in displaying a video sequence is significantly reduced.
In earlier sprite-generation procedures, no segmentation was used, and the generated sprites always corresponded to the region with the dominant motion, i.e., usually the background. In this case, foreground objects were removed by using temporal averaging or median filtering. However, in order to improve the quality of the generated sprite images and to be able to generate sprites for foreground objects, a number of techniques have been proposed to segment the scene into a number of layers [10], [13]-[15]. Layers are regions which typically correspond to the physical objects in the scene. If a multilayered scene description is available, sprites can easily be obtained for each layer using standard sprite-generation techniques, or more sophisticated ones involving depth and transparent objects [15]. Although much effort has been spent in the past on the design of sophisticated layer-segmentation procedures, this still remains in many respects an open problem.

In this paper, the sprite generation and coding procedures are generalized to the case of multiocular systems, consisting of two or more cameras. Multiocular systems provide the viewer with the appropriate monoscopic or stereoscopic view of a scene, depending on the viewer's position. Several coding schemes have been proposed for stereoscopic [16] and multiview image sequences [1], [17]. A common characteristic of these coding schemes is the use of disparity information to eliminate redundancies between images from different views. Furthermore, the detection of occlusions, i.e., points not visible in all views, provides additional information that can improve coding results. Techniques based on dynamic programming have been applied for the purposes of disparity estimation and simultaneous occlusion detection [1], [18]-[21] in stereoscopic sequences.
A significant advantage of these techniques is that they provide a global solution for the disparity estimation/occlusion detection problem under local constraints, such as constraints related to correlation, smoothness, or the disparity gradient limit.

Fig. 1. A multiview system with three viewpoints.

In previously proposed sprite-generation techniques [4]-[6], motion information has been extensively used to identify the objects in the scene (segmentation) and to determine their position in the sprite image (warping). The present paper proposes novel techniques for sprite generation, in which foreground and background segmentation is mainly based on the estimated disparity and occlusion information. Clearly, this is a more natural way to identify the background, especially in sequences with very small motion, where segmentation based on motion may fail. Furthermore, motion information is used in this paper in a second segmentation step, in order to assign small or occluded regions to the background or foreground regions. The main contribution of this paper is the use of disparity and occlusion information to add information from all available views to the background sprite image. For example, a part of the background that is occluded in one view may be added to the sprite from another view, where this part appears. The sprite is generated in two stages: the first involves the frames from the first (leftmost) view, uses disparity and occlusion information for segmentation purposes, and is otherwise similar to previously proposed sprite-generation procedures [5], [11]. The second stage involves the frames from the other views and is based exclusively on the estimated disparity and occlusion information. The sprite coding mode defined by MPEG-4 is then used to code the background region in the entire multiview sequence. Full compliance with the MPEG-4 sprite coding mode is achieved by using the same 6-parameter affine model to describe both the motion and the disparity information defining the warping transformation between a frame and the sprite image.
This model has been shown to be efficient in situations where either the structure of the imaged scene is approximately planar, or the scene is sufficiently far from the camera [6], [22]. The entire multiview sequence can then be coded, according to the MPEG-4 sprite coding mode, by reordering all the frames of a group of frames (GOF) into a single sequence as follows: first the frames from the first view, then the corresponding frames from the second view, and so on. An advantage of this technique is that no disparity or occlusion information needs to be coded for the background region. Experimental results demonstrate a significant reduction in the required bit rate when a single sprite image is used for the entire background of the multiview sequence.

The rest of the paper is organized as follows. The algorithm used for disparity estimation and occlusion detection, which was described in detail in [1], is summarized in Section II. The procedure used to generate a sprite image from the first (leftmost) view of a multiview image sequence is described in Section III. In Section IV, the procedure to generate sprites from the other available views, based on disparity and occlusion information, is presented; the sprite coding scheme defined by MPEG-4 is then generalized to the case of multiview sequences. In Section V, experimental results are obtained using a four-view sequence and a stereo sequence. Comparisons are made against standard MPEG-4 coding schemes, with or without the use of sprite images, applied independently to each view. Finally, conclusions and suggestions for future extensions of the proposed approach are presented in Section VI.

II. A METHOD FOR DISPARITY ESTIMATION AND OCCLUSION DETECTION

Consider a multiocular system with N viewpoints arranged on a horizontal line. A trinocular system (N = 3) is shown in Fig. 1. Let I_i be the image corresponding to viewpoint i, and let x_i(P) denote the x-coordinate of the (perspective) projection of a 3-D point P in I_i.
We shall estimate disparity and occluded areas for the first (leftmost) image I_1. The disparity with respect to I_i of a point P visible in I_1 is defined by

    d_i(P) = x_i(P) - x_1(P),   if P is visible in I_i
    d_i(P)  undefined,          if P is occluded in I_i.     (1)

By assuming that central projection is used and that all optical axes are parallel, it may be shown [1] that if P is visible in I_i, then its disparity equals

    d_i(P) = f b_i / Z(P)     (2)

where b_i is the baseline corresponding to the i-th viewpoint, f is the focal length, and Z(P) is the depth of P. In the example of Fig. 1, points belonging to certain segments are visible in both I_2 and I_3, and thus both d_2 and d_3 are defined for these points. Since the depth is constant within each of these segments, (2) implies that the corresponding disparities d_2 and d_3 also remain constant within each segment. However, only d_2 is defined for points that are occluded in I_3, while neither d_2 nor d_3 is defined for points visible only in I_1.

A pixel in the first view will be said to be in state S_i if it is visible in views I_1, ..., I_i. In particular, it will be in state S_N if it is visible in all views, in state S_i (1 < i < N) if it is visible only in I_1, ..., I_i, and in state S_1 if it is invisible (occluded) in all views but I_1. For a pixel in state S_N, it is seen from (2) that

    d_i(P) = (b_i / b_N) d_N(P),   i = 2, ..., N     (3)

Thus, knowledge of d_N implies knowledge of all d_i, i = 2, ..., N. A dynamic programming scheme was proposed in [1] and [23] so as to estimate the disparity d_N and the state for each pixel in I_1. The corresponding valid disparity values are then found from (3). The disparity field obtained in this manner corresponds to each pixel of the first (leftmost) view, and as such may be termed the L-to-R disparity field, to distinguish it from the converse R-to-L disparity field, corresponding to each pixel of the N-th (rightmost) view.

For pixels in state S_N, the multiview matching cost between all corresponding pixels is defined as

    C(p) = sum_{i=2..N} w_i C_i(p)     (4)

where

    C_i(p) = sum_{(x,y) in W(p)} |I_1(x, y) - I_i(x + d_i, y)|     (5)

and the fixed weights w_i are chosen heuristically. The disparity d_i is given by (3), rounded to the nearest integer, and W(p) is a window centered on the working pixel p.
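As an illustration of (3)-(5), the following sketch evaluates the matching cost of one candidate disparity d_N for a single pixel of the first view. It is only a sketch under assumptions: the function and parameter names are invented here, (5) is realized as a plain sum of absolute differences over the window, and border handling is simplified; none of these details are prescribed by [1].

```python
import numpy as np

def multiview_matching_cost(images, p, d_N, baselines, weights, win=2):
    # images[0] is the first (leftmost) view I_1; images[i] is view I_{i+1}.
    r, c = p
    I1 = images[0].astype(float)
    # window W(p) centered on the working pixel, clipped to the image
    r0, r1 = max(r - win, 0), min(r + win + 1, I1.shape[0])
    c0, c1 = max(c - win, 0), min(c + win + 1, I1.shape[1])
    cost = 0.0
    for i in range(1, len(images)):
        # Eq. (3): the disparity w.r.t. view i scales with its baseline,
        # rounded to the nearest integer
        d_i = int(round(baselines[i] / baselines[-1] * d_N))
        if c0 + d_i < 0 or c1 + d_i > images[i].shape[1]:
            return np.inf        # correspondence falls outside view i
        block1 = I1[r0:r1, c0:c1]
        blocki = images[i][r0:r1, c0 + d_i:c1 + d_i].astype(float)
        # Eq. (5): absolute differences summed over W(p), weighted per
        # view as in Eq. (4)
        cost += weights[i] * np.abs(block1 - blocki).sum()
    return cost
```

In a dynamic programming search, this cost would be evaluated for every admissible d_N at every pixel of the scanline.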
A dynamic programming algorithm for the calculation of the disparity of pixels in I_1, and for the identification of areas occluded in at least one view, may then be based on the transition scheme shown in Fig. 2. The multiview matching cost is associated with the transition between visible states, while a fixed occlusion cost is used for the transitions into and out of the occluded states. Using this algorithm, only two states are identified: S_N, assigned to pixels visible in all views, and an occlusion state, indicating that the pixel is occluded in at least one view. Finer estimation of disparity and detection of occluded regions is achieved by iteratively applying the same algorithm within each of the occluded segments detected by the above algorithm. As detailed in [1], the same dynamic programming algorithm can be used to provide the R-to-L disparity field and the corresponding state information with respect to the rightmost view.

Fig. 2. Allowed transitions between states in the general (N-view) case. The disparity d = d_N is estimated and the occluded pixels in I_1 (those not in state S_N) are detected.

III. MULTIVIEW SPRITE GENERATION USING INFORMATION FROM THE FIRST VIEW

Sprites are typically generated from monoscopic image sequences by first using a scene-cut detector for the identification of the frames over which a significant part of the scene (usually a large part of the background) remains substantially the same. Each of the resulting subsequences is assumed to contain similar image content, and each is processed independently of the others. Each frame of a subsequence is segmented into a number of regions, each defining a different object. Then, a binary mask representing the shape of each object is produced, which, together with the luminance (or color) information for this object, comprises the video object plane (VOP) in MPEG-4 terminology. The segmentation may be based on luminance, motion, and, in the case of multiview image sequences, disparity information.
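The dynamic programming search of Section II can be illustrated, in heavily simplified form, on a single scanline pair from two views. The sketch below replaces the full state machine of Fig. 2 with a single fixed penalty for any disparity change (a stand-in for the occlusion-state transitions) and recovers the globally optimal disparity path by backtracking; all names and cost values are illustrative, not taken from [1].

```python
import numpy as np

def scanline_dp(left, right, max_disp, jump_cost=20.0):
    n, D = len(left), max_disp + 1
    INF = 1e18
    # match[x, d]: data cost of giving pixel x of the left line disparity d
    match = np.full((n, D), INF)
    for x in range(n):
        for d in range(D):
            if x + d < n:
                match[x, d] = abs(float(left[x]) - float(right[x + d]))
    total = match[0].copy()
    back = np.zeros((n, D), dtype=int)
    for x in range(1, n):
        new_total = np.empty(D)
        for d in range(D):
            # keeping the disparity is free; a jump pays the fixed cost
            trans = total + jump_cost * (np.arange(D) != d)
            back[x, d] = int(np.argmin(trans))
            new_total[d] = trans[back[x, d]] + match[x, d]
        total = new_total
    # backtrack the globally optimal disparity path
    disp = np.empty(n, dtype=int)
    disp[-1] = int(np.argmin(total))
    for x in range(n - 1, 0, -1):
        disp[x - 1] = back[x, disp[x]]
    return disp
```

Because the optimum is computed over the whole scanline, local ambiguities are resolved by the global constraint, which is the advantage of the dynamic programming formulation noted in the introduction.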
Segmentation based on disparity information has the following advantages.

1) In sequences produced in videoconferencing and other similar applications, the observed motion may be very small and inadequate for accurate segmentation. However, for such sequences, efficient segmentation into foreground and background regions is possible by using multiple cameras and exploiting disparity information.

2) Luminance-edge and motion-edge information may be conveniently exploited in disparity estimation techniques based on dynamic programming, so that disparity changes are favored near luminance or motion edges [20]. Thus, the produced objects have more or less constant disparity, motion, and texture information. This approach is very suitable for situations where the disparity variation between the background region and the foreground objects is small (e.g., if the distance between the objects and the camera is very large). In such situations, even though segmentation based on disparity alone might fail, use of luminance or motion information may significantly improve the final segmentation result.

3) Disparity provides a convenient means of layering the objects in the scene: objects with smaller absolute disparity values are at a larger distance from the viewer, and are accordingly assigned to deeper layers.

Fig. 3. Object segmentation and motion estimation.

Fig. 4. Procedure for generating sprites from multiview image sequences.

For the above reasons, we use disparity information as the main cue in the proposed segmentation procedure. Specifically, in order to generate the background sprite image, the background region is identified in each frame of the first (leftmost) view. A two-stage motion- and disparity-based segmentation technique is proposed to identify the foreground and background regions. In the first stage, a simple thresholding of the disparity fields is used to initialize the segmentation map for each frame. The threshold value is determined on the basis of the disparity histogram. Pixels occluded in at least one view are left out of this initial classification procedure. A second stage is used to correct errors in the initial segmentation result caused by minor local disparity changes or estimation errors. A connected-component labeling procedure is used to find small connected regions, which are labeled as artifacts and excluded from the motion model estimation stage that follows. After the background region has been identified, its motion is described using a 6-parameter 2-D affine motion model.
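Fitting such a 6-parameter affine model to point correspondences is a linear least-squares problem: each match contributes two equations, so M matches give a 2M x 6 system. A minimal sketch (function names are illustrative; the correspondences are assumed to come from block matching, as described in the text):

```python
import numpy as np

def fit_affine(src, dst):
    # src, dst: (M, 2) arrays of corresponding points, one row per match;
    # solves dst_x = a1*x + a2*y + a3, dst_y = a4*x + a5*y + a6
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    M = len(src)
    A = np.zeros((2 * M, 6))
    b = np.zeros(2 * M)
    A[0::2, 0] = src[:, 0]; A[0::2, 1] = src[:, 1]; A[0::2, 2] = 1.0
    A[1::2, 3] = src[:, 0]; A[1::2, 4] = src[:, 1]; A[1::2, 5] = 1.0
    b[0::2] = dst[:, 0]
    b[1::2] = dst[:, 1]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params  # (a1, ..., a6)

def apply_affine(params, pts):
    a1, a2, a3, a4, a5, a6 = params
    pts = np.asarray(pts, float)
    return np.stack([a1 * pts[:, 0] + a2 * pts[:, 1] + a3,
                     a4 * pts[:, 0] + a5 * pts[:, 1] + a6], axis=1)
```

In practice, the fit would be computed only from matches inside the background support map, and outlier regions would be excluded as described above.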
This model can be expressed as follows:

    x_1^t = a_1 x_1^{t-1} + a_2 y_1^{t-1} + a_3
    y_1^t = a_4 x_1^{t-1} + a_5 y_1^{t-1} + a_6     (6)

where (x_1^{t-1}, y_1^{t-1}) is the pixel position at time t-1 and (x_1^t, y_1^t) is the corresponding pixel at time t; the upper and lower indices indicate the time and the view (first), respectively. The estimation of the parameters a_1, ..., a_6 is based on correspondences obtained using a standard exhaustive-search block-matching procedure. If M matches are available in the background region, (6) yields a system of 2M equations with six unknowns. The unknown parameters are then estimated from this system of equations using least-squares techniques. The pixels in occluded or very small regions that were excluded from the motion-estimation stage are now assigned to the background region if the average displaced frame difference (DFD) for such a region is smaller than the average DFD in the background region. In the above, the DFD is computed from (6)

using the estimated motion parameters, with I_i^t denoting the frame from the i-th view at time t, and with the average taken over the number of pixels in the region. Using this procedure, the final segmentation map is obtained and the final motion parameters for the background region are recalculated. The entire segmentation and motion-estimation scheme is summarized in Fig. 3.

The sprite-generation procedure for the background from the main (leftmost) view uses the coordinate system of the first frame as the reference coordinate system for the sprite image. For each frame, the estimated motion parameters are used to compute the motion (warping) transformation of the object between the current frame and the reference coordinate system. Using (6), this transformation can be written as follows:

    p_1^0 = W^t(p_1^t)     (7)

where W^t denotes the composition of the frame-to-frame affine transforms of (6) from time t back to the reference time, and p_1^t = (x_1^t, y_1^t) denotes a pixel position in frame t from the first view. Thus, the video object is warped toward the reference coordinate system. After processing all frames of the sequence, a temporal median is used to produce the final sprite image from the warped objects.

Fig. 5. Estimated disparity fields and state maps for the left and right frame of the Claude sequence. Occluded areas are shown in black. (a) L-to-R disparity field. (b) R-to-L disparity field. (c) State map for the L-to-R disparity field. (d) State map for the R-to-L disparity field.

An alternative method for producing the final sprite image is a progressive averaging procedure [11]:

    S^t = (t S^{t-1} + I~^t) / (t + 1)     (8)

where S^t and I~^t are the sprite image and the warped image corresponding to the t-th time instant. Averaging may be performed on the spot, as each sample from the warped images becomes available. This method is faster and requires less memory than the former; however, the use of median filtering in the former method improves the quality of the sprite image, since outliers (wrong samples) and noise are eliminated.
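The two blending alternatives just described can be sketched as follows, with NaN marking sprite locations that a given warped frame does not cover. The running average generalizes the progressive formula to pixels that are not covered by every frame; the function names are illustrative, not from the MoMuSys implementation.

```python
import numpy as np

def sprite_by_median(warped_frames):
    # temporal median of all warped backgrounds; NaN = not covered
    return np.nanmedian(np.stack(warped_frames), axis=0)

def sprite_by_running_average(warped_frames):
    # on-the-spot averaging in the spirit of Eq. (8); per-pixel counters
    # let frames that do not cover a pixel simply not contribute
    acc = cnt = None
    for w in warped_frames:
        w = np.asarray(w, float)
        if acc is None:
            acc, cnt = np.zeros_like(w), np.zeros_like(w)
        valid = ~np.isnan(w)
        acc[valid] += w[valid]
        cnt[valid] += 1
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), np.nan)
```

The median needs all warped frames in memory at once, while the running average keeps only two accumulator images, which is the memory/quality trade-off discussed in the text.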
IV. MULTIVIEW SPRITE GENERATION AND CODING USING INFORMATION FROM THE REMAINING VIEWS

After the initial sprite-image generation based on information obtained from the first view, information from the remaining views may be added based on the estimated disparity and occlusion information. Specifically, the warping parameters corresponding to frame t of view i, in the general case of N views, are computed on the basis of the estimated disparity information. The position of a background pixel in frame t of view i is modeled using an affine transformation. Using the notation of Section III, this can be written as follows:

    x_i^t = c_1 x_1^t + c_2 y_1^t + c_3     (9)
    y_i^t = c_4 x_1^t + c_5 y_1^t + c_6     (10)

Fig. 6. (a) Initial support map after thresholding the disparity images. (b) Final support map after motion estimation.

These affine model parameters are estimated using least-squares techniques, based on the disparity of the pixels that are visible in all views (state S_N). The total warping transformation between frame t of view i and the sprite image is obtained by combining this affine model for the disparity with the warping model between frame t of the first view and the sprite image (aligned to the first frame):

    p_s = W^t(D_i^{-1}(p_i^t))     (11)

where W^t is defined in (7) and D_i denotes the affine disparity transformation of (9) and (10). The multiview sprite generation procedure is illustrated in Fig. 4.

A significant advantage of constructing the sprite image using more than one view is that pixels occluded in some of the views are still retained in the sprite image. More specifically, in order to generate the sprite image, we use the L-to-R and R-to-L disparity fields and the corresponding state maps to produce the disparity field corresponding to frame t. Foreground and background segmentation is based on thresholding this disparity field, after a preprocessing step in which occluded segments between pixels that have a disparity similar to that of the foreground or background regions are assigned to these regions. Assuming that the foreground region has been identified correctly, all other pixels, even those occluded in the left or the right view, can be assimilated into the background. Then, the affine disparity model for the background can be used to model the disparity in the entire background region. As a result, pixels in occluded regions provide additional samples for the warped frames used in the construction of the sprite image.

Fig. 7. Comparison of background sprites obtained from monoscopic and multiview sequences. (a) Monoscopic sprite obtained from ten frames of the left sequence. (b) Multiview sprite obtained from ten frames of all four views using method A1. (c) Multiview sprite obtained from ten frames of all four views using method A2.

Based on the information used for generating the sprite image, we have used and evaluated the following approaches.

1) In method A1, the sprite image is initially generated using only the pixels from the first (leftmost) view. Then, pixels from the frames of the remaining views, corresponding to sprite locations where no luminance value has yet been assigned, are added to the sprite image. This method yields sprite images with better visual quality, because the pixels obtained from the first view are more reliable candidates for the sprite. However, as verified by the experimental results, this leads to very good reconstruction of the leftmost view, but not equally good results for the other views.

TABLE I
CODING OF THE BACKGROUND REGION OF THE CLAUDE SEQUENCE
A1, A2: Coding using a single sprite image. B: Independent MPEG-4 coding of each view using the MPEG-4 sprite-coding mode. C: Independent MPEG-4 coding of each view without using sprite coding.
2) Method A2 uses an averaging procedure over all available pixels from the frames of all views to update the sprite image. This creates a more balanced sprite image that can be used to obtain satisfactory reconstructions for all channels, since all views contribute information to the sprite image equally. A drawback, however, is that some blurring may be introduced into the sprite image when some of the averaged samples are noisy, due to the inadequacy

of the affine disparity model to describe the local disparity, or due to luminance changes among the different views.

The coding of the multiview sequence using the generated sprite image conforms to the MPEG-4 specifications for sprite coding [11], [12]. This is achieved by reordering all the frames of a group of frames (GOF) into the following order:

    I_1^1, ..., I_1^T, I_2^1, ..., I_2^T, ..., I_N^1, ..., I_N^T     (12)

where T denotes the number of frames per view in the GOF. Then, the warping parameters for each frame of this sequence toward the sprite image are given by (6) for the first (left) view and by (11) for the remaining views. In order to code the warping transformation used to generate the reconstructed images from the sprite image, each transformation is expressed as a set of motion trajectories of a number of reference points. The number of reference points needed to encode the warping parameters determines the transform used for warping; e.g., three reference points are needed to fully describe the affine transforms of (6) and (11).

V. EXPERIMENTAL RESULTS

Results are presented for the four-view sequence Claude and the stereoscopic sequence Aqua.¹ Fig. 5(a)-(d) show the L-to-R and R-to-L disparity fields and the corresponding state maps obtained using the proposed algorithm for the first frame. Fig. 6(a) illustrates the initial segmentation map for the first frame, obtained by thresholding the disparity field of Fig. 5(a), while Fig. 6(b) shows the final segmentation map obtained by the proposed algorithm summarized in Fig. 3. The sprite for the background generated from ten frames of the first (leftmost) view is shown in Fig. 7(a). The sprite obtained by adding occluded pixels from the other three views using method A1, which was discussed in Section IV, is shown in Fig. 7(b). The background sprite image obtained by averaging information from all four sequences according to method A2 is shown in Fig. 7(c). As seen, many new pixels are added to the sprite image.
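The two sprite-update rules of methods A1 and A2, described in Section IV, can be sketched as follows (NaN marks still-empty sprite locations; all names are illustrative): A1 lets the remaining views fill only holes left by the first view, while A2 averages every valid sample into the sprite.

```python
import numpy as np

def update_sprite_A1(sprite, warped):
    # method A1: first-view pixels are kept; other views only fill holes
    out = sprite.copy()
    holes = np.isnan(out) & ~np.isnan(warped)
    out[holes] = warped[holes]
    return out

def update_sprite_A2(acc, cnt, warped):
    # method A2: every view contributes; the sprite is the running mean
    # of all valid samples, tracked with accumulator and counter images
    valid = ~np.isnan(warped)
    acc[valid] += warped[valid]
    cnt[valid] += 1
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), np.nan)
```

The asymmetry of A1 (first-view pixels are never overwritten) is what favors the reconstruction of the leftmost view, while the symmetric averaging of A2 balances all views at the cost of possible blurring.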
Most of the pixels are seen to be at the correct positions; however, some blurring can be observed in Fig. 7(c), produced using method A2, especially near the upper-left corner. Blurring is mainly due to averaging noisy samples at locations where the local disparity is not adequately described by the affine disparity model.

Coding results for both sprite-based and standard (nonsprite) MPEG-4 coding methods were obtained using the software implementation of the MPEG-4 Version 1 encoder/decoder provided by the ACTS 098 MoMuSys project [24], [25]. In the proposed approaches, methods A1 and A2, the coding of all four views is based on a single sprite image obtained from all views. For comparison, independent coding of each view using MPEG-4 coders with a different sprite image for each view (method B) was also evaluated, as was independent coding of each view using standard MPEG-4 coding (method C).

¹ Both sequences were prepared by THOMPSON CSF/LER for the RACE Project 2045 DISTIMA and the ACTS 092 Project PANORAMA.

Fig. 8. (a), (b) Original images for the first and fourth view. (c), (d) Reconstructed images for the first and fourth view. Sprite coding (method A2) is used for the background and standard MPEG-4 coding for the foreground image. (e), (f) Reconstructed images for the first and fourth view using method C.

Results for the coding of the background region of the Claude sequence using these coding techniques are presented in Table I. Methods A1 and A2 are seen to require approximately four times less bit rate to encode the background in all views when compared to method B, and eight times less bit rate when compared to method C. Method A1 results in negligible degradation of the reconstruction quality for the first view; however, the reconstruction quality of the other three views falls by approximately 4.5 dB.
Method A2 produces the same significant bit-rate savings at a loss of approximately 3 dB in the reconstruction of all four views. In terms of visual quality, some blurring can be observed in reconstructions obtained using method A2, due to the sprite-image blurring effects discussed above. We have also reconstructed each frame by applying sprite coding (method A2) to the background object, while using standard (nonsprite) MPEG-4 coding for the foreground object. The corresponding coding results are presented in Table II. The original first frames from the first and the last view are shown in Fig. 8(a) and (b), respectively, while the corresponding reconstructed frames are shown in Fig. 8(c) and (d). Fig. 8(e) and (f) illustrate the corresponding reconstructed frames obtained using method C (standard MPEG-4 coding) for the entire image area. Similar results

for the middle (second and third) views are presented in Fig. 9(a)-(f). As seen, the visual quality of the reconstructed frames using multiview sprite coding is comparable to that obtained using standard MPEG-4 coding.

Fig. 9. (a), (b) Original images from the second and third view. (c), (d) Reconstructed images from the second and third view. Sprite coding (method A2) is used for the background and standard MPEG-4 coding for the foreground region. (e), (f) Reconstructed images for the second and third view using method C.

TABLE II
CODING OF THE BACKGROUND AND FOREGROUND REGION OF THE CLAUDE SEQUENCE. FOREGROUND IS CODED USING STANDARD MPEG-4 CODING
A1, A2: Coding using a single sprite image. B: Independent MPEG-4 coding of each view using the MPEG-4 sprite-coding mode. C: Independent MPEG-4 coding of each view without using sprite coding.

The disparity fields estimated for the first frame of the Aqua sequence are presented in Fig. 10(a) and (b), where occluded regions are marked in black. Although significant depth variations can be observed, satisfactory segmentation of the background region is possible using the proposed approach. The sprite generated from five frames of the left sequence is shown in Fig. 11(a), while the sprites generated from both views using methods A1 and A2 are shown in Fig. 11(b) and (c), respectively. Coding results for the background region are presented in Table III, while results for the entire frames of the stereoscopic sequence are presented in Table IV (using standard MPEG-4 coding for the foreground region). Finally, two reconstructed frames obtained using method A2 are presented in Fig. 10(c) and (d).

Fig. 10. (a), (b) Disparity and occlusion maps for the left and right views (occluded regions are shown in black). (c), (d) Reconstructed images for the left and right view. Sprite coding (method A2) is used for the background and standard MPEG-4 coding for the foreground region.

VI. CONCLUSIONS AND SUGGESTIONS FOR FUTURE WORK

A method for sprite generation from multiview sequences was proposed. Disparity and occlusion estimation is based on an efficient dynamic programming algorithm using information from all views of the multiview sequence. By combining motion, disparity, and occlusion information, a sprite image corresponding to the first (main) view at the first time instant is generated. Image pixels from other views that are occluded in the main view are added to the sprite. The sprite coding method defined by MPEG-4 was extended to the case of a multiview image sequence, based on the generated sprite. Experimental results were presented demonstrating the performance of the proposed technique and comparing it with methods using sprite generation from monoscopic sequences. An additional advantage of this technique is that the generated sprite images (mosaics) contain more pixels; thus, additional information is available that may be exploited in other interesting sprite applications, such as object tracking, background substitution, or annotation in multiview sequences.

Significant depth changes or difficulties in the segmentation procedure may hinder successful sprite generation. In order to improve results, various approaches may be followed in the future. More complex warping models than the simple affine or perspective models used by MPEG-4 could be defined to describe the motion of nonplanar surfaces or complex camera motions. Another approach would be to segment the scene into more than two regions (multiple layers) and use a different warping model for each layer. Efficient sprite-generation procedures for multiple layers, considering transparent objects and depth variations, have already

been proposed [15]. In this case, the additional depth information that is necessary to resynthesize the images from the sprite has to be coded. However, sprite coding of more than one layer is not supported by the current version of MPEG-4, since shape coding is not supported for layers coded in the sprite-coding mode; this inhibits sprite coding of more than one layer using MPEG-4-compliant methods. In sequences where there are significant luminance changes among the different views, an interesting extension of the proposed technique would be to incorporate photometric correction methods similar to those used in [26]. Specifically, the luminance direction and a normal vector could be estimated for the entire background region. Then, an iterative technique could be used to improve the estimation of the affine model parameters by using the photometrically corrected luminance values instead of the real ones.

Fig. 11. Background sprites generated from five frames of the Aqua sequence. (a) Monoscopic sprite generated from the left view. (b) Multiview sprite obtained using method A1. (c) Multiview sprite obtained using method A2.

TABLE III
CODING OF THE BACKGROUND REGION OF THE AQUA SEQUENCE
A1, A2: Coding using a single sprite image. B: Independent MPEG-4 coding of each view using the MPEG-4 sprite-coding mode. C: Independent MPEG-4 coding of each view without using sprite coding.

TABLE IV
CODING OF THE BACKGROUND AND FOREGROUND REGION OF THE AQUA SEQUENCE. IN ALL CASES, FOREGROUND IS CODED USING STANDARD MPEG-4 CODING
A1, A2: Coding using a single sprite image. B: Independent MPEG-4 coding of each view using the MPEG-4 sprite-coding mode. C: Independent MPEG-4 coding of each view without using sprite coding.

REFERENCES

[1] N. Grammalidis and M. G.
Strintzis, "Disparity and occlusion estimation in multiocular systems and their coding for the communication of multiview image sequences," IEEE Trans. Circuits Syst. Video Technol., vol. 8, June.
[2] M. Massey and W. Bender, "Salient stills: Process and practice," IBM Syst. J., vol. 35, no. 3/4.
[3] L. Teodosio and W. Bender, "Salient video stills: Content and context preserved," in Proc. 1st ACM Int. Conf. Multimedia (MULTIMEDIA '93), New York, Aug. 1993.
[4] F. Dufaux and F. Moscheni, "Background mosaicking for low bit rate coding," in Proc. Int. Conf. Image Processing, Lausanne, Switzerland, Sept.
[5] M. Irani, P. Anandan, J. Bergen, R. Kumar, and S. Hsu, "Efficient representations of video sequences and their applications," Signal Processing: Image Commun., vol. 8, no. 4, May.
[6] M. Irani and P. Anandan, "Video indexing based on mosaic representations," Proc. IEEE, vol. 16, no. 5, May.
[7] R. Szeliski, "Video mosaics for virtual environments," IEEE Comput. Graphics Applicat., vol. 16, Mar.
[8] R. Szeliski and H.-Y. Shum, "Creating full view panoramic mosaics and environment maps," in Proc. ACM SIGGRAPH 97 Conf., T. Whitted, Ed., Aug. 1997.
[9] S. Mann and R. W. Picard, "Virtual bellows: Constructing high quality stills from video," in Proc. Int. Conf. Image Processing (ICIP), Nov.
[10] M. Lee, W. Chen, C. B. Lin, C. Gu, T. Markoc, and R. Szeliski, "A layered video object coding system using sprite and affine motion model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, Feb.
[11] MPEG-4 Video Group, "MPEG-4 verification model version 11.0," Tech. Rep. ISO/IEC JTC1/SC29/WG11/MPEG98/N2172, T. Ebrahimi, Ed., Tokyo, Japan, Mar.
[12] T. Sikora, "The MPEG-4 video standard verification model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, Feb.

[13] J. Y. Wang and E. H. Adelson, "Representing moving images with layers," IEEE Trans. Image Processing, vol. 3, Sept.
[14] T. Darrell and A. Pentland, "Cooperative robust estimation using layers of support," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, May.
[15] S. Baker, R. Szeliski, and P. Anandan, "A layered approach to stereo reconstruction," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR '98), Santa Barbara, CA, June 1998.
[16] M. Ziegler, "Digital stereoscopic imaging and application. A way toward new dimensions: The RACE II Project DISTIMA," in Inst. Elect. Eng. Colloq. Stereoscopic Television, London, U.K., Oct.
[17] J.-R. Ohm and K. Muller, "Incomplete 3D-multiview representation of video objects," IEEE Trans. Circuits Syst. I, vol. 47, Feb.
[18] I. J. Cox, S. Hingorani, B. M. Maggs, and S. B. Rao, "Stereo without disparity gradient smoothing: A Bayesian sensor fusion solution," in Proc. British Machine Vision Conf., New York, 1992.
[19] S. S. Intille and A. F. Bobick, "Disparity-space images and large occlusion stereo," M.I.T. Media Lab Perceptual Computing Group, Cambridge, MA, Tech. Rep. 220.
[20] S. S. Intille and A. F. Bobick, "Incorporating intensity edges in the recovery of occlusion regions," M.I.T. Media Lab Perceptual Computing Group, Cambridge, MA, Tech. Rep. 246.
[21] L. Falkenhagen, R. Koch, A. Kopernik, and M. Strintzis, "Disparity estimation based on 3-D arbitrarily shaped regions," RACE Project R2045 Digital Stereoscopic Imaging and Applications (DISTIMA), Tech. Rep. R2045/UH/DS/P/023/b1.
[22] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press.
[23] N. Grammalidis and M. G. Strintzis, "Disparity and occlusion estimation for multiview image sequences using dynamic programming," in Proc. Int. Conf.
Image Processing (ICIP '96), Lausanne, Switzerland, Sept.
[24] ACTS 098 MoMuSys Project, "Software simulation of MPEG-4 video coder." [Online]. Available FTP: drogo.cselt.stet.it/pub/mpeg/mpeg-4_fcd/Visual/Natural/ (file MoMuSys-VFCD-V 507.tar.gz).
[25] R. Koenen, F. Pereira, and L. Chiariglione, "MPEG-4: Context and objectives," Signal Processing: Image Commun., vol. 9, no. 4, May.
[26] G. Bozdagi, A. M. Tekalp, and L. Onural, "3-D motion estimation and wireframe adaptation including photometric effects for model-based coding of facial image sequences," IEEE Trans. Circuits Syst. Video Technol., vol. 4, June.

Nikos Grammalidis (S'93) received the Diploma in electrical engineering from the Aristotle University of Thessaloniki, Greece. He is currently working toward the Ph.D. degree in the Information Processing Laboratory, Aristotle University of Thessaloniki. His research interests include computer vision and multiview image sequence coding and processing.

Dimitris Beletsiotis received the Diploma in electrical engineering from the Electrical Engineering Department, Aristotle University of Thessaloniki, Greece. He is presently serving in the Greek Army. His research interests include video-coding and video-processing applications.

Michael G. Strintzis (S'68-M'70-SM'80) received the Diploma in electrical engineering from the National Technical University of Athens, Athens, Greece, in 1967, and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ, in 1969 and 1970, respectively. He then joined the Electrical Engineering Department, University of Pittsburgh, Pittsburgh, PA, where he served as Assistant and Associate Professor. Since 1980, he has been Professor of Electrical and Computer Engineering at the University of Thessaloniki, and since 1999, Director of the Informatics and Telematics Research Institute, Thessaloniki, Greece.
His current research interests include 2-D and 3-D image coding, image processing, biomedical signal and image processing, and DVD and Internet data authentication and copy protection. Dr. Strintzis was awarded one of the Centennial Medals of the IEEE in 1994.
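As an illustrative aside: the sprite reconstruction that the conclusions refer to (MPEG-4 warps a static background sprite with per-frame affine or perspective parameters to resynthesize each view) can be sketched with an inverse perspective warp. This is a minimal NumPy sketch, not the authors' or the MPEG-4 reference implementation; the function name, the 3x3 homography parameterization, and the nearest-neighbour sampling are assumptions made here for brevity.

```python
import numpy as np

def warp_sprite(sprite, H, out_h, out_w):
    """Reconstruct an out_h x out_w frame from a grayscale background sprite
    by inverse perspective warping: each output pixel (x, y) is fetched from
    sprite position H @ (x, y, 1), normalized by the third coordinate.
    An affine model is the special case where H[2] == [0, 0, 1]."""
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)
    src = H @ pts
    sx = src[0] / src[2]          # sprite x-coordinates, one per output pixel
    sy = src[1] / src[2]          # sprite y-coordinates
    # Nearest-neighbour sampling; output pixels mapping outside the sprite
    # (regions never observed) are left at 0.
    sxi = np.round(sx).astype(int)
    syi = np.round(sy).astype(int)
    valid = (sxi >= 0) & (sxi < sprite.shape[1]) & \
            (syi >= 0) & (syi < sprite.shape[0])
    out = np.zeros((out_h, out_w), dtype=sprite.dtype)
    out.ravel()[valid] = sprite[syi[valid], sxi[valid]]
    return out
```

With the identity homography the sprite is reproduced unchanged; a translation in the last column of H shifts the sampled window, which is the degenerate case of the camera panning across a planar background.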


More information

DIGITAL video is an integral part of many newly emerging

DIGITAL video is an integral part of many newly emerging IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 8, NO. 5, SEPTEMBER 1998 547 3-D Model-Based Segmentation of Videoconference Image Sequences Ioannis Kompatsiaris, Student Member, IEEE,

More information

Disparity map coding for 3D teleconferencing applications

Disparity map coding for 3D teleconferencing applications Disparity map coding for 3D teleconferencing applications André Redert, Emile Hendriks Information Theory Group, Department of Electrical Engineering Delft University of Technology, Mekelweg 4, 2628 CD

More information

Motion Detection and Segmentation Using Image Mosaics

Motion Detection and Segmentation Using Image Mosaics Research Showcase @ CMU Institute for Software Research School of Computer Science 2000 Motion Detection and Segmentation Using Image Mosaics Kiran S. Bhat Mahesh Saptharishi Pradeep Khosla Follow this

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Using Shape Priors to Regularize Intermediate Views in Wide-Baseline Image-Based Rendering

Using Shape Priors to Regularize Intermediate Views in Wide-Baseline Image-Based Rendering Using Shape Priors to Regularize Intermediate Views in Wide-Baseline Image-Based Rendering Cédric Verleysen¹, T. Maugey², P. Frossard², C. De Vleeschouwer¹ ¹ ICTEAM institute, UCL (Belgium) ; ² LTS4 lab,

More information

5LSH0 Advanced Topics Video & Analysis

5LSH0 Advanced Topics Video & Analysis 1 Multiview 3D video / Outline 2 Advanced Topics Multimedia Video (5LSH0), Module 02 3D Geometry, 3D Multiview Video Coding & Rendering Peter H.N. de With, Sveta Zinger & Y. Morvan ( p.h.n.de.with@tue.nl

More information

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING 1 Michal Joachimiak, 2 Kemal Ugur 1 Dept. of Signal Processing, Tampere University of Technology, Tampere, Finland 2 Jani Lainema,

More information

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES P. Daras I. Kompatsiaris T. Raptis M. G. Strintzis Informatics and Telematics Institute 1,Kyvernidou str. 546 39 Thessaloniki, GREECE

More information

Accurate and Dense Wide-Baseline Stereo Matching Using SW-POC

Accurate and Dense Wide-Baseline Stereo Matching Using SW-POC Accurate and Dense Wide-Baseline Stereo Matching Using SW-POC Shuji Sakai, Koichi Ito, Takafumi Aoki Graduate School of Information Sciences, Tohoku University, Sendai, 980 8579, Japan Email: sakai@aoki.ecei.tohoku.ac.jp

More information

VOLUMETRIC RECONSTRUCTION WITH COMPRESSED DATA

VOLUMETRIC RECONSTRUCTION WITH COMPRESSED DATA VOLUMETRIC RECONSTRUCTION WITH COMPRESSED DATA N. Anantrasirichai, C. Nishan Canagarajah, David W. Redmill and Akbar Sheikh Akbari Department of Electrical & Electronic Engineering, University of Bristol,

More information

Stereo/Multiview Video Encoding Using the MPEG Family of Standards

Stereo/Multiview Video Encoding Using the MPEG Family of Standards Stereo/Multiview Video Encoding Using the MPEG Family of Standards Jens-Rainer Ohm Heinrich-Hertz-Institut, Image Processing Department, Einsteinufer 37, D-10587 Berlin, Germany ABSTRACT Compression of

More information

Tracking facial features using low resolution and low fps cameras under variable light conditions

Tracking facial features using low resolution and low fps cameras under variable light conditions Tracking facial features using low resolution and low fps cameras under variable light conditions Peter Kubíni * Department of Computer Graphics Comenius University Bratislava / Slovakia Abstract We are

More information

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM TENCON 2000 explore2 Page:1/6 11/08/00 EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM S. Areepongsa, N. Kaewkamnerd, Y. F. Syed, and K. R. Rao The University

More information

COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES

COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES H. I. Saleh 1, M. E. Elhadedy 2, M. A. Ashour 1, M. A. Aboelsaud 3 1 Radiation Engineering Dept., NCRRT, AEA, Egypt. 2 Reactor Dept., NRC,

More information

AUTOMATIC OBJECT DETECTION IN VIDEO SEQUENCES WITH CAMERA IN MOTION. Ninad Thakoor, Jean Gao and Huamei Chen

AUTOMATIC OBJECT DETECTION IN VIDEO SEQUENCES WITH CAMERA IN MOTION. Ninad Thakoor, Jean Gao and Huamei Chen AUTOMATIC OBJECT DETECTION IN VIDEO SEQUENCES WITH CAMERA IN MOTION Ninad Thakoor, Jean Gao and Huamei Chen Computer Science and Engineering Department The University of Texas Arlington TX 76019, USA ABSTRACT

More information

Biometric Security System Using Palm print

Biometric Security System Using Palm print ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference

More information

Intermediate view synthesis considering occluded and ambiguously referenced image regions 1. Carnegie Mellon University, Pittsburgh, PA 15213

Intermediate view synthesis considering occluded and ambiguously referenced image regions 1. Carnegie Mellon University, Pittsburgh, PA 15213 1 Intermediate view synthesis considering occluded and ambiguously referenced image regions 1 Jeffrey S. McVeigh *, M. W. Siegel ** and Angel G. Jordan * * Department of Electrical and Computer Engineering

More information

ELEC Dr Reji Mathew Electrical Engineering UNSW

ELEC Dr Reji Mathew Electrical Engineering UNSW ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion

More information

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching Stereo Matching Fundamental matrix Let p be a point in left image, p in right image l l Epipolar relation p maps to epipolar line l p maps to epipolar line l p p Epipolar mapping described by a 3x3 matrix

More information

Global Flow Estimation. Lecture 9

Global Flow Estimation. Lecture 9 Global Flow Estimation Lecture 9 Global Motion Estimate motion using all pixels in the image. Parametric flow gives an equation, which describes optical flow for each pixel. Affine Projective Global motion

More information

FRAME-RATE UP-CONVERSION USING TRANSMITTED TRUE MOTION VECTORS

FRAME-RATE UP-CONVERSION USING TRANSMITTED TRUE MOTION VECTORS FRAME-RATE UP-CONVERSION USING TRANSMITTED TRUE MOTION VECTORS Yen-Kuang Chen 1, Anthony Vetro 2, Huifang Sun 3, and S. Y. Kung 4 Intel Corp. 1, Mitsubishi Electric ITA 2 3, and Princeton University 1

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information