3D Video Generation and Service based on a TOF Depth Sensor in MPEG-4 Multimedia Framework


Sung-Yeol Kim, Ji-Ho Cho, Andreas Koschan, Member, IEEE, and Mongi A. Abidi, Member, IEEE

Abstract — In this paper, we present a new method to generate and serve 3D video represented by video-plus-depth using a time-of-flight (TOF) depth sensor. In practice, depth images captured by a depth sensor suffer from critical problems, such as optical noise, boundaries that do not match the corresponding color images, and depth flickering artifacts in the temporal domain. In this work, we enhance the noisy depth images through a series of processing steps: joint bilateral filtering with inner-edge selection, outer-boundary refinement by a robust image matting method, and temporal consistency based on motion estimation. The resulting high-quality video-plus-depth is then combined with computer graphics models in the MPEG-4 multimedia framework. Finally, the immersive video content is streamed to consumers for 3D viewing. Experimental results show that our method significantly reduces the inherent problems of depth images and successfully serves 3D video in the MPEG-4 multimedia framework.

Index Terms — 3D video generation and service, time-of-flight depth sensor, MPEG-4 multimedia framework.

I. INTRODUCTION

As immersive multimedia services are expected to become available in the near future through high-speed optical networks, three-dimensional (3D) video is recognized as an essential part of next-generation multimedia applications. As a 3D video representation, it is widely accepted that a sequence of synchronized color and depth images, often called video-plus-depth [1], provides the basis for future 3D video applications. For a practical video-plus-depth service to the potential consumers of 3D video applications, such as 3D TV [2], we need to consider two important questions: 1) how can we obtain high-quality video-plus-depth? 2) how can we stream 3D video content that includes video-plus-depth together with interactive multimedia data?

With respect to the first question, a variety of depth estimation algorithms have been presented in the fields of computer vision and image processing [3]. However, accurate measurement of depth information from a natural scene remains problematic because depth estimation is difficult on textureless regions and at depth discontinuities.

This work was supported by DOE-URPR (Grant-DOE-DEFG02-86NE37968) and US Air Force Grant (FA ) in the USA, and in part by the National Research Foundation of Korea (NRF D00277). S.-Y. Kim, A. Koschan, and M. A. Abidi are with the Department of Electrical Engineering and Computer Science, The University of Tennessee, Knoxville, TN, USA (e-mail: {sykim, akoschan, abidi}@utk.edu). J.-H. Cho is with the Department of Mechatronics, Gwangju Institute of Science and Technology, Gwangju, South Korea (e-mail: jhcho@gist.ac.kr).

With respect to the second question, traditional multimedia frameworks, such as the MPEG-1 and MPEG-2 systems, deal only with efficient coding and with synchronization between conventional video and audio, and do not provide 3D video functionality. Unlike previous audio-visual standards, the MPEG-4 multimedia framework [4] supports streaming of various media objects, such as computer graphics models and interactive information.
However, the framework does not provide the functionality to stream natural 3D video represented by video-plus-depth.

As sensor technologies for acquiring distance data advance, we can now capture more accurate per-pixel depth data of a real scene using active time-of-flight (TOF) depth sensors [5, 6]. These sensors directly provide color and depth information from a natural scene by integrating an infrared light source with a conventional video sensor, and they produce more accurate depth data on textureless regions and at depth discontinuities than conventional passive depth estimation methods. Figure 1(a) and Figure 1(b) show a color image and its corresponding depth image captured by a TOF depth sensor, respectively.

Fig. 1. A frame of video-plus-depth captured by a TOF depth sensor: (a) color image, (b) depth image.

However, the depth data from such a sensor cannot be used directly because of inherent problems [7]. To use the depth data properly, we need to resolve the spatial and temporal problems present in the raw depth images: 1) optical noise, 2) unmatched boundaries between a depth image and its corresponding color image, 3) lost depth data on shiny and dark surfaces, and 4) temporal depth flickering artifacts on stationary objects. Optical noise, as shown in Fig. 2(a), usually occurs inside objects in a scene as a result of differences in infrared reflectivity with color variation. Moreover, as shown in Fig. 2(b), the depth information is not well registered with its corresponding color information, for instance around the shoulder of the person in Fig. 1. The problem of unmatched boundaries arises because the TOF depth sensor behaves inaccurately at very close and very far target distances. In addition, as shown in Fig. 2(c), the TOF depth sensor does not capture depth data well on shiny and dark surfaces, such as a black hair region, because the light reflected from these surfaces is very weak or scattered. In particular, as shown in Fig. 2(d), these spatial problems generate depth flickering artifacts on stationary objects, such as the table region in Fig. 1, in the temporal domain.

Fig. 2. Inherent problems of a TOF depth sensor: (a) optical noise, (b) unmatched boundary, (c) lost depth data (black hair region), (d) temporal flickering artifacts across consecutive frames.

These inherent spatial-temporal problems limit the use of TOF depth sensors in applications involving motion detection and motion tracking. The goal of this work is to provide a solution that improves the quality of depth images captured by a TOF depth sensor for the generation of high-quality video-plus-depth, and to show that its application can be extended to reconstructing a realistic and dynamic 3D scene. The contributions of this paper are: (a) a new method to minimize optical noise in depth images using a newly designed joint bilateral filter based on selected inner edges, (b) boundary refinement using robust matting and iterative threshold selection, (c) temporal consistency based on motion estimation, and (d) the design of a framework to stream video-plus-depth data in the MPEG-4 multimedia framework, providing a practical solution for serving 3D video content.

This paper is organized as follows. In Section II, we briefly present the proposed 3D video service system. Section III explains the generation of high-quality video-plus-depth using a TOF depth sensor, and Section IV describes the MPEG-4 system for video-plus-depth streaming. After providing experimental results in Section V, we conclude in Section VI.

II. SYSTEM ARCHITECTURE

A. Related Works

Over the past years, a variety of solutions have been developed to enhance depth images captured by TOF depth sensors. To reduce optical noise in the depth image, a method using adaptive sampling and Gaussian smoothing was developed [8]. For lost region recovery, a method was proposed to regenerate the lost hair region of a human actor using face detection and quadratic Bézier curves [9]. Recently, hybrid camera systems that combine a high-resolution video camera with a TOF depth sensor were introduced to provide high-quality depth images [10, 11]. These previous works mainly concentrated on handling optical noise in depth images in the spatial domain and focused on the generation of a static 3D scene rather than a dynamic one.

With respect to video-plus-depth services based on a TOF depth sensor, the ATTEST project demonstrated the feasibility of 3D video services based on video-plus-depth [12]. The ATTEST system transmitted video-plus-depth through one channel and then synthesized 3D virtual views from it using depth-image-based rendering [13]. Subsequently, the 3DTV project developed core technologies for future 3D video services [14]. These previous 3D video service systems were based on the MPEG-2 system for streaming video-plus-depth. As a result, they limited not only the composition of 3D video content with other multimedia data, such as computer graphics models, but also user-friendly interaction, such as free viewpoint changing.

In this paper, we present a spatial-temporal enhancement method for depth images captured by a TOF depth sensor to generate a dynamic 3D scene with high-quality video-plus-depth.
In addition, we introduce a framework to stream video-plus-depth and various multimedia data at the same time in the MPEG-4 system while supporting free viewpoint changing.

B. Proposed System Architecture

Figure 3 shows the overall architecture of the proposed 3D video generation and service system. At the sender side, we capture video-plus-depth using a TOF depth sensor. Then, we apply a robust matting algorithm to the color images, with trimaps generated from the depth images, and perform an iterative threshold-selection method to compensate the depth information with exact object boundaries. Next, joint bilateral filtering with inner-edge selection is applied to the depth images to reduce optical noise. After recovering depth information in the regions where depth data were lost, based on a quadratic Bézier curve [9], we enforce temporal consistency based on motion estimation. Thereafter, the color and depth images are independently encoded by a video coder, such as an H.264/AVC coder [15]. The compressed video-plus-depth data are spatio-temporally combined with other multimedia, such as audio sources and computer graphics models, using the MPEG-4 Binary Format for Scene (BIFS). MPEG-4 BIFS is a scene descriptor that contains the spatio-temporal relationships between the multimedia objects as well as interactive information [4]. The other multimedia data and the scene description information are encoded by their respective coders. Finally, all encoded bitstreams are multiplexed into one bitstream in the MPEG-4 system.

At the client side, we decode and extract video-plus-depth, MPEG-4 BIFS, and other multimedia information from the bitstream transmitted through a channel. The MPEG-4 synchronizer then distributes the video-plus-depth and computer graphics models to a graphics renderer and the audio data to an audio player. From the transmitted video-plus-depth, we construct 3D surfaces from the depth information by applying a mesh triangulation method [16], and the constructed 3D surfaces are overlaid with the color information to represent a dynamic 3D scene. Moreover, the other multimedia data are combined with the dynamic 3D scene by referring to the transmitted scene description information, the MPEG-4 BIFS. As a result, consumers can experience the immersive 3D video content through a 3D display device and a speaker system.
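To make the processing order at the sender side concrete, the following minimal Python sketch strings the four enhancement stages together for one frame. Every helper name here is a hypothetical placeholder rather than code from the paper; only the ordering of the stages follows Sections III.A–III.D, and possible bodies for several of the helpers are sketched in the sections below.

```python
def enhance_depth_frame(color, depth, prev_color, prev_stationary):
    """One sender-side enhancement pass over a video-plus-depth frame.

    All helpers are hypothetical placeholders for the stages of
    Sections III.A-III.D; only their ordering follows the paper.
    """
    # III.A: trimap from depth, alpha matting on color, boundary compensation.
    trimap = make_trimap(depth)
    alpha = closed_form_matting(color, trimap)
    depth = compensate_boundary(depth, alpha)

    # III.B: joint bilateral filter guided by a color image that keeps
    # only valid inner edges.
    guide = merge_valid_inner_edges(color, depth)
    depth = joint_bilateral_filter(depth, guide)

    # III.C: quadratic Bezier recovery of lost regions (e.g., dark hair).
    depth = recover_lost_regions(depth)

    # III.D: keep the depth of stationary blocks fixed across frames.
    depth, stationary = enforce_temporal_consistency(
        depth, color, prev_color, prev_stationary)
    return depth, stationary
```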

Fig. 3. Proposed 3D video generation and service system (block diagram: data acquisition with a TOF depth sensor; video-plus-depth generation with outer boundary matching, inner edge selection, joint bilateral filtering, loss region recovery, and temporal consistency; 3D video transmission through the MPEG-4 framework; and 3D video reception and display).

III. VIDEO-PLUS-DEPTH GENERATION

A. Outer Boundary Matching

When we render a scene using color and depth images, the color image is used as the texture of the scene. Since the TOF depth sensor does not capture the exact boundaries of objects, visual artifacts appear in the 3D video display. In this paper, we find the outer boundaries of objects in a scene by applying a robust matting algorithm to the color image. The depth image is used to generate a trimap automatically, which forms the input of image matting together with the color image. For automatic trimap generation, we first convert a depth image D into a binary image M_D by global thresholding. A foreground area T_F is obtained by eroding M_D, and a background area T_B by inverting the dilation of M_D; the unknown region of the trimap is the inversion of the union of T_F and T_B. Figure 4(b) shows a trimap generated from the depth image in Fig. 4(a).

Exact outer boundaries are estimated by applying an alpha matting algorithm to the color image. We employ closed-form matting [17], which finds a globally optimal alpha matte using a quadratic cost function under a local smoothness assumption on foreground and background colors. Figure 4(c) shows an alpha map generated by closed-form matting. The alpha map is used to compensate the depth information in a depth image with the extracted outer boundary by Eq. (1):

$$D_i(x, y) = \frac{A_i(x, y)}{255}\, D_i(x - n,\; y - m) \qquad (1)$$

where D_i(x, y) is the intensity at pixel position (x, y) of the i-th depth image D_i, and A_i(x, y) is the alpha value at the same position of the i-th alpha map A_i. The term D_i(x-n, y-m) is the valid intensity nearest to D_i(x, y), found by a spiral search.

Finally, the outer boundary is refined by local iterative threshold selection. To this end, the depth image is partitioned into blocks of a fixed size, and the blocks containing the outer boundary are selected. For each selected block, we find a threshold that separates the block into foreground and background regions using iterative average equivalence on the normalized histogram. Figure 4(d) shows a depth image compensated with the exact outer boundary.

Fig. 4. Outer boundary matching using robust alpha matting: (a) depth image, (b) trimap, (c) alpha map, (d) compensated depth image.
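As an illustration, the trimap generation and the compensation of Eq. (1) might be sketched as follows in Python with OpenCV. The threshold, kernel size, and search radius are illustrative assumptions, and the spiral search is approximated by scanning square rings of growing radius; the depth and alpha images are assumed to be 8-bit.

```python
import cv2
import numpy as np

def make_trimap(depth, thresh=30, ksize=9):
    """Automatic trimap from an 8-bit depth image: foreground T_F is the
    eroded binary mask, background T_B the inverted dilated mask, and the
    remaining band is marked unknown (128). Values are illustrative."""
    _, m_d = cv2.threshold(depth, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((ksize, ksize), np.uint8)
    t_f = cv2.erode(m_d, kernel)                      # foreground T_F
    t_b = cv2.bitwise_not(cv2.dilate(m_d, kernel))    # background T_B
    trimap = np.full_like(m_d, 128)                   # unknown band
    trimap[t_f > 0] = 255
    trimap[t_b > 0] = 0
    return trimap

def compensate_boundary(depth, alpha, search_radius=5):
    """Eq. (1): D(x, y) = A(x, y)/255 * D(x-n, y-m). The spiral search is
    approximated by scanning square rings of growing radius for the
    nearest fully-opaque depth sample."""
    out = depth.astype(np.float32)                    # working copy
    h, w = depth.shape
    ys, xs = np.nonzero((alpha > 0) & (alpha < 255))  # mixed boundary band
    for y, x in zip(ys, xs):
        for r in range(1, search_radius + 1):
            ring = [(y + dy, x + dx)
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                    if max(abs(dy), abs(dx)) == r]
            vals = [depth[p] for p in ring
                    if 0 <= p[0] < h and 0 <= p[1] < w and alpha[p] == 255]
            if vals:
                out[y, x] = alpha[y, x] / 255.0 * float(vals[0])
                break
    return out.astype(depth.dtype)
```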

B. Inner Optical Noise Minimization

A general bilateral filter reduces noise in an image while preserving important sharp edges [18]. To reduce optical noise inside objects in depth images, joint bilateral filtering has been used in previous works [9, 19, 20]. Joint bilateral filtering assumes that regions of depth discontinuity in a depth image usually correspond to edges in the color image. Formally, for a pixel position p in a depth image D with corresponding color image C, the jointly filtered value J_p is given by Eq. (2):

$$J_p = \frac{1}{k_p} \sum_{q \in \Omega} G_s(\|p - q\|)\, G_{r1}(\|C_p - C_q\|)\, G_{r2}(\|D_p - D_q\|)\, D_q \qquad (2)$$

where G_s is the spatial weight, and G_r1 and G_r2 are the weights for the color difference and the depth difference, respectively. The weights are derived from Gaussian distributions. Ω is the spatial support of the weight G_s, k_p is a normalizing factor, and q is a pixel position in Ω.

Note that some edges in a color image do not lie on depth discontinuities in the depth image. Such edges have an undesirable effect on the depth image during joint bilateral filtering [20]: although the original depth information is smooth, these color edges make the filtered depth discontinuous. In this paper, we present joint bilateral filtering based on valid inner-edge selection. Figure 5 shows the procedure for optical noise minimization.

Fig. 5. Procedure of optical noise minimization: iterative Gaussian filtering and Canny edge detection on the color image; global binarization, erosion, and Canny edge detection on the depth image; outer edge removal, edge labeling, inner edge selection, and image merging; finally, joint bilateral filtering of the depth image guided by the merged color image.

First, we extract an edge map E_C from the color image C and another edge map E_D from its depth image D using Canny edge detection. Figure 6(b) shows the edge map extracted from the color image in Fig. 6(a), and Fig. 6(c) shows the edge map extracted from the depth image. Since TOF depth sensors usually do not capture edges in the background, we remove background edges from E_C. In addition, we remove edges in the outer-boundary region from E_C, because the outer boundary was already matched in Section III.A. For outer edge removal, we convert the depth image D into a binary mask image M_D by global thresholding and erode M_D to define a foreground area T_F. The edge map E_O after outer edge removal is then the intersection of E_C and T_F. Figure 6(d) shows an edge map after outer edge removal.

Thereafter, we select the valid inner edges in E_O using a traditional labeling method based on equivalence-table updating [21]. After assigning a label to each edge in E_O, we search for the labels whose positions coincide with pixels on edges in the edge map E_D. We then gather the edges carrying these labels from E_O, as shown in Fig. 6(e). Finally, we create a new color image by merging the selected inner edges with the color image heavily smoothed by iterative Gaussian filtering, as shown in Fig. 6(f).

Fig. 6. Valid inner edge selection from color and depth images: (a) color image, (b) edges from the color image, (c) edges from the depth image, (d) outer edge removal, (e) selected inner edges, (f) modified color image.
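A possible realization of the edge-selection procedure just described is sketched below, with illustrative Canny thresholds and kernel sizes. Note that OpenCV's connected-component labeling stands in here for the equivalence-table labeling of [21].

```python
import cv2
import numpy as np

def merge_valid_inner_edges(color, depth, blur_iters=5):
    """Build the modified guide image of Fig. 5: keep only color edges that
    coincide with a depth edge, then merge them into a heavily smoothed
    color image. Thresholds and kernel sizes are illustrative."""
    gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
    e_c = cv2.Canny(gray, 50, 150)            # edge map E_C from color
    e_d = cv2.Canny(depth, 30, 90)            # edge map E_D from depth

    # Outer edge removal: restrict E_C to the eroded foreground T_F.
    _, m_d = cv2.threshold(depth, 30, 255, cv2.THRESH_BINARY)
    t_f = cv2.erode(m_d, np.ones((9, 9), np.uint8))
    e_o = cv2.bitwise_and(e_c, t_f)

    # Label color edges; keep labels touched by at least one depth edge.
    _, labels = cv2.connectedComponents((e_o > 0).astype(np.uint8))
    hit = np.unique(labels[(e_d > 0) & (labels > 0)])
    valid = np.isin(labels, hit) & (labels > 0)

    # Merge the selected edges into an iteratively smoothed color image.
    smooth = gray.copy()
    for _ in range(blur_iters):
        smooth = cv2.GaussianBlur(smooth, (5, 5), 0)
    smooth[valid] = 0                          # burn edges into the guide
    return smooth
```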
The new color image M is then used, instead of the original color image, as the guide for joint bilateral filtering to minimize optical noise in the depth image. Formally, for a pixel position p of a depth image D with the modified color image M, the jointly filtered value J_p is given by Eq. (3):

$$J_p = \frac{1}{k_p} \sum_{q \in \Omega} G_s(\|p - q\|)\, G_{r1}(\|M_p - M_q\|)\, G_{r2}(\|D_p - D_q\|)\, D_q \qquad (3)$$

where G_r1 is now the weight of the intensity difference in the modified color image M.
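Eq. (3) translates almost directly into code. The sketch below uses, by default, the 3×3 window and the standard deviations (3, 0.1, 0.1) reported later in Section V; it assumes 8-bit images normalized internally to [0, 1] and, for brevity, wraps at the image borders.

```python
import numpy as np

def joint_bilateral_filter(depth, guide, radius=1,
                           sigma_s=3.0, sigma_r1=0.1, sigma_r2=0.1):
    """Eq. (3): J_p = (1/k_p) sum_q G_s(|p-q|) G_r1(|M_p-M_q|)
    G_r2(|D_p-D_q|) D_q with Gaussian weights. Borders wrap via np.roll,
    a simplification for the sake of a short sketch."""
    d = depth.astype(np.float32) / 255.0   # depth D, normalized
    m = guide.astype(np.float32) / 255.0   # modified color image M
    out = np.zeros_like(d)
    norm = np.zeros_like(d)                # normalizing factor k_p
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            q_d = np.roll(np.roll(d, dy, axis=0), dx, axis=1)
            q_m = np.roll(np.roll(m, dy, axis=0), dx, axis=1)
            g_s = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
            g_r1 = np.exp(-((m - q_m) ** 2) / (2 * sigma_r1 ** 2))
            g_r2 = np.exp(-((d - q_d) ** 2) / (2 * sigma_r2 ** 2))
            wgt = g_s * g_r1 * g_r2
            out += wgt * q_d
            norm += wgt
    return (255.0 * out / norm).astype(depth.dtype)
```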

C. Recovery of Lost Depth Data

In this work, we employ a previous method based on quadratic Bézier curves to recover regions of lost depth data [9]. The depth recovery algorithm consists of three steps: detection of the lost depth data regions, recovery of their boundaries, and estimation of the lost depth data. A region growing algorithm with multiple seeds is applied to detect the lost regions, and the boundary of each detected region is recovered by boundary tracing. Finally, each lost region is filled with depth information interpolated by a quadratic Bézier curve from the neighboring depth data. Figure 7(b) shows a depth image recovered by the quadratic Bézier curve method from the depth image in Fig. 7(a).

Fig. 7. Depth image after lost depth data recovery: (a) before, (b) after.

D. Temporal Consistency

Temporal consistency reduces temporal depth flickering artifacts on stationary objects in a scene. We first detect the stationary regions: the stationary regions of the t-th frame color image C_t are estimated from the (t-1)-th frame color image C_(t-1) using block matching. Block matching predicts the movement of objects in a scene by measuring the similarity between blocks in the temporal domain; we use the mean absolute difference (MAD) as the similarity measure. For an M×N block at position (k, l) in the t-th frame color image C_t, the MAD is computed between this block and the M×N block at position (k+x, l+y) in the (t-1)-th frame color image C_(t-1). The motion vector v_t(x, y) of the block, which determines whether motion exists, is given by Eq. (4):

$$v_t = \arg\min_{(x,\,y)} \mathrm{MAD}\big(B_t(k, l),\; B_{t-1}(k + x,\; l + y)\big) \qquad (4)$$

where B_t(k, l) denotes the M×N block at position (k, l) of C_t. In block matching, we assume that block regions whose motion vectors are zero in both the x- and y-directions are stationary. The motion image M_t generated from the motion vector data is given by Eq. (5):

$$M_t(x, y) = \begin{cases} 0, & \text{if } |v_t(x, y)_x| > 0 \ \text{or}\ |v_t(x, y)_y| > 0 \\ 255, & \text{otherwise} \end{cases} \qquad (5)$$

where v_t(x, y)_x and v_t(x, y)_y denote the x- and y-direction components of the motion vector, respectively. A stationary region image S_t is then extracted from the t-th frame depth image D_t by Eq. (6):

$$S_t = D_t \,\&\, M_t \qquad (6)$$

where the operator & denotes the bitwise AND operation. Finally, the enhanced depth image D'_t with temporal consistency is calculated by Eq. (7):

$$D'_t(x, y) = \begin{cases} S_{t-1}(x, y), & \text{if } S_{t-1}(x, y) > 0 \ \text{and}\ S_t(x, y) > 0 \\ D_t(x, y), & \text{otherwise} \end{cases} \qquad (7)$$
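Section III.D can be condensed into the following sketch of Eqs. (4)–(7). The block size and search range are illustrative assumptions, the color frames are assumed grayscale, and the stationary image S_(t-1) of the first frame would be initialized to zeros.

```python
import numpy as np

def enforce_temporal_consistency(depth_t, color_t, color_prev, s_prev,
                                 block=16, search=4):
    """Eqs. (4)-(7): blocks whose MAD-minimizing motion vector is (0, 0)
    are marked stationary; where both S_{t-1} and S_t are nonzero, the
    previous depth is kept to suppress flickering."""
    h, w = depth_t.shape
    m_t = np.zeros((h, w), np.uint8)           # motion image, Eq. (5)
    c_t = color_t.astype(np.float32)
    c_p = color_prev.astype(np.float32)
    for k in range(0, h - block + 1, block):
        for l in range(0, w - block + 1, block):
            blk = c_t[k:k + block, l:l + block]
            best, v = np.inf, (0, 0)
            for x in range(-search, search + 1):
                for y in range(-search, search + 1):
                    if not (0 <= k + x <= h - block
                            and 0 <= l + y <= w - block):
                        continue
                    cand = c_p[k + x:k + x + block, l + y:l + y + block]
                    mad = np.mean(np.abs(blk - cand))   # Eq. (4) criterion
                    if mad < best:
                        best, v = mad, (x, y)
            if v == (0, 0):                    # zero motion in x and y
                m_t[k:k + block, l:l + block] = 255
    s_t = np.where(m_t > 0, depth_t, 0)        # Eq. (6) as a mask
    out = depth_t.copy()
    keep = (s_prev > 0) & (s_t > 0)
    out[keep] = s_prev[keep]                   # Eq. (7)
    return out, s_t
```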
IV. MPEG-4-BASED 3D VIDEO CONTENTS

In order to deliver multimedia content, a multimedia framework is needed. In this work, we direct our attention to the MPEG-4 multimedia framework [4], which supports streaming of various media objects and provides flexible interactivity. An MPEG-4-based scene is built from individual objects that are related in space and time. Based on these relationships, the MPEG-4 framework allows us to combine a variety of media objects, such as conventional 2D video, audio sources, and computer graphics models, with a 3D scene represented by video-plus-depth.

The MPEG-4 system defines a scene description, referred to as BIFS, which specifies how the objects are spatio-temporally combined for presentation. All visible objects in the 3D scene are described within the Shape node of MPEG-4 BIFS [22]. Recently, a node representing video-plus-depth data inside the Shape node, referred to as the DepthMovie node, has been proposed [23]. We employ the DepthMovie node to combine video-plus-depth with the other multimedia data. Computer graphics models are described by predefined nodes in MPEG-4 BIFS. The BIFS data, including the scene description and the computer graphics model data, are coded by the BIFS encoder provided by the current MPEG-4 system. Finally, the compressed color video, depth video, and MPEG-4 BIFS streams are multiplexed into one MP4 file, which is designed to contain the media data of an MPEG-4 presentation. The MP4 file can be played from a local hard disk or transmitted to consumers by a streaming server through existing networks.

V. EXPERIMENTAL RESULTS

We tested the performance of our method using two test image sequences, ACTOR1 and ACTOR2, obtained from a TOF depth sensor [5, 24]. The ACTOR1 and ACTOR2 sequences consist of 200 and 100 frames, respectively, captured at the same image resolution. Figure 8 shows some frames of the two sequences.

Fig. 8. Test image sequences: (a) ACTOR1 sequence, (b) ACTOR2 sequence.

Figure 9 shows the result of noise minimization for the first frame of the ACTOR1 and ACTOR2 sequences. In Fig. 9, the rectangular regions in the first column are enlarged and shown in the second column. In this experiment, we used a 3×3 joint bilateral filter and set the standard deviations of the Gaussian kernels for the weights G_s, G_r1, and G_r2 in Eq. (3) to 3, 0.1, and 0.1, respectively. Conventional bilateral filtering [18] and previous joint bilateral filtering [9, 20] were used for comparison with our method.

As shown in Fig. 9(a), the original depth image contains serious optical noise on the objects in the scene. Looking at the circled regions in the second column, it is easy to see that the proposed joint bilateral filter with inner-edge selection reduces the optical noise efficiently while preserving important sharp features. With bilateral filtering, the crease of the cloth covering the table in the ACTOR1 scene almost disappears, as shown in Fig. 9(b). With joint bilateral filtering, the crease is largely retained, as shown in Fig. 9(c), but some distortion occurs because of unrelated edges in the color image affecting the filtering. In contrast, the proposed method maintains the crease well and minimizes visual distortion, as shown in Fig. 9(d), because only valid edges from the color image are used during joint bilateral filtering. The same behavior can be observed in the woman's hair region in the ACTOR2 scene.

Fig. 9. Result of noise minimization: (a) original depth image, (b) bilateral filtering, (c) joint bilateral filtering, (d) joint bilateral filtering with inner edge selection.

Fig. 10. Result of boundary matching: (a) color image, (b) original depth image, (c) outer boundary matching.

Figure 10 shows the result of outer boundary matching for the first frame of the ACTOR1 and ACTOR2 sequences. The rectangular regions of the color image in Fig. 10(a) are enlarged and shown overlaid onto the corresponding depth image. As shown in Fig. 10(b), the original depth image is not well registered with its color image, for example in the region of the man's calf in ACTOR1 and the region of the woman's finger in ACTOR2. In contrast, as shown in Fig. 10(c), the boundary mismatch is minimized because the proposed method, using a robust matting algorithm and iterative threshold selection, traces the exact boundary and compensates the mismatched region with neighboring depth information.

Figure 11 shows the result of temporal consistency for the 1st, 10th, 20th, 30th, and 40th frames of the ACTOR1 sequence when reconstructing a 3D scene from video-plus-depth. The table and the man's knee are stationary regions in the temporal domain. As shown in Fig. 11(a), the optical noise and temporal inconsistency of the original depth images cause serious distortions. As shown in Fig. 11(b), even after enhancing the depth images in the spatial domain with joint bilateral filtering and boundary matching, some flickering artifacts remain in the table and knee regions. In contrast, as shown in Fig. 11(c), the proposed method significantly reduces both the spatial optical noise, through joint bilateral filtering with edge selection, and the temporal distortion, through the motion-estimation-based temporal consistency.

Fig. 11. Result of temporal consistency for the 1st, 10th, 20th, 30th, and 40th frames: (a) original depth image sequence, (b) joint bilateral filtering and boundary matching, (c) joint bilateral filtering with edge selection, boundary matching, and temporal consistency.

Figures 12(a) and 12(b) show the results of 3D scene reconstruction from the 1st, 10th, 20th, 30th, and 40th frames of ACTOR1 and ACTOR2. Natural and dynamic 3D scenes were successfully generated from the enhanced video-plus-depth using a 3D mesh structure [16].

Fig. 12. Reconstruction of dynamic 3D scenes: (a) 3D scene from the ACTOR1 sequence, (b) 3D scene from the ACTOR2 sequence.

To assess the improvement in depth accuracy achieved by the proposed method, we applied Gaussian filtering [8], joint bilateral filtering (BF) [9, 20], and our joint bilateral filtering with edge selection to noisy depth images. The noisy depth images were generated by adding Gaussian noise with a standard deviation of 20 to ground-truth data from the Middlebury stereo dataset [3] (see the sketch at the end of this section). Figure 13 shows the result for the Bowling depth image: Figs. 13(a), 13(b), and 13(c) show the ground-truth depth image, its corresponding color image, and the artificially noised depth image, respectively. As shown in Fig. 13(f), our method reduced the noise on the noisy depth image more effectively than Gaussian filtering, shown in Fig. 13(d). Furthermore, our method was less affected by edges in the color image than joint bilateral filtering, shown in Fig. 13(e). The peak signal-to-noise ratio (PSNR) against the known ground truth was used as the quality measure. Table 1 reports the PSNR for the other Middlebury test depth images; our method achieved the highest PSNR among the compared methods in this experiment.

Fig. 13. Depth quality evaluation: (a) ground truth, (b) color image, (c) noisy depth image, (d) Gaussian filtering, (e) joint bilateral filtering, (f) our method.

TABLE 1
DEPTH QUALITY EVALUATION (PSNR)

Test data   Noisy depth data   Gaussian filtering   Joint BF   Our method
Bowling     22.5 dB            29.8 dB              30.5 dB    31.5 dB
Cloth       22.3 dB            32.9 dB              37.1 dB    38.6 dB
Aloe        22.3 dB            30.2 dB              29.7 dB    31.4 dB
Baby        22.4 dB            33.0 dB              33.9 dB    34.1 dB
Wood        22.8 dB            32.2 dB              32.9 dB    33.1 dB

Figure 14(a) shows the 3D video content played by an MPEG-4 player with ACTOR1. The 3D scene was rendered successfully using video-plus-depth data expressed by a DepthMovie node in the MPEG-4 system. As shown in Fig. 14(a), MPEG-4-based 3D video content can display video-plus-depth, computer graphics models, and 2D images together, unlike MPEG-2-based 3D video content [12]. Furthermore, the viewpoint can be changed freely within the 3D video content, as shown in Fig. 14(b): the 3D scene was viewed successfully while the viewing angle was changed from +15 degrees to -15 degrees. However, since depth information for the side views was not captured in this experiment, extreme viewpoint changes caused unexpected results from the video-plus-depth data, as shown in Fig. 14(a).

Fig. 14. MPEG-4-based 3D video contents: (a) 3D video contents combining video-plus-depth, a CG model, and a 2D image, (b) free viewpoint changing (-15 degrees, 0 degrees (center view), +15 degrees).

In addition, Table 2 shows the average computation time for depth image enhancement on the test sequences. On average, 3.47 s/frame and 2.79 s/frame were needed for spatial and temporal depth image enhancement of the ACTOR1 and ACTOR2 sequences, respectively. We expect the computation time to decrease once our depth enhancement algorithm is optimized and fast rendering techniques based on the graphics processing unit are employed.

TABLE 2
COMPUTATION TIME FOR DEPTH IMAGE ENHANCEMENT

Processing             ACTOR1          ACTOR2
Noise reduction        0.76 sec/frame  0.47 sec/frame
Loss region recovery   0.55 sec/frame  0.53 sec/frame
Boundary matching      1.45 sec/frame  1.12 sec/frame
Temporal consistency   0.71 sec/frame  0.67 sec/frame
Total                  3.47 sec/frame  2.79 sec/frame
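For reference, the noisy test data and the PSNR figures of Table 1 follow the protocol sketched below; the random seed is an illustrative assumption.

```python
import numpy as np

def add_gaussian_noise(depth, sigma=20.0, seed=0):
    """Synthesize a noisy test depth image as in the evaluation above:
    Gaussian noise with standard deviation 20 added to the ground truth."""
    rng = np.random.default_rng(seed)
    noisy = depth.astype(np.float32) + rng.normal(0.0, sigma, depth.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB against 8-bit ground truth."""
    mse = np.mean((ref.astype(np.float32) - test.astype(np.float32)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```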

VI. CONCLUSIONS

In this paper, we have proposed a new method to enhance depth images captured by a TOF depth sensor both spatially and temporally. As shown in the experimental results, the proposed depth enhancement method significantly reduces the inherent problems of such depth images. Furthermore, we demonstrated the feasibility of 3D video services based on video-plus-depth data in the MPEG-4 multimedia framework. We expect the proposed 3D video service system to be useful in future 3D multimedia applications.

REFERENCES
[1] C. Fehn, "A 3D-TV system based on video plus depth information," Proc. Asilomar Conference on Signals, Systems and Computers, 2003.
[2] C. Fehn, R. Barré, and S. Pastoor, "Interactive 3-D TV - concepts and key technologies," Proceedings of the IEEE, vol. 94, no. 3, 2006.
[3] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7-42, 2002.
[4] F. Pereira, "MPEG-4: why, what, how and when?," Signal Processing: Image Communication, vol. 15, 2000.
[5] G. J. Iddan and G. Yahav, "3D imaging in the studio and elsewhere," Proc. Videometrics and Optical Methods for 3D Shape Measurement, 2001.
[6] M. Kawakita, T. Kurita, H. Kikuchi, and S. Inoue, "HDTV Axi-vision camera," Proc. International Broadcasting Conference.
[7] S. Hu, S. S. Young, T. Hong, J. P. Reynolds, K. Krapels, B. Miller, J. Thomas, and O. Nguyen, "Super-resolution for flash ladar imagery," Applied Optics, vol. 49, no. 5, 2010.
[8] S. M. Kim, J. Cha, J. Ryu, and K. H. Lee, "Depth video enhancement for haptic interaction using a smooth surface reconstruction," IEICE Transactions on Information and Systems, vol. E89-D, 2006.
[9] J. Cho, S.-Y. Kim, Y.-S. Ho, and K. H. Lee, "Dynamic 3D human actor generation method using a time-of-flight depth camera," IEEE Transactions on Consumer Electronics, vol. 54, no. 4, 2008.
[10] J. Diebel and S. Thrun, "An application of Markov random fields to range sensing," Proc. Advances in Neural Information Processing Systems, 2005.
[11] B. Huhle, S. Fleck, and A. Schilling, "Integrating 3D time-of-flight camera data and high resolution images for 3DTV applications," Proc. 3DTV Conference, pp. 1-4, 2007.
[12] A. Redert, M. op de Beeck, C. Fehn, W. IJsselsteijn, M. Pollefeys, L. Van Gool, E. Ofek, I. Sexton, and P. Surman, "ATTEST: advanced three-dimensional television system technologies," Proc. International Symposium on 3D Data Processing, 2002.
[13] H.-Y. Shum, S. B. Kang, and S.-C. Chan, "Survey of image-based representations and compression techniques," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 11, 2003.
[14] L. Onural, "Television in 3-D: what are the prospects?," Proceedings of the IEEE, vol. 95, no. 6, 2007.
[15] T. Wiegand, M. Lightstone, D. Mukherjee, T. G. Campbell, and S. K. Mitra, "Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, 1996.
[16] S.-Y. Kim, S.-B. Lee, and Y.-S. Ho, "Three-dimensional natural video system based on layered representation of depth maps," IEEE Transactions on Consumer Electronics, vol. 52, no. 3, 2006.
[17] A. Levin, D. Lischinski, and Y. Weiss, "A closed-form solution to natural image matting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, 2008.
[18] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," Proc. International Conference on Computer Vision, 1998.
[19] J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, "Joint bilateral upsampling," ACM Transactions on Graphics, vol. 26, no. 3, pp. 1-6, 2007.
[20] O. P. Gangwal and R.-P. Berretty, "Depth map post-processing for 3D-TV," Proc. International Conference on Consumer Electronics, pp. 1-2, 2009.
[21] A. Rosenfeld and J. L. Pfaltz, "Sequential operations in digital picture processing," Journal of the ACM, vol. 13, no. 4, 1966.
[22] L. Levkovich-Maslyuk, A. Ignatenko, A. Zhirkov, A. Konushin, I. Park, M. Han, and Y. Bayakovski, "Depth image-based representation and compression for static and animated 3-D objects," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 7, 2004.
[23] J. Cha, Y.-S. Ho, Y. Kim, J. Ryu, and I. Oakley, "A framework of haptic broadcasting," IEEE MultiMedia, vol. 16, no. 3, 2009.
[24] S.-Y. Kim, E.-K. Lee, and Y.-S. Ho, "Generation of ROI enhanced depth maps using stereoscopic cameras and a depth camera," IEEE Transactions on Broadcasting, vol. 54, no. 4, 2008.

Biographies

Sung-Yeol Kim received his M.S. and Ph.D. degrees in Information and Communication Engineering from the Gwangju Institute of Science and Technology (GIST), Korea, in 2003 and 2008, respectively. He is currently working as a research associate with the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA. His research interests include 3D video representation and processing, 3D mesh representation and processing, 3D TV, and realistic broadcasting.

Ji-Ho Cho received his M.S. degree in Information and Communications in 2005 and his Ph.D. degree in Mechatronics Engineering from the Gwangju Institute of Science and Technology (GIST), Korea, and worked as an academic guest at the Swiss Federal Institute of Technology (ETH) in Zürich. He currently works at the Intelligent Design and Graphics Laboratory at GIST as a postdoctoral research associate. His main interests lie in 3D video and computational photography.

Andreas Koschan (M'90) received the Diploma (M.S.) degree in computer science and the Dr.-Ing. (Ph.D.) degree in computer engineering from the Technical University Berlin, Berlin, Germany, in 1985 and 1991, respectively. He is currently a Research Associate Professor with the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville. His research work has primarily focused on color image processing and 3-D computer vision, including stereo vision and laser range finding techniques. He is a coauthor of two textbooks on 3-D image processing. Dr. Koschan is a member of the Society for Imaging Science and Technology.

Mongi A. Abidi (S'83-M'85) received his M.S. and Ph.D. degrees in electrical engineering from the University of Tennessee, Knoxville, in 1985 and 1987, respectively. He is currently a Professor with the Department of Electrical Engineering and Computer Science, University of Tennessee, directing research activities at the Imaging, Robotics, and Intelligent Systems Laboratory. He has published more than 300 papers and edited or written four books in the areas of imaging and robotics. He received The Most Cited Paper Award in Computer Vision and Image Understanding for 2006 and 2007.


Temporal Filtering of Depth Images using Optical Flow Temporal Filtering of Depth Images using Optical Flow Razmik Avetisyan Christian Rosenke Martin Luboschik Oliver Staadt Visual Computing Lab, Institute for Computer Science University of Rostock 18059

More information

Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera

Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera Outdoor Scene Reconstruction from Multiple Image Sequences Captured by a Hand-held Video Camera Tomokazu Sato, Masayuki Kanbara and Naokazu Yokoya Graduate School of Information Science, Nara Institute

More information

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging

Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging Guided Image Super-Resolution: A New Technique for Photogeometric Super-Resolution in Hybrid 3-D Range Imaging Florin C. Ghesu 1, Thomas Köhler 1,2, Sven Haase 1, Joachim Hornegger 1,2 04.09.2014 1 Pattern

More information

DEPTH PIXEL CLUSTERING FOR CONSISTENCY TESTING OF MULTIVIEW DEPTH. Pravin Kumar Rana and Markus Flierl

DEPTH PIXEL CLUSTERING FOR CONSISTENCY TESTING OF MULTIVIEW DEPTH. Pravin Kumar Rana and Markus Flierl DEPTH PIXEL CLUSTERING FOR CONSISTENCY TESTING OF MULTIVIEW DEPTH Pravin Kumar Rana and Markus Flierl ACCESS Linnaeus Center, School of Electrical Engineering KTH Royal Institute of Technology, Stockholm,

More information

Multiframe Blocking-Artifact Reduction for Transform-Coded Video

Multiframe Blocking-Artifact Reduction for Transform-Coded Video 276 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 4, APRIL 2002 Multiframe Blocking-Artifact Reduction for Transform-Coded Video Bahadir K. Gunturk, Yucel Altunbasak, and

More information

Efficient Stereo Image Rectification Method Using Horizontal Baseline

Efficient Stereo Image Rectification Method Using Horizontal Baseline Efficient Stereo Image Rectification Method Using Horizontal Baseline Yun-Suk Kang and Yo-Sung Ho School of Information and Communicatitions Gwangju Institute of Science and Technology (GIST) 261 Cheomdan-gwagiro,

More information

Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding

Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding Graziosi, D.B.; Rodrigues, N.M.M.; de Faria, S.M.M.; Tian, D.; Vetro,

More information

Filter Flow: Supplemental Material

Filter Flow: Supplemental Material Filter Flow: Supplemental Material Steven M. Seitz University of Washington Simon Baker Microsoft Research We include larger images and a number of additional results obtained using Filter Flow [5]. 1

More information

Multimedia Technology CHAPTER 4. Video and Animation

Multimedia Technology CHAPTER 4. Video and Animation CHAPTER 4 Video and Animation - Both video and animation give us a sense of motion. They exploit some properties of human eye s ability of viewing pictures. - Motion video is the element of multimedia

More information

Stereo and structured light

Stereo and structured light Stereo and structured light http://graphics.cs.cmu.edu/courses/15-463 15-463, 15-663, 15-862 Computational Photography Fall 2018, Lecture 20 Course announcements Homework 5 is still ongoing. - Make sure

More information

Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations

Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations Prashant Ramanathan and Bernd Girod Department of Electrical Engineering Stanford University Stanford CA 945

More information

Photometric Stereo with Auto-Radiometric Calibration

Photometric Stereo with Auto-Radiometric Calibration Photometric Stereo with Auto-Radiometric Calibration Wiennat Mongkulmann Takahiro Okabe Yoichi Sato Institute of Industrial Science, The University of Tokyo {wiennat,takahiro,ysato} @iis.u-tokyo.ac.jp

More information

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE Nan Hu Stanford University Electrical Engineering nanhu@stanford.edu ABSTRACT Learning 3-D scene structure from a single still

More information

Image Segmentation Techniques for Object-Based Coding

Image Segmentation Techniques for Object-Based Coding Image Techniques for Object-Based Coding Junaid Ahmed, Joseph Bosworth, and Scott T. Acton The Oklahoma Imaging Laboratory School of Electrical and Computer Engineering Oklahoma State University {ajunaid,bosworj,sacton}@okstate.edu

More information

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation

Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial

More information

Asymmetric 2 1 pass stereo matching algorithm for real images

Asymmetric 2 1 pass stereo matching algorithm for real images 455, 057004 May 2006 Asymmetric 21 pass stereo matching algorithm for real images Chi Chu National Chiao Tung University Department of Computer Science Hsinchu, Taiwan 300 Chin-Chen Chang National United

More information

Auto-focusing Technique in a Projector-Camera System

Auto-focusing Technique in a Projector-Camera System 2008 10th Intl. Conf. on Control, Automation, Robotics and Vision Hanoi, Vietnam, 17 20 December 2008 Auto-focusing Technique in a Projector-Camera System Lam Bui Quang, Daesik Kim and Sukhan Lee School

More information

Fingerprint Image Enhancement Algorithm and Performance Evaluation

Fingerprint Image Enhancement Algorithm and Performance Evaluation Fingerprint Image Enhancement Algorithm and Performance Evaluation Naja M I, Rajesh R M Tech Student, College of Engineering, Perumon, Perinad, Kerala, India Project Manager, NEST GROUP, Techno Park, TVM,

More information

Channel-Adaptive Error Protection for Scalable Audio Streaming over Wireless Internet

Channel-Adaptive Error Protection for Scalable Audio Streaming over Wireless Internet Channel-Adaptive Error Protection for Scalable Audio Streaming over Wireless Internet GuiJin Wang Qian Zhang Wenwu Zhu Jianping Zhou Department of Electronic Engineering, Tsinghua University, Beijing,

More information

Reconstruction PSNR [db]

Reconstruction PSNR [db] Proc. Vision, Modeling, and Visualization VMV-2000 Saarbrücken, Germany, pp. 199-203, November 2000 Progressive Compression and Rendering of Light Fields Marcus Magnor, Andreas Endmann Telecommunications

More information

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier Computer Vision 2 SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung Computer Vision 2 Dr. Benjamin Guthier 1. IMAGE PROCESSING Computer Vision 2 Dr. Benjamin Guthier Content of this Chapter Non-linear

More information

High Performance GPU-Based Preprocessing for Time-of-Flight Imaging in Medical Applications

High Performance GPU-Based Preprocessing for Time-of-Flight Imaging in Medical Applications High Performance GPU-Based Preprocessing for Time-of-Flight Imaging in Medical Applications Jakob Wasza 1, Sebastian Bauer 1, Joachim Hornegger 1,2 1 Pattern Recognition Lab, Friedrich-Alexander University

More information

Unit-level Optimization for SVC Extractor

Unit-level Optimization for SVC Extractor Unit-level Optimization for SVC Extractor Chang-Ming Lee, Chia-Ying Lee, Bo-Yao Huang, and Kang-Chih Chang Department of Communications Engineering National Chung Cheng University Chiayi, Taiwan changminglee@ee.ccu.edu.tw,

More information

Implementation and analysis of Directional DCT in H.264

Implementation and analysis of Directional DCT in H.264 Implementation and analysis of Directional DCT in H.264 EE 5359 Multimedia Processing Guidance: Dr K R Rao Priyadarshini Anjanappa UTA ID: 1000730236 priyadarshini.anjanappa@mavs.uta.edu Introduction A

More information

Local Readjustment for High-Resolution 3D Reconstruction: Supplementary Material

Local Readjustment for High-Resolution 3D Reconstruction: Supplementary Material Local Readjustment for High-Resolution 3D Reconstruction: Supplementary Material Siyu Zhu 1, Tian Fang 2, Jianxiong Xiao 3, and Long Quan 4 1,2,4 The Hong Kong University of Science and Technology 3 Princeton

More information

Complex Sensors: Cameras, Visual Sensing. The Robotics Primer (Ch. 9) ECE 497: Introduction to Mobile Robotics -Visual Sensors

Complex Sensors: Cameras, Visual Sensing. The Robotics Primer (Ch. 9) ECE 497: Introduction to Mobile Robotics -Visual Sensors Complex Sensors: Cameras, Visual Sensing The Robotics Primer (Ch. 9) Bring your laptop and robot everyday DO NOT unplug the network cables from the desktop computers or the walls Tuesday s Quiz is on Visual

More information

Topics to be Covered in the Rest of the Semester. CSci 4968 and 6270 Computational Vision Lecture 15 Overview of Remainder of the Semester

Topics to be Covered in the Rest of the Semester. CSci 4968 and 6270 Computational Vision Lecture 15 Overview of Remainder of the Semester Topics to be Covered in the Rest of the Semester CSci 4968 and 6270 Computational Vision Lecture 15 Overview of Remainder of the Semester Charles Stewart Department of Computer Science Rensselaer Polytechnic

More information

A Video Watermarking Algorithm Based on the Human Visual System Properties

A Video Watermarking Algorithm Based on the Human Visual System Properties A Video Watermarking Algorithm Based on the Human Visual System Properties Ji-Young Moon 1 and Yo-Sung Ho 2 1 Samsung Electronics Co., LTD 416, Maetan3-dong, Paldal-gu, Suwon-si, Gyenggi-do, Korea jiyoung.moon@samsung.com

More information

Impact of Intensity Edge Map on Segmentation of Noisy Range Images

Impact of Intensity Edge Map on Segmentation of Noisy Range Images Impact of Intensity Edge Map on Segmentation of Noisy Range Images Yan Zhang 1, Yiyong Sun 1, Hamed Sari-Sarraf, Mongi A. Abidi 1 1 IRIS Lab, Dept. of ECE, University of Tennessee, Knoxville, TN 37996-100,

More information

Video Communication Ecosystems. Research Challenges for Immersive. over Future Internet. Converged Networks & Services (CONES) Research Group

Video Communication Ecosystems. Research Challenges for Immersive. over Future Internet. Converged Networks & Services (CONES) Research Group Research Challenges for Immersive Video Communication Ecosystems over Future Internet Tasos Dagiuklas, Ph.D., SMIEEE Assistant Professor Converged Networks & Services (CONES) Research Group Hellenic Open

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 14 130307 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Stereo Dense Motion Estimation Translational

More information

Storage Efficient NL-Means Burst Denoising for Programmable Cameras

Storage Efficient NL-Means Burst Denoising for Programmable Cameras Storage Efficient NL-Means Burst Denoising for Programmable Cameras Brendan Duncan Stanford University brendand@stanford.edu Miroslav Kukla Stanford University mkukla@stanford.edu Abstract An effective

More information

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 2, APRIL 1997 429 Express Letters A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation Jianhua Lu and

More information

STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC)

STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC) STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC) EE 5359-Multimedia Processing Spring 2012 Dr. K.R Rao By: Sumedha Phatak(1000731131) OBJECTIVE A study, implementation and comparison

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Perceptual Grouping from Motion Cues Using Tensor Voting

Perceptual Grouping from Motion Cues Using Tensor Voting Perceptual Grouping from Motion Cues Using Tensor Voting 1. Research Team Project Leader: Graduate Students: Prof. Gérard Medioni, Computer Science Mircea Nicolescu, Changki Min 2. Statement of Project

More information