Novel view image synthesis based on photo-consistent 3D model deformation


Meng-Hsuan Chia, Chien-Hsing He and Huei-Yung Lin*
Department of Electrical Engineering, National Chung Cheng University, Chiayi 621, Taiwan
Fax: 886-5-272-0862
E-mail: silverwolf8291@hotmail.com
E-mail: sonily7811@hotmail.com
E-mail: hylin@ccu.edu.tw
*Corresponding author

Abstract: In this paper, a system is designed to improve the quality of novel view synthesis. The basic idea is to refine the model using the camera information of the novel view so that the synthesised virtual view looks better. The system first builds a visual hull using shape from silhouettes and then refines this 3D model in a view-dependent manner. The 3D points of the model are classified into outline points and non-outline points according to the virtual viewpoint. To refine the model, both types of points are moved iteratively by minimising an energy function until convergence. The key term is the photo-consistency energy, complemented by a smoothness energy and a contour/visual hull energy; the latter two terms help avoid local minima when evaluating the photo-consistency energy. Finally, the novel view image is rendered by view-dependent image synthesis, blending the pixel values from the reference cameras near the virtual camera.

Keywords: 3D model reconstruction; view synthesis; visual hull.

Reference to this paper should be made as follows: Chia, M-H., He, C-H. and Lin, H-Y. (2013) 'Novel view image synthesis based on photo-consistent 3D model deformation', Int. J. Computational Science and Engineering, Vol. 8, No. 4, pp.316-323.

Biographical notes: Meng-Hsuan Chia received his BS and MS in Electrical Engineering from National Chung Cheng University, Taiwan. He has been engaged in research on computer vision and image processing. He is currently with the Institute for Information Industry, Taiwan.

Chien-Hsing He received his BS in Electrical Engineering from National University of Kaohsiung, Taiwan. He is currently an MS student at the Department of Electrical Engineering, National Chung Cheng University, Taiwan. His research interests include computer vision and image processing.

Huei-Yung Lin received his PhD in Electrical and Computer Engineering from the State University of New York at Stony Brook. In 2002, he joined the Department of Electrical Engineering, National Chung Cheng University, Taiwan, as an Assistant Professor, and he is currently an Associate Professor. His research interests include computer vision, robotics, pattern recognition, and image processing.

This paper is a revised and expanded version of a paper entitled 'Novel view image synthesis based on photo-consistent 3D model deformation' presented at the 2011 National Computer Symposium, Taiwan, 2-3 December 2011.

1 Introduction

In the past few decades, three-dimensional (3D) reconstruction has been widely investigated, and the results are used in many applications such as 3D video games, 3D movies, virtual reality, medical imaging, and digital art collections. Many methods (e.g., Furukawa and Ponce, 2010; Delaunoy et al., 2008; Goesele et al., 2006) produce increasingly precise 3D models, but they typically require more than an hour of computation. Since reconstructing a full 3D model takes so long, we approach the problem differently.
In practice, two-dimensional (2D) displays are still what most consumers use. That is, no matter how precise a 3D model is, it must eventually be converted to 2D images for display. Thus, in our approach, we reconstruct objects specifically for novel view image synthesis. Although many previous works have discussed how to reconstruct more precise 3D models, most of them are not specifically designed for novel view synthesis.

Matsuyama et al. (2004) proposed a parallel pipeline processing method for reconstructing a dynamic 3D object shape from multi-view video images. Different from shape from silhouettes algorithms, they introduced a plane-based volume intersection algorithm. The voxel-based representation of the 3D model is then transformed into meshes. Finally, the 3D model is rendered in both view-independent and view-dependent modes: the former selects the cameras used for blending according to the point normal, and the latter performs the novel view synthesis according to the viewing direction. Goesele et al. (2006) proposed a simple approach that reconstructs 3D models using photo-consistency alone. They projected every pixel of every viewpoint into 3D space and searched for the best-matching voxel whose colour is similar in the other viewpoints. This can produce highly accurate 3D points, but some regions cannot be reconstructed because occlusion lowers the similarity of the colour information. Ukita et al. (2008) proposed a free-viewpoint imaging method with sparsely located cameras, aimed at complicated scenes. Unlike conventional stereo vision, their cameras are separated by about 1.5 metres. They found corresponding points between pairs of cameras and used them to create meshes for the images. These warped corresponding meshes are then blended in the novel view images. Since neighbouring meshes can become discontinuous along their edges, they split the incorrect meshes according to photo-consistency to reduce the effect of the discontinuous regions.

Among multi-view 3D model reconstruction techniques, the shape from silhouette algorithm is the most popular one (Laurentini, 1994). It projects the silhouettes of the captured images into 3D space and creates the visual hull from their intersection. This method is fast, but only a rough result can be obtained because no detailed texture information is used. More importantly, the concave regions of the 3D model cannot be recovered from the visual hull. We improve on this drawback using the photo-consistency property among the images. Photo-consistency for 3D model reconstruction was first presented by Seitz and Dyer (2002) and improved by Kutulakos and Seitz (2000). They used voxel-based representations to partition the 3D space into W × H × L voxels, projected each voxel to the visible images, and calculated the colour differences. A voxel is kept as a true surface point if the difference is smaller than a threshold. This approach yields more precise 3D points, but it usually spends too much time on the 3D model refinement. Our method therefore combines the advantages of the above techniques and applies them to image synthesis.

In this paper, we first reconstruct the visual hull and remove the meshes invisible from the novel viewpoint, and then refine the model by a photo-consistency check. Similar to active contours, the 3D model refinement is driven by minimising an energy function. According to the position and the viewing direction of the novel view, 3D points are classified as outline points and non-outline points, and each type has its own energy function terms. Finally, we use view-dependent image synthesis to render a high quality image for the given viewpoint.
2 Model refinement and novel view synthesis

Our system consists of three stages:

1 3D reconstruction
2 model refinement
3 view-dependent image synthesis.

In the model refinement stage, 3D points are classified as outline points and non-outline points according to the virtual viewpoint, and the two types are processed separately. The overall system flowchart is shown in Figure 1.

Figure 1  System flowchart of the proposed model refinement and novel view synthesis technique

2.1 3D reconstruction

To generate the initial 3D model, we first create the visual hull using the shape from silhouettes algorithm. It projects the silhouette of each image into 3D space to form a visual cone, and the visual hull is the intersection volume of all visual cones. Figure 2 shows a cross-section view of the visual hull computation. This method can only reconstruct a rough 3D model, but it requires little computation time. See Lin and Wu (2008) for more details. (A voxel-carving sketch of this step is given after the mesh-generation steps below.)

Figure 2  The cross-section view of a visual hull

We then adopt the power crust algorithm (Amenta et al., 2001; Aurenhammer, 1991; Lee and Schachter, 1980) to generate the meshes connecting every 3D point with its neighbours. It consists of the following steps:

1 Calculate the Voronoi diagram.
2 Determine which vertices of the Voronoi diagram are poles.
3 Calculate the power diagram of the poles.
4 Classify the poles into intrinsic points and extrinsic points.
5 Create a surface of the power diagram that separates the intrinsic and extrinsic points.
6 Connect all intrinsic points using Delaunay triangulation (Lee and Schachter, 1980) to form the mesh model.
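The following is a minimal sketch, in Python with NumPy, of the voxel-based shape-from-silhouettes step described above; it is not the authors' implementation. The `project` helper and the silhouette masks are assumed inputs, and the subsequent mesh generation (power crust) is left out.

```python
import numpy as np

def carve_visual_hull(silhouettes, project, grid_min, grid_max, resolution=64):
    """Carve a voxel visual hull from binary silhouette masks.

    silhouettes: list of HxW boolean masks (True = foreground).
    project: project(points_Nx3, cam_idx) -> Nx2 pixel coordinates (assumed
             helper wrapping each camera's intrinsic/extrinsic parameters).
    """
    # Build a regular voxel grid covering the working volume.
    axes = [np.linspace(grid_min[d], grid_max[d], resolution) for d in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing='ij')
    voxels = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=1)

    inside = np.ones(len(voxels), dtype=bool)
    for cam_idx, mask in enumerate(silhouettes):
        h, w = mask.shape
        uv = np.round(project(voxels, cam_idx)).astype(int)
        u, v = uv[:, 0], uv[:, 1]
        visible = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        in_silhouette = np.zeros(len(voxels), dtype=bool)
        in_silhouette[visible] = mask[v[visible], u[visible]]
        # A voxel survives only if it projects inside every silhouette,
        # i.e., it lies in the intersection of all visual cones.
        inside &= in_silhouette

    return voxels[inside]
```

The surviving voxel centres would then be meshed (power crust in the paper) to obtain the initial model.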

The resulting 3D model is fully covered with meshes generated by the triangulation. A surface normal vector can therefore be computed as the cross product of any two edge vectors of a mesh. The 3D mesh model also allows us to detect self-occluded areas when synthesising the novel view images, and finding the neighbours of a point is efficient because the connectivity information has already been created; there is no need to search over all of the points.

Since meshes invisible from the given virtual viewpoint are not required for novel view synthesis, they are removed. Our invisible mesh removal algorithm consists of two stages:

1 For each mesh, a triangular viewing cone is generated by the centre of projection P_c and the triangle P_1 P_2 P_3, as shown in Figure 3. It is used to determine whether a 3D point, say P, is invisible. If the point lies inside this infinite triangular cone, it is invisible because of the occlusion introduced by the mesh P_1 P_2 P_3.
2 After the invisibility test has been carried out for all points, the visibility of each mesh can be determined: a mesh is invisible only if all three of its vertices are invisible from the centre of projection.

Figure 3  Triangular viewing cone and invisible point

2.2 Refinement of outline points

When the 3D model is projected onto the novel view image plane, its contour is relatively easy to identify, so we refine the outline points first. The outline points are the 3D points whose projections lie on the contour of the model's silhouette in the novel view image plane. Our refinement moves the outline points progressively to obtain better image quality. The flowchart of the outline point refinement is shown in Figure 4.

Figure 4  Flowchart of outline point refinement

In the second step, we define the moving direction before each iterative movement. We project the outline point onto the novel view image and extract a 5 × 5 window around the projection, as illustrated in Figure 5. The curve indicates the contour of the 3D model silhouette and P_c is the projection of the outline point. In the binary 5 × 5 window, the white region is the visible 3D model silhouette and the black region is the background. We then find the centre of gravity of the background, P_g. The 2D vector V_p is the normalised vector from P_c to P_g in the image plane. However, V_p alone is not enough, because the outline point lives in 3D space, so we transform V_p from image coordinates to world coordinates using the intrinsic and extrinsic camera parameters of the novel view. The resulting 3D vector is the final moving direction.

Figure 5  Moving direction
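As a concrete illustration of the moving-direction computation just described, the sketch below (Python/NumPy, hypothetical helper names, not the authors' code) extracts the 5 × 5 binary window around the projected outline point, takes the centroid of the background pixels, and lifts the normalised 2D direction to a 3D world direction; the way the image-plane direction is lifted (via K and the rotation R) is an assumption of this sketch.

```python
import numpy as np

def outline_moving_direction(silhouette, p_c, K, R):
    """Estimate the 3D moving direction for one outline point.

    silhouette: HxW boolean novel-view silhouette of the current model.
    p_c:        (u, v) projection of the outline point in pixels.
    K, R:       novel-view intrinsic matrix and rotation (world -> camera).
    Returns a unit 3D direction in world coordinates.
    """
    u, v = int(round(p_c[0])), int(round(p_c[1]))
    window = silhouette[v - 2:v + 3, u - 2:u + 3]            # 5x5 binary window

    # Centre of gravity of the background (False) pixels, relative to the centre.
    bg_v, bg_u = np.nonzero(~window)
    if len(bg_u) == 0:                                       # fully inside: no cue
        return np.zeros(3)
    p_g = np.array([bg_u.mean() - 2.0, bg_v.mean() - 2.0])

    v_p = p_g / (np.linalg.norm(p_g) + 1e-12)                # normalised 2D direction

    # Lift the image-plane direction to world coordinates (one simple choice:
    # treat it as a direction in the image plane and rotate it back).
    d_cam = np.linalg.inv(K) @ np.array([v_p[0], v_p[1], 0.0])
    d_world = R.T @ d_cam
    return d_world / (np.linalg.norm(d_world) + 1e-12)
```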

We construct an energy function, and the final position of an outline point is the one with minimum energy. The energy function is defined as

    E = E_s + \alpha E_c + \beta E_r    (1)

where E_s is the photo-consistency energy, i.e., the colour difference between the projections of the outline point in different reference images; E_c is the contour energy, the distance between the model contour in a neighbouring image and the projection of the outline point; and E_r is a smoothness energy which acts as an attraction from the neighbouring 3D points towards the outline point. Each energy term is discussed in more detail below. The weights α and β are user-defined parameters.

2.2.1 Photo-consistency energy

Photo-consistency plays an important role in stereo vision, where it is used to recover 3D coordinates from corresponding points in a stereo image pair. In our system, it is used to check the consistency of the colour information across the multi-view images. We choose a key image, the one closest to the virtual viewpoint, and several secondary images which are the neighbours of the key image. The outline point is projected onto the key image and the secondary images. Taking these projection points as centres, a key window and secondary windows are selected, and we compute the sum of squared differences (SSD) between the key window and each secondary window. The photo-consistency energy is the average of the SSD values:

    E_s = \frac{\sum_{k=0}^{n_{sub}-1} w_k}{n_{sub}}    (2)

where n_sub is the number of secondary images and w_k is the SSD value for the k-th secondary image.

Since the projections of the outline points are always near the contour, the key window or a secondary window may include background pixels that should not contribute. We therefore create a binary foreground mask M_f: a mask pixel is set to 0 if the corresponding pixel is background in the key image or in any secondary image, and to 1 otherwise, so that no colour difference is accumulated over background pixels. The masked SSD is written as

    w_k = \frac{\sum_{x,y} [I_{key}(x,y) - I_{sub}(x,y)]^2 \, M_f(x,y)}{\sum_{x,y} M_f(x,y)}    (3)

2.2.2 Contour energy

The outline points should lie near the model contour in the neighbouring view images, unless the angle between the novel view and its neighbours is too large. Hence, we use the image contour as an attraction energy. The outline point is projected onto each image of the neighbouring viewpoints I_k (k = 0, 1, ..., n_near - 1); we find the closest contour pixel and compute the distance d_k. The contour energy is the average of all d_k:

    E_c = \frac{\sum_{k=0}^{n_{near}-1} d_k}{\sigma \, n_{near}}    (4)

where n_near is the number of neighbouring cameras and σ is used to normalise the range of the d_k values.

Each contour distance image can be computed in advance to reduce the heavy calculation time. Figure 6 illustrates the distance transform (Borgefors, 1986): (a) the input contour image, (b) the distance measurement mask, defined in Euclidean space, and (c) the resulting distance image, in which brighter pixels indicate closer points and vice versa.

Figure 6  Distance transform, (a) contour image (b) distance measurement mask (c) distance transform result
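The sketch below shows how equations (1), (2) and (4) could be evaluated for one candidate outline-point position (Python/NumPy, assumed `project` helper and parameter values, not the authors' code). It uses SciPy's Euclidean distance transform in place of the chamfer mask of Borgefors (1986), and omits the smoothness term E_r for brevity.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def masked_ssd(key_win, sub_win, fg_mask):
    """Masked SSD between two colour windows (eq. 3), foreground pixels only."""
    diff2 = ((key_win.astype(float) - sub_win.astype(float)) ** 2).sum(axis=-1)
    denom = fg_mask.sum()
    return (diff2 * fg_mask).sum() / denom if denom > 0 else np.inf

def outline_energy(p3d, key_view, sub_views, near_views, project,
                   alpha=0.2, win=5, sigma=20.0):
    """Energy of one candidate outline-point position (eqs. 1, 2, 4).

    Each view is a dict with 'image' (HxWx3), 'fg' (HxW bool) and, for the
    neighbouring views, a precomputed 'contour_dist' distance-transform image.
    project(p3d, view) -> (u, v) is an assumed camera-projection helper.
    """
    r = win // 2

    def window(view, uv):
        u, v = int(round(uv[0])), int(round(uv[1]))
        return (view['image'][v - r:v + r + 1, u - r:u + r + 1],
                view['fg'][v - r:v + r + 1, u - r:u + r + 1])

    key_win, key_fg = window(key_view, project(p3d, key_view))

    # E_s: average masked SSD against the secondary windows (eq. 2).
    ssds = []
    for sv in sub_views:
        sub_win, sub_fg = window(sv, project(p3d, sv))
        ssds.append(masked_ssd(key_win, sub_win, key_fg & sub_fg))
    E_s = np.mean(ssds)

    # E_c: average distance to the nearest contour in the neighbouring views (eq. 4).
    dists = []
    for nv in near_views:
        u, v = np.round(project(p3d, nv)).astype(int)
        dists.append(nv['contour_dist'][v, u])
    E_c = np.mean(dists) / sigma

    return E_s + alpha * E_c          # + beta * E_r in the full energy (eq. 1)

# A neighbouring view's distance image can be precomputed once, for example:
# view['contour_dist'] = distance_transform_edt(~contour_mask)
```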

2.2.3 Smoothness energy

So far we have used the contour and colour information of the 3D model. However, the 3D model can be regarded as a deformable model, because its shape changes as the 3D points move. The surfaces of a deformable model behave like membranes that pull on each other, which both smooths the surface and reduces the chance of falling into a local minimum of the photo-consistency energy; this is the role of the smoothness term. Local minima also occur under occlusion, reflection, or other unexpected situations, and preventing them avoids abrupt colour changes in the synthesised image.

Like active contours, the outline points pull on each other (non-outline points are excluded) according to tension and curvature. Because the smoothness energy should be normalised, angles are used instead of lengths. Figure 7 illustrates how the smoothness energy of an outline point is calculated. V_i (for i = 0, 1, ..., n_r - 1, where n_r is the number of neighbouring outline points in the novel view) is the vector from the outline point to its i-th neighbouring outline point, and V_m is the normalised moving direction vector defined previously. The energy is the average over all neighbours of the absolute dot product of the normalised vectors:

    E_r = \frac{\sum_{i=0}^{n_r-1} |\hat{V}_m \cdot \hat{V}_i|}{n_r}    (5)

where \hat{V}_m and \hat{V}_i are the normalised V_m and V_i, respectively. We take the absolute value because the smoothness energy should be large whenever a protrusion or concavity occurs, and smallest when the neighbour vectors are perpendicular to the moving direction.

Figure 7  Smoothness of an outline point

2.3 Refinement of non-outline points

Our system uses the visual hull as the initial model, but the concavities of the 3D model cannot be reconstructed from the visual hull. To solve this problem, following Starck and Hilton (2005), we use the photo-consistency energy E_s to find the correct position of a 3D point, and add a visual hull energy E_vh and a smoothness energy E_r to avoid local minima. The energy function for moving a non-outline point is given by

    E = \beta E_s + (1 - \beta) E_{vh} + \alpha E_r    (6)

where the weight β reflects the reliability of the photo-consistency and the weight α is assigned by the user. The procedure is similar to Section 2.2, but the details differ for the non-outline points.

Some 3D model refinement algorithms based on model deformation move the 3D points along the point normal or the mesh normal. Problems arise, however, if a point passes through another mesh: the meshes cross each other and cause self-occlusion, which generally introduces more regions of discontinuous colour in the synthesised image. Since the viewing direction of the novel view is available for image synthesis, we instead let each non-outline point move along the line passing through the point and the centre of the novel view. This prevents the mesh-crossing problem and improves the quality of image synthesis at the same time. In addition, we restrict the moving direction to the direction from the centre of the novel view towards the non-outline point, so that concavities can be reconstructed.

2.3.1 Photo-consistency energy

First, we choose the key image according to the normal vector of the non-outline point, and its neighbours are assigned as secondary images. To reduce the effect of calibration error, a search range is set in the secondary images. Each pixel in the range is taken as the centre of a candidate window, and we compute the SSD between each candidate window and the key window in the key image. The smallest SSD in each secondary image is denoted w_k (k = 0, 1, ..., n_sub - 1, where n_sub is the number of secondary images) and its window centre is P_k. We then find the 3D point P_sub,k at which the ray through P_key and the ray through each P_k intersect, using triangulation. The photo-consistency energy is the weighted average of the distances between the non-outline point P and the points P_sub,k:

    E_s = \frac{\sum_{k=0}^{n_{sub}-1} w_k \, \|P_{sub,k} - P\|}{n_{sub}}    (7)

and β expresses the reliability of this energy:

    \beta = \frac{\sum_{k=0}^{n_{sub}-1} w_k}{n_{sub}}    (8)

2.3.2 Visual hull energy

The visual hull is reconstructed from the contours of the view images, so it can also serve as an energy when moving the non-outline points. In particular, in situations such as self-occlusion, β is small and the weight of the visual hull energy becomes higher instead. The energy is the distance between the non-outline point P and its closest point P_vh on the visual hull:

    E_{vh} = \|P - P_{vh}\|    (9)
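To make the non-outline refinement concrete, the sketch below (Python/NumPy) moves one point along the novel-view ray and keeps the position minimising the energy of equation (6). The callback names, the discrete line search, and the step parameters are assumptions of this sketch, not the authors' implementation; the smoothness callback corresponds to equation (10) in the next subsection.

```python
import numpy as np

def refine_non_outline_point(P, novel_center, energy_photo, dist_to_visual_hull,
                             energy_smooth, alpha=0.1, step=0.01, n_steps=20):
    """Move a non-outline point along the ray from the novel-view centre (eq. 6).

    energy_photo(Q)        -> (E_s, beta)  photo-consistency energy and reliability
    dist_to_visual_hull(Q) -> E_vh         distance to the closest visual-hull point
    energy_smooth(Q)       -> E_r          smoothness term (eq. 10)
    All three are assumed callbacks; a simple discrete search over candidate
    positions stands in for the paper's iterative minimisation.
    """
    # Restricted moving direction: from the novel-view centre towards the point.
    direction = P - novel_center
    direction = direction / np.linalg.norm(direction)

    best_Q, best_E = P, np.inf
    for i in range(0, n_steps + 1):
        Q = P + i * step * direction            # candidate position on the line L
        E_s, beta = energy_photo(Q)
        E = beta * E_s + (1.0 - beta) * dist_to_visual_hull(Q) + alpha * energy_smooth(Q)
        if E < best_E:
            best_Q, best_E = Q, E
    return best_Q
```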

2.3.3 Smoothness energy

As mentioned in Section 2.2.3, we also adopt a smoothness energy to avoid local minima. In general, the smoothness term would be built from the vectors from P to its neighbouring points P_r,i (i = 0, 1, ..., n_r - 1). In this work, however, the moving direction is restricted to the line L passing through P and the novel-view centre O_n (Figure 8), so the pulling force between P and P_r,i should have the same property: each vector from P to P_r,i is projected onto L. The smoothness energy is therefore

    E_r = \frac{\sum_{i=0}^{n_r-1} \overrightarrow{P P_{r,i}} \cdot \hat{V}}{n_r}    (10)

where \hat{V} is a unit vector along L.

Figure 8  Pulling force illustration

3 View-dependent image synthesis

Assuming the novel view camera parameters are known, every pixel of this camera is fixed in the world coordinate system. As shown in Figure 9, similar to ray tracing, the ray passing through the camera centre and a pixel P_ij must intersect the 3D model. There may be one or more intersection points, but the closest one is the point V seen by pixel P_ij. We project V to the reference cameras, capture the colour information, and set the colour of P_ij to the sum of these colour values weighted by W_c, where W_c is the dot product of the novel viewing direction and the reference viewing direction.

Figure 9  View-dependent image synthesis

The above process is generally very slow, because every ray must be checked against every mesh for an intersection. In this work, we create a mesh lookup table to find the closest intersection point quickly. Since the 3D model is available, the depth map of the novel view can easily be derived. Each mesh is projected to the novel view, and its index is written into the table wherever the depth of the mesh equals the depth map. With the mesh lookup table (Figure 10), the intersection points are found easily.

Figure 10  Mesh lookup table

As mentioned in Verlani et al. (2006), only two to three neighbouring cameras are needed when blending the images, because the possibility of self-occlusion rises when a camera is far away and the synthesised image becomes more blurred. Hence, we raise W_c to the power of m.
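The following is a hedged sketch (Python/NumPy) of the view-dependent colour blending described above. The intersection point is assumed to come from the mesh lookup table, the per-camera sampling helper is an assumption, and the exponent m, the number of blended cameras, and the normalisation of the weighted sum are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

def blend_pixel(ray_dir, surface_point, ref_cams, m=4, n_use=3):
    """View-dependent colour for one novel-view pixel.

    ray_dir:       unit viewing direction of the novel-view pixel.
    surface_point: closest model intersection V (e.g., from the mesh lookup table).
    ref_cams:      list of dicts with 'dir' (unit viewing direction) and
                   'sample(P) -> RGB or None', returning the colour of P's
                   projection if visible in that reference view (assumed helper).
    m, n_use:      weighting exponent and number of reference cameras to blend.
    """
    weights, colours = [], []
    # Use only the reference cameras whose viewing direction is closest to the ray.
    for cam in sorted(ref_cams, key=lambda c: -np.dot(ray_dir, c['dir']))[:n_use]:
        colour = cam['sample'](surface_point)
        if colour is None:                       # occluded in this reference view
            continue
        # W_c: dot product of viewing directions, raised to the power m.
        w = max(np.dot(ray_dir, cam['dir']), 0.0) ** m
        weights.append(w)
        colours.append(np.asarray(colour, dtype=float))

    if not weights:
        return np.zeros(3)
    weights = np.array(weights)
    # Normalised weighted sum (normalisation is an assumption of this sketch).
    return (weights[:, None] * np.array(colours)).sum(axis=0) / weights.sum()
```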

4 Experimental results

We use a dinosaur dataset for our first experiment. The dinosaur model is made of plaster with no texture or colour, but there is some lighting shading on the model when the images are acquired. A CCD camera is used to capture images from different viewpoints, and 12 of them are chosen as the system input. The image resolution is 640 × 480. A calibration board was also captured from these 12 viewpoints, so the intrinsic and extrinsic parameters of the input cameras can be obtained using Jean-Yves Bouguet's MATLAB toolbox. The novel view camera is set between camera 1 and camera 2, as shown in Figure 11. Note that the novel view image in this figure is the image from the dataset used for verification, not an input or a synthesis result. In this case, we set α and β to 0.2 for the outline point refinement and α to 0.1 for the non-outline point refinement.

Figure 11  Novel view camera and reference cameras

We first show the image synthesis results at each stage. As shown in Figure 12, (d) is the ground truth. The 3D model in (a) is the visual hull, and it can clearly be seen that a piece of the dorsal fin is cut off by the shape from silhouettes. The lost parts are recovered after iteratively moving the outline points, as shown in (b). On the other hand, the leg becomes thinner, because the visual hull in (a) has low sensitivity to concavities. We therefore move the non-outline points to solve this problem, as shown in (c), which is the final result of our system.

Figure 12  Image synthesis at each stage, (a) visual hull (b) after refining the outline points (c) after refining the non-outline points (d) ground truth

We then evaluate the image colour error after every refinement of the outline points and of the non-outline points. We calculate the root mean square (RMS) error of the difference between the result and the ground truth over the foreground. The evaluation is shown in Tables 1 and 2: the RMS errors decrease after every refinement in both stages. Note that the RMS error increases slightly after the 7th refinement in Table 1, because the smoothness energy becomes large enough to pull other points away from their exact positions.

Table 1  RMS error for every outline point refinement of the dinosaur object

  Iteration   1      2      3      4      5      6      7
  RMS error   33.57  31.98  31.27  30.92  30.57  30.24  30.26

Table 2  RMS error for every non-outline point refinement of the dinosaur object

  Iteration   1      2      3      4      5
  RMS error   29.12  28.97  28.90  28.90  28.82

In this experiment, the total execution time is 298 seconds on a notebook computer with an AMD Turion 1.8 GHz processor and 2 GB RAM. Reconstructing a 3D model from the same dataset using Goesele's approach (Goesele et al., 2006) takes about 12 hours and 24 minutes. The major difference between our technique and algorithms for accurate 3D model reconstruction is that only high quality novel view images are produced, not necessarily a precise 3D model.
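For reference, the foreground RMS colour error used in Tables 1 to 4 can be computed as in the short sketch below (Python/NumPy); averaging over the colour channels is an assumption of this sketch.

```python
import numpy as np

def foreground_rms_error(synth, truth, fg_mask):
    """RMS colour error between a synthesised view and the ground truth,
    evaluated over foreground pixels only.

    synth, truth: HxWx3 images; fg_mask: HxW boolean foreground mask.
    """
    diff = synth.astype(float)[fg_mask] - truth.astype(float)[fg_mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```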

We also perform the experiment with the 3D bunny model. This model was created without texture information, so we map a wood texture onto it using the VTK (Visualization Toolkit) library. Virtual cameras are used to capture images as the input of our algorithm. In this experiment, we set 12 cameras around the 3D model with 30 degrees of separation between two consecutive views. The parameters α and β are set to 0.3 for the outline point refinement, and α is set to 0.3 for the non-outline point refinement. The novel view camera is placed between camera 0 and camera 1, aiming at the origin of the world coordinate system, as shown in Figure 13. Note that this novel view image is only used for comparison with the final result; it is not an input.

Figure 13  Novel view camera and reference cameras

Figures 14(a) to 14(c) show the results at each stage, and Figure 14(d) is the ground-truth image. We can see that the ear of the bunny is recovered after the refinement of the outline points, as shown in Figure 14(b), and the concavity between the front and rear foot becomes more clear-cut, as shown in Figure 14(c). The RMS errors calculated for the bunny object are shown in Tables 3 and 4. The RMS errors decrease with the refinement of the outline points. Note that the RMS error is relatively large at the beginning of the non-outline point refinement, mainly because of the textureless wood skin. The total execution time for this object is about 271 seconds.

Figure 14  Image synthesis at each stage, (a) visual hull (b) after refining the outline points (c) after refining the non-outline points (d) ground truth

Table 3  RMS error for every outline point refinement of the bunny object

  Iteration   1      2      3      4      5
  RMS error   14.33  12.79  12.86  12.81  12.81

Table 4  RMS error for every non-outline point refinement of the bunny object

  Iteration   1      2      3      4      5
  RMS error   13.51  13.23  13.13  13.06  13.04

5 Conclusions

This paper introduces a novel view image synthesis system based on moving two different types of 3D points. Using the novel view direction together with photo-consistency, image contours, and surface smoothness, we refine the 3D model and improve the image quality. The contributions of this work include the following:

1 We introduce the concept of using 3D model refinement to improve the quality of image synthesis.
2 The 3D points are classified as outline points or non-outline points with respect to the novel view; the former correct the shape of the model and the latter improve the colour accuracy of the synthesised image.
3 We design suitable energy functions for the two types of 3D points, consisting of photo-consistency, contour/visual hull, and surface smoothness terms.

References

Amenta, N., Choi, S. and Kolluri, R.K. (2001) 'The power crust', in Proceedings of the Sixth ACM Symposium on Solid Modeling and Applications (SMA '01), ACM, New York, NY, USA, pp.249-266.

Aurenhammer, F. (1991) 'Voronoi diagrams: a survey of a fundamental geometric data structure', ACM Computing Surveys, Vol. 23, No. 3, pp.345-405.

Borgefors, G. (1986) 'Distance transformations in digital images', Computer Vision, Graphics and Image Processing, Vol. 34, pp.344-371.

Delaunoy, A., Prados, E., Gargallo, P., Pons, J-P. and Sturm, P. (2008) 'Minimizing the multi-view stereo reprojection error for triangular surface meshes', in 19th British Machine Vision Conference (BMVC 2008), September.

Furukawa, Y. and Ponce, J. (2010) 'Accurate, dense, and robust multiview stereopsis', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 8, pp.1362-1376.

Goesele, M., Curless, B. and Seitz, S. (2006) 'Multi-view stereo revisited', in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp.2402-2409.

Kutulakos, K. and Seitz, S. (2000) 'A theory of shape by space carving', International Journal of Computer Vision, Vol. 38, No. 3, pp.199-218.

Laurentini, A. (1994) 'The visual hull concept for silhouette-based image understanding', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 2, pp.150-162.

Lee, D.T. and Schachter, B.J. (1980) 'Two algorithms for constructing a Delaunay triangulation', International Journal of Parallel Programming, Vol. 9, No. 3, pp.219-242.

Lin, H. and Wu, J. (2008) '3D reconstruction by combining shape from silhouette with stereo', in International Conference on Pattern Recognition.

Matsuyama, T., Wu, X., Takai, T. and Wada, T. (2004) 'Real-time dynamic 3-D video', IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, No. 3, p.357.

Seitz, S. and Dyer, C. (2002) 'Photorealistic scene reconstruction by voxel coloring', US Patent 6,363,170, 26 March.

Starck, J. and Hilton, A. (2005) 'Virtual view synthesis of people from multiple view video sequences', Graphical Models, Vol. 67, No. 6, pp.600-620.

Ukita, N., Kawate, S. and Kidode, M. (2008) 'IBR-based free-viewpoint imaging of a complex scene using few cameras', in 19th International Conference on Pattern Recognition (ICPR 2008), pp.1-4.

Verlani, P., Goswami, A., Narayanan, P., Dwivedi, S. and Penta, S. (2006) 'Depth images: representations and real-time rendering', in Third International Symposium on 3D Data Processing, Visualization, and Transmission, pp.962-969.