Image-Based Deformation of Objects in Real Scenes

Han-Vit Chung and In-Kwon Lee
Dept. of Computer Science, Yonsei University
sharpguy@cs.yonsei.ac.kr, iklee@yonsei.ac.kr

Abstract. We present a new method for deforming an object in a static real scene so that it interacts with animated synthetic characters. Unlike the existing approach of building a new synthetic object to substitute for the interacting object in the real scene, we deform the object directly in image space using an image-warping technique, assisted by a simplified 3D sub-model. The deformed image sequence is then processed further with the Expression Ratio Image (ERI) technique to apply the illumination changes produced by the deformation. This method efficiently maintains the photo-realistic rendering quality of the scene by preserving the appearance of the object as it is in the real scene.

1 Introduction

Rendering synthetic objects into real scenes is a popular technique in many applications such as film production and augmented reality [1]. To render an existing object in a real scene (we call this object an image-object in this paper) being deformed by interaction with a synthetic character, the image-object would normally have to be modelled and animated as another synthetic object (we call this method synthetic-copy in this paper). The simulated interaction between the character and the image-object can then be rendered with the original textures and lighting conditions extracted from the real scene, and a final animation created by careful composition of the real and synthetic scenes. However, when an image-object is part of a complex environment, it is strongly coupled with the other objects in that environment, and separating it from the original scene is not easy.
If the deformations of the object generated by the interactions with a synthetic character are not too large, the animator may prefer an alternative to synthetic-copy. For example, if an image shows a bed with very complex shapes and textures, and a synthetic ball is thrown onto the bedspread, we may look for a more efficient approach than modelling and animating the deformations of the bedspread synthetically. In film production, interacting real objects are often deformed with the aid of a physical prop such as a transparent glass stick; before the final composition of the real scene and the synthetic characters, the glass stick is erased from the footage frame by frame.
In this paper, we suggest a new method for handling the interactions between a synthetic character and an image-object in a static real scene. A simplified 3D sub-model representing the image-object is used in the simulation of the interactions with the synthetic character. The geometric change of the 3D sub-model produced by the simulation drives the deformation of the original image-object via warping techniques in image space. To reflect the illumination change caused by the deformation, we apply the Expression Ratio Image (ERI) technique to the deformed image. Because our process only needs the change of silhouette and the ratio of illumination, a simplified white 3D sub-model suffices for the simulation. In this way, we can maintain the photo-realistic rendering quality of the image-object by preserving the textures and lighting conditions of the original real scene more effectively than the synthetic-copy method.

Our method has some assumptions and restrictions. First, we consider only a single static real scene as the background image; the position and orientation of the camera are fixed throughout the animation. Second, we assume the positions and directions of the camera and lights in the real scene are already known. This is feasible because those parameters can be recorded when the real scene is newly photographed for the animation; if they are not available, conventional methods (see next section) can estimate the camera and light parameters. Third, we assume the deformation of the image-object is not too large, so special cases such as cloth folding do not arise in the simulation.

2 Related Work

Various image-based lighting techniques [2] have been developed for recovering reflectance properties of real surfaces under unknown illumination. Liu et al. [3] proposed a method of capturing the illumination change of one person's expression.
They map the ERI to different faces to generate more expressive facial expressions. There have been many research projects on recovering camera attributes from several images, but extracting camera parameters from a single image remains a challenge [4]. Much effort has also gone into image deformation. Following Gomes et al. [5], the numerous warping methods can be roughly classified into parameter-based [6], free-form [7, 8], and feature-based [9, 10] techniques. There is also a large body of work on generating animations from 2D pictures or photographs: image-based rendering methods synthesize new images from arbitrary viewpoints using many 2D input images [11-15], or create 3D graphical models and animations from a single image [16-18].

3 Image-Based Deformation

3.1 3D Sub-Model

To deform an image-object according to a given scenario, we need appropriate 3D deformation data. In our method, a 3D sub-model (see Fig. 1) that corresponds to the image-object is used in the simulation of interaction. This model can
be constructed using image-based modelling methods [16, 17] combined with a known camera position. The shape of a 3D sub-model should be similar enough to that of the original image-object, but some simplifications may be made:

- The back faces hidden from the fixed camera view can be omitted.
- Shapes that are not deformed by the interaction can be omitted. For example, we can use a sub-model of the bedspread only, excluding the other parts of the bed, in the ball-on-the-bed example of Section 1.
- Complex shapes that have no serious effect on the change of silhouette in the 2D view of the model can be ignored.

The colors and textures of the 3D sub-model need not be considered, because the sub-model is never rendered into the final output. We used plain white for all sub-models in our experimental examples, which is also helpful in measuring the illumination change in the ERI process.

Fig. 1. The top left is an image-object. The rest are frames in the animation of a 3D sub-model corresponding to the change of geometry and illumination of the image-object.

Fig. 2. Geometric Change: The leftmost image shows the 2D grid on the image-object. According to the deformations of the 3D sub-model (middle white models), the image-object is geometrically deformed (right).
Fig. 3. (a) Grid G_M for a 3D sub-model. (b) Vertices in G_M are aligned to the feature points. (c) The changed grid G_M in the middle of the simulation. (d) Initial static image. (e) The aligned grid G_I, identical to G_M. (f) Warped image according to the grid change in (c).

3.2 Geometric Change

Two identical 2D grids, G_I and G_M, are generated to cover the image-object and the initial projected view of the 3D sub-model, respectively (see Figs. 2 and 3). Feature points are defined on the deformable surface of the 3D sub-model. Initially, G_M is aligned to the feature points by moving each grid vertex to the projected 2D coordinates of its closest feature point (see Fig. 3(b)). When the surface is deformed and the feature points move, the corresponding grid vertices in G_M move automatically to the changed projected coordinates of the feature points (see Fig. 3(c)). For each frame, the grid G_I is deformed to be identical to G_M, and the image-object is warped with a conventional two-pass warping method [7, 8] using the deformed grid G_I (see Fig. 3(f)).

We need to control the resolution of the feature-point set and the grid so that each feature point corresponds exclusively to a single grid vertex. At the same time, it is desirable to keep the grid resolution as low as possible, because unbound grid vertices that correspond to no feature point cannot move during the deformation, which prevents an exact deformation of the image-object.

Sometimes the background pixels around the image-object must be warped together with the pixels of the object itself. In the waving-cloth example of Fig. 2, for instance, background pixels near the boundary of the cloth are alternately revealed and hidden during the animation. In this case, we make the grid cover a slightly larger region that includes some boundary pixels as well as the object.
The warping process then generates a natural deformation of the background pixels as well, which avoids having to clip the image-object out of the background scene and re-composite it frame by frame. We assume the deformation of the object is always local enough that degenerate cases, such as self-intersections among the grid edges, do not arise during the simulation. Because of this restriction, we cannot handle scenes with large deformations, such as a flag flapping in a strong wind; if such a large deformation is needed, the synthetic-copy method must be used for the image-object.
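To make the grid-driven warp concrete, the following is a minimal numpy sketch of our own (the function names are illustrative, not from the paper): grid vertices are snapped to the projected feature points, the per-vertex displacements are bilinearly upsampled to a per-pixel field, and the image is backward-sampled. A production implementation would use the two-pass mesh-warping method of [7, 8] with proper filtering; this nearest-neighbor version only illustrates the data flow for small deformations.

```python
import numpy as np

def align_grid(grid, features):
    """Snap each grid vertex to the projected 2D position of its
    nearest feature point (the binding step of Fig. 3(b))."""
    aligned = grid.copy()
    for k, v in enumerate(grid):
        aligned[k] = features[np.argmin(np.linalg.norm(features - v, axis=1))]
    return aligned

def upsample_disp(grid_disp, h, w):
    """Bilinearly upsample a (gh, gw, 2) grid displacement field
    to a per-pixel (h, w, 2) field."""
    gh, gw, _ = grid_disp.shape
    gy = np.linspace(0, gh - 1, h)
    gx = np.linspace(0, gw - 1, w)
    y0 = np.minimum(gy.astype(int), gh - 2)
    x0 = np.minimum(gx.astype(int), gw - 2)
    fy = (gy - y0)[:, None, None]
    fx = (gx - x0)[None, :, None]
    top = grid_disp[y0][:, x0] * (1 - fx) + grid_disp[y0][:, x0 + 1] * fx
    bot = grid_disp[y0 + 1][:, x0] * (1 - fx) + grid_disp[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def warp_backward(img, grid_disp):
    """Warp a grayscale image so its content follows the grid vertex
    motion (dy, dx per vertex). Assumes small, local deformations."""
    h, w = img.shape
    disp = upsample_disp(grid_disp, h, w)
    yy, xx = np.mgrid[0:h, 0:w]
    sy = np.clip(np.round(yy - disp[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(xx - disp[..., 1]).astype(int), 0, w - 1)
    return img[sy, sx]
```

For a uniform vertex displacement this reduces to a pure translation, which makes the behaviour easy to verify before moving to real feature-point data.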
3.3 Illumination Change

Fig. 4. Applying Illumination Change: The image sequence in the first row shows the geometrically warped images. Applying the change of illumination gives the result shown in the second row, which looks more expressive and convincing.

As Fig. 4 shows, geometric warping alone cannot produce a convincing deformation result: the first row of the figure lacks details such as wrinkles and illumination changes. Using the known light positions, we construct virtual light sources to illuminate the 3D sub-model in the simulation process. The intensities and other properties of the lights are manually tuned to match the real scene as closely as possible. Because we only measure the ratio of illumination change, this rough approximation of the light properties is acceptable in most cases. The rendered white 3D sub-model shows the illumination changes, which can be applied to the image-object using the ERI technique [3].

Assume there are m light sources illuminating some point p on a surface. Let n denote the normal vector of the surface at p, and let l_i and I_i, 1 <= i <= m, denote the light direction from p to the i-th light source and the intensity of that light source, respectively. Suppose the surface is diffuse, and let \rho be its reflectance coefficient at p. The ERI R is defined as the ratio of light intensity at the same point p between two arbitrarily lit images:

    R = I'/I = (\rho \sum_{i=1}^{m} I_i \, n' \cdot l_i') / (\rho \sum_{i=1}^{m} I_i \, n \cdot l_i).    (1)

From the above equation, we have I' = R I for a point p on the surface. To apply the illumination change computed by ERI to the image-object at the i-th frame, we need the following images:

- A_0: rendered image of the 3D sub-model at the initial frame.
- A_i: rendered image of the deformed 3D sub-model at the i-th frame.
- A'_0: warped image generated by warping A_0 to have the same geometry as A_i.
- B_i: the geometrically deformed image-object at the i-th frame.

As proven in [3], we can compute the ERI R by comparing the pixel values at the same position:

    R(u, v) = A_i(u, v) / A'_0(u, v),    (2)

where (u, v) are the coordinates of a pixel in image space. We then apply the ERI to compute the new intensity of the image-object at pixel (u, v):

    B'_i(u, v) = R(u, v) B_i(u, v).    (3)

Note that the algorithm requires a warp from A_0 to A'_0, because the ERI must be evaluated at the same pixel position in the two images A'_0 and A_i. After computing the sequence of background scenes, with the image-objects deformed appropriately, the final animation is produced by compositing the background scenes with the animation of the synthetic characters using conventional composition techniques [19].

Fig. 5. Example 1: (a) The original, fixed real scene. (b) Two synthetic objects: a fly and a ball. (c) Composition result with proper deformations of the laundry object interacting with the synthetic ball.

Fig. 6. Example 2: (a) The original fixed real scene. (b) A synthetic electric fan. (c) The animation resulting from our image-based deformation method. (d) The animation resulting from modelling and animating a synthetic cloth model; the cloth is rendered separately using texture mapping.
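Equations (2) and (3) amount to a per-pixel divide followed by a multiply. The following is a minimal numpy sketch of our own (the function name is illustrative), assuming grayscale images normalized to [0, 1] and a small epsilon guarding against division by near-black pixels of the rendered sub-model:

```python
import numpy as np

def apply_eri(a0_warped, a_i, b_i, eps=1e-3):
    """Apply the Expression Ratio Image to a warped image-object.
    a0_warped : sub-model render at frame 0, warped to frame i's geometry (A'_0)
    a_i       : sub-model render at frame i under the same lights (A_i)
    b_i       : geometrically warped image-object at frame i (B_i)
    Returns B'_i = R * B_i with R = A_i / A'_0, per Eqs. (2)-(3),
    clipped back to the displayable range."""
    ratio = a_i / np.maximum(a0_warped, eps)  # Eq. (2), guarded
    return np.clip(ratio * b_i, 0.0, 1.0)     # Eq. (3)
```

Darkened sub-model pixels (e.g. a shadowed wrinkle) darken the corresponding image-object pixels by the same ratio; brightened pixels are scaled up and clipped if they exceed the displayable range.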
4 Results

In the animation in Fig. 5, a ball is thrown onto the laundry to shoo away a fly. The time and position of each collision between the laundry and the ball are predefined in a scenario, which also determines the collision force via the velocity and mass of the ball. We used a well-known mass-spring model [20] to simulate the deformation of the laundry caused by the collision, with a 23 x 23 grid for the geometric warping. The original real scene is an image of 1400 x 1050 pixels.

Fig. 6 shows an example of a piece of cloth flapping in the wind generated by an electric fan; a 23 x 23 grid is used for this example as well. We can observe the differences between the animations generated using image-based deformation (Fig. 6(c)) and the synthetic-copy method (Fig. 6(d)). When both the electric fan and the cloth are synthetic models, the rendering quality of the synthetic models depends on the animator's skill and the time invested in modelling and animating the characters. Our result looks sufficiently expressive and convincing compared to the result of the traditional method.

5 Conclusions and Future Work

This paper suggests a new method for rendering the deformations of an image-object interacting with a synthetic character rendered into a real scene. With the aid of a 3D sub-model, we can deform the image-object directly in image space using warping and ERI techniques; some convincing experimental results are shown in this paper. The most important advantage of this approach is that we maintain the photo-realistic rendering quality of the image-object because we deform the original image-object directly, which avoids the complex process of extracting exact lighting information from the original real scene.

The image-warping technique used for geometric deformation has some limitations. A large amount of deformation cannot be handled with this approach.
In that case, there still seems to be no alternative to the synthetic-copy method. Moreover, when a deformed face of the 3D sub-model is nearly parallel to the viewing direction, the face becomes too small to be deformed by image warping. These cases require further investigation. Applying our method to real video with a dynamically changing viewpoint and view direction is another possible extension. We are also working on extending our method to real-time environments such as 3D games: when an interaction happens, the 3D sub-model of the background object must be deformed very rapidly, and the other processes such as image warping and ERI must be fast enough to support real-time rates.
Acknowledgement

This work was supported by grant No. R01-2004-000-10117-0(2004) from the Basic Research Program of the Korea Science & Engineering Foundation.

References

1. Birn J. Digital Lighting & Rendering. New Riders Press, 2000.
2. Debevec P. HDRI and Image-Based Lighting. SIGGRAPH 2003 Course Notes No. 19, 2003.
3. Liu Z., Shan Y., and Zhang Z. Expressive Expression Mapping with Ratio Images. Proceedings of SIGGRAPH 2001, 2001, pp. 271-276.
4. Blasko G. Vision-Based Camera Matching Using Markers. Proceedings of CESCG 2000, 2000.
5. Gomes J., Darsa L., Costa B., and Velho L. Warping and Morphing of Graphical Objects. SIGGRAPH 95 Course Notes No. 3, 1995.
6. Barr A.H. Global and Local Deformations of Solid Primitives. Proceedings of SIGGRAPH 84, 1984, pp. 21-30.
7. Smith A.R. Planar 2-Pass Texture Mapping and Warping. Proceedings of SIGGRAPH 87, 1987, pp. 263-271.
8. Smithe D.B. A Two-Pass Mesh Warping Algorithm for Object Transformation and Image Interpolation. Technical Report 1030, ILM Computer Graphics Department, Lucasfilm, 1990.
9. Beier T., Neely S. Feature-Based Image Metamorphosis. Proceedings of SIGGRAPH 92, 1992, pp. 35-42.
10. Lee S.-Y., Chwa K.-Y., Shin S.-Y., Wolberg G. Image Metamorphosis Using Snakes and Free-Form Deformations. Proceedings of SIGGRAPH 95, 1995, pp. 439-448.
11. McMillan L., Bishop G. Plenoptic Modeling: An Image-Based Rendering System. Proceedings of SIGGRAPH 95, 1995, pp. 39-46.
12. Chen S.E. QuickTime VR - An Image-Based Approach to Virtual Environment Navigation. Proceedings of SIGGRAPH 95, 1995, pp. 29-38.
13. Mark W.R., McMillan L., and Bishop G. Post-Rendering 3D Warping. Proceedings of the 1997 Symposium on Interactive 3D Graphics, 1997, pp. 7-16.
14. Chen S.E., Williams L. View Interpolation for Image Synthesis. Proceedings of SIGGRAPH 93, 1993, pp. 279-288.
15. Shade J., Gortler S.J., He L.W., Szeliski R. Layered Depth Images. Proceedings of SIGGRAPH 98, 1998, pp. 231-242.
16. Liebowitz D., Criminisi A., and Zisserman A. Creating Architectural Models from Images. Proceedings of EuroGraphics 99, 1999, pp. 39-50.
17. Debevec P., Taylor C., and Malik J. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach. Proceedings of SIGGRAPH 96, 1996, pp. 11-20.
18. Chuang Y., Goldman D., Zheng K., Curless B., Salesin D., Szeliski R. Animating Pictures with Stochastic Motion Textures. Proceedings of SIGGRAPH 2005, 2005, pp. 853-860.
19. Wright S. Digital Compositing for Film and Video. Focal Press, 2001.
20. Witkin A., Baraff D., Blum M., and Monheit G. Physically Based Modeling: Principles and Practice. SIGGRAPH Course Notes 19, 1997.