FLY THROUGH VIEW VIDEO GENERATION OF SOCCER SCENE


Naho INAMOTO and Hideo SAITO
Keio University, Yokohama, Japan
{nahotty,saito}@ozawa.ics.keio.ac.jp

Abstract: Recently there has been a great deal of interest in building systems that enable an audience to view a sports event from an arbitrary viewpoint. The rapid development of networks and computers will soon make it possible to provide a form of real-time video that allows the viewer to select any desired view of the action. This paper introduces a novel method for generating an arbitrary view of a scene in a soccer match. It allows viewers to observe the action in a soccer game from a favorite viewpoint, such as one that follows the movement of the ball or tracks a particular defender. We give some examples of generated video images that make viewers feel as if they are flying through the field.

Keywords: arbitrary viewpoint, soccer scene, broadcasting

1. Introduction

Television allows us to watch sporting events, such as the Olympic Games and soccer's World Cup, no matter where the events are held. However, all viewers get the same view, and none of them has any power to control the viewpoint. A new form of visual media that allows an audience to watch a sporting event from any desired viewpoint is thus desirable. Kanade [3, 4] has been studying the creation of virtual models of dynamic events in the real world and the use of these models in constructing views of the events. His new technology, called "Eye Vision," was used to present playbacks of Super Bowl XXXV, held in Tampa, USA, in January 2001. More than 30 cameras capture images of the action, each from a different angle, and a computer combines the video streams from these cameras. The sequences of images from the different angles are then used to create a three-dimensional effect of walking around the action.

The technology that controls all of the cameras so that they capture the same target at the same size enables such effects.

We are working on the problem of synthesizing views of dynamic scenes. With computer vision technology and images from multiple viewpoints, it is possible to create novel views that have not been captured by any camera. This approach may produce an enhanced form of the Eye Vision effect through interpolation between the various views, and may even allow viewers to see a scene from whatever angle they want. As the technology advances along these lines, it will become a completely new way to view sports and entertainment events.

This paper introduces a novel method for generating an arbitrary view of a soccer scene. The scene is classified into dynamic regions, a field region, and a background region. An image of the required arbitrary view is generated for each region, and superimposition then completes the desired view of the whole scene. By applying this method to actual scenes of a soccer match captured at a stadium, we can present videos that make viewers feel as if they are flying through the stadium.

2. View Synthesis

The approaches to the view synthesis problem that have been proposed may be categorized into two groups. In the first group, a full 3D model of an object is constructed to generate the desired view [3, 8]. The quality of the virtual view image then depends on the accuracy of the 3D model. As multiple video cameras or range scanners are typically used to construct an accurate model, this approach requires large amounts of computation and data. Once the 3D model has been constructed, however, the viewpoint can be changed flexibly to present another view. In the second group, the arbitrary view image is synthesized without an explicit 3D model; instead, some form of image warping, such as the transfer of correspondences [1, 2, 5, 6], is used. While this approach does not require much computation or data, the viewpoint can only be moved within a limited area.

Our purpose is to generate arbitrary views of scenes in a soccer match. Since the movements of each player are complex, it is almost impossible to reconstruct an accurate 3D model of the scene. Furthermore, the camera calibration that is necessary to compute 3D positions is very difficult in a real stadium. Therefore, it is not practical to apply a method that requires 3D models to the generation of views of soccer scenes. Instead of using 3D models, we classify the objects in the scene into three kinds of regions.

Figure 1. Overview of the proposed method.

Figure 1 gives an overview of the approach. First, the image is classified into dynamic regions, in which the shape or position changes over time, and a static region. In a soccer scene, the former corresponds to the players and the ball, and the latter to the ground, goal, and background. Next, the static region is further classified into two regions on the basis of shape. The first is a field region, which can be approximated as a set of planes, and the other is a background region, which can be approximated as an infinitely distant plane. Arbitrary views of the respective regions are then generated, and the whole image is synthesized by their superimposition. In this section, we describe the processing of each region.

2.1. The Dynamic Regions

A single scene usually contains several players and a ball, so we deal with these objects separately. First, all dynamic regions are extracted by subtracting the background from the image. Considering color (RGB) vectors as well as intensity leads to more accurate extraction of these regions; a sketch of this step is given below. After the silhouettes have been generated for the segmentation of each player and the ball, the correspondence between the segmented silhouettes of the players and those in images from other viewpoints is obtained by applying a homography of the ground plane, as shown in Figure 2. This is based on the fact that the players are standing on the ground, so the feet of the players can be related by the homography of the ground.
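As a concrete illustration of the extraction step, the following is a minimal sketch of background subtraction on RGB color vectors. The threshold value, the morphological clean-up, and the connected-component segmentation are our own illustrative assumptions, not details taken from the paper.

    import cv2
    import numpy as np

    def extract_dynamic_regions(frame, background, threshold=30.0):
        """Extract dynamic-region silhouettes by background subtraction,
        comparing full RGB color vectors rather than intensity alone."""
        # Per-pixel Euclidean distance between the RGB color vectors.
        diff = frame.astype(np.float32) - background.astype(np.float32)
        dist = np.linalg.norm(diff, axis=2)

        # Threshold into a binary silhouette mask (value is an assumption).
        mask = (dist > threshold).astype(np.uint8) * 255

        # Remove speckle noise, then split the mask into one silhouette
        # per player/ball via connected components.
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
        n_labels, labels = cv2.connectedComponents(mask)
        return [(labels == i).astype(np.uint8) for i in range(1, n_labels)]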

Figure 2. Correspondence for the dynamic regions.

Next, each pair of silhouettes is processed to obtain the pixel-wise correspondence within the silhouette. Epipolar lines are drawn between the images of the two views, view 1 and view 2, by using the fundamental matrix. On each epipolar line, the correspondences of the intersections with the silhouette boundaries, such as a1 and a2, b1 and b2 in Figure 2, are established first. The correspondences between the pixels inside the silhouette are then obtained by linear interpolation between these intersection points. After a dense correspondence for the whole silhouette has been obtained, the pixel values are transferred from the source images of view 1 and view 2 to a virtual view image by image morphing, i.e., by linear interpolation according to the displacement of the pixel positions. The location p in the synthesized view is given by

    p = (1 - α) p1 + α p2,    (1)

where p1 and p2 are the coordinates of matching points in images I1 and I2, and α defines the relative weights given to the respective actual viewpoints.

Figure 3. Image morphing.
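The transfer in equation (1) can be sketched in a few lines. The following assumes the dense correspondences have already been computed by the epipolar-line matching above; the nearest-pixel splatting is a simplification of ours, not the paper's exact rendering procedure.

    import numpy as np

    def morph_silhouette(img1, img2, pts1, pts2, alpha, out_shape):
        """Render a silhouette in the virtual view by image morphing.

        pts1, pts2: (N, 2) arrays of matching (x, y) pixel coordinates in
        view 1 and view 2 (the dense correspondence described above).
        alpha: relative weight of view 2, in [0, 1].
        """
        pts1 = np.asarray(pts1, dtype=int)
        pts2 = np.asarray(pts2, dtype=int)
        out = np.zeros(out_shape, dtype=np.float32)

        # Eq. (1): p = (1 - alpha) * p1 + alpha * p2
        p = np.rint((1.0 - alpha) * pts1 + alpha * pts2).astype(int)

        # Colors of the matching pixels in the two source images.
        c1 = img1[pts1[:, 1], pts1[:, 0]].astype(np.float32)
        c2 = img2[pts2[:, 1], pts2[:, 0]].astype(np.float32)

        # Blend with the same weights and splat into the virtual view
        # (nearest-pixel splatting keeps the sketch short).
        out[p[:, 1], p[:, 0]] = (1.0 - alpha) * c1 + alpha * c2
        return out.astype(np.uint8)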

All correspondences are used in the transfer to generate a warped image. Two transfers are required here, one from view 1 and the other from view 2. Two warped images are thus generated; they are then blended to complete the image of the virtual view. If the color of a pixel differs between the two warped images, the corresponding pixel in the virtual view is rendered with the average of the two colors; otherwise the rendered color is taken from either actual image. For each of the extracted players and the ball, a pixel-wise correspondence as described above is established for the rendering of arbitrary views. Finally, the dynamic regions are composited in order of distance from the viewpoint. An arbitrary view image of the dynamic regions is thus composed.

2.2. Field Region

Of the objects in our scene, the ground and the soccer goal can be considered as a single plane and a set of planes, respectively. We therefore apply homographies to these planes to obtain the correspondences required for the generation of arbitrary view images. The following equation gives the pixel-wise correspondence between two views of a plane:

    p2 ≅ H p1,    (2)

where H is the homography matrix that represents the transformation between the two views of the plane, and p1, p2 are homogeneous coordinates on the images I1, I2 of the different views. The homography matrices of the plane of the ground and the planes of the soccer goal provide the dense correspondence within these regions. The pixel values are then transferred to the corresponding points by image morphing to complete the destination images, in the same way as for the players and the ball. Figure 4 presents examples of generated images of the ground and goal regions, where the virtual viewpoint has been placed at the midpoint between the pair of actual viewpoints.

Figure 4. Arbitrary view images for the field region.
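A minimal sketch of the plane correspondence in equation (2), using OpenCV; the point coordinates and file name are illustrative placeholders, and estimating H from four manually chosen ground-plane correspondences is only one plausible way the matrix might be obtained.

    import cv2
    import numpy as np

    # Four correspondences on the ground plane between view 1 and view 2
    # (the coordinates are placeholders, not measured data).
    pts_view1 = np.float32([[120, 400], [600, 410], [650, 470], [100, 460]])
    pts_view2 = np.float32([[140, 395], [620, 405], [680, 465], [125, 455]])

    # Estimate H such that p2 ~ H p1, as in Eq. (2).
    H, _ = cv2.findHomography(pts_view1, pts_view2)

    # Transfer a single pixel of the plane from view 1 into view 2.
    p1 = np.array([300.0, 430.0, 1.0])   # homogeneous coordinates
    p2 = H @ p1
    p2 /= p2[2]                          # normalize the projective scale

    # Or warp the whole field region at once.
    field_view1 = cv2.imread("view1.png")            # placeholder file name
    warped = cv2.warpPerspective(field_view1, H, (720, 480))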

2.3. Background Region

The background may be considered as a single, infinitely distant plane, so we can compose the images from the two input viewpoints into mosaics that serve as the respective panoramic images of the background. Arbitrary views of this region are extracted from these panoramic images. In the composition, we start by integrating the coordinate systems of the two views via the homography matrix H21 for the background. Next, the pixel values in the overlap area are blended so that the pixel colors at the junction are smoothed, which connects the two backgrounds. The partial area required for each arbitrary view image is then cut out of the resulting mosaic image. The following homography matrix, H~, is used in the transformation of coordinates to complete the arbitrary view of the background region:

    H~ = (1 - α) E + α H21^(-1),    (3)

where α is the weight and E is the identity matrix. Figure 5 presents an example of a synthesized mosaic image.

Figure 5. Synthesized panoramic image.
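A minimal sketch of the background transformation in equation (3); the function name, frame size default, and the assumption about which coordinate system the mosaic lives in are ours, for illustration only.

    import cv2
    import numpy as np

    def background_view(mosaic, H21, alpha, size=(720, 480)):
        """Cut an arbitrary view of the background out of the panoramic mosaic.

        Implements Eq. (3): H~ = (1 - alpha) * E + alpha * inv(H21),
        combining the matrices element-wise exactly as in the formula.
        We assume (our reading, not stated explicitly in the paper) that
        H21 registers the view-1 background onto view 2, so H~ moves
        smoothly from the identity (alpha = 0) to inv(H21) (alpha = 1).
        """
        E = np.eye(3)
        H_virtual = (1.0 - alpha) * E + alpha * np.linalg.inv(H21)
        # Warp the mosaic into the virtual viewpoint and crop to frame size.
        return cv2.warpPerspective(mosaic, H_virtual, size)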

3. Interactive Videos

When the methods described above are applied to the synthesis of an image sequence, the ground, goal, and background may be regarded as stable, so frame-by-frame generation of these elements would be inefficient. Instead, the stable regions are generated in advance for all possible virtual viewpoints; the dynamic regions are then synthesized for each frame. First, the two cameras nearest the virtual viewpoint specified by the user are selected and the required pair of images is obtained. Next, the dynamic regions are extracted from each of the images, and the arbitrary views of these regions are generated according to the method introduced in the previous section. These images are then composited with the stable regions rendered from the same virtual viewpoint. This completes the image of the whole scene from the desired viewpoint. For example, we can produce a video that gives the viewer the feeling of flying through the field with a viewpoint that moves with the ball. Another possibility is a video in which a particular player is kept near the center of the view, making it easier to analyze the movements of that player.

4. Experimental Results

We have applied this method to scenes of an actual soccer match that were captured by multiple video cameras at a stadium. A set of four cameras was placed to one side of the field, mainly to capture the penalty area. All input images are 720 x 480 pixel, 24-bit RGB color images. In this situation (a real stadium), camera calibration that is sufficiently accurate for the estimation of camera rotation and position is almost impossible. The proposed method does not require accurate calibration; instead, it takes advantage of the fundamental matrices between the viewpoints of the cameras, which are easily obtained from correspondences between several feature points in the images. In this experiment, we manually selected about 20 corresponding feature points in the input images.

Figure 6 presents some results of generated arbitrary view images, and Figure 7 presents a sequence of images of a scene as seen from different angles. From (a) to (h) in Figure 6, the positions of the players and the location of the background gradually change with the changing viewpoint. When we compare the results with the input images, we see that we were able to obtain realistic and distortion-free images. Although the method involves the rendering of separate regions, the synthesized images look so natural that the boundaries between the regions are not visible. Full-color versions of these images and the generated fly-through view videos are available at http://www.ozawa.ics.keio.ac.jp/~nahotty/soccer.html

Figure 6. Generated arbitrary view images.
Figure 7. Image sequence from different angles.
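As an illustration of this calibration-free setup, the following sketch estimates a fundamental matrix from manually selected correspondences with OpenCV; the point coordinates are placeholders, not the actual data from the experiment.

    import cv2
    import numpy as np

    # About 20 manually clicked correspondences would be used in practice;
    # the eight pairs below are placeholders, not the experimental data.
    pts1 = np.float32([[100, 200], [250, 180], [400, 220], [560, 240],
                       [130, 300], [300, 330], [470, 310], [620, 350]])
    pts2 = np.float32([[110, 205], [262, 178], [415, 225], [575, 238],
                       [145, 298], [315, 332], [488, 308], [640, 352]])

    # Normalized 8-point estimation of the fundamental matrix; no camera
    # rotation or position (i.e., no full calibration) is required.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

    # Epipolar line in view 2 for a point in view 1: l2 = F @ p1, with
    # line coefficients (a, b, c) meaning a*x + b*y + c = 0.
    p1 = np.array([300.0, 250.0, 1.0])
    l2 = F @ p1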

5. Conclusions

This paper has presented a novel method for the generation of video images of a soccer game from arbitrary viewpoints. The key idea behind the proposed method is to first segment the image into object regions according to the properties of a soccer scene, and to then apply the appropriate projective transformations to generate the desired view. This enables us to generate fly-through view videos from actual images of soccer matches captured at a stadium. The method will lead to the creation of completely new and enjoyable ways to present and view soccer games. We are currently investigating the reduction of the number of manual operations required by this method, for example when the correspondences between player regions are obtained in complex scenes where the silhouettes cross each other.

References

[1] S. Avidan, A. Shashua, "Novel View Synthesis by Cascading Trilinear Tensors," IEEE Trans. on Visualization and Computer Graphics, Vol. 4, No. 4, pp. 293-306, 1998.
[2] S. E. Chen, L. Williams, "View Interpolation for Image Synthesis," Proc. of SIGGRAPH '93, pp. 279-288, 1993.
[3] T. Kanade, P. J. Narayanan, P. W. Rander, "Virtualized Reality: Concepts and Early Results," Proc. of IEEE Workshop on Representation of Visual Scenes, pp. 69-76, 1995.
[4] I. Kitahara, Y. Ohta, H. Saito, S. Akimichi, T. Ono, T. Kanade, "Recording Multiple Videos in a Large-scale Space for Large-scale Virtualized Reality," Proc. of International Display Workshops (AD/IDW'01), pp. 1377-1380, 2001.
[5] S. Pollard, M. Pilu, S. Hayes, A. Lorusso, "View Synthesis by Trinocular Edge Matching and Transfer," Image and Vision Computing, Vol. 18, pp. 749-757, 2000.
[6] S. M. Seitz, C. R. Dyer, "View Morphing," Proc. of SIGGRAPH '96, pp. 21-30, 1996.
[7] D. Snow, O. Ozier, P. A. Viola, W. E. L. Grimson, "Variable Viewpoint Reality," NTT R&D, Vol. 49, No. 7, pp. 383-388, 2000.
[8] S. Yaguchi, H. Saito, "Arbitrary View Image Generation from Multiple Silhouette Images in Projective Grid Space," Proc. of SPIE Vol. 4309 (Videometrics and Optical Methods for 3D Shape Measurement), pp. 294-304, 2001.