ASIAGRAPH 2008 The Intermediate View Synthesis System For Soccer Broadcasts

Songkran Jarusirisawad, Kunihiko Hayashi, Hideo Saito (Keio Univ.), Naho Inamoto (SGI Japan Ltd.), Tetsuya Kawamoto (Chukyo Television Broadcasting Co., Ltd.), Naoki Kubokawa, Toru Fujiwara (Nippon Television Network Corporation)
songkran@ozawa.ics.keio.ac.jp, hayashi@ozawa.ics.keio.ac.jp, saito@ozawa.ics.keio.ac.jp

Abstract: In recent years, computer vision technologies have been intensively applied to generating novel visual presentations of visual media for a variety of practical uses, including TV broadcasting. In this paper, we present a new technique for generating free-viewpoint video of soccer games that aims at practical use with broadcasting quality. The key point for practical broadcast use is not an automatic procedure that generates free-viewpoint videos containing some erroneous frames, but an efficient procedure that generates free-viewpoint videos without any erroneous frames. To achieve such an efficient procedure, we propose an interactive user interface for error-free segmentation of the players in the input soccer video. Player segmentation is essential for generating high-quality free-viewpoint videos, but it is not an easy task; our interactive segmentation based on the graph-cut algorithm therefore contributes significantly to efficient free-viewpoint video generation. We also demonstrate how the developed interface works by showing the generated free-viewpoint videos.

Keywords: Free Viewpoint Synthesis, Graph-Cut

1. Introduction

In a traditional broadcasting system, the viewer can see a scene only from the capturing position. New-viewpoint video breaks this limit by allowing viewers to see the scene from any desired viewpoint. By generating new-viewpoint video in the post-production stage of the broadcasting system, virtual playbacks of key events can be viewed from many angles.
This technology enhances the viewer's experience and is a breakthrough for traditional broadcasting. Ideally, a fully automatic view-generation system is desired. In practice, however, erroneous frames usually occur in the new-view images because of errors in player segmentation. Segmenting players in a soccer scene is not trivial, owing to the low resolution of the players in the input video and the uncontrolled environment, and even a small mis-segmentation may lead to obvious artifacts in the output new-viewpoint video. In broadcasting, where image quality is uncompromisable, a perfect visual result is required, so manual operations to correct the erroneous frames of the automatic process are necessary.

In this paper, we present a new technique for generating free-viewpoint video of soccer games targeted at practical use with broadcasting quality. The key point for practical broadcast use is not an automatic procedure that generates free-viewpoint videos containing some erroneous frames, but an efficient procedure that generates them without any erroneous frames. To achieve this, we propose an interactive user interface for error-free segmentation of the players in the input soccer video. Player segmentation is essential for generating high-quality free-viewpoint videos, but it is not an easy task; our interactive segmentation based on the graph-cut algorithm [Boykov01] therefore contributes significantly to efficient free-viewpoint video generation. We also demonstrate how the developed interface works by showing the generated free-viewpoint videos.

1.1. Related Works

The pioneering research in new-viewpoint image synthesis of dynamic scenes is Virtualized Reality [Kanade95]. In that work, 51 cameras were placed around a hemispherical dome called the 3D Room to capture a scene, and the 3D structure of a moving human was then reconstructed for rendering from new views.
Many methods for improving the quality of free-viewpoint images have been proposed. Carranza et al. recover human motion by fitting a human-shaped model to multiple-view silhouette images for accurate shape recovery of the human body [Carranza03]. Starck et al. optimize a surface mesh using stereo and silhouette data to generate highly accurate virtual-view images [Starck03]. Saito et al. propose an appearance-based method [Saito99], which combines the advantages of image-based rendering and model-based rendering.

Figure 1 Capturing Environment

The techniques mentioned above are mainly intended for studio environments. In an outdoor environment, it is more difficult to generate high-quality new-viewpoint video because of uncontrolled lighting, natural backgrounds, and the lower resolution of objects when the scene is large. For new viewpoints of outdoor sport scenes, Kanade et al. developed the EyeVision system, which was demonstrated at Super Bowl XXXV. Inamoto et al. demonstrated a new-viewpoint image system for soccer scenes using image morphing [Inamoto04]. Billboard representations of soccer players have also been used in new-viewpoint systems [Koyama03][Hayashi06]. However, these systems are limited in quality, and some also require special equipment.

2. Capturing Environment

Figure 1 shows the capturing environment and camera positions. We use four cameras to capture the soccer scene. The input was captured at a qualifying match of the 85th Nationwide High School Soccer Championship (Aichi prefecture) on December 2nd, 2006, between Chukyo High School of Chukyo University and Tohou High School. The camera used is a Sony HDR-HC3 with a video resolution of 1440x1080 in 24-bit RGB color. This capture was supported by Chukyo Television Broadcasting and Nippon Television Network Corporation; image and video copyright is owned by Chukyo Television.

Figure 2 Overview of Inamoto's Method

3. New View Rendering Method

Our system uses the view-interpolation method proposed by Inamoto et al. [Inamoto03] and adds interactive graph-cut functionality for segmenting soccer players without any errors, as required for broadcasting quality. In this section, we first explain Inamoto's method and the cases in which artifacts may occur and need manual correction. In section 4, we explain our interactive system, which produces broadcasting-quality new-viewpoint images by resolving the cases in which artifacts occur in a fully automatic system.
Figure 2 shows an overview of Inamoto's new-viewpoint image generation method. The soccer scene is divided into three types of area: the moving area (players and ball), the field area, and the audience area. From the input images, the fundamental matrices between cameras and the homographies of the field and background regions are needed to render new-viewpoint images. The 2D-2D corresponding points for these estimations are selected manually from natural features. The field and background regions are approximated as planar, and homography matrices are used to warp these regions into the new-viewpoint image.

Moving objects are first segmented from the static areas (ground and background) using ordinary background subtraction. This step sometimes produces segmentation errors in some frames, which result in visual artifacts in the output new-viewpoint image. Corresponding silhouettes are then established automatically as follows. Under the assumption that a soccer player's feet normally touch the ground, the corresponding silhouette in another camera can be found by using the homography of the soccer field to warp the foot position (assumed to be the bottom-left pixel of the silhouette) into the other camera and matching it with the silhouette whose foot position is closest. Figure 3 illustrates this matching algorithm.

Figure 3 Making Corresponding Silhouettes

From corresponding silhouettes, dense pixel correspondences are found by projecting a pair of epipolar lines onto both reference views. There are four points, a1, b1, a2, and b2, where the lines intersect the edges of the corresponding silhouettes, as shown in figure 4. By projecting enough pairs of epipolar lines and linearly interpolating these line correspondences, the new-viewpoint image can be generated.

Figure 4 Making Dense Correspondences

4. Proposed System

4.1 Interactive Soccer Player Segmentation Using Graph-Cut

Because automatic segmentation sometimes produces wrong results, we developed an interactive system for soccer-player segmentation using the graph-cut optimization method proposed in [Boykov01]. The user marks certain pixels as object or background to provide hard constraints: the marked pixels are guaranteed to be labeled as the user specifies. These hard constraints provide clues about what the user intends to segment. The rest of the image is segmented automatically by computing a global optimum among all segmentations satisfying the hard constraints. The cost function is defined in terms of boundary and region properties of the segments, which can be viewed as soft constraints. Once the user specifies the hard constraints, graph-cut finds the globally optimal segmentation under them. If the user finds the result unsatisfactory, the hard constraints can be redefined as many times as needed; re-computation is fast enough to feel interactive and convenient to use.

Using the method described in section 3, a new-viewpoint image can be generated automatically. However, conventional background subtraction sometimes produces bad silhouette segmentations, and in some cases silhouettes may be mis-corresponded, causing noticeable artifacts in the rendered new-viewpoint image. We have developed a graphical user interface (GUI) that solves these problems with two functionalities: interactive foreground segmentation using graph-cut, and silhouette division and correspondence.
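As an illustration of the hard-constrained graph-cut idea above, the following is a minimal sketch of Boykov-Jolly-style min-cut segmentation on a one-dimensional strip of pixel intensities. It is not the paper's implementation: the Gaussian n-link weighting, the seed positions, and all capacities are illustrative choices only, and a real system would use a 2-D grid graph and a fast max-flow solver.

```python
import math
from collections import deque

INF = float("inf")

def min_cut_segment(pixels, obj_seeds, bkg_seeds, sigma=10.0):
    """Segment a 1-D strip of intensities into object (1) / background (0).
    Seeded pixels are hard constraints (infinite-capacity terminal links);
    n-links between neighbours are strong when intensities are similar, so
    the minimum cut prefers to pass along intensity edges."""
    n = len(pixels)
    S, T = n, n + 1                       # source = object, sink = background
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n - 1):                # soft boundary constraints (n-links)
        diff = pixels[i] - pixels[i + 1]
        cap[i][i + 1] = cap[i + 1][i] = 100.0 * math.exp(-diff * diff / (2 * sigma ** 2))
    for i in obj_seeds:                   # hard constraints (t-links)
        cap[S][i] = INF
    for i in bkg_seeds:
        cap[i][T] = INF

    def bfs():                            # shortest augmenting path (Edmonds-Karp)
        parent = {S: S}
        q = deque([S])
        while q:
            u = q.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-9:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if T not in parent:
            break
        flow, v = INF, T                  # bottleneck capacity on the path
        while v != S:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = T
        while v != S:                     # push flow, update residual graph
            cap[parent[v]][v] -= flow
            cap[v][parent[v]] += flow
            v = parent[v]

    reach = bfs()                         # source side of the min cut = object
    return [1 if i in reach else 0 for i in range(n)]

# Dark player pixels on the left, bright grass on the right; one seed each.
strip = [10, 12, 11, 90, 95, 93]
print(min_cut_segment(strip, obj_seeds=[0], bkg_seeds=[5]))  # [1, 1, 1, 0, 0, 0]
```

Because the seeds only constrain two pixels, the remaining labels come entirely from the globally optimal cut, which is exactly what makes a few user strokes sufficient in the interactive tool.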
In the following sections we describe both functionalities in detail. Figure 5 shows our GUI for generating new-viewpoint images.

Figure 5 Our Graphical User Interface

Figure 6 Soccer Player Segmentation Using Graph-Cut (One Input Frame)

Figure 6 shows snapshots of soccer-player segmentation using graph-cut for one input frame. In figure 6, the green strokes mark user-specified foreground (soccer players) and the red strokes mark user-specified background. Only a few interactions are needed to complete the segmentation task. When the input is a video of consecutive frames rather than a single image, graph-cut segmentation is equally suitable: the user only needs to specify hard constraints in some key-frames, and these constraints take effect in the previous and following frames as well. Figure 7 shows an example segmentation of five consecutive frames in which hard constraints were specified in only three frames; the result is a satisfactory segmentation.

Figure 7 Five Consecutive Frames Segmented Using Graph-Cut

4.2 Interactive Silhouette Labelling and Correspondence

Silhouette labelling and finding the corresponding silhouette in another camera can be done automatically using the method described in section 3. However, mis-correspondence can occur in the following situations:

- Several players are segmented into the same silhouette.
- A warped foot position is accidentally closer to another silhouette (of a different player) in the other camera.

In the first case, the user can correct the result by manually dividing a silhouette that contains multiple players into one silhouette per player. In the second case, the user can correct the result by redefining the corresponding silhouette in the other camera. Using our GUI, both tasks can be done easily. Figure 8 shows the input images from two cameras. Figure 9 shows the segmentation and labelling result produced by the automatic method described in section 3, and figure 10 shows the segmentation and labelling result obtained with our GUI. The same silhouette color means that two silhouettes correspond to each other.

Figure 8 Input Images from Camera3 and Camera4: (a) camera3, (b) camera4

Figure 9 Segmentation and Corresponding Result Using the Method Described in Section 3: (a) camera3, (b) camera4

Figure 10 Correct Segmentation and Corresponding Result Using Our GUI: (a) camera3, (b) camera4

5. Experimental Result

In this section, we first compare the new-viewpoint video generated by the fully automatic method described in section 3 with the one from our interactive method. Figure 11 shows new-viewpoint images generated from the segmentation and labelling result of figure 9, while figure 12 shows new-viewpoint images generated from figure 10 (our system). Figure 13 shows the resulting new-viewpoint images from our proposed system; the weight ratios between the two reference cameras are written under each figure. The output rendered images have almost the same quality as the input images themselves. Since the rendered views cannot be distinguished from the actual input images, the system is suitable for use in television broadcasting.

Figure 11 New Viewpoint Image between Camera3 and Camera4 Generated from Figure 9

Figure 12 New Viewpoint Image between Camera3 and Camera4 Generated from Figure 10

Figure 13 The Resulting New Viewpoint Images from the Proposed System

6. Conclusions

In this paper, we have presented a system for generating new-viewpoint images targeted at practical use in broadcasting. Because the quality of the output new-viewpoint images is the most important criterion, manual operation is needed when the automatic process fails to achieve a perfect result. Our system uses graph-cut for interactive soccer-player segmentation [Boykov01], which reduces the tedious time the user would otherwise need for manual segmentation. The resulting new viewpoints also show promising output quality, good enough and suitable for broadcasting.

References

Boykov, Y., Jolly, M.P.: Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images. In ICCV 2001, vol. I, pp. 105-112, 2001.

Carranza, J., Theobalt, C., Magnor, M., Seidel, H.-P.: Free-viewpoint video of human actors. In Proceedings of SIGGRAPH '03, pp. 569-577, 2003.

Hayashi, K., Saito, H.: Synthesizing free-viewpoint images from multiple view videos in soccer stadium. In International Conference on Computer Graphics, Imaging and Visualisation, pp. 220-225, 2006.

Inamoto, N., Saito, H.: Fly-through viewpoint video system for multi-view soccer movie using viewpoint interpolation. In Visual Communications and Image Processing (VCIP 2003), vol. 5150, p. 122, July 2003.

Inamoto, N., Saito, H.: Arbitrary viewpoint observation for soccer match video. In European Conference on Visual Media Production, pp. 21-30, 2004.

Kanade, T., Rander, P.W., Narayanan, P.J.: Virtualized reality: concepts and early results. In IEEE Workshop on Representation of Visual Scenes, pp. 69-76, 1995.

Koyama, T., Kitahara, I., Ohta, Y.: Live mixed-reality 3D video in soccer stadium. In International Symposium on Mixed and Augmented Reality, pp. 178-186, 2003.

Saito, H., Kanade, T.: Shape reconstruction in projective grid space from large number of images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '99), June 1999.

Starck, J., Hilton, A.: Towards a 3D virtual studio for human appearance capture. In IMA International Conference on Vision, Video and Graphics (VVG), pp. 17-24, 2003.
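As a closing reference sketch for the geometric steps of sections 3 and 5, the NumPy fragment below warps each silhouette's foot point through a field homography to find its counterpart in the other camera, then blends corresponding points by the weight ratio between the two reference cameras. The homography and all coordinates are invented placeholders, not the paper's calibration data.

```python
import numpy as np

def warp_point(H, p):
    """Apply a 3x3 planar homography to a 2D point (projective warp)."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

def match_silhouettes(H_field, feet_cam1, feet_cam2):
    """For each silhouette foot point in camera 1, warp it into camera 2 via
    the field homography and match the nearest foot point found there."""
    matches = []
    for i, f1 in enumerate(feet_cam1):
        warped = warp_point(H_field, f1)
        dists = [np.linalg.norm(warped - f2) for f2 in feet_cam2]
        matches.append((i, int(np.argmin(dists))))
    return matches

def interpolate_points(p1, p2, w):
    """Linear interpolation of corresponding points: w = 0 gives camera 1's
    view, w = 1 gives camera 2's; intermediate w gives the virtual view."""
    return (1.0 - w) * np.asarray(p1) + w * np.asarray(p2)

# Hypothetical field homography (a pure translation here) and foot points.
H = np.array([[1.0, 0.0, 40.0],
              [0.0, 1.0, -5.0],
              [0.0, 0.0, 1.0]])
feet1 = [np.array([100.0, 300.0]), np.array([400.0, 320.0])]
feet2 = [np.array([445.0, 318.0]), np.array([138.0, 296.0])]
print(match_silhouettes(H, feet1, feet2))              # [(0, 1), (1, 0)]
print(interpolate_points(feet1[0], feet2[1], w=0.5))   # [119. 298.]
```

The same blending, applied per pixel along matched epipolar segments, is what produces the intermediate view with the reported weight ratios.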