3D Reconstruction of Dynamic Textures with Crowd Sourced Data. Dinghuang Ji, Enrique Dunn and Jan-Michael Frahm

Background: large-scale scene reconstruction from Internet imagery, yielding a 3D point cloud and dense geometry.

Motivation: "No man ever steps in the same river twice." (Heraclitus) Likewise, no local patch ever appears in the same fountain twice.

Goal: bring static scene reconstructions to life by recovering the 3D shape of the dynamic scene elements, enabling more realistic (dynamic) visualizations.

Related work: Nelson, R., Polana, R.: Qualitative recognition of motion using temporal texture. CVGIP: Image Understanding (1992). Categories of motion: activities, motion events, dynamic textures.

Related work (continued): Ihrke, I.: Reconstruction and Rendering of Time-Varying Natural Phenomena. PhD thesis (2007). Taneja et al.: Modeling Dynamic Scenes Recorded with Freely Moving Cameras. ECCV (2010). Cashman et al.: What Shape Are Dolphins? Building 3D Morphable Models from 2D Images. PAMI (2012).

Framework: data acquisition, rough model estimation, closed-loop modelling.

Framework, stage 1: data acquisition.

Image-based scene reconstruction: generate the static background and obtain the camera parameters. Scenes: Trevi Fountain, Mooney Falls, Navagio Beach, Piccadilly Circus billboard.

Video frame selection: select sequential video frames that contain stable dynamic motion (figure: frame samples 1, n, and m from sequences with a large viewpoint change, a good frame sequence, and heavy occlusion).

Selected video sequences (figure).

Video frame selection: extract a HOG (histogram of oriented gradients) feature from each frame and use normalized cross-correlation (NCC) to measure frame similarity (figure: local cells and their orientation histograms).
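The frame-similarity step can be sketched as below. This is a simplified HOG (per-cell magnitude-weighted histograms of unsigned gradient orientation, without the block normalization of full HOG) followed by NCC between the two descriptors. The cell size, bin count, and the names `hog_descriptor` and `ncc` are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def hog_descriptor(img, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation,
    weighted by gradient magnitude (no block normalization)."""
    img = np.asarray(img, dtype=np.float64)
    gy, gx = np.gradient(img)                       # row and column derivatives
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    feat = np.zeros((h_cells, w_cells, bins))
    for i in range(h_cells):
        for j in range(w_cells):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            feat[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(), minlength=bins)
    return feat.ravel()


def ncc(a, b, eps=1e-12):
    """Normalized cross-correlation of two feature vectors, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

Two frames would then be treated as similar when `ncc(hog_descriptor(f1), hog_descriptor(f2))` exceeds some threshold; the threshold value is not given on the slides.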

Framework, stage 2: rough model estimation.

Rough model estimation: starting from the selected frame sequences, segment the dynamic texture, then apply shape-from-silhouettes.

Foreground mask from videos: from the input video sequence to the final mask (figure: input video fragment, intermediate steps 1 to 5, final mask).

Foreground mask from videos: homography-based video stabilization.

Foreground mask from videos: accumulated frame differencing.
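Accumulated frame differencing can be sketched as follows, assuming the frames have already been stabilized and are grayscale arrays of equal size. Dividing by the number of frame pairs (so the result is a mean per-step change) is our choice for illustration; `accumulated_difference` is a hypothetical name.

```python
import numpy as np


def accumulated_difference(frames):
    """Mean absolute difference between consecutive (already stabilized)
    grayscale frames; large values mark pixels that keep changing."""
    frames = np.asarray(frames, dtype=np.float64)   # shape (T, H, W)
    diffs = np.abs(np.diff(frames, axis=0))         # shape (T-1, H, W)
    return diffs.sum(axis=0) / max(len(diffs), 1)   # accumulate, then normalize
```

Static background pixels accumulate almost nothing, while water, moving crowds, or changing billboard content accumulate large values.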

Foreground mask from videos: Otsu thresholding and morphological operations.
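A minimal sketch of this step: Otsu's method picks the histogram cut that maximizes the between-class variance of the accumulated-difference map, and a 3x3 binary opening removes thin speckle. The bin count, structuring-element size, and function names are illustrative assumptions; the slides do not say which morphological operations are used.

```python
import numpy as np


def otsu_threshold(values, nbins=256):
    """Otsu's method: return the threshold that maximizes the between-class
    variance of a histogram of `values` (pixels > threshold = foreground)."""
    hist, edges = np.histogram(np.ravel(values), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)                 # class-0 probability for a cut after bin k
    w1 = 1.0 - w0
    mu_k = np.cumsum(p * centers)     # first moment of class 0
    mu_t = mu_k[-1]                   # global mean
    denom = w0 * w1
    between = np.zeros_like(p)
    ok = denom > 1e-12                # skip cuts that leave a class empty
    between[ok] = (mu_t * w0[ok] - mu_k[ok]) ** 2 / denom[ok]
    return edges[np.argmax(between) + 1]   # upper edge of the last class-0 bin


def _neighborhood(mask):
    """Stack the 9 shifted copies of a boolean mask covering a 3x3 window."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])


def binary_open(mask):
    """Morphological opening: 3x3 erosion followed by 3x3 dilation."""
    eroded = _neighborhood(mask).all(axis=0)
    return _neighborhood(eroded).any(axis=0)
```

Usage: `mask = binary_open(acc > otsu_threshold(acc))`, where `acc` is the accumulated difference map.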

Foreground mask from videos: remove small connected regions to obtain the final mask.
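Removing small connected regions can be sketched with a breadth-first flood fill. The 4-connectivity and the `min_size` parameter are assumptions; the slides do not give a size threshold.

```python
from collections import deque

import numpy as np


def remove_small_regions(mask, min_size):
    """Drop 4-connected foreground components with fewer than min_size pixels."""
    mask = np.asarray(mask, dtype=bool)
    out = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # flood-fill one component starting at (r, c)
            comp, queue = [], deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(comp) >= min_size:       # keep only large components
                for y, x in comp:
                    out[y, x] = True
    return out
```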

Background mask from videos: compute feature matches between neighboring video frames, remove the matches that fall inside the foreground mask, and estimate a concave hull mask from the remaining matches.

Background mask estimation: the alpha shape method finds the boundary (concave hull) of a set of points.

Graph-cut segmentation for mask refinement (figure panels: original image, foreground mask, background mask; green: static background, red: dynamic foreground).

Graph-cut segmentation: two-label image segmentation, solved with the min-cut/max-flow method.
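A minimal sketch of two-label graph-cut segmentation, assuming per-pixel unary costs for each label (how the method derives them from the foreground and background masks is not specified here) and a Potts smoothness term with weight `lam` between 4-neighbors. The source side of the minimum cut is labeled foreground. For clarity it uses Edmonds-Karp augmenting paths; practical systems use much faster solvers such as Boykov-Kolmogorov. `graph_cut_segment`, `cost_fg`, `cost_bg`, and `lam` are hypothetical names.

```python
from collections import deque

import numpy as np


def graph_cut_segment(cost_fg, cost_bg, lam):
    """Minimize sum of unary costs + lam per 4-neighbor pair with different
    labels via an s-t min-cut. Returns a boolean mask (True = foreground)."""
    h, w = cost_fg.shape
    n = h * w
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]   # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)          # make sure the reverse edge exists

    for y in range(h):
        for x in range(w):
            p = y * w + x
            add_edge(s, p, float(cost_bg[y, x]))  # cut when p is labeled background
            add_edge(p, t, float(cost_fg[y, x]))  # cut when p is labeled foreground
            if x + 1 < w:
                add_edge(p, p + 1, lam)
                add_edge(p + 1, p, lam)
            if y + 1 < h:
                add_edge(p, p + w, lam)
                add_edge(p + w, p, lam)

    while True:  # augment along shortest residual paths (Edmonds-Karp)
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # pixels still reachable from the source form the foreground side of the cut
    labels = np.zeros(n, dtype=bool)
    for v in parent:
        if v < n:
            labels[v] = True
    return labels.reshape(h, w)
```

Raising `lam` trades data fidelity for smoother masks: an isolated pixel that weakly prefers foreground is absorbed into the surrounding background once its boundary cost exceeds its unary gain.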

Initial model generation: silhouettes from the videos, then shape-from-silhouettes.

Classic shape-from-silhouettes (figure sequence).

Shape-from-silhouettes problem: some silhouettes are incomplete, so classic shape-from-silhouettes carves away valid parts of the reconstructed object.

Shape-from-silhouettes: accumulative volume.
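One way to sketch an accumulative volume is to let every view vote instead of intersecting all silhouettes: a voxel survives if it projects inside the silhouette in at least a given fraction of the views that see it. With `min_ratio=1.0` this reduces to classic (intersection) shape-from-silhouettes; a lower ratio tolerates incomplete silhouettes. The voting rule, the 3x4 projection-matrix interface, and the name `carve` are assumptions for illustration.

```python
import numpy as np


def carve(voxels, cameras, silhouettes, min_ratio=1.0):
    """Voting shape-from-silhouettes.

    voxels: (N, 3) voxel centers; cameras: 3x4 projection matrices;
    silhouettes: boolean (H, W) masks, one per camera. Returns a boolean
    (N,) array marking voxels that lie inside the silhouette in at least
    min_ratio of the views in which they are visible."""
    voxels = np.asarray(voxels, dtype=np.float64)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])
    votes = np.zeros(len(voxels))
    seen = np.zeros(len(voxels))
    for P, sil in zip(cameras, silhouettes):
        proj = homog @ np.asarray(P, dtype=np.float64).T    # (N, 3) homogeneous
        in_front = proj[:, 2] > 0
        z = np.where(in_front, proj[:, 2], 1.0)             # avoid dividing by <= 0
        u = np.round(proj[:, 0] / z).astype(int)            # pixel column
        v = np.round(proj[:, 1] / z).astype(int)            # pixel row
        h, w = sil.shape
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(len(voxels), dtype=bool)
        inside[visible] = sil[v[visible], u[visible]]
        seen += visible
        votes += inside
    ratio = np.divide(votes, seen, out=np.zeros_like(votes), where=seen > 0)
    return ratio >= min_ratio
```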

Visualization with texture: static background plus rough model.

Projection back to the 2D images: classic shape-from-silhouettes compared with shape-from-silhouettes fusion (figure).

Framework, stage 3: closed-loop modelling.

Why use Flickr images? 1. We can reuse their camera parameters, which were estimated during the static reconstruction. 2. YouTube videos usually have lower resolution (60% of the videos are smaller than 360×480). 3. Isolated images broaden the camera distribution, which is critical for shape-from-silhouettes methods.

Why use Flickr images? 1500 registered video frames yield 14658 3D points covering a 135-degree range; 800 registered Flickr images yield 68392 3D points covering a 287-degree range.

Closed-loop modelling: project the rough model into the Flickr images, generate a new model, and iterate.

Projecting the initial model into the photo collection.

Background mask of images: an original image from the photo collection and its nearest neighbor in GIST feature space.
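If the nearest neighbor is used to borrow a background mask from an image whose mask is already known, the lookup itself is a plain nearest-neighbor search over global descriptors. This sketch assumes GIST descriptors are precomputed as fixed-length vectors and uses Euclidean distance; both the mask-transfer reading of the slide and the name `transfer_mask` are assumptions.

```python
import numpy as np


def transfer_mask(query_desc, db_descs, db_masks):
    """Return the mask of the database image whose descriptor is closest
    (Euclidean distance) to the query descriptor, plus its index."""
    d = np.linalg.norm(np.asarray(db_descs, dtype=np.float64)
                       - np.asarray(query_desc, dtype=np.float64), axis=1)
    idx = int(np.argmin(d))
    return db_masks[idx], idx
```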

Closed-loop 3D shape refinement, iterations 1 to 9 (figure: frontal and top views of the model after each iteration).

Problem: over-segmentation (frontal and top views).

Shape-from-silhouettes with two-way carving: carve once with the foreground mask and once with the background mask, and keep only the voxels that remain occupied.

Problem: uneven camera distribution.

Shape-from-silhouettes: weighted carving (figure: histogram of the number of cameras in each 30-degree viewing-angle bin, from [0,30] to [150,180]).
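The histogram suggests cameras grouped into 30-degree viewing-angle bins. A minimal sketch of weighted carving under that reading: each camera gets an inverse-frequency weight so that crowded bins do not dominate the vote, and the vote ratio becomes a weighted ratio. How viewing angles are measured and the exact weighting scheme are assumptions; `camera_weights` and `weighted_vote_ratio` are hypothetical names.

```python
import numpy as np


def camera_weights(view_angles_deg, bin_width=30.0):
    """Inverse-frequency weights: cameras in crowded angular bins count less.
    Weights are normalized to sum to the number of cameras."""
    angles = np.asarray(view_angles_deg, dtype=np.float64)
    bins = np.floor(angles / bin_width).astype(int)
    _, inverse, counts = np.unique(bins, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]          # every occupied bin gets equal total weight
    return w * len(w) / w.sum()


def weighted_vote_ratio(inside, visible, weights):
    """inside, visible: (num_cameras, num_voxels) booleans. Returns the
    weighted fraction of visible views in which each voxel is inside."""
    wv = np.asarray(weights, dtype=np.float64)[:, None]
    num = (wv * inside).sum(axis=0)
    den = (wv * visible).sum(axis=0)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

With three cameras clustered in one bin and a single camera in another, the lone camera carries as much weight as the whole cluster, so a voxel seen inside only by the cluster gets a weighted ratio of 0.5 instead of 0.75.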

Results: Piccadilly Circus and Navagio Beach, each reconstructed without and with weighting.

Experiments, implementation details: the first iteration uses an intersection ratio of 0.10, which is incremented by a small amount (0.03) at each iteration. To check for convergence, we use a subset of wide field-of-view images and test how much their segmentation changes. The rough initial model is generated from 15 to 30 video frames. The process usually finishes within 10 iterations, taking less than 5 hours.
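The iteration schedule above can be written out directly: the intersection ratio starts at 0.10 and grows by 0.03 per iteration, capped here at 10 iterations. The convergence test is only described as checking segmentation change on a subset of wide field-of-view images, so the changed-pixel fraction and the tolerance `tol` below are hypothetical choices.

```python
import numpy as np


def ratio_schedule(start=0.10, step=0.03, max_iters=10):
    """Intersection ratio used at each closed-loop iteration."""
    return [round(start + step * k, 2) for k in range(max_iters)]


def has_converged(prev_masks, new_masks, tol=0.01):
    """Hypothetical stopping test: the mean fraction of pixels whose label
    changed across a fixed subset of wide field-of-view images is below tol."""
    changes = [np.mean(p != n) for p, n in zip(prev_masks, new_masks)]
    return float(np.mean(changes)) < tol
```

`ratio_schedule()` gives 0.10, 0.13, 0.16, and so on up to 0.37 at the tenth iteration.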

Experiments, comparisons: PMVS by Y. Furukawa et al., a multi-view stereo method for rigid structure; CMPMVS by M. Jancosek et al., a multi-view stereo method that shows good results for weakly supported surfaces, e.g., water surfaces.

Experiments, dataset (keyframes sampled every 50 frames):

Dataset                      Videos downloaded  Images downloaded  Keyframes extracted  Images used for model refinement
Trevi Fountain               481                6000               68629                810
Navagio Beach                300                1000               45823                520
Piccadilly Circus Billboard  460                5000               75983                496
Mooney Falls                 200                1000               17850                723

Demos

Conclusions: initial exploration of dynamic 3D reconstruction; a 3D reconstruction framework for dynamic textures; a robust shape-from-silhouettes method; dynamic texture cosegmentation.