3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller


3D Computer Vision Depth Cameras Prof. Didier Stricker Oliver Wasenmüller Kaiserslautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1

Content Motivation Depth Measurement Techniques Applications - Kinect Fusion - Body Reconstruction 2

What is a depth camera? A depth camera captures depth images. A depth image indicates in each pixel the distance from the camera to the observed object. [Figure: a color image (x,y) and the corresponding color-encoded depth image (x,y,z); in the following slides, z indicates the depth.] How did we capture depth in the previous lectures? 3

Depth from Stereo Images [Figure: two input images (image 1, image 2) and the resulting dense disparity map] Parts of this slide are adapted from Derek Hoiem (University of Illinois), Steve Seitz (University of Washington) and Lana Lazebnik (University of Illinois) 4

Depth from Stereo Images Goal: recover depth by finding the image coordinate x' that corresponds to x in the other image. [Figure: scene point X projects to x and x' in two cameras with centers C and C', focal length f and baseline B.] Parts of this slide are adapted from Derek Hoiem (University of Illinois), Steve Seitz (University of Washington) and Lana Lazebnik (University of Illinois) 5

Stereo and the Epipolar constraint [Figure: candidate scene points X along the ray through x and the corresponding epipolar lines.] Potential matches for x have to lie on the corresponding epipolar line l'. Potential matches for x' have to lie on the corresponding epipolar line l. Parts of this slide are adapted from Derek Hoiem (University of Illinois), Steve Seitz (University of Washington) and Lana Lazebnik (University of Illinois) 6

Simplest Case: Parallel images The image planes of the cameras are parallel to each other and to the baseline, the camera centers are at the same height, and the focal lengths are the same. Then the epipolar lines fall along the horizontal scan lines of the images. Parts of this slide are adapted from Derek Hoiem (University of Illinois), Steve Seitz (University of Washington) and Lana Lazebnik (University of Illinois) 7

Basic stereo matching algorithm For each pixel in the first image: find the corresponding epipolar line in the right image, examine all pixels on the epipolar line and pick the best match, then triangulate the matches to get depth information. Parts of this slide are adapted from Derek Hoiem (University of Illinois), Steve Seitz (University of Washington) and Lana Lazebnik (University of Illinois) 8
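A minimal sketch of this procedure for rectified images (so the epipolar line is simply the same scanline), using the sum of absolute differences as matching cost; the window size, disparity range and NumPy-based implementation are illustrative choices, not the lecture's reference code:

```python
import numpy as np

def block_matching_disparity(left, right, max_disp=64, win=5):
    """Naive block matching on a rectified stereo pair (grayscale float arrays).

    For each pixel in the left image, the matching cost is evaluated for all
    candidate disparities along the same scanline and the best (lowest SAD)
    disparity is kept.
    """
    h, w = left.shape
    half = win // 2
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - half) + 1):
                patch_r = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(patch_l - patch_r).sum()   # SAD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity

# Depth then follows from disparity via z = f * B / d (next slide).
```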

Depth from disparity [Figure: scene point X imaged at x and x' by two cameras with centers O and O', focal length f, baseline B and depth z.] Disparity is inversely proportional to depth! Parts of this slide are adapted from Derek Hoiem (University of Illinois), Steve Seitz (University of Washington) and Lana Lazebnik (University of Illinois) 9
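Written out (a reconstruction of the similar-triangles relation the figure encodes, since the slide's equation did not survive transcription):

\[ d = x - x' = \frac{B \cdot f}{z} \qquad\Longleftrightarrow\qquad z = \frac{B \cdot f}{d} \]

so a larger disparity d corresponds to a closer point, which is the sense in which disparity is inversely proportional to depth.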

Depth Measurement Techniques 10

Depth Measurement Techniques Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 11

Depth Measurement Techniques Laser Scanner Structured Light Projection Time of Flight (ToF) 12

Structured Light Projection Source: https://www.youtube.com/watch?v=28jwgxbqx8w Parts of this slide are adapted from Derek Hoiem (University of Illinois) 13

Structured Light Projection (see also lectures about structured light) Surface Projector Sensor Parts of this slide are adapted from Derek Hoiem (University of Illinois) 14

Structured Light Projection Projector Camera Parts of this slide are adapted from Derek Hoiem (University of Illinois) 15

Example: Book vs. No Book Source: http://www.futurepicture.org/?p=97 16

Example: Book vs. No Book Source: http://www.futurepicture.org/?p=97 17

Region-growing Random Dot Matching
1. Detect dots ("speckles") and label them as unknown.
2. Randomly select a region anchor, a dot with unknown depth.
   a. Windowed search via normalized cross-correlation along the scanline. Check that the best match score is greater than a threshold; if not, mark as invalid and go to 2.
   b. Region growing:
      1. Neighboring pixels are added to a queue.
      2. For each pixel in the queue, initialize by the anchor's shift; then search a small local neighborhood; if matched, add its neighbors to the queue.
      3. Stop when no pixels are left in the queue.
3. Stop when all dots have known depth or are marked invalid.
http://www.wipo.int/patentscope/search/en/wo2007043036 Parts of this slide are adapted from Derek Hoiem (University of Illinois) 18
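A small sketch of the region-growing step (2b) under simplifying assumptions: grayscale dot images `ir` (captured) and `ref` (reference pattern), disparities stored in a NaN-initialized array, and a toy patch-based NCC score. The real matcher and its thresholds are not public, so this only illustrates the queue-based propagation:

```python
import numpy as np
from collections import deque

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-9
    return float((a * b).sum() / denom)

def grow_region(ir, ref, anchor, anchor_shift, disp, half=4, radius=2, thresh=0.8):
    """Propagate the anchor's disparity to neighboring dot pixels (step 2b).

    ir, ref: captured and reference dot-pattern images; disp: per-pixel disparity
    array (NaN = unknown). Each queued pixel is initialized with the anchor's
    shift and refined by a small search along its scanline.
    """
    h, w = ir.shape
    queue = deque([anchor])
    while queue:                                   # stop when the queue is empty
        y, x = queue.popleft()
        if not np.isnan(disp[y, x]):
            continue                               # already matched
        patch = ir[y - half:y + half + 1, x - half:x + half + 1]
        best_score, best_d = -1.0, None
        for d in range(int(anchor_shift) - radius, int(anchor_shift) + radius + 1):
            cand = ref[y - half:y + half + 1, x - d - half:x - d + half + 1]
            if cand.shape == patch.shape:
                score = ncc(patch, cand)
                if score > best_score:
                    best_score, best_d = score, d
        if best_score > thresh:                    # matched: keep disparity, grow further
            disp[y, x] = best_d
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if half <= ny < h - half and half <= nx < w - half:
                    queue.append((ny, nx))
```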

Projected IR vs. Natural Light Stereo
What are the advantages of IR? Works in low-light conditions; does not rely on having textured objects; not confused by repeated scene textures; the algorithm can be tailored to the produced pattern.
What are the advantages of natural light? Works outside, anywhere with sufficient light; uses less energy; resolution is limited only by the sensors, not by a projector.
Difficulties with both: very dark surfaces may not reflect enough light; specular reflection in mirrors or metal causes trouble.
Parts of this slide are adapted from Derek Hoiem (University of Illinois) 19

Example: The Kinect Sensor (v1) Microsoft Kinect (v1) was released in 2010 as a new kind of controller for the Xbox 360. Parts of this slide are adapted from Rob Miles (University of Hull) 20

Example: The Kinect Sensor The Kinect is able to capture depth and color images. For this, it contains two cameras and an infrared projector. It also has four microphones. Parts of this slide are adapted from Rob Miles (University of Hull) 21

Example: The Kinect Sensor The Kinect sensor contains a high quality video camera which can provide up to 1280x1024 resolution at 30 frames a second. Parts of this slide are adapted from Rob Miles (University of Hull) 22

Example: The Kinect Sensor IR Projector IR Camera The Kinect depth sensor uses an IR projector and an IR camera to measure the depth of objects in the scene in front of the sensor. Parts of this slide are adapted from Rob Miles (University of Hull) 23

Time of Flight (ToF) Time-of-Flight (ToF) imaging refers to the process of measuring the depth of a scene by quantifying the changes that an emitted light signal encounters when it bounces back from objects in the scene. Two common principles: Pulsed Modulation and Continuous Wave Modulation. 24

Time of Flight (ToF) Pulsed Modulation Measure the distance to a 3D object by measuring the absolute time a light pulse needs to travel from a source into the 3D scene and back, after reflection. The speed of light is constant and known, c ≈ 3·10^8 m/s. Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 25
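In formula form (the standard pulsed time-of-flight relation, not copied from the slide): for a measured round-trip time $t_{\text{round}}$,

\[ d = \frac{c \cdot t_{\text{round}}}{2}, \qquad c \approx 3 \cdot 10^{8}\ \text{m/s}. \]

For example, a round trip of 10 ns corresponds to d ≈ 1.5 m, which illustrates why picosecond-level timing accuracy is needed for millimetre-level depth resolution.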

Time of Flight (ToF) Pulsed Modulation
Advantages: direct measurement of time-of-flight; high-energy light pulses limit the influence of background illumination; illumination and observation directions are collinear.
Disadvantages: high-accuracy time measurement required; measurement of the light pulse return is inexact, due to light scattering; difficulty generating short light pulses with fast rise and fall times; usable light sources (e.g. lasers) suffer from low repetition rates for pulses.
Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 26

Time of Flight (ToF) Continuous Wave Modulation Microsoft Kinect v2 works with this principle Continuous light waves instead of short light pulses Modulation in terms of frequency of sinusoidal waves The detected wave after reflection has a shifted phase The phase shift is proportional to the distance from the reflecting surface Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 27

Time of Flight (ToF) Continuous Wave Modulation Microsoft Kinect v2 works with this principle Retrieve the phase shift by demodulation of the received signal Demodulation by cross-correlation of the received signal with the emitted signal Emitted sinusoidal signal: Received signal after reflection from the 3D surface: Cross-correlation of both signals: Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 28

Time of Flight (ToF) Microsoft Kinect v2 works with this principle Continuous Wave Modulation Cross-correlation function simplifies to: Sample at four sequential instants with different phase offsets: Directly obtain the sought parameters: Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 29
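The equations referenced on these two slides were lost in transcription; the standard continuous-wave formulation (a reconstruction, with modulation frequency f and sign conventions that may differ from the original slides) is:

\[ g(t) = \cos(2\pi f t), \qquad s(t) = b + a\,\cos(2\pi f t + \varphi), \qquad c(\tau) = \frac{a}{2}\cos(2\pi f \tau + \varphi) + b. \]

Sampling $A_i = c(\tau_i)$ at four sequential phase offsets of 0°, 90°, 180° and 270° directly yields

\[ \varphi = \arctan\frac{A_3 - A_1}{A_0 - A_2}, \qquad a = \sqrt{(A_3 - A_1)^2 + (A_0 - A_2)^2}, \qquad b = \frac{A_0 + A_1 + A_2 + A_3}{4}, \]

and the depth follows from the phase shift as $d = \frac{c}{4\pi f}\,\varphi$ (unambiguous only up to half the modulation wavelength).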

Time of Flight (ToF) Microsoft Kinect v2 works with this principle Continuous Wave Modulation
Advantages: a variety of light sources is available, as no short/strong pulses are required; applicable to different modulation techniques (other than frequency); simultaneous range and amplitude images.
Disadvantages: in practice, integration over time is required to reduce noise; frame rates are limited by the integration time; motion blur caused by long integration times.
Parts of this slide are adapted from Victor Castaneda and Nassir Navab (both University of Munich) 30

Depth Quality e.g. Kinect v1 Source: http://vision.in.tum.de/data/datasets/rgbd-dataset Main problems: resolution and noise 31

Depth Camera
- Disadvantages: 1. high noise (±15 mm); 2. low resolution (176×144); 3. high distortion.
+ Advantages: 1. real-time capture; 2. video frames with 2D/3D information.
[Figure: variance distribution in a depth image taken at approx. 1.5 m average distance from a scene. Depth images contain heavy noise near the corners.]

Applications Kinect Fusion Body Reconstruction 34

Kinect Fusion Paper link (ACM Symposium on User Interface Software and Technology, October 2011) YouTube Video

Challenges Tracking the camera precisely; fusing and de-noising measurements (depth estimates); avoiding drift; real-time; low-cost hardware. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 37

Proposed Solution Fast optimization for tracking, thanks to the high frame rate; a global framework for fusing data; interleaving tracking & mapping; using the Kinect to get depth data (low cost); using a GPU to get real-time performance (low cost). Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 38

Method Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 39

KinectFusion - Depth map The projector L projects a light spot P onto an object surface, and the camera O observes the spot. Given the baseline b and the angles α and β, the triangle (O, P, L) is solved for the depth d (standard structured lighting model). The output is a depth image (VGA, 11-bit). 40/72
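One common way to write this triangulation (an assumption about the slide's symbols: b is the baseline between O and L, and α, β are the angles the projector and camera rays make with the baseline): by the law of sines, the distance of P from the baseline is

\[ d = \frac{b \, \sin\alpha \, \sin\beta}{\sin(\alpha + \beta)}. \]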

KinectFusion - Vertex and normal map The vertex map is a 3D point cloud: each 2D depth pixel is back-projected to a 3D vertex using its depth value and the intrinsic matrix of the IR camera. The normal vector indicates the direction of the surface at a vertex and is computed from the cross product of vectors to neighboring points. 41/72
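A sketch of how the vertex and normal maps can be computed from a depth image and the IR-camera intrinsics K; the forward-difference neighbor choice and NumPy layout are illustrative simplifications, not the paper's exact formulation:

```python
import numpy as np

def vertex_map(depth, K):
    """Back-project each depth pixel: V(u,v) = D(u,v) * K^-1 * (u, v, 1)^T."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.dstack([x, y, depth])                    # (h, w, 3) point cloud

def normal_map(vertices):
    """Surface normal at each vertex from the cross product of neighboring differences."""
    du = np.roll(vertices, -1, axis=1) - vertices      # V(u+1, v) - V(u, v)
    dv = np.roll(vertices, -1, axis=0) - vertices      # V(u, v+1) - V(u, v)
    n = np.cross(du, dv)
    return n / (np.linalg.norm(n, axis=2, keepdims=True) + 1e-9)
```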

KinectFusion - Camera tracking Small motion between consecutive positions of the Kinect. Find correspondences using projective data association. Estimate the camera pose T_i by applying the ICP algorithm to the vertex and normal maps, tracking the camera pose over time. 42/72

Tracking Finding the camera position is the same as fitting the depth map of a frame onto the model. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 43

Tracking ICP algorithm ICP = iterative closest point. Goal: fit two 3D point sets. Already explained in the Structured Light lecture. Problem: what are the correspondences? KinectFusion's chosen solution: 1) start with T_0; 2) project the model onto the camera; 3) correspondences are points with the same image coordinates; 4) find a new T with least squares (on the 3D-3D point pairs); 5) apply T, and repeat 2-5 until convergence. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 44
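A much simplified sketch of one iteration of this loop, using projective data association and a closed-form point-to-point (Kabsch) update; the actual KinectFusion tracker minimizes a point-to-plane error on the GPU, so this is only meant to make steps 2-5 concrete. `model_vertices` (N×3 world points), the intrinsics K and the frame's vertex map are assumed inputs:

```python
import numpy as np

def icp_step(model_vertices, frame_vertex_map, T, K):
    """One ICP iteration. T: current estimate of the world-to-camera transform (4x4)."""
    h, w, _ = frame_vertex_map.shape
    # 2) transform the model into the current camera estimate and project it
    pts = (T[:3, :3] @ model_vertices.T).T + T[:3, 3]
    pts = pts[pts[:, 2] > 0]                       # keep points in front of the camera
    uv = (K @ pts.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # 3) correspondences: frame vertices at the same pixel coordinates
    src = pts[ok]
    dst = frame_vertex_map[v[ok], u[ok]]
    valid = dst[:, 2] > 0                          # skip holes in the depth map
    src, dst = src[valid], dst[valid]
    # 4) closed-form least-squares alignment of the matched 3D point sets
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                       # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    dT = np.eye(4)
    dT[:3, :3], dT[:3, 3] = R, t
    return dT @ T                                  # 5) apply; repeat 2-5 until convergence
```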

Tracking ICP algorithm Assumption: the frame and the model are roughly aligned. This holds because of the high frame rate. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 45

Mapping Mapping is fusing depth maps when the camera poses are known. Problems: measurements are noisy; depth maps have holes. Solution: use an implicit surface representation; fusing combines the estimates from all relevant frames. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 46

Mapping surface representation The surface is represented implicitly using a Truncated Signed Distance Function (TSDF) on a voxel grid. The numbers in the cells measure the voxel's distance to the surface. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 47

KinectFusion - Volumetric integration Volumetric representation (3×3×3 m, 512 voxels/axis). tsdf(g) ∈ (0,1] outside of the surface, tsdf(g) = 0 on the surface, tsdf(g) ∈ [-1,0) inside the surface. [Figure: TSDF volume grid] 48/72

Mapping Tracking Mapping Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 49

Mapping d = [pixel depth] − [distance from sensor to voxel] Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 50

Mapping Tracking Mapping Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 51

Mapping Tracking Mapping Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 52

Mapping Last step: the voxel's TSDF value D is the weighted average of all measurements. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 53
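A sketch of the projective TSDF update described on slides 48-53: for each voxel, the signed distance is (pixel depth) − (distance from the sensor to the voxel along the ray), truncated, and fused into the grid as a running weighted average. The flat grid layout, unit per-frame weights and truncation distance are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def integrate_tsdf(tsdf, weight, voxel_centers, depth, K, T_cam_from_world, trunc=0.03):
    """Fuse one depth frame into the TSDF volume.

    tsdf, weight: flat per-voxel arrays; voxel_centers: (N, 3) world coordinates of
    the voxel centers; depth: depth image in metres; K: IR intrinsics;
    T_cam_from_world: 4x4 pose mapping world points into the camera frame.
    """
    h, w = depth.shape
    pts = (T_cam_from_world[:3, :3] @ voxel_centers.T).T + T_cam_from_world[:3, 3]
    z = pts[:, 2]
    ok = z > 1e-6                                        # voxels in front of the camera
    u = np.zeros_like(z, dtype=int)
    v = np.zeros_like(z, dtype=int)
    uv = (K @ pts[ok].T).T
    u[ok] = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v[ok] = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    ok &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d_pix = np.zeros_like(z)
    d_pix[ok] = depth[v[ok], u[ok]]
    ok &= d_pix > 0                                      # ignore holes in the depth map
    sdf = d_pix - z                                      # positive in front of, negative behind the surface
    f = np.clip(sdf / trunc, -1.0, 1.0)                  # truncate to [-1, 1]
    upd = ok & (sdf > -trunc)                            # do not carve space far behind the surface
    # running weighted average of all measurements (unit weight per frame here)
    w_new = weight[upd] + 1.0
    tsdf[upd] = (tsdf[upd] * weight[upd] + f[upd]) / w_new
    weight[upd] = w_new
```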

Handling drift Drift would occur if tracking were done from frame to frame. Thus, tracking is done against the built model. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 55

KinectFusion - Surface rendering Ray-casting technique: cast a ray through the focal point for each pixel; traverse the voxels along the ray; find the first surface by observing the sign change of tsdf(g); compute the intersection point using the points around the surface boundary. [Figure: ray through the TSDF volume grid hitting the surface] YouTube Video 56/72
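A sketch of the per-pixel ray march; `tsdf_at` is a hypothetical lookup into the volume (in the real system a trilinear interpolation of the voxel grid), and the step size and ray bounds are illustrative:

```python
import numpy as np

def cast_ray(origin, direction, tsdf_at, t_near=0.4, t_far=4.0, step=0.01):
    """Return the first surface intersection along a ray through the TSDF volume, or None.

    tsdf_at(p) is assumed to return the (truncated) signed distance at world point p.
    """
    direction = direction / np.linalg.norm(direction)
    t = t_near
    prev_t, prev_f = t, tsdf_at(origin + t * direction)
    while t < t_far:
        t += step
        f = tsdf_at(origin + t * direction)
        if prev_f > 0.0 and f <= 0.0:              # sign change: the ray crossed the surface
            # linear interpolation of the zero crossing between the two samples
            t_hit = prev_t + step * prev_f / (prev_f - f)
            return origin + t_hit * direction
        prev_t, prev_f = t, f
    return None
```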

Pros & Cons Pros: nice results; real-time performance (30 Hz); dense model; no drift with local optimization; elegant solution. Cons: the 3D grid cannot be trivially up-scaled. Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 57

Limitations Doesn't work for large areas (voxel grid) Doesn't work far away from objects (active ranging) Doesn't work well outdoors (IR) Requires a powerful graphics card Uses lots of battery (active ranging) Parts of this slide are adapted from Richard A. Newcombe (Imperial College London) and Boaz Petersil (Israel Institute of Technology) 58

Thank you! 28.01.2015 60