Lecture 19: Depth Cameras. Visual Computing Systems, CMU, Fall 2013


Continuing theme: computational photography
Cameras capture light, then extensive processing produces the desired image.
Today: capturing scene depth in addition to light intensity.

Why might we want to know the depth of scene objects?
- Scene understanding
- Navigation
- Object tracking
- Mapping
- Segmentation
Credit: Blaschko et al., CVPR 13 Tutorial

Depth from time-of-flight
Conventional LIDAR:
- Laser beam scans the scene (rotating mirror)
- Low frame rate to capture the entire scene
Time-of-flight cameras:
- No moving beam: capture an entire image of the scene with each light pulse
- A special CMOS sensor records a depth image
- High frame rate
- Historically, TOF cameras were low resolution and expensive
- A TOF camera is featured in the upcoming XBox One depth sensor (today we will first talk about the original XBox 360 implementation)

Computing depth from images: binocular stereo
3D reconstruction of point P: depth from disparity.
Focal length: f. Baseline: b. Disparity: d = x_l - x_r (the offset between P's projections in the left and right images).
Depth: z = f * b / d
Simple reconstruction example: cameras aligned (coplanar sensors), separated by a known distance, same focal length.
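As a minimal sketch of this relation (my own illustration, not from the slides; the parameter names and the mapping of zero-disparity pixels to infinity are assumptions):

```python
import numpy as np

def depth_from_disparity(disparity_px, f_px, baseline_m):
    """Depth from disparity for rectified, coplanar cameras: z = f * b / d.

    disparity_px: per-pixel disparity d = x_l - x_r, in pixels
    f_px:         focal length, in pixels
    baseline_m:   camera separation b, in meters
    Returns depth in meters; zero-disparity pixels (points at infinity,
    or failed matches) map to +inf.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, (f_px * baseline_m) / d, np.inf)
```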

Correspondence problem
How do we determine which pairs of pixels in image 1 and image 2 correspond to the same scene point?

Epipolar constraint
Goal: determine pixel correspondence. Corresponding pixels = pairs of pixels that correspond to the same scene point P.
[Figure: scene point P, the epipolar plane through P and the two camera centers, and the epipolar line in each image where that plane intersects it (the image of the line through P).]
Epipolar constraint: reduces the correspondence problem to a 1D search along conjugate epipolar lines. A point in the left image will lie on a line in the right image (its epipolar line).
Slide credit: S. Narasimhan

Solving correspondence (basic algorithm)
For each epipolar line:
  For each pixel in the left image:
    - Compare with every pixel on the same epipolar line in the right image
    - Pick the pixel with minimum match cost
What are the assumptions?
Basic improvements: match windows, adaptive-size match windows, ...
- This should sound familiar given our discussion of image processing algorithms: correlation, sum of squared differences (SSD), etc.
Slide credit: S. Narasimhan
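A hedged sketch of this scanline search using SSD window matching (my illustration; the window size, search range, and brute-force numpy loop are assumptions, not the lecture's implementation):

```python
import numpy as np

def scanline_disparity(left, right, max_disp=64, half_win=3):
    """Brute-force window matching along rectified epipolar lines (rows).

    For each left-image pixel, compare its (2*half_win+1)^2 window against
    windows at every candidate disparity on the same row of the right image,
    and keep the disparity with minimum SSD cost.
    left, right: 2D float arrays (rectified grayscale images)
    Returns an integer disparity map (0 where no full window fits).
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            patch = left[y - half_win:y + half_win + 1,
                         x - half_win:x + half_win + 1]
            costs = [
                np.sum((patch - right[y - half_win:y + half_win + 1,
                                      x - d - half_win:x - d + half_win + 1]) ** 2)
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp
```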

Solving correspondence: robustness challenges
- Scenes with no texture (many parts of the scene look the same)
- Non-Lambertian surfaces (surface appearance depends on view)
- Pixel pairs may not be present (a point on a surface may be occluded from one view)

Alternative: depth from defocus
Aperture: a. Circle of confusion: c. A scene point P at depth z focuses at image distance z'; when the sensor is not at z', P blurs into a circle of confusion whose diameter c grows with the defocus. By similar triangles, c relates the aperture a to the mismatch between the sensor position and z'; then use the thin-lens approximation 1/f = 1/z + 1/z' to obtain z from z'.
Avoids the correspondence problem of depth from disparity, but the system must know the location of sharp edges in the scene to estimate the circle of confusion c.
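A worked sketch of that two-step recovery (my illustration; the known sensor distance s and the sign choice, which assumes the object lies beyond the focused plane, are assumptions):

```python
def depth_from_defocus(c, a, f, s):
    """Recover depth z from a measured circle of confusion.

    c: circle-of-confusion diameter on the sensor (same units as a, f, s)
    a: aperture diameter
    f: focal length
    s: lens-to-sensor distance
    Step 1 (similar triangles): c / a = (s - z_img) / z_img for an object
    whose image focuses in front of the sensor, so z_img = s / (1 + c / a).
    Step 2 (thin lens): 1/f = 1/z + 1/z_img  =>  z = 1 / (1/f - 1/z_img).
    """
    z_img = s / (1.0 + c / a)   # in-focus image distance for the point
    return 1.0 / (1.0 / f - 1.0 / z_img)
```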

Structured light
System: one light source emitting a known beam + one camera measuring scene appearance.
If the scene is at a reference plane (depth z_ref), the image that will be recorded by the camera is known (the correspondence between each pixel in the recorded image and its scene point is known).
[Figure: known light source and camera separated by baseline b, camera focal length f; a surface at depth z shifts the observed spot by disparity d relative to its reference-plane image position x.]
A single-spot illuminant is inefficient! (Must scan the scene with the spot to get depth, so high latency to retrieve a single depth image.)
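A hedged sketch of the resulting triangulation (my derivation from the stereo relation above, not given on the slide): treating the illuminant and camera as a stereo pair, a point at depth z projects with offset f*b/z, so the spot's shift relative to its reference-plane position is d = f*b*(1/z - 1/z_ref), which inverts to:

```python
def depth_from_reference_disparity(d_px, f_px, baseline_m, z_ref_m):
    """Depth from the spot's image shift relative to the reference plane.

    d = f*b*(1/z - 1/z_ref)  (positive d for points closer than z_ref,
    under this sign convention)
    =>  z = f*b / (d + f*b/z_ref)
    """
    fb = f_px * baseline_m
    return fb / (d_px + fb / z_ref_m)
```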

Structured light
Simplify the correspondence problem by encoding spatial position in the illuminant.
[Image: Zhang et al.: projected light pattern and the resulting camera image.]
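One classic way to encode position (a hedged illustration of the idea, not the specific Zhang et al. pattern): project ceil(log2(W)) binary Gray-code stripe images, so each projector column receives a unique bit string that the camera can decode per pixel:

```python
import numpy as np

def gray_code_patterns(width=1024):
    """Binary stripe patterns assigning each projector column a unique code.

    Returns a list of ceil(log2(width)) 1D patterns (0/1 per column);
    stacking the bits observed at a camera pixel across the projected
    sequence decodes which projector column illuminated it, i.e. the
    correspondence, with no search.
    """
    n_bits = int(np.ceil(np.log2(width)))
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)  # binary-reflected Gray code of column index
    return [(gray >> bit) & 1 for bit in reversed(range(n_bits))]
```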

Microsoft XBox 360 Kinect (image credit: ifixit)
- Illuminant: infrared laser + diffuser
- RGB CMOS sensor: 640x480 (with Bayer mosaic)
- Monochrome infrared CMOS sensor (Aptina MT9M001): 1280x1024 **
** Kinect returns a 640x480 disparity image; the sensor is likely configured for 2x2 pixel binning down to 640x512, followed by a crop.

Infrared image of Kinect illuminant output (credit: www.futurepicture.org)

Computing disparity for the scene
Region-growing algorithm for compute efficiency ** (assumption: spatial locality likely implies depth locality); see the sketch after this list.
1. Classify pixels in the infrared image as UNKNOWN or SHADOW (based on whether a speckle is found).
2. While a significantly large percentage of output pixels are UNKNOWN:
   - Choose an UNKNOWN pixel.
   - Correlate the surrounding MxN pixel window with the reference image to compute disparity D = (dx, dy) (note: the search window is a horizontal swath of the image, plus some vertical slack).
   - If a sufficiently good correlation is found:
     - Mark the pixel as a region anchor (its depth is known).
     - Attempt to grow a region around the anchor pixel:
       - Place the region anchor in a FIFO, mark it ACTIVE.
       - While the FIFO is not empty:
         - Extract pixel P from the FIFO (known disparity for P is D).
         - Attempt to establish correlations for UNKNOWN neighboring pixels P_n of P (left, right, top, bottom neighbors) by searching the region given by P_n + D + (±1, ±1).
         - If a correlation is found, mark P_n as ACTIVE, set its parent to P, and add it to the FIFO. Else, mark P_n as EDGE and set its depth to the depth of P.
** Source: PrimeSense patent WO 2007/043036 A1
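A condensed, hedged sketch of that region-growing loop (my paraphrase of the patent description above; the normalized-cross-correlation cost, window size, threshold, and 1D horizontal seed search are assumptions):

```python
import numpy as np
from collections import deque

def region_grow_disparity(ir, ref, win=9, max_dx=64, thresh=0.8):
    """Region-growing disparity search, per the PrimeSense patent sketch.

    ir, ref: 2D float arrays (captured infrared image, reference image)
    Seed pixels do a wide 1D search along their row; neighbors of a matched
    pixel only search +/-1 pixel around the propagated disparity.
    Returns a float disparity map (NaN where no match was found).
    """
    h, w = ir.shape
    r = win // 2
    disp = np.full((h, w), np.nan)

    def ncc(y, x, d):
        """Normalized cross-correlation: ir window at (y,x) vs ref at (y,x-d)."""
        if not (r <= y < h - r and r <= x - d and x + r < w and x - d + r < w):
            return -1.0
        a = ir[y - r:y + r + 1, x - r:x + r + 1].ravel()
        b = ref[y - r:y + r + 1, x - d - r:x - d + r + 1].ravel()
        a, b = a - a.mean(), b - b.mean()
        denom = np.sqrt((a @ a) * (b @ b))
        return (a @ b) / denom if denom > 0 else -1.0

    unknown = {(y, x) for y in range(r, h - r) for x in range(r, w - r)}
    while unknown:
        y, x = unknown.pop()                     # pick an UNKNOWN seed pixel
        best, d0 = max((ncc(y, x, d), d) for d in range(max_dx))
        if best < thresh:
            continue                             # poor seed; try another
        disp[y, x] = d0                          # region anchor
        fifo = deque([(y, x, d0)])
        while fifo:                              # grow region around anchor
            py, px, pd = fifo.popleft()
            for ny, nx in ((py-1, px), (py+1, px), (py, px-1), (py, px+1)):
                if (ny, nx) not in unknown:
                    continue
                unknown.discard((ny, nx))
                s, nd = max((ncc(ny, nx, pd + dd), pd + dd) for dd in (-1, 0, 1))
                if s >= thresh:
                    disp[ny, nx] = nd            # ACTIVE: keep growing
                    fifo.append((ny, nx, nd))
                else:
                    disp[ny, nx] = pd            # EDGE: inherit parent depth
    return disp
```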

Kinect block diagram
Disparity calculations are performed by a PrimeSense ASIC in the Kinect, not by the XBox 360 CPU.
[Diagram: infrared sensor, RGB sensor, and illuminant feed the Kinect image-processing ASIC, which sends a 640x480 x 30fps RGB image and a 640x480 x 30fps disparity image over the USB bus to the XBox 360 CPU.]
- Cheap sensors: ~1 MPixel
- Cheap illuminant: laser + diffuser makes a random dot pattern (not a traditional projector)
- Custom image-processing ASIC to compute the disparity image (scale-invariant region correlation involves non-trivial compute cost)

Extracting the player's skeleton [Shotton et al. 2011] (enabling full-body game input)
Challenge: how to determine the player's position and motion from (noisy) depth images... without consuming a large fraction of the XBox 360's compute capability?
[Figure: depth image -> character joint angles]

Key idea: classify pixels into body regions [Shotton et al. 2011]. Shotton et al. represent the body with 31 regions.

Pixel classification [Shotton et al. 2011]
For each pixel x, compute features from the depth image:
f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x))
where theta = (u, v) is a pair of pixel offsets and d_I(x) is the depth image value at pixel x (normalizing the offsets by d_I(x) makes the features depth-invariant). Two example depth features.
Features are cheap to compute and can be computed for all pixels in parallel.
- They do not depend on velocities: only information from the current frame.
Classify pixels into body parts using a randomized decision forest classifier:
- Trained on 100K motion-capture poses + a database of rendered images as ground truth.
Result of classification: P(c | I, x), the probability that pixel x in depth image I is body part c.
Per-pixel probabilities are pooled to compute a 3D spatial density function for each body part c (joint angles are inferred from this density).
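A hedged sketch of these depth-invariant features (the large-constant convention for out-of-image probes follows the paper's description; the exact constant and the assumption that background pixels of d_img are pre-set to that constant are mine):

```python
import numpy as np

BIG = 1e6  # treat background / out-of-image probes as very deep

def depth_probe(d_img, ys, xs):
    """Depth at integer pixel coords, with out-of-bounds probes mapped to BIG."""
    h, w = d_img.shape
    ys, xs = np.round(ys).astype(int), np.round(xs).astype(int)
    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out = np.full(ys.shape, BIG)
    out[inside] = d_img[ys[inside], xs[inside]]
    return out

def depth_feature(d_img, u, v):
    """f_theta(I, x) = d(x + u/d(x)) - d(x + v/d(x)) for every pixel x.

    u, v: (dy, dx) offsets; dividing by d(x) shrinks the probe offset for
    far pixels, making the feature depth-invariant. Assumes d_img > 0
    everywhere (background already set to BIG).
    """
    h, w = d_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    d = d_img.astype(np.float64)
    fu = depth_probe(d_img, ys + u[0] / d, xs + u[1] / d)
    fv = depth_probe(d_img, ys + v[0] / d, xs + v[1] / d)
    return fu - fv
```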

Performance result
Real-time skeleton estimation from the depth image requires < 10% of the XBox 360 CPU.
An XBox 360 GPU-based implementation runs at 200 Hz (research implementation described in the publication, not used in the product).
- The actual XBox 360 product implementation is likely far more efficient.

XBox 360 + Kinect system
[Diagram: in the Kinect, the infrared sensor, RGB sensor, and illuminant feed the image-processing ASIC, which performs the disparity computations (creating the depth image); the 640x480 x 30fps RGB image and 640x480 x 30fps disparity image travel over the USB bus to the XBox 360, where skeleton inference runs on the CPUs (CPU + CPU + CPU + GPU, 1 MB shared L2, 10 MB embedded DRAM).]

Xbox 360 Kinect summary
Hardware = cheap depth sensor + custom image-processing ASIC
- Structured light pattern generated by scattering an infrared laser
- Depth obtained from triangulation (depth from disparity), not time-of-flight
- Custom ASIC converts the infrared image into depth values (the high computational cost is searching for correspondences)
Interpretation of depth values is performed on the CPU
- Low-cost, data-parallel skeleton estimation made computationally feasible by a machine learning approach

Xbox One sensor
Time-of-flight sensor (not based on structured light like the original Kinect)
- 1080p depth sensing + wider field of view than the original Kinect
Computer vision challenges in obtaining a high-quality signal:
- Flying pixels
- Segmentation
- Motion blur
Another TOF camera: Creative Depth Camera

Time-of-flight cameras
Measure the phase offset of light reflected off the environment; the phase shift is proportional to the distance to the object.
Image credit: V. Castaneda and N. Navab, http://campar.in.tum.de/twiki/pub/chair/teachingss11kinect/2011-dsensors_labcourse_kinect.pdf
Many computer vision challenges to achieving a high-quality depth estimate [A. Kolb et al. 2009] [Reynolds et al. CVPR 2011]:
- Measurement error
- Motion blur
- Flying pixels
- Segmentation
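For continuous-wave TOF (the common scheme; the modulation frequency and four-sample demodulation here are standard textbook conventions, not details from the slide), the measured phase maps to distance as z = c * dphi / (4 * pi * f_mod):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(phase_rad, f_mod_hz=30e6):
    """Distance from measured phase shift for continuous-wave TOF.

    Light travels to the object and back (2z), so the round-trip delay is
    dphi / (2*pi*f_mod), giving z = c * dphi / (4*pi*f_mod). The range is
    ambiguous beyond c / (2*f_mod) (~5 m at 30 MHz modulation).
    """
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

def phase_from_samples(a0, a1, a2, a3):
    """Standard four-bucket demodulation: samples at 0, 90, 180, 270 degrees."""
    return math.atan2(a3 - a1, a0 - a2)
```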

Reading
KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera. S. Izadi et al., UIST '11.
- Note: you may wish to Google the ICP algorithm (iterative closest point).