Kinect Device. How the Kinect Works. Kinect Device. What the Kinect does 4/27/16. Subhransu Maji Slides credit: Derek Hoiem, University of Illinois

Similar documents
Human Body Recognition and Tracking: How the Kinect Works. Kinect RGB-D Camera. What the Kinect Does. How Kinect Works: Overview

3D Computer Vision. Depth Cameras. Prof. Didier Stricker. Oliver Wasenmüller

Stereo. 11/02/2012 CS129, Brown James Hays. Slides by Kristen Grauman

Binocular stereo. Given a calibrated binocular stereo pair, fuse it to produce a depth image. Where does the depth information come from?

Epipolar Geometry and Stereo Vision

Stereo vision. Many slides adapted from Steve Seitz

Epipolar Geometry and Stereo Vision

Stereo. Many slides adapted from Steve Seitz

Lecture 19: Depth Cameras. Visual Computing Systems CMU , Fall 2013

CS4495/6495 Introduction to Computer Vision. 3B-L3 Stereo correspondence

Stereo: Disparity and Matching

BIL Computer Vision Apr 16, 2014

Stereo II CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz

CS4495/6495 Introduction to Computer Vision

Real-Time Human Pose Recognition in Parts from Single Depth Images

Epipolar Geometry and Stereo Vision

The Kinect Sensor. Luís Carriço FCUL 2014/15

Lecture 9 & 10: Stereo Vision

Image Based Reconstruction II

Depth Sensors Kinect V2 A. Fornaser

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Lecture 10: Multi view geometry

Data-driven Depth Inference from a Single Still Image

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Final project bits and pieces

CS Decision Trees / Random Forests

Stereo. Outline. Multiple views 3/29/2017. Thurs Mar 30 Kristen Grauman UT Austin. Multi-view geometry, matching, invariant features, stereo vision

Chaplin, Modern Times, 1936

EECS 442 Computer vision. Stereo systems. Stereo vision Rectification Correspondence problem Active stereo vision systems

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake CVPR 2011

Public Library, Stereoscopic Looking Room, Chicago, by Phillips, 1923

Epipolar Geometry and Stereo Vision

CS 2770: Intro to Computer Vision. Multiple Views. Prof. Adriana Kovashka University of Pittsburgh March 14, 2017

Lecture'9'&'10:'' Stereo'Vision'

Fundamental matrix. Let p be a point in left image, p in right image. Epipolar relation. Epipolar mapping described by a 3x3 matrix F

Lecture 10: Multi-view geometry

Multiple View Geometry

Introduction à la vision artificielle X

Multiple View Geometry

There are many cues in monocular vision which suggests that vision in stereo starts very early from two similar 2D images. Lets see a few...

CS5670: Computer Vision

Fundamentals of Stereo Vision Michael Bleyer LVA Stereo Vision

Project 2 due today Project 3 out today. Readings Szeliski, Chapter 10 (through 10.5)

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Cameras and Stereo CSE 455. Linda Shapiro

Computer Vision Lecture 17

Computer Vision Lecture 17

Lecture 14: Computer Vision

Lecture 6 Stereo Systems Multi-view geometry

Kinect Cursor Control EEE178 Dr. Fethi Belkhouche Christopher Harris Danny Nguyen I. INTRODUCTION

Recap from Previous Lecture

Recap: Features and filters. Recap: Grouping & fitting. Now: Multiple views 10/29/2008. Epipolar geometry & stereo vision. Why multiple views?

What have we leaned so far?

Background subtraction in people detection framework for RGB-D cameras

Towards a Simulation Driven Stereo Vision System

Miniature faking. In close-up photo, the depth of field is limited.

Complex Sensors: Cameras, Visual Sensing. The Robotics Primer (Ch. 9) ECE 497: Introduction to Mobile Robotics -Visual Sensors

Stereo Vision A simple system. Dr. Gerhard Roth Winter 2012

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

3D Photography: Stereo

Stereo and Epipolar geometry

Project 3 code & artifact due Tuesday Final project proposals due noon Wed (by ) Readings Szeliski, Chapter 10 (through 10.5)

Key Developments in Human Pose Estimation for Kinect

Today. Stereo (two view) reconstruction. Multiview geometry. Today. Multiview geometry. Computational Photography

A virtual tour of free viewpoint rendering

Computer Vision I. Announcements. Random Dot Stereograms. Stereo III. CSE252A Lecture 16

CEng Computational Vision

Multi-view stereo. Many slides adapted from S. Seitz

3D Scanning. Qixing Huang Feb. 9 th Slide Credit: Yasutaka Furukawa

Announcements. Stereo Vision Wrapup & Intro Recognition

LEARNING BOUNDARIES WITH COLOR AND DEPTH. Zhaoyin Jia, Andrew Gallagher, Tsuhan Chen

CS 1674: Intro to Computer Vision. Midterm Review. Prof. Adriana Kovashka University of Pittsburgh October 10, 2016

Depth Cameras. Didier Stricker Oliver Wasenmüller Lecture 3D Computer Vision

3D Photography: Active Ranging, Structured Light, ICP

EE795: Computer Vision and Intelligent Systems

Image Segmentation continued Graph Based Methods. Some slides: courtesy of O. Capms, Penn State, J.Ponce and D. Fortsyth, Computer Vision Book

Depth from two cameras: stereopsis

Practice Exam Sample Solutions

Stereo Vision II: Dense Stereo Matching

Introduction to Computer Vision. Srikumar Ramalingam School of Computing University of Utah

Depth from two cameras: stereopsis

Contexts and 3D Scenes

Multi-view Stereo. Ivo Boyadzhiev CS7670: September 13, 2011

Stereo and structured light

CS 4495/7495 Computer Vision Frank Dellaert, Fall 07. Dense Stereo Some Slides by Forsyth & Ponce, Jim Rehg, Sing Bing Kang

Computer Vision I. Announcement. Stereo Vision Outline. Stereo II. CSE252A Lecture 15

Project 4 Results. Representation. Data. Learning. Zachary, Hung-I, Paul, Emanuel. SIFT and HoG are popular and successful.

Problem Set 7 CMSC 426 Assigned Tuesday April 27, Due Tuesday, May 11

Lecture 6 Stereo Systems Multi- view geometry Professor Silvio Savarese Computational Vision and Geometry Lab Silvio Savarese Lecture 6-24-Jan-15

Global Depth from Epipolar Volumes - A General Framework for Reconstructing Non-Lambertian Surfaces

3D Photography: Stereo Matching

Geometric Reconstruction Dense reconstruction of scene geometry

Contexts and 3D Scenes

Outline. Segmentation & Grouping. Examples of grouping in vision. Grouping in vision. Grouping in vision 2/9/2011. CS 376 Lecture 7 Segmentation 1

Passive 3D Photography

Local features: detection and description. Local invariant features

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

3D Computer Vision 1

Mosaics wrapup & Stereo

Segmentation and Grouping

Transcription:

4/27/16 Kinect Device How the Kinect Works T2 Subhransu Maji Slides credit: Derek Hoiem, University of Illinois Photo frame-grabbed from: http://www.blisteredthumbs.net/2010/11/dance-central-angry-review Kinect Device What the Kinect does Get Depth Image Application (e.g., game) illustration source: primesense.com Estimate Body Pose 1

How Kinect Works: Overview IR Projector Part 1: Stereo from projected dots IR Projector IR Sensor Projected Light Pattern IR Sensor Projected Light Pattern Stereo Algorithm Stereo Algorithm Segmentation, Part Prediction Segmentation, Part Prediction Depth Image Body Pose Depth Image Body Pose Part 1: Stereo from projected dots 1. Overview of depth from stereo Depth from Stereo Images image 1 image 2 2. How it works for a projector/sensor pair 3. Stereo algorithm used by Primesense (Kinect) Dense depth map Some of following slides adapted from Steve Seitz and Lana Lazebnik 2

Depth from Stereo Images Stereo and the Epipolar constraint Goal: recover depth by finding image coordinate x that corresponds to x x x x x x x x z x' f f C Baseline B C Potential matches for x have to lie on the corresponding line l. Potential matches for x have to lie on the corresponding line l. Simplest Case: Parallel images Image planes of cameras are parallel to each other and to the baseline Camera centers are at same height Focal lengths are the same Then, epipolar lines fall along the horizontal scan lines of the images Basic stereo matching algorithm For each pixel in the first image Find corresponding epipolar line in the right image Examine all pixels on the epipolar line and pick the best match Triangulate the matches to get depth informaon 3

Depth from disparity Basic stereo matching algorithm x x = O O f z x x z f f O Baseline O B B f disparity = x x = z Disparity is inversely proportional to depth. If necessary, recfy the two stereo images to transform epipolar lines into scanlines For each pixel x in the first image Find corresponding epipolar scanline in the right image Examine all pixels on the scanline and pick the best match x Compute disparity x-x and set depth(x) = fb/(x-x ) Correspondence search Correspondence search Left Right Left Right scanline scanline Matching cost disparity Slide a window along the right scanline and compare contents of that window with the reference window in the le[ image Matching cost: SSD or normalized correlaon SSD 4

Correspondence search Left Right Results with window search Data scanline Window-based matching Ground truth Norm. corr Add constraints and solve with graph cuts Failures of correspondence search Before Textureless surfaces Occlusions, repetition Graph cuts Ground truth Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001 For the latest and greatest: http://www.middlebury.edu/stereo/ Non-Lambertian surfaces, specularities 5

4/27/16 Dot Projecons Depth from Projector-Sensor Only one image: How is it possible to get depth? Scene Surface http://www.youtube.com/ watch?v=28jwgxbqx8w Projector Same stereo algorithms apply Projector Sensor Example: Book vs. No Book Source: http://www.futurepicture.org/?p=97 Sensor 6

4/27/16 Example: Book vs. No Book Source: http://www.futurepicture.org/?p=97 Region-growing Random Dot Matching 1. Detect dots ( speckles ) and label them unknown 2. Randomly select a region anchor, a dot with unknown depth a. Windowed search via normalized cross correlaon along scanline Check that best match score is greater than threshold; if not, mark as invalid and go to 2 b. Region growing 1. 2. 3. Neighboring pixels are added to a queue For each pixel in queue, inialize by anchor s shi[; then search small local neighborhood; if matched, add neighbors to queue Stop when no pixels are le[ in the queue 3. Stop when all dots have known depth or are marked invalid http://www.wipo.int/patentscope/search/en/wo2007043036 Projected IR vs. Natural Light Stereo What are the advantages of IR? Works in low light condions Does not rely on having textured objects Not confused by repeated scene textures Can tailor algorithm to produced paeern What are advantages of natural light? Works outside, anywhere with sufficient light Uses less energy Resoluon limited only by sensors, not projector Part 2: Pose from depth IR Projector IR Sensor Projected Light Pattern Stereo Algorithm Segmentation, Part Prediction Difficules with both Very dark surfaces may not reflect enough light Specular reflecon in mirrors or metal causes trouble Depth Image Body Pose 7

Goal: esmate pose from depth image Goal: esmate pose from depth image RGB Depth Part Label Map Joint Positions Real-Time Human Pose Recognition in Parts from a Single Depth Image Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake CVPR 2011 http://research.microsoft.com/apps/video/ default.aspx?id=144455 Challenges Lots of variaon in bodies, orientaon, poses Needs to be very fast (their algorithm runs at 200 FPS on the box 360 GPU) Extract body pixels by thresholding depth Pose Examples Examples of one part 8

Basic learning approach Very simple features Lots of data Get lots of training data Capture and sample 500K mocap frames of people kicking, driving, dancing, etc. Get 3D models for 15 bodies with a variety of weight, height, etc. Synthesize mocap data for all 15 body types Flexible classifier Body models Features Difference of depth at two offsets Offset is scaled by depth at center 9

Part predicon with random forests Randomized decision forests: collecon of independently trained trees Each tree is a classifier that predicts the likelihood of a pixel belonging to each part Node corresponds to a thresholded feature The leaf node that an example falls into corresponds to a conjuncon of several features In training, at each node, a subset of features is chosen randomly, and the most discriminave is selected Joint esmaon Joints are esmated using mean-shi[ (a fast mode-finding algorithm) Observed part center is offset by preesmated value Results More results Ground Truth 10

Accuracy vs. Number of Training Examples Uses of Kinect Mario: hep://www.youtube.com/watch?v=8ctjl5lujhg Robot Control: hep://www.youtube.com/watch?v=w8bmgtmkfby Capture for holography: hep://www.youtube.com/watch?v=4lw8wgmfpte Virtual dressing room: hep://www.youtube.com/watch?v=1jbvnk1t4vq Fly wall: hep://vimeo.com/user3445108/kiwibankinteracvewall 3D Scanner: hep://www.youtube.com/watch?v=v7lthroesw To learn more Warning: lots of wrong info on web Great site by Daniel Reetz: hep://www.futurepicture.org/?p=97 Kinect patents: hep://www.faqs.org/patents/app/20100118123 hep://www.faqs.org/patents/app/20100020078 hep://www.faqs.org/patents/app/20100007717 Next week Tues ICES forms (important!) Wrap-up, proj 5 results Normal office hours + feel free to stop by other mes on Tues, Thurs Try to stop by instead of e-mail except for one-line answer kind of things Final project reports due Thursday at midnight Friday Final project presentaons at 1:30pm If you re in a jam for final project, let me know early 11