The Kinect Sensor. Luís Carriço FCUL 2014/15


Advanced Interaction Techniques
The Kinect Sensor
Luís Carriço, FCUL 2014/15

Sources:
  MS Kinect for Xbox 360
  John C. Tang. Using Kinect to explore NUI, MS Research, from Stanford CS247
  Shotton et al. Real-Time Human Pose Recognition in Parts from Single Depth Images, CVPR 2011
  Larry Zitnick. Kinect Case Study, CSE 576
  John MacCormick. How does the Kinect work?

The Kinect Sensor

The RGB Camera
  640 x 480-pixel resolution, running at 30 FPS (frames per second)

The Depth Sensor
  Technology: structured IR light
  [Figure: depth image annotated with "near", "far", "shadow", and "missing pixels (non-IR-reflective)"]

The Depth Sensor
  An infrared projector
  An infrared camera
  640 x 480-pixel resolution, running at 30 FPS
  Depth values in mm, measured from the camera
  Structured light (Zhang et al., 3DPVT 2002)
  Plus other algorithms, e.g. depth from focus

How does it work? Structured light 3D scanner
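A structured light scanner recovers depth by triangulation: the projector and the camera sit a known distance apart, so a projected dot appears shifted in the camera image by an amount that depends on how far away the surface is. The sketch below uses the textbook relation depth = focal length x baseline / disparity. It is a simplification, not the Kinect's actual pipeline, and the function name, focal length, and baseline are illustrative assumptions rather than values from the slides.

```python
def depth_from_disparity_mm(disparity_px, focal_length_px, baseline_mm):
    """Triangulate depth (mm) from the shift of a projected dot, in pixels.

    Textbook stereo relation: depth = focal_length * baseline / disparity.
    """
    if disparity_px <= 0:
        # The dot could not be matched (e.g. non-IR-reflective surface or shadow):
        # this corresponds to a "missing pixel" in the depth map.
        return None
    return focal_length_px * baseline_mm / disparity_px


# Assumed, illustrative values (not taken from the slides).
FOCAL_PX = 580.0     # focal length of the IR camera, in pixels
BASELINE_MM = 75.0   # projector-to-camera distance, in mm

for disparity in (10.0, 20.0, 40.0):
    depth = depth_from_disparity_mm(disparity, FOCAL_PX, BASELINE_MM)
    print(f"disparity {disparity:5.1f} px -> depth {depth:7.1f} mm")
```

Larger shifts correspond to nearer surfaces, which is why depth resolution degrades with distance: far away, a one-pixel change in disparity covers a much larger change in depth.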

[Figure panels: "Book" / "No book"] Source: http://www.futurepicture.org/?p=97


The Depth Map
  [Figure: depth map shown in top view and side view]

RGB vs. depth for pose estimation
RGB:
  Only works when well lit
  Background clutter
  Scale unknown
  Clothing, skin colour
Depth:
  Works in low light
  Person pops out from the background
  Scale known
  Uniform texture
  Shadows, missing pixels

Skeletal - Provided Data
  Skeleton space coordinates are expressed in meters

Skeleton Recognition
Two main steps:
  1. Find body parts
  2. Compute joint positions
Pipeline:
  1. Capture depth image & remove background
  2. Infer body parts per pixel
  3. Cluster pixels to hypothesize body joint positions
  4. Fit model & track skeleton

Body part recognition
No temporal information:
  frame-by-frame
Local pose estimate of parts:
  each pixel & each body joint treated independently
  reduced training data and computation time
Very fast:
  simple depth image features
  parallel decision forest classifier

Features
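The depth image features described in the cited Shotton et al. paper compare the depth at two probe points around a pixel: f(I, x) = d(x + u/d(x)) - d(x + v/d(x)), where d(x) is the depth at pixel x and u, v are offsets. Dividing the offsets by d(x) makes the probes cover the same physical distance whether the person is near or far. The sketch below is a minimal illustration; the function names, the background constant, and the toy image are assumptions for the example.

```python
BACKGROUND = 1_000_000.0  # large depth used for background or off-image probes (assumed value)


def depth_at(depth, x, y):
    """Depth in mm at pixel (x, y); missing, background, or off-image pixels read as very far."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]) and depth[y][x] > 0:
        return float(depth[y][x])
    return BACKGROUND


def depth_feature(depth, x, y, u, v):
    """Depth comparison feature: d(x + u/d(x)) - d(x + v/d(x)).

    u and v are (dx, dy) offsets in pixel*mm; dividing by the depth at (x, y)
    turns them into pixel offsets that shrink as the subject moves away.
    """
    d = depth_at(depth, x, y)
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)


# Toy 5x5 depth image: a vertical strip at 2000 mm, with 0 marking pixels without depth.
depth = [[0, 2000, 2000, 2000, 0] for _ in range(5)]

# Probe two pixels to the right (4000 / 2000) versus the pixel itself:
# the right probe falls off the strip, so the response is large.
print(depth_feature(depth, 2, 2, (4000, 0), (0, 0)))

# Probe one pixel to the right: still on the strip, so the response is zero.
print(depth_feature(depth, 2, 2, (2000, 0), (0, 0)))
```

Each feature is weak on its own, but it costs only a few memory reads and a subtraction, which is what makes evaluating many of them inside a decision forest fast.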

Classification
Learning:
  1. Randomly choose a set of thresholds and features for splits.
  2. Pick the threshold and feature that provide the largest information gain.
  3. Recurse until a certain accuracy is reached or the maximum tree depth is reached.
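Steps 1 and 2 of the learning procedure can be sketched for a single tree node as follows. This is a minimal illustration using Shannon entropy as the impurity measure; the function names, the toy data, and the choice to draw thresholds from observed feature values are assumptions, not details from the slides. Step 3 would apply the same routine recursively to the left and right subsets.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy reduction achieved by splitting `labels` into `left` and `right`."""
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def best_split(samples, labels, n_candidates, rng):
    """Try random (feature, threshold) pairs and keep the one with the largest gain.

    Returns (gain, feature_index, threshold); feature_index is None if no
    candidate separated the data.
    """
    best = (0.0, None, None)
    n_features = len(samples[0])
    for _ in range(n_candidates):
        f = rng.randrange(n_features)      # step 1: random feature
        t = rng.choice(samples)[f]         # step 1: random threshold from the data
        left = [lab for s, lab in zip(samples, labels) if s[f] < t]
        right = [lab for s, lab in zip(samples, labels) if s[f] >= t]
        if not left or not right:
            continue                       # degenerate split, skip it
        gain = information_gain(labels, left, right)
        if gain > best[0]:                 # step 2: keep the largest gain
            best = (gain, f, t)
    return best


# Toy data: feature 0 is noise, feature 1 separates the two body parts.
samples = [(5, 0.1), (1, 0.2), (4, 0.8), (2, 0.9)]
labels = ["hand", "hand", "head", "head"]
print(best_split(samples, labels, n_candidates=200, rng=random.Random(0)))
```

Sampling candidates at random instead of searching every possible split keeps the cost per node bounded, which matters when each tree is trained on hundreds of thousands of images.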

Implementation details
  3 trees (depth 20)
  300k unique training images per tree
  2000 candidate features and 50 thresholds
  One day on a 1000-core cluster

Tracking Body Parts
  The trained classifiers assign each pixel a probability of belonging to each body part.
  The system then picks out areas of maximum probability for each body part type.
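In the cited Shotton et al. paper, the per-pixel probability comes from averaging the class distributions stored at the leaves that the pixel reaches in each tree of the forest. The sketch below shows that averaging for one pixel; the function name, the two-part label set, and the numbers are illustrative assumptions.

```python
def forest_posterior(tree_distributions):
    """Average the per-tree body part distributions for a single pixel."""
    n_trees = len(tree_distributions)
    parts = tree_distributions[0].keys()
    return {p: sum(d[p] for d in tree_distributions) / n_trees for p in parts}


# Leaf distributions reached by one pixel in three trees (made-up numbers).
leaves = [
    {"head": 0.9, "hand": 0.1},
    {"head": 0.6, "hand": 0.4},
    {"head": 0.3, "hand": 0.7},
]

posterior = forest_posterior(leaves)
print(posterior)
print(max(posterior, key=posterior.get))  # most probable body part for this pixel
```

Averaging lets trees that disagree temper each other, so a single overconfident tree does not decide the label on its own.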

And the Skeleton
  The mean shift algorithm is used to robustly compute modes of probability distributions.
  Mean shift is simple, fast, and effective.
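Mean shift finds a mode by repeatedly moving an estimate to the kernel-weighted average of the points around it until it stops moving. The sketch below is a minimal version with a Gaussian kernel, assuming each point is a pixel position weighted by its probability of belonging to one body part; the function name, bandwidth, and toy data are illustrative and not taken from the slides.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, n_iter=50, tol=1e-6):
    """Climb from `start` to a nearby mode of a weighted point density.

    Each iteration replaces the estimate with the Gaussian-weighted mean
    of all points, which moves it toward denser regions.
    """
    x = list(start)
    for _ in range(n_iter):
        num = [0.0] * len(x)
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, x))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += k
            for i, a in enumerate(p):
                num[i] += k * a
        if den == 0.0:
            break  # no point is close enough to influence the estimate
        new_x = [n / den for n in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Four pixels clustered around (1.0, 1.0) plus one distant, low-probability outlier.
points = [(0.9, 1.0), (1.1, 1.0), (1.0, 0.9), (1.0, 1.1), (5.0, 5.0)]
weights = [1.0, 1.0, 1.0, 1.0, 0.2]

print(mean_shift_mode(points, weights, start=(1.2, 1.2), bandwidth=0.3))
```

Because distant points receive almost no kernel weight, the outlier barely affects the result. A plain weighted average of all pixels would instead be pulled toward it, which is the sense in which mode finding is robust.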

Vision Algorithm (Summary)
  Object recognition approach
  Intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem
  Large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc.
  Generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes
  System runs at 200 frames per second on consumer hardware
  Evaluation shows high accuracy on both synthetic and real test sets
  State-of-the-art accuracy in comparison with related work, and improved generalization over exact whole-skeleton nearest neighbor matching

In Practice
  Collect training data: thousands of visits to global households, filming real users; the Hollywood motion capture studio generated billions of images
  Apply state-of-the-art object recognition research
  Apply state-of-the-art real-time semantic segmentation
  Build a training set
  Classify each pixel's probability of being in any of 32 body segments, determine the probabilistic cluster of body configurations consistent with those, and present the most probable
  Millions of training images
  Millions of classifier parameters
  Hard to parallelize
  New algorithm for distributed decision-tree training
  Fun fact: major use of DryadLINQ (large-scale distributed cluster computing)

To learn more
  Warning: lots of wrong info on the web
  Great site by Daniel Reetz: http://www.futurepicture.org/?p=97
  Kinect patents:
    http://www.faqs.org/patents/app/20100118123
    http://www.faqs.org/patents/app/20100020078
    http://www.faqs.org/patents/app/20100007717