Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake CVPR 2011


PRESENTER: AHSAN ABDULLAH

PROBLEM

APPROACH
Partitioning the body into parts helps localize the joints.
[Figure labels: right hand, neck, left shoulder, right elbow]

PIPELINE
Design goals: efficiency, robustness
1. capture depth image & remove background
2. infer body parts per pixel
3. cluster pixels to hypothesize body joint positions
4. fit model & track skeleton

BODY PART CLASSIFICATION
Compute $P(c_i \mid w_i)$
- pixels $i = (x, y)$
- body part $c_i$
- image window $w_i$
Discriminative approach
- image windows move with classifier
- learn classifier $P(c_i \mid w_i)$ from training data

LEARNING DATA
- synthetic (train & test)
- real (test)

LEARNING DATA SYNTHESIS
- Record MoCap: 500k frames distilled to 100k poses
- Retarget to several models
- Render (depth, body parts) pairs

FEATURE SET
Depth comparisons: very fast to compute
Feature response: $f(I, x) = d_I(x) - d_I(x + \Delta)$
- $I$: input depth image
- $x$: image coordinate
- $d_I(x)$: depth at $x$
- $\Delta$: offset
Offset $\Delta = v / d_I(x)$ scales inversely with depth
Background pixels: $d$ = large constant
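The depth comparison feature can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: it assumes the depth image is a list of rows of floats, that background pixels already hold the large constant, and that probes falling outside the image are also treated as background (that last choice is an assumption). The names `BACKGROUND_DEPTH`, `depth_at` and `depth_feature` are hypothetical.

```python
# Large constant used for background pixels; the exact value is an assumption.
BACKGROUND_DEPTH = 1e6


def depth_at(depth, x, y):
    """Depth at pixel (x, y); probes outside the image count as background."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND_DEPTH


def depth_feature(depth, x, y, v):
    """f(I, x) = d_I(x) - d_I(x + delta), with delta = v / d_I(x).

    Dividing the offset v by the depth at the reference pixel makes the
    probe distance shrink for far-away bodies, so the feature responds to
    the same body location regardless of distance to the camera.
    """
    d = depth_at(depth, x, y)
    dx = int(round(v[0] / d))
    dy = int(round(v[1] / d))
    return d - depth_at(depth, x + dx, y + dy)


# One-row toy image: the reference pixel is at depth 2.0, so an offset
# v = (6, 0) becomes a probe 3 pixels to the right.
row = [[2.0, 2.0, 2.0, 4.0]]
response = depth_feature(row, 0, 0, (6.0, 0.0))
```

Each response costs two image reads and a subtraction, which is what makes the feature cheap enough to evaluate at every pixel.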

DECISION FORESTS
Aggregation of decision trees

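TRAINING DECISION TREES [Breiman et al. 84]
- Node $n$ holds the training pixels $Q_n = \{(I, x)\}$ with body part distribution $P_n(c)$
- Split test at node $n$: $f(I, x; \Delta_n) > \theta_n$, evaluated for all pixels
- "no" goes to the left child $l$ with distribution $P_l(c)$; "yes" goes to the right child $r$ with distribution $P_r(c)$
- Each split should reduce entropy
- Take $(\Delta, \theta)$ that maximises information gain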
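Choosing the split that maximises information gain can be sketched as follows. This is a minimal sketch under simplifying assumptions: feature responses are precomputed for each candidate offset, labels are body part names, and the search is an exhaustive loop over a small candidate set. The function names and the data layout are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of body part labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(responses, labels, theta):
    """Entropy reduction from splitting on `response > theta`."""
    left = [c for f, c in zip(responses, labels) if not f > theta]   # "no"
    right = [c for f, c in zip(responses, labels) if f > theta]      # "yes"
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


def best_split(responses_by_offset, labels, thresholds):
    """Return the (offset, theta, gain) triple with the highest gain."""
    best = None
    for offset, responses in responses_by_offset.items():
        for theta in thresholds:
            gain = information_gain(responses, labels, theta)
            if best is None or gain > best[2]:
                best = (offset, theta, gain)
    return best


labels = ["L", "L", "R", "R"]
responses_by_offset = {
    "offset_a": [0.1, 0.2, 0.8, 0.9],   # separates L from R cleanly
    "offset_b": [0.1, 0.9, 0.2, 0.8],   # mixes L and R on both sides
}
offset, theta, gain = best_split(responses_by_offset, labels, [0.5])
```

In the toy data, `offset_a` sends both L pixels one way and both R pixels the other, so it removes the full bit of entropy, while `offset_b` removes none.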

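DECISION TREE CLASSIFICATION
Toy example: distinguish left (L) and right (R) sides of the body
- Input: image window centred at $x$
- First test: $f(I, x; \Delta_1) > \theta_1$ (no / yes)
- Second test: $f(I, x; \Delta_2) > \theta_2$ (no / yes)
- Each of the three leaves stores a distribution $P(c)$ over {L, R}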
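Walking a pixel down a tree like the toy example might look like the sketch below. The tree layout is an assumption made for illustration: the transcript does not say which branch carries the second test, so here it is placed on the "yes" branch, and the thresholds and leaf probabilities are invented. `classify` and `toy_tree` are hypothetical names.

```python
def classify(node, response):
    """Follow split tests from the root until a leaf is reached.

    `response(offset)` must return the feature value f(I, x; offset)
    for the pixel being classified.
    """
    while "posterior" not in node:
        branch = "yes" if response(node["offset"]) > node["theta"] else "no"
        node = node[branch]
    return node["posterior"]


# Two split nodes and three leaves; all numbers are illustrative only.
toy_tree = {
    "offset": "delta_1", "theta": 0.0,
    "no": {"posterior": {"L": 0.9, "R": 0.1}},
    "yes": {
        "offset": "delta_2", "theta": 0.0,
        "no": {"posterior": {"L": 0.6, "R": 0.4}},
        "yes": {"posterior": {"L": 0.1, "R": 0.9}},
    },
}

# Feature responses for one pixel, keyed by offset name.
pixel_responses = {"delta_1": 1.0, "delta_2": 1.0}
posterior = classify(toy_tree, pixel_responses.get)
```

Classification cost grows with tree depth rather than with the number of leaves, since each pixel evaluates one test per level.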

DECISION FOREST CLASSIFIER [Amit & Geman 97] [Breiman 01] [Geurts et al. 06]
- Trees $1, \dots, T$ each take $(I, x)$ and output $P_1(c), \dots, P_T(c)$
- Each tree is trained on a different random subset of images
- "Bagging" helps avoid over-fitting
- Average tree posteriors: $P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$
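Averaging the tree posteriors, P(c | I, x) = (1/T) Σ_t P_t(c | I, x), is a one-liner once each tree has returned its leaf distribution. The sketch below assumes each distribution is a dict from body part label to probability; the function name and the example numbers are hypothetical.

```python
def forest_posterior(tree_posteriors):
    """Average the per-tree distributions P_t(c | I, x) over all T trees."""
    T = len(tree_posteriors)
    classes = set().union(*tree_posteriors)
    return {
        c: sum(p.get(c, 0.0) for p in tree_posteriors) / T
        for c in classes
    }


# Leaf distributions reached by one pixel in three different trees.
per_tree = [
    {"L": 0.9, "R": 0.1},
    {"L": 0.5, "R": 0.5},
    {"L": 0.7, "R": 0.3},
]
averaged = forest_posterior(per_tree)
most_likely = max(averaged, key=averaged.get)
```

Because the trees are independent, they can be evaluated in parallel and combined at the end.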

NUMBER OF TREES
[Figure: average per-class accuracy (axis 40% to 55%) against number of trees (1 to 6), with example images of ground truth and inferred body parts (most likely) for 1 tree, 3 trees and 6 trees]

TREE DEPTH
[Figure: average per-class accuracy (axis 30% to 65%) against depth of trees, in two panels: synthetic test data and real test data]

Body parts to joint hypotheses
Define a 3D world space density over the 3D coordinate, summed over pixel index $i$, using:
- the 3D coord of the $i$th pixel
- a bandwidth
- a pixel weight, built from the inferred probability and the depth at the $i$th pixel
Mean shift for mode detection
Pipeline step 3: hypothesize body joints
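Mode detection with mean shift can be sketched as below. The density equation itself did not survive in the transcript, so this sketch assumes the form given in the published paper: a Gaussian kernel density $f_c(\hat{x}) \propto \sum_i w_{ic} \exp(-\|(\hat{x} - \hat{x}_i)/b_c\|^2)$ with pixel weight $w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$. Function names, the starting point and the toy numbers are hypothetical.

```python
import math


def pixel_weight(prob, depth):
    """Weight of one pixel: inferred probability times squared depth.

    The squared depth compensates for far surfaces covering fewer pixels
    (form assumed from the published paper).
    """
    return prob * depth ** 2


def mean_shift_mode(points, weights, start, bandwidth, iters=50, tol=1e-6):
    """Climb from `start` to a nearby mode of the weighted Gaussian density."""
    x = list(start)
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            sq = sum((a - b) ** 2 for a, b in zip(x, p)) / bandwidth ** 2
            k = w * math.exp(-sq)
            den += k
            for j in range(3):
                num[j] += k * p[j]
        if den == 0.0:
            break  # no pixel close enough to influence the estimate
        new_x = [n / den for n in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x


# Three pixels clustered near (0, 0, 2) plus one distant, unlikely pixel.
points = [(0.0, 0.0, 2.0), (0.02, 0.0, 2.0), (-0.02, 0.0, 2.0), (1.0, 1.0, 3.0)]
weights = [pixel_weight(0.9, 2.0)] * 3 + [pixel_weight(0.1, 3.0)]
mode = mean_shift_mode(points, weights, start=(0.02, 0.0, 2.0), bandwidth=0.1)
```

The distant pixel contributes almost nothing because the kernel decays with distance, so the returned mode sits at the centre of the cluster rather than at the global mean.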

[Figure: input depth, inferred body parts, and inferred joint positions (front view, side view, top view). No tracking or smoothing.]


JOINT PREDICTION ACCURACY
[Figure: bar chart of average precision (axis 0.0 to 1.0) per joint, comparing joint prediction from ground truth body parts with joint prediction from inferred body parts. Joints: Center Head, Center Neck, Left Shoulder, Right Shoulder, Left Elbow, Right Elbow, Left Wrist, Right Wrist, Left Hand, Right Hand, Left Knee, Right Knee, Left Ankle, Right Ankle, Left Foot, Right Foot, plus Mean AP]

ANALYSIS
No temporal information
- frame-by-frame
Very fast
- simple depth image feature
- parallel decision forest classifier

KINECT SYSTEM
Uses:
- 3D joint hypotheses
- kinematic constraints
- temporal coherence
to give:
- full skeleton
- higher accuracy
- invisible joints
- multi-player
Pipeline step 4: track skeleton

SUMMARY
- Frame-by-frame gives robustness
- Body parts representation for efficiency
- Fast, simple machine learning
- Significant engineering to scale to a massive, varied training data set

QUESTIONS