CS Decision Trees / Random Forests

CS548 2015 Decision Trees / Random Forests
Showcase by: Lily Amadeo, Bir B Kafle, Suman Kumar Lama, Cody Olivier
Showcasing work by Jamie Shotton, Andrew Fitzgibbon, Richard Moore, Mat Cook, Alex Kipman, Toby Sharp, Andrew Blake, Mark Finocchio on Real-Time Human Pose Recognition in Parts from Single Depth Images

Sources
Microsoft Kinect. (n.d.). Retrieved February 14, 2015, from http://hamlynkinect.wikispaces.com/microsoft Kinect
Oberg, J., Eguro, K., Bittner, R., & Forin, A. (2012). Random Decision Tree Body Part Recognition Using FPGAs. International Conference on Field Programmable Logic and Applications, University of Oslo, Norway.
Oberg, J. (2011, October 7). The Kinect's Body Part Recognition Algorithm on an FPGA - Microsoft Research. Retrieved from http://research.microsoft.com/apps/video/dl.aspx?id=157648
Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., & Moore, R. (2013). Real-time Human Pose Recognition in Parts from Single Depth Images. Commun. ACM, 56, 116-124.
Xbox One. (n.d.). Retrieved from http://en.wikipedia.org/wiki/xbox_one
Xbox 360. (n.d.). Retrieved from http://en.wikipedia.org/wiki/xbox_360

Microsoft Kinect
- Video game consoles: Xbox 360, Xbox One
- Similar: PlayStation, Nintendo Wii
- No game controllers/peripherals
- Use of natural user interface
- Features:
  - 3D motion capture
  - Facial recognition
  - Voice recognition

Microsoft Kinect
- Uses infrared laser light with a speckle pattern
  - Speckle effect: interference of many waves of the same frequency but with different phases and amplitudes
- Tracks up to 6 people
  - 2 active players using motion analysis; <x, y, z>
- Automatic sensor calibration
  - Based on gameplay
  - Based on physical environment

Microsoft Kinect
- Vision-based object recognition
- Pixel classification using Random Decision Trees (RDT)
- Algorithm: Forest Fire pixel classification algorithm
- Hardware: Field Programmable Gate Array (FPGA)
- Inferring body position:
  - Compute a depth map using structured light
  - Depth from focus
  - Depth from stereo
  - Machine learning

Decision Trees
Use of Decision Trees in Kinect
a) Efficiency
  - Computationally efficient
b) Relatively easy to update the algorithm
  - Integrate new innovations
  - Include new use cases

Random Forests
- Ensemble learning
  - Example: the Netflix Prize
  - Combined models
- Trees, trees, more trees
- Bagging
  - Bootstrap aggregating
- Add more randomness
  - Feature bagging

Random Forest
- One tree trained on a subset of features
  - p features, sqrt(p) selected each time
- Another tree trained on a different subset of features
- Whole forest of trees
http://citizennet.com/blog/wp-content/uploads/2012/11/rf.jpg
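The two sources of randomness on these slides (bagging and feature bagging) can be sketched in a few lines. This is only an illustrative sketch using the Python standard library; the helper names `bootstrap_sample` and `feature_subset` are hypothetical and do not come from the slides or from any particular library.

```python
import math
import random


def bootstrap_sample(n_rows, rng):
    """Bagging: draw n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(n_features, rng):
    """Feature bagging: pick about sqrt(p) distinct feature indices out of p."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)       # fixed seed so the sketch is repeatable
p = 16                       # assumed number of features for the example
rows = bootstrap_sample(10, rng)
subset_a = feature_subset(p, rng)   # features seen by one tree
subset_b = feature_subset(p, rng)   # features seen by another tree
```

With p = 16 each subset holds 4 feature indices, and different draws generally give different subsets, which is what decorrelates the trees in the forest.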

Random Forest
Pros:
- Efficient
- Distributed
- Variable importance
Cons:
- Interpretability

Body Part Inference
- Define several body part labels.
- Parts could be changed to suit a particular application.
- Small parts = accurately localized body joints.

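Depth Image Features
- f_θ(I, x) = d_I(x + u/d_I(x)) - d_I(x + v/d_I(x))
- d_I(x) = depth at pixel x in image I
- θ = (u, v) = offsets
- Normalizing the offsets by 1/d_I(x) ensures the features are depth invariant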

Depth Feature continued
- f_θ1 looks upward: it gives a large positive response for pixels near the top of the body, but a value close to zero for pixels lower down the body
- Each feature provides only a weak signal about which part of the body a pixel belongs to.
- Combining the features in a decision forest is what makes it accurate.
- Resolves ambiguity.
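A minimal sketch of the depth comparison feature described on these two slides, following the form given in the Shotton et al. paper: the difference between the depths at two probe positions, with each offset divided by the depth at the pixel. The image layout (a list of rows), the `BACKGROUND_DEPTH` constant, the rounding of probe positions and the toy image are assumptions made for illustration.

```python
BACKGROUND_DEPTH = 1e6  # assumed large constant for background / off-image probes


def depth_at(depth, x, y):
    """Depth at integer pixel (x, y); a large constant outside the image."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND_DEPTH


def depth_feature(depth, x, y, u, v):
    """Difference of the depths at two probes offset by u/d and v/d from (x, y)."""
    d = depth_at(depth, x, y)
    # Dividing the offsets by the depth at the pixel makes the feature depth invariant.
    ux, uy = x + int(round(u[0] / d)), y + int(round(u[1] / d))
    vx, vy = x + int(round(v[0] / d)), y + int(round(v[1] / d))
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)


# Toy image: row 0 is background, rows 1-4 are a "body" at depth 2.0.
img = [[BACKGROUND_DEPTH] * 5] + [[2.0] * 5 for _ in range(4)]
look_up = (0.0, -4.0)   # first probe looks upward (negative y is up)
stay = (0.0, 0.0)       # second probe stays on the pixel itself

top_response = depth_feature(img, 2, 1, look_up, stay)    # pixel near the top of the body
lower_response = depth_feature(img, 2, 3, look_up, stay)  # pixel lower down the body
```

For the pixel near the top of the body the upward probe lands off the body, so the response is large and positive; lower down, both probes land on the body and the response is zero.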

Decision Tree Creation
- Each tree gets: a depth limit, 2000 random pixels from each training image, and a set of candidate features
- Candidate features: parameters that determine how likely a pixel is to be a particular joint, and a threshold
- Each candidate is used to split the set of pixels into two subsets
  - Pixels above and below the threshold
- Entropy and information gain are calculated from these two subsets
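The split selection described above can be sketched as follows: each candidate threshold partitions the samples in two, and the candidate with the highest information gain wins. This is a simplified illustration on one feature with made-up values; the function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, threshold):
    """Gain from splitting (feature_value, label) pairs at the threshold."""
    left = [lab for val, lab in samples if val < threshold]
    right = [lab for val, lab in samples if val >= threshold]
    n = len(samples)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy([lab for _, lab in samples]) - weighted


def best_threshold(samples, candidates):
    """Pick the candidate threshold with the largest information gain."""
    return max(candidates, key=lambda t: information_gain(samples, t))


# Made-up feature responses for four pixels with two body part labels.
samples = [(0.1, "head"), (0.2, "head"), (0.8, "torso"), (0.9, "torso")]
chosen = best_threshold(samples, [0.15, 0.5, 0.85])
gain = information_gain(samples, chosen)
```

Here the threshold 0.5 separates the two labels perfectly, so it removes the full 1 bit of entropy in the parent set.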

Classification
- A probability distribution over body parts is stored at each leaf
- The distributions from all trees in the forest are averaged to classify a pixel
- Joint positions are estimated from the 31 body part labels with mean-shift clustering
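The averaging step can be sketched as below. The leaf distributions are invented numbers for a single pixel and three trees, and `average_distributions` is a hypothetical helper; the mean-shift step is not shown.

```python
def average_distributions(tree_distributions):
    """Average per-tree leaf distributions (dicts of label -> probability)."""
    n_trees = len(tree_distributions)
    labels = set().union(*tree_distributions)  # every label seen in any tree
    return {
        lab: sum(d.get(lab, 0.0) for d in tree_distributions) / n_trees
        for lab in labels
    }


# Invented leaf distributions reached by one pixel in three trees.
leaf_outputs = [
    {"head": 0.7, "neck": 0.3},
    {"head": 0.5, "neck": 0.4, "torso": 0.1},
    {"head": 0.6, "neck": 0.4},
]
posterior = average_distributions(leaf_outputs)
predicted = max(posterior, key=posterior.get)  # most probable body part
```

Because each tree's distribution sums to one, the average also sums to one, so it can be read directly as the forest's per-pixel distribution.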

Experiment
Forests:
- 3 trees, each 20 levels deep
- 300k training images per tree
- 2000 random pixels per image
- 2000 candidate features
- 50 candidate thresholds per feature
Datasets:
- 8808 real images, hand labeled
- 5000 images synthesized from motion capture poses
- Synthetic silhouette images
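As a rough sense of scale, the training settings above can be multiplied out. This is plain arithmetic on the numbers from the slide; the variable names are only for illustration.

```python
images_per_tree = 300_000
pixels_per_image = 2_000
candidate_features = 2_000
thresholds_per_feature = 50

# Pixel examples available to train each tree.
pixel_examples_per_tree = images_per_tree * pixels_per_image

# Candidate (feature, threshold) pairs to evaluate when choosing a split.
candidate_splits = candidate_features * thresholds_per_feature
```

That is 600 million pixel examples per tree and 100,000 candidate splits to score, which is why efficient training and evaluation matter for this approach.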

Results per Pixel

Results for Joints

Conclusion
- Better accuracy than previous nearest-neighbor (NN) methods
- Faster classification time than NN
- Better than the method of Ganapathi et al.
- Doesn't exploit temporal and kinematic constraints