Human Action Recognition A Grand Challenge

Similar documents
Model-Based Human Motion Capture from Monocular Video Sequences

Recognition of Two-person Interactions Using a Hierarchical Bayesian Network

1 Introduction Human activity is a continuous flow of single or discrete human action primitives in succession. An example of a human activity is a se

Introduction to behavior-recognition and object tracking

A Review on Human Actions Recognition Using Vision Based Techniques

A Background Subtraction Based Video Object Detecting and Tracking Method

Automatic visual recognition for metro surveillance

Submitted to ACM Autonomous Agents 99. When building statistical machine learning models from real data

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg

ActivityRepresentationUsing3DShapeModels

Human motion analysis: methodologies and applications

Object Recognition 1

Object Recognition 1

Graphical Models for Recognizing Human Interactions

3D Reconstruction of Human Motion Through Video

Comparison between Motion Analysis and Stereo

Detection and recognition of moving objects using statistical motion detection and Fourier descriptors

Unsupervised Human Members Tracking Based on an Silhouette Detection and Analysis Scheme

Dynamic Time Warping for Binocular Hand Tracking and Reconstruction

Image-Based Reconstruction for View-Independent Human Motion Recognition

Human Upper Body Pose Estimation in Static Images

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Expanding gait identification methods from straight to curved trajectories

Articulated Structure from Motion through Ellipsoid Fitting

3D Face and Hand Tracking for American Sign Language Recognition

Non-rigid body Object Tracking using Fuzzy Neural System based on Multiple ROIs and Adaptive Motion Frame Method

Marcel Worring Intelligent Sensory Information Systems

Vehicle Occupant Posture Analysis Using Voxel Data

Object Detection in Video Streams

IMPROVED FACE RECOGNITION USING ICP TECHNIQUES INCAMERA SURVEILLANCE SYSTEMS. Kirthiga, M.E-Communication system, PREC, Thanjavur

Raghuraman Gopalan Center for Automation Research University of Maryland, College Park

Recognizing Human Activities in Video by Multi-resolutional Optical Flows

Support Vector Machine-Based Human Behavior Classification in Crowd through Projection and Star Skeletonization

3D Human Motion Analysis and Manifolds

What is Computer Vision? Introduction. We all make mistakes. Why is this hard? What was happening. What do you see? Intro Computer Vision

Skeleton based Human Action Recognition using Kinect

EECS 442 Computer Vision fall 2011

Combined Shape Analysis of Human Poses and Motion Units for Action Segmentation and Recognition

Internal Organ Modeling and Human Activity Analysis

A Fast Moving Object Detection Technique In Video Surveillance System

Fully Automatic Methodology for Human Action Recognition Incorporating Dynamic Information

Last update: May 4, Vision. CMSC 421: Chapter 24. CMSC 421: Chapter 24 1

Highlights Extraction from Unscripted Video

Introduction to Mobile Robotics

ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall Midterm Examination

ELL 788 Computational Perception & Cognition July November 2015

COMPUTING A STATISTICAL MODEL BASED ON FEATURE VECTORS EXTRACTING FEATURE VECTORS DETECTING AND SEGMENTING THE HEAD MATCHING AGAINST STORED MODELS

Model-based segmentation and recognition from range data

Person identification from spatio-temporal 3D gait

IN computer vision develop mathematical techniques in

Visual Tracking of Human Body with Deforming Motion and Shape Average

Dynamic Human Shape Description and Characterization

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

Real-time Detection of Illegally Parked Vehicles Using 1-D Transformation

A Method of Annotation Extraction from Paper Documents Using Alignment Based on Local Arrangements of Feature Points

Part I: HumanEva-I dataset and evaluation metrics

ADAPTIVE TEXTURE IMAGE RETRIEVAL IN TRANSFORM DOMAIN

A Two-stage Scheme for Dynamic Hand Gesture Recognition

DEPTH AND GEOMETRY FROM A SINGLE 2D IMAGE USING TRIANGULATION

Finally: Motion and tracking. Motion 4/20/2011. CS 376 Lecture 24 Motion 1. Video. Uses of motion. Motion parallax. Motion field

Interactive Video Object Extraction & Inpainting 清華大學電機系. Networked Video Lab, Department of Electrical Engineering, National Tsing Hua University

I D I A P RECOGNITION OF ISOLATED COMPLEX MONO- AND BI-MANUAL 3D HAND GESTURES R E S E A R C H R E P O R T

Designing Video Processing Surveillance System. EE382 Embedded Software Systems. Literature Survey. Koichi Sato

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Object Detection System

Combination of Accumulated Motion and Color Segmentation for Human Activity Analysis

International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE) Volume 4, Issue 4, April 2015

Gesture Recognition using Temporal Templates with disparity information

PART-LEVEL OBJECT RECOGNITION

Use of Gait Energy Image in Implementation of Real Time Video Surveillance System

An Interactive Technique for Robot Control by Using Image Processing Method

All human beings desire to know. [...] sight, more than any other senses, gives us knowledge of things and clarifies many differences among them.

Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features

Gesture Recognition Technique:A Review

Occlusion Robust Multi-Camera Face Tracking

A Real Time System for Detecting and Tracking People. Ismail Haritaoglu, David Harwood and Larry S. Davis. University of Maryland

ECG782: Multidimensional Digital Signal Processing

Object Tracking System Using Motion Detection and Sound Detection

Depth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth

Introduction to Medical Imaging (5XSA0) Module 5

Object Recognition. The Chair Room

10/25/2018. Robotics and automation. Dr. Ibrahim Al-Naimi. Chapter two. Introduction To Robot Manipulators

Detecting and Identifying Moving Objects in Real-Time

Static Gesture Recognition with Restricted Boltzmann Machines

Real-time Gesture Pattern Classification with IMU Data

Motion Tracking and Event Understanding in Video Sequences

Investigation of Machine Learning Algorithm Compared to Fuzzy Logic in Wild Fire Smoke Detection Applications

The Forth Scientific Conference of the College of Science University of Kerbala Journal of University of Kerbala

Touching-Pigs Segmentation using Concave Points in Continuous Video Frames

Panoramic Appearance Map (PAM) for Multi-Camera Based Person Re-identification

Structure from Motion. Lecture-15

Announcements. Stereo Vision Wrapup & Intro Recognition

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications

VIDEO OBJECT SEGMENTATION BY EXTENDED RECURSIVE-SHORTEST-SPANNING-TREE METHOD. Ertem Tuncel and Levent Onural

Chapter 12 3D Localisation and High-Level Processing

of human activities. Our research is motivated by considerations of a ground-based mobile surveillance system that monitors an extended area for

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

A REAL-TIME FACIAL FEATURE BASED HEAD TRACKER

Announcements. Recognition I. Gradient Space (p,q) What is the reflectance map?

Transcription:

Human Action Recognition A Grand Challenge J. K. Aggarwal Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 1

Overview Human Action Recognition Motion An Important Cue Methodologies and Results by other Researchers Actions and Interaction. Simple actions Interaction - simple Interaction - continued and recursive Objects and actions Some Final Thoughts. 2

Human Action Recognition Image Processing / Pattern Recognition / Computer Vision have matured from recognizing simple objects, textures and images to recognizing discrete actions, continuous action and sequences of actions called events.this poses serious challenges and in the long run this may hold significant benefits. 3

Grand Challenges! A system that may detect a person having a heart attack in a hotel room or a system that may detect a person drowning at a swimming pool. Such systems would have universal applicability if the false alarm rate was really low. Research in human action recognition is at the embryonic stage, a successful system of this nature may require a mating of several disciplines including psychology and may be still far in the future. 4

Today s Research on Human Action Recognition Personal Assistants Surveillance Medical imaging Patient monitoring Analyzing sports and dance videos Robot/Human interaction Biometrics Kinesiology Almost all of the above applications involve enterprise systems 5

System Components Low-level vision module concerned with segmentation and other low-level tasks, there is no substitute for good segmentation! Modeling and recognition of the action and interactions, including motion of objects. Semantic description of the actions and interactions. 6

Motion - An Important Cue Detection Segmentation Tracking Deriving 3D information Structure from motion Point of impact Recognition 7

Study of Motion Long history of scientific thought spanning diverse disciplines: philosophy, psychophysics, neurobiology, and of course, pattern recognition, robotics, computer graphics and computer vision. Zeno, Aristotle, Bertrand Russell, 8

My Simple Beginnings in Motion Research (1970 s) Cloud motions from satellite image sequences. Idealized clouds as rigid polygons moving in a plane (later to curvilinear figures). For the idealized cloud models determined linear and angular velocities. 9

First Encounter with Structure from Motion Computer Analysis of Moving Polygonal Images, IEEE Trans. on Computers Vol. C-24, no.10 October, 1975. (with R. O. Duda) 10

Motion of Rigid and Jointed Objects Johansson s experiments (1975) - lights attached to major joints of a person - inspired Jon Webb. Each rigid body part represented by two points. Webb decomposed such a motion into a rotation and a translation, where the rotation axis is fixed for short periods of time. Webb determined structure of jointed objects under orthographic projection. 11

Motion of Rigid and Jointed Hoffman data/mit. Six points on a walking man, 0.26 sec. Artificial Intelligence, vol.19, 1982, 107-130. Objects, Contd. 12

Trying to Avoid Establishing Correspondence of Points Emphasis on points in the computation of structure-from-motion. For points difficult to: Find Localize Occlusion Correspondence A natural question: Can one do better with lines, planes and surfaces? 13

Lines Lebegue (1992) chose lines with particular orientation, using vanishing points. Tracked 3D line segments using a Kalman filter. Built a CAD model of a corridor by mounting a single camera on a tetherless robot. 14

Results (IEEE Trans. On Robotics and Automation, vol.9 no.6, 1993, 801-815) 15

Recent Progress Today, research has progressed from the structure from motion paradigm to tracking and recognition of events human actions, human-human interactions and human-object interactions Computational speed, cheap storage and cameras. 16

Issues in Human Actions/ Interactions Recognition Diversity Actions - walking, running, throwing etc Interactions - ranging from hugging, kissing and shaking hands to punching and killing. Interactions between objects and persons Occlusions, correspondence difficulties due to loose clothing and shadows, reflections and lighting conditions. Besides the computer vision issues, a number of contextual and philosophical issues persons can interact by not interacting. 17

Human Motion/Body modeling Different types of motion: Rigid vs. non-rigid motion Human motion is articulated motion, a subset of non-rigid motion. Composed of piecewise rigid motions of individual body parts, but the overall motion of the entire human body is not rigid. Model-based vs. appearance-based Well-defined a priori knowledge of the object shape available or not. 18

High-level recognition schemes with domain knowledge Human-involved scene understanding needs more abstract and meaningful schemes than purely physical laws for interpreting what is happening in the scene. Understanding very long image sequences requires another abstraction scheme: the event. The event is regarded as a sort of summary of the whole sequence, and the summary is closely related to real world knowledge. 19

Action, Interaction and Event: A Taxonomy Nagel (1988): Change, Event, Verb, Episode and History. Bobick (1997): Movement, Activity and Action. Park and Aggarwal (2003):Pose, Gesture, Action, Interaction. 20

A Bayesian Computer Vision System for Modeling Human Interactions Nuria Oliver, Barbara Rosario, and Alex Pentland Objective: Monitor person-to-person interactions in perspective-view image sequences Pedestrians walk in wide area, Humans move 2-dimensionally in image sequences Blob level tracking (track a human as one blob) Approach, Meet, Walk, etc. Method: Trajectory shape recognition Coupled Hidden Markov Model Recognize sequences of state change Multiple state streams 21

Interaction Behavior Models N. Johnson, A. Galata, and D. Hogg The acquisition of interaction behaviors by observing humans and using models to simulate a plausible partner during interaction. A Spline type contour to specify the persons. 22

Person-on-Person Violence Detection A. Datta, M. Shah and N. Lobo Detect human violence such as fighting, kicking, hitting with objects. Motion trajectory and orientation of person s limbs. AMV- Acceleration Measure Vectordirection and magnitude of motion- jerk temporal derivative of AMV. 23

Segmentation and Recognition of Continuous Human Activity Anjum Ali 24

Preprocessing : Segmentation and skeletonization sequence frame background thresholded image Image after filtering and connected component labeling 25

1 8 17 25 30 37 43 52 59 walking sitting standing up walking squatting rising bending getting up 26

Results Number of Breakpoint frames Actions Walking Sitting Standing up Bending Getting up Squatting Total 128 143 29 13 16 21 19 24 Correct 109 110 26 10 12 15 14 18 Efficiency 85 76 89 76 75 71 73 75 Rising 21 15 71 Event 2001, Vancouver, BC, pp.28-35 27

Blob-level Interaction of Persons 1. Temporal Spatio-Velocity (TSV) Transform 2. Tracking of blobs 3. Interaction Recognition Koichi Sato 28

Temporal Spatio-velocity (TSV) Transform Extract pixel velocities from an image sequence Remove the velocity-unstable pixels Segment blobs by velocity similarity v x v x v x x 29

Tracking Image Sequences Background Subtraction Human Tracking Object Extraction Interaction Classification Human Trajectory Temporal Spatio-velocity Transform Velocity x Interaction Type Blob Identification 30

Interaction Image Sequences Feature Extraction Human Tracking Interaction Moment Detection Human Trajectory Interaction Classification Feature Alignment Interaction Type Nearest Mean Method 31

Person2 Person1 S S S M S M S S S S S M S S S M APART S M S S STOPGO MEET S S LEAVE STOP STANDSTILL S M FOLLOW WALKSTOP PASS1 PASS2 TAKEOVER Simple Interaction Unit (SIU) types S M : Moving State S S : State of Being Still 32

Follow 33

Meet and Leave 34

Results Human segmentation: 184 out of 194, only 10 persons lost. Tracked 159, correctly tracked 123, lost in tracking 5, wrongly tracked 31. Interaction type, training 54, test 70, correctly classified 56, classification accuracy 80%. (Computer Vision and Image Understanding 96 (2004) 100-128. ) 35

Interaction At the Detailed Level Pushing sequence Problems involved: segmenting multiple body parts tracking body parts treatment of occlusion and shadows between body parts recognizing human interaction type 36

Verbal semantic description Image sequence Q 1 Q 2 Q 3 Y 1 Y 2 Y 3 Dynamic Bayesian network Sequence classification Body-part tree Body pose estimation Bayesian network 2 3 Image patches 7 3 2 1 4 5 6 7 1 4 6 5 Attribute relational graph Multi-blob tracking Regular lattice 8-connectivity Pixel classification A hierarchical framework System Diagram for recognizing Details human interaction 37

Results of segmentation Hugging 38

Gesture Detection using DBN e 1 D B A C D B A A C C B Observation e = [V 1,, V 11 ] T BN Pose Estimation using BN φ 1 C C C C A A BN Pose estimation φ = [H 1,, H 6 ] T e 2 C B A D C B B B C C B BN φ 2 B B D B A B 39

Results: Punching Interaction 40

Results: Hugging Interaction 41

Recognition Accuracy ACM Multimedia Systems 10(2), 164-179, 2004 42

Recursive human activities Recursive representation will allow us to describe higherlevel abstract human activities. For example, in fighting interaction, The number of sub-activities is not fixed. punching or kicking after some fighting is also fighting. fighting can be defined in terms of smaller fighting this=fighting_interaction (person1, person2) x= fighting_interaction (p1, p2) y= Punching or kicking (p1, p2) 43

Experimental results Recognized following 10 types of interactions (approach, depart, point, shake-hands, hug, punch, kick, and push) + (Fighting and Greeting). Videos of sequences of interactions are taken. 320*240 resolution, 15 frames per sec. Features are extracted for each image frame. 44

Experimental results (Cont d) 45

46 Experimental results (Cont d) 0.917 88 96 total 0.917 11 12 Push 0.833 10 12 Kick 0.917 11 12 Punch 0.833 10 12 Hug 0.917 11 12 Shake-hands 0.917 11 12 Point 1.000 12 12 Depart 1.000 12 12 Approach Accuracy Correct total interaction 0.667 8 12 total 0.667 4 6 Greeting 0.667 4 6 Fighting Accuracy Correct total interaction

Detection of Fence Climbing from Monocular Videos Elden Yu 47

Difficulty Assumptions Variability of fences Variability of climbing by fence and by person Fixed type of fences Single person Three possible views: front, back and side 48

frames Observation Symbol(1-48) Hidden State(1-12) State block(1-4) Truth (1-4) false decoding comments 30 10 2 1 1 0 31 10 1 1 1 0 d:\work\data2\pool_malik_climb_4051 32 34 3 1 1 0 Malik climbling in the pool side fe 33 34 3 1 1 0 34 34 3 1 1 0 35 34 3 1 1 0 for this sequence 36 34 3 1 1 0 the frame accuracy (compare column 37 34 3 1 1 0 is as high as 88% 38 34 3 1 1 0 only 29/234 state block is wrongly 39 34 3 1 1 0 40 34 3 1 1 0 41 34 3 1 1 0 column D is abridged by removing re 42 34 3 1 1 0 as 43 34 3 1 1 0 1 2 4 2 3 4 1 44 34 3 1 1 0 detected 45 34 3 1 1 0 46 34 3 1 1 0 close me to go back to slides 47 34 3 1 1 0 48 34 3 1 1 0 49 34 3 1 1 0 50 34 3 1 1 0 51 34 3 1 1 0 52 34 3 1 1 0 53 34 3 1 1 0 54 34 3 1 1 0 55 34 3 1 1 0 56 34 3 1 1 0 57 34 3 1 1 0 58 34 3 1 1 0 59 34 3 1 1 0 60 34 3 1 1 0 61 26 3 1 1 0 62 34 3 1 1 0 63 34 3 1 1 0 64 34 3 1 1 0 65 34 3 1 1 0 66 34 3 1 1 0 67 34 3 1 1 0 68 34 3 1 1 0 69 34 3 1 1 0 70 34 3 1 1 0 71 34 3 1 1 0 72 34 3 1 1 0 73 34 3 1 1 0 74 34 3 1 1 0 75 34 3 1 1 0 76 34 3 1 1 0 77 34 3 1 1 0 78 26 3 1 1 0 79 34 3 1 1 0 80 34 3 1 1 0 81 34 3 1 1 0 82 33 3 1 1 0 83 33 3 1 1 0 84 10 3 1 1 0 Example 49

Some Final Thoughts A long way from tracking planar polygonal objects in 1975. Interaction of two persons is significantly more complicated than tracking persons. The complexities of occlusion and correspondence probably shall never leave us!! Does the computer see and track what we perceive?! 50

Thank You PICASSO Figuras Al Borde Del Mar Figures By The Sea Shore 51