People Detection and Video Understanding

Francois BREMOND
INRIA Sophia Antipolis, STARS team (Institut National de Recherche en Informatique et en Automatique)
Francois.Bremond@inria.fr
http://www-sop.inria.fr/members/francois.bremond/
CoBTeK, Nice University Hospital, Sophia Antipolis

Video Understanding
Objective: designing systems for real-time recognition of human activities observed by various sensors (especially video cameras).
Challenge: bridging the gap between numerical sensors and semantic events.
Approach: spatio-temporal reasoning and knowledge management.
Examples of human activities:
- for individuals: vandalism, bank attack, cooking, washing dishes, falling
- for small groups: fighting
- for crowds: overcrowding
- for interactions of people and vehicles: aircraft refueling

Generic Platform for Activity Understanding
[Figure: platform diagram with blocks labelled Detection, Classification, Tracking, Posture Recognition, Activity Models, Activity Recognition, and Actions; example output: "Person inside Kitchen"]

People Detection and Tracking
Results: a background updating scheme was used for some sequences.

People Detection: Faster R-CNN on ETHZ

Motivation - Action Recognition: Hollywood dataset

Motivation - Action Recognition: UCF Sports dataset

Motivation - Action Recognition: Daily Living datasets (Rochester Univ.)

Action Recognition using Bag of Words (M. Koperski)
Pipeline: videos, feature detector, feature descriptor, BOW model, histograms of codewords, non-linear SVM.
A codeword is defined as a descriptor cluster.
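
The histogram step of the bag-of-words pipeline can be sketched as follows. This is a minimal illustration, not the implementation behind the slide: it assumes the codebook (the descriptor cluster centres) has already been learned, and the function names `nearest_codeword` and `bow_histogram` as well as the toy 2-D descriptors are hypothetical.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = math.dist(descriptor, codeword)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall on each codeword, then normalise to sum to 1."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy example (assumed data): two codewords, four local descriptors from one video.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (0.0, 0.0)]
histogram = bow_histogram(descriptors, codebook)
```

The resulting fixed-length histogram is what a classifier such as the non-linear SVM named on the slide would take as input, regardless of how many local features the video produced.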

Violence Recognition Framework (P. Bilinski)
Pipeline: input video, feature detection (Improved Dense Trajectories), feature description (Trajectory Shape (TS), HOG, HOF, MBH), video representation (Improved Fisher Vectors), classification (SVM), output (violence or non-violence).
Example scenes: pub, street, football stadium, school, movies.
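
The slide names Improved Fisher Vectors without detail. In the standard formulation, the "improved" variant applies a signed power normalisation followed by L2 normalisation to the raw Fisher vector. The sketch below shows only that normalisation step, assuming the raw vector has already been computed; whether the framework on the slide uses exactly these settings is an assumption, and the function name `improve_fisher_vector` is hypothetical.

```python
import math


def improve_fisher_vector(raw_vector, alpha=0.5):
    """Apply signed power normalisation, then L2 normalisation.

    alpha=0.5 is the common signed square root; it is an assumed default here.
    """
    powered = [math.copysign(abs(x) ** alpha, x) for x in raw_vector]
    norm = math.sqrt(sum(x * x for x in powered))
    if norm == 0.0:
        return powered
    return [x / norm for x in powered]


# Toy raw Fisher vector (assumed values).
normalised = improve_fisher_vector([4.0, -9.0, 0.0])
```

The power step dampens a few very large components, and the L2 step makes vectors from videos of different lengths comparable before they reach the SVM.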

Gender Recognition using Smile (A. Dantcheva)
Spatio-temporal features based on dense trajectories, represented by a set of descriptors encoded by Fisher Vectors.

Toyota Smart-Home: large-scale daily living dataset

Issues in Action Recognition using Deep CNNs
Deep Convolutional Neural Networks (CNNs) for images:
- large annotated data (ImageNet)
- architectures suitable for images with good resolution
Videos: how to capture motion information in a CNN?
- stacking of frames
- capturing motion independently or not: several-stream CNNs
- one ConvNet to capture static (frame-based) visual information
- another ConvNet to capture motion information (like optical flow, but expensive)
- other nets to capture motion on longer scales or together (Siamese)
- other nets to capture object-ness
Reference: C. Roberto de Souza, A. Gaidon, E. Vig, and A. Lopez. Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition, ECCV 2016.
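
When a static (frame-based) stream and a motion stream are trained separately, their outputs still have to be combined into one prediction. One common option is late fusion: convert each stream's class scores to probabilities and take a weighted average. The sketch below assumes this scheme; the slide does not say which fusion is used, and `fuse_two_streams`, the `temporal_weight` parameter, and the toy scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of the per-class probabilities of the two streams."""
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * s + temporal_weight * t
        for s, t in zip(spatial, temporal)
    ]


# Toy scores for three action classes (assumed values).
fused = fuse_two_streams([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the appearance stream alone prefers class 0, but the motion stream is confident about class 1, so the fused prediction follows the motion evidence.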

Convolutional Pose Machines for Action Recognition
Proposed a representation derived from human pose, estimated with "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields".
The descriptor aggregates motion and appearance information along tracks of human body parts, following "P-CNN: Pose-based CNN Features for Action Recognition".
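
Aggregating per-frame features along a body-part track can be illustrated with min and max pooling over time, which is one of the aggregation schemes used by P-CNN. This is a simplified sketch under stated assumptions: per-frame feature vectors for one body part are taken as given, the dynamic (frame-difference) features are left out, and the function name `aggregate_track` is hypothetical.

```python
def aggregate_track(frame_features):
    """Pool per-frame feature vectors of one body-part track into one video descriptor.

    Returns the per-dimension minima followed by the per-dimension maxima.
    """
    if not frame_features:
        raise ValueError("a track needs at least one frame")
    per_dimension = list(zip(*frame_features))
    minima = [min(values) for values in per_dimension]
    maxima = [max(values) for values in per_dimension]
    return minima + maxima


# Toy track (assumed values): three frames, two feature dimensions each.
track = [[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]]
descriptor = aggregate_track(track)
```

The output length depends only on the feature dimension, so tracks of different durations yield descriptors of the same size.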

Toyota Smart-Home: large-scale daily living dataset
Example activities: pour water for tea, prepare tea.

Activity Monitoring in a Hospital in Greece with AD Patients
Visualization of older adult performance while accomplishing the semi-guided tasks.

Conclusion - Video Understanding
A global framework for building real-time video understanding systems.
Perspectives:
- generate totally unsupervised models
- use finer features as input for the algorithm (head, posture, facial gesture)
- generate language descriptions for the activity models
- generic activity models (cross scenes), adaptive learning
- more semantics, emotion, mental states
4 open PhD topics:
- Kontron: People Tracking using Deep Learning algorithms on embedded hardware
- ESI: People Re-Identification using Deep Learning
- Wildmoka: Video-based Action Recognition using Deep Learning
- Nice Hospital: Uncertainty Management and Activity Recognition