Person Action Recognition/Detection
Person Action Recognition/Detection
Fabrício Ceschin
Computer Vision - Prof. David Menotti
Departamento de Informática - Universidade Federal do Paraná
In object recognition: is there a chair in the image? In object detection: is there a chair, and where is it in the image?
In action recognition: is there an action present in the video? In action detection: is there an action, and where is it in the video?
Datasets
KTH
Six types of human actions: walking, jogging, running, boxing, hand waving and hand clapping. Four different scenarios: outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3) and indoors (s4). Sequences taken with a static camera at 25 fps.
Hollywood2
12 classes of human actions and 10 classes of scenes, distributed over 3,669 video clips from 69 movies. Approximately 20.1 hours of video in total.
UCF Sports Action Data Set
Set of actions collected from various sports which are typically featured on broadcast television channels such as the BBC and ESPN. 10 classes of human actions. 150 sequences at a resolution of 720 × 480.
UCF YouTube Action Data Set
11 action categories collected from YouTube and personal videos. Challenging due to large variations in camera motion, object pose and appearance, object scale, viewpoint, cluttered background, illumination conditions, etc.
JHMDB
21 action categories (a subset of HMDB-51), 928 clips, with per-frame annotations. Puppet flow per frame (approximated optical flow on the person). Puppet mask per frame. Joint positions per frame. Action label per clip. Meta label per clip (camera motion, visible body parts, camera viewpoint, number of people, video quality).
Articles
Timeline
Learning Realistic Human Actions from Movies - Ivan Laptev, Marcin Marszałek, Cordelia Schmid and Benjamin Rozenfeld - CVPR 2008
Dense Trajectories and Motion Boundary Descriptors for Action Recognition - Heng Wang, Alexander Kläser, Cordelia Schmid and Cheng-Lin Liu - IJCV 2013
Two-Stream Convolutional Networks for Action Recognition in Videos - Karen Simonyan and Andrew Zisserman - NIPS 2014
Finding Action Tubes - Georgia Gkioxari and Jitendra Malik - CVPR 2015
Learning Realistic Human Actions from Movies
Ivan Laptev, Marcin Marszałek, Cordelia Schmid and Benjamin Rozenfeld - CVPR 2008
Introduction & Dataset Generation
Inspired by new robust methods for image description and classification. First version of the Hollywood dataset. Movies contain a rich variety and a large number of realistic human actions. To avoid the difficulty of manual annotation, the dataset was built using script-based action annotation: time information is transferred from subtitles to scripts, and time intervals for scene descriptions are then inferred - 60% precision achieved.
Script-based Action Annotation
Example of matching speech sections (green) in subtitles and scripts. Time information (blue) from adjacent speech sections is used to estimate the time intervals of scene descriptions (yellow).
Space-time Features
Interest points are detected using a space-time extension of the Harris operator. Histogram descriptors of space-time volumes are computed in the neighborhood of detected points (the size of each volume is related to the detection scales). Each volume is subdivided into an (Nx, Ny, Nt) grid of cuboids; for each cuboid, HOG and HOF histograms are computed. Both are concatenated, creating a descriptor vector.
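The cuboid-and-histogram scheme can be sketched in NumPy. This is a toy illustration, not the authors' code: it computes only an orientation histogram of spatial gradients per cuboid (a HOG stand-in; HOF would histogram optical flow instead), with the grid sizes as free parameters.

```python
import numpy as np

def hog_descriptor(volume, nx=3, ny=3, nt=2, bins=4):
    """Toy HOG over a space-time volume: split it into an (nx, ny, nt)
    grid of cuboids and histogram spatial gradient orientations,
    weighted by gradient magnitude."""
    t, h, w = volume.shape
    gy, gx = np.gradient(volume.astype(float), axis=(1, 2))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    feats = []
    for ti in np.array_split(np.arange(t), nt):
        for yi in np.array_split(np.arange(h), ny):
            for xi in np.array_split(np.arange(w), nx):
                m = mag[np.ix_(ti, yi, xi)].ravel()
                o = ori[np.ix_(ti, yi, xi)].ravel()
                hist, _ = np.histogram(o, bins=bins, range=(0, np.pi), weights=m)
                feats.append(hist / (hist.sum() + 1e-9))  # L1-normalise each cuboid
    return np.concatenate(feats)

desc = hog_descriptor(np.random.rand(8, 24, 24))
# descriptor length = nx * ny * nt * bins = 3 * 3 * 2 * 4 = 72
```

The concatenation of per-cuboid histograms is what gives the descriptor its (Nx, Ny, Nt)-grid structure.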
Space-time Features
Space-time interest points detected for two video frames with the human actions hand shake (left) and get out car (right). Result of detecting the strongest spatio-temporal interest points in a football sequence with a player heading the ball (a) and in a hand clapping sequence (b).
Spatio-temporal Bag-of-features
A visual vocabulary is built by clustering a subset of 100k features sampled from the training videos with the k-means algorithm, with k = 4000. BoF assigns each feature to the closest (Euclidean distance) vocabulary word and computes the histogram of visual word occurrences over a space-time volume corresponding either to the entire video sequence or to subsequences defined by a spatio-temporal grid.
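The assignment step can be sketched in a few lines of NumPy. This minimal version assumes the k-means vocabulary has already been computed (here a random stand-in with k = 5 instead of the paper's 4000):

```python
import numpy as np

def bof_histogram(features, vocabulary):
    """Assign each local feature to its nearest (Euclidean) visual word
    and return the normalised histogram of word occurrences."""
    # pairwise squared distances between features (n, d) and words (k, d)
    d2 = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 16))     # stand-in for the k-means vocabulary
feats = rng.normal(size=(100, 16))   # local descriptors from one video
h = bof_histogram(feats, vocab)      # one fixed-length video representation
```

For the spatio-temporal grids, the same function would simply be applied per grid cell and the resulting histograms treated as separate channels.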
Spatio-temporal Bag-of-features
Bag-of-features illustration (figure).
Classification
Support Vector Machine (SVM) with a multi-channel χ² kernel that combines channels, defined by:

K(H_i, H_j) = exp( − Σ_c (1/A_c) D_c(H_i, H_j) )

where H_i = {h_in} and H_j = {h_jn} are the histograms for channel c, A_c is a normalisation factor for channel c, and D_c(H_i, H_j) is the χ² distance, defined as:

D_c(H_i, H_j) = (1/2) Σ_n (h_in − h_jn)² / (h_in + h_jn)
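The two formulas translate directly into code. A small NumPy sketch (the per-channel normalisers A_c are passed in precomputed; the resulting kernel matrix would be fed to an SVM with a precomputed kernel):

```python
import numpy as np

def chi2_distance(h_i, h_j):
    """Chi-square distance between two histograms:
    D(H_i, H_j) = 1/2 * sum_n (h_in - h_jn)^2 / (h_in + h_jn)."""
    num = (h_i - h_j) ** 2
    den = h_i + h_j
    return 0.5 * np.sum(num[den > 0] / den[den > 0])

def multichannel_kernel(channels_i, channels_j, mean_dist):
    """Multi-channel chi-square kernel K = exp(-sum_c D_c / A_c),
    where A_c (mean_dist[c]) normalises channel c."""
    s = sum(chi2_distance(hi, hj) / a
            for hi, hj, a in zip(channels_i, channels_j, mean_dist))
    return np.exp(-s)

hog = np.array([0.2, 0.8])
hof = np.array([0.5, 0.5])
k = multichannel_kernel([hog, hof], [hog, hof], mean_dist=[1.0, 1.0])
# identical videos -> kernel value 1.0
```

Note the `den > 0` mask: empty bins contribute nothing to the distance, avoiding division by zero.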
Results
Average class accuracy on the KTH actions dataset:

Method            Accuracy
Schuldt et al.    71.7%
Niebles et al.    81.5%
Wong et al.       86.7%
This work         91.8%
Results
Average precision (AP) for each action class of the test set - results for clean (annotated) and automatic training data, and for a random classifier (chance):

Class          Clean    Automatic  Chance
AnswerPhone    32.1%    16.4%      10.6%
GetOutCar      41.5%    16.4%      6.0%
HandShake      32.3%    9.9%       8.8%
HugPerson      40.6%    26.8%      10.1%
Kiss           53.3%    45.1%      23.5%
SitDown        38.6%    24.8%      13.8%
SitUp          18.2%    10.4%      4.6%
StandUp        50.5%    33.6%      22.6%
Dense Trajectories and Motion Boundary Descriptors for Action Recognition
Heng Wang, Alexander Kläser, Cordelia Schmid and Cheng-Lin Liu - IJCV 2013
Introduction
Bag-of-features achieves state-of-the-art performance. Feature trajectories have been shown to be effective for representing videos. They are generally extracted using the KLT tracker or by matching SIFT descriptors between frames; however, the quantity and quality of those trajectories are often insufficient. This work proposes video description by dense trajectories.
Dense Trajectories
Feature points are sampled on a grid spaced by W pixels (W = 5). Sampling is carried out on each spatial scale separately, and the goal is to track all sampled points through the video. Points in areas without any structure are removed: when the eigenvalues of the auto-correlation matrix are very small, tracking is unreliable. Feature points are tracked on each spatial scale separately. Features are extracted using grids of cuboids, similar to the previous article.
Dense Trajectories
Left: feature points are densely sampled on a grid for each spatial scale. Middle: tracking is carried out in the corresponding spatial scale for L frames by median filtering in a dense optical flow field. Right: the trajectory shape is represented by relative point coordinates. The descriptors (HOG, HOF, MBH) are computed along the trajectory in an N × N pixel neighborhood, which is divided into grids of cuboids. Motion boundary histograms (MBH) are extracted by computing derivatives separately for the horizontal and vertical components of the optical flow.
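Two of the trajectory-level descriptors above can be illustrated in NumPy. This is a toy sketch, not the paper's implementation: the normalised trajectory shape (relative displacements divided by the sum of their magnitudes) and an MBH-style histogram for a single flow component.

```python
import numpy as np

def trajectory_shape(points):
    """Normalised trajectory shape: displacement vectors divided by
    the sum of their magnitudes."""
    disp = np.diff(points, axis=0)                 # (L, 2) relative displacements
    norm = np.linalg.norm(disp, axis=1).sum()
    return (disp / (norm + 1e-9)).ravel()

def mbh(flow_component, bins=8):
    """Toy motion boundary histogram for one flow component (u or v):
    histogram the orientations of its spatial derivatives,
    weighted by their magnitude."""
    gy, gx = np.gradient(flow_component)
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    hist, _ = np.histogram(ori, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

pts = np.cumsum(np.ones((16, 2)), axis=0)     # straight diagonal trajectory, L = 15
s = trajectory_shape(pts)                     # length 2 * L = 30
h = mbh(np.random.default_rng(0).normal(size=(16, 16)))
```

Because MBH is built from flow derivatives, constant (e.g. camera-induced) motion cancels out, which is why it is robust to camera movement.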
Results
Comparison of different descriptors and methods for extracting trajectories on nine datasets. Mean average precision over all classes (mAP) is reported for Hollywood2 and Olympic Sports, and average accuracy over all classes for the other seven datasets. The three best results for each dataset are in bold.
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan and Andrew Zisserman - NIPS 2014
Introduction
CNNs work very well for image recognition. This work extends CNNs to action recognition in video, with two separate recognition streams related to the two-stream hypothesis: a Spatial Stream (appearance recognition ConvNet) and a Temporal Stream (motion recognition ConvNet).
Two-stream Hypothesis
Ventral pathway (purple, the "what" pathway) responds to shape, color and texture. Dorsal pathway (green, the "where" pathway) responds to spatial transformations and movement.
Two-stream Architecture for Video Recognition
Spatial part: in the form of individual frame appearance, carries information about scenes and objects in the video. Temporal part: in the form of motion across frames, carries information about the movement of the camera and the objects.
Two-stream Architecture for Video Recognition
Each stream is implemented as a deep ConvNet whose softmax scores are combined by late fusion. Two fusion methods are proposed: averaging, and training a multiclass linear SVM on stacked L2-normalised softmax scores as features.
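Both fusion options are simple to write down. A minimal NumPy sketch (the scores are assumed to be per-class softmax outputs of the two streams; the SVM training itself is omitted):

```python
import numpy as np

def average_fusion(spatial_scores, temporal_scores):
    """Late fusion by averaging the two streams' softmax scores."""
    return (spatial_scores + temporal_scores) / 2.0

def svm_features(spatial_scores, temporal_scores):
    """Feature vector for the alternative fusion: stacked,
    L2-normalised softmax scores, fed to a multiclass linear SVM."""
    x = np.concatenate([spatial_scores, temporal_scores])
    return x / np.linalg.norm(x)

spatial = np.array([0.7, 0.3])     # toy 2-class softmax outputs
temporal = np.array([0.2, 0.8])
fused = average_fusion(spatial, temporal)   # [0.45, 0.55] -> class 1
feat = svm_features(spatial, temporal)      # unit-norm 4-dim SVM input
```

In the toy example, the temporal stream's confidence flips the decision that the spatial stream alone would have made, which is exactly the point of fusing the two.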
The Spatial Stream ConvNet
Similar to models used for image classification. Operates on individual video frames. Static appearance is a useful cue, since some actions are strongly associated with particular objects. The network is pre-trained on a large image classification dataset, such as the ImageNet challenge dataset.
The Temporal Stream ConvNet
The input to this ConvNet is formed by stacking optical flow displacement fields between several consecutive frames, describing the motion between video frames. Motion representations: optical flow stacking (the displacement vector fields d^t_x and d^t_y of L consecutive frames are stacked, creating a total of 2L input channels), trajectory stacking (trajectory-based descriptors), bi-directional optical flow, and mean flow subtraction.
Optical Flow Stacking
Displacement vector fields d^t_x and d^t_y of L consecutive frames are stacked, creating a total of 2L input channels. In the examples, higher intensity corresponds to positive values and lower intensity to negative values: (a) the horizontal component d_x of the displacement vector field; (b) the vertical component d_y.
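The channel layout is easy to misread from the slide, so here is a small NumPy sketch of the stacking (the flow fields themselves are zero placeholders; in practice they would come from an optical flow algorithm):

```python
import numpy as np

def stack_flow(flows):
    """Stack the horizontal/vertical displacement fields of L
    consecutive frames into a (2L, H, W) input volume for the
    temporal ConvNet: channels [u_1, v_1, u_2, v_2, ...]."""
    channels = []
    for u, v in flows:        # u = d^t_x, v = d^t_y for frame t
        channels.append(u)
        channels.append(v)
    return np.stack(channels)

L, H, W = 10, 16, 16
flows = [(np.zeros((H, W)), np.zeros((H, W))) for _ in range(L)]
inp = stack_flow(flows)                             # shape (2L, H, W)
inp = inp - inp.mean(axis=(1, 2), keepdims=True)    # mean flow subtraction
```

The per-channel mean subtraction in the last line is a crude stand-in for the paper's mean flow subtraction, which compensates for global (camera) motion.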
Multi-task Learning
Unlike the spatial stream ConvNet, which can be pre-trained on a large still image classification dataset (such as ImageNet), the temporal ConvNet needs to be trained on video data, and the available datasets for video action classification are still rather small: UCF-101 and HMDB-51 have only 9.5K and 3.7K videos, respectively. The ConvNet architecture is therefore modified to have two softmax classification layers on top of the last fully-connected layer: one computes HMDB-51 classification scores, the other UCF-101 scores. Each layer has its own loss function, which operates only on the videos coming from the respective dataset; the overall training loss is computed as the sum of the individual tasks' losses.
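The two-head loss can be sketched in NumPy. This is an illustration of the idea only (shared features, one softmax head per dataset, per-dataset masking, summed losses); the head shapes and batch layout here are invented for the example:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def multitask_loss(feats, w_hmdb, w_ucf, labels, dataset_ids):
    """Sum of the two heads' cross-entropy losses; each head only
    sees the videos coming from its own dataset
    (0 = HMDB-51, 1 = UCF-101)."""
    total = 0.0
    for d, w in ((0, w_hmdb), (1, w_ucf)):
        mask = dataset_ids == d
        if mask.any():
            p = softmax(feats[mask] @ w)
            rows = np.arange(mask.sum())
            total += -np.mean(np.log(p[rows, labels[mask]] + 1e-12))
    return total

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))            # shared fc7-style features
w_hmdb = rng.normal(size=(8, 51))          # HMDB-51 head
w_ucf = rng.normal(size=(8, 101))          # UCF-101 head
labels = np.array([0, 1, 2, 0, 1, 2])
dataset_ids = np.array([0, 0, 0, 1, 1, 1])
loss = multitask_loss(feats, w_hmdb, w_ucf, labels, dataset_ids)
```

Because each loss term only touches its own dataset's videos, gradients from both tasks flow into the shared layers while the heads stay dataset-specific.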
Results (comparison figure).
Finding Action Tubes
Georgia Gkioxari and Jitendra Malik - CVPR 2015
Introduction
Image region proposals: regions that are motion salient are more likely to contain the action, so they are selected. This gives a significant reduction in the number of regions being processed and faster computation. The detection pipeline is also inspired by the human vision system. The approach outperforms other techniques on the task of action detection.
Regions of Interest
Selective search is used on the RGB frames to generate approximately 2K regions per frame. Regions devoid of motion are discarded using the optical flow signal. Motion saliency algorithm: the normalised magnitude of the optical flow signal, fm, is seen as a heat map at the pixel level. If R is a region, then fm(R) = (1/|R|) Σ_{i∈R} fm(i) is a measure of how motion salient R is, and R is discarded if fm(R) < α. For α = 0.3, approximately 85% of boxes are discarded.
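The filtering rule maps directly to code. A minimal NumPy sketch, with a toy saliency map standing in for the normalised optical flow magnitude:

```python
import numpy as np

def motion_salient(fm, box, alpha=0.3):
    """Keep a region R if the mean normalised flow magnitude inside it,
    fm(R) = (1/|R|) * sum_{i in R} fm(i), is at least alpha."""
    x0, y0, x1, y1 = box
    region = fm[y0:y1, x0:x1]
    return bool(region.mean() >= alpha)

fm = np.zeros((32, 32))
fm[8:16, 8:16] = 1.0                              # toy motion heat map
keep = motion_salient(fm, (8, 8, 16, 16))         # box fully inside motion
drop = motion_salient(fm, (20, 20, 30, 30))       # static region
```

With roughly 2K proposals per frame, discarding about 85% of them this way is what makes the per-frame CNN evaluation tractable.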
Feature Extraction
(a) Candidate regions are fed into action-specific classifiers, which make predictions using static and motion cues. (b) The regions are linked across frames based on the action predictions and their spatial overlap. Action tubes are produced for each action and each video.
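The linking step can be sketched as a greedy pass over frames; note this is a simplification of the paper's linking formulation (which optimises the whole path), and the trade-off weight `lam` is an illustrative parameter:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def link_tube(frames, lam=1.0):
    """Greedily link one box per frame into an action tube, trading off
    the action score against spatial overlap with the previous box.
    `frames` is a list of per-frame detections [(box, score), ...]."""
    tube = [max(frames[0], key=lambda bs: bs[1])]
    for dets in frames[1:]:
        prev = tube[-1][0]
        tube.append(max(dets, key=lambda bs: bs[1] + lam * iou(bs[0], prev)))
    return tube

frames = [
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.1)],
    [((1, 1, 11, 11), 0.5), ((50, 50, 60, 60), 0.6)],
]
tube = link_tube(frames)
```

In the toy example, the second frame's highest-scoring box loses to the box that overlaps the first frame's detection, so the tube stays spatially coherent.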
Action Detection Model
Action-specific SVM classifiers are used on spatio-temporal features. The features are extracted from the fc7 layer of two CNNs, spatial-CNN and motion-CNN, which were trained to detect actions using static and motion cues, respectively. The architecture of spatial-CNN and motion-CNN is similar to the ones used for image classification.
This approach yields an accuracy of 62.5%, averaged over the three splits of JHMDB.
General Results

Dataset       Laptev et al.  Wang et al.  Simonyan et al.  Gkioxari et al.
KTH           91.8%          95.0%        -                -
Hollywood2    38.38%*        58.2%        -                -
UCF YouTube   -              …%           -                -
UCF Sports    -              …%           88.0%            75.8%
JHMDB         -              …%           59.4%            62.5%

*First version of Hollywood2.
References - Articles
Learning Realistic Human Actions from Movies - Ivan Laptev, Marcin Marszałek, Cordelia Schmid and Benjamin Rozenfeld - CVPR 2008
Action Recognition with Improved Trajectories - Heng Wang and Cordelia Schmid - ICCV 2013
Dense Trajectories and Motion Boundary Descriptors for Action Recognition - Heng Wang, Alexander Kläser, Cordelia Schmid and Cheng-Lin Liu - IJCV 2013
Two-Stream Convolutional Networks for Action Recognition in Videos - Karen Simonyan and Andrew Zisserman - NIPS 2014
Finding Action Tubes - Georgia Gkioxari and Jitendra Malik - CVPR 2015
References - Datasets
KTH Dataset
UCF YouTube Action Data Set
Hollywood2 Dataset
UCF Sports Action Data Set
Joint-annotated Human Motion Data Base (JHMDB)
An evaluation of local action descriptors for human action classification in the presence of occlusion Iveel Jargalsaikhan, Cem Direkoglu, Suzanne Little, and Noel E. O Connor INSIGHT Centre for Data Analytics,
More informationAUTOMATIC 3D HUMAN ACTION RECOGNITION Ajmal Mian Associate Professor Computer Science & Software Engineering
AUTOMATIC 3D HUMAN ACTION RECOGNITION Ajmal Mian Associate Professor Computer Science & Software Engineering www.csse.uwa.edu.au/~ajmal/ Overview Aim of automatic human action recognition Applications
More informationDetection III: Analyzing and Debugging Detection Methods
CS 1699: Intro to Computer Vision Detection III: Analyzing and Debugging Detection Methods Prof. Adriana Kovashka University of Pittsburgh November 17, 2015 Today Review: Deformable part models How can
More informationObject Detection Based on Deep Learning
Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
More informationThe SIFT (Scale Invariant Feature
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical
More informationUNDERSTANDING human actions in videos has been
PAPER IDENTIFICATION NUMBER 1 A Space-Time Graph Optimization Approach Based on Maximum Cliques for Action Detection Sunyoung Cho, Member, IEEE, and Hyeran Byun, Member, IEEE Abstract We present an efficient
More informationLEARNING TO SEGMENT MOVING OBJECTS IN VIDEOS FRAGKIADAKI ET AL. 2015
LEARNING TO SEGMENT MOVING OBJECTS IN VIDEOS FRAGKIADAKI ET AL. 2015 Darshan Thaker Oct 4, 2017 Problem Statement Moving object segmentation in videos Applications: security tracking, pedestrian detection,
More informationRevisiting LBP-based Texture Models for Human Action Recognition
Revisiting LBP-based Texture Models for Human Action Recognition Thanh Phuong Nguyen 1, Antoine Manzanera 1, Ngoc-Son Vu 2, and Matthieu Garrigues 1 1 ENSTA-ParisTech, 828, Boulevard des Maréchaux, 91762
More informationMobile Human Detection Systems based on Sliding Windows Approach-A Review
Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg
More informationHUMAN action recognition has received significant research
JOURNAL OF L A TEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 1 Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling Yu-Gang Jiang, Qi Dai, Wei Liu, Xiangyang Xue, Chong-Wah Ngo Abstract
More informationObject Detection Using Segmented Images
Object Detection Using Segmented Images Naran Bayanbat Stanford University Palo Alto, CA naranb@stanford.edu Jason Chen Stanford University Palo Alto, CA jasonch@stanford.edu Abstract Object detection
More informationBeyond Bags of features Spatial information & Shape models
Beyond Bags of features Spatial information & Shape models Jana Kosecka Many slides adapted from S. Lazebnik, FeiFei Li, Rob Fergus, and Antonio Torralba Detection, recognition (so far )! Bags of features
More informationACTION RECOGNITION USING INTEREST POINTS CAPTURING DIFFERENTIAL MOTION INFORMATION
ACTION RECOGNITION USING INTEREST POINTS CAPTURING DIFFERENTIAL MOTION INFORMATION Gaurav Kumar Yadav, Prakhar Shukla, Amit Sethi Department of Electronics and Electrical Engineering, IIT Guwahati Department
More informationAction Recognition Using Global Spatio-Temporal Features Derived from Sparse Representations
Action Recognition Using Global Spatio-Temporal Features Derived from Sparse Representations Guruprasad Somasundaram, Anoop Cherian, Vassilios Morellas, and Nikolaos Papanikolopoulos Department of Computer
More informationQMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task
QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task Fahad Daniyal and Andrea Cavallaro Queen Mary University of London Mile End Road, London E1 4NS (United Kingdom) {fahad.daniyal,andrea.cavallaro}@eecs.qmul.ac.uk
More informationObject detection with CNNs
Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals
More informationFeature Descriptors. CS 510 Lecture #21 April 29 th, 2013
Feature Descriptors CS 510 Lecture #21 April 29 th, 2013 Programming Assignment #4 Due two weeks from today Any questions? How is it going? Where are we? We have two umbrella schemes for object recognition
More informationCS 231A Computer Vision (Fall 2011) Problem Set 4
CS 231A Computer Vision (Fall 2011) Problem Set 4 Due: Nov. 30 th, 2011 (9:30am) 1 Part-based models for Object Recognition (50 points) One approach to object recognition is to use a deformable part-based
More informationMultiple-Choice Questionnaire Group C
Family name: Vision and Machine-Learning Given name: 1/28/2011 Multiple-Choice naire Group C No documents authorized. There can be several right answers to a question. Marking-scheme: 2 points if all right
More informationRecognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)
Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding
More informationLearning video saliency from human gaze using candidate selection
Learning video saliency from human gaze using candidate selection Rudoy, Goldman, Shechtman, Zelnik-Manor CVPR 2013 Paper presentation by Ashish Bora Outline What is saliency? Image vs video Candidates
More informationAction Recognition using Discriminative Structured Trajectory Groups
2015 IEEE Winter Conference on Applications of Computer Vision Action Recognition using Discriminative Structured Trajectory Groups Indriyati Atmosukarto 1,2, Narendra Ahuja 3, Bernard Ghanem 4 1 Singapore
More informationThree-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients
ThreeDimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients Authors: Zhile Ren, Erik B. Sudderth Presented by: Shannon Kao, Max Wang October 19, 2016 Introduction Given an
More informationPreviously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011
Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition
More informationRobotics Programming Laboratory
Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car
More informationFrom Activity to Language:
From Activity to Language: Learning to recognise the meaning of motion Centre for Vision, Speech and Signal Processing Prof Rich Bowden 20 June 2011 Overview Talk is about recognising spatio temporal patterns
More informationStereoscopic Video Description for Human Action Recognition
Stereoscopic Video Description for Human Action Recognition Ioannis Mademlis, Alexandros Iosifidis, Anastasios Tefas, Nikos Nikolaidis and Ioannis Pitas Department of Informatics, Aristotle University
More informationLeveraging Textural Features for Recognizing Actions in Low Quality Videos
Leveraging Textural Features for Recognizing Actions in Low Quality Videos Saimunur Rahman 1, John See 2, and Chiung Ching Ho 3 Centre of Visual Computing, Faculty of Computing and Informatics Multimedia
More informationTri-modal Human Body Segmentation
Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4
More information