Content-based image and video analysis. Event Recognition

Size: px
Start display at page:

Download "Content-based image and video analysis. Event Recognition"

Transcription

1 Content-based image and video analysis Event Recognition

2 What is an event? a thing that happens or takes place, Oxford Dictionary Examples: Human gestures Human actions (running, drinking, etc.) Human interaction (gunfight, car-crash, etc.) Sports event (tennis, soccer, etc.) Nature event (fire, storm, etc.) 1

3 Why event recognition? Huge amounts of video available Event recognition useful for: Content-based browsing Fast forward to the next goal scoring scene Video search Show me all videos with Bush and Putin shaking hands 2

4 Human actions Human actions are major events in movie content Meaning hidden within visual representation 4

5 What are human actions? Definition 1: Physical body motion: KTH action Database ( 5

6 What are human actions? Definition 2: Interaction with environment on specific purpose Same physical motion different action depending on the context 6

7 Challenges Variations: Lighting, appearance, view, background Individual motion, camera motion Difference in shape Difference in motion Drinking Smoking Both actions are similar in overall shape (human posture) and motion (hand motion) 7

8 Challenges Problems: Existing datasets contain few action classes, captured in controlled and simplified settings Lots of (realistic) training data needed Idea: Realistic human actions are frequent events within movies Perform automatic labeling of video sequences based on movie scripts 8

9 Action event dataset Coffee and Cigarettes dataset 159 annotated Drinking samples 149 annotated Smoking samples Spatial annotation head rectangle First frame Temporal annotation Keyframe Last frame torso rectangle 9

10 Problems: Challenges Existing datasets contain few action classes, captured in controlled and simplified settings Lots of (realistic) training data needed Idea: Realistic human actions are frequent events within movies Perform automatic labeling of video sequences based on movie scripts Hollywood Human actions (HOHA) dataset 10

11 Problems: Script-based annotation No time information available Use subtitles to align scripts with the video Described actions do not always correspond to movie scene Assign an alignment score to scene description: a = (#matched words)/(#all words) Apply a Threshold (eg. a>0.5) Variability of action expressions in text Use regularized Perceptron to classify action description 11

12 Script alignment Scripts publicly available to many movies Subtitles available for the most of movies Transfer time to scripts by text alignment :20:17,240 --> 01:20:20,437 Why weren't you honest with me? Why'd you keep your marriage a secret? RICK Why weren't you honest with me? Why did you keep your marriage a secret? :20:20,640 --> 01:20:23,598 lt wasn't my secret, Richard. Victor wanted it that way :20:23,800 --> 01:20:26,189 Not even our closest friends knew about our marriage. subtitles 01:20:17 01:20:23 Rick sits down with Ilsa. ILSA Oh, it wasn't my secret, Richard. Victor wanted it that way. Not even our closest friends knew about our marriage. movie script 12

13 Script alignment: Problems Perfect alignment does not guarantee perfect action annotation in video From 147 action with correct alignment (a=1), only 70% did match the video Errors Temporal misalignment (10%) Ouside FOV (10%) Completely missing in video (10%) (a: quality of subtitle-script matching) Example of false positive for get out of car, Action is not visible in video! A black car pulls up, two army officers get out. 14

14 Script-based Annotation Pros: Realistic variation of actions Many examples per class, many classes No extra overhead for new classes Character names may be used to resolve who is doing what? Problems: No spatial localization Temporal localization may be poor Script does not always follow the movie Automatic annotation useful for training, but not precise enough for evaluation 15

15 Retrieving actions in movies Ivan Laptev and Patric Perez ICCV

16 Actions == space-time objects? stableview objects atomic actions car exit phoning smoking hand shaking drinking Take advantage of spacetime shape Temporal slice time time time time 17

17 Actions == space-time objects? Can actions be considered as space-time objects? Transfer object detectors to action recognition Here: only atomic actions considered (i.e. simple actions) Temporal slice of same action under different circumstances similar So, (atomic) actions = space-time objects 18

18 Action features Action volume = space-time cuboid region around the head (duration of action) Encoded with block-histogram features f ( ), = (x,y,t,dx,dy,dt,, ), defined by Location (x,y,t) Space-time extent (dx,dy,dt) Type of block ( ) el Plane, Temp-2, Spat-4 Type of histogram ( ) Histogram of optical flow (HOF) Histogram of oriented gradient (HOG) 19

19 Action features HOG features HOF features 20

20 Histogram features (simplified) Histogram of oriented gradient: Apply gradient operator to each frame within sequence (eg. Sobel) Bin gradients discretized in 4 orientations to block-histogram Histogram of optical flow: Calculate optical flow (OF) between frames Bin OF vectors discretized in 4 direction bins (+1 bin for no motion) to block-histogram Normalized action cuboid has size 14x14x8 with units corresponding to 5x5x5 pixels More than 1 Mio. possible features f ( ) 21

21 Histogram features HOG: histograms of oriented gradient HOF: histograms of optic flow >10^6 possible features 4 grad. orientation bins 4 OF direction bins + 1 bin for no motion 22

22 Action learning Use boosting method (eg. AdaBoost) to classify features within an action volume Features: Block-histogram features selected features boosting AdaBoost: weak classifier Efficient discriminative classifier [Freund&Schapire 97] Good performance for face detection [Viola&Jones 01] 23

23 Action learning: Boosting A weak classifier h is a classifier with accuracy only slightly better than chance selected features weak classifier Boosting: combine a number of weak classifiers so that the ensemble is arbitrarily accurate Allows the use of simple (weak) classifiers without the loss if accuracy Selects features and trains the classifier 24

24 pre-aligned samples Action learning: Boosting Haar features optimal threshold Histogram features Use FLD, select opt. threshold Weak classifier h t : In case of one dimensional features select an optimal decision threshold E.g. for Haar-filter responses (Viola&Jones face detector) Here: m-dimensional features Project data on one dimension using Fisher s Linear Discriminant (FLD), then select optimal threshold in 1-D 25

25 Action classification test Comparison of: Static keyframe classifier with spatial HOG features (BH-Grad4) Use Boosting to classify action using features from keyframes (= frame when hand reaches the mouth) only Space-time action classifier with HOF features (STBH-OF5) Space-time action classifier with HOF and HOG features (STBH-OFGrad9) 28

26 Action classification test Random motion patterns Additional shape information does not seem to improve the space-time classifier Space-time classifier and static key-frame classifier might have complementary properties 29

27 Classifier properties Training output: Accumulated feature maps Space-time classifier (HOF) Static keyframe classifier(hog) Space-time classifier and static keyframe classifier might have complementary features 30

28 Classifier properties Region of selected features show most active parts of classifier High activity at beginning and end of sequence, but low activity at keyframe Less accuracy expected when only classifying keyframes Space-time classifier selects most features around hand region Keyframe classifier selects most features in upper part of the key-frame Idea: Use complementary properties to combine classifiers (-> keyframe priming) 31

29 Keyframe priming Combination of static key-frame classifier with space-time classifier Motivated by complementary properties of both classifiers Bootstrap space-time classifier and apply it to keyframes detected by the keyframe detector (boosted space-time window classifier) Speeds up detection Combines complementary models 32

30 Keyframe priming Apply keyframe detector (HOG classification on single frame) to all positions, scales and frames while being set to a high false positive rate Generate space-time blocks aligned with detected keyframes and with different temporal extent Run space-time classifier on each hypothesis 33

31 Keyframe priming Training Positive training sample Negative training samples Keyframe-primed event detection Keyframe detections Test 34

32 Action detection Test on 25min from Coffee and Cigarettes with 38 drinking actions No overlap with the training set in subjects or scenes Keyframe priming is faster and leads to significant better results Keyframe priming No Keyframe priming 35

33 Action detection 36

34 Learning realistic human actions from movies Ivan Laptev, Marcin Marszalek, Cordelia Schmid and Benjamin Rozenfeld CVPR

35 Action recognition in real-world videos Humans can do more than just drinking and smoking Robust detection and classification of all kinds of human actions needed 38

36 Space-Time Features: Detector [Laptev, IJCV 2005] 39

37 Space-Time Features: Detector Space-Time Interest Points (STIP): Space-Time Extension of Harris Operator Add dimensionality of time to the second moment matrix Look for maxima in extended Harris corner function H Detection depends on spatio-temporal scale Extract features at multiple levels of spatio-temporal scales (dense scale sampling) 40

38 Space-Time Features: Descriptor Multi-scale space-time patches from corner detector Public code available at Histogram of oriented spatial grad. (HOG) Histogram of optical flow (HOF) 3x3x2x4bins HOG descriptor 3x3x2x5bins HOF descriptor 41

39 Space-Time Features: Descriptor Compute histogram descriptors of space-time volumes in neighborhood of detected points: Compute a 4-bin HOG for each cube in 3x3x2 space-time grid Compute a 5-bin HOF for each cube in 3x3x2 space-time grid Size of each volume related to detection scales 42

40 Spatio-temporal bag of features (BoF) Cluster features (eg. k-means) Visual vocabulary (here, k=4000) Assign each feature to nearest vocabulary word Compute histogram of visual word occurrences over space time volume different spatio-temporal grids explored: 43

41 Action classification Use SVMs with multi-channel chi-square kernel: Channel c is a combination of a spatiotemporal grid and a descriptor (HoG or HoF) D is 2 distance between the BoF-histograms A is mean value of distances between all training samples One-against-all approach in case of multi-class classification 44

42 Results on KTH actions dataset Examples of all six classes and all four scenarios 45

43 Results on KTH actions dataset Average class accuracy Confusion matrix 46

44 Results on HOHA dataset Average Precision for each action class Comparison of results for annotated (clean) and automatic training data Chance denotes results of a random classifier 47

45 Results on HOHA dataset 48

46 Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment Dong Xu and Shih-Fu Chang CVPR

47 Event recognition in news broadcast Challenging task: Complex motion, cluttered backgrounds, occlusions, geometric variations of objects HOF/HOG approaches are sensitive to high-motion regions only News events may have relatively low motion (eg. fire) Broadcast news less constrained domain than human actions Event usually consists of different sub-clips E.g. riot may consist of scenes of fire and smoke at different locations 50

48 Temporally aligned pyramid matching Algorithm overview: each frame is represented by one feature vector decompose video into sub-clips using hierarchical clustering on each hierarchy level, calculate Earth Mover s Distance (EMD) between frame features and use it within SVM framework Constrain EMD that frames from one sub-clip can only be matched to fames from one other sub-clip (alignment) fuse SVM outputs from all levels 51

49 Temporally aligned pyramid matching 52

50 Decomposition of video into sub-clips Temporal-constrained Hierarchical Agglomerative Clustering (T-HAC): First each feature-vector forms a cluster Construct clusters iteratively by combining existing clusters based on their distances (EMD) Only merge neighboring clusters in temporal dimension Clusters form a pyramid like structure (dendrogram) 53

51 Features Low-level global features: Grid Color Moment (GCM) First three moments for each grid region (eg. 5X5 grid) Gabor Texture feature (GT) Edge Direction Histogram (EDH) (same as HoG) Apply Sobel operator and histogram edge directions quantized at 5 degrees Mid-level feature: Concept Score (CS) N-dimensional vector (eg. N=108) each component represents confidence score from a semantic concept classifier (SVMs) 55

52 Temporal alignment Matching in single-level EMD (a) and temporally aligned pyramid matching (TAPM) (b) 58

53 Classification SVM Classification with Earth Movers Distance (EMD) Kernel: D(P,Q): EMD between features from sequence P and Q A: hyper-parameter set empirically through cross-validation 59

54 Fusion Fuse Information from different levels directly: L: number of sub-clip levels g: decision values from the SVMs at different levels l h: level weights (eg. all weights equal to 1) Similar to logistic regression/perceptron 60

55 Experiments Test with LSCOM concepts on TRECVID events/activities annotated (449 total concepts) 10 events chosen Relatively high occurrence frequency May be recognized from visual cues intuitively Number of positive samples for each class:

56 Results Comparison of single-level EMD (SLEMD) with key-frame classification on different features 62

57 Results Average Precision on different levels of TAPM L<x>: classification at level x d: fusion with non-uniform weights (h0=h1=1; h2=2) 63

58 Summary Paper1: Retrieving actions in movies Atomic human actions: smoking, drinking Histogram of oriented gradients (HoG), histograms of oriented flow (HoF), at different spatio-temporal positions & scales + grid-types,.. Boosting to select good features and to build classifier Paper2: Learning realistic human actions from movies Eight atomic human actions: Kiss, AnswerPhone, GetOutCar,.. HoG / HoF at space-time interest points, bag of features SVM for classification Paper 3: Visual event recognition in news videos Decomposes video into sub-clips Temporal alignment improves classification Low- and mid-level features, SVM for classification 64

59 References I. Laptev and P. Perez. Retrieving actions in movies. ICCV '07 I. Laptev, M. Marszalek, C. Schmid and B. Rozenfeld. Learning realistic human actions from movies. CVPR '08 D. Xu and SF. Chang. Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment. CVPR '07 65

60 References ptev.html P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR '01 R. Schapire, Y. Freund, P. Bartlett and WS. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. ICML '97 66

61 Hiwi Interested in action recognition? We are currently looking for a motivated student to teach Armar-III how to recognize situations within rooms (eg. cooking, cleaning) Required skills: Interest in Computer Vision C++ programming experience under Linux lukas.rybok@kit.edu for more details 67

Learning Realistic Human Actions from Movies

Learning Realistic Human Actions from Movies Learning Realistic Human Actions from Movies Ivan Laptev*, Marcin Marszałek**, Cordelia Schmid**, Benjamin Rozenfeld*** INRIA Rennes, France ** INRIA Grenoble, France *** Bar-Ilan University, Israel Presented

More information

Learning realistic human actions from movies

Learning realistic human actions from movies Learning realistic human actions from movies Ivan Laptev, Marcin Marszalek, Cordelia Schmid, Benjamin Rozenfeld CVPR 2008 Presented by Piotr Mirowski Courant Institute, NYU Advanced Vision class, November

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION Ing. Lorenzo Seidenari e-mail: seidenari@dsi.unifi.it What is an Event? Dictionary.com definition: something that occurs in a certain place during a particular

More information

Person Action Recognition/Detection

Person Action Recognition/Detection Person Action Recognition/Detection Fabrício Ceschin Visão Computacional Prof. David Menotti Departamento de Informática - Universidade Federal do Paraná 1 In object recognition: is there a chair in the

More information

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION Maral Mesmakhosroshahi, Joohee Kim Department of Electrical and Computer Engineering Illinois Institute

More information

P-CNN: Pose-based CNN Features for Action Recognition. Iman Rezazadeh

P-CNN: Pose-based CNN Features for Action Recognition. Iman Rezazadeh P-CNN: Pose-based CNN Features for Action Recognition Iman Rezazadeh Introduction automatic understanding of dynamic scenes strong variations of people and scenes in motion and appearance Fine-grained

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

Lecture 18: Human Motion Recognition

Lecture 18: Human Motion Recognition Lecture 18: Human Motion Recognition Professor Fei Fei Li Stanford Vision Lab 1 What we will learn today? Introduction Motion classification using template matching Motion classification i using spatio

More information

Recap Image Classification with Bags of Local Features

Recap Image Classification with Bags of Local Features Recap Image Classification with Bags of Local Features Bag of Feature models were the state of the art for image classification for a decade BoF may still be the state of the art for instance retrieval

More information

Object Detection Design challenges

Object Detection Design challenges Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

https://en.wikipedia.org/wiki/the_dress Recap: Viola-Jones sliding window detector Fast detection through two mechanisms Quickly eliminate unlikely windows Use features that are fast to compute Viola

More information

Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Columbia University High-Level Feature Detection: Parts-based Concept Detectors TRECVID 2005 Workshop Columbia University High-Level Feature Detection: Parts-based Concept Detectors Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexin Xie, Eric Zavesky Digital Video and Multimedia Lab

More information

Bag-of-features. Cordelia Schmid

Bag-of-features. Cordelia Schmid Bag-of-features for category classification Cordelia Schmid Visual search Particular objects and scenes, large databases Category recognition Image classification: assigning a class label to the image

More information

Window based detectors

Window based detectors Window based detectors CS 554 Computer Vision Pinar Duygulu Bilkent University (Source: James Hays, Brown) Today Window-based generic object detection basic pipeline boosting classifiers face detection

More information

Multiple Kernel Learning for Emotion Recognition in the Wild

Multiple Kernel Learning for Emotion Recognition in the Wild Multiple Kernel Learning for Emotion Recognition in the Wild Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, Gwen Littlewort and Marian S. Bartlett Machine Perception Laboratory UCSD EmotiW Challenge,

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 04/10/12 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Today s class: Object Category Detection Overview of object category detection Statistical

More information

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Visuelle Perzeption für Mensch- Maschine Schnittstellen Visuelle Perzeption für Mensch- Maschine Schnittstellen Vorlesung, WS 2009 Prof. Dr. Rainer Stiefelhagen Dr. Edgar Seemann Institut für Anthropomatik Universität Karlsruhe (TH) http://cvhci.ira.uka.de

More information

Action Recognition with HOG-OF Features

Action Recognition with HOG-OF Features Action Recognition with HOG-OF Features Florian Baumann Institut für Informationsverarbeitung, Leibniz Universität Hannover, {last name}@tnt.uni-hannover.de Abstract. In this paper a simple and efficient

More information

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions Akitsugu Noguchi and Keiji Yanai Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka,

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

CS229: Action Recognition in Tennis

CS229: Action Recognition in Tennis CS229: Action Recognition in Tennis Aman Sikka Stanford University Stanford, CA 94305 Rajbir Kataria Stanford University Stanford, CA 94305 asikka@stanford.edu rkataria@stanford.edu 1. Motivation As active

More information

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Visuelle Perzeption für Mensch- Maschine Schnittstellen Visuelle Perzeption für Mensch- Maschine Schnittstellen Vorlesung, WS 2009 Prof. Dr. Rainer Stiefelhagen Dr. Edgar Seemann Institut für Anthropomatik Universität Karlsruhe (TH) http://cvhci.ira.uka.de

More information

Object Classification Problem

Object Classification Problem HIERARCHICAL OBJECT CATEGORIZATION" Gregory Griffin and Pietro Perona. Learning and Using Taxonomies For Fast Visual Categorization. CVPR 2008 Marcin Marszalek and Cordelia Schmid. Constructing Category

More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Multiple-Choice Questionnaire Group C

Multiple-Choice Questionnaire Group C Family name: Vision and Machine-Learning Given name: 1/28/2011 Multiple-Choice naire Group C No documents authorized. There can be several right answers to a question. Marking-scheme: 2 points if all right

More information

Object recognition (part 1)

Object recognition (part 1) Recognition Object recognition (part 1) CSE P 576 Larry Zitnick (larryz@microsoft.com) The Margaret Thatcher Illusion, by Peter Thompson Readings Szeliski Chapter 14 Recognition What do we mean by object

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Object and Class Recognition I:

Object and Class Recognition I: Object and Class Recognition I: Object Recognition Lectures 10 Sources ICCV 2005 short courses Li Fei-Fei (UIUC), Rob Fergus (Oxford-MIT), Antonio Torralba (MIT) http://people.csail.mit.edu/torralba/iccv2005

More information

MoSIFT: Recognizing Human Actions in Surveillance Videos

MoSIFT: Recognizing Human Actions in Surveillance Videos MoSIFT: Recognizing Human Actions in Surveillance Videos CMU-CS-09-161 Ming-yu Chen and Alex Hauptmann School of Computer Science Carnegie Mellon University Pittsburgh PA 15213 September 24, 2009 Copyright

More information

Leveraging Textural Features for Recognizing Actions in Low Quality Videos

Leveraging Textural Features for Recognizing Actions in Low Quality Videos Leveraging Textural Features for Recognizing Actions in Low Quality Videos Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics Multimedia University,

More information

Beyond Bags of Features

Beyond Bags of Features : for Recognizing Natural Scene Categories Matching and Modeling Seminar Instructed by Prof. Haim J. Wolfson School of Computer Science Tel Aviv University December 9 th, 2015

More information

Part based models for recognition. Kristen Grauman

Part based models for recognition. Kristen Grauman Part based models for recognition Kristen Grauman UT Austin Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

First-Person Animal Activity Recognition from Egocentric Videos

First-Person Animal Activity Recognition from Egocentric Videos First-Person Animal Activity Recognition from Egocentric Videos Yumi Iwashita Asamichi Takamine Ryo Kurazume School of Information Science and Electrical Engineering Kyushu University, Fukuoka, Japan yumi@ieee.org

More information

Detection III: Analyzing and Debugging Detection Methods

Detection III: Analyzing and Debugging Detection Methods CS 1699: Intro to Computer Vision Detection III: Analyzing and Debugging Detection Methods Prof. Adriana Kovashka University of Pittsburgh November 17, 2015 Today Review: Deformable part models How can

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract

More information

Class 9 Action Recognition

Class 9 Action Recognition Class 9 Action Recognition Liangliang Cao, April 4, 2013 EECS 6890 Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/visualrecognitionandsearch Visual Recognition

More information

Face Detection and Alignment. Prof. Xin Yang HUST

Face Detection and Alignment. Prof. Xin Yang HUST Face Detection and Alignment Prof. Xin Yang HUST Many slides adapted from P. Viola Face detection Face detection Basic idea: slide a window across image and evaluate a face model at every location Challenges

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 03/18/10 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Goal: Detect all instances of objects Influential Works in Detection Sung-Poggio

More information

Evaluation of Local Space-time Descriptors based on Cuboid Detector in Human Action Recognition

Evaluation of Local Space-time Descriptors based on Cuboid Detector in Human Action Recognition International Journal of Innovation and Applied Studies ISSN 2028-9324 Vol. 9 No. 4 Dec. 2014, pp. 1708-1717 2014 Innovative Space of Scientific Research Journals http://www.ijias.issr-journals.org/ Evaluation

More information

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Tetsu Matsukawa Koji Suzuki Takio Kurita :University of Tsukuba :National Institute of Advanced Industrial Science and

More information

SPATIO-TEMPORAL PYRAMIDAL ACCORDION REPRESENTATION FOR HUMAN ACTION RECOGNITION. Manel Sekma, Mahmoud Mejdoub, Chokri Ben Amar

SPATIO-TEMPORAL PYRAMIDAL ACCORDION REPRESENTATION FOR HUMAN ACTION RECOGNITION. Manel Sekma, Mahmoud Mejdoub, Chokri Ben Amar 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) SPATIO-TEMPORAL PYRAMIDAL ACCORDION REPRESENTATION FOR HUMAN ACTION RECOGNITION Manel Sekma, Mahmoud Mejdoub, Chokri

More information

Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment

Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment Visual Event Recognition in News Video using Kernel Methods with Multi-Level Temporal Alignment Dong Xu and Shih-Fu Chang Department of Electrical Engineering Columbia University, New York, NY 007 {dongxu,

More information

Human detection using local shape and nonredundant

Human detection using local shape and nonredundant University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Human detection using local shape and nonredundant binary patterns

More information

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

More information

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study

Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study J. Zhang 1 M. Marszałek 1 S. Lazebnik 2 C. Schmid 1 1 INRIA Rhône-Alpes, LEAR - GRAVIR Montbonnot, France

More information

Fast Realistic Multi-Action Recognition using Mined Dense Spatio-temporal Features

Fast Realistic Multi-Action Recognition using Mined Dense Spatio-temporal Features Fast Realistic Multi-Action Recognition using Mined Dense Spatio-temporal Features Andrew Gilbert, John Illingworth and Richard Bowden CVSSP, University of Surrey, Guildford, Surrey GU2 7XH United Kingdom

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances

More information

Category vs. instance recognition

Category vs. instance recognition Category vs. instance recognition Category: Find all the people Find all the buildings Often within a single image Often sliding window Instance: Is this face James? Find this specific famous building

More information

QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task

QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task Fahad Daniyal and Andrea Cavallaro Queen Mary University of London Mile End Road, London E1 4NS (United Kingdom) {fahad.daniyal,andrea.cavallaro}@eecs.qmul.ac.uk

More information

CS231N Section. Video Understanding 6/1/2018

CS231N Section. Video Understanding 6/1/2018 CS231N Section Video Understanding 6/1/2018 Outline Background / Motivation / History Video Datasets Models Pre-deep learning CNN + RNN 3D convolution Two-stream What we ve seen in class so far... Image

More information

Real-Time Action Recognition System from a First-Person View Video Stream

Real-Time Action Recognition System from a First-Person View Video Stream Real-Time Action Recognition System from a First-Person View Video Stream Maxim Maximov, Sang-Rok Oh, and Myong Soo Park there are some requirements to be considered. Many algorithms focus on recognition

More information

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg Human Detection A state-of-the-art survey Mohammad Dorgham University of Hamburg Presentation outline Motivation Applications Overview of approaches (categorized) Approaches details References Motivation

More information

Computer Vision. Exercise Session 10 Image Categorization

Computer Vision. Exercise Session 10 Image Categorization Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Leveraging Textural Features for Recognizing Actions in Low Quality Videos

Leveraging Textural Features for Recognizing Actions in Low Quality Videos Leveraging Textural Features for Recognizing Actions in Low Quality Videos Saimunur Rahman 1, John See 2, and Chiung Ching Ho 3 Centre of Visual Computing, Faculty of Computing and Informatics Multimedia

More information

Local Features and Bag of Words Models

Local Features and Bag of Words Models 10/14/11 Local Features and Bag of Words Models Computer Vision CS 143, Brown James Hays Slides from Svetlana Lazebnik, Derek Hoiem, Antonio Torralba, David Lowe, Fei Fei Li and others Computer Engineering

More information

Vision and Image Processing Lab., CRV Tutorial day- May 30, 2010 Ottawa, Canada

Vision and Image Processing Lab., CRV Tutorial day- May 30, 2010 Ottawa, Canada Spatio-Temporal Salient Features Amir H. Shabani Vision and Image Processing Lab., University of Waterloo, ON CRV Tutorial day- May 30, 2010 Ottawa, Canada 1 Applications Automated surveillance for scene

More information

Action Recognition From Videos using Sparse Trajectories

Action Recognition From Videos using Sparse Trajectories Action Recognition From Videos using Sparse Trajectories Alexandros Doumanoglou, Nicholas Vretos, Petros Daras Centre for Research and Technology - Hellas (ITI-CERTH) 6th Km Charilaou - Thermi, Thessaloniki,

More information

Chapter 2 Action Representation

Chapter 2 Action Representation Chapter 2 Action Representation Abstract In this chapter, various action recognition issues are covered in a concise manner. Various approaches are presented here. In Chap. 1, nomenclatures, various aspects

More information

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213) Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding

More information

Face and Nose Detection in Digital Images using Local Binary Patterns

Face and Nose Detection in Digital Images using Local Binary Patterns Face and Nose Detection in Digital Images using Local Binary Patterns Stanko Kružić Post-graduate student University of Split, Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture

More information

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints Last week Multi-Frame Structure from Motion: Multi-View Stereo Unknown camera viewpoints Last week PCA Today Recognition Today Recognition Recognition problems What is it? Object detection Who is it? Recognizing

More information

Computer Vision with MATLAB MATLAB Expo 2012 Steve Kuznicki

Computer Vision with MATLAB MATLAB Expo 2012 Steve Kuznicki Computer Vision with MATLAB MATLAB Expo 2012 Steve Kuznicki 2011 The MathWorks, Inc. 1 Today s Topics Introduction Computer Vision Feature-based registration Automatic image registration Object recognition/rotation

More information

Generic Object-Face detection

Generic Object-Face detection Generic Object-Face detection Jana Kosecka Many slides adapted from P. Viola, K. Grauman, S. Lazebnik and many others Today Window-based generic object detection basic pipeline boosting classifiers face

More information

Learning Representations for Visual Object Class Recognition

Learning Representations for Visual Object Class Recognition Learning Representations for Visual Object Class Recognition Marcin Marszałek Cordelia Schmid Hedi Harzallah Joost van de Weijer LEAR, INRIA Grenoble, Rhône-Alpes, France October 15th, 2007 Bag-of-Features

More information

Local features: detection and description May 12 th, 2015

Local features: detection and description May 12 th, 2015 Local features: detection and description May 12 th, 2015 Yong Jae Lee UC Davis Announcements PS1 grades up on SmartSite PS1 stats: Mean: 83.26 Standard Dev: 28.51 PS2 deadline extended to Saturday, 11:59

More information

Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features

Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features Action Recognition in Low Quality Videos by Jointly Using Shape, Motion and Texture Features Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics

More information

Human Focused Action Localization in Video

Human Focused Action Localization in Video Human Focused Action Localization in Video Alexander Kläser 1, Marcin Marsza lek 2, Cordelia Schmid 1, and Andrew Zisserman 2 1 INRIA Grenoble, LEAR, LJK {klaser,schmid}@inrialpes.fr 2 Engineering Science,

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

Video Event Detection Using Motion Relativity and Feature Selection

Video Event Detection Using Motion Relativity and Feature Selection IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. XX, XX 1 Video Event Detection Using Motion Relativity and Feature Selection Feng Wang, Zhanhu Sun, Yu-Gang Jiang, and Chong-Wah Ngo Abstract Event detection

More information

Histogram of Oriented Gradients (HOG) for Object Detection

Histogram of Oriented Gradients (HOG) for Object Detection Histogram of Oriented Gradients (HOG) for Object Detection Navneet DALAL Joint work with Bill TRIGGS and Cordelia SCHMID Goal & Challenges Goal: Detect and localise people in images and videos n Wide variety

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Human-Robot Interaction

Human-Robot Interaction Human-Robot Interaction Elective in Artificial Intelligence Lecture 6 Visual Perception Luca Iocchi DIAG, Sapienza University of Rome, Italy With contributions from D. D. Bloisi and A. Youssef Visual Perception

More information

Large-scale visual recognition The bag-of-words representation

Large-scale visual recognition The bag-of-words representation Large-scale visual recognition The bag-of-words representation Florent Perronnin, XRCE Hervé Jégou, INRIA CVPR tutorial June 16, 2012 Outline Bag-of-words Large or small vocabularies? Extensions for instance-level

More information

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction

Computer vision: models, learning and inference. Chapter 13 Image preprocessing and feature extraction Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction Preprocessing The goal of pre-processing is to try to reduce unwanted variation in image due to lighting,

More information

Modeling and visual recognition of human actions and interactions

Modeling and visual recognition of human actions and interactions Modeling and visual recognition of human actions and interactions Ivan Laptev To cite this version: Ivan Laptev. Modeling and visual recognition of human actions and interactions. Computer Vision and Pattern

More information

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah Human detection using histogram of oriented gradients Srikumar Ramalingam School of Computing University of Utah Reference Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection,

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

Beyond Bags of features Spatial information & Shape models

Beyond Bags of features Spatial information & Shape models Beyond Bags of features Spatial information & Shape models Jana Kosecka Many slides adapted from S. Lazebnik, FeiFei Li, Rob Fergus, and Antonio Torralba Detection, recognition (so far )! Bags of features

More information

Designing Applications that See Lecture 7: Object Recognition

Designing Applications that See Lecture 7: Object Recognition stanford hci group / cs377s Designing Applications that See Lecture 7: Object Recognition Dan Maynes-Aminzade 29 January 2008 Designing Applications that See http://cs377s.stanford.edu Reminders Pick up

More information

Sparse coding for image classification

Sparse coding for image classification Sparse coding for image classification Columbia University Electrical Engineering: Kun Rong(kr2496@columbia.edu) Yongzhou Xiang(yx2211@columbia.edu) Yin Cui(yc2776@columbia.edu) Outline Background Introduction

More information

Linear combinations of simple classifiers for the PASCAL challenge

Linear combinations of simple classifiers for the PASCAL challenge Linear combinations of simple classifiers for the PASCAL challenge Nik A. Melchior and David Lee 16 721 Advanced Perception The Robotics Institute Carnegie Mellon University Email: melchior@cmu.edu, dlee1@andrew.cmu.edu

More information

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients ThreeDimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients Authors: Zhile Ren, Erik B. Sudderth Presented by: Shannon Kao, Max Wang October 19, 2016 Introduction Given an

More information

Ensemble Tracking. Abstract. 1 Introduction. 2 Background

Ensemble Tracking. Abstract. 1 Introduction. 2 Background Ensemble Tracking Shai Avidan Mitsubishi Electric Research Labs 201 Broadway Cambridge, MA 02139 avidan@merl.com Abstract We consider tracking as a binary classification problem, where an ensemble of weak

More information

Research on Robust Local Feature Extraction Method for Human Detection

Research on Robust Local Feature Extraction Method for Human Detection Waseda University Doctoral Dissertation Research on Robust Local Feature Extraction Method for Human Detection TANG, Shaopeng Graduate School of Information, Production and Systems Waseda University Feb.

More information

Detection of a Single Hand Shape in the Foreground of Still Images

Detection of a Single Hand Shape in the Foreground of Still Images CS229 Project Final Report Detection of a Single Hand Shape in the Foreground of Still Images Toan Tran (dtoan@stanford.edu) 1. Introduction This paper is about an image detection system that can detect

More information

An evaluation of local action descriptors for human action classification in the presence of occlusion

An evaluation of local action descriptors for human action classification in the presence of occlusion An evaluation of local action descriptors for human action classification in the presence of occlusion Iveel Jargalsaikhan, Cem Direkoglu, Suzanne Little, and Noel E. O Connor INSIGHT Centre for Data Analytics,

More information

Feature Descriptors. CS 510 Lecture #21 April 29 th, 2013

Feature Descriptors. CS 510 Lecture #21 April 29 th, 2013 Feature Descriptors CS 510 Lecture #21 April 29 th, 2013 Programming Assignment #4 Due two weeks from today Any questions? How is it going? Where are we? We have two umbrella schemes for object recognition

More information

Object recognition. Methods for classification and image representation

Object recognition. Methods for classification and image representation Object recognition Methods for classification and image representation Credits Slides by Pete Barnum Slides by FeiFei Li Paul Viola, Michael Jones, Robust Realtime Object Detection, IJCV 04 Navneet Dalal

More information

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)

More information

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab Person Detection in Images using HoG + Gentleboost Rahul Rajan June 1st July 15th CMU Q Robotics Lab 1 Introduction One of the goals of computer vision Object class detection car, animal, humans Human

More information

A Cascade of Feed-Forward Classifiers for Fast Pedestrian Detection

A Cascade of Feed-Forward Classifiers for Fast Pedestrian Detection A Cascade of eed-orward Classifiers for ast Pedestrian Detection Yu-ing Chen,2 and Chu-Song Chen,3 Institute of Information Science, Academia Sinica, aipei, aiwan 2 Dept. of Computer Science and Information

More information

Histogram of Flow and Pyramid Histogram of Visual Words for Action Recognition

Histogram of Flow and Pyramid Histogram of Visual Words for Action Recognition Histogram of Flow and Pyramid Histogram of Visual Words for Action Recognition Ethem F. Can and R. Manmatha Department of Computer Science, UMass Amherst Amherst, MA, 01002, USA [efcan, manmatha]@cs.umass.edu

More information

CLASSIFICATION Experiments

CLASSIFICATION Experiments CLASSIFICATION Experiments January 27,2015 CS3710: Visual Recognition Bhavin Modi Bag of features Object Bag of words 1. Extract features 2. Learn visual vocabulary Bag of features: outline 3. Quantize

More information