G3D: A Gaming Action Dataset and Real Time Action Recognition Evaluation Framework


Victoria Bloom, Kingston University, London, UK — Victoria.Bloom@kingston.ac.uk
Dimitrios Makris, Kingston University, London, UK — D.Makris@kingston.ac.uk
Vasileios Argyriou, Kingston University, London, UK — Vasileios.Argyriou@kingston.ac.uk

Abstract

In this paper a novel evaluation framework for measuring the performance of real-time action recognition methods is presented. The evaluation framework extends the time-based event detection metric to model multiple distinct action classes. The proposed metric provides a more accurate indication of the performance of action recognition algorithms for games and other similar applications, since it takes into consideration restrictions related to time and consecutive repetitions. Furthermore, a new dataset, G3D, for real-time action recognition in gaming containing synchronised video, depth and skeleton data is provided. Our results indicate the need for an advanced metric especially designed for games and other similar real-time applications.

1. Introduction

The gaming industry has in recent years attracted an increasingly large and diverse group of people. A new generation of games based on full body play, such as dance and sports games, has increased the appeal of gaming to family members of all ages. Full body play relies on detecting players' movements using sensors to provide a controller-free gaming experience. The Kinect, originally developed for the Xbox 360 games console, has a wide range of released titles, but they are limited to a small set of actions.

Currently, action recognition in gaming ranges from heuristic-based techniques to machine learning algorithms. The approach taken depends on the number and complexity of gestures to be performed in the game. For example, a bowling game only requires a few simple gestures and an algorithm can be hardcoded for each gesture. However, this approach may not work well for a greater number of complex gestures, where machine learning algorithms are better suited. Various machine learning techniques can be applied to more complex games, for example AdaBoost for a boxing game and exemplar matching for a tennis game [1]. The benefit of machine learning algorithms is that they can be trained to recognise a wide range of actions, including sporting, driving and action-adventure actions such as walking, running, jumping, dropping, firing, changing weapon, throwing and defending. This approach can increase the complexity and appeal of the games that can be developed, to include action-adventure games (e.g. Lara Croft).

State-of-the-art action recognition approaches are currently appearance based. The algorithms use input features such as colour, dense optical flow and spatio-temporal gradients extracted from an RGB image. The context of the environment can be used to further improve accuracy, as intuitively certain actions will only happen in specific scenes [2]. For example, performing a golf swing in a real golf game requires a golf club and will occur outdoors, probably on a green field. However, when performing a golf swing in a Kinect game the user has no golf club and is performing the action indoors. The restricted environment associated with gaming, typically the user's lounge, poses the challenge of missing context: the scene and objects usually associated with a given action are absent. This lack of contextual information may mean that appearance-based action recognition approaches underperform.
An alternative is a pose-based approach, where joint positions, velocities and angles are extracted from an articulated human pose. This approach was previously disregarded due to the complexity of estimating the human pose. Now that it is possible to obtain real-time skeleton data for multiple subjects in an indoor environment using Kinect [3], pose-based approaches are being revisited by researchers. The experiments of Yao et al. [4] showed that pose-based features outperform low-level appearance features in a home monitoring scenario. To compare the performance of appearance-based, pose-based and combined approaches for gaming, a common dataset and evaluation framework for measuring performance is required. G3D is a new public gaming action dataset containing synchronised colour, depth and skeleton data that has been captured for this paper. A novel evaluation framework for measuring the performance of real-time action recognition algorithms is also proposed to provide a common base for comparison.

2. Related work

2.1. Action recognition

There is a vast wealth of research on human action recognition in computer vision (for a comprehensive review see Aggarwal and Ryoo [5]). The majority of the state-of-the-art algorithms in activity recognition are appearance based, as low-level features can easily be extracted from video sequences. Due to recent developments in depth camera technology it is now possible and economical to capture real-time sequences of depth images. This has resulted in depth- and pose-based approaches being developed. Li et al. [6] directly sample 3D representative points from a depth map as features for their action graph method. Their results show that recognition errors were halved when using 3D depth data in comparison with 2D silhouettes. Yao et al. [4] posed the question "Does Human Action Recognition Benefit from Pose Estimation?" Their experiments compared appearance-based, pose-based and combined approaches in a home monitoring scenario using the same classifier and the same dataset. The appearance-based features used were colour, dense optical flow and spatio-temporal gradients. The pose-based features were qualitative geometric features [7]. Their results showed that the optimum approach was pose based: it significantly outperformed the appearance-based approach and was even slightly better than the combined approach. Both studies show promising results and could benefit from a direct comparison with state-of-the-art appearance-based methods using a common dataset.

2.2. Action recognition datasets

There are many existing public datasets available containing video sequences; for a comprehensive review of these also see Aggarwal and Ryoo [5]. The datasets can be categorised into three groups, as follows. The first are simple action recognition datasets such as the KTH [8] and Weizmann [9] datasets, where each video contains one simple action performed by one actor in a controlled environment. The second type are surveillance datasets such as PETS [10] and i-LIDS [11], obtained in realistic environments such as airports. The third type are movie datasets with challenging videos and frequently moving camera viewpoints obtained from real movie scenes, such as Hollywood2 [2]. For gaming, the existing action recognition datasets are insufficient, as during a game multiple different actions will be performed by the player. Public datasets containing both video and skeleton data for sports and locomotion actions exist [12] [13] [14], but as mentioned previously there is a difference between performing a real action and a gaming action. Even simple actions such as walking are different in the gaming environment, as the player will walk on the spot. The MPI HDM05 Motion Capture Database [13] does include locomotion on the spot but not a full range of gaming actions. Microsoft Research specifically developed a gaming action database, the MSR Action3D Database [6], which initially consisted of sequences of depth maps and was later extended by a third party to include skeleton data. Nevertheless, no corresponding video data is available. As far as we are aware, there is no publicly available action recognition database for gaming that contains all three types of subject data (video, depth and skeleton).
This paper introduces a publicly available dataset, G3D, for real-time action recognition in gaming containing synchronised video, depth and skeleton data.

2.3. Action recognition performance metrics

A common performance measure used in general action recognition is classification accuracy, and confusion matrices are frequently used to break down the number of correct classifications by class [5]. How the performance metric is applied depends on the nature of the input sequence. In the simple case where the video is segmented to contain only one action, e.g. KTH [8], the classification process is straightforward: an action label is predicted for each frame in the sequence and a majority decision over all frames decides the action label for the complete sequence [15]. In the more complex case, with multiple actions within a sequence, e.g. the new G3D dataset, the classification process outlined above is not viable. The simple approach of applying the performance measure to the entire sequence does not incorporate the timing or continuity constraints that are required to make action recognition methods applicable to a range of real-world applications. Real-time performance is critical for a game to appear responsive to the player's actions. A single action performed by the player must be detected as soon as possible, and as one continuous action consisting of multiple sequential frames rather than as multiple actions, to prevent poor gameplay. In surveillance systems, continuous detection in real time is also required. The i-LIDS event detection metric [11] incorporates the continuous detection of events in real time by summing occurrences of true positives, false negatives and false positives along an event timeline to produce an F1 score. The limitation of this metric is that it only recognises a single event type, e.g. an abandoned package. A new real-time action recognition evaluation framework is therefore proposed that extends the event timeline detection metric to model multiple action classes.
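For the pre-segmented single-action case, the majority decision described above reduces to a few lines. The following is an illustrative sketch, not code from the paper; per-frame predictions are assumed to be available as a list of class labels:

```python
from collections import Counter

def sequence_label(frame_labels):
    """Majority vote over per-frame predictions for a pre-segmented clip.

    Minimal sketch of the frame-voting scheme described above;
    frame_labels is a list of predicted class names, one per frame.
    """
    label, _count = Counter(frame_labels).most_common(1)[0]
    return label

# e.g. sequence_label(["walk", "walk", "run", "walk"]) returns "walk"
```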

3. G3D dataset

A new dataset, G3D, for real-time action recognition in gaming containing synchronised video, depth and skeleton data has been captured. This dataset is publicly available, to allow researchers to develop new action recognition algorithms for video games and benchmark their performance. Due to the formats selected, it is possible to view all the recorded data and tags without any special software tools.

The Microsoft Kinect device and Windows SDK [3] enable easy capture of synchronised video, depth and skeleton data. The three streams were recorded at 30 fps in a mirrored view, so Figures 1-4 actually show a right punch. The PNG image format was selected for storing both the depth and colour images as it is a lossless format, suitable for online access and open source. The resolution used to store both the depth and colour images was 640x480. The raw depth information contains the depth of each pixel in millimetres and was stored as 16-bit greyscale (see Figure 2), and the raw colour as 24-bit RGB (see Figure 1). The 16 bits of depth data contain 13 bits of depth and 3 bits identifying the player. The player index can be used to segment the depth maps by user (see Figure 4). The depth information was also mapped to the colour coordinate space and stored as 16-bit greyscale. Combining the colour image with the mapped depth data allows the user to also be segmented in the colour image.

The XML text format was selected for storing the skeleton information as it is human readable and again suited to online access. The root node of the XML file is an array of skeletons. Each skeleton contains the player's position and pose. The pose comprises 20 joints as defined by Microsoft [3]. The player and joint positions are given as X, Y and Z coordinates in metres. These positions are also mapped into the depth (see Figure 4) and colour coordinate spaces. The skeleton data includes a joint tracking state, displayed in Figure 3 as tracked (green), inferred (yellow) and not tracked (red). In many cases the inferred joints will be accurate, as in Figure 4, but in certain situations where limbs are occluded the inferred joints may be inaccurate, as in Figure 5. Consequently, pose data may need to be combined with colour or depth data to improve accuracy.

Figure 1: Colour image. Figure 2: Depth map. Figure 3: Skeleton data. Figure 4: Correctly inferred joints. Figure 5: Incorrectly inferred joints.

This dataset contains 10 subjects performing 20 gaming actions: punch right, punch left, kick right, kick left, defend, golf swing, tennis swing forehand, tennis swing backhand, tennis serve, throw bowling ball, aim and fire gun, walk, run, jump, climb, crouch, steer a car, wave, flap and clap. Most sequences contain multiple actions in a controlled indoor environment with a fixed camera, a typical setup for gesture-based gaming. The subjects were given basic instructions on how to perform each action, similar to those issued in a Kinect game. Nevertheless, to create a diverse dataset the subjects were free to perform a gesture with either hand or, in the case of a side-facing action, to stand with either foot forward. Each sequence is repeated three times by each subject. This resulted in over 80,000 frames of video, depth and skeleton data. All the frames in the dataset that contain actions were manually labelled in a separate file with an appropriate tag. Each tag represents a single action and contains the action class (e.g. PunchRight), the first frame number and the last frame number.
The XML tags are also publicly available.
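The paper states that each 16-bit depth value packs 13 bits of depth and 3 bits of player index, but not the bit layout. The sketch below assumes the Kinect SDK v1 convention (player index in the low three bits); the file path is illustrative:

```python
import numpy as np
from PIL import Image

def split_depth_and_player(png_path):
    """Unpack a 16-bit G3D depth frame into depth and player-index maps.

    Minimal sketch: the bit layout (player index in the low 3 bits,
    as in the Kinect SDK v1) is an assumption, not something the
    paper specifies.
    """
    raw = np.array(Image.open(png_path), dtype=np.uint16)
    depth_mm = raw >> 3        # upper 13 bits: depth in millimetres
    player_index = raw & 0x7   # lower 3 bits: 0 = no player, 1-7 = player id
    return depth_mm, player_index

# Example: segment the depth map by user, as described above
# depth, players = split_depth_and_player("Fighting/Depth/1.png")
# user_mask = players > 0
```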

4. Evaluation framework

The existing performance metric for action recognition, classification accuracy determined for each sequence, is inadequate for gaming as it does not incorporate the time and continuity constraints critical for real-time action detection. The existing metric is also restricted to sequences containing one action. To address these limitations a new evaluation framework is proposed.

To incorporate multiple actions, a frame-based evaluation framework could be used: matches between each predicted frame and the corresponding ground truth frame could be accumulated in a confusion matrix and the average classification accuracy determined. Although this would work for multiple actions in a sequence, it still does not incorporate timing and continuity constraints. To overcome all of the existing limitations an action-based evaluation framework is needed. However, matching each predicted action with a ground truth action poses a challenge, as both the predicted and ground truth actions represent a sequence of frames rather than a single frame. It is no longer possible to accumulate matches in a confusion matrix because there is a new case: the predicted action class matches a ground truth action type but is a false positive, e.g. a duplicated action (see the second predicted defend action in Figure 6). As the predicted and ground truth action classes match, the only place to represent this information in a confusion matrix is along the diagonal axis, which represents the true positives. However, this is clearly wrong, as the second defend action is a false positive. The i-LIDS event detection metric [11] resolves this problem by using an event timeline to determine the correct number of true positives, false negatives and false positives; for example, a true positive is counted if a predicted event occurs within 10 seconds of an actual event. However, this metric only recognises a single event type, e.g. an alarm.

Extending the time-based comparison of alarm events in the i-LIDS event detection metric [11] to multiple actions results in a new action-based evaluation framework. The new framework can evaluate the performance of any real-time action recognition algorithm by selecting an appropriate time constraint. A latency of 4 frames is acceptable in gaming as it should not be perceived by the user; in Microsoft's experiments [1] an actual latency of 4 frames resulted in a perceived latency of 0 to 1 frames.

The new action matching process is illustrated in Figure 6. For each sequence, the start time of any predicted action of a particular class is compared to the start times of the ground truth actions to evaluate the number of true positive, false positive and false negative actions. In the proposed framework a true positive arises when a predicted action starts within T frames (T=4) of an actual action of the same class. A false positive is a predicted action that does not start within T frames of such an actual action; this can be caused by two different situations: first, when the correct action is detected but not within the time period T, for example the predicted punch left action in Figure 6; second, when an incorrect action is predicted, for example the first predicted kick left action in Figure 6. Finally, a false negative is an actual action that remains unmatched with any predicted action. The main difference between this evaluation framework and i-LIDS event detection is the addition of multiple distinct action classes to detect incorrect actions.
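The matching process just described can be sketched as follows. This is an illustrative implementation, not the authors' code; in particular, the tie-breaking rule when several predictions fall within T frames of the same ground truth action (first match wins, in prediction start order) is our assumption:

```python
from collections import defaultdict

def match_actions(predicted, ground_truth, T=4):
    """Per-class TP/FP/FN counts for the start-time matching above.

    `predicted` and `ground_truth` are lists of
    (action_class, start_frame, end_frame) tuples for one sequence.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    unmatched = list(ground_truth)
    for p_class, p_start, _p_end in sorted(predicted, key=lambda a: a[1]):
        match = next((g for g in unmatched
                      if g[0] == p_class and abs(g[1] - p_start) <= T), None)
        if match is not None:
            counts[p_class]["tp"] += 1
            unmatched.remove(match)      # each ground truth action matches once
        else:
            counts[p_class]["fp"] += 1   # wrong class, or right class outside T
    for g_class, _start, _end in unmatched:
        counts[g_class]["fn"] += 1       # actual actions never detected
    return dict(counts)
```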
Figure 6: Matching of predicted actions with ground truth actions.

The output of the matching process is the total number of true positive, false positive and false negative actions for each action class. This division by class is extremely helpful for improving action recognition algorithms, but less effective for ranking different algorithms. The performance metrics of precision, recall and F1 are therefore computed for each action class and a final single average F1 figure is calculated.
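The scoring step can then be sketched on top of the per-class counts returned by `match_actions`; treating a class with no matched, predicted or missed actions as contributing an F1 of zero is our assumption:

```python
def average_f1(counts):
    """Per-class precision/recall/F1 and their unweighted mean.

    `counts` is the {class: {"tp", "fp", "fn"}} dictionary produced
    by match_actions above.
    """
    f1_scores = []
    for n in counts.values():
        precision = n["tp"] / (n["tp"] + n["fp"]) if n["tp"] + n["fp"] else 0.0
        recall = n["tp"] / (n["tp"] + n["fn"]) if n["tp"] + n["fn"] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)
```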

5. Action recognition algorithm

To test our new evaluation framework we used Adaptive Boosting (AdaBoost) [16], a real-time machine learning algorithm for action recognition proven successful for gaming gestures [1]. Microsoft [1] recommended pose-based input features such as position difference, position velocity, position velocity magnitude, angle velocity and joint angles. A description of the features is not available, so the following is our own definition.

Let $j_i^t = (x_i^t, y_i^t, z_i^t)$ be the 3D location of joint $i$ at time $t$. The position difference features are defined as the difference between two distinct joints $i$ and $k$ in a single pose:

$\delta x = x_i^t - x_k^t$   (1)
$\delta y = y_i^t - y_k^t$   (2)
$\delta z = z_i^t - z_k^t$   (3)

The position velocity features are similar, except they encode the differences of a single joint separated by time, where $t_2 > t_1$:

$\delta x = x_i^{t_2} - x_i^{t_1}$   (4)
$\delta y = y_i^{t_2} - y_i^{t_1}$   (5)
$\delta z = z_i^{t_2} - z_i^{t_1}$   (6)

The position velocity magnitude feature is defined as the Euclidean distance between the positions of a single joint separated by time, where $t_2 > t_1$:

$\| j_i^{t_2} - j_i^{t_1} \|$   (7)

The angle velocity features are defined as the angle of the displacement of a single joint separated by time, projected in the x-y plane and the x-z plane:

$\theta_{xy} = \arctan\big( (y_i^{t_2} - y_i^{t_1}) / (x_i^{t_2} - x_i^{t_1}) \big)$   (8)
$\theta_{xz} = \arctan\big( (z_i^{t_2} - z_i^{t_1}) / (x_i^{t_2} - x_i^{t_1}) \big)$   (9)

The joint angle feature is defined as the 3D angle between three distinct joints $j_1$, $j_2$ and $j_3$ in a single pose; let vector $v_1$ be $(j_2 - j_1)$ and vector $v_2$ be $(j_2 - j_3)$:

$\theta = \arccos\big( v_1 \cdot v_2 / (\|v_1\| \, \|v_2\|) \big)$   (10)

We have implemented 170 features based on the Microsoft features described by equations (1)-(10); a code sketch of these features follows at the end of this section. Instead of obtaining thresholds through experimentation [7] to produce Boolean features, we quantised the continuous output of the features into 16 discrete outputs. We used the skeleton data from the G3D dataset, split by subject: the first 5 subjects were used for training and the remaining 5 subjects for testing.
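As a concrete illustration, the features of equations (1)-(10) and the size-3 sliding-window filtering used on the per-frame confidences (described in Section 6 below) can be sketched as follows. The array shapes, joint indices, time step and the plane-projection reading of equations (8) and (9) are our assumptions:

```python
import numpy as np

# Illustrative sketch of the pose features in equations (1)-(10);
# `joints` is a (T, 20, 3) array of joint positions over T frames.

def position_difference(joints, t, i, k):
    # Equations (1)-(3): component-wise difference of two joints in one pose.
    return joints[t, i] - joints[t, k]

def position_velocity(joints, t, i, dt=1):
    # Equations (4)-(6): component-wise displacement of one joint over time.
    return joints[t, i] - joints[t - dt, i]

def position_velocity_magnitude(joints, t, i, dt=1):
    # Equation (7): Euclidean length of the displacement.
    return np.linalg.norm(position_velocity(joints, t, i, dt))

def angle_velocity(joints, t, i, dt=1):
    # Equations (8)-(9): displacement angle projected in the x-y and
    # x-z planes (an assumed reading of the paper's brief description).
    dx, dy, dz = position_velocity(joints, t, i, dt)
    return np.arctan2(dy, dx), np.arctan2(dz, dx)

def joint_angle(joints, t, j1, j2, j3):
    # Equation (10): 3D angle at j2 between v1 = j2 - j1 and v2 = j2 - j3.
    v1 = joints[t, j2] - joints[t, j1]
    v2 = joints[t, j2] - joints[t, j3]
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def filter_confidence(conf, window=3):
    # Sliding window summing per-frame confidences, as recommended in [1];
    # `conf` is a (T, n_classes) array of per-frame classifier outputs.
    kernel = np.ones(window)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, "same"),
                               0, conf)
```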

6. Results

The output from the AdaBoost algorithm is per-frame confidence results (see Figure 8). Microsoft [1] recommend filtering the results by using a sliding window to sum the per-frame results. A sliding window of size 3 is used to filter our results (see Figure 9). The filtering is important to link broken actions into a single continuous block. However, the sliding window has the negative side effect of delaying the detection of the action. The length of the delay is directly related to the size of the sliding window, so there is an inherent trade-off between detecting continuous actions and the latency in detecting the action.

Figure 7: Ground truth (frame based) for trial 24. Figure 8: Confidence results (frame based) for trial 24. Figure 9: Filtered confidence results (frame based) for trial 24. Figure 10: Ground truth (action based) for trial 24. Figure 11: Highest confidence results (action based) for trial 24.

The filtered frame-based results were then converted to action-based results by detecting the class with the highest confidence at each frame. If "other" was the class with the highest confidence then no action was predicted; otherwise, the predicted action was simply the action with the highest confidence. Once an action was predicted, sequential frames of the same class were linked to produce a single start and end frame number for each action (see Figure 11). The predicted actions were then matched against the ground truth actions (see Figure 10) using the new matching process illustrated in Figure 6. Since a confusion matrix is not feasible for the action-based approach, a matrix with the estimated true positives, false negatives and false positives is provided in Figure 12 (left). The action-based F1 value for each class was computed from this matrix and then averaged to produce an F1 value of 46.66% for the fighting sequence. For comparison, a frame-based confusion matrix was also computed (see Figure 12, right); the frame-based average F1 value for the same fighting sequence is 61.17%.

Figure 12: Fighting matrices: action based (left), frame based (right).

This process was repeated for all sequences in the G3D dataset and Table 1 shows a comparison of the frame-based and action-based results. The results highlight a dramatic difference between the frame-based F1 values and the action-based F1 values. Whilst the algorithm appears to be performing well at the frame level, it is severely underperforming at the action level. The difference between the frame-based and action-based evaluation frameworks is significant. Therefore, in real-time game applications, a performance measure that does not take into consideration timing and continuity may lead to a wrong conclusion and should be avoided.

Table 1: Testing results

Action Category   Frame-based F1   Action-based F1 (T=4)
Fighting          70.46%           58.54%
Golf              83.37%           11.88%
Tennis            56.44%           14.85%
Bowling           80.78%           31.58%
FPS               53.57%           13.65%
Driving a car     84.24%            2.50%
Misc              78.21%           18.13%

7. Conclusion

A novel and reliable evaluation framework for real-time action recognition algorithms was proposed in this work. Compared with the existing metric utilised in this area, it adds restrictions related to time and repetitions. Recognitions that are correct but not timely have the same negative effect in games as in other real-time applications. Additionally, recognising the same action multiple times when only one occurrence is present has disastrous effects in a game, and thus must be taken into consideration during the evaluation process and in the corresponding metrics. Experiments were performed using AdaBoost, a real-time action recognition method, and further indicate the need for the proposed metric that takes into consideration these new parameters related to games and real-time applications. Finally, a new dataset for gaming action recognition is provided, containing synchronised video, depth and skeleton data. This combined dataset will allow further research into the comparison of pose- and appearance-based methodologies.

8. References

[1] C. Marais. (2011). Kinect Gesture Detection using Machine Learning [Online]. Available: s.aspx?id=
[2] M. Marszalek, I. Laptev and C. Schmid, "Actions in context," in CVPR, 2009.
[3] Microsoft. (2012). Kinect Natural User Interface (NUI) Overview [Online]. Available:
[4] A. Yao, J. Gall, G. Fanelli and L. Van Gool, "Does Human Action Recognition Benefit from Pose Estimation?" in BMVC, 2011.
[5] J.K. Aggarwal and M.S. Ryoo, "Human activity analysis: A review," ACM Computing Surveys (CSUR), vol. 43, 2011.
[6] W. Li, Z. Zhang and Z. Liu, "Action Recognition Based on a Bag of 3D Points," in CVPR4HB, pp. 9-14, San Francisco, CA, USA, June 18, 2010.
[7] M. Muller, T. Roder and M. Clausen, "Efficient content-based retrieval of motion capture data," ACM Trans. Graph., vol. 24, July 2005.
[8] C. Schuldt, I. Laptev and B. Caputo, "Recognizing human actions: a local SVM approach," in ICPR, vol. 3, 2004.
[9] M. Blank, L. Gorelick, E. Shechtman, M. Irani and R. Basri, "Actions as space-time shapes," in ICCV, 2005.
[10] D. Thirde. (2005). PETS: Performance Evaluation of Tracking and Surveillance [Online]. Available:
[11] Home Office, "Imagery Library for Intelligent Detection Systems: The i-LIDS User Guide."
[12] Carnegie Mellon University. (2011). Motion Capture Database [Online]. Available:
[13] M. Muller, T. Roder, M. Clausen, B. Eberhardt, B. Kruger and A. Weber, "Documentation Mocap Database HDM05," Universitat Bonn, Tech. Rep. No. CG-.
[14] L. Sigal, A.O. Balan and M.J. Black, "HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion," Int. J. Comput. Vision, vol. 87, pp. 4-27, March 2010.
[15] A. Gilbert, J. Illingworth and R. Bowden, "Action Recognition Using Mined Hierarchical Compound Features," TPAMI, vol. 33, 2011.
[16] Y. Freund and R.E. Schapire, "Experiments with a new boosting algorithm," in Machine Learning: Proceedings of the Thirteenth International Conference, Morgan Kaufmann, San Francisco, 1996.
