Long-term Temporal Convolutions for Action Recognition INRIA
Transcription
1 Long-term Temporal Convolutions for Action Recognition. Gül Varol, Ivan Laptev, Cordelia Schmid (INRIA)
2 Motivation: Current CNN methods for action recognition learn representations for short intervals (1-16 frames). Typical actions last several seconds. Actions contain characteristic patterns with specific long-term temporal structure. Ingredients: space-time convolutions [Tran 2015]; optical flow [Simonyan 2014]; long temporal extent.
3 Contributions: The advantages of long-term temporal convolutions. The importance of high-quality optical flow estimation for learning accurate video representations.
4 Approach
5 Network Architecture: 3D convolutions with 3x3x3 filters; ReLU; 3D max-pooling of 2x2x2. Experiments with temporal extents T = {16, 20, 40, 60, 80, 100}.
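As a rough illustration of how repeated 2x2x2 max-pooling shrinks the input volume, the sketch below tracks the (T, H, W) shape through a stack of convolution + pooling stages. It assumes that the 3x3x3 convolutions are padded so that they preserve the shape, that pooling rounds down, and that there are five stages; the slide states none of these, and `pooled_shapes` is a hypothetical helper, not part of the authors' code.

```python
def pooled_shapes(shape, stages, pool=(2, 2, 2)):
    """Return the (T, H, W) shape after each conv + max-pool stage.

    Assumption: each 3x3x3 convolution uses padding 1 and keeps the shape,
    so only the pooling changes the size (with floor division).
    """
    t, h, w = shape
    shapes = []
    for _ in range(stages):
        t, h, w = t // pool[0], h // pool[1], w // pool[2]
        shapes.append((t, h, w))
    return shapes


if __name__ == "__main__":
    # Illustrative input: 60 frames at 58x58, five stages (assumed count).
    for stage, shape in enumerate(pooled_shapes((60, 58, 58), stages=5), start=1):
        print(f"after stage {stage}: T x H x W = {shape}")
```

Under these assumptions a 16-frame input cannot survive five temporal halvings, which is one reason short-clip networks may pool less aggressively in time in their first layer.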
6 Network Architecture (inputs): Optical flow: 2-channel input (original [x, y] values). RGB: 3-channel input. Temporal extent is increased at the cost of decreased spatial resolution.
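The trade-off between temporal extent and spatial resolution can be checked with simple arithmetic. The sketch below counts the scalar values in one input clip, using the two configurations that appear on the 16 vs 60-frame slide (16 frames at 112x112 and 60 frames at 58x58); `input_values` is a hypothetical helper introduced only for this illustration.

```python
def input_values(channels, frames, height, width):
    """Number of scalar values in one input clip."""
    return channels * frames * height * width


if __name__ == "__main__":
    # 2-channel optical flow input in both cases.
    short_clip = input_values(2, 16, 112, 112)
    long_clip = input_values(2, 60, 58, 58)
    print("16f @ 112x112:", short_clip)
    print("60f @ 58x58:  ", long_clip)
    print("ratio:", round(long_clip / short_clip, 3))
```

The two inputs are nearly the same size, so the longer network sees almost four times as many frames for roughly the same input budget.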
7 Experiments
8 Datasets: HMDB51 (Kuehne et al. 2011); UCF101 (Soomro et al. 2012).
9 Optical Flow: 60-frame networks trained from scratch on UCF101 (split 1) with different input modalities. [Table: clip and video accuracy for RGB, MPEG flow, Farneback flow, and Brox flow inputs; values not preserved.] Conclusions: Even low-quality MPEG flow outperforms RGB. Quality of flow impacts the results significantly.
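10 16 vs 60-frame networks: spatial resolution 112x112 (16f) vs 58x58 (60f). The 16f network has the same architecture as Tran [Tran 2015]. [Table: clip and video accuracy on UCF101 (split 1) for RGB and Flow inputs with 16f and 60f networks, plus the gain; values not preserved.] Pre-training: [Table: clip and video accuracy on HMDB51 (split 1) for Flow from scratch and Flow from UCF101 with 16f and 60f networks, plus the gain, with [Simonyan 2014] as reference; values not preserved.]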
11 RGB Network Fine-tuning: RGB from scratch is difficult to learn; we need pre-training. [Table: clip and video accuracy on UCF101 (split 1) for RGB from scratch with the 16f network and two longer networks; frame counts and values not preserved.] C3D: a 16f 3D convnet trained on Sports-1M (Tran 2015). We extend C3D to longer temporal convolutions as follows: (1) the conv5 layer output has T/16 temporal resolution; (2) max-pool the conv5 output over time to recycle the pre-trained fc layers; (3) fine-tune the whole network.
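The idea behind the C3D extension can be sketched in a few lines: a longer input produces more time steps at conv5, and max-pooling over time collapses them back to a single step so the fully connected layers always receive the same input size. The code below is a toy illustration in plain Python, not the authors' implementation; the function names are hypothetical, the feature vectors are made-up numbers, and rounding down of T/16 is an assumption.

```python
def conv5_temporal_length(num_frames, temporal_stride=16):
    """Time steps at the conv5 output for a T-frame input (T/16, floored)."""
    return num_frames // temporal_stride


def max_pool_over_time(conv5_output):
    """Element-wise max over time.

    conv5_output: list of per-time-step feature vectors of equal length.
    Returns one feature vector with the same length as each input vector.
    """
    return [max(values) for values in zip(*conv5_output)]


if __name__ == "__main__":
    for t in (16, 60, 100):
        print(f"T={t}: conv5 time steps = {conv5_temporal_length(t)}")

    # Toy conv5 output for a 60-frame clip: 3 time steps, 4 features each.
    toy_output = [
        [0.1, 0.9, 0.0, 0.3],
        [0.4, 0.2, 0.5, 0.3],
        [0.2, 0.1, 0.7, 0.8],
    ]
    pooled = max_pool_over_time(toy_output)
    print("pooled feature vector:", pooled)
```

Whatever the clip length, the pooled vector has the size a 16-frame clip would produce, which is what lets the pre-trained fc weights be reused before fine-tuning.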
12 Varying Temporal and Spatial Resolutions. [Figure: performance vs. temporal extent T on UCF101 (split 1), at low resolution (58x58) and high resolution (71x71); dotted lines: clip, plain lines: video; pink: Flow from scratch, blue: RGB from C3D.] Conclusion: long temporal extent; high spatial resolution; RGB+Flow complementary; RGB > Flow (clips); RGB < Flow (videos); curves less steep for video.
13 Multiple Networks Combined. [Table: accuracy on UCF101 and HMDB51 (split 1) per input; most values not preserved.] Inputs: LTCFlow (100f); LTCFlow (60f+100f); LTCRGB (100f): 81.8; LTCRGB (60f+100f): 81.5; LTCFlow+RGB; LTCFlow+RGB + IDT.
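The slide does not say how the networks are combined. One common option is late fusion by averaging per-class scores, and the sketch below shows that option only as an assumption; the function names and the score values are hypothetical.

```python
def average_scores(score_lists):
    """Element-wise mean of several class-score vectors of equal length."""
    count = len(score_lists)
    return [sum(values) / count for values in zip(*score_lists)]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up scores over three classes from two networks.
    flow_scores = [0.1, 0.7, 0.2]
    rgb_scores = [0.5, 0.3, 0.2]
    fused = average_scores([flow_scores, rgb_scores])
    print("flow predicts class", predict(flow_scores))
    print("rgb predicts class", predict(rgb_scores))
    print("fused predicts class", predict(fused))
```

The same averaging could also turn several clip-level score vectors into one video-level prediction, again as an assumption rather than a description of the reported experiments.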
14 Comparison to the State of the Art (average over 3 splits). [Table: accuracy on UCF101 and HMDB51 by method; only the values shown below are preserved, and their columns are not recoverable.]
Hand-crafted: [Wang 2013] IDT+FV; [Peng 2014] IDT+HSV; [Lan 2015] IDT+MIFS; [Peng 2014] IDT+SFV 66.8.
CNN (RGB): [Karpathy 2014] Slow fusion (scratch) 41.3; [Tran 2015] C3D (scratch) 44; [Karpathy 2014] Slow fusion 65.4; [Simonyan 2014] Spatial stream; [Tran 2015] C3D (1 net); [Tran 2015] C3D (3 nets) 85.2; LTCRGB.
CNN (Flow): [Simonyan 2014] Temporal stream; LTCFlow.
Fusion: [Simonyan 2014] Two-stream (avg); [Simonyan 2014] Two-stream (SVM); [Ng 2014] LSTM (flow+rgb) 88.6; [Wang 2015] TDD; [Tran 2015] C3D+IDT 90.4; [Wang 2015] TDD+IDT; LTCFlow+RGB; LTCFlow+RGB + IDT.
Footnotes: (1) Our implementation is 80.2%. (2) No fine-tuning. (3) Uses multi-task learning.
15 Qualitative Analysis
16 Classes with Largest Improvement: JavelinThrow (16f vs 60f). JavelinThrow is mostly confused with FloorGymnastics in 16f. FloorGymnastics = running + gymnastics; JavelinThrow = running + throwing javelin.
17 First Layer Filters (60f Flow on UCF101, split 1): complex motion patterns in local neighborhoods. [Figure: filters drawn as 2D vectors of x and y intensities; t=1 blue, t=2 red, t=3 green.]
18 Higher Layer Filters: top activations of filters at conv layers L1 to L5, for the 100f and 16f networks. Colors: classes; rows: maximum responding test videos; columns: filters.
19 Thanks! Questions? Project page: (URL not preserved). Contact: gul.varol@inria.fr
Related paper: Gül Varol, Ivan Laptev, and Cordelia Schmid. Long-term Temporal Convolutions for Action Recognition. arXiv:1604.04494 [cs.CV]; v1, 15 Apr 2016; v2, 2 Jun 2017.