Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification
Transcription
1 Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification Xiaodong Yang, Pavlo Molchanov, Jan Kautz
2 INTELLIGENT VIDEO ANALYTICS: Surveillance event detection; Human-computer interaction; Multimedia search and indexing. [Figure label: Video]
6 INTELLIGENT VIDEO ANALYTICS, Related Work: Local feature extraction; Global feature representation; Temporal modeling. Cited methods: Dense trajectories (H. Wang et al., ICCV 2013); Bag-of-visual-words (J. Gemert et al., TPAMI 2009); Spatio-temporal pyramid (X. Yang et al., ECCV 2014); Fisher vector (F. Perronnin et al., ECCV).
7 INTELLIGENT VIDEO ANALYTICS, Related Work: 2D-CNN (A. Karpathy et al., CVPR 2014); C3D (D. Tran et al., ICCV 2015); Two-stream networks (K. Simonyan et al., NIPS 2014); LSTM (J. Ng, CVPR).
8 OUR CONTRIBUTIONS: Local feature extraction: multilayer representations from CNN. Global feature representation: multimodal representations; fusion by boosting. Temporal modeling: structure of FC-RNN. [Figure: Overview of multilayer and multimodal fusion for video classification]
9 MULTILAYER REPRESENTATIONS: Dense image prediction: FCN by Long et al.; FlowNet by Fischer et al.
10 MULTILAYER REPRESENTATIONS: Features of conv layers: poses, parts, articulations, objects, etc. Visualization by Zeiler et al.
11 MULTILAYER REPRESENTATIONS: Convert feature maps to feature descriptors. Feature maps of dimension [...] to feature descriptors of dimension [...].
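The dimensions on this slide did not survive transcription, so the following is only a minimal sketch of one common way to do the conversion. It assumes (this is not stated in the slides) that feature maps are stored as an array of shape (T, C, H, W) and that each spatial location in each frame yields one descriptor whose dimension is the number of channels C. The function name `maps_to_descriptors` is hypothetical.

```python
import numpy as np

def maps_to_descriptors(feature_maps):
    """Reshape conv feature maps (T, C, H, W) into (T*H*W, C) descriptors.

    Assumed layout: T frames, C channels, H x W spatial grid.
    """
    t, c, h, w = feature_maps.shape
    # Move channels last so the C activations at one location form one row.
    return feature_maps.transpose(0, 2, 3, 1).reshape(t * h * w, c)

# Toy example: 2 frames, 3 channels, a 4 x 5 spatial grid.
maps = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
descriptors = maps_to_descriptors(maps)
print(descriptors.shape)
```

Under these assumptions, every row of `descriptors` holds the activations of all channels at one spatial position of one frame.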
12 MULTILAYER REPRESENTATIONS: Learn spatial discriminative weights of conv layers; spatial information of conv layers to enhance representations. [Figure labels: Video frames; Feature maps of a conv layer over time; Spatial weights of a conv layer (importance)]
13 MULTILAYER REPRESENTATIONS: Aggregate feature descriptors by Fisher vector (FV). [Figure labels: Feature maps of a conv layer over time; Gaussian mixture model]
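As a rough illustration of Fisher vector aggregation, the sketch below uses the commonly cited formulation for a diagonal-covariance Gaussian mixture model: gradients with respect to the component means and standard deviations, weighted by soft assignments. It is not taken from the slides. The GMM parameters are assumed to have been fitted elsewhere, the function and argument names are hypothetical, and the power and L2 normalization steps often applied afterwards are omitted.

```python
import numpy as np

def fisher_vector(x, weights, means, sigmas):
    """Fisher vector of descriptors x under a diagonal GMM.

    x: (N, D) descriptors; weights: (K,) mixture weights;
    means, sigmas: (K, D) component means and standard deviations.
    Returns a vector of length 2 * K * D.
    """
    n, d = x.shape
    # Normalized deviation of each descriptor from each component: (N, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log-density of each descriptor under each component: (N, K).
    log_prob = (-0.5 * (diff ** 2).sum(axis=2)
                - np.log(sigmas).sum(axis=1)
                - 0.5 * d * np.log(2.0 * np.pi))
    # Soft assignments (posteriors), computed stably in the log domain.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Gradients with respect to means and standard deviations: (K, D) each.
    g_mean = (post[:, :, None] * diff).sum(axis=0) / (n * np.sqrt(weights)[:, None])
    g_sigma = (post[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0) / (
        n * np.sqrt(2.0 * weights)[:, None])
    return np.concatenate([g_mean.ravel(), g_sigma.ravel()])

# Toy example: 50 descriptors of dimension 3, a 2-component GMM.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
fv = fisher_vector(x, np.array([0.4, 0.6]), rng.normal(size=(2, 3)), np.ones((2, 3)))
print(fv.shape)
```

The output length depends only on the GMM size and descriptor dimension, not on the number of descriptors, which is what lets videos of different lengths map to fixed-size representations.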
14 MULTILAYER REPRESENTATIONS: Represent conv layers by improved Fisher vector (iFV). [Figure labels: Feature maps of a conv layer over time; Spatial weights of a conv layer (importance); Gaussian mixture model]
15 MULTILAYER REPRESENTATIONS: Represent conv layers by improved Fisher vector (iFV); represent fc layers by temporal max pooling. [Figure: Overview of multilayer representation]
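Temporal max pooling of fc-layer activations can be sketched in a couple of lines. The assumption here is that the per-frame (or per-snippet) fc activations are stacked into an array of shape (T, D); the function name is hypothetical.

```python
import numpy as np

def temporal_max_pool(fc_activations):
    """Collapse (T, D) per-frame fc activations into one (D,) video-level vector."""
    # Keep, for each feature dimension, its strongest response over time.
    return fc_activations.max(axis=0)

# Toy example: 3 time steps, 2 feature dimensions.
acts = np.array([[1.0, 5.0],
                 [3.0, 2.0],
                 [0.0, 4.0]])
pooled = temporal_max_pool(acts)
print(pooled)  # [3. 5.]
```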
20 FC-RNN STRUCTURE, Modeling Temporal Dynamics: Don't be a hero: use pre-trained models. Many pre-trained models from ImageNet and Sports1M. [Figure labels: Images/Snippets; Videos; VGG/C3D; fc layer; RNN; Standard RNN; FC-RNN]
21 FC-RNN STRUCTURE, Modeling Temporal Dynamics: RNN vs. FC-RNN. Pre-trained CNN, fc layer: transfer to recurrent layers. [Figure: Comparison of standard RNN and FC-RNN]
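The equations on this slide were lost in transcription. The sketch below shows one reading of "transfer to recurrent layers": the pre-trained fc weights and bias are reused as the input-to-hidden transform of a recurrent layer, and only the hidden-to-hidden matrix is new. The variable names (`w_fc`, `b_fc`, `w_hh`) and the ReLU activation are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np

def fc_rnn_forward(x_seq, w_fc, b_fc, w_hh):
    """Run a recurrent layer whose input weights come from a pre-trained fc layer.

    x_seq: (T, D_in) inputs to the fc layer over time.
    w_fc: (D_h, D_in), b_fc: (D_h,) pre-trained fc weights and bias (assumed given).
    w_hh: (D_h, D_h) hidden-to-hidden matrix, the only newly introduced parameter.
    Returns hidden states of shape (T, D_h).
    """
    h = np.zeros(w_fc.shape[0])
    states = []
    for x_t in x_seq:
        # Same computation as the fc layer, plus a recurrent term from the previous state.
        h = np.maximum(0.0, w_fc @ x_t + w_hh @ h + b_fc)
        states.append(h)
    return np.stack(states)

# Toy example: 5 time steps, 8 input features, 4 hidden units.
rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 8))
w_fc = rng.normal(size=(4, 8))
b_fc = rng.normal(size=4)
hidden = fc_rnn_forward(x_seq, w_fc, b_fc, 0.1 * rng.normal(size=(4, 4)))
print(hidden.shape)
```

One property of this formulation is that with `w_hh` set to zero the layer reproduces the original fc layer frame by frame, so training can start from the pre-trained behavior rather than from scratch.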
22 MULTIMODAL REPRESENTATIONS: Static and dynamic information; 2D-CNN/3D-CNN with video frames/optical flow maps. Inputs: a single frame; a buffer of frames; a single flow map; a buffer of flow maps.
24 FUSION BY BOOSTING: Optimize a linear combination of predictions of multiple layers from multiple modalities. LPBoost: boost-u: learn uniform weights for all classes; boost-c: learn class-specific weights. 4 layers and 4 modalities: M = 16.
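The difference between the two weighting schemes can be illustrated by how the fused score is computed once weights are available. The sketch below does not implement LPBoost itself (the linear program that learns the weights is not shown). It assumes scores from M sources are stacked into an array of shape (M, N, C) for N videos and C classes; the function names are hypothetical.

```python
import numpy as np

def fuse_uniform(scores, w):
    """boost-u style fusion: one weight per source, shared by all classes.

    scores: (M, N, C); w: (M,). Returns fused scores of shape (N, C).
    """
    return np.tensordot(w, scores, axes=(0, 0))

def fuse_class_specific(scores, w):
    """boost-c style fusion: a separate weight per source and class.

    scores: (M, N, C); w: (M, C). Returns fused scores of shape (N, C).
    """
    return (w[:, None, :] * scores).sum(axis=0)

# Toy example: M = 16 sources (4 layers x 4 modalities), 10 videos, 101 classes.
rng = np.random.default_rng(0)
scores = rng.random(size=(16, 10, 101))
w_u = np.full(16, 1.0 / 16)            # placeholder weights, not learned
fused_u = fuse_uniform(scores, w_u)
w_c = np.tile(w_u[:, None], (1, 101))  # class-specific weights, here all equal
fused_c = fuse_class_specific(scores, w_c)
predictions = fused_c.argmax(axis=1)
print(fused_u.shape, predictions.shape)
```

When every class shares the same column of weights, the class-specific form reduces to the uniform one, which is a quick sanity check on the shapes.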
25 EXPERIMENTS, Benchmark datasets: UCF101: 13,320 videos in 101 classes (example shown: Skiing). HMDB51: 6,766 videos in 51 classes (example shown: Kissing).
27 EXPERIMENTS, FC-RNN: Up to 3% improvement. Outperforms RNN and LSTM by 3.0% and 2.9%. [Figure: Comparison of standard RNN and FC-RNN in training and testing of 3D-CNN-SF on UCF101; axes: error rate, epochs]
29 EXPERIMENTS, Feature Aggregation: Up to 2.5% improvement. [Figure: Comparison of FV and iFV to represent conv layers of different modalities; labels: a single frame; a single flow map; a buffer of frames; a buffer of flow maps; Spatial weights of a conv layer (importance)]
33 EXPERIMENTS, Multilayer Fusion: Up to 8% improvement. [Caption: Classification accuracy of single layers over different modalities and multilayer fusion results]
34 EXPERIMENTS, Multimodal Fusion: Up to 6% improvement. [Captions: Classification accuracy of different modalities and various combinations; Comparison to the state-of-the-art results]
35 EXPERIMENTS, LPBoost: [Figure: two charts labeled Modalities and Layers; layer labels conv4, conv5, fc6, fc7; percentage labels 29%, 17%, 0%, 38%, 50%, 31%, 23%, 12%]
36 EXPERIMENTS, Effect of Multimodal Fusion: 2D-CNN-SF predicts skijet (incorrect); multimodal fusion predicts skiing (correct). [Figure labels: SKIING; SKIJET]
37 EXPERIMENTS, Effect of Multimodal Fusion: 2D-CNN-OF predicts boxing speeding bag (incorrect); multimodal fusion predicts boxing punching bag (correct). [Figure labels: BOXING PUNCHING BAG; BOXING SPEEDING BAG]
38 OUR CONTRIBUTIONS: Local feature extraction: multilayer representations from CNN. Global feature representation: multimodal representations; fusion by boosting. Temporal modeling: structure of FC-RNN. [Figure: Overview of multilayer and multimodal fusion for video classification]