Multimodal Gesture Recognition using Multi-stream Recurrent Neural Network

1 Multimodal Gesture Recognition using Multi-stream Recurrent Neural Network
Noriki Nishida and Hideki Nakayama
Machine Perception Group, Graduate School of Information Science and Technology, The University of Tokyo
7th Pacific Rim Symposium on Image and Video Technology (PSIVT 2015), November 27, 2015
Noriki Nishida and Hideki Nakayama Multimodal Gesture Recognition using MRNN The University of Tokyo 1 / 26

2 Menu
1. Introduction
2. Proposed model
3. Experiments
4. Conclusion & Future work

3 Recent breakthroughs
- Object recognition [Krizhevsky et al., 2012]
- Object detection [Girshick et al., 2014]
- Speech recognition [Hinton et al., 2012]
- Word embedding [Mikolov et al., 2013]
- Convolutional neural networks [Krizhevsky et al., 2012]
- Recurrent neural networks with Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber, 1997]
- AdaDelta (optimization) [Zeiler, 2012]

4 Multimodal-sequential fusion is NOT solved
Figure: Examples of multiple modalities (in gesture recognition)
- How should we fuse multiple modalities into a common space (vector representation)?
- How should we extract sequential dynamics from multiple sequential modalities?

5 Problem with traditional methods
Hand-crafted heuristics
1. lead to a lack of generality (e.g., skin color filtering for hand detection)
2. require prior knowledge of the target gesture domain
A system with less gesture-specific engineering is preferable.

6 Our goal
1. Propose an effective approach for fusing multiple sequential modalities
2. Propose a completely data-driven model that can be optimized end to end

7 Recurrent Neural Networks (RNNs)
$h_t = \sigma(W_{in} x_t + W_{hh} h_{t-1} + b_{in})$
$y_t = f(W_{out} h_t + b_{out})$
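
The two RNN equations can be sketched in a few lines of plain Python. This is only an illustration: the slide leaves σ and f unspecified, so tanh and softmax are assumed here, and the helper names (`matvec`, `rnn_step`, `rnn_forward`) are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax, assumed here as the output function f."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rnn_step(x_t, h_prev, W_in, W_hh, b_in):
    """h_t = sigma(W_in x_t + W_hh h_{t-1} + b_in), with sigma = tanh (assumed)."""
    a = matvec(W_in, x_t)
    r = matvec(W_hh, h_prev)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b_in)]


def rnn_forward(xs, W_in, W_hh, b_in, W_out, b_out):
    """Run the recurrence over a sequence and return y_t for every time step."""
    h = [0.0] * len(b_in)  # h_0 initialized to zeros (assumed)
    ys = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_in, W_hh, b_in)
        logits = [o + b for o, b in zip(matvec(W_out, h), b_out)]
        ys.append(softmax(logits))  # y_t = f(W_out h_t + b_out)
    return ys
```

The same hidden state `h` is threaded through every step, which is what lets the output at time t depend on all earlier inputs.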

8 Overall view of our multi-stream RNN (MRNN)

9 Components in our MRNN
- $I^{(m)}$: extracts feature vectors from the frame-level inputs of modality m at every time step
- $S^{(m)}$: computes the sequential dynamics of modality m
- $F$: fuses the multiple modalities while considering sequential dynamics in the multimodal space
- $O$: predicts the gesture category given the last output of $F$
The parameters of these components are what we optimize.

10 Formulation
Input video (with M modalities):
$x = \{(x_1^{(1)}, x_1^{(2)}, \ldots, x_1^{(M)}), \ldots, (x_T^{(1)}, x_T^{(2)}, \ldots, x_T^{(M)})\}$
Extract feature representation $\hat{h}_t$ from $x$:
$v_t^{(m)} = I^{(m)}(x_t^{(m)})$ for $m = 1, \ldots, M$
$h_t^{(m)} = S^{(m)}(v_t^{(m)}, h_{t-1}^{(m)})$ for $m = 1, \ldots, M$
$\hat{h}_t = F([h_t^{(1)}; h_t^{(2)}; \ldots; h_t^{(M)}], \hat{h}_{t-1})$
Classification:
$y = O(\hat{h}_T)$
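
As a rough illustration of how the four components compose, the forward pass can be sketched as follows. Several things here are assumptions rather than the authors' implementation: a plain tanh recurrence stands in for the LSTM or GRU, $I^{(m)}$ and $O$ are single affine layers, all hidden states start at zero, and the names `Dense`, `RecurrentCell`, `build_mrnn` and every layer size are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small uniform random weights; the initialization scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class Dense:
    """Affine layer, used for I^(m) (followed by tanh) and O (followed by softmax)."""

    def __init__(self, rng, n_in, n_out):
        self.W = rand_matrix(rng, n_out, n_in)
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [a + b for a, b in zip(matvec(self.W, x), self.b)]


class RecurrentCell:
    """Plain tanh recurrence standing in for the LSTM/GRU used for S^(m) and F."""

    def __init__(self, rng, n_in, n_hidden):
        self.W_in = rand_matrix(rng, n_hidden, n_in)
        self.W_hh = rand_matrix(rng, n_hidden, n_hidden)
        self.b = [0.0] * n_hidden
        self.n_hidden = n_hidden

    def __call__(self, v, h_prev):
        a = matvec(self.W_in, v)
        r = matvec(self.W_hh, h_prev)
        return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, self.b)]


def mrnn_forward(video, I, S, F, O):
    """video: list of T frames; each frame is a tuple of M per-modality vectors."""
    M = len(I)
    h = [[0.0] * S[m].n_hidden for m in range(M)]  # h_0^(m)
    h_hat = [0.0] * F.n_hidden                     # \hat{h}_0
    for frame in video:
        concat = []
        for m in range(M):
            v = [math.tanh(a) for a in I[m](frame[m])]  # v_t^(m) = I^(m)(x_t^(m))
            h[m] = S[m](v, h[m])                        # h_t^(m) = S^(m)(v_t^(m), h_{t-1}^(m))
            concat.extend(h[m])                         # [h_t^(1); ...; h_t^(M)]
        h_hat = F(concat, h_hat)                        # \hat{h}_t = F([...], \hat{h}_{t-1})
    return softmax(O(h_hat))                            # y = O(\hat{h}_T)


def build_mrnn(rng, input_dims, feat=8, hidden=6, fused=10, n_classes=10):
    """Hypothetical constructor; all layer sizes are illustrative only."""
    I = [Dense(rng, d, feat) for d in input_dims]
    S = [RecurrentCell(rng, feat, hidden) for _ in input_dims]
    F = RecurrentCell(rng, hidden * len(input_dims), fused)
    O = Dense(rng, fused, n_classes)
    return I, S, F, O
```

Each modality keeps its own recurrent state, while the fusion cell keeps a separate state over the concatenated per-modality states, so temporal structure is modelled both within and across modalities.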

15 Graphical representation of our method (M = 2)

16 Advantages
1. All free parameters of the MRNN can be trained end to end in a supervised manner using SGD and backpropagation, with no hand-crafted engineering.
2. We can choose current state-of-the-art neural networks for each component:
- ConvNet or DNN for $I^{(m)}$
- LSTM or GRU [Cho et al., 2014] for $S^{(m)}$ and $F$
- DNN for $O$

17 Late multimodal fusion model (M = 2)
No mechanism to consider sequential dynamics in the multimodal space

18 Early multimodal fusion model (M = 2)
No mechanism to consider sequential dynamics in each single-modal space
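
The slides characterize the two baselines only by what they lack, so the sketch below is one plausible reading rather than the authors' exact architectures. It assumes that late fusion runs a separate recurrence per modality and concatenates only the final states, and that early fusion concatenates frame-level features before a single recurrence. The function names and the callables `I`, `S`, `F` and `O` are placeholders supplied by the caller.

```python
def late_fusion(video, I, S, h0, O):
    """Per-modality recurrence; modalities are combined once, after the last step.

    video: list of T frames, each a tuple of M per-modality inputs
    I, S:  lists of M feature extractors and M recurrent cells
    h0:    list of M initial hidden states
    """
    h = [list(h0_m) for h0_m in h0]
    for frame in video:
        for m, x in enumerate(frame):
            h[m] = S[m](I[m](x), h[m])
    fused = [u for h_m in h for u in h_m]  # concatenate the final states only
    return O(fused)


def early_fusion(video, I, F, h0, O):
    """Frame features are concatenated first; one recurrence runs on the joint vector.

    F:  a single recurrent cell over the concatenated features
    h0: its initial hidden state
    """
    h = list(h0)
    for frame in video:
        v = [u for m, x in enumerate(frame) for u in I[m](x)]
        h = F(v, h)
    return O(h)
```

Under this reading, the MRNN differs from both by keeping the per-modality cells and also applying a recurrent fusion cell at every time step.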

19 Dataset
Sheffield Kinect Gesture (SKIG) Dataset [Liu et al., 2013]
- 10 gesture classes
- Varying illumination and cluttered backgrounds
- Each video consists of two modalities (RGB + depth)
- We compute optical flow as an additional modality

20 Experimental Results: MRNN vs. alternatives
Table: Test accuracy (MRNN vs. alternative models)
Method                      Accuracy (%)
Early multimodal fusion     94.1
Late multimodal fusion      94.6
MRNN                        97.8
Extracting sequential dynamics in both the single-modal spaces and the multimodal space yields higher accuracy.

21 Experimental Results: MRNN vs. previous works
Table: Test accuracy (MRNN vs. state-of-the-art methods)
Method                Accuracy (%)
Liu et al. (2013)     88.7
Choi et al. (2014)    91.9
Tung et al. (2014)    96.7
MRNN                  97.8
The MRNN outperforms the other state-of-the-art methods.

22 Experimental Results: multimodal vs. single modality
Table: Test accuracy (multiple modalities vs. single modality)
Method                             Accuracy (%)
MRNN (color)                       91.6
MRNN (opt flow)                    88.5
MRNN (depth)                       95.9
MRNN (color + opt flow + depth)    97.8
The MRNN successfully incorporates multiple sequential modalities.

23 Investigation of the robustness to noisy inputs
Gaussian noise with varying standard deviation σ is added to the depth data in the test set.
The MRNN maintains relatively high accuracy.
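
A minimal sketch of the corruption step, assuming zero-mean, independent per-pixel noise and depth frames stored as nested lists of floats. The slide does not give the pixel value range, the σ values used, or the sampling details, so those are left to the caller, and the function names are hypothetical.

```python
import random


def add_gaussian_noise(depth_frame, sigma, rng):
    """Return a copy of a 2-D depth frame with i.i.d. N(0, sigma^2) noise added per pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in depth_frame]


def corrupt_depth(videos, sigma, seed=0):
    """Apply the noise to every frame of every test video (depth modality only)."""
    rng = random.Random(seed)  # fixed seed so the corrupted test set is reproducible
    return [[add_gaussian_noise(f, sigma, rng) for f in video] for video in videos]
```

Sweeping `sigma` over a range and re-evaluating the trained model on each corrupted copy of the test set would give an accuracy-versus-noise curve of the kind this slide refers to.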

24 Conclusion
- We propose the MRNN for multimodal-sequential fusion.
- We successfully applied this approach to multimodal gesture recognition.
- The MRNN achieves a new state-of-the-art result on the SKIG dataset.
- Multimodal fusion that considers sequential dynamics in both the single-modal spaces and the multimodal space is beneficial.

25 Future work
- Further theoretical analysis
- Test our model on other datasets
- Use other modalities such as skeletal or speech data
- Apply our model to other tasks that have multimodal-sequential data (e.g., speech recognition)

26 Thank you very much! Q & A


More information

CSC 578 Neural Networks and Deep Learning

CSC 578 Neural Networks and Deep Learning CSC 578 Neural Networks and Deep Learning Fall 2018/19 7. Recurrent Neural Networks (Some figures adapted from NNDL book) 1 Recurrent Neural Networks 1. Recurrent Neural Networks (RNNs) 2. RNN Training

More information

This is a repository copy of Performance evaluation of deep feature learning for RGB-D image/video classification.

This is a repository copy of Performance evaluation of deep feature learning for RGB-D image/video classification. This is a repository copy of Performance evaluation of deep feature learning for RGB-D image/video classification. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/113273/

More information

Hello Edge: Keyword Spotting on Microcontrollers

Hello Edge: Keyword Spotting on Microcontrollers Hello Edge: Keyword Spotting on Microcontrollers Yundong Zhang, Naveen Suda, Liangzhen Lai and Vikas Chandra ARM Research, Stanford University arxiv.org, 2017 Presented by Mohammad Mofrad University of

More information

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Yaguang Li Joint work with Rose Yu, Cyrus Shahabi, Yan Liu Page 1 Introduction Traffic congesting is wasteful of time,

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA

More information

Semantic image search using queries

Semantic image search using queries Semantic image search using queries Shabaz Basheer Patel, Anand Sampat Department of Electrical Engineering Stanford University CA 94305 shabaz@stanford.edu,asampat@stanford.edu Abstract Previous work,

More information

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks August 16, 2016 1 Team details Team name FLiXT Team leader name Yunan Li Team leader address, phone number and email address:

More information

Recurrent Neural Networks

Recurrent Neural Networks Recurrent Neural Networks Javier Béjar Deep Learning 2018/2019 Fall Master in Artificial Intelligence (FIB-UPC) Introduction Sequential data Many problems are described by sequences Time series Video/audio

More information

Deep Convolutional Neural Networks and Noisy Images

Deep Convolutional Neural Networks and Noisy Images Deep Convolutional Neural Networks and Noisy Images Tiago S. Nazaré, Gabriel B. Paranhos da Costa, Welinton A. Contato, and Moacir Ponti Instituto de Ciências Matemáticas e de Computação Universidade de

More information

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features Pat Jangyodsuk Department of Computer Science and Engineering The University

More information

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting

More information

Pluto A Distributed Heterogeneous Deep Learning Framework. Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud

Pluto A Distributed Heterogeneous Deep Learning Framework. Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud Pluto A Distributed Heterogeneous Deep Learning Framework Jun Yang, Yan Chen Large Scale Learning, Alibaba Cloud Outline PAI(Platform of Artificial Intelligence) PAI Overview Deep Learning with PAI Pluto

More information

Domain-Aware Sentiment Classification with GRUs and CNNs

Domain-Aware Sentiment Classification with GRUs and CNNs Domain-Aware Sentiment Classification with GRUs and CNNs Guangyuan Piao 1(B) and John G. Breslin 2 1 Insight Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Galway,

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

CS489/698: Intro to ML

CS489/698: Intro to ML CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun

More information

Combining Neural Networks and Log-linear Models to Improve Relation Extraction

Combining Neural Networks and Log-linear Models to Improve Relation Extraction Combining Neural Networks and Log-linear Models to Improve Relation Extraction Thien Huu Nguyen and Ralph Grishman Computer Science Department, New York University {thien,grishman}@cs.nyu.edu Outline Relation

More information

Quality Guided Image Denoising for Low-Cost Fundus Imaging

Quality Guided Image Denoising for Low-Cost Fundus Imaging Quality Guided Image Denoising for Low-Cost Fundus Imaging Thomas Köhler1,2, Joachim Hornegger1,2, Markus Mayer1,2, Georg Michelson2,3 20.03.2012 1 Pattern Recognition Lab, Ophthalmic Imaging Group 2 Erlangen

More information

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION Wenkai Zhang a, b, Hai Huang c, *, Matthias Schmitz c, Xian Sun a, Hongqi Wang a, Helmut Mayer c a Key Laboratory

More information

Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction

Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction Youngjune Gwon 1, Kevin Brady 1, Pooya Khorrami 2, Elizabeth Godoy 1, William Campbell 1, Charlie Dagli 1,

More information

List of Accepted Papers for ICVGIP 2018

List of Accepted Papers for ICVGIP 2018 List of Accepted Papers for ICVGIP 2018 Paper ID ACM Article Title 3 1 PredGAN - A deep multi-scale video prediction framework for anomaly detection in videos 7 2 Handwritten Essay Grading on Mobiles using

More information

arxiv: v1 [stat.ml] 3 Aug 2015

arxiv: v1 [stat.ml] 3 Aug 2015 Time-series modeling with undecimated fully convolutional neural networks arxiv:1508.00317v1 [stat.ml] 3 Aug 2015 Roni Mittelman rmittelm@gmail.com Abstract We present a new convolutional neural network-based

More information

Layerwise Interweaving Convolutional LSTM

Layerwise Interweaving Convolutional LSTM Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States

More information

SKELETON-INDEXED DEEP MULTI-MODAL FEATURE LEARNING FOR HIGH PERFORMANCE HUMAN ACTION RECOGNITION. Chinese Academy of Sciences, Beijing, China

SKELETON-INDEXED DEEP MULTI-MODAL FEATURE LEARNING FOR HIGH PERFORMANCE HUMAN ACTION RECOGNITION. Chinese Academy of Sciences, Beijing, China SKELETON-INDEXED DEEP MULTI-MODAL FEATURE LEARNING FOR HIGH PERFORMANCE HUMAN ACTION RECOGNITION Sijie Song 1, Cuiling Lan 2, Junliang Xing 3, Wenjun Zeng 2, Jiaying Liu 1 1 Institute of Computer Science

More information

April 4-7, 2016 Silicon Valley

April 4-7, 2016 Silicon Valley April 4-7, 2016 Silicon Valley Neural Attention for Object Tracking Brian Cheung bcheung@berkeley.edu Redwood Center for Theoretical Neuroscience, UC Berkeley Visual Computing Research, NVIDIA Source:

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

Image de-fencing using RGB-D data

Image de-fencing using RGB-D data Image de-fencing using RGB-D data Vikram Voleti IIIT-Hyderabad, India Supervisor: Masters thesis at IIT Kharagpur, India (2013-2014) Prof. Rajiv Ranjan Sahay Associate Professor, Electrical Engineering,

More information

Pixel-level Generative Model

Pixel-level Generative Model Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,

More information

UTS submission to Google YouTube-8M Challenge 2017

UTS submission to Google YouTube-8M Challenge 2017 UTS submission to Google YouTube-8M Challenge 2017 Linchao Zhu Yanbin Liu Yi Yang University of Technology Sydney {zhulinchao7, csyanbin, yee.i.yang}@gmail.com Abstract In this paper, we present our solution

More information

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Tutorial on Keras CAP 6412 - ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Deep learning packages TensorFlow Google PyTorch Facebook AI research Keras Francois Chollet (now at Google) Chainer Company

More information