Multimodal Gesture Recognition using Multi-stream Recurrent Neural Network
Transcription
1 Multimodal Gesture Recognition using Multi-stream Recurrent Neural Network Noriki Nishida and Hideki Nakayama Machine Perception Group Graduate School of Information Science and Technology The University of Tokyo 7th Pacific Rim Symposium on Image and Video Technology (PSIVT 2015) November 27, 2015 Noriki Nishida and Hideki Nakayama Multimodal Gesture Recognition using MRNN The University of Tokyo 1 / 26
2 Menu
1. Introduction
2. Proposed model
3. Experiments
4. Conclusion & future work
3 Recent breakthroughs
- Object recognition [Krizhevsky et al., 2012]
- Object detection [Girshick et al., 2014]
- Speech recognition [Hinton et al., 2012]
- Word embedding [Mikolov et al., 2013]
- Convolutional neural networks [Krizhevsky et al., 2012]
- Recurrent neural networks with Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber, 1997]
- AdaDelta (optimization) [Zeiler, 2012]
4 Multimodal-sequential fusion is NOT solved
Figure: Examples of multiple modalities (in gesture recognition)
- How should we fuse multiple modalities into a common space (vector representation)?
- How should we extract sequential dynamics from multiple sequential modalities?
5 Problems with traditional methods
Hand-crafted heuristics:
1. lead to a lack of generality (e.g., skin color filtering for hand detection)
2. require prior knowledge of the target gesture domains
A system with less gesture-specific engineering is preferable.
6 Our goal
1. Propose an effective approach for fusing multiple sequential modalities
2. Propose a completely data-driven model that can be optimized from end to end
7 Recurrent Neural Networks (RNNs)
$h_t = \sigma(W_{in} x_t + W_{hh} h_{t-1} + b_{in})$
$y_t = f(W_{out} h_t + b_{out})$
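The recurrence on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: σ is taken to be the logistic sigmoid, f is taken to be a softmax, and the toy weights and dimensions are made up for the example; none of these choices come from the slides.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_step(x_t, h_prev, W_in, W_hh, b_in):
    """h_t = sigma(W_in x_t + W_hh h_{t-1} + b_in)."""
    a = matvec(W_in, x_t)
    r = matvec(W_hh, h_prev)
    return [sigmoid(ai + ri + bi) for ai, ri, bi in zip(a, r, b_in)]

def rnn_output(h_t, W_out, b_out):
    """y_t = f(W_out h_t + b_out), with f = softmax (an assumed choice)."""
    z = [zi + bi for zi, bi in zip(matvec(W_out, h_t), b_out)]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

# Toy sizes (hypothetical): 2-d input, 3-d hidden state, 2 output classes.
W_in = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
W_hh = [[0.0, 0.1, 0.0], [0.2, 0.0, -0.1], [0.0, 0.0, 0.3]]
b_in = [0.0, 0.0, 0.0]
W_out = [[0.3, -0.1, 0.2], [-0.2, 0.4, 0.1]]
b_out = [0.0, 0.0]

h = [0.0, 0.0, 0.0]  # initial hidden state h_0
outputs = []
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x_t, h, W_in, W_hh, b_in)
    outputs.append(rnn_output(h, W_out, b_out))
```

The same hidden state `h` is threaded through every time step, which is what lets the network carry information from earlier frames to later ones.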
8 Overall view of our multi-stream RNN (MRNN)
9 Components in our MRNN
- $I^{(m)}$: extracts feature vectors from the frame-level inputs of modality m at every time step
- $S^{(m)}$: computes the sequential dynamics of modality m
- $F$: fuses the multiple modalities while considering sequential dynamics in the multimodal space
- $O$: predicts the gesture category given the last output of $F$
What we optimize are the parameters of these components.
10 Formulation
Input video (with M modalities):
$x = \{(x^{(1)}_1, x^{(2)}_1, \ldots, x^{(M)}_1), \ldots, (x^{(1)}_T, x^{(2)}_T, \ldots, x^{(M)}_T)\}$
Extract feature representation $\hat{h}_t$ from $x$:
$v^{(m)}_t = I^{(m)}(x^{(m)}_t)$ for $m = 1, \ldots, M$
$h^{(m)}_t = S^{(m)}(v^{(m)}_t, h^{(m)}_{t-1})$ for $m = 1, \ldots, M$
$\hat{h}_t = F([h^{(1)}_t; h^{(2)}_t; \ldots; h^{(M)}_t], \hat{h}_{t-1})$
Classification:
$y = O(\hat{h}_T)$
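As a rough illustration of how the components I, S, F and O compose over time, here is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the `TanhRecurrence` and `Linear` classes, the random weights and all dimensions are hypothetical stand-ins (a plain tanh recurrence replaces LSTM/GRU, and a single linear map replaces the ConvNet or DNN feature extractor).

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

class Linear:
    """Stand-in for I^(m) and O (assumption: a single linear map)."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rand_matrix(out_dim, in_dim, rng)
    def __call__(self, x):
        return matvec(self.W, x)

class TanhRecurrence:
    """Stand-in for S^(m) and F: h_t = tanh(W x_t + U h_{t-1})."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W = rand_matrix(hid_dim, in_dim, rng)
        self.U = rand_matrix(hid_dim, hid_dim, rng)
        self.hid_dim = hid_dim
    def initial_state(self):
        return [0.0] * self.hid_dim
    def step(self, x, h_prev):
        a, r = matvec(self.W, x), matvec(self.U, h_prev)
        return [math.tanh(ai + ri) for ai, ri in zip(a, r)]

def mrnn_forward(video, I, S, F, O):
    """video: list over time of tuples (x^(1)_t, ..., x^(M)_t)."""
    M = len(S)
    h = [s.initial_state() for s in S]   # h^(m)_0
    h_hat = F.initial_state()            # fused state before the first frame
    for frame in video:
        v = [I[m](frame[m]) for m in range(M)]          # v^(m)_t = I^(m)(x^(m)_t)
        h = [S[m].step(v[m], h[m]) for m in range(M)]   # h^(m)_t = S^(m)(v^(m)_t, h^(m)_{t-1})
        concat = [val for h_m in h for val in h_m]      # [h^(1)_t; ...; h^(M)_t]
        h_hat = F.step(concat, h_hat)                   # fused recurrent state
    return softmax(O(h_hat))                            # y = O(h_hat_T)

# Toy setup (hypothetical): M = 2 modalities, T = 7 frames, 10 classes.
rng = random.Random(0)
I = [Linear(4, 5, rng), Linear(3, 5, rng)]
S = [TanhRecurrence(5, 6, rng), TanhRecurrence(5, 6, rng)]
F = TanhRecurrence(12, 8, rng)
O = Linear(8, 10, rng)
video = [([rng.random() for _ in range(4)], [rng.random() for _ in range(3)])
         for _ in range(7)]
y = mrnn_forward(video, I, S, F, O)
```

Each modality keeps its own recurrent state, and the fusion unit keeps a separate recurrent state over the concatenated stream outputs, so temporal structure is modelled both per modality and jointly.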
15 Graphical representation of our method (M = 2)
16 Advantages
1. All free parameters of the MRNN can be trained towards end-to-end performance in a supervised manner using SGD and backpropagation, with no hand-crafted engineering.
2. We can choose current state-of-the-art neural networks for each component:
- ConvNet or DNN for $I^{(m)}$
- LSTM or GRU [Cho et al., 2014] for $S^{(m)}$ and $F$
- DNN for $O$
17 Late multimodal fusion model (M = 2)
No mechanism to consider sequential dynamics in the multimodal space
18 Early multimodal fusion model (M = 2)
No mechanism to consider sequential dynamics in each single-modal space
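The two baseline models are shown only as figures, so the following sketch is one plausible reading of them rather than the authors' exact architectures. It assumes that early fusion concatenates the modalities frame by frame and runs a single recurrence, and that late fusion runs one recurrence per modality and combines only the final states. The tanh recurrence, weights and sizes are hypothetical, and the final classifier is omitted.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def run_stream(W, U, seq):
    """Run h_t = tanh(W x_t + U h_{t-1}) over seq and return the last state."""
    h = [0.0] * len(U)
    for x in seq:
        a, r = matvec(W, x), matvec(U, h)
        h = [math.tanh(ai + ri) for ai, ri in zip(a, r)]
    return h

def early_fusion(seq_a, seq_b, W, U):
    """Concatenate the modalities per frame, then run a single recurrence.
    There is no separate recurrent state for each modality."""
    fused_frames = [a + b for a, b in zip(seq_a, seq_b)]
    return run_stream(W, U, fused_frames)

def late_fusion(seq_a, seq_b, Wa, Ua, Wb, Ub):
    """Run one recurrence per modality and concatenate the final states.
    There is no recurrence over the combined representation."""
    return run_stream(Wa, Ua, seq_a) + run_stream(Wb, Ub, seq_b)

# Toy data (hypothetical): T = 4 frames, modality A is 2-d, modality B is 3-d.
rng = random.Random(0)
seq_a = [[rng.random() for _ in range(2)] for _ in range(4)]
seq_b = [[rng.random() for _ in range(3)] for _ in range(4)]

early_repr = early_fusion(seq_a, seq_b,
                          rand_matrix(4, 5, rng), rand_matrix(4, 4, rng))
late_repr = late_fusion(seq_a, seq_b,
                        rand_matrix(4, 2, rng), rand_matrix(4, 4, rng),
                        rand_matrix(4, 3, rng), rand_matrix(4, 4, rng))
```

Under this reading, the MRNN differs from both baselines by keeping per-modality recurrences and adding a recurrence over their concatenated outputs.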
19 Dataset
Sheffield Kinect Gesture (SKIG) Dataset [Liu et al., 2013]
- 10 gesture classes
- Varied illumination and cluttered backgrounds
- Each video consists of two modalities (RGB + depth)
- We compute optical flow as an additional modality
20 Experimental Results: MRNN vs. alternatives
Table: Test accuracy (MRNN vs. alternative models)
Method                     Accuracy (%)
Early multimodal fusion    94.1
Late multimodal fusion     94.6
MRNN                       97.8
Extracting sequential dynamics in both the single-modal spaces and the multimodal space yields higher accuracy.
21 Experimental Results: MRNN vs. previous works
Table: Test accuracy (MRNN vs. state-of-the-art methods)
Method                 Accuracy (%)
Liu et al. (2013)      88.7
Choi et al. (2014)     91.9
Tung et al. (2014)     96.7
MRNN                   97.8
The MRNN outperforms other state-of-the-art methods.
22 Experimental Results: multimodal vs. single modality
Table: Test accuracy (multiple modalities vs. single modality)
Method                              Accuracy (%)
MRNN (color)                        91.6
MRNN (opt flow)                     88.5
MRNN (depth)                        95.9
MRNN (color + opt flow + depth)     97.8
The MRNN successfully incorporates multiple sequential modalities.
23 Investigation of the robustness to noisy inputs
Gaussian noise with varying standard deviation σ is added to the depth data in the test set.
The MRNN maintains relatively high accuracy.
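The noise injection described on this slide can be sketched as follows. The frame size, the pixel value range, the σ values and the absence of clipping are all assumptions made for the example; the slides do not state them.

```python
import random

def add_gaussian_noise(frame, sigma, rng):
    """Return a copy of a 2-D depth frame with i.i.d. N(0, sigma^2) noise
    added to every pixel. No clipping is applied (an assumption)."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in frame]

# Toy depth video (hypothetical): T = 5 frames of 3 x 4 pixels, all 0.5.
rng = random.Random(0)
depth_video = [[[0.5 for _ in range(4)] for _ in range(3)] for _ in range(5)]

# One noisy copy of the test video per noise level; a trained model would
# then be evaluated on each copy to trace accuracy as a function of sigma.
noisy_by_sigma = {}
for sigma in (0.0, 0.1, 0.5):
    noisy_by_sigma[sigma] = [add_gaussian_noise(f, sigma, rng)
                             for f in depth_video]
```

Only the depth stream is perturbed here, matching the slide; the other modalities would be passed to the model unchanged.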
24 Conclusion
- We propose the MRNN for multimodal-sequential fusion.
- We successfully applied this approach to multimodal gesture recognition.
- The MRNN achieves a new state-of-the-art result on the SKIG dataset.
- Multimodal fusion that considers sequential dynamics in both the single-modal spaces and the multimodal space is beneficial.
25 Future work
- Further theoretical analysis
- Test our model on other datasets
- Use other modalities such as skeletal or speech data
- Apply our model to other tasks that have multimodal-sequential data (e.g., speech recognition)
26 Thank you very much! Q & A
More informationCombining Neural Networks and Log-linear Models to Improve Relation Extraction
Combining Neural Networks and Log-linear Models to Improve Relation Extraction Thien Huu Nguyen and Ralph Grishman Computer Science Department, New York University {thien,grishman}@cs.nyu.edu Outline Relation
More informationQuality Guided Image Denoising for Low-Cost Fundus Imaging
Quality Guided Image Denoising for Low-Cost Fundus Imaging Thomas Köhler1,2, Joachim Hornegger1,2, Markus Mayer1,2, Georg Michelson2,3 20.03.2012 1 Pattern Recognition Lab, Ophthalmic Imaging Group 2 Erlangen
More informationA MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION
A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION Wenkai Zhang a, b, Hai Huang c, *, Matthias Schmitz c, Xian Sun a, Hongqi Wang a, Helmut Mayer c a Key Laboratory
More informationMulti-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction
Multi-Modal Audio, Video, and Physiological Sensor Learning for Continuous Emotion Prediction Youngjune Gwon 1, Kevin Brady 1, Pooya Khorrami 2, Elizabeth Godoy 1, William Campbell 1, Charlie Dagli 1,
More informationList of Accepted Papers for ICVGIP 2018
List of Accepted Papers for ICVGIP 2018 Paper ID ACM Article Title 3 1 PredGAN - A deep multi-scale video prediction framework for anomaly detection in videos 7 2 Handwritten Essay Grading on Mobiles using
More informationarxiv: v1 [stat.ml] 3 Aug 2015
Time-series modeling with undecimated fully convolutional neural networks arxiv:1508.00317v1 [stat.ml] 3 Aug 2015 Roni Mittelman rmittelm@gmail.com Abstract We present a new convolutional neural network-based
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationSKELETON-INDEXED DEEP MULTI-MODAL FEATURE LEARNING FOR HIGH PERFORMANCE HUMAN ACTION RECOGNITION. Chinese Academy of Sciences, Beijing, China
SKELETON-INDEXED DEEP MULTI-MODAL FEATURE LEARNING FOR HIGH PERFORMANCE HUMAN ACTION RECOGNITION Sijie Song 1, Cuiling Lan 2, Junliang Xing 3, Wenjun Zeng 2, Jiaying Liu 1 1 Institute of Computer Science
More informationApril 4-7, 2016 Silicon Valley
April 4-7, 2016 Silicon Valley Neural Attention for Object Tracking Brian Cheung bcheung@berkeley.edu Redwood Center for Theoretical Neuroscience, UC Berkeley Visual Computing Research, NVIDIA Source:
More informationarxiv: v1 [cs.cv] 31 Mar 2016
Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.
More informationImage de-fencing using RGB-D data
Image de-fencing using RGB-D data Vikram Voleti IIIT-Hyderabad, India Supervisor: Masters thesis at IIT Kharagpur, India (2013-2014) Prof. Rajiv Ranjan Sahay Associate Professor, Electrical Engineering,
More informationPixel-level Generative Model
Pixel-level Generative Model Generative Image Modeling Using Spatial LSTMs (2015NIPS) L. Theis and M. Bethge University of Tübingen, Germany Pixel Recurrent Neural Networks (2016ICML) A. van den Oord,
More informationUTS submission to Google YouTube-8M Challenge 2017
UTS submission to Google YouTube-8M Challenge 2017 Linchao Zhu Yanbin Liu Yi Yang University of Technology Sydney {zhulinchao7, csyanbin, yee.i.yang}@gmail.com Abstract In this paper, we present our solution
More informationTutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY
Tutorial on Keras CAP 6412 - ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY Deep learning packages TensorFlow Google PyTorch Facebook AI research Keras Francois Chollet (now at Google) Chainer Company
More information