RECURRENT NEURAL NETWORKS

Similar documents
Self Driving. DNN * * Reinforcement * Unsupervised *

CS231N Section. Video Understanding 6/1/2018

HUMAN ACTION RECOGNITION

Code Mania Artificial Intelligence: a. Module - 1: Introduction to Artificial intelligence and Python:

Sequence Modeling: Recurrent and Recursive Nets. By Pyry Takala 14 Oct 2015

Machine Learning 13. week

People Detection and Video Understanding

Turning an Automated System into an Autonomous system using Model-Based Design Autonomous Tech Conference 2018

Bidirectional Recurrent Convolutional Networks for Video Super-Resolution

Multimodal Gesture Recognition using Multi-stream Recurrent Neural Network

Project Final Report

Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

VISION FOR AUTOMOTIVE DRIVING

Characterization and Benchmarking of Deep Learning. Natalia Vassilieva, PhD Sr. Research Manager

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

Object Detection. Part1. Presenter: Dae-Yong

Binary Convolutional Neural Network on RRAM

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Hello Edge: Keyword Spotting on Microcontrollers

Multi-Glance Attention Models For Image Classification

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

A Torch Library for Action Recognition and Detection Using CNNs and LSTMs

Visual features detection based on deep neural network in autonomous driving tasks

arxiv: v1 [cs.cv] 20 Dec 2016

An Exploration of Computer Vision Techniques for Bird Species Classification

Tutorial on Keras CAP ADVANCED COMPUTER VISION SPRING 2018 KISHAN S ATHREY

arxiv: v2 [cs.cv] 14 May 2018

S7348: Deep Learning in Ford's Autonomous Vehicles. Bryan Goodman Argo AI 9 May 2017

Lecture: Deep Convolutional Neural Networks

2 OVERVIEW OF RELATED WORK

Spatial Localization and Detection. Lecture 8-1

Recurrent Neural Networks and Transfer Learning for Action Recognition

Towards Fully-automated Driving. tue-mps.org. Challenges and Potential Solutions. Dr. Gijs Dubbelman Mobile Perception Systems EE-SPS/VCA

Combining Neural Networks and Log-linear Models to Improve Relation Extraction

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Machine learning based automatic extrinsic calibration of an onboard monocular camera for driving assistance applications on smart mobile devices

Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols

LSTM: An Image Classification Model Based on Fashion-MNIST Dataset

W4. Perception & Situation Awareness & Decision making

CSC 578 Neural Networks and Deep Learning

Layerwise Interweaving Convolutional LSTM

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling For Handwriting Recognition

Rich feature hierarchies for accurate object detection and semantic segmentation

EasyChair Preprint. Real-Time Action Recognition based on Enhanced Motion Vector Temporal Segment Network

Object Detection Based on Deep Learning

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Vehicle Localization. Hannah Rae Kerner 21 April 2015

Object Recognition II

3D Attention-Driven Depth Acquisition for Object Identification

Flow-Based Video Recognition

DEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla

Deep Learning for Computer Vision II

Multimodal Gesture Recognition using Multi-stream Recurrent Neural Network

Fully Convolutional Networks for Semantic Segmentation

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Rich feature hierarchies for accurate object detection and semantic segmentation

Two-Stream Convolutional Networks for Action Recognition in Videos

List of Accepted Papers for ICVGIP 2018

Crowd Scene Understanding with Coherent Recurrent Neural Networks

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

GPU Accelerated Sequence Learning for Action Recognition. Yemin Shi

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

Deep Learning. Roland Olsson

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington

Vision based autonomous driving - A survey of recent methods. -Tejus Gupta

Creating Affordable and Reliable Autonomous Vehicle Systems

End-to-end Driving Controls Predictions from Images

Object Classification for Video Surveillance

Segmentation-free Vehicle License Plate Recognition using ConvNet-RNN

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

USING DEEP LEARNING TECHNOLOGIES TO PROVIDE DESCRIPTIVE METADATA FOR LIVE VIDEO CONTENTS

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

A Multi-Stream Bi-Directional Recurrent Neural Network for Fine-Grained Action Detection

Deformable Part Models

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Recurrent Neural Nets II

arxiv: v1 [cs.cv] 31 Mar 2016

Deep Learning. Deep Learning provided breakthrough results in speech recognition and image classification. Why?

A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017

A new approach for supervised power disaggregation by using a deep recurrent LSTM network

Geometry-aware Traffic Flow Analysis by Detection and Tracking

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

A Novel Representation and Pipeline for Object Detection

A Quick Guide on Training a neural network using Keras.

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

MULTI-VIEW FEATURE FUSION NETWORK FOR VEHICLE RE- IDENTIFICATION

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Analyzing the Relationship Between Head Pose and Gaze to Model Driver Visual Attention

Designing a software framework for automated driving. Dr.-Ing. Sebastian Ohl, 2017 October 12 th

Recurrent neural networks for polyphonic sound event detection in real life recordings

Transcription:

RECURRENT NEURAL NETWORKS

Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning based method Supervised SVM MLP CNN RNN (LSTM) Localizati on GPS, SLAM Self Driving Perception Pedestrian detection (HOG+SVM) Detection/ Segmentat ion/classif ication Dry/wet road classificati on ADAS Planning/ Control Optimal control End-toend Learning End-toend Learning Tasks Driver state Behavior Prediction/ Driver identificati on Vehicle Diagnosis Smart factory DNN * * Reinforcement * Unsupervised * *

RNN INTRODUCTION

Recurrent Neural Network ( 참고 )

Repeating module in RNN

Repeating module in LSTM

Recurrent Neural Network

Recurrent Neural Network

FLEXIBILITY OF RNN

RNN offers a lot of flexibility Vanilla N.N. Image Captioning Sentiment Classification Machine Translation Video Classification on Frame Level

TRAINING

Training Backpropagation through time (BPTT) Unfold and apply SGD

EXAMPLE/VARIANTS

Training

Results

Generated C code

Searching for interpretable cells

Searching for interpretable cells

Searching for interpretable cells

Searching for interpretable cells

Variants Bidirectional Bidirectional Deep

예제코드 LSTM

APPLICATIONS

CAR THAT KNOWS BEFORE YOU DO VIA SENSORY-FUSION DEEP LEARNING ARCHITECTURE ICCV 2015, Cornell Univ., Stanford Univ., Brain Of Things Inc

Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning based method Supervised SVM MLP CNN RNN (LSTM) Localizati on GPS, SLAM Self Driving Perception Pedestrian detection (HOG+SVM) Detection/ Segmentat ion/classif ication Dry/wet road classificati on ADAS Planning/ Control Optimal control End-toend Learning End-toend Learning Tasks Driver state Behavior Prediction/ Driver identificati on Vehicle Diagnosis Smart factory DNN * * Reinforcement * Unsupervised * *

Overview An approach for anticipating driving maneuvers, several seconds in advance: lane change, keeping straight, turn,... Generic sensory-fusion RNN-LSTM architecture for anticipation in robotics applications

Demo Video https://youtu.be/o5i1hbwkwjc

Setup Driver-facing camera inside the vehicle Camera facing the road Speed logger of the car Global Positioning System (GPS)

Features 1. Face detection and tracking: - Driver s face: Viola Jones face detector - Point extract: Shi-Tomasi corner detector - Facial points tracker: KLT(Kanade-Lucas-Tomasi) 2. Head motion features (φ face R 9 ): - histogram features are used - matches facial points and create histograms of corresponding horizontal motions 3. 3D head pose and facial landmark features(φ face R 12 ): CLNF tracker model Aggregate φ face for every 20 frames

Network Architecture at a glance LSTM units is used for training - consider accumulated information from the past: noted as event x 1, x 2,, x T, y - y t is updated as y t = softmax W y h t + b y, with the representation h t = f Wx t + Hh t 1 + b x t : observation at time t y:representation of the event

Data set Fully natural driving 1180 miles of natural freeway, city driving Collected across two states Ten drivers, different kinds of driving maneuvers 2 months to take About 17GB 700 events of annotation - 274 lane change - 131 turns - 295 straight

Result With sensory fusion: - Precision of 84.5% - Recall of 77.1% - Anticipates maneuvers 3.5 seconds on average With incorporating the driver s 3D head-pose: - Precision of 90.5% - Recall of 87.4% - Anticipates maneuvers 3.16 seconds on average

CHARACTERIZING DRIVING STYLES WITH DEEP LEARNING W. Dong et al. ArXiv 2016 IBM Research China, Nanjing Univ., Univ. of Waterloo

Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning based method Supervised SVM MLP CNN RNN (LSTM) Localizati on GPS, SLAM Self Driving Perception Pedestrian detection (HOG+SVM) Detection/ Segmentat ion/classif ication Dry/wet road classificati on ADAS Planning/ Control Optimal control End-toend Learning End-toend Learning Tasks Driver state Behavior Prediction/ Driver identificati on Vehicle Diagnosis Smart factory DNN * * Reinforcement * Unsupervised * *

Overview Speed norm difference of speed norm acceleration norm difference of acceleration norm angular speed mean Minimum Maximum First quartile Second quartile Third quartile Standard deviation Extracted 35 features Classification model Driver identification GPS on vehicles

Data 목표 : GPS 데이터를이용하여사용자 ID 분류수행 Input sensor GPS trajectory as a sequence (x, y, t) Dataset and feature map Training/Test data Data from 50/1,000 drivers (original data: Kaggle 2015 competition on Driver Telematics Analysis) Data transformation Feature map (35=5 x 7) Basic features (5) :Speed norm; difference of speed norm; acceleration norm; difference of acceleration norm; angular speed Statistical features (7) : mean; min; max; ¼ ; ½ ; ¾ ; standard deviation of each basic feature

Machine Learning Machine Learning Models CNN Six layers model (3 conv. + 3 FC) RNN RNN model Gradient boosting decision tree (GBDT) For reference

Experimental Results Experiments 80% for training, 20% for testing Small scale test for 50 drivers data Include 35,000 segments from 8,000 trips large scale test for 1,000 drivers data Results

DETECTING ROAD SURFACE WETNESS

Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning based method Supervised SVM MLP CNN RNN (LSTM) Localizati on GPS, SLAM Self Driving Perception Pedestrian detection (HOG+SVM) Detection/ Segmentat ion/classif ication Dry/wet road classificati on ADAS Planning/ Control Optimal control End-toend Learning End-toend Learning Tasks Driver state Behavior Prediction/ Driver identificati on Vehicle Diagnosis Smart factory DNN * * Reinforcement * Unsupervised * *

Dataset/Machine Learning Shotgun microphone behind the rear tire Spectogram + LSTM 관련기사 http://www.ibtimes.co.uk/heres-how-self-driving-cars-can-detectdangerous-roads-using-sound-ai-1532407

Detecting Road Surface wetness

END-TO-END DRIVING WITH RNNS

Methods Traditional Deep-Learning based Non-machine Learning Machine-Learning based method Supervised SVM MLP CNN RNN (LSTM) Localizati on GPS, SLAM Self Driving Perception Pedestrian detection (HOG+SVM) Detection/ Segmentat ion/classif ication Dry/wet road classificati on ADAS Planning/ Control Optimal control End-toend Learning End-toend Learning Tasks Driver state Behavior Prediction/ Driver identificati on Vehicle Diagnosis Smart factory DNN * * Reinforcement * Unsupervised * *

End-to-end Driving with RNNs 1 st and 3 rd place winner of the Udacity end-to-end steering competition used RNNs: Sequence-to-sequence mapping from images to steering angles 관련링크 https://medium.com/udacity/challenge-2-using-deep-learning-to-predictsteering-angles-f42004a36ff3 https://medium.com/udacity/teaching-a-machine-to-steer-a-card73217f2492c

1 st Place Winner LSTM inputs: 3D convolution of image sequence Outputs: predicted steering angle, speed, torque Sequence length = 10

3 rd Place Winner LSTM inputs: 3000 features extracted with CNN Outputs: predicted steering angle Sequence length = 10

ANTICIPATING ACCIDENTS IN DASHCAM VIDEOS

Dataset Positive examples Negative examples Total Training set 455 829 1284 Testing set 165 301 466 Total 620 1130 1730 Positive example Negative example

BACKUPS

LONG-TERM RECURRENT CONVOLUTIONAL NETWORKS FOR VISUAL RECOGNITION AND DESCRIPTION Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

Long-time Spatio-Temporal ConvNets

DELVING DEEPER INTO CONVOLUTIONAL NETWORKS FOR LEARNING VIDEO REPRESENTATIONS Ballas et al., 2016

Limitations in simple RNN-ConvNet Visualization of convolutional maps on successive frames in video. As we go up in the CNN hierarchy, we observe that the convolutional maps are more stable over time, and thus discard variation over short temporal windows.

Stack-GRU-RCN RCN (Recurrent Convolution Networks)