Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting


Transcription:

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Yaguang Li Joint work with Rose Yu, Cyrus Shahabi, Yan Liu Page 1

Introduction
Traffic congestion wastes time, money, and energy: it cost Americans $124 billion in direct and indirect losses in 2013.+ Accurate traffic forecasting could substantially improve route planning and mitigate congestion.
+ https://www.forbes.com/sites/federicoguerrini/2014/10/14/traffic-congestion-costs-americans-124-billion-a-year-report-says/
* Image from La La Land: http://variety.com/2017/film/awards/la-la-land-hollywood-musical-trend-2017-1201981880/ Page 2

The Problem: Existing Solutions
Route 1: the best route according to current traffic conditions. Route 2: let's see…
[Map: Route 1, Route 2, Destination, 7:00 AM] Page 3

The Problem: Existing Solutions
Predictive vs. real-time path planning: traffic conditions evolve over time.
[Map: Route 1, Route 2, Destination, 7:15 AM] Page 4

The Problem: Existing Solutions
Traffic forecasting enables better route planning.
[Map: Route 1 (stuck in traffic), Route 2, Destination, 7:30 AM] Page 5

Traffic Prediction
Input: the road network and the past $T'$ traffic speeds observed at sensors.
Output: traffic speeds for the next $T$ steps.
[Timeline: observations 7:00 AM … 8:00 AM; predictions 8:10 AM, 8:20 AM, …, 9:00 AM] Page 6

Challenges for Traffic Prediction
Complex spatial dependency; non-linear, non-stationary temporal dynamics.
[Plot: speed (mile/h) over time for Sensor 1, Sensor 2, Sensor 3] Page 7

Related Work: Traffic Prediction without Spatial Dependency Modeling
Simulation and queuing theory: [Drew 1968]
Kalman filter: [Okutani et al., TRB'83], [Wang et al., TRB'05]
ARIMA: [Williams et al., TRB'98], [Pan et al., ICDM'12]
Support vector regression (SVR): [Muller et al., ICANN'97], [Wu et al., ITS'04]
Gaussian process: [Xie et al., TRB'10], [Zhou et al., SIGMOD'15]
Recurrent neural networks and deep learning: [Lv et al., ITS'15], [Ma et al., TRC'15], [Li et al., SDM'17]
These methods model each sensor independently and fail to capture spatial correlation. Page 8

Related Work: Traffic Prediction with Spatial Dependency Modeling
Vector ARIMA: [Williams and Hoel, JTE'03], [Chandra et al., ITS'09]
Spatiotemporal ARIMA: [Kamarianakis et al., TRB'03], [Min and Wynter, TRC'11]
k-nearest neighbors: [Li et al., ITS'12], [Rice et al., ITS'13]
Latent space model: [Deng et al., KDD'17]
Convolutional neural network: [Ma et al., ITS'17]
These methods either assume linear temporal dependency or fail to capture the non-Euclidean spatial dependency. Page 9

Big Picture
Model spatial dependency with the proposed diffusion convolution.
Model temporal dependency with an augmented recurrent neural network.
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018. Page 10

Spatial Dependency Modeling
A natural choice is to model spatial dependency with convolutional neural networks (CNN): CNNs extract meaningful spatial patterns using filters* and achieve state-of-the-art results on image-related tasks. However, CNN is only applicable to regular Euclidean grids, not to general graphs.
[Figure: an image and a convolutional filter+]
* Y. LeCun et al. Gradient-based learning applied to document recognition. Proc. IEEE 1998
+ Image from: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/ Page 11

Spatial Dependency in Traffic Prediction
Spatial dependency among traffic flows is non-Euclidean and directed: sensors that are close in Euclidean space do not necessarily record similar traffic speeds, and road-network distance is asymmetric, $\mathrm{dist}_{net}(v_i, v_j) \ne \mathrm{dist}_{net}(v_j, v_i)$.
[Figure: Sensor 1, Sensor 2, Sensor 3 on a highway map] Page 12

Spatial Dependency Modeling
Model the network of traffic sensors, i.e., loop detectors, as a directed graph $G = (V, A)$:
Vertices $V$: the sensors.
Adjacency matrix $A$: edge weights between vertices,
$$A_{ij} = \exp\left(-\frac{\mathrm{dist}_{net}(v_i, v_j)^2}{\sigma^2}\right) \ \text{if}\ \mathrm{dist}_{net}(v_i, v_j) \le \kappa, \ \text{and}\ 0\ \text{otherwise},$$
where $\mathrm{dist}_{net}(v_i, v_j)$ is the road-network distance from $v_i$ to $v_j$, $\kappa$ is a threshold to ensure sparsity, and $\sigma^2$ is the variance of all pairwise road-network distances. Page 13
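To make the weighting concrete, here is a minimal NumPy sketch of the thresholded Gaussian kernel described above; the function and argument names (`build_adjacency`, `dist`, `kappa`) are ours, not from the DCRNN codebase.

```python
import numpy as np

def build_adjacency(dist, kappa):
    """Thresholded Gaussian kernel over road-network distances.

    dist:  (N, N) array; dist[i, j] is the road-network distance from
           sensor v_i to v_j (directed, so dist need not be symmetric).
    kappa: distance threshold; longer-range entries are zeroed for sparsity.
    """
    sigma2 = dist.std() ** 2           # variance of all pairwise distances
    A = np.exp(-dist ** 2 / sigma2)    # Gaussian kernel weight
    A[dist > kappa] = 0.0              # sparsify: drop far-apart sensor pairs
    return A
```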

Problem Statement
Graph signal: $X_t \in \mathbb{R}^{|V| \times P}$, the observation on $G$ at time $t$, where $|V|$ is the number of vertices and $P$ is the feature dimension of each vertex.
Problem: learn a function $g(\cdot)$ that maps $T'$ historical graph signals to the next $T$ graph signals:
$$[X_{t-T'+1}, \ldots, X_t] \xrightarrow{\,g(\cdot)\,} [X_{t+1}, \ldots, X_{t+T}]$$ Page 14

Generalize Convolution to Graph
Convolution as a weighted combination of neighborhood vertices:
$$h_i^{l+1} = \sum_{j \in N(i)} w_{ij}^l \, h_j^l$$
where $N(i)$ is the neighborhood of vertex $i$, $w_{ij}^l$ is the filter weight for vertex $j$ of the filter centered at vertex $i$ in layer $l$, $h_j^l$ is the feature of vertex $j$ in layer $l$, and $h_j^1 = X_{j,:}$, i.e., the input.
The learning complexity is too high: $O(|V| \cdot N)$, one free weight per (center, neighbor) pair.
[Figure: a convolutional filter on the graph centered at $v_6$, with per-edge weights such as $w_{6,4}^l$; color scale Min-Max] Page 15

Generalize Convolution to Graph
Diffusion convolution filter: a combination of diffusion processes with different numbers of steps on the graph, built from the transition matrix of the diffusion process:
$$X_{:,p} \star_G f_\theta = \sum_{k=0}^{K-1} \theta_k \left(D_O^{-1} A\right)^k X_{:,p}$$
where $\star_G$ denotes diffusion convolution and $D_O$ is the diagonal out-degree matrix. Learning complexity: $O(K)$.
[Figure: an example diffusion filter centered at one vertex, shown as the weighted sum $\theta_0 \cdot$ (0-step diffusion) $+\ \theta_1 \cdot$ (1-step) $+\ \theta_2 \cdot$ (2-step) $+ \cdots +$ (K-step); color scale Min-Max] Page 16
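As a sanity check of the formula, a small NumPy sketch of a single diffusion-convolution filter, forward direction only; all names here are ours.

```python
import numpy as np

def diffusion_conv(X, A, theta):
    """Apply sum_{k=0}^{K-1} theta_k (D_O^{-1} A)^k to each feature column.

    X:     (N, P) graph signal.
    A:     (N, N) weighted adjacency matrix.
    theta: length-K sequence of scalar filter weights.
    """
    out_deg = np.maximum(A.sum(axis=1, keepdims=True), 1e-10)
    P_fwd = A / out_deg                 # random-walk transition matrix D_O^{-1} A
    out = np.zeros(X.shape)
    Xk = np.asarray(X, dtype=float)     # k = 0 term: (D_O^{-1} A)^0 X = X
    for theta_k in theta:
        out += theta_k * Xk
        Xk = P_fwd @ Xk                 # advance the diffusion by one step
    return out
```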

Generalize Convolution to Graph
Dual-directional diffusion models upstream and downstream traffic separately:
$$X_{:,p} \star_G f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^\top\right)^k \right) X_{:,p}$$
where $\star_G$ denotes diffusion convolution, $D_O$ is the diagonal out-degree matrix, and $D_I$ is the diagonal in-degree matrix.
[Figure: an example dual-directional diffusion filter centered at one vertex, from 0-step to K-step diffusion; color scale Min-Max] Page 17
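The dual-directional version just adds the reverse-diffusion term; again a sketch in our own naming, following the conventions of the snippet above.

```python
import numpy as np

def bidirectional_diffusion_conv(X, A, theta_fwd, theta_bwd):
    """Dual-directional filter: forward diffusion follows downstream traffic
    (D_O^{-1} A), reverse diffusion follows upstream traffic (D_I^{-1} A^T)."""
    P_fwd = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-10)
    P_bwd = A.T / np.maximum(A.T.sum(axis=1, keepdims=True), 1e-10)
    out = np.zeros(X.shape)
    Xf = Xb = np.asarray(X, dtype=float)
    for t_f, t_b in zip(theta_fwd, theta_bwd):
        out += t_f * Xf + t_b * Xb
        Xf, Xb = P_fwd @ Xf, P_bwd @ Xb   # one step in each direction
    return out
```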

Advantages of Diffusion Convolution
Efficient:
$$X_{:,p} \star_G f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^\top\right)^k \right) X_{:,p}$$
Learning complexity: $O(K)$. Time complexity: $O(K|E|)$, where $|E|$ is the number of edges.
Expressive: many popular convolution operations, including the ChebNet [Defferrard et al., NIPS'16], can be represented using the diffusion convolution [Li et al., ICLR'18].
($\star_G$: diffusion convolution; $D_O$: diagonal out-degree matrix; $D_I$: diagonal in-degree matrix.)
+ Defferrard, M., Bresson, X., Vandergheynst, P., Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, NIPS, 2016
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR, 2018 Page 18
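The $O(K|E|)$ time bound follows because $(D_O^{-1} A)^k X$ can be computed recursively with sparse matrix products, never materializing $A^k$; a sketch of that recurrence using SciPy, with our own naming:

```python
import numpy as np
import scipy.sparse as sp

def diffusion_conv_sparse(X, A_sparse, theta):
    """Same filter as before, but each of the K diffusion steps is one
    sparse-dense product costing O(|E| * P), giving O(K|E|) for P = O(1)."""
    out_deg = np.asarray(A_sparse.sum(axis=1)).ravel()
    P_fwd = sp.diags(1.0 / np.maximum(out_deg, 1e-10)) @ A_sparse
    out, Xk = np.zeros(X.shape), np.asarray(X, dtype=float)
    for theta_k in theta:
        out += theta_k * Xk
        Xk = P_fwd @ Xk                 # sparse-dense product: O(|E| * P)
    return out
```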

Big Picture
Model spatial dependency with the proposed diffusion convolution.
Model temporal dependency with an augmented recurrent neural network.
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018. Page 19

Model Temporal Dynamics using Recurrent Neural Networks
Recurrent neural networks (RNN) perform non-linear, non-stationary auto-regression and give state-of-the-art performance in sequence modeling. Popular variants include the Long Short-Term Memory unit (LSTM) and the Gated Recurrent Unit (GRU). Here, a GRU is combined with the diffusion convolution.
[Figure: an RNN unrolled over inputs $x_1, x_2, x_3$ producing hidden states $h_1, h_2, h_3$; GRU + diffusion convolution] Page 20
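A rough sketch of the resulting recurrent cell: the GRU's matrix multiplications are replaced by diffusion convolutions over the graph. For brevity this uses forward diffusion only and omits biases; `graph_conv`, `params`, and the gate names are our own labels, not the DCRNN codebase's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_conv(X, P_fwd, W_list):
    """Diffusion convolution with a per-step weight matrix:
    sum_k (P_fwd^k X) W_k, mapping (N, D_in) features to (N, D_out)."""
    out, Xk = 0.0, X
    for W_k in W_list:
        out = out + Xk @ W_k
        Xk = P_fwd @ Xk
    return out

def dcgru_step(x_t, h_prev, P_fwd, params):
    """One recurrent step: standard GRU equations with every matmul
    swapped for a diffusion convolution. params['z'|'r'|'c'] hold the
    per-step weight matrices of the update gate, reset gate, and
    candidate state, each of shape (P + H, H)."""
    xh = np.hstack([x_t, h_prev])                      # (N, P + H)
    z = sigmoid(graph_conv(xh, P_fwd, params["z"]))    # update gate
    r = sigmoid(graph_conv(xh, P_fwd, params["r"]))    # reset gate
    xrh = np.hstack([x_t, r * h_prev])
    c = np.tanh(graph_conv(xrh, P_fwd, params["c"]))   # candidate state
    return z * h_prev + (1.0 - z) * c                  # new hidden state
```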

Model Temporal Dynamics using Recurrent Neural Network
Multi-step-ahead prediction with an RNN: beyond the current time, the previous model output is fed back into the network as the next input, so prediction errors propagate and compound across steps. The model therefore has to be taught to deal with its own errors.
[Figure: predictions $\hat{x}_4, \hat{x}_5, \hat{x}_6$ generated from observations $x_1, x_2, x_3$; legend: $\hat{x}$ model prediction, $x$ observation or ground truth] Page 21

Improve Multi-step-ahead Forecasting
Cast traffic prediction as a sequence-to-sequence learning problem in an encoder-decoder framework*: learn to map $(x_1, x_2, x_3) \to (x_4, x_5, x_6)$ rather than just $(x_1, x_2, x_3) \to x_4$, back-propagating errors $\delta_4, \delta_5, \delta_6$ from multiple steps. During training the decoder is fed the ground truth, but the ground truth becomes unavailable in testing.
[Figure: encoder over $x_1, x_2, x_3$; decoder started with <GO> producing $\hat{x}_4, \hat{x}_5, \hat{x}_6$]
* Sutskever et al. Sequence to sequence learning with neural networks, NIPS 2014 Page 22
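Putting the pieces together, a test-time sketch of the encoder-decoder loop, reusing `dcgru_step` from the sketch above; all names (`X_hist`, `W_out`, `horizon`, the <GO>-as-zeros convention) are our assumptions. Note the decoder must consume its own predictions here, which is exactly where the training/testing mismatch arises.

```python
import numpy as np

def seq2seq_forecast(X_hist, P_fwd, enc_params, dec_params, W_out, horizon):
    """Encode T' past graph signals into a hidden state, then roll the
    decoder forward `horizon` steps, feeding each prediction back in.

    X_hist: (T', N, P) history; W_out: (H, P) output projection."""
    n_vertices, hidden_dim = X_hist.shape[1], W_out.shape[0]
    h = np.zeros((n_vertices, hidden_dim))
    for x_t in X_hist:                    # encoder: consume the history
        h = dcgru_step(x_t, h, P_fwd, enc_params)
    x = np.zeros(X_hist.shape[1:])        # <GO> symbol: an all-zero input
    preds = []
    for _ in range(horizon):              # decoder: emit T predictions
        h = dcgru_step(x, h, P_fwd, dec_params)
        x = h @ W_out                     # project hidden state to speeds
        preds.append(x)
    return np.stack(preds)                # (horizon, N, P)
```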

Improve Multi-step-ahead Forecasting
Improve multi-step-ahead forecasting with scheduled sampling*: at each decoder step during training, choose between the previous ground truth and the previous model prediction by flipping a (biased) coin.
[Figure: encoder-decoder where each decoder input is sampled between $x_i$ and $\hat{x}_i$]
* Bengio, Samy, et al. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS 2015 Page 23

Improve Multi-step-ahead Forecasting
Scheduled sampling* implements curriculum learning: it gradually enables the model to deal with its own errors. Early in training (easy), only the ground truth $x_t$ is fed to the decoder; as training proceeds, the probability of feeding the ground truth decays until only the model prediction $\hat{x}_t$ is fed (hard).
[Plot: probability of feeding the ground truth vs. number of training iterations, decaying from 1 toward 0]
* Bengio, Samy, et al. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS 2015 Page 24
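A sketch of the coin flip; the DCRNN paper uses the inverse-sigmoid decay $\epsilon_i = \tau / (\tau + \exp(i/\tau))$ from Bengio et al., where $\tau$ controls how fast the curriculum hardens. Function names are ours.

```python
import numpy as np

def ground_truth_probability(i, tau):
    """Inverse-sigmoid decay: close to 1 early in training, -> 0 as i grows."""
    return tau / (tau + np.exp(i / tau))

def pick_decoder_input(truth, prediction, i, tau, rng=None):
    """Flip a biased coin each decoder step: feed the ground truth with
    probability eps_i (easy), otherwise the model's own prediction (hard)."""
    if rng is None:
        rng = np.random.default_rng()
    return truth if rng.random() < ground_truth_probability(i, tau) else prediction
```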

Diffusion Convolutional Recurrent Neural Network
Diffusion Convolutional Recurrent Neural Network (DCRNN):
Model spatial dependency with diffusion convolution.
Sequence-to-sequence learning with the encoder-decoder framework.
Improve multi-step-ahead forecasting with scheduled sampling.
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018 Page 25

Experiments: Datasets
METR-LA: 207 traffic sensors in Los Angeles; 4 months of data in 2012; 6.5M observations.
PEMS-BAY: 325 traffic sensors in the Bay Area; 6 months of data in 2017; 17M observations. Page 26

Experiments: Baselines
Historical Average (HA)
Auto-Regressive Integrated Moving Average (ARIMA)
Support Vector Regression (SVR)
Vector Auto-Regression (VAR)
Feed-forward Neural Network (FNN)
Fully-connected LSTM with the sequence-to-sequence framework (FC-LSTM)
Task: multi-step-ahead traffic speed forecasting. Page 27

Experimental Results
DCRNN achieves the best performance for all forecasting horizons on both datasets.
[Bar charts: Mean Absolute Error (MAE) at 15 min, 30 min, and 1 hour horizons on METR-LA and PEMS-BAY for HA, ARIMA, VAR, SVR, FNN, FC-LSTM, and DCRNN] Page 28

Effects of Spatiotemporal Dependency Modeling
DCRNN w/o temporal: removes sequence-to-sequence learning. DCRNN w/o spatial: removes the diffusion convolution. Removing either spatial or temporal modeling results in significantly worse performance.
[Bar chart: MAE on METR-LA at 15 min, 30 min, and 1 hour for DCRNN w/o Temporal, DCRNN w/o Spatial, and DCRNN] Page 29

Example: Prediction Results
DCRNN is more likely to accurately predict abrupt changes in traffic speed than the best baseline method.
[Plot: predicted vs. observed speed (mile/h) on METR-LA] Page 30

Example: Filter Visualization
Visualization of learned filters on METR-LA: the filters are localized around the center, and the weights diffuse alongside the road network.
[Figure: learned filters centered at different vertices; color scale Min-Max] Page 31

Summary
Proposed diffusion convolution to model the spatial dependency of traffic flow.
Proposed the Diffusion Convolutional Recurrent Neural Network (DCRNN), which captures both spatial and temporal dependencies.
DCRNN obtains consistent improvement over state-of-the-art baseline methods.
ICLR 2018. Code: https://github.com/liyaguang/dcrnn Paper: https://arxiv.org/abs/1707.01926
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018. Page 32

Thank You! Q & A Page 33