Crowd Scene Understanding with Coherent Recurrent Neural Networks


Hang Su, Yinpeng Dong, Jun Zhu

IJCAI 2016, May 22, 2016

Outline

1. Introduction
2. LSTM Recap
3. Coherent LSTM
4. Experimental Results

Background

A crowd scene is a public place where a large group of people has gathered, such as a university campus or the sidewalk of a busy street.

Groups are the main entities that make up a crowd.

When pedestrians walk in a crowded space, their trajectories are influenced by other people and by obstacles.

Applications

Understanding collective behaviors in crowd scenes has a wide range of applications, including video surveillance, crowd management, and avoiding tragic accidents.

Problem Formulation

Obtain reliable tracklets from each scene using KLT trackers. At any time instant $t$, the $i$-th person is represented by the coordinate $(x_i(t), y_i(t))$.

Predict the future trajectories of pedestrians, and use the extracted hidden features for other classification tasks.

[Figure: LSTM units coupled by coherent regularization, used for motion prediction.]
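The tracklet representation above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `tracklet_velocities`, the `(T, 2)` array layout, and the use of frame-to-frame differences as velocities are all assumptions made for the example.

```python
import numpy as np

def tracklet_velocities(track):
    """Finite-difference velocities for one tracklet (an assumed definition).

    track: array of shape (T, 2) holding (x_i(t), y_i(t)) for each frame.
    Returns an array of shape (T - 1, 2); row t is track[t + 1] - track[t].
    """
    track = np.asarray(track, dtype=float)
    if track.ndim != 2 or track.shape[1] != 2:
        raise ValueError("expected an array of shape (T, 2)")
    return np.diff(track, axis=0)

# Hypothetical tracklet: a pedestrian moving right and slightly up.
track_i = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5]])
vel_i = tracklet_velocities(track_i)
```

Storing each tracklet as one array keeps the later steps (velocity correlation, splitting into observed and predicted parts) simple slicing operations.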

Challenge

Crowd spatio-temporal patterns exhibit nonlinear dynamics: limit cycles, quasi-periodicity, and chaos.

Collective effect (or coherent motion): pedestrians tend to form groups, which have both intra-group and inter-group properties.

Previous Work

Traditional approaches, such as the Social Force model, optimize an energy function built from hand-crafted functions and are hard to generalize.

Probabilistic forecasting, such as Gaussian processes.

Recurrent neural networks: N-LSTM (CVPR 2016).


LSTM

Structure: input, output, and forget gates, plus a memory state $c_t$.

Advantages: prevents the vanishing gradient problem, captures nonlinear characteristics, and generalizes well.

LSTM

\[ i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \tag{1} \]
\[ f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \tag{2} \]
\[ c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{3} \]
\[ o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{4} \]
\[ h_t = o_t \odot \tanh(c_t) \tag{5} \]
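As a rough illustration of Eqs. (1)-(5), one LSTM step can be written directly in NumPy. The helper names `lstm_step` and `init_params`, the dictionary layout of the weights, the small random initialization, and the choice to store the memory-to-gate weights $W_{ci}$, $W_{cf}$, $W_{co}$ as full matrices are assumptions for this sketch; they are not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, seed=0):
    """Small random weights for illustration only (not a trained model)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":  # input gate, forget gate, cell candidate, output gate
        p[f"W_x{g}"] = 0.1 * rng.standard_normal((n_hid, n_in))
        p[f"W_h{g}"] = 0.1 * rng.standard_normal((n_hid, n_hid))
        p[f"b_{g}"] = np.zeros(n_hid)
    for g in "ifo":  # memory-to-gate weights W_ci, W_cf, W_co
        p[f"W_c{g}"] = 0.1 * rng.standard_normal((n_hid, n_hid))
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of Eqs. (1)-(5); '*' is the element-wise product."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["b_i"])
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] @ c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] @ c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Run the cell over a short hypothetical tracklet of (x, y) coordinates.
params = init_params(n_in=2, n_hid=4)
h, c = np.zeros(4), np.zeros(4)
for x_t in np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]):
    h, c = lstm_step(x_t, h, c, params)
```

Note that the output gate in Eq. (4) reads the updated memory $c_t$, while the input and forget gates read $c_{t-1}$; the code keeps that ordering.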


Why Coherent LSTM?

An LSTM can model individual behaviors but does not capture the interaction of people in a group.

Individuals are always willing to engage with seed groups and form spatially coherent structures.

When the neighboring relationship of individuals remains invariant over time and the correlation of their velocities remains high, they tend to have similar hidden states.

The trajectories of pedestrians not only follow their past trend but are also influenced by the current environment.

cLSTM Unit

\[ c_t = f_t \odot c_{t-1} + \sum_{j \in N} \lambda_j(t) \, f_t^j \odot c_{t-1}^j + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{6} \]

[Figure: cLSTM unit with forget, input, and output gates, a memory cell, and a coherent regularization term; inputs $x_t$ and $h_{t-1}$, output $h_t$.]
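The memory update of Eq. (6) differs from the plain LSTM only by the weighted sum over group members. A minimal sketch follows; the function name `coherent_cell_update`, the argument names, and the stacking of neighbor states into `(N, H)` arrays are assumptions for illustration, and the gate activations are taken as already computed.

```python
import numpy as np

def coherent_cell_update(f_t, c_prev, i_t, g_t, neighbor_f, neighbor_c_prev, lam):
    """Memory update of Eq. (6) for one tracklet.

    f_t, c_prev, i_t, g_t: arrays of shape (H,), where
        g_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c) is the cell candidate.
    neighbor_f, neighbor_c_prev: arrays of shape (N, H) holding f_t^j and
        c_{t-1}^j for the N tracklets in the same coherent group.
    lam: array of shape (N,) holding the dependency coefficients lambda_j(t).
    """
    own = f_t * c_prev
    # Weighted sum over neighbors j of lambda_j(t) * f_t^j * c_{t-1}^j.
    coherent = np.einsum("n,nh->h", lam, neighbor_f * neighbor_c_prev)
    return own + coherent + i_t * g_t

# Hypothetical values with H = 2 and a single neighbor.
c_t = coherent_cell_update(
    f_t=np.array([0.5, 0.5]),
    c_prev=np.array([1.0, 2.0]),
    i_t=np.array([1.0, 1.0]),
    g_t=np.array([0.0, 0.0]),
    neighbor_f=np.array([[1.0, 1.0]]),
    neighbor_c_prev=np.array([[2.0, 4.0]]),
    lam=np.array([0.5]),
)
```

With all coefficients in `lam` set to zero, the function reduces to the standard update of Eq. (3), which is a convenient sanity check.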

Coherent Motion Modeling

Use coherent filtering [Zhou et al., 2012a] [Shao et al., 2014] to discover the coherent groups.

The dependency relationship between two tracklets within the same group is measured as

\[ \tau_j(t) = \frac{v_i(t) \cdot v_j(t)}{\lVert v_i(t) \rVert \, \lVert v_j(t) \rVert} \tag{7} \]

Dependency Coefficient

The dependency coefficient between the $i$-th and $j$-th tracklets in Eq. (6) is defined as

\[ \lambda_j(t) = \frac{1}{Z_i} \exp\left( \frac{\tau_j(t) - 1}{2\sigma^2} \right) \in (0, 1], \tag{8} \]

where $Z_i$ is the normalization constant corresponding to the $i$-th tracklet.

$\lambda_j(t) \to 1$ if $v_i(t) \approx v_j(t)$, which implies that tracklets $i$ and $j$ are similar.

Coherent regularization encourages the tracklets to learn similar feature distributions by sharing information across tracklets within a coherent group.
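Eqs. (7) and (8) can be checked numerically with a short sketch. The function names, the `eps` guard against zero-length velocities, and the defaults `sigma=1.0` and `z_i=1.0` are assumptions made here; the slides do not say how $\sigma$ or $Z_i$ are chosen.

```python
import numpy as np

def velocity_correlation(v_i, v_j, eps=1e-12):
    """Eq. (7): cosine of the angle between two velocity vectors.

    eps is an assumed guard for stationary points, not part of the slides.
    """
    denom = np.linalg.norm(v_i) * np.linalg.norm(v_j)
    return float(np.dot(v_i, v_j) / max(denom, eps))

def dependency_coefficient(tau, sigma=1.0, z_i=1.0):
    """Eq. (8): lambda_j(t) = exp((tau_j(t) - 1) / (2 sigma^2)) / Z_i."""
    return float(np.exp((tau - 1.0) / (2.0 * sigma ** 2)) / z_i)

# Two tracklets moving in the same direction, and two moving in opposite directions.
tau_same = velocity_correlation(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
tau_opposite = velocity_correlation(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
lam_same = dependency_coefficient(tau_same)
lam_opposite = dependency_coefficient(tau_opposite)
```

Since $\tau_j(t) \le 1$, the exponent is never positive, so with $Z_i = 1$ the coefficient peaks at 1 for parallel velocities and decays as the directions diverge.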

Framework

Unsupervised encoder-decoder cLSTM framework:

\[ h_T = \mathrm{cLSTM}_e(x_T, h_{T-1}), \tag{9} \]
\[ \hat{x}_t = \mathrm{cLSTM}_{dr}(h_t, \hat{x}_{t+1}), \quad t \in [1, T], \tag{10} \]
\[ \hat{x}_t = \mathrm{cLSTM}_{dp}(h_t, \hat{x}_{t-1}), \quad t > T. \tag{11} \]

[Figure: the encoder (weights $W_e$) maps the inputs $x_1, x_2, x_3$ to the learnt hidden features $h_T$; the reconstruction decoder (weights $W_{rd}$) outputs $\hat{x}_3, \hat{x}_2, \hat{x}_1$, and the prediction decoder (weights $W_{pd}$) outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$. Coherent regularization is applied throughout.]
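The data flow of Eqs. (9)-(11) can be sketched as follows. To stay short, this uses plain tanh recurrent units as stand-ins for the cLSTM units, with random untrained weights; every name here (`encode`, `decode`, `make_decoder`), the toy sizes, and the choice to seed both decoders with the last observed point are assumptions for illustration rather than details from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 2  # hidden size and coordinate dimension (toy values)

# Stand-in encoder weights; a real model would use trained cLSTM units.
W_ex = 0.1 * rng.standard_normal((H, D))
W_eh = 0.1 * rng.standard_normal((H, H))

def make_decoder():
    """Builds a stand-in decoder step: (h, previous output) -> (output, new h)."""
    W_dx = 0.1 * rng.standard_normal((H, D))
    W_dh = 0.1 * rng.standard_normal((H, H))
    W_out = 0.1 * rng.standard_normal((D, H))

    def step(h, x_prev):
        h_new = np.tanh(W_dx @ x_prev + W_dh @ h)
        return W_out @ h_new, h_new

    return step

def encode(xs):
    """Eq. (9): fold the observed sequence into the hidden feature h_T."""
    h = np.zeros(H)
    for x_t in xs:
        h = np.tanh(W_ex @ x_t + W_eh @ h)
    return h

def decode(h_T, step, x_start, n_steps):
    """Runs a decoder from h_T, feeding each output back in as the next input."""
    h, x_hat, outputs = h_T, x_start, []
    for _ in range(n_steps):
        x_hat, h = step(h, x_hat)
        outputs.append(x_hat)
    return np.array(outputs)

xs = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])  # x_1..x_T with T = 3
h_T = encode(xs)
step_dr, step_dp = make_decoder(), make_decoder()
# Eq. (10): the reconstruction decoder emits x_T down to x_1, so reverse to time order.
recon = decode(h_T, step_dr, xs[-1], len(xs))[::-1]
# Eq. (11): the prediction decoder emits the estimates for t = 4, 5, 6.
pred = decode(h_T, step_dp, xs[-1], 3)
```

The point of the sketch is the wiring: one shared hidden feature `h_T` feeds two decoders, one running backward over the observed frames and one running forward past them.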

Crowd Scene Profiling

Solve critical tasks in crowd scene analysis:

Estimating the group state (Gas, Solid, Pure Fluid, or Impure Fluid) by softmax classification using the features learnt from the unsupervised cLSTM.

Crowd video classification.


Datasets and Settings

CUHK Crowd Dataset. Scenes: streets, shopping malls, airports, and parks. More than 400 sequences and more than 200,000 tracklets.

Settings: 128 hidden units in the cLSTM; 2/3 of the tracklets are used as the input and 1/3 as the predicted tracklets to evaluate the performance.

Future Path Forecasting

Table 1: Error of path prediction (pixels)

Kalman Filter: 9.32 ± …
Un-coherent LSTM: … ± …
Coherent LSTM: … ± 0.93
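One way to set up this evaluation is sketched below. It assumes that the 2/3 and 1/3 split is applied along the time axis of each tracklet and that the pixel error is the mean Euclidean distance between predicted and true positions; the slides state neither detail, and the function names are hypothetical.

```python
import numpy as np

def split_tracklet(track, observed_fraction=2.0 / 3.0):
    """Splits one (T, 2) tracklet into an observed part and a part to predict."""
    track = np.asarray(track, dtype=float)
    n_obs = int(round(len(track) * observed_fraction))
    return track[:n_obs], track[n_obs:]

def mean_displacement_error(pred, target):
    """Mean Euclidean distance in pixels between predicted and true points."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.linalg.norm(pred - target, axis=1)))

# Hypothetical 9-frame tracklet moving one pixel right per frame.
track = np.stack([np.arange(9.0), np.zeros(9)], axis=1)
observed, future = split_tracklet(track)
# A deliberately wrong prediction, offset by (3, 4) pixels at every frame.
error = mean_displacement_error(future + np.array([3.0, 4.0]), future)
```

Any predictor (Kalman filter, un-coherent LSTM, coherent LSTM) can be scored the same way by passing its output as `pred`.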

Group State Estimation

Gas: Particles move in different directions without forming collective behaviors.
Solid: Particles move in the same direction with relative positions unchanged.
Pure Fluid: Particles move towards the same direction with ever-changing relative positions.
Impure Fluid: Particles move in a pure fluid style with invasion of particles from other groups.

[Figure: example scenes of (a) Gas, (b) Solid, (c) Pure Fluid, and (d) Impure Fluid.]

Group State Estimation

Confusion matrices of estimating group states using different methods: (a) collective transition [Shao et al., 2014]; (b) prediction LSTM; (c) reconstruction LSTM; (d) un-coherent LSTM; and (e) coherent LSTM.

Crowd Video Classification

All video clips are annotated into 8 classes:

1) Highly mixed pedestrian walking;
2) Crowd walking following a mainstream and well organized;
3) Crowd walking following a mainstream but poorly organized;
4) Crowd merge;
5) Crowd split;
6) Crowd crossing in opposite directions;
7) Intervened escalator traffic; and
8) Smooth escalator traffic.

Thanks for your time! Questions?


Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Crowd Scene Understanding with Coherent Recurrent Neural Networks Hang Su, Yinpeng Dong, Jun Zhu, Haibin