Social Interactions: A First-Person Perspective.
Transcription
1 Social Interactions: A First-Person Perspective. A. Fathi, J. Hodgins, J. Rehg. Presented by Jacob Menashe, November 16, 2012.
2-5 Social Interaction Detection. Objective: detect social interactions from video footage.
- Consider faces and attention
- Account for temporal context
- Analyze first-person movement cues
6 Outline: Introduction, Overview, Features, Temporal Context, Experiments.
7 Video Example. Color legend: Red = Dialogue, Yellow = Walking Dialogue, Green = Discussion, Light Blue = Walking Discussion, Dark Blue = Monologue, None = Background.
8-15 Features. Features are constructed from first- and third-person information:
1. Dense optical flow (first-person movement).
2. Face locations (relative to the first person).
3. Attention and roles. For each person x:
   - faces looking at x;
   - whether the first person looks at x;
   - mutual attention between x and the first person;
   - number of faces looking at where x is looking.
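The attention features in item 3 can be made concrete with a small geometric sketch. This is not the paper's implementation: it assumes each person is given as a 2D position and a unit gaze vector, puts the first person at the origin, and treats "looking at" as falling inside an assumed 30° gaze cone. All helper names (`sees`, `target_of`, `attention_features`) are hypothetical.

```python
import math

FOV_COS = math.cos(math.radians(30.0))  # assumed gaze-cone half-angle

def sees(gaze, src, dst):
    """True if dst lies within the gaze cone of an observer at src.
    gaze is assumed to be a unit vector."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    n = math.hypot(dx, dy)
    return n > 0 and (gaze[0] * dx + gaze[1] * dy) / n >= FOV_COS

def target_of(viewer, people):
    """The other person best aligned with viewer's gaze, or None."""
    best, best_cos = None, FOV_COS
    for p in people:
        if p is viewer:
            continue
        dx, dy = p["pos"][0] - viewer["pos"][0], p["pos"][1] - viewer["pos"][1]
        n = math.hypot(dx, dy)
        if n == 0:
            continue
        c = (viewer["gaze"][0] * dx + viewer["gaze"][1] * dy) / n
        if c >= best_cos:
            best, best_cos = p, c
    return best

def attention_features(x, people, first_gaze=(1.0, 0.0)):
    """Features 3a-3d for person x; the first person sits at the origin."""
    others = [p for p in people if p is not x]
    faces_at_x = sum(sees(p["gaze"], p["pos"], x["pos"]) for p in others)
    first_at_x = sees(first_gaze, (0.0, 0.0), x["pos"])
    mutual = first_at_x and sees(x["gaze"], x["pos"], (0.0, 0.0))
    tgt = target_of(x, people)  # whom x is looking at, if anyone
    shared = sum(target_of(p, people) is tgt for p in others) if tgt else 0
    return {"faces_at": faces_at_x, "first_looks": first_at_x,
            "mutual": mutual, "shared_attention": shared}
```

For example, with person A two meters ahead of the camera wearer and looking back at them, the sketch reports one face looking at A and mutual attention between A and the first person.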
16 Feature Example
17-21 Conditional Random Fields. CRFs are described in Lafferty et al. [2001]. Observations and labels form a Markov chain; each node depends on its neighbors. For the three-node chain y1 - y2 - y3 with observations x1, x2, x3, the local conditionals are:
- p(y1 | x1, y2)
- p(y2 | y1, y3, x2)
- p(y3 | y2, x3)
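For a chain this short, the CRF distribution p(y | x) can be evaluated by brute force, which makes the structure above concrete. The score functions below are toy assumptions for illustration, not learned potentials:

```python
import itertools
import math

LABELS = ("A", "B")

def unary(label, obs):
    # Toy emission score (an assumption): reward matching the observation.
    return 2.0 if label == obs else 0.0

def pairwise(a, b):
    # Toy transition score (an assumption): neighboring labels like to agree.
    return 1.0 if a == b else 0.0

def crf_prob(y, x):
    """p(y | x) for a linear-chain CRF, computing the partition
    function Z by enumeration (feasible only for tiny chains)."""
    def score(seq):
        s = sum(unary(seq[i], x[i]) for i in range(len(x)))
        return s + sum(pairwise(seq[i], seq[i + 1]) for i in range(len(x) - 1))
    Z = sum(math.exp(score(seq))
            for seq in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y)) / Z
```

With these potentials, the probabilities over all label sequences sum to 1, and the sequence that matches the observations is the most probable one.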
22-25 Hidden Conditional Random Fields. A micro view of the HCRF model as described in Quattoni et al. [2007]: Y is a label for the whole sequence; x_i is a single observation in the sequence; each of h_1, h_2, h_3 is a possible hidden state for x_i.
26-33 Hidden Conditional Random Fields (cont.). A macro view of the HCRF model as described in Quattoni et al. [2007]: Y is a label for the whole sequence; each x_i is a single observation in the sequence; each h_i is the hidden state label assigned to x_i. For the hidden chain h_1 - h_2 - h_3, the local conditionals are:
- p(h1 | Y, h2, x1)
- p(h2 | Y, h1, h3, x2)
- p(h3 | Y, h2, x3)
Marginalizing the hidden states gives p(Y | {h_i}) = p(Y | {x_i}).
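The marginalization over hidden states can also be sketched by brute force for a short chain. Everything below (the labels, states, and score functions) is an illustrative assumption, not the trained model from the paper:

```python
import itertools
import math

LABELS = ("WDlg", "WDisc")
STATES = (0, 1)

def f_obs(h, obs):
    # Hidden state / observation compatibility (toy assumption).
    return 1.0 if h == obs else 0.0

def f_pair(h1, h2):
    # Neighboring hidden states like to agree (toy assumption).
    return 1.0 if h1 == h2 else 0.0

def f_label(Y, h):
    # Sequence label / hidden state compatibility (toy assumption):
    # WDlg favors state 0, WDisc favors state 1.
    return 1.5 if (Y == "WDlg") == (h == 0) else 0.0

def hcrf_prob(Y, x):
    """p(Y | x) for an HCRF: the hidden chain h is summed out, so the
    sequence-level label depends only on the observations."""
    def mass(Yv):
        total = 0.0
        for h in itertools.product(STATES, repeat=len(x)):
            s = sum(f_obs(h[i], x[i]) for i in range(len(x)))
            s += sum(f_pair(h[i], h[i + 1]) for i in range(len(x) - 1))
            s += sum(f_label(Yv, hi) for hi in h)
            total += math.exp(s)
        return total
    return mass(Y) / sum(mass(Yv) for Yv in LABELS)
```

With observations that are all compatible with state 0, this toy model assigns WDlg the higher probability, and the two label probabilities sum to 1.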
34-45 HCRF Example. Suppose we want to find the likelihood of walking dialogue (WDlg) vs. walking discussion (WDisc). Each x_i is now a feature extracted from video frames, and each h_i is determined from training, e.g.:
- h1: John wants to hear about my weekend.
- h2: I'm feeling talkative.
- h3: Mary wants to listen to her iPod.
The conditionals p(h1 | Y, h2, x1), p(h2 | Y, h1, h3, x2), and p(h3 | Y, h2, x3) combine to give p(WDlg | {h_i}) = p(WDlg | {x_i}) and p(WDisc | {h_i}) = p(WDisc | {x_i}). If p(WDlg) > p(WDisc), assign Y = WDlg.
46 Outline: Introduction, Overview, Temporal Context (Conditional Random Fields, Hidden Conditional Random Fields, HCRF Example), Experiments.
47 Outline: Introduction, Overview, Temporal Context, Experiments (Experiment Outline, Experiment 1: Video Processing, Experiment 2: Caltech Dataset, Conclusion).
48-56 Experiment Outline. The following experiments are presented:
- Video processing
- Caltech image dataset
Adjusted parameters: iterations, hidden states, optimization function, clusters. Results are compared with a linear SVM baseline.
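For intuition about the baseline, a linear SVM can be trained with a simple Pegasos-style sub-gradient loop. The talk does not specify its baseline implementation, so this is only an illustrative sketch; the function names and hyperparameters are assumptions.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM.
    data: list of (feature_vector, label) pairs with label in {-1, +1};
    a bias can be modeled by appending a constant 1.0 feature."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for xvec, y in rng.sample(data, len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # standard Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, xvec))
            for i in range(dim):
                # Sub-gradient of the regularized hinge loss.
                hinge = y * xvec[i] if margin < 1.0 else 0.0
                w[i] -= eta * (lam * w[i] - hinge)
    return w

def predict(w, xvec):
    return 1 if sum(wi * xi for wi, xi in zip(w, xvec)) >= 0.0 else -1
```

On well-separated toy data the learned weights classify every training point correctly, which is the behavior a baseline of this kind is expected to show before moving to the harder video features.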
57-58 Experiment 1: Video Processing.
Mine: 40 training intervals; 40 testing intervals; Dialogue vs. Discussion; all features.
Theirs: 4,000 training intervals; [unspecified] testing intervals; one vs. all; features compared: location, first-person motion, attention, all features.
Scale: ~42 hours of video = 11,340 intervals; at the observed processing time per 20 intervals, labeling everything would take > 18 months.
59 Experiment 1: Video Processing (cont.). [ROC plot comparing my results against their results for HCRF dialogue-vs-discussion detection; axes: true positive rate vs. false positive rate.]
60-68 Experiment 2: Caltech Dataset. Experiment 2 focuses on the Caltech image dataset:
- Multi-class HCRF evaluated
- Classes are evaluated in isolation
- Temporal context is simulated with clustering
Initial parameters are based on Fathi et al. [2012]: hidden states: 5; window size: 5; max iterations: 100; optimizer: Broyden-Fletcher-Goldfarb-Shanno (BFGS).
69 Exp. 2a: Initial Settings. [ROC curves for airplanes/cars/faces/motorbikes HCRF and SVM (all); axes: true positive rate vs. false positive rate.] Processing: ~18 minutes, 1 MB.
70 Exp. 2a: Initial Settings (cont.). [Rank table (rank 1-4) for airplanes, motorbikes, faces, cars.]
71 Exp. 2b: Low Iterations. HCRF with 10 iterations. [Per-class ROC curves.] Processing: ~3 minutes, 1 MB.
72 Exp. 2c: Low Hidden States. HCRF with 1 hidden state. [Per-class ROC curves.] Processing: ~2 minutes, 1 MB.
73 Exp. 2d: CG Optimizer. HCRF with conjugate gradient optimizer. [Per-class ROC curves.] Processing: ~11 minutes, 1 MB.
74 Exp. 2e: Increased Iterations. HCRF with 1000 iterations. [Per-class ROC curves.] Processing: ~30 minutes, 1 MB.
75 Exp. 2f: Increased Hidden States. HCRF with 15 hidden states. [Per-class ROC curves.] Processing: ~1 hour, 3 GB.
76 Exp. 2g: Clustering + 15 Hidden States. HCRF with clustering and 15 hidden states. [Per-class ROC curves.] Processing: ~1 hour 10 minutes, 3 GB.
77 Exp. 2g: Clustering + 15 Hidden States (cont.). [Rank table (rank 1-4) for airplanes, motorbikes, faces, cars.]
78 Exp. 2h: Clustering + 20 Hidden States. HCRF with clustering and 20 hidden states. [Per-class ROC curves.] Processing: ~1 hour 40 minutes, 5 GB.
79 Exp. 2i: LDCRF with 20 Hidden States. LDCRF with clustering and 20 hidden states. [Per-class ROC curves.] Processing: ~5 hours 20 minutes, 5 GB.
80 Exp. 2j: CRF with Initial Parameters. CRF with clustering. [Per-class ROC curves.] Processing: ~21 seconds, 1 MB.
81 Exp. 2j: CRF with Initial Parameters (cont.). [Rank table (rank 1-4) for airplanes, motorbikes, faces, cars.]
82-84 Overall Results.
- SVM, CRF, and LDCRF perform best.
- CRF nearly outperforms all methods with negligible memory and processing requirements.
- Hidden states increase accuracy, but at significant memory cost.
85 Conclusion. HCRF is accurate but carries a heavy performance cost; it may still be the best choice for particular domains.
86 References I
Alireza Fathi, Jessica K. Hodgins, and James M. Rehg. Social interactions: A first-person perspective. In CVPR. IEEE, 2012.
John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01), pages 282-289. Morgan Kaufmann, San Francisco, CA, USA, 2001.
87 References II
Ariadna Quattoni, Sybor Wang, Louis-Philippe Morency, Michael Collins, and Trevor Darrell. Hidden conditional random fields. IEEE Trans. Pattern Anal. Mach. Intell., 29(10), October 2007.
Clinical Name Entity Recognition using Conditional Random Field with Augmented Features Dawei Geng (Intern at Philips Research China, Shanghai) Abstract. In this paper, We presents a Chinese medical term
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationSupport Vector Machine-Based Human Behavior Classification in Crowd through Projection and Star Skeletonization
Journal of Computer Science 6 (9): 1008-1013, 2010 ISSN 1549-3636 2010 Science Publications Support Vector Machine-Based Human Behavior Classification in Crowd through Projection and Star Skeletonization
More informationCAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha
More informationBias-Variance Trade-off + Other Models and Problems
CS 1699: Intro to Computer Vision Bias-Variance Trade-off + Other Models and Problems Prof. Adriana Kovashka University of Pittsburgh November 3, 2015 Outline Support Vector Machines (review + other uses)
More informationHidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi
Hidden Markov Models Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi Sequential Data Time-series: Stock market, weather, speech, video Ordered: Text, genes Sequential
More informationPlace Recognition using Near and Far Visual Information
Place Recognition using Near and Far Visual Information César Cadena John McDonald John J. Leonard José Neira Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza
More informationTheoretical Concepts of Machine Learning
Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5
More informationSpanning Tree Approximations for Conditional Random Fields
Patrick Pletscher Department of Computer Science ETH Zurich, Switzerland pletscher@inf.ethz.ch Cheng Soon Ong Department of Computer Science ETH Zurich, Switzerland chengsoon.ong@inf.ethz.ch Joachim M.
More informationLocated Hidden Random Fields: Learning Discriminative Parts for Object Detection
Located Hidden Random Fields: Learning Discriminative Parts for Object Detection Ashish Kapoor 1 and John Winn 2 1 MIT Media Laboratory, Cambridge, MA 02139, USA kapoor@media.mit.edu 2 Microsoft Research,
More informationPlace Recognition using Near and Far Visual Information
Place Recognition using Near and Far Visual Information César Cadena John McDonald John J. Leonard José Neira Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, Zaragoza
More informationLeveraging Textural Features for Recognizing Actions in Low Quality Videos
Leveraging Textural Features for Recognizing Actions in Low Quality Videos Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics Multimedia University,
More informationSpatio-Temporal Event Detection Using Dynamic Conditional Random Fields
Spatio-Temporal Event Detection Using Dynamic Conditional Random Fields Jie Yin Information Engineering Laboratory CSIRO ICT Centre Australia jie.yin@csiro.au Derek Hao Hu and Qiang Yang Department of
More information2 Signature-Based Retrieval of Scanned Documents Using Conditional Random Fields
2 Signature-Based Retrieval of Scanned Documents Using Conditional Random Fields Harish Srinivasan and Sargur Srihari Summary. In searching a large repository of scanned documents, a task of interest is
More informationTemporal Models for Robot Classification of Human Interruptibility
Temporal Models for Robot Classification of Human Interruptibility ABSTRACT Siddhartha Banerjee Georgia Institute of Technology 801 Atlantic Dr NW Atlanta, GA 30332, USA siddhartha.banerjee@gatech.edu
More informationLearning Spatial Context: Using Stuff to Find Things
Learning Spatial Context: Using Stuff to Find Things Wei-Cheng Su Motivation 2 Leverage contextual information to enhance detection Some context objects are non-rigid and are more naturally classified
More informationStrategies for simulating pedestrian navigation with multiple reinforcement learning agents
Strategies for simulating pedestrian navigation with multiple reinforcement learning agents Francisco Martinez-Gil, Miguel Lozano, Fernando Ferna ndez Presented by: Daniel Geschwender 9/29/2016 1 Overview
More informationMonte Carlo MCMC: Efficient Inference by Sampling Factors
University of Massachusetts Amherst From the SelectedWorks of Andrew McCallum 2012 Monte Carlo MCMC: Efficient Inference by Sampling Factors Sameer Singh Michael Wick Andrew McCallum, University of Massachusetts
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationLatent Variable Models for Structured Prediction and Content-Based Retrieval
Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio
More informationAccelerating Markov Random Field Inference Using Molecular Optical Gibbs Sampling Units
Accelerating Markov Random Field Inference Using Molecular Optical Gibbs Sampling Units Siyang Wang, Xiangyu Zhang, Yuxuan Li, Ramin Bashizade, Song Yang, Chris Dwyer, Alvin Lebeck Duke University Probabilistic
More informationMeasuring and Reducing Observational Latency when Recognizing Actions
Measuring and Reducing Observational Latency when Recognizing Actions Syed Z. Masood Chris Ellis Adarsh Nagaraja Marshall F. Tappen Joseph J. LaViola Jr. University of Central Florida Orlando, Florida
More informationEstimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees
Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,
More informationDynamic Bayesian network (DBN)
Readings: K&F: 18.1, 18.2, 18.3, 18.4 ynamic Bayesian Networks Beyond 10708 Graphical Models 10708 Carlos Guestrin Carnegie Mellon University ecember 1 st, 2006 1 ynamic Bayesian network (BN) HMM defined
More informationSTRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering
STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE Nan Hu Stanford University Electrical Engineering nanhu@stanford.edu ABSTRACT Learning 3-D scene structure from a single still
More informationGraph-based Techniques for Searching Large-Scale Noisy Multimedia Data
Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Shih-Fu Chang Department of Electrical Engineering Department of Computer Science Columbia University Joint work with Jun Wang (IBM),
More informationFirst-Person Animal Activity Recognition from Egocentric Videos
First-Person Animal Activity Recognition from Egocentric Videos Yumi Iwashita Asamichi Takamine Ryo Kurazume School of Information Science and Electrical Engineering Kyushu University, Fukuoka, Japan yumi@ieee.org
More informationPackage CRF. February 1, 2017
Version 0.3-14 Title Conditional Random Fields Package CRF February 1, 2017 Implements modeling and computational tools for conditional random fields (CRF) model as well as other probabilistic undirected
More informationHandwritten Word Recognition using Conditional Random Fields
Handwritten Word Recognition using Conditional Random Fields Shravya Shetty Harish Srinivasan Sargur Srihari Center of Excellence for Document Analysis and Recognition (CEDAR) Department of Computer Science
More informationLocal Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study
Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study J. Zhang 1 M. Marszałek 1 S. Lazebnik 2 C. Schmid 1 1 INRIA Rhône-Alpes, LEAR - GRAVIR Montbonnot, France
More informationComputer Vision. Exercise Session 10 Image Categorization
Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category
More informationCombining Top-down and Bottom-up Segmentation
Combining Top-down and Bottom-up Segmentation Authors: Eran Borenstein, Eitan Sharon, Shimon Ullman Presenter: Collin McCarthy Introduction Goal Separate object from background Problems Inaccuracies Top-down
More informationProbabilistic Structured-Output Perceptron: An Effective Probabilistic Method for Structured NLP Tasks
Probabilistic Structured-Output Perceptron: An Effective Probabilistic Method for Structured NLP Tasks Xu Sun MOE Key Laboratory of Computational Linguistics, Peking University Abstract Many structured
More informationBeyond Actions: Discriminative Models for Contextual Group Activities
Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University tla58@sfu.ca Weilong Yang School of Computing Science Simon Fraser University
More informationLearning Deep Structured Models for Semantic Segmentation. Guosheng Lin
Learning Deep Structured Models for Semantic Segmentation Guosheng Lin Semantic Segmentation Outline Exploring Context with Deep Structured Models Guosheng Lin, Chunhua Shen, Ian Reid, Anton van dan Hengel;
More informationProbabilistic Cascade Random Fields for Man-Made Structure Detection
Probabilistic Cascade Random Fields for Man-Made Structure Detection Songfeng Zheng Department of Mathematics Missouri State University, Springfield, MO 65897, USA SongfengZheng@MissouriState.edu Abstract.
More informationAutomatic Gait Recognition. - Karthik Sridharan
Automatic Gait Recognition - Karthik Sridharan Gait as a Biometric Gait A person s manner of walking Webster Definition It is a non-contact, unobtrusive, perceivable at a distance and hard to disguise
More informationIntroduction to Trajectory Clustering. By YONGLI ZHANG
Introduction to Trajectory Clustering By YONGLI ZHANG Outline 1. Problem Definition 2. Clustering Methods for Trajectory data 3. Model-based Trajectory Clustering 4. Applications 5. Conclusions 1 Problem
More informationExtracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Lin Liao Dieter Fox Henry Kautz Department of Computer Science & Engineering University of Washington Seattle,
More informationActivity Recognition from Physiological Data using Conditional Random Fields
Activity Recognition from Physiological Data using Conditional Random Fields Hai Leong Chieu 1, Wee Sun Lee 2, Leslie Pack Kaelbling 3 1 Singapore-IT Alliance, National University of Singapore 2 Department
More informationPARALLEL SELECTIVE SAMPLING USING RELEVANCE VECTOR MACHINE FOR IMBALANCE DATA M. Athitya Kumaraguru 1, Viji Vinod 2, N.
Volume 117 No. 20 2017, 873-879 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu PARALLEL SELECTIVE SAMPLING USING RELEVANCE VECTOR MACHINE FOR IMBALANCE
More informationLecture 18: Human Motion Recognition
Lecture 18: Human Motion Recognition Professor Fei Fei Li Stanford Vision Lab 1 What we will learn today? Introduction Motion classification using template matching Motion classification i using spatio
More informationRobustifying the Flock of Trackers
1-Feb-11 Matas, Vojir : Robustifying the Flock of Trackers 1 Robustifying the Flock of Trackers Tomáš Vojíř and Jiří Matas {matas, vojirtom}@cmp.felk.cvut.cz Department of Cybernetics, Faculty of Electrical
More information