Face Detection. Raghuraman Gopalan AT&T Labs-Research, Middletown NJ USA

1 Face Detection
Raghuraman Gopalan, AT&T Labs-Research, Middletown, NJ, USA
Book chapter collaborators: William Schwartz (University of Campinas, Brazil), Rama Chellappa and Ankur Srivastava (University of Maryland, College Park, USA)
Image courtesy: Google Images

2 Outline
- Categorization of face detection methods
- Detectors
  - Viola-Jones
  - Deep learning
  - Local interest points
- Role of context
  - Semantic info. of supporting human
- Computational issues
- Practical systems

3 Facial variations
- Pose
- Ageing
- Lighting
- Expressions
- Blur
- Occlusion

4 Overview
- Rosenfeld, Kelly, Kanade et al.
- Early 90s: morphology, knowledge-based, invariants
- Early 2000s: PCA, neural nets, SVM, boosting
- 2000-now: local features, context
- 2005-now: large-scale deployment
- Sliding windows

5 Sliding-window methodology
Basic idea: slide a window across the image and evaluate a face model at every location.
- Knowledge-based
- Feature invariants
- Template-based
- Appearance learning
Yang et al., face detection survey article, PAMI 2002; ICPR 2004 tutorial
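The basic idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a list of rows of grayscale values, the window is square, and `mean_brightness` is a toy stand-in for a real face model (the slides do not specify one). The names `sliding_windows`, `detect`, and `mean_brightness` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def sliding_windows(height: int, width: int, win: int, stride: int):
    """Yield the (top, left) corner of every win x win window that fits in the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left


def detect(image: Image, score: Callable[[Image], float], win: int, stride: int,
           threshold: float) -> List[Tuple[int, int, float]]:
    """Evaluate the model `score` on every window and keep those at or above `threshold`."""
    hits = []
    for top, left in sliding_windows(len(image), len(image[0]), win, stride):
        patch = [row[left:left + win] for row in image[top:top + win]]
        s = score(patch)
        if s >= threshold:
            hits.append((top, left, s))
    return hits


def mean_brightness(patch: Image) -> float:
    """Toy stand-in for a face model (an assumption): average pixel value of the patch."""
    return sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))


# 4x4 image with a bright 2x2 block in the lower-right corner.
image = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
detections = detect(image, mean_brightness, win=2, stride=1, threshold=9.0)
```

A practical detector would typically also repeat the scan over several window sizes or image scales; that loop is omitted here to keep the sketch short.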

6 Knowledge-based vs. feature-based
- Knowledge-based: top-down, human-coded rules (Yang and Huang 94; Kotropoulos and Pitas 94)
- Feature-based: bottom-up, feature invariants (Leung et al. 95; Yow and Cipolla 90)

7 Template matching
- Store a template
  - Predefined: edges or regions
  - Deformable: facial contours (e.g., snakes)
- Hand-coded templates (not learned)
- Use correlation to locate faces
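As a rough illustration of "use correlation to locate faces", the sketch below scores every image window against a fixed template with normalized cross-correlation. The choice of normalized (rather than raw) correlation, the tiny 2x2 template, and the function names `ncc` and `best_match` are assumptions made for this example, not details from the slides.

```python
import math
from typing import Sequence, Tuple

Image = Sequence[Sequence[float]]


def ncc(patch: Image, template: Image) -> float:
    """Normalized cross-correlation between two equally sized patches, in [-1, 1]."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # a flat patch carries no pattern


def best_match(image: Image, template: Image) -> Tuple[int, int, float]:
    """Return (top, left, score) of the window that correlates best with the template."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            patch = [row[left:left + tw] for row in image[top:top + th]]
            s = ncc(patch, template)
            if s > best[2]:
                best = (top, left, s)
    return best


# Hypothetical predefined template and an image containing a scaled copy of it.
template = [[1, 0],
            [0, 1]]
image = [[0, 0, 0, 0],
         [0, 5, 1, 0],
         [0, 1, 5, 0],
         [0, 0, 0, 0]]
top, left, score = best_match(image, template)
```

Normalizing by the patch and template energy makes the score insensitive to overall brightness and contrast, which is why the brighter copy in the image still matches the template.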

8 Appearance-based methods: classifiers
- Neural network, multilayer perceptrons
- Principal component analysis (PCA), factor analysis
- Support vector machine (SVM)
- Mixture of PCA, mixture of factor analyzers
- Naïve Bayes classifier
- Hidden Markov model
- Sparse network of winnows (SNoW)
- Kullback relative information
- Inductive learning: C4.5
- AdaBoost
- Deep learning

9 The Viola-Jones face detector
Key ideas:
- Integral images for fast feature evaluation
- Boosting for feature selection
- Attentional cascade for fast rejection of non-face windows
Viola and Jones, CVPR 2001; IJCV

10 Image features
- Haar filters
- Integral image, indexed at pixel (x, y)
$$\text{Value} = \sum (\text{pixels in white area}) - \sum (\text{pixels in black area})$$
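To make the integral-image idea concrete, here is a minimal sketch: once the cumulative table is built, the sum over any rectangle takes four lookups, so a Haar-like feature (white sum minus black sum) costs a handful of additions regardless of rectangle size. The zero-padded layout and the names `integral_image`, `rect_sum`, and `two_rect_feature` are choices made for this example.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top: int, left: int, height: int, width: int) -> float:
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top: int, left: int, height: int, width: int) -> float:
    """Horizontal two-rectangle feature: left half (white) minus right half (black)."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 4)            # whole image
inner = rect_sum(ii, 1, 1, 2, 2)            # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 3, 4)  # (1+2+5+6+9+10) - (3+4+7+8+11+12)
```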

11 Feature selection

12 Learning relevant features: boosting

13-18 Boosting - Principle
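The boosting principle can be sketched with discrete AdaBoost over threshold "stumps": each round picks the weak classifier with the lowest weighted error, then re-weights the training examples so that the mistakes count more in the next round. This is a generic textbook version under simplifying assumptions (exhaustive stump search, toy 1-D data); it is not claimed to be the exact variant used in the Viola-Jones detector, and all names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Weak classifier: output `polarity` if the feature is >= threshold, else -polarity."""
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Pick the stump with the lowest weighted error under the current weights w."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    """Return a list of (alpha, stump) pairs forming the strong classifier."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight; correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data: positives sit in the middle, so no single threshold separates them.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
predictions = [predict(model, x) for x in X]
```

On this toy set no single stump is error-free, yet the weighted vote of three stumps labels every point correctly, which is the effect the boosting slides illustrate.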

19 The problem of outliers
Ref: Freund and Schapire 97; Dietterich 00; Friedman, Hastie and Tibshirani 00; Freund 01

20 Attentional cascade
Chain classifiers that are progressively more complex and have lower false positive rates.
[Figure: receiver operating characteristic, % detection vs. % false positives]
[Diagram: IMAGE SUB-WINDOW -> Classifier 1 -(T)-> Classifier 2 -(T)-> Classifier 3 -(T)-> FACE; an F output at any classifier -> NON-FACE]
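The control flow of the cascade can be sketched as follows: a sub-window is passed from stage to stage and is rejected as NON-FACE the first time a stage outputs F, so most background windows are discarded after very little computation. The three stage functions and their thresholds below are purely hypothetical placeholders; in a trained cascade each stage would be a boosted classifier.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[Callable[[Sequence[float]], float], float]  # (scoring function, threshold)


def cascade_classify(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (is_face, number of stages evaluated); reject at the first failing stage."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, i   # F branch: NON-FACE, later stages are skipped
    return True, len(stages)  # T at every stage: FACE


# Hypothetical stages, ordered from cheap to more selective.
stages: List[Stage] = [
    (lambda w: max(w), 0.5),           # stage 1: is there any strong response?
    (lambda w: sum(w) / len(w), 0.4),  # stage 2: is the average response high?
    (lambda w: min(w[1:-1]), 0.3),     # stage 3: are all interior responses high?
]

background = [0.1, 0.2, 0.1, 0.0]  # rejected by stage 1
face_like = [0.6, 0.7, 0.8, 0.6]   # passes every stage
clutter = [0.9, 0.0, 0.9, 0.2]     # survives two stages, rejected by stage 3
```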

21 Output of face detector on test images

22 Profile detection

23 Profile features

24 Other detection tasks
- Facial feature localization
- Male vs. female

25 Summary: Viola-Jones detector
- Rectangle/Haar features
- Integral images for fast computation
- Feature selection through boosting
- Attentional cascade for fast rejection of negative windows

26 High-precision systems: deep learning*
Popular architectures:
- RBM (restricted Boltzmann machines)
- Auto-encoders
* deeplearning.net

27 Results with deep learning*
* Osadchy, LeCun, Miller; JMLR 07

28 Local interest points
Motivated by topic models

29 Local interest points: pipeline
- Extracting interest points from images
- Feature description
- Codebook
- Vector quantization
- Histogram representation
Slide credit (next 3 slides): ICCV 2009 tutorial on Recognizing and learning object categories
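The last two steps of this pipeline, vector quantization and the histogram representation, can be sketched as below. The sketch assumes the codebook already exists (in practice it is commonly learned by clustering descriptors, e.g. with k-means) and uses tiny 2-D vectors in place of real interest-point descriptors. The names `nearest_word` and `bow_histogram` are hypothetical.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Vector quantization: index of the codeword closest to the descriptor."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook) -> List[float]:
    """Normalized histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 3-word codebook and four descriptors extracted from one image.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.1, 0.9]]
hist = bow_histogram(descriptors, codebook)
```

The histogram discards where each interest point was found, which is the limitation that the later slides on spatial information address.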

30 Object: bag of words

31 What about spatial info?

32 Adding spatial info
- Feature level
- Generative models
- Discriminative models
References: Savarese, Winn and Criminisi, CVPR 2006; Sudderth, Torralba, Freeman & Willsky, 2005, 2006; Niebles & Fei-Fei, CVPR 2007; Lazebnik, Schmid & Ponce,

33 Interest points; sliding windows

34 Video-based face detection; part-based models
Fischler and Elschlager 73; Huttenlocher 05; Ramanan 07; Mikolajczyk 01

35 Context
- Objects do not occur in isolation
- The surrounding scene provides clues about the presence of objects
Image credit: Torralba 03

36 Types of context*
* Material from Divvala, Hoiem, Hays, Efros, and Hebert, Empirical study of context in object detection, CVPR

37 Face/human detection under partial occlusions*
* R. Gopalan, W. Schwartz, ACM PerMIS 2010; W. Schwartz, R. Gopalan, R. Chellappa, L.S. Davis, ICB

38 Illustration
- Face detection probability: 0.0054*
- Person detection probability: 0.324**
* Probability of presence of a face obtained from Moon et al., T-IP (2003)
** Probability of presence of a human obtained from Schwartz et al., ICCV (2009)

39 Related work
- Bilattice-based logical reasoning: Shet et al., CVPR (2007)
- Integrating probability of human parts using first-order logic (FOL): Schwartz et al., ICB (2009)

40 Our approach: Probabilistic logical inference
- Using Markov logic networks* (MLN)
- Represent the "semantic context" between the detection probabilities of parts
- Enforce consistency according to the spatial location of detectors, leading to removal of false alarms
- Exploit relations between persons to resolve inconsistencies and explain occlusions
* Domingos et al., Machine Learning (2006)

41 Our approach: An overview
- Multiple detection windows
- Part detectors' outputs
- Face detector outputs
- Learning contextual rules
- Instantiation of the MLN
- Inference, with queries:
  - face/person(d1)?
  - occluded(d1)?
  - occludedby(d1,d2)?
- Final result

43 Part-based detectors
To handle human detection under occlusion, our original detector is split into parts; an MLN is then used to integrate their outputs.
- Detectors: original, top, torso, legs, top-torso, torso-legs, top-legs
- Features: edges, texture, color
- Partial least squares (PLS)-based dimensionality reduction

45 Context: Consistency between the detector outputs
Detectors involved: top-torso, top, torso
First-order logic rules:
- toptorso(d1) ^ top(d1) ^ torso(d1) => person(d1) (consistent)
- toptorso(d1) ^ (¬top(d1) v ¬torso(d1)) => ¬person(d1) (false alarm)
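As a simplified illustration of these two rules, the sketch below applies them as hard rules to thresholded detector probabilities. It assumes the second rule reads "the top-torso detector fires but the top or the torso detector does not, so the window is a false alarm", and the 0.5 threshold is an arbitrary choice for the example. In the approach described on these slides the rules are weighted inside a Markov logic network rather than applied as hard tests; the function name `check_window` is hypothetical.

```python
def check_window(toptorso: float, top: float, torso: float, threshold: float = 0.5) -> str:
    """Apply the two consistency rules to one detection window.

    Inputs are detector probabilities; they are binarized with `threshold`
    (an assumption made for this sketch).
    """
    tt, t, to = (p >= threshold for p in (toptorso, top, torso))
    if tt and t and to:
        return "person"        # all three detectors agree: consistent
    if tt and (not t or not to):
        return "false alarm"   # combined detector fires without support from its parts
    return "undetermined"      # neither rule applies


consistent = check_window(0.9, 0.8, 0.7)
false_alarm = check_window(0.9, 0.2, 0.7)
no_rule = check_window(0.1, 0.9, 0.9)
```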

46 Context: Understanding the relationship between different windows
First-order logic rule, for two detection windows d1 and d2:
intersect(d1,d2) ^ person(d1) ^ matching(d1,d2) => person(d2) ^ occluded(d2) ^ occludedby(d2,d1)
matching(d1,d2) is true if:
- detectors at visible parts of d2 have high response
- d1 and d2 are persons
- d1 and d2 intersect
- detectors at occluded parts of d2 have low response, while detectors located at the corresponding positions of d1 have high response

48 Inference using MLN*: the basic idea
- A logical knowledge base (KB) is a set of hard constraints (F_i) on the set of possible worlds
- Let's make them soft constraints
- A Markov logic network (MLN) is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is the weight of F_i (a real number)
$$P(\text{world}) \propto \exp\Big(\sum \text{weights of formulas it satisfies}\Big)$$

49 Example: Humans & Occlusions
1) Presence of a human implies presence of parts.
2) When two humans occlude each other, analyze the matching context between their windows.

50-57 Example: Humans & Occlusions
- Formula 1: $\forall x\ \mathrm{Human}(x) \Rightarrow \mathrm{Parts}(x)$
- Formula 2: quantified over $x, y$, relating Occlusion(x, y), Human(x) and Human(y)
- Two constants: detection window 1 (D1) and detection window 2 (D2)
- One node for each grounding of each predicate in the MLN: Human(D1), Human(D2), Parts(D1), Parts(D2), Occlusion(D1,D1), Occlusion(D1,D2), Occlusion(D2,D1), Occlusion(D2,D2)
- One feature for each grounding of each formula F_i in the MLN, with the corresponding weight w_i

58 Instantiation
- An MLN is a template for ground Markov networks
- Probability of a world x:
$$P(x) = \frac{1}{Z} \exp\Big(\sum_i w_i\, n_i(x)\Big)$$
  where w_i is the weight of formula F_i, n_i(x) is the number of true groundings of formula F_i in x, and Z is the normalizing constant
- Learning of weights and inference are performed using the open-source Alchemy system [Domingos et al. (2006)]
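The world probability above can be checked by brute force on a tiny ground network. The sketch below uses only the formula Human(x) => Parts(x) with the two constants D1 and D2, enumerates all 16 truth assignments, and normalizes exp(w * n(x)). The weight value 1.5 is made up for illustration, and the enumeration is a teaching device, not how the Alchemy system performs inference.

```python
import itertools
import math
from typing import Dict, Tuple

ATOMS = ("Human(D1)", "Parts(D1)", "Human(D2)", "Parts(D2)")
W_PARTS = 1.5  # hypothetical weight for the formula Human(x) => Parts(x)


def n_true_groundings(world: Dict[str, bool]) -> int:
    """Number of groundings of Human(x) => Parts(x) that hold in this world."""
    return sum((not world[f"Human({d})"]) or world[f"Parts({d})"] for d in ("D1", "D2"))


def world_distribution(weight: float) -> Dict[Tuple[bool, ...], float]:
    """P(x) = exp(w * n(x)) / Z over all truth assignments to the ground atoms."""
    worlds = list(itertools.product([False, True], repeat=len(ATOMS)))
    unnorm = [math.exp(weight * n_true_groundings(dict(zip(ATOMS, v)))) for v in worlds]
    z = sum(unnorm)  # normalizing constant Z
    return {v: u / z for v, u in zip(worlds, unnorm)}


def conditional(dist, atom: str, given: Dict[str, bool]) -> float:
    """P(atom is true | given) by summing over the worlds consistent with `given`."""
    num = den = 0.0
    for values, p in dist.items():
        world = dict(zip(ATOMS, values))
        if all(world[a] == v for a, v in given.items()):
            den += p
            if world[atom]:
                num += p
    return num / den


dist = world_distribution(W_PARTS)
p_parts = conditional(dist, "Parts(D1)", {"Human(D1)": True})
```

With a single soft implication, the conditional reduces to the logistic function of the weight: a larger weight pushes P(Parts(D1) | Human(D1)) toward 1, and an infinite weight recovers the hard constraint.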

60 Results

61 Comparisons
Dataset details:
- 200 images
- 5 to 15 humans per image
- Occluded humans: ~35%

62-63 Practical systems
Slide credit: Microsoft Research, face.com

64 Computational efficiency
- Matching
  - Branch and bound
  - Dynamic programming
  - Tree/graph traversal
- Representation
  - Integral images (Viola and Jones)
  - Contour-based: efficient contour fitting

65-66 Computationally efficient representation of piece-wise linear contours
- Obtain edge image
- Line integral image
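The slides do not define the line integral image, so the following is only one plausible reading: cumulative sums of the binary edge image along a fixed direction, so that the number of edge pixels supporting any straight segment in that direction is obtained with two lookups. Only the horizontal direction is shown; other orientations would need their own cumulative tables. The names `horizontal_line_integral` and `segment_support` are hypothetical.

```python
from typing import List, Sequence


def horizontal_line_integral(edges: Sequence[Sequence[int]]) -> List[List[int]]:
    """H[y][x] = number of edge pixels in row y at columns < x."""
    table = []
    for row in edges:
        acc = [0]
        for v in row:
            acc.append(acc[-1] + v)
        table.append(acc)
    return table


def segment_support(H: List[List[int]], y: int, x0: int, x1: int) -> int:
    """Edge pixels on the horizontal segment from column x0 to x1 (inclusive) in row y."""
    return H[y][x1 + 1] - H[y][x0]


# Binary edge image (1 = edge pixel), e.g. the output of an edge detector.
edges = [[0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0],
         [1, 1, 0, 1, 1]]
H = horizontal_line_integral(edges)
top_support = segment_support(H, 0, 1, 3)     # fully supported segment
bottom_support = segment_support(H, 2, 0, 4)  # one gap in the middle
gap_support = segment_support(H, 2, 2, 2)     # the gap itself
```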

67 Datasets
FDDB: Face Detection Data Set and Benchmark (2010)

68 Challenges and conclusion
- Performance bounds
- Multi-modal context
- Devices
- Occlusion
