Detecting Object Instances Without Discriminative Features

1 Detecting Object Instances Without Discriminative Features. Edward Hsiao, June 19, 2013. Thesis Committee: Martial Hebert (Chair), Alexei Efros, Takeo Kanade, Andrew Zisserman (University of Oxford)

2 Object Instance Detection: find this object under arbitrary viewpoint, lighting, clutter and occlusions

5 Robotic Manipulation

6 Scene Understanding

7 Scene Understanding: microwave, coffee maker, paper towel, faucet, refrigerator, stove, dishwasher

8 Visual Search

9 Recognition Using Discriminative Features [SIFT, Lowe 2004]: model and test image

10 Extract Keypoints [SIFT, Lowe 2004]: model and test image

11 Generate 1-to-1 Correspondences [SIFT, Lowe 2004]: model and test image

12 Enforce Geometric Constraints [SIFT, Lowe 2004]: model and test image

13 Recognized Object [SIFT, Lowe 2004]: model and test image

14 Failure of Feature Matching: model and test image, 0 correct correspondences

15-16 Overview. Lack of Discriminative Features: ambiguous keypoint features; feature-poor objects; occlusions

17 Ambiguous Keypoint Features

18 Repeated Patterns

19 Failure of Discriminative Matching: image keypoint descriptor, model descriptors (mdesc 1, mdesc 2, ...), geometric model

20 Failure of Discriminative Matching, or of one-to-one matching? (image keypoint descriptor; model descriptors mdesc 1, mdesc 2, ...; geometric model)

21 Failure of Discriminative Matching, or of one-to-one matching? Most approaches discard ambiguous features

22 Quantized Matching: image keypoint descriptor; quantized model descriptors (qdesc 1, qdesc 2, ...); geometric model

23 Quantized Matching: preserve the ambiguity of a match until geometric verification
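Here is a minimal Python sketch that makes the contrast between one-to-one and quantized matching concrete. It assumes Euclidean descriptor distances, Lowe's ratio threshold of 0.8, and a fixed list of quantization centers (for example, from k-means). The function names and toy descriptors are illustrative and are not taken from the thesis.

```python
import math


def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_match(desc, model_descs, ratio=0.8):
    """One-to-one matching with the ratio test: return the nearest model
    descriptor, or None when it is not clearly closer than the runner-up."""
    ranked = sorted(range(len(model_descs)), key=lambda i: dist(desc, model_descs[i]))
    best, second = ranked[0], ranked[1]
    if dist(desc, model_descs[best]) < ratio * dist(desc, model_descs[second]):
        return best
    return None  # ambiguous match is discarded


def quantized_match(desc, model_descs, centers):
    """Quantized matching: return every model descriptor that falls in the same
    quantization cell as the image descriptor, leaving the choice among them
    to geometric verification."""
    def cell(d):
        return min(range(len(centers)), key=lambda c: dist(d, centers[c]))

    word = cell(desc)
    return [i for i, m in enumerate(model_descs) if cell(m) == word]


# Two nearly identical model descriptors, as produced by a repeated pattern.
model = [(1.0, 0.0), (1.0, 0.05), (0.0, 1.0)]
centers = [(1.0, 0.0), (0.0, 1.0)]
query = (1.0, 0.025)
print(ratio_test_match(query, model))          # None: the ratio test rejects it
print(quantized_match(query, model, centers))  # [0, 1]: both candidates kept
```

On a repeated pattern, the ratio test discards exactly the keypoints that have several near-identical model matches. The quantized version keeps both hypotheses alive until geometry decides between them.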

24 Detection Performance on the CMU Grocery Dataset (620 images, 10 household objects): Average Precision (higher is better) of one-to-one matching [Collet et al. 2009] vs. quantized matching

25 Failure of Feature Matching: model and test image, 0 correct correspondences

26 Keypoint Comparison: success vs. failure

27-29 Uninformative Keypoints

30 Informative Keypoints: 980 keypoints vs. 10 keypoints; keypoints contained entirely within the object

31 Informative Keypoints: 980 keypoints vs. 10 keypoints; keypoints due to specularities
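One way to operationalize "keypoints contained entirely within the object" is to keep a keypoint only if its whole support region lies inside the object mask. The sketch below makes two assumptions for illustration: the support region is a circle whose radius equals the keypoint scale, and the mask is a binary grid given as nested lists. It does not filter out keypoints caused by specularities.

```python
import math


def support_inside_mask(mask, x, y, scale):
    """True if every pixel within `scale` of (x, y) lies inside the mask."""
    h, w = len(mask), len(mask[0])
    r = int(math.ceil(scale))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dx * dx + dy * dy > scale * scale:
                continue  # outside the circular support region
            px, py = x + dx, y + dy
            if not (0 <= px < w and 0 <= py < h) or not mask[py][px]:
                return False
    return True


def count_informative(mask, keypoints):
    """Count keypoints (x, y, scale) whose support lies entirely on the object."""
    return sum(support_inside_mask(mask, x, y, s) for x, y, s in keypoints)


# 10x10 image with the object occupying rows and columns 2..7.
mask = [[2 <= c <= 7 and 2 <= r <= 7 for c in range(10)] for r in range(10)]
keypoints = [(5, 5, 1.5), (2, 5, 1.5), (5, 5, 3.0)]
print(count_informative(mask, keypoints))  # 1
```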

32-35 Feature-richness: more keypoints vs. fewer keypoints

36-38 Feature Matching Experiment

39 Feature Matching Experiment: at least 5 good correspondences between all pairs of images
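The success criterion of this experiment fits in a single check. In the sketch below, `num_good` maps an image pair (a, b) with a < b to its number of good correspondences. How a correspondence is judged "good" is not specified here and is left to the caller. The dictionary layout and the function name are assumptions.

```python
from itertools import combinations


def matching_works(num_good, n_images, min_good=5):
    """Matching 'works' for an object only if every pair of its images has at
    least `min_good` good correspondences; a missing pair counts as zero."""
    return all(
        num_good.get((a, b), 0) >= min_good
        for a, b in combinations(range(n_images), 2)
    )


print(matching_works({(0, 1): 12, (0, 2): 7, (1, 2): 5}, 3))  # True
print(matching_works({(0, 1): 12, (0, 2): 7, (1, 2): 4}, 3))  # False
```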

40-42 Works vs. Fails: more keypoints vs. fewer keypoints

43-44 Feature-rich vs. Feature-poor: more keypoints vs. fewer keypoints

45 Overview. Lack of Discriminative Features: ambiguous keypoint features; feature-poor objects; occlusions

47 Feature-poor Objects, Shape Matching: template shape, input window, matched shape

48 Representing Feature-poor Objects: sparse edge points [Berg 2005], [Leordeanu 2007], [Duchenne 2009], [Hinterstoisser 2011]; lines and contour fragments [Ferrari 2006 & 2008], [Opelt 2006], [Srinivasan 2010]; Histogram of Oriented Gradients (HOG) [Dalal and Triggs 2005], [Lai 2011]

49 Sparse Edge Points: local information is gradient orientation and color

50-51 Sparse Edge Points: matched vs. not matched

52 Sparse Edge Points: matched vs. not matched; edge connectivity is lost

53 Lines & Contour Fragments

54-55 Lines & Contour Fragments: dependent on edge extraction; splines are sensitive to occlusions; line fitting is brittle; difficult to parameterize

56 Histogram of Oriented Gradients: coarse statistics of gradient orientation and magnitude

57-58 Histogram of Oriented Gradients: patch and HOG pairs; corrupted by background clutter; ambiguous shape

59 Gradient Networks, our approach: 1. match shape explicitly; 2. enforce connectivity without extracting edges

60-61 Gradient Networks Overview: shape template, input window

62 Gradient Networks, Local Shape Potential: how well does each pixel match locally?

63 Gradient Networks, Predicted Shape Match: find long connected components that follow the shape

64-66 Local Shape Potential: distance to template, local orientation, color, edge potential

67 Local Orientation Potential: model vs. test
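The slides list the cues that make up the local shape potential but not how they are combined. Purely as an illustration, the sketch below multiplies one term per cue, each in [0, 1]:

- a Gaussian falloff in the distance to the template;
- |cos| of the orientation difference;
- a color similarity supplied by the caller;
- a saturating edge-magnitude term.

The functional forms and the parameters `sigma_d` and `edge_scale` are hypothetical.

```python
import math


def local_shape_potential(dist_to_template, d_theta, color_sim, edge_mag,
                          sigma_d=2.0, edge_scale=1.0):
    """Hypothetical per-pixel potential built from the four listed cues.

    dist_to_template: distance (pixels) from the pixel to the template contour
    d_theta: difference (radians) between image and template orientation
    color_sim: colour similarity in [0, 1], computed elsewhere
    edge_mag: gradient magnitude at the pixel
    """
    distance_term = math.exp(-dist_to_template ** 2 / (2 * sigma_d ** 2))
    orientation_term = abs(math.cos(d_theta))  # ignores gradient polarity
    edge_term = 1.0 - math.exp(-edge_mag / edge_scale)
    return distance_term * orientation_term * color_sim * edge_term
```

With a product, any single failing cue drives the potential toward zero, for example an orientation perpendicular to the template. A weighted sum would be a softer alternative.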

68-69 Local Shape Potential: distance to template, local orientation, color, edge potential

70 Local Shape Potential

71 Gradient Networks: each pixel p is a node in the network

72 Gradient Networks: connect each node p to its neighbors q in the tangent direction (neighbor sets Q_0 and Q_1)
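Below is a minimal sketch of how a pixel can be linked to its neighbors along the edge tangent, that is, perpendicular to the gradient. It assumes an 8-connected grid, (x, y) pixel coordinates, and gradient angles in radians. It picks one neighbor on each side, whereas the slides' neighbor sets Q_0 and Q_1 may contain more pixels.

```python
import math

# 8-connected offsets (dx, dy), one per 45-degree direction starting at angle 0.
OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def tangent_neighbors(x, y, grad_angle):
    """Return the two 8-connected neighbours of (x, y) along the edge tangent,
    which is perpendicular to the gradient direction `grad_angle` (radians)."""
    tangent = grad_angle + math.pi / 2
    k = round(tangent / (math.pi / 4)) % 8  # snap to the nearest 45-degree step
    dx, dy = OFFSETS[k]
    return (x + dx, y + dy), (x - dx, y - dy)


def build_network(grad_angles):
    """Connect every pixel to its in-bounds tangent neighbours.
    grad_angles[y][x] is the gradient angle at pixel (x, y)."""
    h, w = len(grad_angles), len(grad_angles[0])
    edges = {}
    for y in range(h):
        for x in range(w):
            nbrs = tangent_neighbors(x, y, grad_angles[y][x])
            edges[(x, y)] = [(qx, qy) for qx, qy in nbrs if 0 <= qx < w and 0 <= qy < h]
    return edges
```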

73 Gradient Networks: find paths in the network that match the shape well

74 Message Passing [Bhat et al. 2010]: the shape similarity of p combines its local shape potential with the message from the left and the message from the right

75 Message Passing: initially, the shape similarity is just the local shape potential

76-78 Message Passing: local shape potential at p

79 Predicted Shape Match: local shape potential, message passing, predicted match
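The sketch below illustrates the idea on a single 1-D chain of pixels. It is not the exact update of Bhat et al. 2010. It makes three hypothetical choices:

- each pixel forwards its local potential plus the message it received, gated by its own potential;
- a poorly matching pixel therefore blocks propagation;
- the shape similarity is phi * (1 + left + right).

With zero initial messages this reduces to the local shape potential, as the slides state for the first step.

```python
def message_passing(phi, iterations):
    """Hypothetical message passing on a 1-D chain of pixels that follow the
    template shape. phi[i] in [0, 1] is the local shape potential of pixel i."""
    n = len(phi)
    left = [0.0] * n   # message arriving from the left neighbour
    right = [0.0] * n  # message arriving from the right neighbour
    for _ in range(iterations):
        new_left = [0.0] * n
        new_right = [0.0] * n
        for i in range(1, n):
            # a neighbour forwards its potential plus what it received,
            # so a pixel with phi close to 0 breaks the chain
            new_left[i] = phi[i - 1] * (1.0 + left[i - 1])
        for i in range(n - 2, -1, -1):
            new_right[i] = phi[i + 1] * (1.0 + right[i + 1])
        left, right = new_left, new_right
    return [phi[i] * (1.0 + left[i] + right[i]) for i in range(n)]


print(message_passing([1, 1, 1, 0, 1], 0))  # [1.0, 1.0, 1.0, 0.0, 1.0]
print(message_passing([1, 1, 1, 0, 1], 5))  # [3.0, 3.0, 3.0, 0.0, 1.0]
```

On a chain of n pixels, n - 1 iterations suffice for messages to cross it. After that, pixels on a long, well-matched run score higher than an isolated good pixel, which matches the goal of finding long connected components that follow the shape.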

80 CMU Kitchen Occlusion Dataset: 1600 images of 8 feature-poor objects; single and multiple viewpoints; cluttered scenes and occlusions (objects and example images)

81-83 Shape Matching Results: template, input window, local shape potential, predicted match

84-85 Object Detection: sliding window
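For completeness, here is a bare-bones single-scale sliding-window loop. `score_fn` is a hypothetical stand-in for whatever window scorer is used, such as a Gradient Networks score. The stride and threshold are simplifying assumptions, as is the absence of multi-scale search and non-maximum suppression.

```python
def sliding_window(image_w, image_h, win_w, win_h, stride, score_fn, threshold):
    """Score every window of size win_w x win_h placed on a regular grid and
    return (score, x, y) for windows at or above threshold, best first."""
    detections = []
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            score = score_fn(x, y)
            if score >= threshold:
                detections.append((score, x, y))
    detections.sort(reverse=True)
    return detections


# Toy scorer that peaks at the window whose top-left corner is (20, 12).
dets = sliding_window(64, 48, 16, 16, 4, lambda x, y: -abs(x - 20) - abs(y - 12), -4)
print(dets[0])  # (0, 20, 12)
```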

86 Detection Performance

87 False Positives with Shape Only: object, false positive window, GN point-wise confidences

88 Interior Appearance: object, false positive window, GN point-wise confidences

89-90 BaRT: Boundary and Region Templates

91 Boundary: explicit shape (rline2d and GN)

92 BaRT: Boundary and Region Templates

93 Region: consider appearance within the object interior (HOG and color)

94 BaRT: Boundary and Region Templates

95 BaRT combines explicit boundary and region information

96 HOG, Uniform Regions: uniform regions are not represented well

97 HOG Normalization: each cell is normalized with respect to the magnitude of its neighbors

98 HOG Normalization: amplifies noise if the magnitude is close to 0

99 Uniform Regions

100-102 Learning? HOG + SVM (multiple images): weight = 0; HOG + exemplar SVM (single image): weight = random

103 Modify HOG Normalization: HOG vs. modified HOG; set a cell to zero if its normalization is below a threshold
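Below is a minimal sketch of the modification. It normalizes each cell by the L2 energy of its 3x3 cell neighborhood, a simplification of the block normalization used by Dalal and Triggs. `tau` is an assumed, hand-picked threshold. Cells whose normalizer falls below `tau` are zeroed instead of having their noise amplified.

```python
import math


def normalize_cells(cells, tau=None, eps=1e-6):
    """cells[r][c] is an orientation histogram (list of floats). Each cell is
    divided by the L2 norm of the histograms in its 3x3 neighbourhood; if tau
    is given, cells whose normalizer is below tau are set to zero."""
    rows, cols = len(cells), len(cells[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            energy = 0.0
            for rr in range(max(0, r - 1), min(rows, r + 2)):
                for cc in range(max(0, c - 1), min(cols, c + 2)):
                    energy += sum(v * v for v in cells[rr][cc])
            norm = math.sqrt(energy + eps)
            if tau is not None and norm < tau:
                row.append([0.0] * len(cells[r][c]))  # uniform region: no signal
            else:
                row.append([v / norm for v in cells[r][c]])
        out.append(row)
    return out


# A nearly uniform region: tiny gradients everywhere.
flat = [[[0.001, 0.0] for _ in range(3)] for _ in range(3)]
print(normalize_cells(flat)[1][1])            # noise blown up to about 0.32
print(normalize_cells(flat, tau=0.01)[1][1])  # [0.0, 0.0]
```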

104-105 Matching Uniform Regions: HOG vs. ours on a test image

106 Matching Uniform Regions: HOG vs. ours; more accurate confidences in uniform regions

107-109 Example Detections: detection, zoomed in, boundary (GN), region (HOG+color)

110-111 Detection Performance

112-113 Detection Performance Under Different Occlusion Levels

114 Overview. Lack of Discriminative Features: ambiguous keypoint features; feature-poor objects; occlusions

115-116 Occlusions

117-120 Occlusions happen in 3D

121-124 Occlusion Reasoning: matched vs. not matched; which of these hypotheses is most likely?

125 Occlusion Reasoning: local coherency: Fransens 06, Wang 09; object detection, depth ordering: Wu 05, Wang 11; learn occlusion structure: Gao 11, Kwak

126 Structure of Occlusions, Occlusion Conditional Likelihood: the probability that a point is visible given the visibility labeling of all other points, under a given camera viewpoint $c$; $V_i$ is a binary variable that equals 1 if point $X_i$ is visible (matched vs. not matched)

127 Occlusion Reasoning per Environment: estimate of object dimensions ($H_{obj}$, $L_{obj}$, $W_{obj}$); distribution of object dimensions for a given environment

128 Occlusion Model

129 Occlusion Model: object and occluder

130-131 Occlusion Model: object ($\hat{W}_{obj}$, $\hat{H}_{obj}$) and occluder ($\hat{w}$, $\hat{h}$)

132 Occlusion Conditional Likelihood via integral geometry: points $X_i$ and $X_j$; areas $A^{c}_{V_j,O}$ and $A^{c}_{V_i,V_j,O}$

133-135 Occlusion Conditional Likelihood: $A^{c}_{V_j,O}$ is the area covering all positions where $X_j$ is visible and the object is occluded

136 Occlusion Conditional Likelihood: $A^{c}_{V_i,V_j,O}$ is the area covering all positions where $X_i$ and $X_j$ are visible and the object is occluded
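If occluder positions are uniformly distributed, the two areas give the conditional likelihood as a ratio:

$P(V_i = 1 \mid V_j = 1, O, c) = A^{c}_{V_i,V_j,O} / A^{c}_{V_j,O}$

The sketch below estimates that ratio in a deliberately simplified setting, which is an assumption of this sketch rather than the thesis model:

- the object's projected face spans [0, obj_w] horizontally;
- a single occluder of width occ_w and height occ_h rests on the same support surface (bottom at y = 0);
- only the occluder's horizontal position varies, so the "areas" become lengths, integrated with the midpoint rule.

Averaging over occluder dimensions drawn from an environment's size distribution is left out.

```python
def occluded(point, x0, occ_w, occ_h):
    """True if an occluder with left edge x0 covers the point (x, y)."""
    px, py = point
    return x0 <= px <= x0 + occ_w and py <= occ_h


def occlusion_conditional_likelihood(xi, xj, obj_w, occ_w, occ_h, steps=10000):
    """Estimate P(X_i visible | X_j visible, object occluded) by sweeping the
    occluder over every left-edge position where it overlaps [0, obj_w]."""
    lo, hi = -occ_w, obj_w
    dx = (hi - lo) / steps
    area_j = 0.0   # positions where X_j is visible and the object is occluded
    area_ij = 0.0  # positions where X_i and X_j are both visible
    for k in range(steps):
        x0 = lo + (k + 0.5) * dx
        if not occluded(xj, x0, occ_w, occ_h):
            area_j += dx
            if not occluded(xi, x0, occ_w, occ_h):
                area_ij += dx
    return area_ij / area_j if area_j > 0 else 0.0


# X_j sits above the occluder height, X_i near the support surface.
print(occlusion_conditional_likelihood((5, 1), (5, 8), obj_w=10, occ_w=2, occ_h=5))  # about 10/12
```

Two low points in the same column are always occluded together, so seeing one visible makes the other certainly visible (likelihood 1). A point above the occluder height carries no information about a low one.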

137 Occlusion Conditional Likelihood ($X_j$)

138-139 Occlusion Conditional Likelihood Under Different Viewpoints

140 Occlusion Conditional Likelihood Penalty (OCLP): for a point $X_i$ (matched vs. not matched), $f_{OCLP}$ gives a high penalty if $X_i$ is unlikely to be occluded by a valid object on the same support surface

141-142 Occlusion Conditional Likelihood Penalty (OCLP): $f_{OCLP}$ gives a low penalty if $X_i$ is likely to be occluded by a valid object on the same support surface

143 Example Detections

144 Detection Performance

145 Detection Performance Under Different Occlusion Levels

146 Limitation: binary matching pattern; occlusion conditional likelihood

147 Limitation: with a binary matching pattern, misclassifications can distort the occlusion conditional likelihood distribution

148 Occlusion Efficient Subwindow Search (OESS): probabilistic matching pattern

149 OESS for a True Positive: the occlusion can be explained well

150 OESS for a True Positive: 95% explained

151 OESS for a False Positive

152 OESS for a False Positive: only 50% explained

153 OESS Scoring: matching pattern with matched points (p = 1) and unmatched points (p = 0); score = (1) + (1) + (-1) + (-1) = 0

154 OESS Scoring: an unmatched point (p = 0) inside the occluding block is rewarded (+1 instead of -1); score = (1) + (1) + (1) + (-1) = 2

155 OESS Scoring: a matched point (p = 1) inside the occluding block is penalized (-1 instead of +1); score = (-1) + (1) + (1) + (-1) = 0
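The three worked examples are consistent with a simple per-point rule. Outside the occluding block, a point contributes +1 if matched and -1 if not; inside the block, the signs flip. Writing these as 2p - 1 and 1 - 2p also covers a probabilistic matching pattern with p between 0 and 1. That extension, the box convention (x_min, y_min, x_max, y_max), and the coordinates below are assumptions of this sketch.

```python
def oess_score(points, block=None):
    """points: list of (x, y, p), p = probability that the point matched.
    Outside the occluding block a point contributes 2p - 1; inside it the
    contribution flips to 1 - 2p (unmatched points are explained by the
    occluder, matched points are penalized)."""
    score = 0.0
    for x, y, p in points:
        inside = (block is not None
                  and block[0] <= x <= block[2] and block[1] <= y <= block[3])
        score += (1 - 2 * p) if inside else (2 * p - 1)
    return score


# Two matched points on the top row, two unmatched points below them.
pts = [(0, 0, 1), (1, 0, 1), (0, 1, 0), (1, 1, 0)]
print(oess_score(pts))                # 0.0  (no occluding block)
print(oess_score(pts, (0, 1, 0, 1)))  # 2.0  (block covers one unmatched point)
print(oess_score(pts, (0, 0, 0, 1)))  # 0.0  (block also covers a matched point)
```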

156 OESS: reformulate as Efficient Subwindow Search (ESS)

157 OESS: find the best occluder object

158 OESS: remove all explained points

159-161 OESS: iterate

162 OESS: final prediction
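The loop on these slides can be sketched as follows: find the best occluder, remove the points it explains, and iterate. Adding a block changes the score by 2(1 - 2p) for each point it covers, following the per-point rule of the scoring slides. For brevity, the block search is exhaustive over corners taken from the point coordinates. This is a brute-force stand-in for Efficient Subwindow Search, not ESS's branch-and-bound, and the stopping threshold `min_gain` is an assumption.

```python
def block_gain(points, block):
    """Score change from adding an occluding block: every covered point flips
    from 2p - 1 to 1 - 2p, a change of 2 * (1 - 2p)."""
    x0, y0, x1, y1 = block
    return sum(2 * (1 - 2 * p) for x, y, p in points
               if x0 <= x <= x1 and y0 <= y <= y1)


def best_block(points):
    """Exhaustive search over blocks whose corners come from the point
    coordinates (a brute-force stand-in for ESS)."""
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    best, best_gain = None, 0.0
    for i, x0 in enumerate(xs):
        for x1 in xs[i:]:
            for j, y0 in enumerate(ys):
                for y1 in ys[j:]:
                    gain = block_gain(points, (x0, y0, x1, y1))
                    if gain > best_gain:
                        best, best_gain = (x0, y0, x1, y1), gain
    return best, best_gain


def oess_greedy(points, min_gain=0.0):
    """Find the best occluding block, remove the points it explains, repeat
    until no block improves the score by more than min_gain."""
    remaining = list(points)
    blocks = []
    while remaining:
        block, gain = best_block(remaining)
        if block is None or gain <= min_gain:
            break
        blocks.append(block)
        x0, y0, x1, y1 = block
        remaining = [(x, y, p) for x, y, p in remaining
                     if not (x0 <= x <= x1 and y0 <= y <= y1)]
    return blocks


# 4x2 grid: columns 0 and 3 unmatched, columns 1 and 2 matched.
pts = [(x, y, 0 if x in (0, 3) else 1) for x in range(4) for y in range(2)]
print(oess_greedy(pts))  # [(0, 0, 0, 1), (3, 0, 3, 1)]
```

As written, any single unmatched point yields a positive gain. A practical version would therefore need a per-block cost or a larger `min_gain`.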

163-166 Results: detection window, boundary, region, oboxes (predicted vs. groundtruth)

167 Occlusion Prediction Performance: predicted vs. groundtruth, average Intersection over Union (IoU)

168 Occlusion Prediction Performance: predicted vs. groundtruth
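The average IoU between predicted and groundtruth regions can be computed as below. The sketch assumes axis-aligned boxes (x_min, y_min, x_max, y_max) that are already paired one-to-one. The slide does not specify how predictions are paired with groundtruth, or whether IoU is computed on boxes or on pixel masks.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_iou(predicted, groundtruth):
    """Mean IoU over already-paired predicted and groundtruth boxes."""
    return sum(box_iou(p, g) for p, g in zip(predicted, groundtruth)) / len(predicted)


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```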

169 Detection Performance

171 Summary. Lack of Discriminative Features: ambiguous keypoint features; feature-poor objects (Gradient Networks, Boundary and Region Templates); occlusions (Occlusion Conditional Likelihood, Occlusion Efficient Subwindow Search)

172 Main Contributions, Ambiguous Keypoint Features: making specific features less discriminative

173-175 Main Contributions, Representing Feature-poor Objects: Gradient Networks, explicit shape matching without extracting edges; Boundary and Region Templates, capturing explicit boundary and region information

176-178 Main Contributions, Occlusion Reasoning: Occlusion Conditional Likelihood, representing occlusion structure under arbitrary viewpoint; Occlusion Efficient Subwindow Search, directly searching for occluding blocks to explain the matching pattern

179 Acknowledgements: Martial Hebert, Alexei Efros, Takeo Kanade, Andrew Zisserman

182 Background

183-184 Augmented Reality: 3D model, target environment

185 Instance vs. Category Recognition. Instance: arbitrary viewpoint and lighting, single image per view. Category: intra-class variations, many images per view

186 Ambiguous Viewpoint

187 Failure of SIFT Matching

188 Invariant Approaches

189 Future Directions: fine-grained verification, scalability, 3D

190 Fine-grained Verification

191 Scalability

192 3D

193 Datasets

194 CMU Grocery Dataset: 620 images of household objects; 10 objects; 25 single instance, 25 double instance; 12 with ground truth pose; clutter, viewpoint, lighting, occlusion

195 CMU Kitchen Occlusion Dataset: 1600 images of 8 household objects; single and multiple viewpoints; cluttered scenes and occlusions (objects and example images). Hsiao and Hebert, CVPR

196 Gradient Networks

197 Local Shape Potential: region of influence, appearance, edge

198 Local Appearance: gradient orientation, color

199 Potentials: unary, pairwise

200 Message Passing: shape similarity

201 Probability Calibration (Scheirer et al. CVPR 2012): Weibull fit to the tail of the negative distribution; density and CDF of NOT object vs. probability of object, plotted over object scores
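Assume a Weibull distribution with shape k, scale λ, and location μ has already been fitted to the upper tail of the NOT-object scores (the fitting step is not shown). A raw detection score can then be mapped to a probability of object by evaluating that Weibull's CDF, as the plot labels on this slide suggest. The parameter values below are made up for illustration.

```python
import math


def weibull_cdf(s, shape, scale, loc=0.0):
    """CDF of a three-parameter Weibull distribution."""
    if s <= loc:
        return 0.0
    return 1.0 - math.exp(-(((s - loc) / scale) ** shape))


def probability_of_object(score, shape, scale, loc):
    """Calibrated probability that `score` comes from the object rather than
    from the tail of the NOT-object distribution."""
    return weibull_cdf(score, shape, scale, loc)


# Hypothetical fit to the tail of the negative scores.
k, lam, mu = 2.0, 0.5, 1.0
for s in (0.8, 1.2, 2.0):
    print(s, round(probability_of_object(s, k, lam, mu), 3))
```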

202 Soft Shape Model

203 Additional Results

204 Color Potential

205 LINE2D Similarity (Hinterstoisser et al., PAMI 2011): for each model point $p_i$, $\theta_i$ is the angle between the quantized gradient orientation of $p_i$ and the quantized gradient orientation of the best matching image point in a local neighborhood of $p_i$. Over $N$ model points, $\text{score}_{\text{LINE2D}} = \sum_{i=1}^{N} \cos(\theta_i)$, with $\cos(0^\circ) = 1.00$ and $\cos(45^\circ) = 0.71$.

206 Robust LINE2D Similarity, rline2d (Hsiao and Hebert, CVPR 2012): with $p_i$ and $\theta_i$ as above, $\text{score}_{\text{rline2D}} = \sum_{i=1}^{N} \delta(\theta_i = 0)$, so only exactly matching quantized orientations contribute.
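The two scores differ only in how a near-miss orientation is credited. The sketch makes three assumptions:

- 8 orientation bins spaced 45 degrees apart, consistent with the cos(45°) example;
- polarity-sensitive orientations;
- a simplified helper, `theta_deg`, which is hypothetical and finds θ_i from a neighborhood of quantized image orientations.

```python
import math

BIN_DEG = 45.0  # assumed spacing of the 8 quantized orientations


def theta_deg(model_bin, neighborhood_bins, n_bins=8):
    """Angle (degrees) between the model point's quantized orientation and the
    closest quantized orientation found in its local image neighbourhood."""
    steps = min(min((model_bin - b) % n_bins, (b - model_bin) % n_bins)
                for b in neighborhood_bins)
    return steps * BIN_DEG


def line2d_score(thetas):
    """score_LINE2D = sum_i cos(theta_i): partial credit for near misses."""
    return sum(math.cos(math.radians(t)) for t in thetas)


def rline2d_score(thetas):
    """score_rline2D = sum_i delta(theta_i = 0): only exact matches count."""
    return sum(1 for t in thetas if t == 0)


thetas = [theta_deg(0, [0, 4]), theta_deg(2, [3, 6]), theta_deg(5, [5])]
print(thetas)                          # [0.0, 45.0, 0.0]
print(round(line2d_score(thetas), 2))  # 2.71
print(rline2d_score(thetas))           # 2
```

Orientations that are off by one bin still add about 0.71 each to the LINE2D score but nothing to rline2d. This is one way to read why the robust variant is preferred in cluttered scenes.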

207 Message Passing Iterations

208 Probability Calibration

209 F-Measure of Shape Matching

210 Single View

211 Multiple View

212-213 Detection at 1.0 FPPI

214 False Positives

215 BaRT

216 Grid Optimization: un-optimized, 57 cells; optimized, 60 cells

217 HOG Normalization: amplifies noise in uniform regions

218 HOG Normalization: sensitive to shading effects

219 HOG Normalization: pedestrians

220 Average Precision

221 Single View

222 Multiple View

223 False Positives: match both boundary and region

224 BaRT False Positives: insufficient edge evidence; unlikely occlusion configuration. Region information is only informative once there is a plausible hypothesis based on the boundary

225 Occlusion Reasoning

226 Occlusion Model

227 Occlusion Scoring: object detector and occlusion model; score of a sliding window under a binary occlusion hypothesis

228 Occlusion Conditional Likelihood

229 Occlusion Conditional Likelihood Approximation: approximate vs. analytic

230 Distribution of Physical Dimensions: household objects

231 Occlusion Statistics

232 Validity of Occlusion Model

233 Occlusion Penalty: Occlusion Prior Penalty (OPP) vs. Occlusion Conditional Likelihood Penalty (OCLP)

234 Average Precision

235 Performance vs. Occlusion

236 Learning from Data

237 Parameter Sensitivity

238 OESS

239 Occlusion Upper Bound

240 OESS Algorithm

241 OESS vs. Brute Force

242 Occlusion Prediction

243 Object Detection Performance

244 Ambiguous Features

245 Problem: not enough correct matches; difficult to obtain matches; result of our system

246 Discriminative Hierarchical Matching (DHM): image features are discriminatively matched against model features (Level 0) and against quantized features (Levels 1 and 2); the matches are aggregated into candidate correspondences
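Below is one possible reading of this diagram as code. Each level is a set of features: level 0 holds the model features themselves, and levels 1 and 2 hold quantized features. Each quantized feature records which model features it represents. A ratio test is applied at every level, and every model feature reached through a passing match becomes a candidate correspondence. The level layout, the 0.8 ratio, and all names are assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discriminative_match(desc, features, ratio=0.8):
    """Index of the nearest feature if it passes the ratio test, else None."""
    if len(features) < 2:
        return 0 if features else None
    ranked = sorted(range(len(features)), key=lambda k: dist(desc, features[k]))
    if dist(desc, features[ranked[0]]) < ratio * dist(desc, features[ranked[1]]):
        return ranked[0]
    return None


def dhm_candidates(desc, levels, ratio=0.8):
    """levels: list of (features, members); members[k] lists the model feature
    indices represented by features[k]. Matches from all levels are aggregated."""
    candidates = set()
    for features, members in levels:
        k = discriminative_match(desc, features, ratio)
        if k is not None:
            candidates.update(members[k])
    return sorted(candidates)


model = [(1.0, 0.0), (1.0, 0.05), (0.0, 1.0)]
levels = [
    (model, [[0], [1], [2]]),                     # level 0: model features
    ([(1.0, 0.025), (0.0, 1.0)], [[0, 1], [2]]),  # level 1: quantized features
]
print(dhm_candidates((1.0, 0.025), levels))  # [0, 1]: found only at level 1
```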

247 DHM Example: all features

248 DHM Result: the ratio test gives 3 correct matches (soymilk can); DHM gives 11 correct matches (soymilk can)

249 Simulated Affine (SA), Morel & Yu 2009

250 Baseline Systems. Gordon & Lowe: SIFT + RANSAC, Levenberg-Marquardt non-linear optimization. Enhanced PnP (EPnP): Gordon & Lowe + EPnP non-iterative pose estimation algorithm. Collet et al.: Gordon & Lowe + mean-shift spatial clustering of image features

251 Averaged Precision-Recall

252 Average Precision

253 Object Detection Results

254 Failure Cases: pose ambiguity; repeated patterns; extreme lighting, occlusion, viewpoint, etc.
