Hierarchical Image-Region Labeling via Structured Learning


1 Hierarchical Image-Region Labeling via Structured Learning Julian McAuley, Teo de Campos, Gabriela Csurka, Florent Perronin XRCE September 14, 2009 McAuley et al (XRCE) Hierarchical Image-Region Labeling September 14, / 51

2 Structured Learning Most of my work has been concerned with applying machine learning to computer vision problems. In our BMVC work, we wanted to explore how recent advances in machine learning could be applied to the problem of image segmentation and classification. Structured Learning is an extension of SVMs for cases where the output is structured (as opposed to a binary output).

3 Structured Output Spaces Parse trees [Tsochantaridis et al 04]

4 Structured Output Spaces Graph matching

5 Structured Output Spaces Word alignment

6 Structured Output Spaces Ranking

7 Structured Output Spaces Matching

8 Structured Output Spaces Segmentation

9 Structured Learning
Our risk functional is very similar to that of SVMs:

(regularized) risk:
$$\underbrace{\frac{1}{N} \sum_{n=1}^{N} \Delta(g_w(X^n), y^n)}_{\text{empirical risk}} + \underbrace{\lambda \Omega(w)}_{\text{regularization term}}$$

- $N$: size of training set
- $y^n$: $n$th training instance
- $g_w(X^n)$: $w$-parametrised classification function
- $\Delta(y, y^n)$: loss function
- $\Omega(w)$: regularisation function
- $\lambda$: regularisation hyper-parameter
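
The risk functional above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy data, the 0/1 loss, and the choice $\Omega(w) = \frac{1}{2}\|w\|^2$ are assumptions and are not taken from the slides.

```python
import numpy as np

def regularized_risk(w, X, y, predict, loss, lam):
    """Mean loss of predict(w, x_n) against y_n, plus lam * Omega(w)."""
    empirical = np.mean([loss(predict(w, x_n), y_n) for x_n, y_n in zip(X, y)])
    omega = 0.5 * float(np.dot(w, w))  # assumed regulariser: ||w||^2 / 2
    return float(empirical) + lam * omega

# Toy binary problem (hypothetical): linear classifier with 0/1 loss.
def predict_sign(w, x):
    return 1 if np.dot(w, x) >= 0 else -1

def zero_one(y_pred, y_true):
    return 0.0 if y_pred == y_true else 1.0

X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = [1, 1, -1, 1]          # the last instance is misclassified by w below
w = np.array([1.0, 1.0])
risk = regularized_risk(w, X, y, predict_sign, zero_one, lam=0.1)
```

With one of four instances misclassified, the empirical risk is 0.25 and the regulariser contributes 0.1, so `risk` is 0.35 for this toy setting.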

10 Structured Learning
Energy function:
$$y = g_w(X^n) = \arg\min_{y \in \mathcal{Y}} \langle \Phi(X^n, y), w \rangle$$
SVM energy function:
$$y = g_w(X^n) = \arg\min_{y \in \{-1, 1\}} \langle y\, \phi(X^n), w \rangle$$
Obviously, we require that this minimisation problem is tractable.

11 Structured Learning
- For a binary classifier, minimising the risk is easy, as the problem is convex (in the separable case).
- For a structured classifier, minimising the risk is hard, as the risk is piecewise constant in $w$.

12 Structured Learning
Since we cannot minimize the risk, we minimize a convex upper bound on it.

Soft margin formulation:
$$\min_{w, \xi} \; \frac{1}{N} \sum_{n=1}^{N} \xi_n + \lambda \Omega(w)$$
subject to
$$\langle \Phi(X^n, y) - \Phi(X^n, y^n), w \rangle \geq \Delta(y, y^n) - \xi_n \quad \text{for all } n \text{ and } y \in \mathcal{Y}.$$
The problem is now convex, but the number of constraints is large.

13 Structured Learning
- Any subset of these constraints yields a convex upper bound on the correct solution.
- We want to choose some good subset of constraints.
- We choose those constraints which maximise $\xi_n$.

14 Structured Learning
These can be found by the following equation (column generation):
$$\hat{y}^n = \arg\min_{y} \left[ \langle \Phi(X^n, y), w \rangle - \Delta(y, y^n) \right]$$
Obviously, this problem must also be tractable.
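
As a rough illustration of the prediction and column-generation steps, the sketch below enumerates a tiny output space by brute force. Everything here is hypothetical: the label set, the feature map `joint_feature`, the Hamming loss, and the toy scores are assumptions chosen to keep the example small. A real structured model would replace the enumeration with an efficient inference routine.

```python
import itertools
import numpy as np

LABELS = (0, 1, 2)   # hypothetical label set
N_POS = 2            # hypothetical: each output labels two positions

def joint_feature(x, y):
    """Assumed Phi(x, y): negated scores accumulated per label (low energy = good fit)."""
    phi = np.zeros(len(LABELS))
    for pos, label in enumerate(y):
        phi[label] -= x[pos, label]
    return phi

def energy(w, x, y):
    return float(np.dot(joint_feature(x, y), w))

def hamming(y, y_true):
    return float(sum(a != b for a, b in zip(y, y_true)))

def predict(w, x):
    """arg min over all outputs of the energy <Phi(x, y), w>."""
    return min(itertools.product(LABELS, repeat=N_POS), key=lambda y: energy(w, x, y))

def most_violated(w, x, y_true):
    """arg min over all outputs of the energy minus the loss (column generation)."""
    return min(itertools.product(LABELS, repeat=N_POS),
               key=lambda y: energy(w, x, y) - hamming(y, y_true))

x = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]])   # toy per-position label scores
w = np.ones(len(LABELS))
y_true = (0, 2)

y_pred = predict(w, x)
y_hat = most_violated(w, x, y_true)
# Slack needed to satisfy the margin constraint for y_hat.
slack = max(0.0, hamming(y_hat, y_true) - (energy(w, x, y_hat) - energy(w, x, y_true)))
```

The most violated output differs from the plain prediction because the loss term rewards labelings that are far from the ground truth.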

15 Summary
- Our output space is structured.
- Our problem can be linearly parametrised.
- The energy minimisation problem can be efficiently solved.
- We can solve the column generation problem (usually due to a decomposable loss).

16 Part II: BMVC paper Hierarchical Image-Region Labeling via Structured Learning

17 Is this a bus or a car?

18 Is this a bus or a car?

19 A pyramidal representation allows us to combine features from multiple scales: (image stolen from [Leclerc et al])

20 Our approach We would like to learn how to combine information from different classifiers, which operate at different scales.

21 Our approach The output of our method should look something like this: We'd like to use the results from small scales to help large scales, and vice-versa.

22 Sometimes a classifier will be wrong, but in a predictable way

23 bus + car = minibus Sometimes a combination of outputs will yield new insights

24 A bad example

25 A bad example

26 A bad example

27 A bad example

28 How to enforce consistency?
- We'd like to encode the fact that the previous labeling is impossible.
- Can enforce neighbourhood constraints.
- Efficient algorithms exist for performing binary segmentation.

29 Grid structured graphical models?
- Few efficient or exact techniques for multi-label segmentation.
- Not clear how to combine data from different scales.
- Poorly suited to structured learning if exact inference is impossible.

30 If we form connections between overlapping regions, then our model is tree-structured; we can perform inference efficiently and exactly using belief propagation.

31 Energy minimisation problem
We will have two terms; a node compatibility term:
$$E(\text{region}, \text{label}) = \langle \Phi_{\text{nodes}}(\text{region}, \text{label}), \theta_{\text{nodes}} \rangle$$
and an edge compatibility term:
$$E(r_p, r_c, \text{label}_p, \text{label}_c) = \langle \Phi_{\text{edges}}(r_p, r_c, \text{label}_p, \text{label}_c), \theta_{\text{edges}} \rangle.$$
We choose the assignment that minimizes the total energy:
$$Y = \arg\min_{Y} \sum_{r \in \text{nodes}} E(r, y(r)) + \sum_{(r_p, r_c) \in \text{edges}} E(r_p, r_c, y(r_p), y(r_c)).$$
This is linear in $\Theta = (\theta_{\text{nodes}}; \theta_{\text{edges}})$.
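
Because the model is a tree, the total energy of node and edge terms can be minimised exactly by passing messages from the leaves to the root and then backtracking. The following is a minimal min-sum sketch under stated assumptions: the data layout (`children`, `node_E`, one shared `edge_E` table) and the toy energies are hypothetical and are not the implementation used in the paper.

```python
import numpy as np

def tree_min_energy(children, node_E, edge_E, root=0):
    """Exact min-sum inference on a rooted tree.

    children: dict node -> list of child nodes
    node_E:   dict node -> array (L,), energy of each label at that node
    edge_E:   array (L, L), energy edge_E[parent_label, child_label]
    Returns (minimum total energy, dict node -> label).
    """
    msg, best_child_label = {}, {}

    def upward(v):
        total = np.asarray(node_E[v], dtype=float).copy()
        for c in children.get(v, []):
            upward(c)
            table = edge_E + msg[c][None, :]   # rows: label of v, cols: label of c
            best_child_label[c] = table.argmin(axis=1)
            total += table.min(axis=1)
        msg[v] = total                         # best energy of v's subtree per label of v

    upward(root)
    labels = {root: int(msg[root].argmin())}

    def downward(v):
        for c in children.get(v, []):
            labels[c] = int(best_child_label[c][labels[v]])
            downward(c)

    downward(root)
    return float(msg[root].min()), labels

# Toy tree: region 0 is the parent of regions 1 and 2; two labels.
children = {0: [1, 2]}
node_E = {0: np.array([0.0, 1.0]), 1: np.array([1.5, 0.0]), 2: np.array([0.2, 0.6])}
edge_E = np.array([[0.0, 1.0], [1.0, 0.0]])   # assumed penalty for disagreeing labels
min_energy, labels = tree_min_energy(children, node_E, edge_E)
```

The cost is linear in the number of regions and quadratic in the number of labels, which is what makes exact inference practical inside a learning loop.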

32 First-order region features
Rather than computing features for each image region (e.g. SIFT), we run existing classifiers on each image region, and use their probabilistic outputs. We use classifiers from [Csurka and Perronin, BMVC 2008]:
- a classifier which is trained to classify entire images
- a classifier which is trained to classify bounding-boxes
- a classifier which is trained on image patches
- in principle, any combination of classifiers could be used

33 We use all classifiers at all scales:
$$\Phi_{\text{nodes}}(r, \text{label}) = \Big( \underbrace{(0, \ldots, P^{1}_{r,\text{label}}, \ldots, 0)}_{\text{feature for first classifier}}; \; \ldots; \; \underbrace{(0, \ldots, P^{n}_{r,\text{label}}, \ldots, 0)}_{\text{feature for } n\text{th classifier}} \Big)$$
Thus the model learns which classifiers are useful at which scales.
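
One way to picture this node feature is as one block per classifier, each block holding that classifier's probability for the chosen label in the slot of that label and zeros elsewhere. The sketch below assumes a `(n_classifiers, n_labels)` array of probabilities for a single region; the name `node_features` and the toy numbers are hypothetical, and any indexing by scale is left out.

```python
import numpy as np

def node_features(probs, label):
    """probs: array (n_classifiers, n_labels) of classifier outputs for one region.

    Returns the concatenation of one sparse block per classifier, where block k
    contains probs[k, label] at position `label` and zeros elsewhere.
    """
    n_classifiers, n_labels = probs.shape
    phi = np.zeros((n_classifiers, n_labels))
    phi[:, label] = probs[:, label]
    return phi.ravel()

probs = np.array([[0.7, 0.2, 0.1],    # toy output of a first classifier
                  [0.5, 0.3, 0.2]])   # toy output of a second classifier
phi = node_features(probs, label=1)
theta_nodes = np.ones(phi.size)       # hypothetical weights
node_energy = float(np.dot(phi, theta_nodes))
```

Since the inner product with the weight vector picks out one weight per classifier and label, the learned weights can rate each classifier separately for each label.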

34 Second-order region features
We want to ban the inconsistent assignments I mentioned earlier:
[Figure: DAG over the labels multiple, aeroplane, bicycle, train, tvmonitor, background]
The label of a child region must be at or below the label of its parent region on this DAG.

35 Consistency:
$$H(\text{label}_p, \text{label}_c) = \begin{cases} \infty & \text{if } \text{label}_p \prec \text{label}_c, \\ 0 & \text{if } \text{label}_p \succ \text{label}_c, \\ 1 & \text{otherwise } (\text{label}_p = \text{label}_c) \end{cases}$$
Regions with the same label should have similar features:
$$\Phi_{\text{edges}}(r_p, r_c; \text{label}_p, \text{label}_c) = H(\text{label}_p, \text{label}_c) \left( P_{r_p, \text{label}_p} - P_{r_c, \text{label}_c} \right)^2$$
Thus we learn the extent to which classifiers are consistent at different scales.
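
The consistency rule can be sketched as a small function. Several things here are assumptions rather than details confirmed by the slides: the hierarchy is taken to have three levels (`multiple` above every object class, `background` below every class), banned parent/child pairs are given infinite cost, two different object classes are treated as banned, and the feature is a squared difference of probabilities.

```python
import math

TOP, BOTTOM = "multiple", "background"   # assumed top and bottom of the label DAG

def below(a, b):
    """True if label a lies strictly below label b in the assumed three-level DAG."""
    if a == b:
        return False
    return b == TOP or a == BOTTOM

def H(label_p, label_c):
    """1 for equal labels, 0 if the child is strictly below the parent, inf if banned."""
    if label_p == label_c:
        return 1.0
    if below(label_c, label_p):
        return 0.0
    return math.inf   # assumption: inconsistent pairs get infinite cost

def edge_feature(label_p, label_c, prob_p, prob_c):
    """H times the squared difference of the two regions' probabilities (assumed form)."""
    h = H(label_p, label_c)
    if math.isinf(h):
        return math.inf   # avoid inf * 0 when the probabilities coincide
    return h * (prob_p - prob_c) ** 2

same = edge_feature("bus", "bus", 0.8, 0.5)        # equal labels: features compared
relaxed = edge_feature("multiple", "bus", 0.8, 0.5)  # child below parent: no penalty
banned = edge_feature("bus", "multiple", 0.8, 0.5)   # child above parent: banned
```

Under these assumptions only parent/child pairs with equal labels contribute a feature difference, so the learned edge weights measure how consistent the classifiers are across adjacent scales.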

36 Our training data We use bounding boxes from the VOC2007/2008 datasets.

37 Our loss function
We just use the Hamming loss:
$$\Delta(y, y^n) = \sum_{r \in \text{regions}} \left( 1 - I_{\{y^n(r)\}}(y(r)) \right).$$
This decomposes over the nodes in our model (up to an additive constant):
$$\sum_{r \in \text{regions}} \left[ E(r, y(r)) + I_{\{y^n(r)\}}(y(r)) \right].$$
This is a fairly reasonable type of loss, though it is worth mentioning that some losses do not decompose.
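
A decomposable loss matters because the column-generation objective (energy minus loss) can then be folded into the node energies, so the same exact inference routine finds the most violated constraint. The sketch below assumes an unnormalised Hamming loss (a count of mislabelled regions) and a hypothetical `(n_regions, n_labels)` table of node energies; edge terms are unaffected and are omitted.

```python
import numpy as np

def hamming_loss(y, y_true):
    """Number of regions whose label differs from the ground truth (assumed unnormalised)."""
    return float(np.sum(np.asarray(y) != np.asarray(y_true)))

def loss_augmented_node_energies(node_E, y_true):
    """Subtract 1 from the energy of every label that disagrees with the ground truth."""
    aug = node_E - 1.0                                  # new array, input left untouched
    aug[np.arange(len(y_true)), y_true] += 1.0          # correct labels keep their energy
    return aug

node_E = np.array([[0.2, 0.9], [0.4, 0.1], [0.7, 0.3]])   # toy node energies
y_true = np.array([0, 1, 1])
y = np.array([1, 1, 0])
rows = np.arange(len(y))

plain = float(node_E[rows, y].sum()) - hamming_loss(y, y_true)
augmented = float(loss_augmented_node_energies(node_E, y_true)[rows, y].sum())
```

Both quantities agree for every labeling, which is why minimising over the augmented node energies solves the column-generation problem.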

38 We're done!
- We've specified a linearly parametrised energy function.
- We've specified a (decomposable) loss function.
- We've specified an algorithm to find constraints.
We can now iteratively find the best value of $\Theta$ using an off-the-shelf solver such as SVM-struct [Joachims] or BMRM [Teo et al].

39 Our learned feature vector
[Figure: learned node weights (Level 0, Level 1, Level 2, Level 3) and edge weights (Level 0/1, Level 1/2, Level 2/3) for the image-level, mid-level and patch-level classifiers, each plotted over the labels background, aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor, multiple]

40 Some results Correct labeling (loss = 0)

41 Some results Csurka and Perronin, BMVC 2008 (loss = 0.434)

42 Some results Our method (loss = 0.230)

43 Some results

                Training   Validation   Testing
Csu. & Per.     (0.005)    (0.004)      (0.003)
Non-learning    (0.006)    (0.005)      (0.004)
Learning        (0.006)    (0.006)      (0.004)

44 Failure cases... Objects that co-occur with person are often classified as person

45 Stuff we could have done... Explore other possible tree-structures

46 Stuff we could have done... Explore other possible tree-structures

47 Stuff we could have done... Used a more complex hierarchy
[Figure: DAG over the labels multiple, aeroplane, bicycle, train, tvmonitor, person, hand, face, foot, background]

48 Stuff we could have done... Used segmented (rather than bounding-box) data

49 Stuff we could have done...
- Our approach is something between classification and segmentation.
- By using more computational resources, we could do better segmentation.
- By obtaining better compound classes, we could do better classification.

50 Conclusions
- We have a generic model for combining different classifiers at different scales.
- Learning yields a significant improvement, but limits us in our choice of loss function.
- Loads of possible directions for future work.

51 The end
