Visual Recognition: Examples of Graphical Models
1 Visual Recognition: Examples of Graphical Models. Raquel Urtasun, TTI Chicago, March 6, 2012.
2 Graphical models: applications, representation, inference, learning; message passing (LP relaxations); graph cuts.
3 Learning in graphical models
4 Parameter learning

The MAP problem was defined as

$$\max_{y_1,\ldots,y_n} \sum_i \theta_i(y_i) + \sum_\alpha \theta_\alpha(y_\alpha)$$

Learn parameters $w$ for more accurate prediction:

$$\max_{y_1,\ldots,y_n} \sum_i w_i^\top \phi_i(y_i) + \sum_\alpha w_\alpha^\top \phi_\alpha(y_\alpha)$$
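To make the factorized objective concrete, a minimal brute-force sketch in Python (the chain structure, binary labels, and all potential values are invented for illustration; real models use the message-passing, LP-relaxation, or graph-cut inference mentioned above):

```python
import itertools

# Unary potentials theta_i(y_i) for 3 binary variables (values are illustrative).
theta_unary = [
    {0: 0.2, 1: 0.8},
    {0: 0.5, 1: 0.1},
    {0: 0.3, 1: 0.9},
]
# Pairwise potentials theta_alpha(y_alpha) on the chain edges (0,1) and (1,2).
theta_pair = {
    (0, 1): {(0, 0): 1.0, (0, 1): -0.5, (1, 0): -0.5, (1, 1): 1.0},
    (1, 2): {(0, 0): 1.0, (0, 1): -0.5, (1, 0): -0.5, (1, 1): 1.0},
}

def score(y):
    """Evaluate sum_i theta_i(y_i) + sum_alpha theta_alpha(y_alpha)."""
    s = sum(theta_unary[i][y[i]] for i in range(len(y)))
    s += sum(pot[(y[i], y[j])] for (i, j), pot in theta_pair.items())
    return s

# Brute-force MAP: only feasible for tiny models; real systems use
# message passing, LP relaxations, or graph cuts instead.
y_map = max(itertools.product([0, 1], repeat=3), key=score)
print(y_map, score(y_map))
```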
5-6 Loss functions

Regularized loss minimization: given input pairs $(x, y) \in S$, minimize

$$\sum_{(x,y) \in S} \hat\ell(w, x, y) + \frac{C}{p}\|w\|_p^p$$

Different learning frameworks arise depending on the surrogate loss $\hat\ell(w, x, y)$: hinge loss for structural SVMs [Tsochantaridis et al. 05, Taskar et al. 04]; log-loss for conditional random fields [Lafferty et al. 01]; unified by [Hazan and Urtasun, 10].
7 Recall: SVM

In SVMs we minimize the following program:

$$\min_w \frac{1}{2}\|w\|^2 + \sum_i \xi_i \quad \text{subject to} \quad y_i (b + w^\top x_i) - 1 + \xi_i \ge 0, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, N$$

with $y_i \in \{-1, 1\}$ binary. We need to extend this to reason about more complex structures, not just binary variables.
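Before moving to structured outputs, a small numpy sketch of this binary hinge-loss SVM, trained by subgradient descent on the unconstrained form rather than by solving the QP (the toy data, regularization constant, step size, and iteration count are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy linearly separable data: two Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(+2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for epoch in range(200):
    margins = y * (X @ w + b)                 # y_i (w^T x_i + b)
    viol = margins < 1                        # points with xi_i > 0
    # Subgradient of 0.5*||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
    grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print("train accuracy:", np.mean(np.sign(X @ w + b) == y))
```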
8-12 Structural SVM [Tsochantaridis et al., 05]

We want to construct a function

$$f(x) = \arg\max_{y \in \mathcal{Y}} w^\top \phi(x, y)$$

which is parameterized in terms of $w$, the parameters to learn. We would like to minimize the empirical risk

$$R_S(f, w) = \frac{1}{n} \sum_{i=1}^n \Delta(y_i, f(x_i, w))$$

This is the expected loss under the induced empirical distribution. Here $\Delta(y_i, f(x_i, w))$ is the task loss, which depends on the application: for segmentation, the per-pixel segmentation error; for detection, intersection over union. Typically, $\Delta(y, y') = 0$ if $y = y'$.
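As one concrete task loss, a sketch of the detection loss $\Delta(y, y') = 1 - \mathrm{IoU}(y, y')$ for axis-aligned boxes; the $(x_1, y_1, x_2, y_2)$ box representation is an assumption made for this example:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def task_loss(y_true, y_pred):
    """Delta(y, y') = 1 - IoU: zero iff the boxes coincide, as required."""
    return 1.0 - iou(y_true, y_pred)

print(task_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
print(task_loss((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.857
```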
13-16 Separable case

We would like to minimize the empirical risk

$$R_S(f, w) = \frac{1}{n} \sum_{i=1}^n \Delta(y_i, f(x_i, w))$$

We will have zero training error if we satisfy

$$\max_{y \in \mathcal{Y} \setminus y_i} \{w^\top \phi(x_i, y)\} \le w^\top \phi(x_i, y_i)$$

since $\Delta(y_i, y_i) = 0$ and $\Delta(y_i, y) > 0$ for all $y \in \mathcal{Y} \setminus y_i$. This can be replaced by $|\mathcal{Y}| - 1$ inequalities per example:

$$\forall i \in \{1, \ldots, n\}, \; \forall y \in \mathcal{Y} \setminus y_i: \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge 0$$

What is the problem with this?
17-21 Separable case

Satisfying the inequalities might admit more than one solution; select the $w$ with the maximum margin. We can thus form the following optimization problem:

$$\min_w \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge 1, \quad \forall i \in \{1, \ldots, n\}, \; y \in \mathcal{Y} \setminus y_i$$

This is a quadratic program, so it is convex. But it involves exponentially many constraints!
22 Non-separable case

Multiple formulations: multi-class classification [Crammer & Singer, 03]; slack re-scaling [Tsochantaridis et al. 05]; margin re-scaling [Taskar et al. 04]. Let's look at them in more detail.
23 Multi-class classification [Crammer & Singer, 03]

Enforce a large margin and do a batch convex optimization. The minimization program is then

$$\min_w \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge 1 - \xi_i, \quad \forall i \in \{1, \ldots, n\}, \; y \ne y_i$$

It can also be written in terms of kernels.
24-26 Structured Output SVMs

Frame structured prediction as a multiclass problem: predict a single element of $\mathcal{Y}$ and pay a penalty for mistakes. But not all errors are created equal; e.g., in an HMM, making only one mistake in a sequence should be penalized less than making 50 mistakes. Pay a loss proportional to the difference between the true and predicted outputs (task dependent): $\Delta(y_i, y)$. [Source: M. Blaschko]
27-30 Slack re-scaling

Re-scale the slack variables according to the loss incurred in each of the linear constraints: violating a margin constraint involving a $y \ne y_i$ with high loss $\Delta(y_i, y)$ should be penalized more than a violation involving an output value with smaller loss. The minimization program is then

$$\min_w \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge 1 - \frac{\xi_i}{\Delta(y_i, y)}, \quad \forall i \in \{1, \ldots, n\}, \; y \in \mathcal{Y} \setminus y_i$$

The justification is that $\frac{1}{n}\sum_{i=1}^n \xi_i$ is an upper bound on the empirical risk. Easy to prove.
31 Margin re-scaling

In this case the minimization problem is formulated as

$$\min_w \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge \Delta(y_i, y) - \xi_i, \quad \forall i \in \{1, \ldots, n\}, \; y \in \mathcal{Y} \setminus y_i$$

The justification is that $\frac{1}{n}\sum_{i=1}^n \xi_i$ is an upper bound on the empirical risk. Also easy to prove.
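To see how the two formulations differ, a small sketch computing the minimal slack $\xi_i$ that one example needs under each constraint set, given the ground-truth score, the scores of the wrong outputs, and their losses (all numbers invented):

```python
def rescaled_slacks(score_true, scores, losses):
    """Minimal slack xi_i for one example, given w^T phi(x_i, y) for each
    wrong output y (scores), the matching losses Delta(y_i, y), and the
    ground-truth score w^T phi(x_i, y_i) (score_true)."""
    # Margin rescaling:  w^T phi(y_i) - w^T phi(y) >= Delta - xi
    xi_margin = max(0.0, max(l - (score_true - s) for s, l in zip(scores, losses)))
    # Slack rescaling:   w^T phi(y_i) - w^T phi(y) >= 1 - xi / Delta
    xi_slack = max(0.0, max(l * (1.0 - (score_true - s)) for s, l in zip(scores, losses)))
    return xi_margin, xi_slack

# Two wrong outputs: one nearby (loss 1) and one far away (loss 10).
print(rescaled_slacks(score_true=2.0, scores=[1.5, 0.5], losses=[1.0, 10.0]))  # (8.5, 0.5)
```

Note how margin rescaling lets a high-loss output dominate the slack even when its score is far below the ground truth, while slack rescaling discounts violations once the margin is already satisfied.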
33-36 Constraint Generation

To find the most violated constraint, we need to maximize w.r.t. $y$: for margin rescaling,

$$w^\top \phi(x_i, y) + \Delta(y_i, y)$$

and for slack rescaling,

$$\Delta(y_i, y)\left\{1 + w^\top \phi(x_i, y) - w^\top \phi(x_i, y_i)\right\}$$

For arbitrary output spaces, we would need to iterate over all elements in $\mathcal{Y}$; instead, use graph cuts or message passing. When the MAP cannot be computed exactly, but only approximately, this algorithm does not behave well [Finley and Joachims, 08].
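A sketch of the resulting constraint-generation (cutting-plane) loop for margin rescaling; `loss_augmented_map` and `solve_qp` are hypothetical callables standing in for the loss-augmented inference step (graph cuts or message passing) and the QP solver restricted to the active constraint set:

```python
def cutting_plane(examples, loss_augmented_map, solve_qp, n_rounds=20):
    """Sketch of constraint generation for margin rescaling.

    loss_augmented_map(w, x, y) must return
        argmax_yhat  w^T phi(x, yhat) + Delta(y, yhat)
    with yhat hashable, and solve_qp(examples, active) must return the
    optimal w for the QP restricted to the active constraint set.
    Both callables are placeholders, not a fixed API.
    """
    active = {i: set() for i in range(len(examples))}
    w = solve_qp(examples, active)
    for _ in range(n_rounds):
        added = False
        for i, (x, y) in enumerate(examples):
            y_hat = loss_augmented_map(w, x, y)   # most violated output
            if y_hat != y and y_hat not in active[i]:
                active[i].add(y_hat)
                added = True
        if not added:                             # no new violated constraint
            return w
        w = solve_qp(examples, active)            # re-fit on the working set
    return w
```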
37-39 One Slack Formulation

Margin rescaling:

$$\min_w \frac{1}{2}\|w\|^2 + \frac{C}{n}\xi \quad \text{s.t.} \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge \Delta(y_i, y) - \xi, \quad \forall i \in \{1, \ldots, n\}, \; y \in \mathcal{Y} \setminus y_i$$

Slack rescaling:

$$\min_w \frac{1}{2}\|w\|^2 + \frac{C}{n}\xi \quad \text{s.t.} \quad w^\top \phi(x_i, y_i) - w^\top \phi(x_i, y) \ge 1 - \frac{\xi}{\Delta(y_i, y)}, \quad \forall i \in \{1, \ldots, n\}, \; y \in \mathcal{Y} \setminus y_i$$

Same optima as the previous formulation [Joachims et al., 09].
40 Example: Handwritten Recognition

Predict text from an image of handwritten characters. Iterate: estimate the model parameters $w$ using the active constraint set; generate the next constraint. [Source: B. Taskar]
41-42 Conditional Random Fields

Regularized loss minimization: given input pairs $(x, y) \in S$, minimize

$$\sum_{(x,y) \in S} \hat\ell(w, x, y) + \frac{C}{p}\|w\|_p^p$$

CRF loss: the conditional distribution is

$$p_{x,y}(\hat y; w) = \frac{1}{Z(x, y)} \exp\left(\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right), \qquad Z(x, y) = \sum_{\hat y \in \mathcal{Y}} \exp\left(\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right)$$

where $\ell(y, \hat y)$ is a prior distribution, $Z(x, y)$ the partition function, and $\ell_{\log}(w, x, y) = \ln \frac{1}{p_{x,y}(y; w)}$.
43-46 CRF learning

In CRFs one aims to minimize the regularized negative log-likelihood of the conditional distribution:

$$\text{(CRF)} \quad \min_w \sum_{(x,y) \in S} \ln Z(x, y) - d^\top w + \frac{C}{p}\|w\|_p^p$$

where $(x, y) \in S$ ranges over the training pairs and $d = \sum_{(x,y) \in S} \Phi(x, y)$ is the vector of empirical means. In coordinate descent methods, each coordinate $w_r$ is iteratively updated in the direction of the negative gradient, for some step size $\eta$. The gradient of the log-partition function corresponds to the probability distribution $p(\hat y \mid x, y; w)$, and the direction of descent takes the form

$$\sum_{(x,y) \in S} \sum_{\hat y} p(\hat y \mid x, y; w)\,\Phi_r(x, \hat y) - d_r + |w_r|^{p-1}\,\mathrm{sign}(w_r)$$

Problem: this requires computing the partition function!
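When $\mathcal{Y}$ is small enough to enumerate, the gradient above can be computed directly; a minimal numpy sketch with $p = 2$ and no loss term in the exponent (the one-hot features and toy output space are assumptions for illustration):

```python
import numpy as np

def crf_gradient(w, examples, phi, outputs, C=1.0):
    """Gradient of sum ln Z(x, y) - d^T w + (C/2)||w||^2 (p = 2 here),
    with Y small enough to enumerate; phi(x, y) returns a feature vector."""
    grad = C * w                                   # derivative of the regularizer
    for x, y in examples:
        feats = np.array([phi(x, yh) for yh in outputs])
        logits = feats @ w                         # plain CRF: no loss term
        p = np.exp(logits - logits.max())
        p /= p.sum()                               # p(yhat | x; w)
        grad += p @ feats                          # E_p[phi] = grad of ln Z
        grad -= phi(x, y)                          # minus the empirical mean d
    return grad

# Tiny example: x unused, Y = {0, 1, 2}, one-hot features.
phi = lambda x, y: np.eye(3)[y]
print(crf_gradient(np.zeros(3), [(None, 2)], phi, outputs=[0, 1, 2]))
```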
47-49 Loss functions

Regularized loss minimization: given input pairs $(x, y) \in S$, minimize

$$\sum_{(x,y) \in S} \hat\ell(w, x, y) + \frac{C}{p}\|w\|_p^p$$

In structured SVMs,

$$\ell_{\text{hinge}}(w, x, y) = \max_{\hat y \in \mathcal{Y}} \left\{\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right\} - w^\top \Phi(x, y)$$

CRF loss: the conditional distribution is

$$p_{x,y}(\hat y; w) = \frac{1}{Z(x, y)} \exp\left(\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right), \qquad Z(x, y) = \sum_{\hat y \in \mathcal{Y}} \exp\left(\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right)$$

where $\ell(y, \hat y)$ is a prior distribution, $Z(x, y)$ the partition function, and $\ell_{\log}(w, x, y) = \ln \frac{1}{p_{x,y}(y; w)}$.
50-51 Relation between loss functions

The CRF program is

$$\text{(CRF)} \quad \min_w \sum_{(x,y) \in S} \ln Z(x, y) - d^\top w + \frac{C}{p}\|w\|_p^p$$

where $(x, y) \in S$ ranges over the training pairs, $d = \sum_{(x,y) \in S} \Phi(x, y)$ is the vector of empirical means, and

$$Z(x, y) = \sum_{\hat y \in \mathcal{Y}} \exp\left(\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right)$$

For structured SVMs,

$$\text{(structured SVM)} \quad \min_w \sum_{(x,y) \in S} \max_{\hat y \in \mathcal{Y}} \left\{\ell(y, \hat y) + w^\top \Phi(x, \hat y)\right\} - d^\top w + \frac{C}{p}\|w\|_p^p$$
52-55 A family of structured prediction problems [T. Hazan and R. Urtasun, NIPS 2010]

A one-parameter extension of CRFs and structured SVMs:

$$\min_w \sum_{(x,y) \in S} \ln Z_\epsilon(x, y) - d^\top w + \frac{C}{p}\|w\|_p^p$$

where $d$ is the vector of empirical means, and

$$\ln Z_\epsilon(x, y) = \epsilon \ln \sum_{\hat y \in \mathcal{Y}} \exp\left(\frac{\ell(y, \hat y) + w^\top \Phi(x, \hat y)}{\epsilon}\right)$$

This recovers the CRF if $\epsilon = 1$ and the structured SVM if $\epsilon = 0$, respectively, and introduces the notion of loss in CRFs. The dual takes the form

$$\max \sum_{(x,y) \in S} \left[\epsilon H(p_{x,y}) + \sum_{\hat y \in \mathcal{Y}} p_{x,y}(\hat y)\,\ell(y, \hat y)\right] - \frac{C^{1-q}}{q} \left\|\sum_{(x,y) \in S} \sum_{\hat y} p_{x,y}(\hat y)\,\Phi(x, \hat y) - d\right\|_q^q$$

over the probability simplex over $\mathcal{Y}$.
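A short numeric sketch of $\ln Z_\epsilon$: for $\epsilon = 1$ it is the CRF log-partition value, and as $\epsilon \to 0$ it approaches the max that appears in the structured hinge loss (the scores are invented):

```python
import numpy as np

def ln_Z_eps(scores, eps):
    """eps * ln sum_yhat exp(score(yhat) / eps), with score = loss + w^T Phi.
    A log-sum-exp shift keeps it numerically stable for small eps."""
    m = scores.max()
    return m + eps * np.log(np.exp((scores - m) / eps).sum())

scores = np.array([1.0, 2.5, 2.4])        # l(y, yhat) + w^T Phi(x, yhat)
for eps in [1.0, 0.1, 0.01]:
    print(eps, ln_Z_eps(scores, eps))     # approaches max(scores) as eps -> 0
print("hinge-style max:", scores.max())
```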
56-61 Primal-Dual approximated learning algorithm [T. Hazan and R. Urtasun, NIPS 2010]

In many applications the features decompose:

$$\phi_r(x, \hat y_1, \ldots, \hat y_n) = \sum_{v \in V_{r,x}} \phi_{r,v}(x, \hat y_v) + \sum_{\alpha \in E_{r,x}} \phi_{r,\alpha}(x, \hat y_\alpha)$$

Using this, we can write the approximate program as

$$\min_{\lambda_{x,y,v\to\alpha},\, w} \;\; \sum_{(x,y) \in S,\, v} \epsilon c_v \ln \sum_{\hat y_v} \exp\left(\frac{\ell_v(y_v, \hat y_v) + \sum_{r: v \in V_{r,x}} w_r \phi_{r,v}(x, \hat y_v) - \sum_{\alpha \in N(v)} \lambda_{x,y,v\to\alpha}(\hat y_v)}{\epsilon c_v}\right)$$
$$+ \sum_{(x,y) \in S,\, \alpha} \epsilon c_\alpha \ln \sum_{\hat y_\alpha} \exp\left(\frac{\sum_{r: \alpha \in E_r} w_r \phi_{r,\alpha}(x, \hat y_\alpha) + \sum_{v \in N(\alpha)} \lambda_{x,y,v\to\alpha}(\hat y_v)}{\epsilon c_\alpha}\right) - d^\top w + \frac{C}{p}\|w\|_p^p$$

This yields a coordinate descent algorithm that alternates between sending messages and updating parameters. Advantage: it does not need the MAP or marginals at each gradient step, and it can learn a large set of parameters. Code will be available soon, including a parallel implementation.
62 Learning algorithm [algorithm figure not transcribed]
63 Examples in computer vision
64 Examples: depth estimation, multi-label prediction, object detection, non-maxima suppression, segmentation, sentence generation, holistic scene understanding, 2D pose estimation, non-rigid shape estimation, 3D scene understanding.
65 For each application, what do we need to decide? The random variables, the graphical model, the potentials, the loss for learning, the learning algorithm, and the inference algorithm. Let's look at some examples.
66 Depth estimation [figure; source: P. Kohli]
67 Stereo matching: pairwise terms [figure; source: P. Kohli]
68-69 Stereo matching: energy [figures; source: P. Kohli]
70 More on pairwise terms [O. Veksler; source: P. Kohli]
71 Graph structure [figure]
72 Learning and inference

There is only one parameter to learn: the importance of the pairwise term relative to the unary term! Scoring metrics: sum of squared differences (outliers count more), or the percentage of pixels whose disparity error exceeds $\epsilon$; the latter is how stereo algorithms are typically scored. Which inference method would you choose? And for learning?
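A sketch of the stereo energy with that single trade-off parameter: unary matching costs plus $w$ times the sum of absolute disparity differences between neighbors (the toy costs and the absolute-difference pairwise term are assumptions for illustration):

```python
import numpy as np

def stereo_energy(disp, unary, w):
    """E(y) = sum_p U(p, y_p) + w * sum_{neighbors p,q} |y_p - y_q|
    for a disparity map disp (H x W ints) with unary costs unary[h, c, d];
    w is the single pairwise-vs-unary trade-off parameter."""
    H, W = disp.shape
    e = unary[np.arange(H)[:, None], np.arange(W)[None, :], disp].sum()
    e += w * np.abs(np.diff(disp, axis=0)).sum()   # vertical neighbors
    e += w * np.abs(np.diff(disp, axis=1)).sum()   # horizontal neighbors
    return e

# Toy 2x2 image with 3 disparity levels and random unary costs.
rng = np.random.default_rng(0)
unary = rng.random((2, 2, 3))
disp = np.array([[0, 1], [0, 0]])
print(stereo_energy(disp, unary, w=0.5))
```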
73 Example: Object Detection

We can formulate object localization as a regression from an image to a bounding box, $g: \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ is the space of all images and $\mathcal{Y}$ is the space of all bounding boxes.
74 Joint kernel between bounding boxes [figure; source: M. Blaschko]
75 Restriction kernel example [figure; source: M. Blaschko]
76 Margin rescaling [figure; source: M. Blaschko]
77 Constraint generation with branch and bound [figure; source: M. Blaschko]
78 Sets of rectangles [figure; source: M. Blaschko]
79 Optimization [figure; source: M. Blaschko]
80 Results: PASCAL VOC 2006 [figure; source: M. Blaschko]
81 Results: PASCAL VOC 2006 cats [figure; source: M. Blaschko]
82-83 Problem: the restriction kernel is like having tunnel vision [figure; source: M. Blaschko]
84 Global and local context kernels [figure; source: M. Blaschko]
85 Local context kernel [figure; source: M. Blaschko]
86 Results: context is a very busy area of research in vision! [figure; source: M. Blaschko]
87 Example: 3D Indoor Scene Understanding

Task: given an image, predict the 3D parametric cuboid that best describes the room layout.
88 Prediction

The variables are not independent of each other, i.e., this is structured prediction. Here $x$ is the input image, $y$ the room layout, $\phi(x, y)$ a multidimensional feature vector, and $w$ the predictor. Estimate the room layout by solving the inference task

$$\hat y = \arg\max_y w^\top \phi(x, y)$$

and learn $w$ via structured SVMs or CRFs.
89 Single Variable Parameterization

Approaches of [Hedau et al. 09] and [Lee et al. 10]: one random variable $y$ for the entire layout, where every state denotes a different candidate layout. This limits the number of candidate layouts, and it is not really a structured prediction task: all $n$ states/3D layouts have to be evaluated exhaustively, as in the sketch below.
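A sketch of what inference looks like under this parameterization: exhaustive scoring of an explicit candidate set with $\hat y = \arg\max_y w^\top \phi(x, y)$ (the two-dimensional features and candidate encoding are hypothetical):

```python
import numpy as np

def predict_layout(w, x, candidates, phi):
    """Inference by exhaustive scoring: yhat = argmax_y w^T phi(x, y).
    Only workable when the candidate set is small, which is exactly the
    limitation of single-variable layout parameterizations."""
    scores = [w @ phi(x, y) for y in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical 2-D features, e.g. agreement with the orientation map and
# geometric context cues used later in these slides.
phi = lambda x, y: np.array([y["om_agree"], y["gc_agree"]])
candidates = [{"id": 0, "om_agree": 0.3, "gc_agree": 0.9},
              {"id": 1, "om_agree": 0.8, "gc_agree": 0.7}]
print(predict_layout(np.array([1.0, 0.5]), None, candidates, phi)["id"])  # 1
```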
90 Four Variable Parameterization

Approach of [Wang et al. 10]: four variables $y_i \in \mathcal{Y}$, $i \in \{1, \ldots, 4\}$, corresponding to the four degrees of freedom of the problem; one state of $y_i$ denotes the angle of ray $r_i$ from a vanishing point [figure: rays $r_1, \ldots, r_4$ and vanishing points $vp_0, vp_1, vp_2$]. This requires high-order potentials, e.g., $50^4$ states for a fourth-order potential. For both parameterizations, the problem gets even worse when reasoning about objects.
91 Integral Geometry for Features

We follow [Wang et al. 10] and parameterize with four random variables, and we follow [Lee et al. 10] in employing the orientation map [Lee et al. 09] and geometric context [Hoiem et al. 07] as image cues. [figures: orientation map, geometric context]
92 Integral Geometry for Features

Faces: $F = \{\text{left-wall}, \text{right-wall}, \text{ceiling}, \text{floor}, \text{front-wall}\}$, defined by four angles (front-wall) or three angles (otherwise).

$$w^\top \phi(x, y) = \sum_{\alpha \in F} w_{o,\alpha}^\top \phi_{o,\alpha}(x, y_\alpha) + \sum_{\alpha \in F} w_{g,\alpha}^\top \phi_{g,\alpha}(x, y_\alpha)$$

The features count frequencies of image cues. [figure: orientation map and proposed left wall]
93 Integral Geometry for Features

Taking inspiration from integral images, we decompose

$$\phi_{\cdot,\alpha}(x, y_\alpha) = \phi_{\cdot,\{i,j,k\}}(x, y_i, y_j, y_k) = H_{\cdot,\{i,j\}}(x, y_i, y_j) - H_{\cdot,\{j,k\}}(x, y_j, y_k)$$

[figure: integral geometry terms $H_{\cdot,\{i,j\}}(x, y_i, y_j)$ and $H_{\cdot,\{j,k\}}(x, y_j, y_k)$]
94 Integral Geometry for Features

The decomposition $H_{\cdot,\{i,j\}}(x, y_i, y_j) - H_{\cdot,\{j,k\}}(x, y_j, y_k)$ has a corresponding factor graph over $y_i, y_j, y_k, y_l$ with factors $\{i, j\}$ and $\{j, k\}$. The front-wall feature is obtained by subtraction:

$$\phi_{\cdot,\text{front-wall}} = \phi(x) - \phi_{\cdot,\text{left-wall}} - \phi_{\cdot,\text{right-wall}} - \phi_{\cdot,\text{ceiling}} - \phi_{\cdot,\text{floor}}$$
95 Integral Geometry

Same concept as integral images, but in accordance with the vanishing points. [Figure: concept of integral geometry]
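For intuition, a sketch of the classical axis-aligned integral-image identity that integral geometry adapts to rays through the vanishing points: any rectangle sum of a cue map comes from four lookups in a cumulative-sum table, so face features can be assembled by differences rather than recounted per candidate layout (the cue map here is random, for illustration only):

```python
import numpy as np

# Binary cue map (e.g. "pixel agrees with a wall orientation"), invented.
cue = (np.random.default_rng(0).random((6, 8)) > 0.5).astype(float)
I = cue.cumsum(axis=0).cumsum(axis=1)          # integral image
I = np.pad(I, ((1, 0), (1, 0)))                # 1-pixel zero border

def rect_sum(r0, c0, r1, c1):
    """Sum of cue over rows r0:r1 and cols c0:c1, via four lookups."""
    return I[r1, c1] - I[r0, c1] - I[r1, c0] + I[r0, c0]

print(rect_sum(1, 2, 4, 7), cue[1:4, 2:7].sum())   # identical values
```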
96 Learning and Inference

Learning: a family of structured prediction problems including CRFs and structured SVMs as special cases; a primal-dual algorithm based on local updates; fast, and works well with a large number of parameters; code coming soon! Inference: parallel convex belief propagation, with convergence and other theoretical guarantees [T. Hazan and R. Urtasun, NIPS 2010]; code available online: general potentials, cross-platform, Amazon EC2! [A. Schwing, T. Hazan, M. Pollefeys and R. Urtasun, CVPR 2011]
97 Time vs. Accuracy

Learning is very fast: state-of-the-art after less than a minute! [Figure: error (%) and relative duality gap vs. time (s), for 150 and 200 train/test splits.] Inference takes as little as 10 ms per image!
98 Results [A. Schwing, T. Hazan, M. Pollefeys and R. Urtasun, CVPR 12]

Table: pixel classification error on the layout dataset of [Hedau et al. 09], with columns OM, GC, OM + GC and rows [Hoiem07], [Hedau09] (a), [Hedau09] (b), [Wang10], [Lee10], Ours (SVM-struct), Ours (struct-pred). Table: pixel classification error on the bedroom dataset of [Hedau et al. 10], comparing [Luca11], [Hoiem07], [Hedau09] (a), and Ours w/o box. [Table values not transcribed.]
99 Simple object reasoning

Compatibility of 3D object candidates and the layout. [figure]
100-101 Results [A. Schwing, T. Hazan, M. Pollefeys and R. Urtasun, CVPR 12]

Table: pixel classification error on the layout dataset of [Hedau et al. 09] with object reasoning, with columns OM, GC, OM + GC and rows [Wang10], [Lee10], Ours (SVM-struct), Ours (struct-pred). Table: pixel classification error on the bedroom dataset of [Hedau et al. 10], comparing [Luca11], [Hoiem07], [Hedau09] (a), and Ours w/o box and w/ box. [Table values not transcribed.]
102 Qualitative results [figures]
103 Conclusions and Future Work

Conclusions: efficient learning and inference tools for structured prediction based on primal-dual methods; inference needs no application-specific moves; learning can handle a large number of parameters using local updates; state-of-the-art results. Future work: more features, better object reasoning, the weakly labeled setting, and better inference.