Today: Logistic Regression (Maximum Entropy Formulation), Decision Trees Redux (Now Using Information Theory), Graphical Models

1 Today: Logistic Regression (Maximum Entropy formulation); Decision Trees Redux (now using Information Theory); Graphical Models (representing conditional dependence graphically)

2 Logistic Regression Optimization. Take the gradient in terms of w:
$E(w) = -\ln p(\mathbf{t} \mid w) = -\sum_{n=0}^{N-1} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}$
where $y_n = p(C_0 \mid x_n) = \sigma(a_n)$ and $a_n = w^T x_n$.
$\nabla_w E = \sum_{n=0}^{N-1} \frac{\partial E}{\partial y_n} \frac{\partial y_n}{\partial a_n} \frac{\partial a_n}{\partial w}$

3 Optimization. We know the gradient of the error function, but how do we find its minimum (equivalently, the maximum of the likelihood)?
$\nabla_w E = \sum_{n=0}^{N-1} (y_n - t_n) x_n$
Setting this to zero is nontrivial, so we turn to numerical approximation.
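
To make the numerical route concrete, here is a minimal sketch of batch gradient descent on this error (the learning rate eta, iteration count, and data shapes are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, t, eta=0.1, iters=1000):
    """Batch gradient descent on the cross-entropy error.

    X: (N, D) design matrix; t: (N,) binary targets.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        y = sigmoid(X @ w)        # y_n = sigma(w^T x_n)
        grad = X.T @ (y - t)      # sum_n (y_n - t_n) x_n
        w -= eta * grad / len(t)  # step against the gradient
    return w
```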

4 Entropy
$H(x) = -\sum_x p(x) \log_2 p(x)$
A measure of uncertainty, or a measure of information. High uncertainty equals high entropy. Rare events are more informative than common events.
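
As a quick sketch, the entropy of a discrete distribution takes only a few lines (the example probabilities are mine):

```python
import math

def entropy(p):
    """H(x) = -sum_x p(x) log2 p(x); 0 log 0 is taken as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: uniform, maximal for two outcomes
print(entropy([0.9, 0.1]))  # ~0.47 bits: less uncertainty overall
```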

5 Examples of Entropy. Uniform distributions have higher entropy.

6 Maximum Entropy. Logistic Regression is also known as Maximum Entropy. Entropy is concave, so the optimization is well behaved and converges. Constrain this optimization to enforce good classification. Increase the likelihood of the data while keeping the distribution of weights as even as possible. Include as many useful features as possible.

7 Maximum Entropy with Constraints. [Figure: constrained maximum-entropy example from the Klein and Manning tutorial]

8 Optimization formulation. If we let the weights represent likelihoods of values for each feature:
$\max_w H(w; x, t) = -\sum_i w_i \log_2 w_i$
s.t. $w^T x_i = t_i$ for each feature $i$, and $\|w\|_2 = 1$.

9 Solving the MaxEnt formulation. Convex optimization with a concave objective function and linear constraints: Lagrange multipliers. This is a dual representation of the maximum likelihood estimation of Logistic Regression.
$L(w, \lambda) = -\sum_i w_i \log_2 w_i + \sum_{i=1}^{N} \lambda_i (w^T x_i - t_i) + \lambda_0 (\|w\|_2 - 1)$

10 Decision Trees. Nested if-statements for classification. Each decision tree node contains a feature and a split point. Challenges: determine which feature and split point to use; determine which branches are worth including at all (pruning).

11 Decision Trees. [Figure: an example tree splitting first on color (blue / brown / green), then on height and weight thresholds such as h < 66 and w < 140, with leaves labeled m / f]

12 Ranking Branches. Last time, we used classification accuracy to measure the value of a branch. [Figure: a 6M / 6F node split on height < 68 into 1M / 5F and 5M / 1F.] 50% accuracy before the branch; 83.3% accuracy after the branch; 33.3% accuracy improvement.

13 Ranking Branches. Instead, measure the decrease in entropy of the class distribution following the split. [Figure: the same split: 6M / 6F on height < 68 into 1M / 5F and 5M / 1F.] H(x) = 1 bit before the branch; H(x) ≈ 0.65 bits after the branch (weighted over the two children).

14 InfoGain Criterion. Calculate the decrease in entropy across a split point; this represents the amount of information contained in the split. It is relatively indifferent to the position in the decision tree, and it is more applicable to N-way classification. Accuracy reflects only the mode of the distribution; entropy can be reduced while leaving the mode unaffected.
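
A sketch of the information-gain computation on the height split from the previous slides; the helper names entropy_counts and info_gain are hypothetical, and the class counts come from the 6M / 6F example:

```python
import math

def entropy_counts(counts):
    """Entropy (bits) of a class distribution given raw counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def info_gain(parent, children):
    """Decrease in entropy from splitting parent into children."""
    n = sum(parent)
    weighted = sum(sum(c) / n * entropy_counts(c) for c in children)
    return entropy_counts(parent) - weighted

# 6M/6F split by height < 68 into 1M/5F and 5M/1F
print(info_gain([6, 6], [[1, 5], [5, 1]]))  # ~0.35 bits
```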

15 Graphical Models and Conditional Independence. More generally about probabilities, but used in classification and clustering. Both Linear Regression and Logistic Regression use probabilistic models. Graphical models allow us to structure and visualize probabilistic models and the relationships between variables.

16 (Joint) Probability Tables. Represent multinomial joint probabilities between K variables as K-dimensional tables:
$p(x) = p(\text{flu?}, \text{achiness?}, \text{headache?}, \ldots, \text{temperature?})$
Assuming D binary variables, how big is this table? What if we had multinomials with M entries?

17 Probability Models. What if the variables are independent?
$p(x) = p(\text{flu?}, \text{achiness?}, \text{headache?}, \ldots, \text{temperature?})$
If x and y are independent: $p(x, y) = p(x) p(y)$. The original distribution can be factored:
$p(x) = p(\text{flu?}) \, p(\text{achiness?}) \, p(\text{headache?}) \cdots p(\text{temperature?})$
How big is this table, if each variable is binary?
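
A back-of-the-envelope check of the two sizes (the variable counts below are my own illustration): the full joint over D binary variables needs $2^D$ entries, while the fully factored model needs only one parameter per variable.

```python
D = 20          # e.g. 20 binary symptom variables
M = 4           # entries per variable in the multinomial case

print(2 ** D)   # full joint table: 1,048,576 entries
print(D)        # factored (independent) binary model: 20 parameters
print(M ** D)   # multinomial joint: M**D cells
print(D * M)    # factored multinomial: D tables of M entries
```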

18 Conditional Independence. Independence assumptions are convenient (Naïve Bayes), but rarely true. More often, some groups of variables are dependent while others are independent. Still others are conditionally independent.

19 Conditional Independence. If two variables are conditionally independent:
$p(x, z \mid y) = p(x \mid y) \, p(z \mid y)$, even though in general $p(x, z) \neq p(x) p(z)$.
E.g. y = flu?, x = achiness?, z = headache?: $x \perp z \mid y$.

20 Factorization of a joint. Assume $x \perp z \mid y$. How do you factorize $p(x, y, z)$?
$p(x, y, z) = p(x, z \mid y) \, p(y) = p(x \mid y) \, p(z \mid y) \, p(y)$

21 Factorization of a joint. What if there is no conditional independence? How do you factorize $p(x, y, z)$?
$p(x, y, z) = p(x, z \mid y) \, p(y) = p(x \mid y, z) \, p(z \mid y) \, p(y)$

22 Structure of Graphical Models. Graphical models allow us to represent dependence relationships between variables visually. Graphical models are directed acyclic graphs (DAGs). Nodes: random variables. Edges: dependence relationships. No edge: independent variables. The direction of an edge indicates a parent-child relationship (parent: source / trigger; child: destination / response).

23 Example Graphical Models. [Figure: two unconnected nodes x, y for $p(x, y) = p(x) p(y)$; and an edge y → x for $p(x, y) = p(x \mid y) p(y)$] The parents of a node i are denoted $\pi_i$. Factorization of the joint in a graphical model:
$p(x_0, \ldots, x_{n-1}) = \prod_{i=0}^{n-1} p(x_i \mid \pi_i)$

24 Basic Graphical Models. Independent variables; observations. [Figure: independent nodes x, y, z; then the same three nodes with y shaded.] When we observe a variable (fix its value from data), we color the node grey. Observing a variable allows us to condition on it, e.g. $p(x, z \mid y)$. Given an observation, we can generate pdfs for the other variables.

25 Example Graphical Models. X = cloudy? Y = raining? Z = wet ground? Markov chain: x → y → z.
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x) \, p(y \mid x) \, p(z \mid y)$
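
A small ancestral-sampling sketch of this chain; the CPT numbers are made up for illustration:

```python
import random

p_cloudy = 0.4                          # p(x)
p_rain_given = {True: 0.7, False: 0.1}  # p(y | x)
p_wet_given = {True: 0.9, False: 0.2}   # p(z | y)

def sample_chain():
    """Draw (cloudy, raining, wet) via p(x) p(y|x) p(z|y)."""
    x = random.random() < p_cloudy
    y = random.random() < p_rain_given[x]
    z = random.random() < p_wet_given[y]
    return x, y, z

print(sample_chain())
```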

26 Example Graphical Models. Markov chain: x → y → z.
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x) \, p(y \mid x) \, p(z \mid y)$
Are x and z conditionally independent given y? I.e., does $p(x, z \mid y) = p(x \mid y) \, p(z \mid y)$ hold?

27 Example Graphical Models. Markov chain: x → y → z, $p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x) \, p(y \mid x) \, p(z \mid y)$.
$p(x, z \mid y) = p(x \mid z, y) \, p(z \mid y)$
$p(x \mid z, y) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x) \, p(y \mid x) \, p(z \mid y)}{p(y) \, p(z \mid y)} = \frac{p(x) \, p(y \mid x)}{p(y)} = \frac{p(x, y)}{p(y)} = p(x \mid y)$
Therefore $p(x, z \mid y) = p(x \mid y) \, p(z \mid y)$: $x \perp z \mid y$.

28 One Trigger, Two Responses. X = achiness? Y = flu? Z = fever? [Figure: x ← y → z]
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x \mid y) \, p(y) \, p(z \mid y)$

29 Example Graphical Models. [Figure: x ← y → z] $p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x \mid y) \, p(y) \, p(z \mid y)$. Are x and z conditionally independent given y? I.e., does $p(x, z \mid y) = p(x \mid y) \, p(z \mid y)$ hold?

30 Example Graphical Models. [Figure: x ← y → z] $p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x \mid y) \, p(y) \, p(z \mid y)$.
$p(x, z \mid y) = p(x \mid z, y) \, p(z \mid y)$
$p(x \mid z, y) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x \mid y) \, p(y) \, p(z \mid y)}{p(y) \, p(z \mid y)} = p(x \mid y)$
Therefore $p(x, z \mid y) = p(x \mid y) \, p(z \mid y)$: $x \perp z \mid y$.

31 Two Triggers, One Response. X = rain? Y = wet sidewalk? Z = spilled coffee? [Figure: x → y ← z]
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x) \, p(y \mid x, z) \, p(z)$

32 Example Graphical Models. [Figure: x → y ← z] $p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x) \, p(y \mid x, z) \, p(z)$. Are x and z conditionally independent given y? I.e., does $p(x, z \mid y) = p(x \mid y) \, p(z \mid y)$ hold?

33 Example Graphical Models. [Figure: x → y ← z] $p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x) \, p(y \mid x, z) \, p(z)$.
$p(x, z \mid y) = p(x \mid z, y) \, p(z \mid y)$
$p(x \mid z, y) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x) \, p(y \mid x, z) \, p(z)}{p(y \mid z) \, p(z)} = \frac{p(x) \, p(y \mid x, z)}{p(y \mid z)} \neq p(x \mid y)$ in general.
So $p(x, z \mid y) \neq p(x \mid y) \, p(z \mid y)$: x is not conditionally independent of z given y.

34 Factorization. [Figure: DAG with edges x0 → x1, x0 → x2, x1 → x3, x2 → x4, x1 → x5, x4 → x5]
$p(x_0, x_1, x_2, x_3, x_4, x_5) = ?$

35 Factorization. [Figure: the same DAG]
$p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0) \, p(x_1 \mid x_0) \, p(x_2 \mid x_0) \, p(x_3 \mid x_1) \, p(x_4 \mid x_2) \, p(x_5 \mid x_1, x_4)$
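
A sketch of evaluating this factorization from conditional probability tables, assuming all six variables are binary; the CPT values below are hypothetical placeholders:

```python
# Each table stores p(x_i = 1 | parents); the numbers are made up.
p_x0 = 0.5
p_x1 = {0: 0.3, 1: 0.8}             # p(x1=1 | x0)
p_x2 = {0: 0.4, 1: 0.6}             # p(x2=1 | x0)
p_x3 = {0: 0.2, 1: 0.7}             # p(x3=1 | x1)
p_x4 = {0: 0.1, 1: 0.9}             # p(x4=1 | x2)
p_x5 = {(0, 0): 0.1, (0, 1): 0.5,
        (1, 0): 0.4, (1, 1): 0.95}  # p(x5=1 | x1, x4)

def bern(p, v):
    """p(x = v) for a Bernoulli with success probability p."""
    return p if v == 1 else 1 - p

def joint(x0, x1, x2, x3, x4, x5):
    """p(x0) p(x1|x0) p(x2|x0) p(x3|x1) p(x4|x2) p(x5|x1,x4)"""
    return (bern(p_x0, x0) * bern(p_x1[x0], x1) * bern(p_x2[x0], x2) *
            bern(p_x3[x1], x3) * bern(p_x4[x2], x4) *
            bern(p_x5[(x1, x4)], x5))

print(joint(1, 1, 0, 1, 0, 0))
```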

36 How large are the probability tables?
$p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0) \, p(x_1 \mid x_0) \, p(x_2 \mid x_0) \, p(x_3 \mid x_1) \, p(x_4 \mid x_2) \, p(x_5 \mid x_1, x_4)$

37 Model Parameters as Nodes. Treating model parameters as random variables, we can include them in a graphical model. Multivariate Bernoulli: [Figure: µ0 → x0, µ1 → x1, µ2 → x2]

38 Model Parameters as Nodes. Treating model parameters as random variables, we can include them in a graphical model. Multinomial: [Figure: a single µ → x0, x1, x2]

39 Naïve Bayes Classification. [Figure: y → x0, x1, x2] The observed variables $x_i$ are independent given the class variable y. The distribution can be optimized using maximum likelihood on each variable separately, and it can easily combine various types of distributions.
$p(y \mid x_0, x_1, x_2) \propto p(x_0, x_1, x_2 \mid y) \, p(y) \propto p(x_0 \mid y) \, p(x_1 \mid y) \, p(x_2 \mid y) \, p(y)$
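
A minimal Naïve Bayes sketch over binary features, matching the factorization above (maximum likelihood on each variable separately; the data and names are illustrative):

```python
import numpy as np

def train_nb(X, y):
    """ML estimates: p(y) and p(x_j = 1 | y) for each feature j."""
    classes = np.unique(y)
    prior = {c: np.mean(y == c) for c in classes}
    cond = {c: X[y == c].mean(axis=0) for c in classes}
    return prior, cond

def predict_nb(prior, cond, x):
    """argmax_y p(y) * prod_j p(x_j | y)"""
    def score(c):
        p = cond[c]
        return prior[c] * np.prod(np.where(x == 1, p, 1 - p))
    return max(prior, key=score)

X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])
prior, cond = train_nb(X, y)
print(predict_nb(prior, cond, np.array([1, 1, 0])))  # -> 1
```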

40 Graphical Models. Graphical representation of dependency relationships: directed acyclic graphs, with nodes as random variables and edges defining dependency relations. What can we do with graphical models? Learn parameters to fit data; understand independence relationships between variables; perform inference (marginals and conditionals); compute likelihoods for classification.

41 Plate Notation. [Figure: y with children x0, x1, ..., xn; equivalently, y → x_i drawn inside a plate labeled n] To indicate a repeated variable, draw a plate around it.

42 Completely Observed Graphical Model. Observations for every node:

Flu  Fever  Sinus  Ache  Swell  Head
Y    L      Y      Y     Y      N
N    M      N      N     N      N
Y    H      N      N     Y      Y
Y    M      Y      N     N      Y

Simplest (least general) graph: assume each variable is independent. [Figure: six unconnected nodes Fl, Fe, Si, Ac, Sw, He]

43 Completely Observed Graphical Model. Observations for every node (the same table as above). Second simplest graph: assume complete dependence. [Figure: the six nodes Fl, Fe, Si, Ac, Sw, He, fully connected]

44 Maximum Likelihood. [Figure: the six-node DAG over x0, ..., x5] Each node has a conditional probability table, θ. Given the tables, we can construct the pdf:
$p(x \mid \theta) = \prod_{i=0}^{M-1} p(x_i \mid \pi_i, \theta_i)$
Use maximum likelihood to find the best settings of θ.

45 Maximum Likelihood.
$\theta^* = \operatorname{argmax}_\theta \ln p(X \mid \theta) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \ln p(X_n \mid \theta) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \ln \prod_{i=0}^{M-1} p(x_{i,n} \mid \theta_i) = \operatorname{argmax}_\theta \sum_{n=0}^{N-1} \sum_{i=0}^{M-1} \ln p(x_{i,n} \mid \theta_i)$

46 Count Functions. Count the number of times something appears in the data:
$\delta(x_n, x_m) = 1$ if $x_n = x_m$, $0$ otherwise
$m(x_i) = \sum_{n=0}^{N-1} \delta(x_i, x_{i,n})$
$m(X) = \sum_{n=0}^{N-1} \delta(X, X_n)$
Counts marginalize: $m(x_1) = \sum_{x_2} m(x_1, x_2) = \sum_{x_2} \sum_{x_3} m(x_1, x_2, x_3) = \ldots$

47 Maximum Likelihood.
$\ell(\theta) = \sum_{n=0}^{N-1} \ln p(X_n \mid \theta) = \sum_{n=0}^{N-1} \sum_X \delta(X_n, X) \ln p(X \mid \theta) = \sum_X m(X) \ln p(X \mid \theta)$
$= \sum_X m(X) \ln \prod_{i=0}^{M-1} p(x_i \mid \pi_i, \theta_i) = \sum_{i=0}^{M-1} \sum_X m(X) \ln p(x_i \mid \pi_i, \theta_i)$
$= \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} \Big[ \sum_{X \setminus x_i \setminus \pi_i} m(X) \Big] \ln p(x_i \mid \pi_i, \theta_i) = \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} m(x_i, \pi_i) \ln p(x_i \mid \pi_i, \theta_i)$
Define a function $\theta(x_i, \pi_i) = p(x_i \mid \pi_i, \theta_i)$ with constraint $\sum_{x_i} \theta(x_i, \pi_i) = 1$.

48 Maximum Likelihood. Use Lagrange multipliers:
$\ell(\theta) = \sum_{i=0}^{M-1} \sum_{x_i, \pi_i} m(x_i, \pi_i) \ln \theta(x_i, \pi_i) - \sum_{i=0}^{M-1} \sum_{\pi_i} \lambda_{\pi_i} \Big( \sum_{x_i} \theta(x_i, \pi_i) - 1 \Big)$
$\frac{\partial \ell(\theta)}{\partial \theta(x_i, \pi_i)} = \frac{m(x_i, \pi_i)}{\theta(x_i, \pi_i)} - \lambda_{\pi_i} = 0 \quad\Rightarrow\quad \theta(x_i, \pi_i) = \frac{m(x_i, \pi_i)}{\lambda_{\pi_i}}$
Applying the constraint: $\sum_{x_i} \frac{m(x_i, \pi_i)}{\lambda_{\pi_i}} = 1$, so $\lambda_{\pi_i} = \sum_{x_i} m(x_i, \pi_i) = m(\pi_i)$.
$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i)}{m(\pi_i)}$: counts!
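
A sketch of this closed-form result, normalized counts, using the Flu and Fever columns from the completely observed table on slide 42 and assuming (for illustration) that Flu is the parent of Fever:

```python
from collections import Counter

# (flu, fever) observations from the table: Y/N flu, L/M/H fever.
data = [("Y", "L"), ("N", "M"), ("Y", "H"), ("Y", "M")]

m_joint = Counter(data)                     # m(x_i, pi_i)
m_parent = Counter(flu for flu, _ in data)  # m(pi_i)

def theta(fever, flu):
    """theta(x_i, pi_i) = m(x_i, pi_i) / m(pi_i)"""
    return m_joint[(flu, fever)] / m_parent[flu]

print(theta("M", "Y"))  # 1/3: estimate of p(fever=M | flu=Y)
```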

49 Maximum A Posteriori Training. Bayesians would never do that; the thetas need a prior. With pseudo-counts α:
$\theta(x_i, \pi_i) = \frac{m(x_i, \pi_i) + \alpha}{m(\pi_i) + \sum_{x_i} \alpha}$
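
Continuing the counts sketch above, the MAP estimate just adds pseudo-counts; the Dirichlet-style α is an assumption about the prior's form:

```python
def theta_map(fever, flu, alpha=1.0, n_fever_values=3):
    """(m(x_i, pi_i) + alpha) / (m(pi_i) + alpha * |x_i|)"""
    return ((m_joint[(flu, fever)] + alpha) /
            (m_parent[flu] + alpha * n_fever_values))

print(theta_map("L", "N"))  # nonzero even though (N, L) was never observed
```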

50 Conditional Dependence Test. We can check conditional independence in a graphical model. Is achiness (x3) independent of the flu (x0) given fever (x1)? Is achiness (x3) independent of sinus infection (x2) given fever (x1)?
$p(x) = p(x_0) \, p(x_1 \mid x_0) \, p(x_2 \mid x_0) \, p(x_3 \mid x_1) \, p(x_4 \mid x_2) \, p(x_5 \mid x_1, x_4)$
$p(x_3 \mid x_0, x_1, x_2) = \frac{p(x_0, x_1, x_2, x_3)}{p(x_0, x_1, x_2)} = \frac{p(x_0) \, p(x_1 \mid x_0) \, p(x_2 \mid x_0) \, p(x_3 \mid x_1)}{p(x_0) \, p(x_1 \mid x_0) \, p(x_2 \mid x_0)} = p(x_3 \mid x_1)$
So $x_3 \perp x_0, x_2 \mid x_1$.

51 D-Separation and Bayes Ball. [Figure: the six-node DAG over x0, ..., x5] Intuition: nodes are separated, or blocked, by sets of nodes. E.g., nodes x1 and x2 block the path from x0 to x5, so x0 is conditionally independent of x5 given x1 and x2.

52 Bayes Ball Algorithm. To test $x_a \perp x_b \mid x_c$: shade the nodes in $x_c$; place a ball at each node in $x_a$; bounce the balls around the graph according to a set of rules; if no ball reaches $x_b$, then conditional independence holds.

53 Ten Rules of the Bayes Ball Theorem. [Figure: the ten bounce/block rule diagrams]

54 Bayes Ball Example. Is $x_0 \perp x_4 \mid x_2$? [Figure: the six-node DAG over x0, ..., x5]

55 Bayes Ball Example. Is $x_0 \perp x_5 \mid x_1, x_2$? [Figure: the six-node DAG over x0, ..., x5]

56 Undirected Graphs. What if we allow undirected graphs? What do they correspond to? Not cause/effect or trigger/response, but general dependence. Example: image pixels, where each pixel is a Bernoulli variable, $p(x_{11}, \ldots, x_{1M}, \ldots, x_{M1}, \ldots, x_{MM})$. Bright pixels have bright neighbors. No parents, just probabilities. Grid models like this are called Markov Random Fields.

57 Undirected Graphs. Undirected separability is easy. [Figure: nodes A, B, C, D, with C between A and B] To check conditional independence of A and B given C, check the graph reachability of A and B without going through nodes in C.
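
A sketch of that reachability check as a breadth-first search over an undirected adjacency list (the example graph is made up):

```python
from collections import deque

def separated(adj, a, b, given):
    """True if every path from a to b passes through a node in `given`."""
    blocked, frontier, seen = set(given), deque([a]), {a}
    while frontier:
        node = frontier.popleft()
        if node == b:
            return False  # found an unblocked path
        for nbr in adj[node]:
            if nbr not in seen and nbr not in blocked:
                seen.add(nbr)
                frontier.append(nbr)
    return True

adj = {"A": ["C"], "B": ["C", "D"], "C": ["A", "B"], "D": ["B"]}
print(separated(adj, "A", "B", {"C"}))  # True: C blocks the only path
```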

58 Next Time. More fun with Graphical Models. Read Chapter 8.1,
