Today: Logistic Regression (Maximum Entropy Formulation); Decision Trees Redux (now using Information Theory); Graphical Models
1 Today
Logistic Regression: Maximum Entropy Formulation
Decision Trees Redux: now using Information Theory
Graphical Models: representing conditional dependence graphically
2 Logistic Regression Optimization
Take the gradient in terms of w:
E(w) = -ln p(t|w) = -Σ_{n=0}^{N-1} { t_n ln y_n + (1 - t_n) ln(1 - y_n) }
where y_n = p(C_0|x_n) = σ(a_n) and a_n = w^T x_n.
∇_w E = Σ_{n=0}^{N-1} (∂E/∂y_n)(∂y_n/∂a_n) ∇_w a_n
3 Optimization
We know the gradient of the error function, but how do we find its optimum?
∇_w E = Σ_{n=0}^{N-1} (y_n - t_n) x_n
Setting this to zero is nontrivial, so we use numerical approximation.
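One such numerical approximation is batch gradient descent on this error. A minimal pure-Python sketch; the toy data, learning rate, and step count below are illustrative choices, not from the lecture:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gradient_descent(X, t, lr=0.1, steps=1000):
    """Minimize the cross-entropy error E(w) via w <- w - lr * grad E."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        # grad E = sum_n (y_n - t_n) x_n
        grad = [0.0] * len(w)
        for x_n, t_n in zip(X, t):
            y_n = sigmoid(sum(wi * xi for wi, xi in zip(w, x_n)))
            for i, xi in enumerate(x_n):
                grad[i] += (y_n - t_n) * xi
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy linearly separable data; the second component is a bias feature.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
t = [0, 0, 1, 1]
w = gradient_descent(X, t)
preds = [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0
         for x in X]
```

On separable data like this, the learned boundary settles between the two groups and all four points are classified correctly.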
4 Entropy
H(x) = -Σ_x p(x) log_2 p(x)
A measure of uncertainty, or a measure of information. High uncertainty equals high entropy; rare events are more informative than common events.
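The definition translates directly into code; a small sketch (the example distributions are illustrative):

```python
import math

def entropy(probs):
    """H(x) = -sum_x p(x) log2 p(x); terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain; a biased coin less so.
fair = entropy([0.5, 0.5])    # 1.0 bit
biased = entropy([0.9, 0.1])  # about 0.469 bits
certain = entropy([1.0])      # 0.0 bits: no uncertainty, no information
```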
5 Examples of Entropy
Uniform distributions have higher entropy.
6 Maximum Entropy
Logistic Regression is also known as Maximum Entropy. Entropy is concave, so the maximization is a convex optimization problem and convergence can be expected. Constrain this optimization to enforce good classification: increase the likelihood of the data while keeping the distribution of weights as even as possible, and include as many useful features as possible.
7 Maximum Entropy with Constraints
From the Klein and Manning tutorial.
8 Optimization formulation
If we let the weights represent likelihoods of value for each feature:
max H(w; x, t) = -Σ_i w_i log_2 w_i
s.t. w^T x_i = t for each feature i, and ||w||^2 = 1
9 Solving the MaxEnt formulation
Convex optimization with a concave objective function and linear constraints: use Lagrange multipliers.
max H(w; x, t) = -Σ_i w_i log_2 w_i, s.t. w^T x_i = t for each feature i and ||w||^2 = 1
L(w, λ) = -Σ_i w_i log_2 w_i + Σ_{i=1}^{N} λ_i (w^T x_i - t) + λ_0 (||w||^2 - 1)
This is a dual representation of the maximum likelihood estimation of Logistic Regression.
10 Decision Trees
Nested if-statements for classification. Each decision tree node contains a feature and a split point.
Challenges: determine which feature and split point to use, and determine which branches are worth including at all (pruning).
11 Decision Trees
[Figure: example decision tree that first splits on color (blue/brown/green), then on height and weight thresholds (e.g. h < 66, w < 140), with m/f labels at the leaves]
12 Ranking Branches
Last time, we used classification accuracy to measure the value of a branch.
Splitting 6M/6F on height < 68 yields branches with 1M/5F and 5M/1F:
50% accuracy before the branch, 83.3% accuracy after the branch, a 33.3% accuracy improvement.
13 Ranking Branches
Instead, measure the decrease in entropy of the class distribution following the split.
Splitting 6M/6F on height < 68 yields branches with 1M/5F and 5M/1F:
H = 1 bit before the branch; each child has H ≈ 0.65 bits after the branch.
14 InfoGain Criterion
Calculate the decrease in entropy across a split point; this represents the amount of information contained in the split. It is relatively indifferent to the position in the decision tree, and more applicable to N-way classification. Accuracy only reflects the mode of the distribution: entropy can be reduced while leaving the mode unaffected.
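The criterion can be sketched directly on the height < 68 example from the slides, where the 6M/6F parent splits into 1M/5F and 5M/1F:

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as raw counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def info_gain(parent, children):
    """Decrease in class entropy from splitting `parent` into `children`.
    Each child's entropy is weighted by the fraction of examples it receives."""
    n = sum(parent)
    weighted = sum(sum(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# The height < 68 split: 6M/6F -> 1M/5F and 5M/1F.
gain = info_gain([6, 6], [[1, 5], [5, 1]])  # about 0.35 bits
```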
15 Graphical Models and Conditional Independence
Graphical models are more generally about probabilities, but are used in classification and clustering. Both Linear Regression and Logistic Regression use probabilistic models. Graphical models allow us to structure and visualize probabilistic models and the relationships between variables.
16 (Joint) Probability Tables
Represent multinomial joint probabilities between K variables as K-dimensional tables:
p(x) = p(flu?, achiness?, headache?, ..., temperature?)
Assuming D binary variables, how big is this table? What if we had multinomials with M entries?
17 Probability Models
What if the variables are independent?
p(x) = p(flu?, achiness?, headache?, ..., temperature?)
If x and y are independent: p(x, y) = p(x)p(y)
The original distribution can then be factored:
p(x) = p(flu?)p(achiness?)p(headache?)...p(temperature?)
How big is this table, if each variable is binary?
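The size question can be answered with a quick count: a full joint over D binary variables needs 2^D - 1 free parameters (the entries sum to 1), while the fully factored form needs only D. A small sketch:

```python
def full_joint_size(D):
    # One probability per assignment, minus one because they sum to 1.
    return 2 ** D - 1

def factored_size(D):
    # One Bernoulli parameter per independent binary variable.
    return D

D = 20
full = full_joint_size(D)    # 1,048,575 parameters
factored = factored_size(D)  # 20 parameters
```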
18 Conditional Independence Independence assumptions are convenient (Naïve Bayes), but rarely true. More often some groups of variables are dependent, but others are independent. Still others are conditionally independent. 18
19 Conditional Independence
If two variables are conditionally independent:
p(x, z | y) = p(x | y) p(z | y), even though p(x, z) ≠ p(x) p(z)
E.g. y = flu?, x = achiness?, z = headache?. Notation: x ⊥ z | y
20 Factorization of a joint
Assume x ⊥ z | y. How do you factorize p(x, y, z)?
p(x, y, z) = p(x, z | y) p(y) = p(x | y) p(z | y) p(y)
21 Factorization of a joint
What if there is no conditional independence? How do you factorize p(x, y, z)?
p(x, y, z) = p(x, z | y) p(y) = p(x | y, z) p(z | y) p(y)
22 Structure of Graphical Models
Graphical models allow us to represent dependence relationships between variables visually. Graphical models are directed acyclic graphs (DAGs).
Nodes: random variables. Edges: dependence relationships; no edge means independent variables. The direction of an edge indicates a parent-child relationship (parent: source, trigger; child: destination, response).
23 Example Graphical Models
No edge between x and y: p(x, y) = p(x)p(y). Edge from y to x: p(x, y) = p(x | y)p(y).
The parents of a node i are denoted π_i. Factorization of the joint in a graphical model:
p(x_0, ..., x_{n-1}) = Π_{i=0}^{n-1} p(x_i | π_i)
24 Basic Graphical Models
Independent variables vs. observations. When we observe a variable (fix its value from data) we color the node grey. Observing a variable allows us to condition on it, e.g. p(x, z | y). Given an observation we can generate pdfs for the other variables.
25 Example Graphical Models
X = cloudy?, Y = raining?, Z = wet ground?
Markov chain x → y → z:
p(x, y, z) = Π_{n ∈ {x,y,z}} p(n | π_n) = p(x) p(y | x) p(z | y)
26 Example Graphical Models
Markov chain x → y → z: p(x, y, z) = Π_{n ∈ {x,y,z}} p(n | π_n) = p(x) p(y | x) p(z | y)
Are x and z conditionally independent given y? I.e., does p(x, z | y) = p(x | y) p(z | y)?
27 Example Graphical Models
Markov chain x → y → z: p(x, y, z) = p(x) p(y | x) p(z | y)
p(x, z | y) = p(x | z, y) p(z | y)
p(x | z, y) = p(x, y, z) / p(y, z) = p(x) p(y | x) p(z | y) / (p(y) p(z | y)) = p(x) p(y | x) / p(y) = p(x, y) / p(y) = p(x | y)
Therefore p(x, z | y) = p(x | y) p(z | y), so x ⊥ z | y
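The chain result can also be checked numerically: build p(x, y, z) = p(x) p(y|x) p(z|y) from conditional probability tables and verify p(x, z|y) = p(x|y) p(z|y) for every assignment. The specific table entries below are illustrative:

```python
from itertools import product

# Arbitrary CPTs for binary variables (values are illustrative).
p_x = {0: 0.3, 1: 0.7}
p_y_x = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}    # p(y|x)
p_z_y = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}  # p(z|y)

joint = {(x, y, z): p_x[x] * p_y_x[(y, x)] * p_z_y[(z, y)]
         for x, y, z in product([0, 1], repeat=3)}

def marg(keep):
    """Marginal over the variables at positions in `keep` (x=0, y=1, z=2)."""
    out = {}
    for assign, p in joint.items():
        key = tuple(assign[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_y = marg([1])
chain_ci_holds = True
for x, y, z in product([0, 1], repeat=3):
    lhs = joint[(x, y, z)] / p_y[(y,)]             # p(x,z|y)
    rhs = (marg([0, 1])[(x, y)] / p_y[(y,)]) * \
          (marg([1, 2])[(y, z)] / p_y[(y,)])       # p(x|y) p(z|y)
    if abs(lhs - rhs) > 1e-12:
        chain_ci_holds = False
```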
28 One Trigger, Two Responses
X = achiness?, Y = flu?, Z = fever?
Graph: x ← y → z
p(x, y, z) = Π_{n ∈ {x,y,z}} p(n | π_n) = p(x | y) p(y) p(z | y)
29 Example Graphical Models
x ← y → z: p(x, y, z) = Π_{n ∈ {x,y,z}} p(n | π_n) = p(x | y) p(y) p(z | y)
Are x and z conditionally independent given y? I.e., does p(x, z | y) = p(x | y) p(z | y)?
30 Example Graphical Models
x ← y → z: p(x, y, z) = p(x | y) p(y) p(z | y)
p(x, z | y) = p(x | z, y) p(z | y)
p(x | z, y) = p(x, y, z) / p(y, z) = p(x | y) p(y) p(z | y) / (p(y) p(z | y)) = p(x | y)
Therefore p(x, z | y) = p(x | y) p(z | y), so x ⊥ z | y
31 Two Triggers, One Response
X = rain?, Y = wet sidewalk?, Z = spilled coffee?
Graph: x → y ← z
p(x, y, z) = Π_{n ∈ {x,y,z}} p(n | π_n) = p(x) p(y | x, z) p(z)
32 Example Graphical Models
x → y ← z: p(x, y, z) = p(x) p(y | x, z) p(z)
Are x and z conditionally independent given y? I.e., does p(x, z | y) = p(x | y) p(z | y)?
33 Example Graphical Models
x → y ← z: p(x, y, z) = p(x) p(y | x, z) p(z)
p(x, z | y) = p(x | z, y) p(z | y)
p(x | z, y) = p(x, y, z) / p(y, z) = p(x) p(y | x, z) p(z) / p(y, z), which does not reduce to p(x | y).
So p(x, z | y) ≠ p(x | y) p(z | y): here x is not conditionally independent of z given y.
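The v-structure behaves the opposite way from the previous two graphs, which is easy to confirm numerically: with p(x, y, z) = p(x) p(y|x, z) p(z), x and z are independent marginally but become dependent once y is observed ("explaining away"). The CPT numbers below are illustrative:

```python
from itertools import product

p_x = {0: 0.6, 1: 0.4}
p_z = {0: 0.7, 1: 0.3}
# p(y=1|x,z): y fires more often when either trigger is on (illustrative).
p_y1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.95}

joint = {}
for x, y, z in product([0, 1], repeat=3):
    py = p_y1[(x, z)] if y == 1 else 1 - p_y1[(x, z)]
    joint[(x, y, z)] = p_x[x] * py * p_z[z]

def marg(keep):
    """Marginal over the variables at positions in `keep` (x=0, y=1, z=2)."""
    out = {}
    for assign, p in joint.items():
        key = tuple(assign[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Marginally independent: p(x,z) = p(x)p(z) for all assignments.
marginal_indep = all(abs(marg([0, 2])[(x, z)] - p_x[x] * p_z[z]) < 1e-12
                     for x, z in product([0, 1], repeat=2))

# But not independent given y: p(x,z|y) != p(x|y)p(z|y) somewhere.
p_y = marg([1])
cond_indep = all(
    abs(joint[(x, y, z)] / p_y[(y,)]
        - (marg([0, 1])[(x, y)] / p_y[(y,)])
        * (marg([1, 2])[(y, z)] / p_y[(y,)])) < 1e-12
    for x, y, z in product([0, 1], repeat=3))
```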
34 Factorization
[Figure: DAG over x_0 ... x_5 with edges x_0 → x_1, x_0 → x_2, x_1 → x_3, x_2 → x_4, and x_1, x_4 → x_5]
p(x_0, x_1, x_2, x_3, x_4, x_5) = ?
35 Factorization
p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0) p(x_1 | x_0) p(x_2 | x_0) p(x_3 | x_1) p(x_4 | x_2) p(x_5 | x_1, x_4)
36 How large are the probability tables?
p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0) p(x_1 | x_0) p(x_2 | x_0) p(x_3 | x_1) p(x_4 | x_2) p(x_5 | x_1, x_4)
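For binary variables, each factor p(x_i | π_i) needs one free parameter per configuration of its parents, so the factored model is far smaller than the 2^6 - 1 = 63 parameters of the unrestricted joint. A quick count, with the parent sets read off the factorization above:

```python
def num_parameters(parents, arity=2):
    """Free parameters of a discrete Bayes net: for each node,
    (arity - 1) parameters per configuration of its parents."""
    return sum((arity - 1) * arity ** len(pa) for pa in parents.values())

# Parent sets from p(x0) p(x1|x0) p(x2|x0) p(x3|x1) p(x4|x2) p(x5|x1,x4)
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [2], 5: [1, 4]}
factored = num_parameters(parents)  # 1 + 2 + 2 + 2 + 2 + 4 = 13
full = 2 ** 6 - 1                   # 63 for the unrestricted joint
```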
37 Model Parameters as Nodes
Treating model parameters as random variables, we can include them in a graphical model.
Multivariate Bernoulli: [Figure: parameters µ_0, µ_1, µ_2, each pointing to its own variable x_0, x_1, x_2]
38 Model Parameters as Nodes
Treating model parameters as random variables, we can include them in a graphical model.
Multinomial: [Figure: a single parameter µ pointing to x_0, x_1, x_2]
39 Naïve Bayes Classification
[Figure: class variable y pointing to x_0, x_1, x_2]
Observed variables x_i are independent given the class variable y. The distribution can be optimized using maximum likelihood on each variable separately, and can easily combine various types of distributions.
p(y | x_0, x_1, x_2) ∝ p(x_0, x_1, x_2 | y) p(y) ∝ p(x_0 | y) p(x_1 | y) p(x_2 | y) p(y)
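A minimal Naïve Bayes sketch for binary features, following the factorization on this slide; the toy data and the smoothing constant alpha are illustrative choices, not from the lecture:

```python
from collections import defaultdict

def train_nb(X, y, alpha=1.0):
    """Estimate p(y) and p(x_i|y) by smoothed counting."""
    classes = set(y)
    n = len(y)
    prior = {c: (y.count(c) + alpha) / (n + alpha * len(classes))
             for c in classes}
    cond = defaultdict(lambda: alpha)  # smoothed count of (class, i, value)
    class_count = {c: y.count(c) for c in classes}
    for x_n, c in zip(X, y):
        for i, v in enumerate(x_n):
            cond[(c, i, v)] += 1
    def p_xi_given_y(i, v, c):
        return cond[(c, i, v)] / (class_count[c] + 2 * alpha)  # binary x_i
    return prior, p_xi_given_y

def predict(x, prior, p_xi_given_y):
    """argmax_y p(y) * prod_i p(x_i|y)."""
    scores = {}
    for c in prior:
        s = prior[c]
        for i, v in enumerate(x):
            s *= p_xi_given_y(i, v, c)
        scores[c] = s
    return max(scores, key=scores.get)

X = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
y = [1, 1, 0, 0]
prior, cpd = train_nb(X, y)
pred = predict((1, 1, 0), prior, cpd)
```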
40 Graphical Models
Graphical representation of dependency relationships: directed acyclic graphs with nodes as random variables and edges defining dependence relations.
What can we do with graphical models? Learn parameters to fit data, understand independence relationships between variables, perform inference (marginals and conditionals), and compute likelihoods for classification.
41 Plate Notation
To indicate a repeated variable, draw a plate around it.
[Figure: y pointing to x_0, x_1, ..., x_n drawn explicitly, vs. y pointing to a single node x_i inside a plate labeled n]
42 Completely Observed Graphical Model
Observations for every node:
Flu  Fever  Sinus  Ache  Swell  Head
Y    L      Y      Y     Y      N
N    M      N      N     N      N
Y    H      N      N     Y      Y
Y    M      Y      N     N      Y
Simplest (least general) graph: assume each variable is independent.
[Figure: six unconnected nodes Fl, Fe, Si, Ac, Sw, He]
43 Completely Observed Graphical Model
Observations for every node:
Flu  Fever  Sinus  Ache  Swell  Head
Y    L      Y      Y     Y      N
N    M      N      N     N      N
Y    H      N      N     Y      Y
Y    M      Y      N     N      Y
Second simplest graph: assume complete dependence.
[Figure: fully connected DAG over Fl, Fe, Si, Ac, Sw, He]
44 Maximum Likelihood
Each node x_0 ... x_5 has a conditional probability table θ_i. Given the tables, we can construct the pdf:
p(x | θ) = Π_{i=0}^{M-1} p(x_i | π_i, θ_i)
Use maximum likelihood to find the best settings of θ.
45 Maximum Likelihood
θ̂ = argmax_θ ln p(X | θ)
  = argmax_θ Σ_{n=0}^{N-1} ln p(X_n | θ)
  = argmax_θ Σ_{n=0}^{N-1} ln Π_{i=0}^{M-1} p(x_{in} | θ_i)
  = argmax_θ Σ_{n=0}^{N-1} Σ_{i=0}^{M-1} ln p(x_{in} | θ_i)
46 Count Functions
Count the number of times something appears in the data:
δ(x_n, x_m) = 1 if x_n = x_m, 0 otherwise
m(x_i) = Σ_{n=0}^{N-1} δ(x_i, x_{in})
m(X) = Σ_{n=0}^{N-1} δ(X, X_n)
Marginal counts sum out the remaining variables: m(x_1) = Σ_{x_2} m(x_1, x_2) = Σ_{x_2} Σ_{x_3} m(x_1, x_2, x_3) = ...
47 Maximum Likelihood
l(θ) = Σ_{n=0}^{N-1} ln p(X_n | θ)
     = Σ_{n=0}^{N-1} Σ_X δ(X_n, X) ln p(X | θ)
     = Σ_X m(X) ln p(X | θ)
     = Σ_X m(X) ln Π_{i=0}^{M-1} p(x_i | π_i, θ_i)
     = Σ_{i=0}^{M-1} Σ_X m(X) ln p(x_i | π_i, θ_i)
     = Σ_{i=0}^{M-1} Σ_{x_i, π_i} [ Σ_{X \ x_i \ π_i} m(X) ] ln p(x_i | π_i, θ_i)
     = Σ_{i=0}^{M-1} Σ_{x_i, π_i} m(x_i, π_i) ln p(x_i | π_i, θ_i)
Define a function θ(x_i, π_i) = p(x_i | π_i, θ_i), with the constraint Σ_{x_i} θ(x_i, π_i) = 1.
48 Maximum Likelihood
Use Lagrange multipliers:
l(θ) = Σ_{i=0}^{M-1} Σ_{x_i, π_i} m(x_i, π_i) ln θ(x_i, π_i) - Σ_{i=0}^{M-1} Σ_{π_i} λ_{π_i} ( Σ_{x_i} θ(x_i, π_i) - 1 )
∂l(θ)/∂θ(x_i, π_i) = m(x_i, π_i)/θ(x_i, π_i) - λ_{π_i} = 0
θ(x_i, π_i) = m(x_i, π_i)/λ_{π_i}
Apply the constraint: Σ_{x_i} m(x_i, π_i)/λ_{π_i} = 1, so λ_{π_i} = Σ_{x_i} m(x_i, π_i) = m(π_i)
θ(x_i, π_i) = m(x_i, π_i)/m(π_i) — counts!
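The closed-form result θ(x_i, π_i) = m(x_i, π_i) / m(π_i) is just conditional counting; a small sketch over completely observed records (the toy data is illustrative):

```python
from collections import Counter

def estimate_cpt(data, child, parents):
    """Maximum-likelihood CPT: theta(x_i, pi_i) = m(x_i, pi_i) / m(pi_i).
    `data` is a list of dicts mapping variable name -> value."""
    joint = Counter((tuple(rec[p] for p in parents), rec[child])
                    for rec in data)
    parent_counts = Counter(tuple(rec[p] for p in parents) for rec in data)
    return {(pa, x): c / parent_counts[pa] for (pa, x), c in joint.items()}

# Toy records: does Flu predict Ache?
data = [{"Flu": "Y", "Ache": "Y"}, {"Flu": "Y", "Ache": "Y"},
        {"Flu": "Y", "Ache": "N"}, {"Flu": "N", "Ache": "N"}]
cpt = estimate_cpt(data, child="Ache", parents=["Flu"])
# p(Ache=Y | Flu=Y) = 2/3, p(Ache=N | Flu=N) = 1
```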
49 Maximum A Posteriori Training
Bayesians would never do that: the thetas need a prior. Adding a pseudo-count α to the counts gives
θ(x_i, π_i) = (m(x_i, π_i) + α) / (m(π_i) + Σ_{x_i} α)
50 Conditional Dependence Test
We can check conditional independence in a graphical model. Is achiness (x_3) independent of the flu (x_0) given fever (x_1)? Is achiness (x_3) independent of sinus infection (x_2) given fever (x_1)?
p(x) = p(x_0) p(x_1 | x_0) p(x_2 | x_0) p(x_3 | x_1) p(x_4 | x_2) p(x_5 | x_1, x_4)
p(x_3 | x_0, x_1, x_2) = p(x_0, x_1, x_2, x_3) / p(x_0, x_1, x_2) = p(x_0) p(x_1 | x_0) p(x_2 | x_0) p(x_3 | x_1) / (p(x_0) p(x_1 | x_0) p(x_2 | x_0)) = p(x_3 | x_1)
So x_3 ⊥ x_0, x_2 | x_1.
51 D-Separation and Bayes Ball
Intuition: nodes are separated, or blocked, by sets of nodes. E.g. nodes x_1 and x_2 block the path from x_0 to x_5, so x_0 is conditionally independent of x_5 given x_1 and x_2.
[Figure: the DAG over x_0 ... x_5 from slide 34]
52 Bayes Ball Algorithm
To test whether x_a ⊥ x_b | x_c:
Shade the nodes in x_c. Place a ball at each node in x_a. Bounce the balls around the graph according to a set of rules. If no balls reach x_b, then x_a and x_b are conditionally independent given x_c.
53 Ten rules of Bayes Ball Theorem 53
54 Bayes Ball Example
Is x_0 ⊥ x_4 | x_2?
[Figure: the DAG over x_0 ... x_5 from slide 34]
55 Bayes Ball Example
Is x_0 ⊥ x_5 | x_1, x_2?
[Figure: the DAG over x_0 ... x_5 from slide 34]
56 Undirected Graphs
What if we allow undirected graphs? What do they correspond to? Not cause/effect or trigger/response, but general dependence.
Example: image pixels, where each pixel is a Bernoulli: P(x_{11}, ..., x_{1M}, ..., x_{M1}, ..., x_{MM}). Bright pixels have bright neighbors. No parents, just probabilities. Grid models like this are called Markov Random Fields.
57 Undirected Graphs
[Figure: undirected graph over nodes A, B, C, D]
Undirected separability is easy: to check conditional independence of A and B given C, check graph reachability of A and B without going through nodes in C.
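The reachability test can be written as a breadth-first search that never enters the blocking set; the graph and node names below are illustrative:

```python
from collections import deque

def separated(adj, a, b, c):
    """True if every path from a to b in the undirected graph `adj`
    passes through the blocking set `c` (i.e. a is cond. ind. of b given c)."""
    blocked = set(c)
    if a in blocked or b in blocked:
        return True
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return False  # reached b without entering the blocking set
        for nb in adj.get(node, []):
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                queue.append(nb)
    return True

# A - C - B, with a side node D attached to A.
adj = {"A": ["C", "D"], "C": ["A", "B"], "B": ["C"], "D": ["A"]}
sep = separated(adj, "A", "B", ["C"])    # True: C blocks the only path
not_sep = separated(adj, "A", "B", [])   # False: the path A-C-B is open
```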
58 Next Time More fun with Graphical Models Read Chapter 8.1,
More informationMulti-Class Logistic Regression and Perceptron
Multi-Class Logistic Regression and Perceptron Instructor: Wei Xu Some slides adapted from Dan Jurfasky, Brendan O Connor and Marine Carpuat MultiClass Classification Q: what if we have more than 2 categories?
More informationGenerative and discriminative classification
Generative and discriminative classification Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Classification in its simplest form Given training data labeled for two or more classes Classification
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationSupervised Learning for Image Segmentation
Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.
More informationBayesian Networks. A Bayesian network is a directed acyclic graph that represents causal relationships between random variables. Earthquake.
Bayes Nets Independence With joint probability distributions we can compute many useful things, but working with joint PD's is often intractable. The naïve Bayes' approach represents one (boneheaded?)
More informationIntroduction to Pattern Recognition Part II. Selim Aksoy Bilkent University Department of Computer Engineering
Introduction to Pattern Recognition Part II Selim Aksoy Bilkent University Department of Computer Engineering saksoy@cs.bilkent.edu.tr RETINA Pattern Recognition Tutorial, Summer 2005 Overview Statistical
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More informationScene Grammars, Factor Graphs, and Belief Propagation
Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationMarkov Networks in Computer Vision. Sargur Srihari
Markov Networks in Computer Vision Sargur srihari@cedar.buffalo.edu 1 Markov Networks for Computer Vision Important application area for MNs 1. Image segmentation 2. Removal of blur/noise 3. Stereo reconstruction
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser
More informationIntroduction to Hidden Markov models
1/38 Introduction to Hidden Markov models Mark Johnson Macquarie University September 17, 2014 2/38 Outline Sequence labelling Hidden Markov Models Finding the most probable label sequence Higher-order
More informationIntroduction to Optimization
Introduction to Optimization Second Order Optimization Methods Marc Toussaint U Stuttgart Planned Outline Gradient-based optimization (1st order methods) plain grad., steepest descent, conjugate grad.,
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationSemi-supervised Learning
Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationSTAT 598L Learning Bayesian Network Structure
STAT 598L Learning Bayesian Network Structure Sergey Kirshner Department of Statistics Purdue University skirshne@purdue.edu November 2, 2009 Acknowledgements: some of the slides were based on Luo Si s
More informationBuilding Classifiers using Bayesian Networks
Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More information