Survey of contemporary Bayesian Network Structure Learning methods
1 Survey of contemporary Bayesian Network Structure Learning methods
Ligon Liu (CUNY), September 2015 (38 slides)
2 Bayesian Network Definition
Let V be a set of variables. A Bayesian Network is comprised of a discrete structure part and a continuous parameter part:
- structural part: a Directed Acyclic Graph (V, E), with V the random variables and E ⊆ V × V
- parameter part: the conditional probability of every variable given its parents in the DAG
Example: the Y-shaped Bayesian Network with V = {0, 1, 2, 3}, E = {(0, 2), (1, 2), (2, 3)}
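A minimal sketch of this two-part representation for the Y-shaped example (the CPT values are illustrative, not taken from the slides):

```python
# Structure part: DAG over V = {0, 1, 2, 3} with E = {(0,2), (1,2), (2,3)}
V = [0, 1, 2, 3]
E = {(0, 2), (1, 2), (2, 3)}
parents = {v: sorted(u for (u, w) in E if w == v) for v in V}

# Parameter part: P(v | Pa(v)) as tables keyed by parent assignments.
# Variables are binary; each entry is P(v = 1 | parent values).
cpt = {
    0: {(): 0.3},
    1: {(): 0.6},
    2: {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.99},
    3: {(0,): 0.2, (1,): 0.9},
}

def joint_prob(assign):
    """P(x0, x1, x2, x3) as the product of the conditional probabilities."""
    p = 1.0
    for v in V:
        p1 = cpt[v][tuple(assign[u] for u in parents[v])]
        p *= p1 if assign[v] == 1 else 1.0 - p1
    return p
```

Because the parameters are exactly the conditionals P(v | Pa(v)), the products over all joint assignments sum to 1.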
3 Bayesian Network Example
Conditional Probability Tables of the Y-shaped network: Pr(x0), Pr(x1), Pr(x2 | x0, x1), Pr(x3 | x2).
4 BN Structure Learning Problem
Given a counted indexed relation dataset (V, R, c) and a scoring function s(V, R, c, E), abbreviated s(V, E). Usually s(V, E) is required to be decomposable:
s(V, E) = Σ_{v ∈ V} S(v, Pa(v))
Find the DAG (V, E) that maximizes s(V, E).
5 Clusters of surveyed articles
- Conditional Independence (C.I.) constraint-based algorithms [16], [1], [11], [12], [20]
- Ordering-based search [10], [19], [15]
- Branch and bound [4], [14]
- Parent Graph shortest path [22], [5, 7, 6]
- Integer Linear Programming and LP-relaxation-based approximate algorithms [8], [9], [17, 18], [2, 3]
6 Conditional Independence and its testing
Definition: Let P be a distribution over variable set V and X, Y, M ⊆ V. X and Y are said to be Conditionally Independent given M, written CI(X, Y | M), if
P(X, Y | M) = P(X | M) P(Y | M)
Conditional Independence conclusions can be tested, or inferred from known conditional independences.
Testing (for discrete variables), e.g.:
- χ² test
- G² test
- Monte Carlo permutation test
Inferring, e.g. by the semi-graphoid rules:
(1) Symmetry: CI(A, B | C) ⇒ CI(B, A | C)
(2) Decomposition: CI(A, B ∪ D | C) ⇒ CI(A, B | C)
(3) Weak union: CI(A, B ∪ D | C) ⇒ CI(A, B | C ∪ D)
(4) Contraction: CI(A, B | C) and CI(A, D | B ∪ C) ⇒ CI(A, B ∪ D | C)
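The G² test above can be sketched in a few lines (a sketch only; representing `data` as a list of dicts is an assumption for illustration, not from the slides):

```python
import math
from collections import Counter

def g2_ci_test(data, x, y, cond):
    """G^2 (log-likelihood-ratio) test of x ⟂ y | cond on discrete samples.

    data: list of dicts mapping variable name -> value.
    Returns (G2 statistic, degrees of freedom); compare G2 against a
    chi-square quantile with that dof to accept or reject independence.
    """
    g2 = 0.0
    key = lambda r: tuple(r[c] for c in cond)
    strata = Counter(key(r) for r in data)          # counts per value of M
    for m, n_m in strata.items():
        sub = [r for r in data if key(r) == m]
        nxy = Counter((r[x], r[y]) for r in sub)
        nx = Counter(r[x] for r in sub)
        ny = Counter(r[y] for r in sub)
        for (vx, vy), n in nxy.items():
            expected = nx[vx] * ny[vy] / n_m        # count under independence
            g2 += 2.0 * n * math.log(n / expected)
    lx = len({r[x] for r in data})
    ly = len({r[y] for r in data})
    dof = (lx - 1) * (ly - 1) * len(strata)
    return g2, dof
```

On exactly independent counts the statistic is 0; strong dependence drives it up, which is what the test thresholds against.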
7 Conditional Independence and Bayesian Network DAG
Notation: Let (V, E) be a Directed Acyclic Graph. For v ∈ V, the parent set of v is denoted Pa(v), i.e. Pa(v) = {u | (u, v) ∈ E}.
Lemma (local Markov property): P(v | nondescendants of v) = P(v | Pa(v)).
Definition: Let (V, E) be a DAG. Vertices u, v ∈ V are said to be d-separated given M ⊆ V if every path between u and v is blocked by M: some non-collider on the path is in M, or some collider on the path has neither itself nor any of its descendants in M.
On a Bayesian Network DAG, u, v being d-separated by M implies u ⟂ v | M in the BN probability distribution.
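d-separation can be checked mechanically via the classic moralization criterion (a sketch; this equivalent formulation, via the moral graph of the ancestral subgraph, is a standard result and not spelled out on the slide):

```python
def d_separated(V, E, u, v, M):
    """u ⟂ v | M iff u and v are disconnected in the moralized graph of the
    ancestral subgraph induced by {u, v} ∪ M, after deleting M."""
    parents = {x: {a for (a, b) in E if b == x} for x in V}
    # 1. ancestral set of {u, v} ∪ M
    anc, stack = set(), [u, v, *M]
    while stack:
        x = stack.pop()
        if x not in anc:
            anc.add(x)
            stack.extend(parents[x])
    # 2. moralize: keep edges inside anc, undirected, and marry co-parents
    und = {frozenset((a, b)) for (a, b) in E if a in anc and b in anc}
    for x in anc:
        ps = [p for p in parents[x] if p in anc]
        und |= {frozenset((a, b)) for a in ps for b in ps if a != b}
    # 3. delete M, then test reachability from u to v
    seen, stack = set(), [u]
    while stack:
        x = stack.pop()
        if x in seen or x in M:
            continue
        seen.add(x)
        stack.extend(y for e in und if x in e for y in e - {x})
    return v not in seen
```

On the Y-shaped network this reproduces the expected answers: 0 and 1 are independent marginally but not given the collider 2, while 2 screens 0 off from 3.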
8 Early Conditional Independence-based algorithms
C.I.-based algorithms rest on the following facts:
- d-separation on a Bayesian Network DAG implies Conditional Independence.
- Existence of an undirected edge u — v can be inferred from at least one of many conditional independences; testing more C.I. triples (u, v, M) may increase confidence.
- C.I. tests are computationally expensive to perform on datasets, so the number of tests should be minimized.
9 Early Conditional Independence-based algorithms
- SGS (the first C.I.-based algorithm to learn BNs)
- PC (PC*, Stable-PC, and Conservative-PC)
- Grow-Shrink, IAMB, SRS
10 PC Algorithm brief
PC is an iterative algorithm to learn a Bayesian Network from C.I. tests, with the graph edges E as a variable:
- Start with the complete undirected graph.
- Edge elimination: for each pair of variables u, v, do C.I. tests, starting with the size-0 (unconditional) set Ø, then size-1 condition sets {i}, {j}, ..., then size-2 condition sets {i, j}, {i, k}, {j, k}, ..., and larger condition sets, until a conditional independence u ⟂ v | M with M ⊆ V ∖ {u, v} is found. Eliminate any edge between two variables that are conditionally independent given any condition set. For any pair of variables, the PC algorithm only tests condition sets drawn from variables on some path between the pair.
- Direct the edges by the unshielded collider rule and the loop removal rule.
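The edge-elimination phase above can be sketched as follows (a sketch; `ci_test(u, v, M)` is an assumed supplied predicate, e.g. a chi-square test on data or a d-separation oracle):

```python
from itertools import combinations

def pc_skeleton(variables, ci_test, max_cond=None):
    """Edge-elimination (skeleton) phase of the PC algorithm.

    Returns the adjacency sets of the undirected skeleton and the
    separating set found for each removed edge."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    size = 0
    while any(len(adj[u]) - 1 >= size for u in variables):
        for u in variables:
            for v in list(adj[u]):
                # condition only on current neighbours of u, excluding v
                for M in combinations(sorted(adj[u] - {v}), size):
                    if ci_test(u, v, set(M)):
                        adj[u].discard(v)
                        adj[v].discard(u)
                        sepset[frozenset((u, v))] = set(M)
                        break
        size += 1
        if max_cond is not None and size > max_cond:
            break
    return adj, sepset
```

Restricting candidate condition sets to current neighbours is what keeps PC polynomial on sparse graphs, as the next slides note.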
11 Edge direction rules
Unshielded Collider Rule: if two variables u, w are not directly connected but are connected as u → v — w, orient v — w as v → w, to avoid forming the unshielded collider u → v ← w.
Loop Removal Rule: if two variables u and v are connected both by an undirected edge and by a directed path from u to v, orient the undirected edge as u → v.
12 Advantages of PC based algorithms
1. Fast: on sparse graphs, the computation time of PC is polynomial.
2. Compared to SGS, C.I. constraint propagation by the semi-graphoid rules saves many C.I. tests. In addition, if a parallel machine is available, it is possible to do redundant C.I. tests to improve confidence [1].
3. By testing smaller condition sets M first, the conditional independence tests have higher confidence on high-dimensional datasets.
13 Robustness of PC based algorithms
The robustness of C.I.-based algorithms has been doubted by researchers [11], [1]. Factors that undermine the robustness of PC on high-dimensional datasets:
- Sampling loss: when the marginal dataset is relatively small with respect to the graph complexity, local C.I. tests are usually less accurate; this is also called non-faithfulness of the C.I. relations to the distribution.
- C.I. testing order: when earlier independence tests happen to have lower confidence, they can prevent tests that would generate higher-confidence contradictory C.I. results.
Two algorithms, Conservative-PC [11] and Stable-PC [1], were invented to overcome the instability over C.I. testing order. They use redundant C.I. testing to detect unfaithfulness and a voting mechanism to find the most likely C.I.
14 Markov blanket
Definition: Let (V, E) be a DAG. The Markov Blanket of v ∈ V, denoted MB(v), is the set of vertices not d-separated from v by any variable set, i.e. the set of nodes composed of v's parents, children, and children's parents in the DAG.
Theorem: v is d-separated from V ∖ ({v} ∪ MB(v)) by MB(v).
Definition: Let (V, E) be a DAG. The Moral Graph (V, F) of (V, E) is formed by connecting nodes that have a common child and then making all edges undirected, i.e.
F = {{u, v} | (u, v) ∈ E or (v, u) ∈ E or ∃w : (u, w) ∈ E and (v, w) ∈ E}
15 Markov blanket
Corollary: Let (V, E) be a DAG and (V, F) its Moral Graph. The Markov Blanket of v ∈ V is the set of neighbors of v in (V, F).
(Figure: a DAG, its Moral Graph, and the Markov Blanket of E.)
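The corollary gives a direct way to compute Markov blankets when the DAG is known (a minimal sketch):

```python
def moral_graph(V, E):
    """Moral graph of a DAG: undirect every edge and marry co-parents."""
    F = {frozenset((u, v)) for (u, v) in E}
    parents = {x: {a for (a, b) in E if b == x} for x in V}
    for x in V:
        F |= {frozenset((a, b)) for a in parents[x]
              for b in parents[x] if a != b}
    return F

def markov_blanket(V, E, v):
    """By the corollary, MB(v) is v's neighbour set in the moral graph."""
    return {w for e in moral_graph(V, E) if v in e for w in e - {v}}
```

On the Y-shaped network, MB(2) is all of {0, 1, 3} (parents plus child), while MB(0) is {1, 2} (its child 2 and co-parent 1, married by moralization).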
16 Grow-Shrink and IAMB algorithms
Definition: Let (V, E) be a directed graph. The Markov Blanket of v ∈ V is the set of vertices not d-separated from v by any variable set, i.e. the set of nodes composed of v's parents, children, and children's parents. Equivalently, a Markov Blanket M is a minimal subset of V satisfying
∀U ⊆ V ∖ ({v} ∪ M) : v ⟂ U | M
Observation: finding every variable's Markov Blanket is equivalent to finding the DAG's Moral Graph.
- Grow-Shrink algorithm
- IAMB algorithm: greedy ordering of the condition sets of Grow-Shrink
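Grow-Shrink can be sketched in two phases (a sketch; `ci_test(u, v, M)` is an assumed supplied predicate accepting u ⟂ v | M):

```python
def grow_shrink(v, variables, ci_test):
    """Grow-Shrink Markov-blanket discovery."""
    mb = set()
    # Grow: add any variable still dependent on v given the current blanket.
    changed = True
    while changed:
        changed = False
        for u in variables:
            if u != v and u not in mb and not ci_test(u, v, mb):
                mb.add(u)
                changed = True
    # Shrink: remove false positives made independent by later additions.
    for u in list(mb):
        if ci_test(u, v, mb - {u}):
            mb.discard(u)
    return mb
```

IAMB keeps the same grow/shrink skeleton but greedily orders the candidates in the grow phase to admit fewer false positives.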
17 Multiple Markov Blankets
Since the Bayesian Network decomposition is usually not unique for a data distribution, one variable may have multiple different Markov Blankets: both M1 and M2 may satisfy
∀U ⊆ V ∖ ({v} ∪ M) : v ⟂ U | M
Definition: Let (V, E) be a directed graph. A variable u ∈ V is called Strongly Relevant to v ∈ V if and only if
∀S ⊆ V ∖ {v, u} : P(v | S) ≠ P(v | S ∪ {u})
A variable u ∈ V is called Weakly Relevant to v ∈ V if and only if
∃S ⊆ V ∖ {v, u} : P(v | S) ≠ P(v | S ∪ {u})
and u is not Strongly Relevant to v.
18 Selection via Representative Sets (SRS) Algorithm
Definition: Let V be the variable set. A representative set of v ∈ V consists of a variable u in v's Markov blanket together with u's corresponding correlated features.
Proposition: u is strongly relevant to v if and only if u belongs to the set of parents and children of variable v in a faithful Bayesian Network.
SRS Algorithm:
Step 1: G_v ← GetPC(v) (PC meaning Parent & Child); for each u in G_v: G_u ← {u} ∪ GetPC(u)
Step 2: Search for a group of strongly relevant variables' Parent-Child sets {G_i} ⊆ {G_u | u ∈ SR(v)} such that ∪_i G_i is a best Markov Blanket under the given measure.
19 Decomposable Scoring Function
Let V be the variables and s(V, E) a scoring function mapping edge sets E ⊆ V × V to R. s is called decomposable if and only if
s(V, E) = Σ_{v ∈ V} S(v, Pa(v)), where Pa(v) = {u | (u, v) ∈ E}
Commonly used decomposable scoring functions: Log-Likelihood (AIC, BIC), BD (BDe, BDeu).
BN learning is then defined as finding the E for V that maximizes s(V, E).
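A local BIC score can be sketched as follows (a sketch for discrete data as a list of dicts, an assumed representation; the network score is the sum of these local terms, per the decomposability equation above):

```python
import math
from collections import Counter

def bic_local_score(data, v, parents):
    """Decomposable BIC local score S(v, Pa(v)) for discrete data."""
    n = len(data)
    # counts of (parent configuration, value of v) and of configurations
    joint = Counter((tuple(r[p] for p in parents), r[v]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    # penalty: (|states of v| - 1) free parameters per parent configuration
    r_v = len({r[v] for r in data})
    q = 1
    for p in parents:
        q *= len({r[p] for r in data})
    penalty = 0.5 * math.log(n) * (r_v - 1) * q
    return loglik - penalty
```

When a parent genuinely determines v, the likelihood gain outweighs the penalty and the larger parent set scores higher.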
20 Ordering Search
Given an order O of the variables, if the scoring function s is decomposable, the best DAG consistent with O can be found in time polynomial in the number of variables, simply by finding the best parents among earlier-ordered variables for each variable, from sink to source [15].
Modern ordering search algorithms use propagation of constraints, inferred from the scoring function's properties and from background knowledge, to reduce the search space.
Algorithms: branch and bound search, A* heuristic search.
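The "best DAG given an ordering" step can be sketched directly (a sketch; `local_score(v, parents)` is an assumed supplied decomposable score, and the exponential scan over predecessor subsets is shown only for small candidate sets):

```python
from itertools import combinations

def best_dag_for_order(order, local_score, max_parents=None):
    """Best DAG consistent with a variable ordering, for a decomposable score.

    Each variable independently picks its best parent set among its
    predecessors in the order; decomposability makes these choices
    independent of one another."""
    dag = {}
    for i, v in enumerate(order):
        preds = order[:i]
        limit = len(preds) if max_parents is None else min(max_parents, len(preds))
        candidates = [set(c) for k in range(limit + 1)
                      for c in combinations(preds, k)]
        dag[v] = max(candidates, key=lambda P: local_score(v, frozenset(P)))
    return dag
```

Acyclicity is automatic: every edge points from an earlier variable to a later one.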
21 Dynamic Programming of Parent Sets
Lemma [13]: Let v ∈ V and Q ⊆ V with v ∉ Q. Then
max_{P ⊆ Q} S(v, P) = max( S(v, Q), max_{u ∈ Q} max_{P ⊆ Q ∖ {u}} S(v, P) )
This enables DP propagation of argmax_{P ⊆ Q} S(v, P) over all subsets Q of V ∖ {v}. This is the step all dynamic programming algorithms use to get the optimal parent sets of every variable.
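The lemma translates into a subset-lattice DP (a sketch; `local_score` is an assumed supplied S(v, P)):

```python
from itertools import combinations

def best_parent_tables(v, others, local_score):
    """best[Q] = (max_{P ⊆ Q} S(v, P), its argmax) for every Q ⊆ others.

    Per the lemma, the best over Q is the better of S(v, Q) itself and
    the best over each Q ∖ {u}."""
    best = {frozenset(): (local_score(v, frozenset()), frozenset())}
    for k in range(1, len(others) + 1):
        for Q in map(frozenset, combinations(others, k)):
            score, parents = local_score(v, Q), Q
            for u in Q:
                sub_score, sub_parents = best[Q - {u}]
                if sub_score > score:
                    score, parents = sub_score, sub_parents
            best[Q] = (score, parents)
    return best
```

Processing subsets in order of size guarantees every `best[Q - {u}]` is already available.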
22 The 2006 ordering search algorithm [15]
Variable set: V = {1, ..., n}; variable v's parent candidate set: Pa(v) ⊆ V ∖ {v}.
1. Calculate the local scores for all n · 2^{n−1} different (v, Pa(v)) pairs: [s(v, Pa(v)) | v ∈ V, Pa(v) ⊆ V ∖ {v}].
2. For every v and every candidate set G ⊆ V ∖ {v}, find the best parent set within G: bp(v, G) = argmax_{P ⊆ G} s(v, P).
3. Find the best sink for all 2^n variable subsets: sink(W) = argmax_{s ∈ W} score(W, s), W ⊆ V.
4. Using the best sinks, find a best ordering of the variables: O_i = sink(V ∖ ∪_{j=i+1}^{n} {O_j}).
5. Compute the best network from the best parents, best sinks, and best ordering.
23 Ordering by Sink Score
Lemma [15]: Let W ⊆ V. k is the last variable (called the sink) in an optimal order of W if and only if
k = argmax_{k ∈ W} ( max_{P ⊆ W ∖ {k}} S(k, P) + s*(W ∖ {k}) )
where s*(W ∖ {k}) denotes the optimal score of a network over W ∖ {k}. This enables DP computation of the optimal sink. The quantity max_{P ⊆ W ∖ {k}} S(k, P) + s*(W ∖ {k}) is called the sink score.
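The sink-score recursion gives another subset DP (a sketch; `best_parent_score(v, W)` = max over P ⊆ W of S(v, P), assumed precomputed, e.g. by the parent-set DP of the previous slide):

```python
from itertools import combinations

def optimal_sinks(variables, best_parent_score):
    """DP over subsets using the sink-score lemma.

    Returns, for every subset W, its optimal network score opt[W] and best
    sink sink[W], plus an optimal ordering (sources first)."""
    opt = {frozenset(): 0.0}
    sink = {}
    for k in range(1, len(variables) + 1):
        for W in map(frozenset, combinations(variables, k)):
            score, best = max(
                (best_parent_score(v, W - {v}) + opt[W - {v}], v)
                for v in W)
            opt[W], sink[W] = score, best
    # unwind the sinks into an ordering
    order, W = [], frozenset(variables)
    while W:
        order.append(sink[W])
        W = W - {sink[W]}
    order.reverse()
    return opt, sink, order
```
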
24 Example of Optimal Parents and Optimal Sink (graphic example to be added)
25 Optimization techniques
- AD-Tree [19]
- If U′ ⊂ U and s(v, U′) ≥ s(v, U), remove U from the candidates [19]
- Partition parent sets by size: reduces the space to 2^n (3/4)^p n^{O(1)}, where p is the parent degree bound [14]
26 Structural constraints
Optimal DAGs under common scoring functions (MDL, BDeu) have common structural constraints [4] that can be used to prune:
- Hard limits on in-degrees. Corollary: using BIC or AIC as the criterion, the optimal graph (V, E) has at most log₂ N parents per node.
- Upper bounds on the optimal parent set score (various heuristics).
- Upper bounds on the optimal parent set.
27 Upper bound of optimal parent set [4]
Theorem: Let N be the total count of (V, R, c) and L_U = ∏_{u ∈ U} L_u. With BIC or AIC as the score function, if L_{Pa(v)} > N w log(L_v) / L_v, then any proper superset of Pa(v) is not the parent set of vertex v in an optimal structure.
28 Upper bound of optimal parent set score [22, 21]
Theorem: Given a BD score S and two parent sets Pa′(v) and Pa(v) for a node v such that Pa′(v) ⊂ Pa(v), if S(v, Pa′(v)) exceeds an upper bound computed from the instantiation counts K_vj and the hyperparameters α_vjk of Pa(v), then Pa(v) is not an optimal parent set of v.
29 Order Graph
Definition: Let V = {1, ..., n} be the index set of the variables. The order graph is the graph whose vertex set is the powerset P(V) and whose edge set is {(X, Y) | X, Y ∈ P(V), X ⊂ Y, |X| + 1 = |Y|}. Obviously, any order graph is a DAG.
Example: the order graph of V = {1, 2, 3, 4}. (Figure omitted.)
30 Shortest Path Formulation of the Optimal Parent Set Problem
Let S(v, Pa(v)) be the scoring function term for v ∈ V and its parents. Finding the optimal BN is equivalent to finding a shortest path on the order graph from Ø to V, if we define the length of edge (X, Y) to be
d(X, Y) = min_{Pa ⊆ X} S(Y ∖ X, Pa)
Advantages:
- Shortest path on a directed graph has well-studied algorithms (Dijkstra, BFBnB, A*, etc.)
- Generally does not require pre-generation of the whole graph; vertices and edges can be computed dynamically.
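The formulation can be sketched with plain Dijkstra on the order graph (a sketch; `cost(v, X)` is the assumed-supplied, non-negative edge length, i.e. the negated best local score of v with parents drawn from X; order-graph vertices are generated on the fly, as the slide notes):

```python
import heapq

def shortest_path_bn(variables, cost):
    """Dijkstra from Ø to the full variable set on the order graph.

    Returns the total cost and an order in which the variables were added
    (each variable's parents come from the set already present)."""
    start = ()
    goal = tuple(sorted(variables))
    dist = {start: (0.0, [])}          # subset -> (distance, addition order)
    heap = [(0.0, start)]
    while heap:
        d, X = heapq.heappop(heap)
        if X == goal:
            return d, dist[X][1]
        if d > dist[X][0]:
            continue                   # stale heap entry
        present = set(X)
        for v in variables:
            if v in present:
                continue
            Y = tuple(sorted(present | {v}))
            nd = d + cost(v, present)
            if Y not in dist or nd < dist[Y][0]:
                dist[Y] = (nd, dist[X][1] + [v])
                heapq.heappush(heap, (nd, Y))
    return None
```
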
31 Shortest Path Example: shortest path on the order graph ⟷ optimal parent set (example to be added)
32 A* best-first search algorithm
A heuristics-enhanced variation of Dijkstra's algorithm that uses a priority function to decide the next step of the search: finding the shortest path from vertex x to y on (V, E), with the length d(u, v) of each edge (u, v) ∈ E computable at a fixed time cost. The priority function on vertex v ∈ V is
f(v) = d(x, v) + h(v, y)
where d(x, v) is the already-computed distance from x to v and h(v, y) is the heuristically estimated distance from v to y.
33 One A* heuristic function for d(X, Y) = min_{Pa ⊆ X} S(Y ∖ X, Pa)
Definition: Let (V, E) be the order graph of variable set V and U ⊆ V. The heuristic function used in [22], denoted h(U), is defined by
h(U) = Σ_{v ∈ V ∖ U} min_{Pa ⊆ V ∖ {v}} S({v}, Pa)
Remark: h(U) is obtained by using the best parent set for each vertex in V ∖ U, regardless of whether the resulting graph is a DAG.
Theorem: h(U) is monotonic. [22]
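This relaxation is easy to sketch (a sketch; `cost(v, Pa)` is the assumed-supplied non-negative local cost being minimized; the brute-force subset scan stands in for the precomputed best-parent tables a real implementation would use):

```python
from itertools import chain, combinations

def heuristic_h(U, variables, cost):
    """h(U): every variable still outside U takes its best-scoring parent set
    among ALL other variables, ignoring acyclicity, so h never overestimates
    the true remaining cost (it is admissible)."""
    total = 0.0
    for v in (w for w in variables if w not in U):
        others = [u for u in variables if u != v]
        subsets = chain.from_iterable(
            combinations(others, k) for k in range(len(others) + 1))
        total += min(cost(v, frozenset(s)) for s in subsets)
    return total
```
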
34 Integer-valued Multisets (imsets)
Definition: Let V be a set of integers. An integer-valued multiset (imset) is a mapping from P(V) to the set of integers Z.
Example: an imset u over V = {a, b, c}:
u = δ_{b} − δ_{a,b} − δ_{b,c} + δ_{a,b,c}
(δ is the Kronecker delta imset defined on the following slide.)
35 Arithmetic notation of imsets
Definition: Let V be a set of integers and U ⊆ V. The Kronecker delta imset of U, denoted δ_U, is defined by
δ_U(X) = 1 if X = U, and δ_U(X) = 0 if X ≠ U
Definition: Let V be a set of integers and a, b imsets P(V) → Z. Then
(a + b)(X) = a(X) + b(X)
and similarly for subtraction.
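Imsets are naturally represented as sparse mappings (a sketch, using `frozenset` keys as an assumed encoding of subsets of V):

```python
def delta(U):
    """Kronecker delta imset δ_U as a sparse mapping P(V) -> Z."""
    return {frozenset(U): 1}

def imset_add(a, b, sign=1):
    """(a ± b)(X) = a(X) ± b(X), dropping zero entries."""
    out = dict(a)
    for X, k in b.items():
        out[X] = out.get(X, 0) + sign * k
        if out[X] == 0:
            del out[X]
    return out
```

The example imset of the previous slide is then built by three additions.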
36 DAG to {0,1} to [0,1]
Family Variable Vector: {φ_{vU} = 1 if Pa(v) = U, 0 otherwise}
Standard Imset: u_{(V,E)} = δ_V − δ_Ø + Σ_{v ∈ V} (δ_{Pa(v)} − δ_{{v} ∪ Pa(v)})
Characteristic Imset: c_{(V,E)}(U) = 1 − Σ_{W : U ⊆ W ⊆ V} u_{(V,E)}(W)
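The standard imset of a DAG follows directly from its formula (a sketch, with the imset again stored as a sparse dict):

```python
def standard_imset(V, parents):
    """Standard imset u_G = δ_V − δ_Ø + Σ_v (δ_{Pa(v)} − δ_{{v}∪Pa(v)})."""
    u = {}
    def bump(X, k):
        X = frozenset(X)
        u[X] = u.get(X, 0) + k
        if u[X] == 0:
            del u[X]          # keep the mapping sparse
    bump(V, 1)
    bump(set(), -1)
    for v in V:
        bump(parents[v], 1)
        bump(set(parents[v]) | {v}, -1)
    return u
```

The coefficients of any standard imset sum to zero, since each δ appears once with +1 and once with −1.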
37 Linear Program of the Family Variable Vector
Family Variable Vector: {φ_{vU} = 1 if Pa(v) = U, 0 otherwise}
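One common ILP over the family variable vector can be sketched as follows (a sketch in the spirit of the cluster-constraint ILP approaches cited in the cluster list; the slide itself does not spell the program out):

```latex
\max_{\phi} \;\; \sum_{v \in V} \sum_{U \subseteq V \setminus \{v\}} s(v, U)\, \phi_{vU}
\quad \text{s.t.} \quad
\sum_{U \subseteq V \setminus \{v\}} \phi_{vU} = 1 \quad \forall v \in V,
\]
\[
\sum_{v \in C} \; \sum_{U : U \cap C = \emptyset} \phi_{vU} \ge 1
\quad \forall\, \emptyset \ne C \subseteq V,
\qquad
\phi_{vU} \in \{0, 1\}.
```

The first constraint makes each variable pick exactly one parent set; the cluster constraints enforce acyclicity (every cluster C contains some vertex with no parent inside C). The LP relaxation of the title replaces {0, 1} by [0, 1].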
38 References
[1] Diego Colombo and Marloes H. Maathuis. Order-independent constraint-based causal structure learning. The Journal of Machine Learning Research, 15(1).
[2] James Cussens. Integer programming for Bayesian network structure learning.
[3] James Cussens, David Haws, and Milan Studeny. Polyhedral aspects of score equivalence in Bayesian network structure learning. arXiv preprint.
[4] Cassio P. de Campos and Qiang Ji. Efficient structure learning of Bayesian networks using constraints. The Journal of Machine Learning Research, 12.
[5] Xiannian Fan, Brandon Malone, and Changhe Yuan. Finding optimal Bayesian network structures with constraints learned from data.
More informationA Note on Polyhedral Relaxations for the Maximum Cut Problem
A Note on Polyhedral Relaxations for the Maximum Cut Problem Alantha Newman Abstract We consider three well-studied polyhedral relaxations for the maximum cut problem: the metric polytope of the complete
More informationBayesian Machine Learning - Lecture 6
Bayesian Machine Learning - Lecture 6 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 2, 2015 Today s lecture 1
More informationLecture 3: Graphs and flows
Chapter 3 Lecture 3: Graphs and flows Graphs: a useful combinatorial structure. Definitions: graph, directed and undirected graph, edge as ordered pair, path, cycle, connected graph, strongly connected
More informationFinding Optimal Bayesian Network Given a Super-Structure
Journal of Machine Learning Research 9 (2008) 2251-2286 Submitted 12/07; Revised 6/08; Published 10/08 Finding Optimal Bayesian Network Given a Super-Structure Eric Perrier Seiya Imoto Satoru Miyano Human
More informationV8 Molecular decomposition of graphs
V8 Molecular decomposition of graphs - Most cellular processes result from a cascade of events mediated by proteins that act in a cooperative manner. - Protein complexes can share components: proteins
More informationNotes on Minimum Cuts and Modular Functions
Notes on Minimum Cuts and Modular Functions 1 Introduction The following are my notes on Cunningham s paper [1]. Given a submodular function f and a set S, submodular minimisation is the problem of finding
More information6. Lecture notes on matroid intersection
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm
More informationLearning Equivalence Classes of Bayesian-Network Structures
Journal of Machine Learning Research 2 (2002) 445-498 Submitted 7/01; Published 2/02 Learning Equivalence Classes of Bayesian-Network Structures David Maxwell Chickering Microsoft Research One Microsoft
More informationApproximation slides 1. An optimal polynomial algorithm for the Vertex Cover and matching in Bipartite graphs
Approximation slides 1 An optimal polynomial algorithm for the Vertex Cover and matching in Bipartite graphs Approximation slides 2 Linear independence A collection of row vectors {v T i } are independent
More informationOn the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes
On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes Mary Melekopoglou Anne Condon Computer Sciences Department University of Wisconsin - Madison 0 West Dayton Street Madison,
More informationMesh segmentation. Florent Lafarge Inria Sophia Antipolis - Mediterranee
Mesh segmentation Florent Lafarge Inria Sophia Antipolis - Mediterranee Outline What is mesh segmentation? M = {V,E,F} is a mesh S is either V, E or F (usually F) A Segmentation is a set of sub-meshes
More information1 Linear programming relaxation
Cornell University, Fall 2010 CS 6820: Algorithms Lecture notes: Primal-dual min-cost bipartite matching August 27 30 1 Linear programming relaxation Recall that in the bipartite minimum-cost perfect matching
More informationRandomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.
Randomized rounding of semidefinite programs and primal-dual method for integer linear programming Dr. Saeedeh Parsaeefard 1 2 3 4 Semidefinite Programming () 1 Integer Programming integer programming
More informationLecture 6: Linear Programming for Sparsest Cut
Lecture 6: Linear Programming for Sparsest Cut Sparsest Cut and SOS The SOS hierarchy captures the algorithms for sparsest cut, but they were discovered directly without thinking about SOS (and this is
More informationCOMP 251 Winter 2017 Online quizzes with answers
COMP 251 Winter 2017 Online quizzes with answers Open Addressing (2) Which of the following assertions are true about open address tables? A. You cannot store more records than the total number of slots
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationCS242: Probabilistic Graphical Models Lecture 2B: Loopy Belief Propagation & Junction Trees
CS242: Probabilistic Graphical Models Lecture 2B: Loopy Belief Propagation & Junction Trees Professor Erik Sudderth Brown University Computer Science September 22, 2016 Some figures and materials courtesy
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Markov Random Fields: Inference Exact: VE Exact+Approximate: BP Readings: Barber 5 Dhruv Batra
More informationMeasures of Clustering Quality: A Working Set of Axioms for Clustering
Measures of Clustering Quality: A Working Set of Axioms for Clustering Margareta Ackerman and Shai Ben-David School of Computer Science University of Waterloo, Canada Abstract Aiming towards the development
More informationHybrid Correlation and Causal Feature Selection for Ensemble Classifiers
Hybrid Correlation and Causal Feature Selection for Ensemble Classifiers Rakkrit Duangsoithong and Terry Windeatt Centre for Vision, Speech and Signal Processing University of Surrey Guildford, United
More informationV,T C3: S,L,B T C4: A,L,T A,L C5: A,L,B A,B C6: C2: X,A A
Inference II Daphne Koller Stanford University CS228 Handout #13 In the previous chapter, we showed how efficient inference can be done in a BN using an algorithm called Variable Elimination, that sums
More informationLearning Bayesian networks with ancestral constraints
Learning Bayesian networks with ancestral constraints Eunice Yuh-Jie Chen and Yujia Shen and Arthur Choi and Adnan Darwiche Computer Science Department University of California Los Angeles, CA 90095 {eyjchen,yujias,aychoi,darwiche}@cs.ucla.edu
More informationCS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018
CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved.
More informationChapter 2 PRELIMINARIES. 1. Random variables and conditional independence
Chapter 2 PRELIMINARIES In this chapter the notation is presented and the basic concepts related to the Bayesian network formalism are treated. Towards the end of the chapter, we introduce the Bayesian
More informationLecture Note: Computation problems in social. network analysis
Lecture Note: Computation problems in social network analysis Bang Ye Wu CSIE, Chung Cheng University, Taiwan September 29, 2008 In this lecture note, several computational problems are listed, including
More informationNon-convex Multi-objective Optimization
Non-convex Multi-objective Optimization Multi-objective Optimization Real-world optimization problems usually involve more than one criteria multi-objective optimization. Such a kind of optimization problems
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2016 Lecturer: Trevor Cohn 21. Independence in PGMs; Example PGMs Independence PGMs encode assumption of statistical independence between variables. Critical
More informationOn Local Optima in Learning Bayesian Networks
On Local Optima in Learning Bayesian Networks Jens D. Nielsen, Tomáš Kočka and Jose M. Peña Department of Computer Science Aalborg University, Denmark {dalgaard, kocka, jmp}@cs.auc.dk Abstract This paper
More informationGraphical Models and Markov Blankets
Stephan Stahlschmidt Ladislaus von Bortkiewicz Chair of Statistics C.A.S.E. Center for Applied Statistics and Economics Humboldt-Universität zu Berlin Motivation 1-1 Why Graphical Models? Illustration
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning
More information22 Elementary Graph Algorithms. There are two standard ways to represent a
VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph
More informationDesigning robust network topologies for wireless sensor networks in adversarial environments
Designing robust network topologies for wireless sensor networks in adversarial environments Aron Laszka a, Levente Buttyán a, Dávid Szeszlér b a Department of Telecommunications, Budapest University of
More informationFaster parameterized algorithms for Minimum Fill-In
Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht
More informationCMPSCI 311: Introduction to Algorithms Practice Final Exam
CMPSCI 311: Introduction to Algorithms Practice Final Exam Name: ID: Instructions: Answer the questions directly on the exam pages. Show all your work for each question. Providing more detail including
More informationBelief propagation in a bucket-tree. Handouts, 275B Fall Rina Dechter. November 1, 2000
Belief propagation in a bucket-tree Handouts, 275B Fall-2000 Rina Dechter November 1, 2000 1 From bucket-elimination to tree-propagation The bucket-elimination algorithm, elim-bel, for belief updating
More informationTopic: Local Search: Max-Cut, Facility Location Date: 2/13/2007
CS880: Approximations Algorithms Scribe: Chi Man Liu Lecturer: Shuchi Chawla Topic: Local Search: Max-Cut, Facility Location Date: 2/3/2007 In previous lectures we saw how dynamic programming could be
More informationMVE165/MMG630, Applied Optimization Lecture 8 Integer linear programming algorithms. Ann-Brith Strömberg
MVE165/MMG630, Integer linear programming algorithms Ann-Brith Strömberg 2009 04 15 Methods for ILP: Overview (Ch. 14.1) Enumeration Implicit enumeration: Branch and bound Relaxations Decomposition methods:
More informationNumber Theory and Graph Theory
1 Number Theory and Graph Theory Chapter 6 Basic concepts and definitions of graph theory By A. Satyanarayana Reddy Department of Mathematics Shiv Nadar University Uttar Pradesh, India E-mail: satya8118@gmail.com
More informationSteiner Trees and Forests
Massachusetts Institute of Technology Lecturer: Adriana Lopez 18.434: Seminar in Theoretical Computer Science March 7, 2006 Steiner Trees and Forests 1 Steiner Tree Problem Given an undirected graph G
More informationSub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
Journal of Machine Learning Research 14 (2013) 1563-1603 Submitted 11/10; Revised 9/12; Published 6/13 Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion Rami Mahdi
More informationStrongly Connected Components. Andreas Klappenecker
Strongly Connected Components Andreas Klappenecker Undirected Graphs An undirected graph that is not connected decomposes into several connected components. Finding the connected components is easily solved
More informationCS200: Graphs. Prichard Ch. 14 Rosen Ch. 10. CS200 - Graphs 1
CS200: Graphs Prichard Ch. 14 Rosen Ch. 10 CS200 - Graphs 1 Graphs A collection of nodes and edges What can this represent? n A computer network n Abstraction of a map n Social network CS200 - Graphs 2
More informationProbabilistic Graphical Models
Overview of Part One Probabilistic Graphical Models Part One: Graphs and Markov Properties Christopher M. Bishop Graphs and probabilities Directed graphs Markov properties Undirected graphs Examples Microsoft
More informationOptimization I : Brute force and Greedy strategy
Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean
More information