Integer Programming for Bayesian Network Structure Learning

Quality Technology & Quantitative Management, Vol. 11, No. 1, 2014

Integer Programming for Bayesian Network Structure Learning

James Cussens *
Department of Computer Science and York Centre for Complex Systems Analysis, University of York, York, UK
(Received July 2013, accepted December 2013)

Abstract: Bayesian networks provide an attractive representation of structured probabilistic information. There is thus much interest in 'learning' BNs from data. In this paper an integer programming approach to the problem of learning a Bayesian network from data is presented. The SCIP (Solving Constraint Integer Programs) framework is used to do this. Although cutting planes are a key ingredient in our approach, primal heuristics and efficient propagation are also important.

Keywords: Bayesian networks, integer programming, machine learning.

1. Introduction

A Bayesian network (BN) represents a probability distribution over a finite number of random variables. In this paper, unless specified otherwise, it will be assumed that all random variables are discrete. A BN has two components: an acyclic directed graph (DAG) representing qualitative aspects of the distribution, and a set of parameters. Figure 1 presents the structure of the famous 'Asia' BN which was introduced by Lauritzen and Spiegelhalter [16]. This BN has 8 random variables A, T, X, E, L, D, S and B. The BN represents an imagined probabilistic medical 'expert system' where A = visit to Asia, T = Tuberculosis, X = Normal X-Ray result, E = Either tuberculosis or lung cancer, L = Lung cancer, D = Dyspnea (shortness of breath), S = Smoker and B = Bronchitis. Each of these random variables has two values: TRUE (t) and FALSE (f).

A joint probability distribution for these 8 random variables must specify a probability for each of the 2^8 joint instantiations of these random variables. To specify these 2^8 probabilities some parameters are needed. To explain what these parameters are, some terminology is now introduced. In a BN, if there is an arrow from node X to node Y we say that X is a parent of Y (and that Y is a child of node X). The parameters of a BN are defined in terms of the set of parents each node has. They are conditional probability tables (CPTs), one for each random variable, which specify a distribution for the random variable for each possible joint instantiation of its parents. So, for example, the CPT for D in Figure 1 could be:

    P(D=t | B=f, E=f) = 0.3    P(D=f | B=f, E=f) = 0.7
    P(D=t | B=f, E=t) = 0.4    P(D=f | B=f, E=t) = 0.6
    P(D=t | B=t, E=f) = 0.5    P(D=f | B=t, E=f) = 0.5
    P(D=t | B=t, E=t) = 1.0    P(D=f | B=t, E=t) = 0.0

* Corresponding author. E-mail: james.cussens@york.ac.uk

Note that deterministic relations can be represented using 0 and 1 values for probabilities. If a random variable has no parents (like A and S in Figure 1) an unconditional probability distribution is defined for its values. For example the CPT for A might be P(A=t) = 0.1, P(A=f) = 0.9.

Figure 1. An 8-node DAG which is the structure of a BN (the 'Asia' BN [16]) with random variables A, T, X, E, L, D, S, and B.

The probability of any joint instantiation of the random variables is given by multiplying the relevant conditional probabilities found in the CPTs. Although there are 2^8 such joint instantiations for the BN in Figure 1, the number of parameters of the BN is far fewer, so that BNs provide a compact representation. They can do this since the BN structure encodes conditional independence assumptions about the random variables. A full account of this will not be given here: the interested reader should consult Koller and Friedman's excellent book on probabilistic graphical models [15]. However the basic idea is that if a node (or collection of nodes) V3 'blocks' a path in the graph between two other nodes V1 and V2, then V1 is independent of V2 given V3 (Koller and Friedman [15] provide a proper definition of what it means to 'block' a path). So for example, in Figure 1 A is dependent on E, D and X, but it is independent of these random variables given T. To put it informally: knowing about A tells you something about E, D and X, but once you know the value of T, A provides no information about E, D or X---it is only via T that A provides information about E, D or X.

The graph allows one to 'read off' these relationships between the variables. For example, recall that in the 'Asia' BN in Figure 1, S = Smoker, L = Lung cancer, B = Bronchitis, and D = Dyspnea (shortness of breath). The structure of the graph tells us that smoking influences dyspnea, but it only does so as a result of lung cancer or bronchitis. Such structural information can provide considerable insight, but this raises the question of how it can be reliably obtained. Two main approaches are taken. In the first a domain expert is asked to provide the structure. There are, of course, many problems with such a 'manual' approach: experts' time is expensive, experts may disagree and make mistakes, and any expert used would have to first understand the semantics of BNs. An appealing alternative is to infer BN structure directly from data. Any data which can be viewed as having been sampled from some unknown joint probability distribution is appropriate. The goal is to learn a BN structure for this unknown distribution. For example, supposing again that the BN in Figure 1 is a medical expert system, it could be inferred from a database (single table) of patient records, where for each patient there is a field recording whether they smoke, have lung cancer, have bronchitis, suffer from dyspnea, etc.
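As a concrete illustration of the factorisation described above, the following minimal sketch encodes the Asia structure of Figure 1 and multiplies the relevant CPT entries. Only the CPTs for A and D were given in the text; every other number below is invented purely for illustration.

```python
# Minimal sketch of the BN factorisation for the 'Asia' structure of Figure 1.
# The CPTs for A and D are those given in the text; all other values are
# made up for illustration only.

parents = {
    "A": (), "S": (), "T": ("A",), "L": ("S",), "B": ("S",),
    "E": ("T", "L"), "X": ("E",), "D": ("B", "E"),
}

cpt = {  # cpt[v][parent instantiation][value of v] = P(v = value | parents)
    "A": {(): {"t": 0.1, "f": 0.9}},                                  # given
    "S": {(): {"t": 0.5, "f": 0.5}},                                  # illustrative
    "T": {("t",): {"t": 0.05, "f": 0.95}, ("f",): {"t": 0.01, "f": 0.99}},
    "L": {("t",): {"t": 0.10, "f": 0.90}, ("f",): {"t": 0.01, "f": 0.99}},
    "B": {("t",): {"t": 0.60, "f": 0.40}, ("f",): {"t": 0.30, "f": 0.70}},
    "E": {("t", "t"): {"t": 1.0, "f": 0.0}, ("t", "f"): {"t": 1.0, "f": 0.0},
          ("f", "t"): {"t": 1.0, "f": 0.0}, ("f", "f"): {"t": 0.0, "f": 1.0}},
    "X": {("t",): {"t": 0.02, "f": 0.98}, ("f",): {"t": 0.95, "f": 0.05}},
    "D": {("t", "t"): {"t": 1.0, "f": 0.0}, ("t", "f"): {"t": 0.5, "f": 0.5},
          ("f", "t"): {"t": 0.4, "f": 0.6}, ("f", "f"): {"t": 0.3, "f": 0.7}},  # given
}

def joint_probability(assignment):
    """Multiply one CPT entry per node: the BN factorisation."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob

example = {"A": "f", "S": "t", "T": "f", "L": "f",
           "B": "t", "E": "f", "X": "f", "D": "t"}
print(joint_probability(example))
```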

There are currently two main approaches taken to learning BN structure from data. In the first, statistical tests are performed with a view to determining conditional independence relations between the variables. An algorithm then searches for a DAG which represents the conditional independence relations thus found [6, 8, 19, 21]. In the second approach, often called search and score, each candidate DAG has a score which reflects how well it fits the data. The goal is then simply to find the DAG with the highest score [4, 5, 7, 9, 18, 24]. It is also possible to combine elements of both these main approaches [20]. The difficulty with search and score is that the number of candidate DAGs grows super-exponentially with the number of variables, so a simple enumerative approach is out of the question for all but the smallest problems. A number of search techniques have been applied, including greedy hill-climbing [19], dynamic programming [18], branch-and-bound [5] and A* [24]. In many cases the search is not complete, in the sense that there is no guarantee that the BN structure returned has an optimal score. However, recently there has been much interest in complete (also known as exact) BN structure learning, where a search is conducted until a guaranteed optimal structure is returned.

2. Bayesian Network Structure Learning with IP

In the rest of this paper an integer programming approach to exact BN learning is described. The basic ideas of integer programming (IP) are first briefly presented. This is then followed by an account of how BN structure learning can be encoded and efficiently solved using IP. Two important extensions are then described: adding in structural prior information and finding multiple solutions. The article ends with a summary of how well this approach performs.

2.1. Integer Programming

In an integer programming problem the goal is to maximise a linear objective function subject to linear constraints, with the added constraint that all variables must take integer values. (Any minimisation problem can be easily converted into a maximisation problem, so here only maximisation problems are considered.) Let x = (x_1, x_2, ..., x_n) be the problem variables, where each x_i ∈ ℤ (i.e. each can only take integer values). Assume that finite upper and lower bounds on each x_i are given. Let c = (c_1, c_2, ..., c_n) be the real-valued vector of objective coefficients for each problem variable. Viewing x as a column vector and c as a row vector, the problem of maximising cx with no constraints is easy: just set each x_i to its lower bound if c_i < 0 and set it to its upper bound otherwise. The problem becomes significantly harder once linear constraints on acceptable solutions are added. Each such constraint is of the form ax ≤ b, where a is a real-valued row vector and b is a real number. Many important industrial and business problems can be encoded as an IP and there are many powerful solvers (such as CPLEX) which can provide optimal solutions even when there are thousands of problem variables and constraints.

A proper account of the many techniques of integer programming will not be provided here (for that see Wolsey's book [22]) but some basics are now given. Although solving an IP may be very hard, solving the linear relaxation of an IP is much easier (the simplex algorithm is often used). The linear relaxation is the same as the original IP except that the variables are now permitted to take non-integer values. A small numerical illustration is given in the sketch below.
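The following toy example (not from the paper) shows the gap between an IP and its linear relaxation. It assumes SciPy 1.9 or later, whose linprog accepts an integrality argument when the HiGHS method is used.

```python
# Toy problem: maximise x1 + x2 subject to 2*x1 + 2*x2 <= 3, 0 <= xi <= 1.
# SciPy's linprog minimises, so the objective is negated.  Requires SciPy >= 1.9
# for the 'integrality' argument (HiGHS backend).
from scipy.optimize import linprog

c = [-1.0, -1.0]            # maximise x1 + x2  ==  minimise -(x1 + x2)
A_ub = [[2.0, 2.0]]
b_ub = [3.0]
bounds = [(0, 1), (0, 1)]

# Linear relaxation: variables may be fractional.  Optimal value 1.5
# (e.g. x = (1, 0.5)); this is an upper bound on any integer solution.
relax = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("LP relaxation:", relax.x, -relax.fun)

# Integer program: variables restricted to integers.  Optimal value 1.
ip = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds,
             method="highs", integrality=[1, 1])
print("IP optimum:   ", ip.x, -ip.fun)
```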
Note that if we are 'lucky' and the solution to the linear relaxation 'happens' to be integer valued then the original IP is also solved.

In general, this is not the case, but the solution to the linear relaxation does provide a useful upper bound on the objective value of an optimal integer solution. Two important parts of the IP solving process (in addition to solving the linear relaxation) are the addition of cutting planes and branching. A cutting plane is a linear inequality not present in the original problem whose validity is implied by (1) those linear inequalities that are initially present and (2) the integrality restriction on the problem variables. Typically an IP solver will search for cutting planes which the solution to the linear relaxation (call it x*) does not satisfy. IP solvers typically contain a number of generic cutting plane algorithms (e.g. Gomory, Strong Chvátal-Gomory, zero-half [23]) which can be applied to any IP. In addition, users can create problem-specific cutting plane algorithms. Adding cutting planes will not rule out the yet-to-be-found optimal integer solution but will rule out x*. It follows that adding cutting planes in this way will produce a new linear relaxation whose solution will provide a tighter upper bound. In some problems it is possible to add sufficiently many cutting planes of the right sort so that a linear relaxation is produced whose solution is entirely integer-valued. In such a case the original IP problem is solved. Typically this is not the case, so another approach is required, the most common of which is branching.

In branching a problem variable x_i is selected together with some appropriate integer value l. Two new subproblems are then created: one where x_i ≤ l - 1 and one where x_i ≥ l. Usually a variable is selected with a non-integer value in the linear relaxation solution x*. Since there are only finitely many variables, each with finitely many values, it is not difficult to see that one can search for all possible solutions by repeated branching. In practice this search is made efficient by pruning. Pruning takes advantage of the upper bound provided by the linear relaxation. It also uses the incumbent: the best (not necessarily optimal) solution found so far. If the upper bound on the best solution for some subproblem is below the objective value of the incumbent then the optimal solution for the subproblem is worse than the incumbent and no further work on the subproblem is necessary.

2.2. Bayesian Network Learning as an IP Problem

In this section it is shown how to represent the BN structure learning problem as an IP. This question has been considered in a number of papers [2, 10, 11, 12, 14]. Firstly, we need to create IP problem variables to represent the structure of DAGs. This is done by creating binary 'family' variables I(W → v) for each node v and candidate parent set W, where I(W → v) = 1 iff W is the parent set for v. In this encoding, the DAG in Figure 1 would be represented by a solution where I(∅ → A) = 1, I(∅ → S) = 1, I({A} → T) = 1, I({S} → L) = 1, I({S} → B) = 1, I({L,T} → E) = 1, I({E} → X) = 1, I({B,E} → D) = 1, and all other IP variables have the value 0.

The next issue to consider is how to score candidate BNs: how do we measure how 'good' a given BN is for the data from which we are learning? A number of scores are used but here only one is considered: log marginal likelihood, or the BDeu score. The BDeu score comes from looking at the problem from the perspective of Bayesian statistics. In that approach the problem is to find the 'most probable' BN given the data, i.e. to find a BN G which maximises P(G | Data).
Using Bayes' theorem we have that P(G | Data) ∝ P(G) P(Data | G), where P(G) is the prior probability of BN G and P(Data | G) is the marginal likelihood. If we have no prior bias between the candidate BNs it is reasonable for P(G) to have the same value for all G. In this case maximising marginal likelihood, or indeed log marginal likelihood, will maximise P(G | Data).

(Note that the word 'Bayesian' in 'Bayesian networks' is misleading, since BNs are no more Bayesian than other probabilistic models which do not have the word 'Bayesian' in their name.) Crucially, given certain restrictions, the BDeu score can be expressed as a linear function of the family variables I(W → v) (hence the decision to encode the graph using them). So-called 'local scores' c(v, W) are computed from the data for each I(W → v) variable. The BN structure learning problem then becomes the problem of maximising

    Σ_{v,W} c(v, W) I(W → v),    (1)

subject to the condition that the values assigned to the I(W → v) represent a DAG.

Linear inequalities are now required to restrict instantiations of the I(W → v) variables so that only DAGs are represented. Firstly, it is easy to ensure that each BN variable (call BN variables 'nodes') has exactly one (possibly empty) parent set. Letting V be the set of BN nodes, the following linear constraints are added to the IP:

    ∀ v ∈ V:  Σ_W I(W → v) = 1,    (2)

Ensuring that the graph is acyclic is more tricky. The most successful approach has been to use 'cluster' constraints:

    ∀ C ⊆ V:  Σ_{v ∈ C} Σ_{W : W ∩ C = ∅} I(W → v) ≥ 1,    (3)

introduced by Jaakkola et al. [14]. A cluster is a subset of BN nodes. For each cluster C the associated constraint declares that at least one v ∈ C has no parents in C. Since there are exponentially many cluster constraints these are added as cutting planes in the course of solving: each time the linear relaxation of the IP is solved there is a search for a cluster constraint which is not satisfied by the linear relaxation solution. If no cluster constraint can be found there are two possibilities, depending on whether the linear relaxation solution (call it x*) has variables with fractional values or not. If there are no fractional variables then x* must represent a DAG, and moreover this DAG is optimal since x* is a solution to the linear relaxation and thus provides an upper bound. Alternatively, x* may include variables with fractional values. If so, generic cutting plane algorithms are run in the hope of finding cutting planes which are not 'cluster' constraints (3).

Figure 2. Branch-and-cut approach to solving an IP.

A standard 'branch-and-cut' approach, as summarised in Figure 2, is taken to solving the IP. Cutting planes are added (if possible) each time the linear relaxation is solved. If no suitable cutting planes can be found, progress is made by branching on a variable. (A small sketch of the cluster-constraint separation step is given below.)
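As an illustration of the separation step just described, here is a minimal sketch that searches for a violated cluster constraint (3) by brute force over small clusters. This is not GOBNILP's actual separation routine, which is far more sophisticated; the dictionary x_lp holding LP values of the family variables is an assumed, illustrative data structure.

```python
# Naive separation of cluster constraints (3): given the current LP solution,
# look for a small cluster C whose constraint
#     sum_{v in C} sum_{W : W ∩ C = ∅} I(W -> v)  >=  1
# is violated (left-hand side below 1).  Brute force, purely illustrative.
from itertools import combinations

def find_violated_cluster(nodes, x_lp, max_cluster_size=4, tol=1e-6):
    """x_lp: dict mapping (v, frozenset W) -> LP value of I(W -> v)."""
    for size in range(2, max_cluster_size + 1):
        for cluster in combinations(nodes, size):
            c = set(cluster)
            lhs = sum(val for (v, w), val in x_lp.items()
                      if v in c and not (w & c))
            if lhs < 1.0 - tol:
                return c, lhs          # violated: add this cutting plane
    return None                        # no violated cluster constraint found

# Tiny example: a fractional LP solution putting too much weight on A <-> B.
x_lp = {
    ("A", frozenset({"B"})): 0.7, ("A", frozenset()): 0.3,
    ("B", frozenset({"A"})): 0.8, ("B", frozenset()): 0.2,
}
print(find_violated_cluster(["A", "B"], x_lp))   # ({'A', 'B'}, 0.5)
```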

Eventually this algorithm will return an optimal solution. In addition to cutting and branching, two further ingredients are used to improve performance. The first of these is a 'sink-finding' primal heuristic algorithm which searches for a feasible integer solution (i.e. a DAG) 'near' the solution to the current LP relaxation. The point of this is to find a good (probably suboptimal) solution early in the solving process, since this allows earlier and more frequent pruning of the search if and when branching begins.

To understand the sink-finding algorithm recall that each family variable I(W → v) has an associated objective function coefficient. It follows that the potential parent sets for each BN node can be ordered from 'best' (highest coefficient) to 'worst' (lowest coefficient). Suppose, without loss of generality, that the BN nodes are labelled {1, 2, ..., p} and let W_{v,1}, ..., W_{v,k_v} be the parent sets for BN node v ordered from best to worst, as illustrated in Table 1. (In this table the rows are shown as being of equal length for neatness, but this is typically not the case, since different BN nodes may have differing numbers of candidate parent sets.)

Table 1. Example initial state of the sink-finding heuristic for |V| = p. Rows need not be of the same length.

    I(W_{1,1} → 1)   I(W_{1,2} → 1)   ...   I(W_{1,k_1} → 1)
    I(W_{2,1} → 2)   I(W_{2,2} → 2)   ...   I(W_{2,k_2} → 2)
    I(W_{3,1} → 3)   I(W_{3,2} → 3)   ...   I(W_{3,k_3} → 3)
    ...
    I(W_{p,1} → p)   I(W_{p,2} → p)   ...   I(W_{p,k_p} → p)

Table 2. Example intermediate state of the sink-finding heuristic (the grid of Table 1 with I(W_{2,1} → 2) selected and the ruled-out family variables removed).

Each DAG must have at least one sink node, that is, a node which has no children. So any optimal DAG has a sink node for which one can choose its best parent set without fear of creating a cycle. It follows that at least one of the parent sets in the leftmost column in Table 1 must be selected in any optimal BN. The sink-finding algorithm works by selecting parent sets for each BN node. It starts by finding a BN node v such that the value of the family variable I(W_{v,1} → v) is as close to 1 as possible in the solution to the current LP relaxation. The parent set W_{v,1} is chosen for v and then parent sets for other variables containing v are 'ruled out', ensuring that v will be a sink node of the DAG eventually created (hence the name of the algorithm). Table 2 illustrates the state of the algorithm with v = 2 and where v ∈ W_{1,1}, v ∈ W_{3,2}, v ∈ W_{p,1} and v ∈ W_{p,2}.
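A minimal sketch of the sink-finding heuristic follows; it covers the whole loop, whose later iterations are described next in the text. It is an illustration rather than GOBNILP's implementation, and the inputs best_families and x_lp are assumed, illustrative data structures.

```python
# Illustrative sketch of the sink-finding primal heuristic.  In each round the
# node whose best *allowable* family variable has LP value closest to 1 is made
# a sink: its best allowable parent set is fixed, and parent sets containing it
# are ruled out for the remaining nodes.  The result is a DAG (possibly far
# from optimal) that can serve as an incumbent.

def sink_finding(nodes, best_families, x_lp):
    """best_families[v]: parent sets (frozensets) for v, best first;
    x_lp[(v, W)]: LP value of the family variable I(W -> v)."""
    remaining = set(nodes)
    ruled_out = set()                  # nodes already chosen as sinks
    chosen = {}                        # v -> selected parent set
    while remaining:
        # Best allowable parent set for each remaining node.
        candidate = {}
        for v in remaining:
            for w in best_families[v]:
                if not (set(w) & ruled_out):       # must avoid earlier sinks
                    candidate[v] = w
                    break
            else:
                candidate[v] = frozenset()         # fall back to empty parent set
        # Pick the node whose chosen family variable is closest to 1 in the LP.
        v = max(candidate, key=lambda u: x_lp.get((u, candidate[u]), 0.0))
        chosen[v] = candidate[v]
        ruled_out.add(v)
        remaining.remove(v)
    return chosen                      # a DAG: each node with its parent set
```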

In its second iteration the sink-finding algorithm looks for a sink node for a DAG with nodes V \ {v} in the same way, selecting a best allowable parent set with a value closest to 1 in the solution to the linear relaxation. In subsequent iterations the algorithm proceeds analogously until a DAG is fully constructed. Since best allowable parent sets are chosen in each iteration, the hope is that a high scoring (if not optimal) DAG will be returned.

The second extra ingredient for improving efficiency, propagation, can be described more briefly. Suppose that due to branching decisions the IP variables I({S} → L) and I({L,T} → E) have both been set to 1 in some subproblem. In this case it is immediate that, for example, the variable I({E} → S) should be set to 0 in this subproblem, since having all three set to 1 would result in a cyclic subgraph. Propagation allows 'non-linear' reasoning within an IP approach and can bring important performance benefits.

Although the BN structure learning problem has now been cast as an IP there are some problems with this approach. Firstly, there are exponentially many I(W → v) IP variables. To deal with this a restriction on candidate parent sets is typically made, usually by limiting the number of parents any node can have to some small number (e.g. 2, 3 or 4). It follows that the IP approach to BN learning is most appropriate to applications where such a restriction is reasonable. Secondly, it is necessary to precompute the local scores c(v, W), which can be a slow business.

2.3. Adding Structural Constraints

This approach to BN structure learning has been implemented in the GOBNILP system, which is available for download. GOBNILP uses the SCIP 'constraint integer programming' framework [1] (scip.zib.de). As well as implementing 'vanilla' BN learning using IP, GOBNILP allows the user to add additional constraints on the structure of BNs. This facility is very important in solving real problems since domain experts typically have some knowledge of how the variables in their data are related. Failing to incorporate such knowledge (usually called 'prior knowledge') into the learning process will produce inferior results. We may end up with a BN expressing conditional independence relations between the random variables which we know to be untrue. The user constraints available in GOBNILP 1.4 are as follows.

Conditional independence relations: It may be that the user knows some conditional independence relations that hold between the random variables. These can be declared and GOBNILP will only return BNs respecting them.

(Non-)existence of particular arrows: If the user knows that particular arrows must occur in the BN this can be stated. In addition, if certain arrows must not occur this too can be declared.

(Non-)existence of particular undirected edges: If the user knows that there must be an arrow between two particular nodes but does not wish to specify the direction this can be declared. Similarly the non-existence of an arrow in either direction may be stated.

Immoralities: If two parents of some node do not have an arrow connecting them, this is known as an 'immorality' (or v-structure). It is sometimes useful to state the existence or non-existence of immoralities. This is possible in GOBNILP.

Number of founders: A founder is a BN node with no parents. Nodes A and S are the only founders in Figure 1. GOBNILP allows the user to put upper and lower bounds on the number of founders.

Number of parents: Each node in a BN is either a parent of some other node or not. In Figure 1 all nodes are parents apart from the 'sink' nodes D and X. GOBNILP allows the user to put upper and lower bounds on the number of nodes which are parents. Such constraints were used by Pe'er et al. [17] (not using GOBNILP!).

Number of arrows: The BN in Figure 1 has 8 arrows. GOBNILP allows the user to put upper and lower bounds on the number of arrows.

In many cases adding the functionality to allow such user-defined constraints is very easy because an IP approach has been taken. Integer programming allows what might be called 'declarative machine learning', where the user can inject knowledge into the learning algorithm without having to worry about how that algorithm will use it to solve the problem.

One final feature which the IP approach makes simple is the learning of multiple BNs. It is very important to acknowledge that the output of any BN learning algorithm (even an 'exact' one) can only be a guess as to what the 'true' BN should be. Although one can have greater confidence in the accuracy of this guess as the amount of data increases, the impossibility of deducing the correct BN remains. Given this, it is useful to consider a range of possible BNs. GOBNILP does this by returning the top k best scoring BNs, where k is set by the user. This is simply done: once a highest scoring BN is found, a linear constraint is added ruling out just that BN and the problem is re-solved.

2.4. Results

The IP approach to BN structure learning as implemented in GOBNILP 1.3 (not the current version) has been evaluated by Bartlett and Cussens [2]. The main results from that paper are reproduced here in Table 3 as a convenience. Synthetic datasets were generated by sampling from the joint probability distributions defined by various Bayesian networks (column 'Network' in Table 3). p is the number of variables in the data set, m is the limit on the number of parents of each variable, N is the number of observations in the data set, and Families is the number of family variables in the data set after pruning. All times are given in seconds (rounded). '[ ]' indicates that the solution had not been found after 2 hours; the value given is the gap, rounded to the nearest percent, between the score of the best found BN and the upper bound on the score of the best potential BN, as a percentage of the score of the best found BN. A limit on the size of parent sets was set (column m) and local BDeu scores for 'family' variables were then computed. An IP problem was then created and solved as described in the preceding sections.

The goal of these empirical investigations was to measure the effect of different strategies and particularly to check whether the sink-finding algorithm and propagation did indeed lead to faster solving. Comparing the column GOBNILP 1.3 to the columns SPH and VP showed that typically (not always) both the sink-finding algorithm and propagation (respectively) were helpful. However, what is most striking is how sensitive solving time is to the choice of cutting plane strategy. Table 3 shows that using three of SCIP's built-in generic cutting plane algorithms (Gomory, Strong CG and Zero-half) has a big, usually positive, effect. Entries in italics are at least 10% worse than GOBNILP 1.3, while those in bold are at least 10% better. Turning these cuts off and just using cluster constraint cutting planes typically led to much slower solving.
It is also evident that adding set packing constraints leads to big improvements. By a set packing constraint we mean an inequality of the form given in (4).

    ∀ C ⊆ V:  Σ_{v ∈ C} Σ_{W : C\{v} ⊆ W} I(W → v) ≤ 1    (4)

These inequalities state that for any subset C of nodes at most one v ∈ C may have all other members of C as its parents. The effect of adding in all such inequalities for all C such that |C| ≤ 4 is what is recorded in column SPC of Table 3. Doing so typically leads to faster solving since it gives tighter linear relaxations.

To understand these results it is useful to dip into the theory of integer programming and consider the convex hull of DAGs represented using family variables. Each DAG so represented can be seen as a point in ℝ^n, where n is the total number of family variables in some BN learning problem instance. The convex hull of all such points is an n-dimensional polyhedron (or more properly polytope) whose vertices correspond to DAGs. If it were possible to compactly define this shape using a modest number of inequalities, one could construct a linear program (LP) (not an IP) with just these inequalities and there would be a guarantee that any solution to the LP would be an optimal DAG. Unfortunately, any such convex hull would require very many inequalities to define it, so it is necessary to resort to approximating the convex hull by a much smaller number of inequalities. What the results in this section show is that constructing a good approximation is vital; this is because solutions to tighter linear relaxations provide better bounds. Using the set packing constraints (4) and SCIP's generic cutting planes provides a much better approximation than the cluster constraints (3) alone, which leads to the improved solving times shown in Table 3.

Table 3. Comparison of GOBNILP 1.3 with older systems and impact of various features. (Numeric entries omitted; the table reports, for the hailfinder, alarm, carpo, diabetes and pigs networks, the values of m, p, N and Families, together with solving times, or optimality gaps after the 2-hour limit, for GOBNILP 1.3, older systems, and GOBNILP 1.3 with individual solver features or cut types disabled.) Key: SPC - Set Packing Constraints, SPH - Sink Primal Heuristic, VP - Value Propagator, G - Gomory cuts, SCG - Strong CG cuts, ZH - Zero-half cuts.
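Returning to the set packing constraints (4), the following minimal sketch enumerates them for all clusters with |C| ≤ 4 from each node's list of candidate parent sets. It is illustrative only and not taken from GOBNILP.

```python
# Enumerate set packing constraints (4) for all clusters C with |C| <= 4:
# for each such C, at most one v in C may have all other members of C among
# its parents.  Each constraint is the list of family variables (v, W) whose
# sum must be at most 1.  Illustrative only.
from itertools import combinations

def set_packing_constraints(candidate_parent_sets, max_size=4):
    """candidate_parent_sets: dict mapping node v to an iterable of frozensets."""
    nodes = sorted(candidate_parent_sets)
    for size in range(2, max_size + 1):
        for cluster in combinations(nodes, size):
            c = set(cluster)
            lhs = [(v, w)
                   for v in cluster
                   for w in candidate_parent_sets[v]
                   if c - {v} <= set(w)]           # W contains C \ {v}
            if lhs:
                yield lhs                          # constraint: sum(lhs) <= 1

# Tiny example with three nodes and a parent-set size limit of 2.
cps = {
    "A": [frozenset(), frozenset({"B"}), frozenset({"B", "C"})],
    "B": [frozenset(), frozenset({"A"}), frozenset({"A", "C"})],
    "C": [frozenset(), frozenset({"A", "B"})],
}
for constraint in set_packing_constraints(cps):
    print(constraint)
```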

3. Conclusions and Future Work

It is instructive to examine which benefits of Bayesian networks are stressed by commercial vendors of BN software such as Hugin Expert A/S, Norsys Software Corp and Bayesia. A number of themes stand out:

1. The graphical structure of a BN allows one to 'read off' relationships between variables of interest.
2. It is possible to 'learn' BNs from data (using, perhaps, expert knowledge also).
3. Since a BN represents a probability distribution, the strength of a (probabilistic) relation is properly quantified.
4. BNs can be used for making predictions.
5. By adding nodes representing actions and costs, Bayesian networks can be extended into decision networks to help users make optimal decisions in conditions of uncertainty.

The following extract from Bayesia's website stresses the first two of these benefits:

You can use the power of non-supervised learning to extract the set of significant probabilistic relations contained in your databases (base conceptualisation). Apart from significant time savings made by revealing direct probabilistic relations compared with a standard analysis of the table of correlations, this type of analysis is a real knowledge finding tool helping one understand phenomena. [3]

So BN learning is an important task, but it is also known to be NP-hard (which means that one cannot expect to have an algorithm which performs learning in time polynomial in the size of the input). Nonetheless, it has been shown that integer programming is an effective approach to 'exact' learning of Bayesian networks in certain circumstances. However, current approaches have severe limitations. In particular, in order to prevent too many IP variables being created, restrictions, often artificial, are imposed on the number of these variables. However, in any solution only one IP variable for each BN node has a non-zero value (indicating the selected set of parents for that node). This suggests seeking to avoid creating IP variables unless there is some prospect that they will have a non-zero value in the optimal solution. Fortunately, there is a well-known IP technique which does exactly this: delayed column generation [13], where variables are created 'on the fly'. A 'pricing' algorithm is used to search for new variables which might be needed in an optimal solution. This technique has yet to be applied to Bayesian network learning but it holds out the possibility, at least, of allowing exact approaches to be applied to substantially bigger problems.

Acknowledgements

Thanks to an anonymous referee for useful criticisms. This work has been supported by the UK Medical Research Council (Project Grant G ).

References

1. Achterberg, T. (2007). Constraint Integer Programming. Ph.D. thesis, TU Berlin.

2. Bartlett, M. and Cussens, J. (2013). Advances in Bayesian network learning using integer programming. In Nicholson, A. and Smyth, P., editors, Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013), Bellevue. AUAI Press.

3. Bayesia. (2013). The strengths of Bayesia's technology for marketing in 18 points.

4. Bøttcher, S. G. and Dethlefsen, C. (2003). DEAL: A Package for Learning Bayesian Networks. Technical report, Department of Mathematical Sciences, Aalborg University.

5. de Campos, C., Zeng, Z. and Ji, Q. (2009). Structure learning of Bayesian networks using constraints. Proceedings of the 26th International Conference on Machine Learning, Canada.

6. Cheng, J., Greiner, R., Kelly, J., Bell, D. and Liu, W. (2002). Learning Bayesian networks from data: An information-theory based approach. Artificial Intelligence, 137(1-2).

7. Chickering, D. M., Geiger, D. and Heckerman, D. (1995). Learning Bayesian networks: Search methods and experimental results. Proceedings of the 5th International Workshop on Artificial Intelligence and Statistics, USA.

8. Claassen, T., Mooij, J. and Heskes, T. (2013). Learning sparse causal models is not NP-hard. Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13), USA.

9. Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9.

10. Cussens, J. (2011). Bayesian network learning with cutting planes. In Cozman, F. G. and Pfeffer, A., editors, Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), Barcelona. AUAI Press.

11. Cussens, J. (2010). Maximum likelihood pedigree reconstruction using integer programming. Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB-10), Edinburgh.

12. Cussens, J., Bartlett, M., Jones, E. M. and Sheehan, N. A. (2013). Maximum likelihood pedigree reconstruction using integer linear programming. Genetic Epidemiology, 37(1).

13. Desaulniers, G., Desrosiers, J. and Solomon, M. M. (2005). Column Generation. Springer, USA.

14. Jaakkola, T., Sontag, D., Globerson, A. and Meila, M. (2010). Learning Bayesian network structure using LP relaxations. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010), Italy. JMLR Workshop and Conference Proceedings, volume 9.

15. Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

16. Lauritzen, S. L. and Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society (Series B), 50(2).

17. Pe'er, D., Tanay, A. and Regev, A. (2006). MinReg: A scalable algorithm for learning parsimonious regulatory networks in yeast and mammals. Journal of Machine Learning Research, 7.

18. Silander, T. and Myllymäki, P. (2006). A simple approach for finding the globally optimal Bayesian network structure. Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, AUAI Press, USA.

19. Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction and Search. Springer-Verlag, New York.

20. Tsamardinos, I., Brown, L. E. and Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1).

21. Verma, T. and Pearl, J. (1992). An algorithm for deciding if a set of observed independencies has a causal explanation. Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence (UAI-92).

22. Wolsey, L. A. (1998). Integer Programming. John Wiley.

23. Wolter, K. (2006). Implementation of Cutting Plane Separators for Mixed Integer Programs. Master's thesis, Technische Universität Berlin.

24. Yuan, C. and Malone, B. (2012). An improved admissible heuristic for learning optimal Bayesian networks. Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12), Catalina Island, CA.

Author's Biography:

James Cussens received his Ph.D. in the philosophy of probability from King's College, London, UK. After spells working at the University of Oxford (twice), Glasgow Caledonian University and King's College, London, he joined the University of York as a Lecturer. He is currently a Senior Lecturer in the Artificial Intelligence Group, Department of Computer Science, and also a member of the York Centre for Complex Systems Analysis. He works on machine learning, probabilistic graphical models, discrete optimisation and combinations thereof.


More information

Improved Local Search in Bayesian Networks Structure Learning

Improved Local Search in Bayesian Networks Structure Learning Proceedings of Machine Learning Research vol 73:45-56, 2017 AMBN 2017 Improved Local Search in Bayesian Networks Structure Learning Mauro Scanagatta IDSIA, SUPSI, USI - Lugano, Switzerland Giorgio Corani

More information

LECTURES 3 and 4: Flows and Matchings

LECTURES 3 and 4: Flows and Matchings LECTURES 3 and 4: Flows and Matchings 1 Max Flow MAX FLOW (SP). Instance: Directed graph N = (V,A), two nodes s,t V, and capacities on the arcs c : A R +. A flow is a set of numbers on the arcs such that

More information

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks A Parallel Algorithm for Exact Structure Learning of Bayesian Networks Olga Nikolova, Jaroslaw Zola, and Srinivas Aluru Department of Computer Engineering Iowa State University Ames, IA 0010 {olia,zola,aluru}@iastate.edu

More information

Leave-One-Out Support Vector Machines

Leave-One-Out Support Vector Machines Leave-One-Out Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm

More information

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Finding the k-best Equivalence Classes of Bayesian Network Structures for Model Averaging

Finding the k-best Equivalence Classes of Bayesian Network Structures for Model Averaging Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Finding the k-best Equivalence Classes of Bayesian Network Structures for Model Averaging Yetian Chen and Jin Tian Department

More information

CS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018

CS 580: Algorithm Design and Analysis. Jeremiah Blocki Purdue University Spring 2018 CS 580: Algorithm Design and Analysis Jeremiah Blocki Purdue University Spring 2018 Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved.

More information

Node Aggregation for Distributed Inference in Bayesian Networks

Node Aggregation for Distributed Inference in Bayesian Networks Node Aggregation for Distributed Inference in Bayesian Networks Kuo-Chu Chang and Robert Fung Advanced Decision Systmes 1500 Plymouth Street Mountain View, California 94043-1230 Abstract This study describes

More information

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.

More information

An Introduction to LP Relaxations for MAP Inference

An Introduction to LP Relaxations for MAP Inference An Introduction to LP Relaxations for MAP Inference Adrian Weller MLSALT4 Lecture Feb 27, 2017 With thanks to David Sontag (NYU) for use of some of his slides and illustrations For more information, see

More information

11. APPROXIMATION ALGORITHMS

11. APPROXIMATION ALGORITHMS 11. APPROXIMATION ALGORITHMS load balancing center selection pricing method: vertex cover LP rounding: vertex cover generalized load balancing knapsack problem Lecture slides by Kevin Wayne Copyright 2005

More information

Theorem 2.9: nearest addition algorithm

Theorem 2.9: nearest addition algorithm There are severe limits on our ability to compute near-optimal tours It is NP-complete to decide whether a given undirected =(,)has a Hamiltonian cycle An approximation algorithm for the TSP can be used

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Given an NP-hard problem, what should be done? Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one of three desired features. Solve problem to optimality.

More information

Integer Programming Chapter 9

Integer Programming Chapter 9 1 Integer Programming Chapter 9 University of Chicago Booth School of Business Kipp Martin October 30, 2017 2 Outline Branch and Bound Theory Branch and Bound Linear Programming Node Selection Strategies

More information

Notes for Lecture 24

Notes for Lecture 24 U.C. Berkeley CS170: Intro to CS Theory Handout N24 Professor Luca Trevisan December 4, 2001 Notes for Lecture 24 1 Some NP-complete Numerical Problems 1.1 Subset Sum The Subset Sum problem is defined

More information

CS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 36

CS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 36 CS 473: Algorithms Ruta Mehta University of Illinois, Urbana-Champaign Spring 2018 Ruta (UIUC) CS473 1 Spring 2018 1 / 36 CS 473: Algorithms, Spring 2018 LP Duality Lecture 20 April 3, 2018 Some of the

More information

Principles of Optimization Techniques to Combinatorial Optimization Problems and Decomposition [1]

Principles of Optimization Techniques to Combinatorial Optimization Problems and Decomposition [1] International Journal of scientific research and management (IJSRM) Volume 3 Issue 4 Pages 2582-2588 2015 \ Website: www.ijsrm.in ISSN (e): 2321-3418 Principles of Optimization Techniques to Combinatorial

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Large-Scale Optimization and Logical Inference

Large-Scale Optimization and Logical Inference Large-Scale Optimization and Logical Inference John Hooker Carnegie Mellon University October 2014 University College Cork Research Theme Large-scale optimization and logical inference. Optimization on

More information

Scale Up Bayesian Networks Learning Dissertation Proposal

Scale Up Bayesian Networks Learning Dissertation Proposal Scale Up Bayesian Networks Learning Dissertation Proposal Xiannian Fan xnf1203@gmail.com March 2015 Abstract Bayesian networks are widely used graphical models which represent uncertain relations between

More information

High Dimensional Indexing by Clustering

High Dimensional Indexing by Clustering Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should

More information

Optimization Methods in Management Science

Optimization Methods in Management Science Problem Set Rules: Optimization Methods in Management Science MIT 15.053, Spring 2013 Problem Set 6, Due: Thursday April 11th, 2013 1. Each student should hand in an individual problem set. 2. Discussing

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Eric Xing Lecture 14, February 29, 2016 Reading: W & J Book Chapters Eric Xing @

More information

Enumerating Equivalence Classes of Bayesian Networks using EC Graphs

Enumerating Equivalence Classes of Bayesian Networks using EC Graphs Enumerating Equivalence Classes of Bayesian Networks using EC Graphs Eunice Yuh-Jie Chen and Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles {eyjchen,aychoi,darwiche}@cs.ucla.edu

More information

Stable sets, corner polyhedra and the Chvátal closure

Stable sets, corner polyhedra and the Chvátal closure Stable sets, corner polyhedra and the Chvátal closure Manoel Campêlo Departamento de Estatística e Matemática Aplicada, Universidade Federal do Ceará, Brazil, mcampelo@lia.ufc.br. Gérard Cornuéjols Tepper

More information

16.410/413 Principles of Autonomy and Decision Making

16.410/413 Principles of Autonomy and Decision Making 16.410/413 Principles of Autonomy and Decision Making Lecture 17: The Simplex Method Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology November 10, 2010 Frazzoli (MIT)

More information

CMPSCI611: The Simplex Algorithm Lecture 24

CMPSCI611: The Simplex Algorithm Lecture 24 CMPSCI611: The Simplex Algorithm Lecture 24 Let s first review the general situation for linear programming problems. Our problem in standard form is to choose a vector x R n, such that x 0 and Ax = b,

More information

The Simplex Algorithm

The Simplex Algorithm The Simplex Algorithm Uri Feige November 2011 1 The simplex algorithm The simplex algorithm was designed by Danzig in 1947. This write-up presents the main ideas involved. It is a slight update (mostly

More information