Improving Acyclic Selection Order-Based Bayesian Network Structure Learning

Walter Perez Urcia, Denis Deratani Mauá
Instituto de Matemática e Estatística, Universidade de São Paulo, Brazil
wperez@ime.usp.br, denis.maua@usp.br

Abstract. An effective approach for learning Bayesian network structures in large domains is to perform a local search on the space of topological orderings. As with most local search approaches, the quality of the procedure depends on the initialization strategy. Usually, a simple random initialization is adopted. Perez and Mauá developed initialization heuristics that were empirically shown to improve the overall performance of order-based structure learning. Recently, Scanagatta et al. proposed replacing the search for a directed acyclic graph in order-based learning with a procedure that also considers order-incompatible structures. Their procedure covers a larger space of structures with small computational overhead, which often leads to improved performance. As with standard order-based learning, Scanagatta et al. recommended initializing their algorithm with a randomly generated ordering. A natural improvement for this approach is thus to consider better initialization heuristics. In this work we propose a new initialization heuristic that takes into account the idiosyncrasies of Scanagatta et al.'s approach. Experiments with real-world data sets indicate that the combination of this new heuristic and Scanagatta et al.'s order-based search outperforms other order-based methods.

1. Introduction

Bayesian networks are compact representations of multivariate probability distributions [Pearl 1988]. They are defined by two components: a directed acyclic graph (DAG), and a collection of local conditional probability distributions, one for each variable given its parents. Manually specifying a Bayesian network is an error-prone and time-consuming task, and practitioners often resort to automatically learning (i.e., inferring) the model from a data set of observations. This learning is often performed in two steps. In the structure learning step, one obtains a DAG (called the structure) representing the relations between the variables in the data. Then, in the parameter learning step, the associated conditional probabilities are estimated assuming the learned DAG to be true. When the data are complete (i.e., there are no missing values), the latter step amounts to computing relative frequencies from the data, and can thus be performed efficiently.

An effective approach for Bayesian network structure learning is to search the space of DAGs guided by a score function [Cooper and Herskovits 1992, Lam and Bacchus 1994, Friedman et al. 1999, Chickering 2002].

Motivated by the fact that the search over DAGs is tractable when an ordering over the variables is fixed and the node in-degree is bounded [Buntine 1991], Teyssier and Koller (2005) proposed searching in the space of topological orderings by associating each ordering with a compatible DAG. Their order-based search (OBS) performs a local search in the space of complete variable orderings, where two orderings are neighbors if they differ in at most one comparison. Each ordering is associated with a compatible DAG found by searching for the score-maximizing parents of each node that respect that ordering and the in-degree bound.

As with most local search approaches, the choice of an initial solution (i.e., an ordering) is crucial to the quality of the solution produced by OBS. The search is usually initialized with an ordering sampled uniformly at random. While this guarantees a fair coverage of the search space, it can lead to poor local optima, slow convergence and ultimately poor performance. To alleviate these issues, we have recently advocated the use of informed heuristics for the generation of initial solutions [Perez and Mauá 2015]. We developed two heuristics based on the solution of the relaxed version of the structure learning problem in which cycles are permitted. This relaxed solution can be obtained efficiently by greedily and independently selecting the parents of each node so as to maximize the score. The first heuristic generates an ordering by a depth-first search (DFS) traversal of the relaxed solution, using the in-degree of nodes to break ties. The second heuristic considers a weighted version of the relaxed solution where the weight of an arc measures the decrease in the overall score incurred by removing that arc. A standard greedy algorithm is used to find a minimum-cost feedback arc set (FAS) and obtain a DAG. This DAG is finally used to generate a consistent topological ordering. Empirical results with a large collection of data sets showed that both heuristics improve the performance of OBS, with the FAS-based heuristic consistently outperforming the DFS-based heuristic.

Recently, Scanagatta et al. (2015) proposed a different approach for associating a DAG with an ordering. Their approach, called acyclic selection order-based search (ASOBS), differs from OBS as follows: given a fixed ordering, a DAG is selected by iterating from the last to the first node in the ordering, selecting the score-maximizing parent set for the current node that does not introduce cycles in the incumbent DAG. Thus, the associated DAG need not respect the given ordering. They reported improved performance over OBS on a collection of real-world data sets.

In this work, we develop a new initialization heuristic that is based on a relaxed version of the optimization solved by ASOBS. Thus, unlike the FAS- and DFS-based heuristics, our heuristic is tailored to ASOBS. We compare the heuristics and search techniques on six real-world data sets containing from 64 to 1556 variables. The results suggest that this new heuristic very often finds higher-scoring DAGs than the previous heuristics, including the standard implementation of ASOBS (which uses randomly generated orderings).

The rest of this document is organized as follows. We begin in Section 2 with a review of the Bayesian network structure learning problem and of OBS. We then briefly explain ASOBS in Section 3. In Section 4, we describe the initialization heuristics we developed in previous work. Our new heuristic is presented in Section 5. An empirical analysis of the heuristics is shown in Section 6. We conclude the paper in Section 7.
2. Order-Based Bayesian Network Structure Learning

A Bayesian network specification contains a DAG G = (V, E), where V = {X_1, X_2, ..., X_n} is the set of (categorical) random variables, and a collection of conditional probability distributions P(X_i | Pa^G_i), i = 1, ..., n, where Pa^G_i denotes the parents of X_i in G and P(X_i | ∅) = P(X_i).

The Bayesian network is assumed to induce a joint probability distribution over all the variables through the equation

P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa^G_i).

The number of parameters required to specify a Bayesian network with DAG G is

size(G) = \sum_{i=1}^{n} (r_i - 1) \prod_{X_j \in Pa^G_i} r_j,

where r_k denotes the number of states variable X_k can assume.

A score function sc(G) assigns a real value to any DAG, indicating its goodness in representing a given data set.¹ Typical score functions combine a data likelihood component with a model complexity penalization to prevent overfitting [Akaike 1974, Schwarz 1978, Suzuki 1996, Heckerman et al. 1995]. Notable examples are the Bayesian Information Criterion (BIC) [Schwarz 1978], the Akaike Information Criterion (AIC) [Akaike 1974], the Minimum Description Length (MDL) [Suzuki 1996] and the Bayesian Dirichlet score (BD) [Heckerman et al. 1995]. In particular, the BIC and MDL scores were shown to outperform other scores at recovering the true structure [Liu et al. 2012]. We assume the score function is decomposable, meaning that it can be written as a sum of local scores, sc(G) = \sum_{i=1}^{n} sc_i(Pa^G_i). Most score functions used in structure learning satisfy this property, including BIC, MDL and BD [Chickering et al. 2004]. For example, the BIC score function is given by

BIC(G) = LL(G) - \frac{\ln N}{2} size(G) = \sum_{i=1}^{n} \left[ \sum_{j} \sum_{k} N_{ijk} \ln \frac{N_{ijk}}{N_{ij}} - \frac{\ln N}{2} (r_i - 1) \prod_{X_j \in Pa^G_i} r_j \right],

where LL(G) is the data log-likelihood, N_{ijk} is the number of instances where variable X_i takes its kth value and its parents take their jth configuration (for some arbitrary fixed ordering of the configurations of the parents' values), and N_{ij} = \sum_k N_{ijk}.

The Bayesian network structure learning problem is to find a DAG G* that satisfies

sc(G*) = \max_{G \text{ is a DAG}} \sum_{i=1}^{n} sc_i(Pa^G_i). (1)

Since a directed graph is acyclic if and only if it admits a topological ordering, Equation (1) can be rewritten as

sc(G*) = \max_{<} \sum_{i=1}^{n} \max_{Y \subseteq \{X_j : X_j < X_i\}} sc_i(Y), (2)

where the first maximization is performed over the space of (total) orderings of the nodes in V. The search space of this formulation is considerably smaller than the search space in Equation (1) [Teyssier and Koller 2005]. When the in-degree of G is assumed to be bounded by an integer k, the inner maximizations can be performed by systematic search in time O(n^k) [Scanagatta et al. 2015].

¹ The dependence of the scoring function on the data set is usually left implicit, as for most of this explanation we can assume a fixed data set. We assume here that the data set contains no missing values.
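To make these definitions concrete, the sketch below (illustrative Python, not the authors' implementation) computes the local BIC score sc_i(Pa_i) from a complete data set and performs the inner maximization of Equation (2) for a single variable, i.e., the search over parent sets drawn from its predecessors with cardinality at most k. The function and variable names are assumptions made only for this example.

import math
from itertools import combinations
from collections import Counter

def local_bic(data, i, parents, arities):
    """Local BIC score sc_i(Pa_i) computed from complete data.
    data: list of tuples of ints (one per instance); arities[j]: number of states of X_j."""
    N = len(data)
    # N_ijk: counts of (parent configuration j, child value k); N_ij: counts of configuration j.
    n_ijk = Counter((tuple(row[p] for p in parents), row[i]) for row in data)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    # Log-likelihood term: sum_j sum_k N_ijk ln(N_ijk / N_ij).
    ll = sum(c * math.log(c / n_ij[j]) for (j, _), c in n_ijk.items())
    # Penalty term: (ln N / 2) (r_i - 1) prod_{X_j in Pa_i} r_j.
    q = 1
    for p in parents:
        q *= arities[p]
    return ll - 0.5 * math.log(N) * (arities[i] - 1) * q

def best_parent_set(data, i, predecessors, arities, k):
    """Inner maximization of Equation (2): best parent set for X_i among the
    subsets of its predecessors with at most k elements (O(n^k) candidates)."""
    best = (local_bic(data, i, (), arities), ())
    for size in range(1, k + 1):
        for pa in combinations(predecessors, size):
            best = max(best, (local_bic(data, i, pa, arities), pa))
    return best

Summing the scores returned by best_parent_set over all variables under a fixed ordering yields the score of that ordering used by the order-based methods discussed next.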

In particular, for the BIC and MDL score functions, the (local) score-maximizing parent set of any variable (under any fixed ordering) has at most log N variables, where N is the size of the data set [de Campos and Ji 2011]. Hence, a small bound k is often a reasonable simplification. Pruning rules can also be used to speed up parent set selection [de Campos and Ji 2011, Yuan and Malone 2013].

Teyssier and Koller (2005) used a local search procedure to solve the outer optimization in Equation (2). Define the score of an ordering < as

sc(<) = \sum_{i=1}^{n} \max_{Y \subseteq \{X_j : X_j < X_i\},\ |Y| \le k} sc_i(Y).

At each iteration, Teyssier and Koller's local search evaluates all candidate orderings <' that differ from < in a single comparison, and either selects the one with the highest increase in score (if it exists) or halts (if none is found). Given two orderings < and <' that differ in a single comparison, such that X < Y and Y <' X, the relative increase in score is

sc(<') - sc(<) = \left[ \max_{Z \subseteq \{X_j : X_j <' X\},\ |Z| \le k} sc_X(Z) + \max_{Z \subseteq \{X_j : X_j <' Y\},\ |Z| \le k} sc_Y(Z) \right] - \left[ \max_{Z \subseteq \{X_j : X_j < X\},\ |Z| \le k} sc_X(Z) + \max_{Z \subseteq \{X_j : X_j < Y\},\ |Z| \le k} sc_Y(Z) \right].

Thus, the best neighbor ordering can be selected efficiently, which makes OBS highly effective.

3. Acyclic Selection Order-Based Search

As discussed, given an ordering < over the variables, OBS performs parent set selection by independently searching, for each variable X, for the score-maximizing subset of the variables smaller than X. While this decomposition of the parent selection step ensures linear time (for bounded in-degree), it imposes unnecessary constraints and reduces the effectively searched space. To see this, consider a simple example where three variables are ordered such that A < B < C, and suppose that the best parent set for B is the empty set. Then considering B as a candidate parent of A still leads to a valid solution (a DAG), while increasing the coverage of DAGs. In other words, once a choice of parent set is made for some of the variables, the constraint imposed by the order on the remaining variables can be partially relaxed.

Acyclic selection order-based search (ASOBS) builds on this idea to improve the performance of OBS [Scanagatta et al. 2015]. Instead of maximizing parent sets independently, ASOBS selects parents sequentially, from the last to the first variable in the ordering. More precisely, fix an ordering X_1 < X_2 < ... < X_n and start with a graph with no arcs. Then, for i = n down to i = 1, ASOBS searches for the best parent set for X_i that does not induce a cycle in the incumbent graph, and adds the corresponding arcs to it. This procedure can be implemented in linear time, thus matching the asymptotic performance of OBS [Scanagatta et al. 2015]. Scanagatta et al. showed that ASOBS outperforms OBS on a large collection of real-world data sets.
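The acyclic selection step can be illustrated with the following minimal Python sketch (not the authors' implementation): candidate_sets[v] is assumed to hold the pre-computed candidate parent sets of variable v sorted by decreasing local score, and acyclicity is checked with a plain reachability test. An efficient implementation would instead maintain ancestor sets (e.g., as bitsets).

def asobs_select(order, candidate_sets):
    """Acyclic selection: visit variables from last to first in `order` and greedily
    pick the best-scoring parent set that keeps the incumbent graph acyclic.

    order:          list of variables (a total ordering)
    candidate_sets: dict mapping variable -> list of (score, parent_tuple) sorted
                    by decreasing score (the empty set should be among the candidates)
    Returns a dict mapping each variable to its chosen parent tuple.
    """
    parents = {v: () for v in order}   # incumbent graph, initially without arcs

    def reaches(src, dst):
        # True if dst is reachable from src following arcs parent -> child.
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u in seen:
                continue
            seen.add(u)
            stack.extend(w for w in parents if u in parents[w])  # children of u
        return False

    for v in reversed(order):          # from the last to the first variable
        for score, pa in candidate_sets[v]:
            # Adding the arcs pa -> v creates a cycle iff v already reaches some p in pa.
            if all(not reaches(v, p) for p in pa):
                parents[v] = pa
                break
    return parents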

4. Informed Initialization Heuristics

The generation of a good initial solution is crucial for avoiding convergence to poor local maxima in order-based structure learning. Traditionally, this is attempted by randomly generating initial orderings using standard techniques such as the Fisher-Yates algorithm [Knuth 1998]. While this guarantees a good coverage of the search space when sufficiently many restarts are performed, in large domains it can lead to poor solutions and require many iterations until a local optimum is reached. In previous work, we proposed using the information provided by the relaxed version of the problem,

H*_k = \arg\max_{H} \sum_{i=1}^{n} sc_i(Pa^H_i), (3)

where H ranges over all directed graphs (possibly containing cycles) with in-degree at most k, to guide the generation of initial solutions (variable orderings) [Perez and Mauá 2015]. Since acyclicity is not enforced, H*_k can be obtained efficiently by independently selecting, for each variable, the parent set of cardinality at most k that maximizes its local score. Based on this relaxed solution, we developed two heuristics, which we review next.

4.1. Depth-First Based Heuristic

The idea behind this heuristic is to perform a depth-first search in the graph H*_k, using the in-degree of child nodes to select which child to visit next. To understand why this may generate good orderings, consider a graph H*_k with nodes X_i and X_j such that X_i is the single parent of X_j and has no parents itself. Then there is an optimal ordering starting with X_i (this can easily be shown by contradiction). We can delete X_i from the graph and repeat the argument to conclude that there is an optimal ordering starting with X_i < X_j. Now consider a case where there are two or more selectable nodes (in the sense of the previous argument) in the graph H*_k. Instead of picking a selectable node at random, we define the goodness of a node as

goodness(X_i) = \sum_{X_j \in Ch^H_i \cap unvisited} |Pa^H_j \cap unvisited|, (4)

where Ch^H_i is the set of X_i's children in H*_k and unvisited is the set of nodes not yet visited. Small values of goodness mean that removing X_i from the graph makes more nodes selectable. Ties are resolved by picking one of the best selectable nodes uniformly at random. The heuristic is described more precisely in the pseudo-code of Algorithm 1.

Algorithm 1: DFS-based ordering generation.
  Function DFS(Graph H):
    unvisited ← all nodes
    L ← empty list
    while unvisited is not empty do
      O ← unvisited nodes ordered by unvisited in-degree and goodness
      B ← best nodes from O
      if B has more than one node then
        select a node X_r from B uniformly at random
      else
        select the unique node X_r from B
      append X_r to L
      unvisited ← unvisited \ {X_r}
    return L

Figure 1. An example of a weighted parent set graph over the variables A-F.

For example, in the graph in Figure 1, we can safely constrain the orderings to start with A, since it has no parents, and then remove it from the graph. At this point, we have three selectable nodes, B, C and F, each with the same in-degree but with different goodness values. Since F has the smallest goodness value, we select it. Repeating these steps, we obtain the candidate optimal orderings A, F, C, E, B, D and A, F, C, E, D, B. Note that this is a significant decrease from the full space of 6! = 720 possible orderings. This difference is likely to increase considerably as the number of variables grows and as the best parent sets become sparser.
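A compact Python rendering of Algorithm 1 follows. It is a sketch rather than the authors' implementation, and assumes the relaxed graph H*_k is given as a dict mapping each node to the set of its parents.

import random

def dfs_ordering(parents):
    """DFS-based ordering generation (Algorithm 1 sketch).
    parents: dict mapping node -> set of parents in the relaxed graph H*_k."""
    children = {v: set() for v in parents}
    for v, pa in parents.items():
        for p in pa:
            children[p].add(v)

    unvisited = set(parents)
    order = []
    while unvisited:
        def key(v):
            indeg = len(parents[v] & unvisited)           # unvisited in-degree
            goodness = sum(len(parents[c] & unvisited)    # Equation (4)
                           for c in children[v] & unvisited)
            return (indeg, goodness)
        best = min(key(v) for v in unvisited)
        candidates = [v for v in unvisited if key(v) == best]
        chosen = random.choice(candidates)                # break ties uniformly at random
        order.append(chosen)
        unvisited.discard(chosen)
    return order

Nodes with no unvisited parents are selected first; among them, preference goes to the node whose removal makes the largest number of other nodes selectable (smallest goodness).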

4.2. Feedback-Arc Set Heuristic

The DFS-based approach can be seen as removing edges from the graph H*_k so as to make it a DAG (more specifically, a tree), and then extracting a consistent topological ordering. The selection of an edge to remove considers only the qualitative structure of the graph. An arguably better approach is to use the score function to assess the relevance of each edge, and to consider the removal of edges globally (not only within a local neighborhood). The relevance of an edge X_j → X_i in a graph H is assessed by

W_{ji} = sc_i(Pa^H_i) - sc_i(Pa^H_i \setminus \{X_j\}). (5)

The weight W_{ji} represents the cost of removing X_j from the set Pa^H_i, and it is always a positive number if H is the best parent set graph (Equation (3)), since Pa^H_i maximizes the score for X_i. A small value of W_{ji} suggests that the parent X_j is not very relevant to X_i. For instance, in the weighted graph in Figure 1, the edge C → D is less relevant than the edge B → D, which in turn is less relevant than the edge A → D.

The feedback-arc set heuristic (FAS) penalizes orderings that violate an edge X_i → X_j in H by its associated cost W_{ij}. Given a directed graph H = (V, E), a set F ⊆ E is called a feedback arc set (FAS) if every (directed) cycle of H contains at least one edge in F. In other words, F is an edge set whose removal makes the graph H acyclic [Demetrescu and Finocchi 2003]. If we assume that the cost of an ordering of H is the sum of the weights of the violated (or removed) edges, we can formulate the problem of finding a minimum-cost ordering of H as a minimum-cost feedback arc set problem (min-cost FAS): given the weighted directed graph H with weights W_{ij}, find

F* = \arg\min_{F \subseteq E:\ H - F \text{ is a DAG}} \sum_{X_i \to X_j \in F} W_{ij}. (6)
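The weighted graph handed to the min-cost FAS procedure can be assembled directly from the pre-computed local scores. Below is a small Python sketch, under the assumption that scores[i] maps each candidate parent tuple of X_i (stored as a sorted tuple) to its local score; the names are illustrative, not the authors' API.

def relaxed_graph(scores):
    """Best (possibly cyclic) parent set graph H*_k of Equation (3): each variable
    independently picks its highest-scoring candidate parent set."""
    return {i: max(cand, key=cand.get) for i, cand in scores.items()}

def arc_weights(scores, parents):
    """Weights W_ji = sc_i(Pa_i) - sc_i(Pa_i \\ {X_j}) of Equation (5) for every arc
    X_j -> X_i of the relaxed graph. Assumes the reduced parent set is also among
    the scored candidates (which may not hold after aggressive pruning)."""
    weights = {}
    for i, pa in parents.items():
        for j in pa:
            reduced = tuple(p for p in pa if p != j)
            weights[(j, i)] = scores[i][pa] - scores[i][reduced]
    return weights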

The min-cost FAS problem has been proved to be NP-complete [Gavril 1977], but there are efficient and effective approximation algorithms [Eades et al. 1993, Eades and Lin 1995, Demetrescu and Finocchi 2003], such as the one shown in Algorithm 2, which runs in time O(nm), where m is the number of edges in the graph.

Algorithm 2: Minimum-cost FAS approximation.
  Input: weighted directed graph H
  Output: feedback arc set F
  F ← ∅
  while there is a cycle C in H do
    W_min ← min_{(u,v) ∈ C} W_uv
    for each (u, v) ∈ C do
      W_uv ← W_uv − W_min
      if W_uv = 0 then
        remove (u, v) from H and add it to F
  for each (u, v) ∈ F do
    if adding (u, v) back to H does not create a cycle then
      H ← H + (u, v)
      F ← F \ {(u, v)}

In short, the FAS heuristic is: take the weighted graph H*_k with weights W_ij as input and find an (approximate) min-cost FAS F; remove the edges in F from H*_k and return a topological ordering of the DAG H*_k − F (this can be done by a depth-first search traversal starting at the root nodes).
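For concreteness, the following Python sketch implements the approximation of Algorithm 2 (an illustrative rendering, not the authors' code). The graph is represented as a dict mapping each arc (u, v) to its positive weight, and cycles are found with a simple recursive DFS.

def find_cycle(arcs):
    """Return the list of arcs of some directed cycle, or None if the graph is acyclic.
    arcs: dict mapping (u, v) -> weight."""
    adj = {}
    for (u, v) in arcs:
        adj.setdefault(u, []).append(v)

    def dfs(u, stack, on_stack, visited):
        visited.add(u)
        on_stack.add(u)
        stack.append(u)
        for w in adj.get(u, ()):
            if w in on_stack:                  # back arc found: extract the cycle
                i = stack.index(w)
                cyc = stack[i:] + [w]
                return [(cyc[j], cyc[j + 1]) for j in range(len(cyc) - 1)]
            if w not in visited:
                res = dfs(w, stack, on_stack, visited)
                if res:
                    return res
        on_stack.discard(u)
        stack.pop()
        return None

    visited = set()
    for node in list(adj):
        if node not in visited:
            res = dfs(node, [], set(), visited)
            if res:
                return res
    return None

def approx_min_cost_fas(weights):
    """Greedy approximation of a minimum-cost feedback arc set (Algorithm 2 sketch).
    weights: dict mapping arc (u, v) -> positive weight W_uv. Returns a set of arcs."""
    residual = dict(weights)
    fas = set()
    cycle = find_cycle(residual)
    while cycle is not None:
        w_min = min(residual[a] for a in cycle)        # cheapest arc on the cycle
        for a in cycle:
            residual[a] -= w_min
            if residual[a] <= 0:                       # zero residual: move arc to the FAS
                del residual[a]
                fas.add(a)
        cycle = find_cycle(residual)
    # Reinsertion pass: put back removed arcs that no longer close a cycle.
    for a in sorted(fas, key=lambda a: -weights[a]):
        if find_cycle({**residual, a: weights[a]}) is None:
            residual[a] = weights[a]
            fas.discard(a)
    return fas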

5. A New Initialization Heuristic for ASOBS: A Best-First Based Approach

Whereas the DFS- and FAS-based heuristics provide a significant improvement in the quality of the solutions found by OBS within a fixed amount of time, they yield only a marginal gain in performance when ASOBS is adopted. One possible explanation is that ASOBS performs parent set selection under a (dynamically chosen) variable ordering, hence biasing the search towards specific orderings can actually hurt performance.

Motivated by this explanation, we propose the Best-First-Based initialization heuristic (BFT), described in Algorithm 3. The algorithm takes a collection of candidate parent sets for each variable (e.g., for each X_i, the subsets of all variables other than X_i with cardinality at most a given k). The heuristic initially labels all nodes as not visited (Line 1) and represents an ordering as a list L (initially empty; Line 2). Say that a parent set is valid if it does not contain visited nodes. The loop in Lines 3 to 12 then generates an ordering as follows. In Line 4, the best valid parent set, with score

bestscore^visited_i = \max_{Pa_i \subseteq \{X_1, ..., X_n\} \setminus visited} sc_i(Pa_i),

is found for each non-visited variable X_i, and the variables are ranked by these scores in decreasing order (Line 5). A variable is then drawn with probability proportional to the reciprocal of its rank (Lines 6 to 9), and the ordering and the set of visited nodes are updated (Lines 10 and 11, respectively).

Algorithm 3: Best-First-Based ordering generation.
  Function BestFirst(candidate parent sets C_i):
  1   visited ← ∅
  2   L ← empty list of size n
  3   for r = n down to 1 do
  4       S ← {(X_i, bestscore^visited_i) : X_i ∉ visited}
  5       sort S by bestscore^visited_i in decreasing order
  6       for j = 1 to |S| do
  7           Prob_j ← 1/j
  8       end
  9       X_r ← variable drawn from S according to the (normalized) distribution Prob
  10      L[r] ← X_r
  11      visited ← visited ∪ {X_r}
  12  end
  13  return L

The time complexity of the procedure is dominated by the selection of the best valid parent set for each variable in Line 4. Assuming pre-computed scores of parent sets with in-degree bounded by k, and using the efficient bitset-based implementation of bestscore^visited_i developed by [Malone 2012], the complexity is as follows. Line 4 is performed in worst-case time O(nk), Line 5 in time O(n log n), the loop in Lines 6 to 8 in time O(n), and Lines 9 to 11 in constant time. Since all steps are repeated at each of the n iterations, the overall complexity is O(n²k).
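The following Python sketch illustrates the BFT heuristic (again an illustrative rendering under assumed data structures, not the authors' bitset-based implementation): scores[v] maps each candidate parent tuple of variable v to its local score.

import random

def bft_ordering(scores, variables):
    """Best-First-Based ordering generation (Algorithm 3 sketch).

    scores:    dict variable -> dict {parent_tuple: local score}
    variables: list of all variables
    Returns a list representing the generated ordering (first to last position).
    """
    def best_valid_score(v, visited):
        # Best score among candidate parent sets containing no visited variable.
        valid = [s for pa, s in scores[v].items()
                 if not any(p in visited for p in pa)]
        return max(valid) if valid else float("-inf")

    visited = set()
    chosen_last_to_first = []                    # positions are filled from the end
    for _ in range(len(variables)):
        ranked = sorted((v for v in variables if v not in visited),
                        key=lambda v: best_valid_score(v, visited),
                        reverse=True)
        # Draw a variable with probability proportional to 1/rank.
        probs = [1.0 / (j + 1) for j in range(len(ranked))]
        chosen = random.choices(ranked, weights=probs, k=1)[0]
        chosen_last_to_first.append(chosen)
        visited.add(chosen)
    return list(reversed(chosen_last_to_first))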

6. Experiments, Results and Discussion

We evaluate the performance of (acyclic selection) order-based structure learning algorithms with different initialization heuristics: the standard random generation (RND), the depth-first search (DFS) and feedback arc set (FAS) based heuristics [Perez and Mauá 2015], and the new heuristic (BFT). We use a collection of real-world data sets whose characteristics are shown in Table 1. The algorithms were implemented in C++, using a few utilities from the URLearning package for learning Bayesian networks. For each data set we generated all parent sets in increasing order of cardinality, with a time limit of 2 minutes per variable. We used pruning rules to discard parent sets that are provably suboptimal (under the optimal ordering) [de Campos and Ji 2011]. This significantly reduces the number of candidate parent sets and improves performance. The number of non-pruned candidate parent sets generated is shown in column Nps of Table 1.

Table 1. Data set characteristics. n: number of variables, N: number of instances, Nps: number of unpruned parent sets. The six data sets are Kdd, Tretail, Msweb, Cr52, Bbc and Ad.

We ran each search algorithm (ASOBS or OBS) with 100 different initial orderings obtained by the different heuristics. The maximum and average relative scores for each data set and algorithm appear in Table 2. The relative score of a DAG G is RC(G) = (sc(G) − sc(∅)) / |sc(∅)|, where sc(∅) is the score of the empty DAG.

Looking at the results in Table 2, we see that BFT outperforms the other approaches w.r.t. the maximum best score on 5 out of 6 data sets. The Ad data set, on which FAS outperforms BFT, has a relatively small search space (see its Nps in Table 1), which might explain the superior performance of FAS+ASOBS there. W.r.t. the average best score, FAS and BFT perform fairly similarly and are superior to the other approaches; again, FAS outperforms BFT on the Ad data set. Overall, ASOBS+BFT achieves higher scores than OBS+FAS, and the improvement is more noticeable as the number of variables increases.

To verify whether the performance differences are statistically significant, we performed a Friedman test, a non-parametric hypothesis test with multiple comparison correction [Demsar 2006]. According to the computed p-values, there is a statistically significantly better method for the first criterion (maximum best score) at the adopted significance level. To obtain insight into the relative performance of the methods, we performed a post-hoc Nemenyi test, which performs pairwise comparisons with multiple comparison correction [Demsar 2006]. The results for maximum and average best score are depicted in Figures 2 and 3, respectively. Each point represents the average ranking of an approach, and the intervals represent confidence intervals. In these figures, a method A is considered statistically significantly better than a method B if A has a smaller average ranking than B and their intervals do not overlap.

Figure 2. Maximum Best Score (average rankings of BFT, RND, DFS and FAS with confidence intervals).
Figure 3. Average Best Score (average rankings of BFT, RND, DFS and FAS with confidence intervals).

We see from Figure 2 that, w.r.t. the maximum best score, BFT ranks better on average than the other methods, but the difference is statistically significant only when compared to DFS (the confidence intervals overlap for the other heuristics). Figure 3 shows that, w.r.t. the average best score, BFT and FAS rank better than the other methods, but these differences are not statistically significant.
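The relative score and the per-heuristic summary statistics reported in Table 2 are straightforward to compute; a tiny Python sketch (with assumed variable names) follows.

def relative_score(score, empty_score):
    """RC(G) = (sc(G) - sc(empty)) / |sc(empty)|, where empty_score = sc(empty DAG)."""
    return (score - empty_score) / abs(empty_score)

def summarize(run_scores, empty_score):
    """Maximum and average relative score over the results of the 100 restarts."""
    rcs = [relative_score(s, empty_score) for s in run_scores]
    return max(rcs), sum(rcs) / len(rcs)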

Table 2. Performance of the order-based search algorithms (ASOBS and OBS) for each data set (Kdd, Tretail, Msweb, Cr52, Bbc, Ad) and initialization heuristic (RND, DFS, FAS, BFT), reporting the maximum and the average (± standard deviation) best relative scores; the best method for each data set/criterion is shown in bold.

7. Conclusions and Future Work

Learning Bayesian networks from data is a notably difficult problem, and practitioners often resort to approximate solutions. A state-of-the-art approach for large domains is order-based structure learning, which performs a local search in the space of variable orderings. Acyclic selection is a scalable order-based algorithm that aims to be the new state of the art. As with many local search approaches, the quality of the solutions produced by order-based learning strongly depends on the initialization strategy.

In this work, we proposed a new informed heuristic for generating initial solutions for order-based structure learning. The heuristic, called BFT (from Best-First), mimics the way acyclic selection performs structure search given an initial ordering. Experiments with 6 real-world data sets containing from 64 to 1556 variables demonstrate that the new heuristic often significantly improves the accuracy of acyclic selection order-based search. In summary, these results indicate the advantage of using informed approaches for generating initial orderings in large domains.

In the future, we intend to compare the heuristics on a much larger collection of data sets, in order to obtain a clearer (and statistically significant) picture of their relative performance. Our new informed initialization heuristic can also be used in other methods that search the space of orderings, such as tabu search [Glover 1989], simulated annealing [Granville et al. 1994] or data perturbation [Elidan et al. 2002]. This is also left as future work.

8. Acknowledgements

We thank Mauro Scanagatta for kindly providing us with the data sets. This work benefited from the computing resources provided by the Superintendência de Informação of Universidade de São Paulo. The first author was partially supported by CAPES. The second author was partially supported by the São Paulo Research Foundation (FAPESP) grant #2016/.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6).

Buntine, W. (1991). Theory refinement on Bayesian networks. In Proceedings of the 7th Annual Conference on Uncertainty in Artificial Intelligence.

Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2.

Chickering, D. M., Heckerman, D., and Meek, C. (2004). Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5(1).

Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4).

de Campos, C. P. and Ji, Q. (2011). Efficient structure learning of Bayesian networks using constraints. Journal of Machine Learning Research, 12.

Demetrescu, C. and Finocchi, I. (2003). Combinatorial algorithms for feedback problems in directed graphs. Information Processing Letters, 86(3).

Demsar, J. (2006). Statistical comparison of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1-30.

Eades, P. and Lin, X. (1995). A new heuristic for the feedback arc set problem. Australian Journal of Combinatorics, 12.

Eades, P., Lin, X., and Smyth, W. F. (1993). A fast and effective heuristic for the feedback arc set problem. Information Processing Letters, 47(6).

Elidan, G., Ninio, M., Friedman, N., and Schuurmans, D. (2002). Data perturbation for escaping local maxima in learning. In Proceedings of the Eighteenth National Conference on Artificial Intelligence.

Friedman, N., Nachman, I., and Peér, D. (1999). Learning Bayesian network structure from massive datasets: The sparse candidate algorithm. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.

Gavril, F. (1977). Some NP-complete problems on graphs. In Proceedings of the 11th Conference on Information Sciences and Systems.

Glover, F. (1989). Tabu search - part I. Operations Research Society of America Journal on Computing, 1(3).

Granville, V., Krivanek, M., and Rasson, J. P. (1994). Simulated annealing: A proof of convergence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(6).

Heckerman, D., Geiger, D., and Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3).

Knuth, D. (1998). The Art of Computer Programming, Volume 2. Boston: Addison-Wesley.

Lam, W. and Bacchus, F. (1994). Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence, 10(4).

Liu, Z., Malone, B., and Yuan, C. (2012). Empirical evaluation of scoring functions for Bayesian network model selection. BMC Bioinformatics, 13(Suppl. 15):S14.

Malone, B. M. (2012). Learning optimal Bayesian networks with heuristic search. PhD thesis, Mississippi State University, Mississippi, USA.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Perez, W. and Mauá, D. D. (2015). Initialization heuristics for greedy Bayesian network structure learning. In Proceedings of the 3rd Symposium on Knowledge Discovery, Mining and Learning.

Scanagatta, M., de Campos, C. P., Corani, G., and Zaffalon, M. (2015). Learning Bayesian networks with thousands of variables. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2).

Suzuki, J. (1996). Learning Bayesian belief networks based on the minimum description length principle. In Proceedings of the Thirteenth International Conference on Machine Learning.

Teyssier, M. and Koller, D. (2005). Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005).

Yuan, C. and Malone, B. (2013). Learning optimal Bayesian networks: A shortest path perspective. Journal of Artificial Intelligence Research, 48.


More information

Survey of contemporary Bayesian Network Structure Learning methods

Survey of contemporary Bayesian Network Structure Learning methods Survey of contemporary Bayesian Network Structure Learning methods Ligon Liu September 2015 Ligon Liu (CUNY) Survey on Bayesian Network Structure Learning (slide 1) September 2015 1 / 38 Bayesian Network

More information

Constraint Satisfaction Problems

Constraint Satisfaction Problems Constraint Satisfaction Problems CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2013 Soleymani Course material: Artificial Intelligence: A Modern Approach, 3 rd Edition,

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

An Exact Approach to Learning Probabilistic Relational Model

An Exact Approach to Learning Probabilistic Relational Model JMLR: Workshop and Conference Proceedings vol 52, 171-182, 2016 PGM 2016 An Exact Approach to Learning Probabilistic Relational Model Nourhene Ettouzi LARODEC, ISG Sousse, Tunisia Philippe Leray LINA,

More information

arxiv: v2 [cs.ds] 25 Jan 2017

arxiv: v2 [cs.ds] 25 Jan 2017 d-hop Dominating Set for Directed Graph with in-degree Bounded by One arxiv:1404.6890v2 [cs.ds] 25 Jan 2017 Joydeep Banerjee, Arun Das, and Arunabha Sen School of Computing, Informatics and Decision System

More information

COS 513: Foundations of Probabilistic Modeling. Lecture 5

COS 513: Foundations of Probabilistic Modeling. Lecture 5 COS 513: Foundations of Probabilistic Modeling Young-suk Lee 1 Administrative Midterm report is due Oct. 29 th. Recitation is at 4:26pm in Friend 108. Lecture 5 R is a computer language for statistical

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Information Criteria Methods in SAS for Multiple Linear Regression Models

Information Criteria Methods in SAS for Multiple Linear Regression Models Paper SA5 Information Criteria Methods in SAS for Multiple Linear Regression Models Dennis J. Beal, Science Applications International Corporation, Oak Ridge, TN ABSTRACT SAS 9.1 calculates Akaike s Information

More information

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec.

Randomized rounding of semidefinite programs and primal-dual method for integer linear programming. Reza Moosavi Dr. Saeedeh Parsaeefard Dec. Randomized rounding of semidefinite programs and primal-dual method for integer linear programming Dr. Saeedeh Parsaeefard 1 2 3 4 Semidefinite Programming () 1 Integer Programming integer programming

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

4 INFORMED SEARCH AND EXPLORATION. 4.1 Heuristic Search Strategies

4 INFORMED SEARCH AND EXPLORATION. 4.1 Heuristic Search Strategies 55 4 INFORMED SEARCH AND EXPLORATION We now consider informed search that uses problem-specific knowledge beyond the definition of the problem itself This information helps to find solutions more efficiently

More information

Advances in Learning Bayesian Networks of Bounded Treewidth

Advances in Learning Bayesian Networks of Bounded Treewidth Advances in Learning Bayesian Networks of Bounded Treewidth Siqi Nie Rensselaer Polytechnic Institute Troy, NY, USA nies@rpi.edu Cassio P. de Campos Queen s University Belfast Belfast, UK c.decampos@qub.ac.uk

More information

Name: UW CSE 473 Midterm, Fall 2014

Name: UW CSE 473 Midterm, Fall 2014 Instructions Please answer clearly and succinctly. If an explanation is requested, think carefully before writing. Points may be removed for rambling answers. If a question is unclear or ambiguous, feel

More information

An efficient approach for finding the MPE in belief networks

An efficient approach for finding the MPE in belief networks 342 Li and D'Ambrosio An efficient approach for finding the MPE in belief networks Zhaoyu Li Department of Computer Science Oregon State University Corvallis, OR 97331 Bruce D'Ambrosio Department of Computer

More information

Last topic: Summary; Heuristics and Approximation Algorithms Topics we studied so far:

Last topic: Summary; Heuristics and Approximation Algorithms Topics we studied so far: Last topic: Summary; Heuristics and Approximation Algorithms Topics we studied so far: I Strength of formulations; improving formulations by adding valid inequalities I Relaxations and dual problems; obtaining

More information

Limitations of Matrix Completion via Trace Norm Minimization

Limitations of Matrix Completion via Trace Norm Minimization Limitations of Matrix Completion via Trace Norm Minimization ABSTRACT Xiaoxiao Shi Computer Science Department University of Illinois at Chicago xiaoxiao@cs.uic.edu In recent years, compressive sensing

More information

Graphical Analysis of Value of Information in Decision Models

Graphical Analysis of Value of Information in Decision Models From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. Graphical Analysis of Value of Information in Decision Models Songsong Xu Kim-Leng Poh Department of lndustrial &

More information

An Efficient Approximation for the Generalized Assignment Problem

An Efficient Approximation for the Generalized Assignment Problem An Efficient Approximation for the Generalized Assignment Problem Reuven Cohen Liran Katzir Danny Raz Department of Computer Science Technion Haifa 32000, Israel Abstract We present a simple family of

More information

7. Decision or classification trees

7. Decision or classification trees 7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,

More information