Statistical relationship discovery in SNP data using Bayesian networks


Paweł Szlendak and Robert M. Nowak
Institute of Electronic Systems, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland

ABSTRACT

The aim of this article is to present an application of Bayesian networks to the discovery of affinity relationships based on genetic data. The presented solution uses a search-and-score algorithm to discover the Bayesian network structure which best fits the data, i.e. the alleles of single nucleotide polymorphisms detected by DNA microarrays. The algorithm treats structure learning as a combinatorial optimization problem. It is a randomized local search algorithm which uses a Bayesian-Dirichlet scoring function. The algorithm's testing procedure encompasses tests on synthetic data, generated from given Bayesian networks by a forward sampling procedure, as well as tests on real-world genetic data. The comparison of the Bayesian networks generated by the application with the genetic evidence data confirms the usability of the presented methods.

Keywords: bioinformatics, Bayesian network, single nucleotide polymorphism

1. INTRODUCTION

Bayesian networks belong to the family of graphical models, which combine probability theory and graph theory. They model statistical relationships of attributes in data (perceived as random variables) using a directed acyclic graph (DAG) to visualize the relationships. Data (we considered discrete data with no missing values and no latent variables) is seen as a finite set of random variables X = {X_1, X_2, ..., X_n}, where each X_i may take on a value x_i from a finite and discrete domain. Formally, a Bayesian network is defined as a pair (G, P), where G is a DAG and P is a joint probability distribution of the random variables from X. Nodes in G represent the variables of X and edges express the relationships between these random variables.
Depending on the domain being modeled, the direction of a particular arc can sometimes be interpreted as a causal relationship. A Bayesian network satisfies the Markov condition, which says that a node in a Bayesian network is independent of its non-descendants given its parents. The Markov condition enables the factorization of the joint probability distribution of the random variables in the form given by equation (1):

\[
P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid PA_{X_i}) \qquad (1)
\]

Thus, the joint probability distribution is expressed as a product of the conditional distributions of all nodes given the values of their parents in G. To specify (1) it is required to determine each conditional probability of X_i given the set of its parents. When X_i takes on discrete values, the conditional probabilities P(X_i | PA_{X_i}) are represented in the form of conditional probability tables (CPTs), which consist of the conditional distributions P(x_i | pa_{X_i}) for each possible value x_i of X_i and each possible configuration pa_{X_i} of the values of PA_{X_i}. Owing to the factorization in equation (1), the complexity of data analysis is significantly reduced.

(Further author information: Paweł Szlendak: P.Szlendak@stud.elka.pw.edu.pl, Robert M. Nowak: R.M.Nowak@elka.pw.edu.pl)

The joint probability distribution P can be equally well expressed by more than one DAG; therefore (Markov) equivalence classes of DAGs are defined. Two DAGs belong to the same Markov equivalence class if and only if

they have the same skeletons (edges with directions disregarded) and the same set of v-structures, that is, pairs of converging arrows whose tails are not connected by an arrow.[2] A Markov equivalence class can be represented as a graph; such a representation is called a PDAG (pattern DAG).

The problem tackled in this paper was to discover the Bayesian network (more precisely, the PDAG) which best fits the given genetic data. The nodes in the network represent individuals and the arcs express relationships which, being statistically important, might be interesting from the point of view of medicine, forensic science or biology. The analyzed genetic data refers to variations in the DNA sequence of the human genome. The genetic information is stored in DNA molecules. A DNA molecule is a chain built from 4 nucleotides: adenine (A), cytosine (C), guanine (G) and thymine (T), of a length depending on the organism; e.g. the human genome is about 3×10^9 nucleotides long. About 99% of the DNA sequence is identical among individuals of a given species. The most frequent variation of DNA is the single nucleotide polymorphism (SNP), which occurs when a single nucleotide in the genome sequence is altered. The number of SNPs in the human genome is estimated to be about 3×10^6,[3] and these variations are evolutionarily stable (not changing much from generation to generation); thus they are useful in medicine and forensic science. To detect SNPs, microarray technology is used, capable of identifying on the order of 10^4 variants in one step.

Genetic data usually taken into consideration in forensic science or medicine includes those parts of the DNA code that vary among individuals. A place in a DNA sequence where a particular piece of genetic information is stored is called a locus; the DNA string stored at a locus is called a gene. The different substrings of DNA that may occur at a given locus are called variants or alleles. In diploid organisms (e.g. humans) the majority of genes come in two copies (one inherited from the female and one from the male parent).
Growing availability of genetic data, owing to technological development, has caused many genetic problems to be expressed in terms of graphical models, since graphical models are a natural way to express the relationships between individuals.[4] The problems range from identifying genes causing particular diseases, through discovering affinity relationships within a group of individuals, to examining the cell life cycle. Figure 1 shows a pedigree diagram commonly used in genetics and a DAG expressing the same relationships.

Figure 1. Traditional pedigree diagram vs. pedigree DAG. Females are represented by circles, males by squares. The letters indicate particular alleles of an individual at a given locus.

The problem which extends the traditional assessment of alleles given parent-child relationships is that of establishing unknown affinity relationships for a group of individuals based on SNPs, e.g. identifying multiple remains from disasters or wars. Specialized algorithms have been developed for the analysis of genetic linkage with graphical models.[5,6] The algorithm presented in this paper perceives structure learning of a Bayesian network as a combinatorial optimization problem. It exploits the decomposability of the scoring function (whose evaluation cost in the presented solution is constant) and thus allows the data to be analyzed more efficiently.

2. PROPOSED ALGORITHMS

The algorithms that have been proposed, in order to reliably assess the usability of Bayesian networks in the discovery of statistical relationships, address the problems of sampling, structure learning of Bayesian networks and transforming DAGs to PDAGs.

The problem of sampling a Bayesian network is the same as the problem of obtaining a sample from a given distribution. In this case, the distribution is represented in the form of a DAG and is factorized according to equation (1). Random variables in a Bayesian network depend directly on their parents. Thus, in order to determine the value of a random variable, say X_i, it is necessary to obtain the values of the random variables belonging to the parents PA_{X_i} of X_i. In the presented approach the forward sampling algorithm[7] was used, which starts from a root node and traverses the Bayesian network in breadth-first order, sampling each node according to the node's CPT.

The proposed structure learning algorithm follows the search-and-score approach to learning Bayesian network topology. Search-and-score methods perceive structure learning as a combinatorial optimization problem, where a space of candidate DAGs (over n variables) is searched for the DAG which best approximates the joint probability distribution. In the presented approach the algorithm designed for structure learning, given in Listing 1, is an improved version of a randomized greedy local search algorithm[8] and was inspired by the solution presented in [9]. The algorithm searches the space of all DAGs (containing n variables) for the DAG with the highest value of the scoring function.
Listing 1. Randomized local DAG search.
Problem: find a DAG G that maximizes the score.
Input: data D, graph G (possibly empty), number of local search runs l.
Output: DAG G that maximizes the score.

 1: RandomizedLocalDAGSearch(D, G, l)
 2:   score_best ← score(D, G)
 3:   G_best ← G
 4:   for i ← 1 to l do
 5:     G ← RandomDAG(getVertices(G))
 6:     score ← score(D, G)
 7:     repeat
 8:       findMore ← false
 9:       N_G ← generateAcyclicNeighbourhood(G)
10:       for all G_j from N_G do
11:         if score < score(D, G_j) then
12:           G ← G_j
13:           score ← score(D, G)
14:           findMore ← true
15:         end if
16:       end for
17:     until findMore = false
18:     if score_best < score then
19:       G_best ← G
20:       score_best ← score
21:     end if
22:   end for
23:   return G_best

The algorithm performs l local searches (lines 4-22), every time starting from a random DAG. At each step of a local search, a neighborhood of the current DAG G is found by the procedure generateAcyclicNeighbourhood. The neighborhood of a DAG is the set of all DAGs that are obtained from graph G by applying one of the following operations:

- if two nodes are not adjacent, add an edge between them in either direction, provided that no cycle is introduced;
- if two nodes are adjacent, remove the edge between them;

- if two nodes are adjacent, reverse the edge between them, provided that no cycle is introduced.

Next, all DAGs in the neighborhood of the current DAG G are scored and the one with the highest score is selected (line 10). This graph becomes the current DAG G, and the local search proceeds further with G as the new current DAG. The local search is stopped when no graph from the neighborhood of G has a higher score than G. When this is the case, the algorithm checks whether the graph obtained from the local search has a higher score than the best graph found so far. If its score is better, then that graph becomes the best graph G_best (line 19). The algorithm proceeds to the next iteration and a new local search begins, starting from a random DAG. The algorithm stops after the specified number of l iterations.

Randomization of the initial graph for each local search is necessary to avoid local maxima. Starting always from one graph, for example the empty graph, would always lead to the same output graph, since the local search is deterministic. Randomization, however, enables the local search to explore the complete DAG space as long as l is high enough.

The fitness of a DAG to the data is measured with a scoring function. The score used for ranking DAGs is a Bayesian-Dirichlet scoring function with an equivalent sample size,[8] given in equation (2):

\[
\mathrm{score}(D, G) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left( \log \frac{\Gamma\!\left(\frac{N}{q_i}\right)}{\Gamma\!\left(\frac{N}{q_i} + \sum_{k=1}^{r_i} M_{ijk}\right)} + \sum_{k=1}^{r_i} \log \frac{\Gamma(N_{ijk} + M_{ijk})}{\Gamma(N_{ijk})} \right) \qquad (2)
\]

where:
- n is the number of random variables (attributes) in the data D,
- M_{ijk} is the number of cases for which X_i takes on its k-th value while the parents of X_i in G are in their j-th instantiation,
- N_{ijk} is the hyperparameter, N_{ijk} = N / (r_i q_i),
- N is the equivalent sample size,
- r_i is the number of distinct values X_i takes on,
- q_i is the number of distinct instantiations of the parents of X_i.

Generally, obtaining the value of the scoring function requires many calculations on the data (e.g. the Gamma function has to be evaluated from the general formula Γ(x) = ∫_0^∞ t^{x-1} e^{-t} dt, since the arguments are not integers). In the case of the RandomizedLocalDAGSearch algorithm, the decomposability of the scoring function can be exploited, since the scored graphs are neighbors of the current graph, which differ from it by only one edge. In this way, the insertion or deletion of an arc X_j → X_i in a DAG G can be evaluated by computing only one new local score, score(D, X_i, PA(X_i) ∪ {X_j}) or score(D, X_i, PA(X_i) \ {X_j}) respectively; the reversal of an arc X_j → X_i requires the evaluation of two new local scores, score(D, X_i, PA(X_i) \ {X_j}) and score(D, X_j, PA(X_j) ∪ {X_i}). To improve efficiency, the data set is first pre-processed to build an index based on an associative container.

The time complexity of the RandomizedLocalDAGSearch algorithm is proportional to the number of local search iterations. Each local search comprises searches through neighborhoods of DAGs. The size of the neighborhood of a DAG G with n nodes is of order n^2, and every DAG in it is scored using the scoring function, whose cost is constant thanks to the decomposability property and the data indexing. If h is the average number of neighborhoods searched during one local search iteration, the time complexity is O(l h n^2). Depending on the fitness of the data to the initial graph, h can either be a small number, when a local search terminates quickly due to the lack of any better graph in the neighborhood of the current graph G, or a considerably big number, when the chain of consecutive better graphs is long.

When learning the structure of a Bayesian network from raw data, relying only on purely statistical characteristics of the data, the output graph should be converted to a PDAG. In this work the PDAG transformation algorithm given in [8] was implemented.

3. VERIFICATION ON SYNTHETIC AND REAL DATA

Functional testing of the software was split into two parts. The first part considered tests of the ability to reconstruct a given graph topology from data sampled from that graph; the second part referred to real SNP data detected by a DNA microarray. For validation of the structure learning algorithm, two metrics measuring the similarity of Bayesian network structures were defined.

Skeleton metric:

\[
M_s(G_1, G_2) = \frac{|E_1 \cap E_2|}{\max(|E_1|, |E_2|)} \qquad (3)
\]

where E_1 is the set of edges of G_1 with their directions disregarded and E_2 is the set of edges of G_2 with their directions disregarded.

V-structure metric:

\[
M_v(G_1, G_2) = \frac{|V_1 \cap V_2|}{\max(|V_1|, |V_2|)} \qquad (4)
\]

where V_1 is the set of v-structures of graph G_1 and V_2 is the set of v-structures of graph G_2. Elements of the sets V_1 and V_2 are defined as ordered triples of nodes (X_i, X_k, X_j), where the node in the middle is the middle node of a v-structure X_i → X_k ← X_j. Note that the higher the value of the two metrics, the more similar the two graphs are. The maximum value of each metric is 1. If M_s(G_1, G_2) = 1 and M_v(G_1, G_2) = 1, then G_1 and G_2 are Markov equivalent, hence they belong to the same Markov equivalence class.

3.1 Tests on synthetic data

The tests on synthetic data checked the ability to reconstruct a given graph topology from data sampled from that graph. A number of experiments were performed, in each of which an input Bayesian network of n vertices was specified and data of size m was sampled from this network. Next, this data was used for reconstruction of the graph. Taking into account the stochastic nature of forward sampling, t independent data sets were sampled and graph reconstruction was performed for each of these t data sets separately. To estimate the algorithm's accuracy in reconstructing the input network, the skeleton and v-structure metrics were calculated for each of the obtained graphs with respect to the input graph. The testing procedure is given in Listing 2.
Listing 2. Testing procedure.
Input: Bayesian network bn, data size m, number of data sets t.
Output: statistics of the skeleton and v-structure metrics for t samplings.

TestingProcedure(bn, m, t)
  for i ← 1 to t do
    sample data D_i of size m from bn
    build graph G_i from D_i using the algorithm from Listing 1
    compare G_i with the topology of bn using the skeleton and v-structure metrics
  end for
  calculate statistics of M_s^i and M_v^i

The number of samplings t done for each input Bayesian network was introduced to eliminate the influence of biased data on the structure learning algorithm's accuracy. The bigger t is, the more time it takes to obtain the results; therefore t was set so that the results could be obtained in a reasonable time.

The accuracy of graph reconstruction was then expressed using statistics of the calculated metrics M_s^i and M_v^i. The average reconstruction accuracy of the algorithm was characterized by

\[
M_s^{mean} = \frac{1}{t} \sum_{i=1}^{t} M_s^i \qquad \text{and} \qquad M_v^{mean} = \frac{1}{t} \sum_{i=1}^{t} M_v^i.
\]

First, binary networks of two and three nodes were tested to check whether the Bayesian-Dirichlet scoring function could properly differentiate among basic structures and whether the proposed algorithm converges to the best graph. The structures tested were: independent nodes, chains of nodes and v-structures. The CPTs in the graphs were specified so that they reflect the dependencies imposed by the graphs' structures. Next, to check whether ternary data can be recreated as well as binary data, two networks of three ternary nodes were proposed. Then, networks of four, five and six nodes were composed of the basic structures given above and the algorithm was tested on data sampled from these networks. The CPTs of the basic structures were not changed, or only slightly changed, so that according to the Markov condition a node would be independent of its non-descendants given its parents.

Tests were also performed for various data sizes. The general approach was to run the testing procedure starting from some initial data size m_init, increment the data size by some m_inc, run the testing procedure for m = m + m_inc and continue the process until m reaches some data cap m_cap, for which the proposed statistics saturate.

To illustrate the testing procedure, the result for the six-node network depicted in Figure 2 is described below. The network consists of one v-structure X_2 → X_3 ← X_4, two pairs of dependent nodes X_3 → X_5 and X_3 → X_6, and one independent node X_1; the CPTs were specified to the values shown in Figure 2.

Figure 2. Six-node network and its CPTs.

Figure 3. Six-node network results: M_s^mean (left) and M_v^mean (right) as a function of the data size.

The results in Figure 3 indicate a good average reconstruction ability.

The tests showed that when the CPTs of an input Bayesian network reflect the dependencies given by its structure, it was possible to reconstruct the network from sampled data. The reconstruction rate for basic structures (for greater data sizes) ranged from 90% to 95%, and for composed, more complex networks it ranged from 75% to 95%, both for the skeleton statistics and for the v-structure statistics. During the tests it was shown that the accuracy of structure reconstruction depends on the data size. The tendency is that the bigger the data size, the more the output graph resembles the input graph. However, for each Bayesian network there seems to be some threshold data size above which no increase of the algorithm's accuracy is observed. Oscillations appearing when the data size reaches the threshold value are possibly caused by the forward sampling procedure: sometimes the generated data set is better matched by a different network, and hence the input Bayesian network is not reconstructed accurately.

Apart from the tests described above, tests on random Bayesian networks, whose structure and CPTs were randomly generated, were performed. The generation procedure of a Bayesian network consisted of two steps: first generating the topology and then generating the entries of the CPTs. The two steps were independent of each other, which posed a problem, since it did not guarantee that the CPTs would reflect the dependencies entailed by the structure. Nevertheless, when the CPTs were properly specified, the reconstruction was possible.

3.2 Analysis of SNP data

The goal of testing the structure learning algorithm on real data was to check how it handled real joint probability distributions and to propose an application of the algorithm to real-world problems. The task was to reconstruct a graph of affinity relationships between members of a family based on single nucleotide polymorphism data.
The data was obtained from a genotyping experiment conducted for an 8-person family, for which almost ten thousand biallelic SNP loci were observed. The observed values were AA, AB, BB and NoCall. The values AA, AB and BB corresponded to alleles and were mapped to 1, 2 and 3 respectively; NoCall indicated a missing or uncertain value and was eliminated. This rendered a ternary data set of 8 rows (one per family member) by the number of retained loci. The graph of affinity relationships, given in Figure 4, was known before running the structure learning algorithm. The edges in the graph identify parent-child relationships.

Figure 4. Input graph of affinity relationships (pedigree) over the nodes X_1, ..., X_8.

The structure learning algorithm was run on the SNP data, performing multiple independent local searches, every time starting from a random DAG. Figure 5 shows the two graphs: the PDAG of the input graph and the PDAG of the output graph found by the algorithm. There are 7 edges in the input graph and 14 edges in the output graph. All the edges of the input graph are contained in the output graph, and thus the skeleton metric of the two graphs is 7/14 = 0.5. There is one v-structure in the input graph, namely X_3 → X_5 ← X_8, and two v-structures in the output graph: X_3 → X_5 ← X_8, which was successfully reconstructed, and X_2 → X_8 ← X_6; the v-structure metric of the two graphs is also 0.5. The output graph additionally contains edges not present in the input graph. Some of the additional edges indicate sibling relationships between the family members. From Figure 4 we see that there are three siblings in the first generation, X_2, X_3 and X_4, and two siblings in the second generation, X_6 and X_7. All these siblings were discovered by the structure learning algorithm. There is also an edge between X_1 and X_6 which expresses a cross-generation relationship.

Figure 5. Affinity relationship PDAG vs. discovered PDAG (nodes X_1, ..., X_8).

The only unexpected structure present in the output graph is the v-structure X_2 → X_8 ← X_6. X_8 is not tied to the family in terms of gene inheritance, since this individual comes from outside the family, and yet the two nodes X_2 and X_6 point towards X_8 as if they were parents of X_8. This result leads to the conclusion that the discovered edges, apart from affinity relationships, might model some other unknown, yet statistically important, feature.

3.3 Software development

Most of the software that accompanied the project constitutes a C++ library dedicated to programmers. Additionally, unit tests were developed for the purpose of verification and validation of the library's modules. The project also includes a command line application, implemented to deliver most of the functionality provided by the library without forcing the user to deal with programming issues. The software was developed in C++ using standard libraries as well as portable third-party libraries. Testing of the software was performed on Windows (Visual Studio) and Linux (GNU Compiler Collection) platforms.

The library consists of four modules:
- sampling module
- structure learning module
- validation module
- serialization module

Each of the modules is based on the dependency injection design pattern,[10] which removes direct dependencies between components and favors a plug-in architecture.

4. CONCLUSIONS

This article presented an algorithm for the discovery of Bayesian network structure as well as its application to the analysis of genetic data. Prior to the analysis of SNP data, the structure learning algorithm was verified on synthetic data. The tests indicated that the algorithm was able to find the best Bayesian network for a given data set as long as the size of the data was sufficient. Testing the algorithm on real data accounted for the analysis of almost ten thousand SNPs taken from an eight-person family.
Apart from familial relationships, the analyzed SNPs turned out to manifest other dependencies as well, as the algorithm pointed out one v-structure which did not fit the affinity relationships. Further studies on the data could focus on finding a subset of SNPs which solely determine the affinity relationships, and a subset of those SNPs that cause the unexplained v-structure to appear. The main advantage of the presented algorithm lies in the reduction of the structure learning complexity, owing to the local search heuristic for searching the space of DAGs and to the decomposability of the scoring function. The library implemented in C++, as well as the other tools, is freely available for academic and commercial purposes.

Library name: mugraph
Precompiled binaries: Windows XP/Vista, Debian Linux
Programming language: C++
License: GNU LGPL

REFERENCES

[1] Pearl, J., [Causality: Models, Reasoning, and Inference], Cambridge University Press (2000).
[2] Madigan, D., Andersson, S., Perlman, M., and Volinsky, C., "Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs," Communications in Statistics - Theory and Methods 25 (1996).
[3] Thorisson, G., Smith, A., Krishnan, L., and Stein, L., "The International HapMap Project web site," (2005).
[4] Lauritzen, S. and Sheehan, N., "Graphical models for genetic analyses," Statistical Science (2003).
[5] Cottingham Jr., R., Idury, R., and Schäffer, A., "Faster sequential genetic linkage computations," American Journal of Human Genetics 53, 252 (1993).
[6] O'Connell, J. and Weeks, D., "The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set recoding and fuzzy inheritance," Nature Genetics (1995).
[7] Bishop, C., [Pattern Recognition and Machine Learning], Springer, New York (2006).
[8] Neapolitan, R., [Learning Bayesian Networks], Prentice Hall, Upper Saddle River, NJ (2003).
[9] Cooper, G. and Herskovits, E., "A Bayesian method for the induction of probabilistic networks from data," Machine Learning 9(4) (1992).
[10] Fowler, M., "Inversion of control containers and the dependency injection pattern," (2004).


More information

Machine Learning. Sourangshu Bhattacharya

Machine Learning. Sourangshu Bhattacharya Machine Learning Sourangshu Bhattacharya Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Curve Fitting Re-visited Maximum Likelihood Determine by minimizing sum-of-squares

More information

Population Genetics (52642)

Population Genetics (52642) Population Genetics (52642) Benny Yakir 1 Introduction In this course we will examine several topics that are related to population genetics. In each topic we will discuss briefly the biological background

More information

Lectures by Volker Heun, Daniel Huson and Knut Reinert, in particular last years lectures

Lectures by Volker Heun, Daniel Huson and Knut Reinert, in particular last years lectures 4 FastA and the chaining problem We will discuss: Heuristics used by the FastA program for sequence alignment Chaining problem 4.1 Sources for this lecture Lectures by Volker Heun, Daniel Huson and Knut

More information

Workshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient

Workshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient Workshop report 1. Daniels report is on website 2. Don t expect to write it based on listening to one project (we had 6 only 2 was sufficient quality) 3. I suggest writing it on one presentation. 4. Include

More information

A New Approach For Convert Multiply-Connected Trees in Bayesian networks

A New Approach For Convert Multiply-Connected Trees in Bayesian networks A New Approach For Convert Multiply-Connected Trees in Bayesian networks 1 Hussein Baloochian, Alireza khantimoory, 2 Saeed Balochian 1 Islamic Azad university branch of zanjan 2 Islamic Azad university

More information

Applications of admixture models

Applications of admixture models Applications of admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price Applications of admixture models 1 / 27

More information

Learning Equivalence Classes of Bayesian-Network Structures

Learning Equivalence Classes of Bayesian-Network Structures Journal of Machine Learning Research 2 (2002) 445-498 Submitted 7/01; Published 2/02 Learning Equivalence Classes of Bayesian-Network Structures David Maxwell Chickering Microsoft Research One Microsoft

More information

Chapter S:II. II. Search Space Representation

Chapter S:II. II. Search Space Representation Chapter S:II II. Search Space Representation Systematic Search Encoding of Problems State-Space Representation Problem-Reduction Representation Choosing a Representation S:II-1 Search Space Representation

More information

A note on the pairwise Markov condition in directed Markov fields

A note on the pairwise Markov condition in directed Markov fields TECHNICAL REPORT R-392 April 2012 A note on the pairwise Markov condition in directed Markov fields Judea Pearl University of California, Los Angeles Computer Science Department Los Angeles, CA, 90095-1596,

More information

Summary: A Tutorial on Learning With Bayesian Networks

Summary: A Tutorial on Learning With Bayesian Networks Summary: A Tutorial on Learning With Bayesian Networks Markus Kalisch May 5, 2006 We primarily summarize [4]. When we think that it is appropriate, we comment on additional facts and more recent developments.

More information

Meta- Heuristic based Optimization Algorithms: A Comparative Study of Genetic Algorithm and Particle Swarm Optimization

Meta- Heuristic based Optimization Algorithms: A Comparative Study of Genetic Algorithm and Particle Swarm Optimization 2017 2 nd International Electrical Engineering Conference (IEEC 2017) May. 19 th -20 th, 2017 at IEP Centre, Karachi, Pakistan Meta- Heuristic based Optimization Algorithms: A Comparative Study of Genetic

More information

Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the

More information

USING A PRIORI KNOWLEDGE TO CREATE PROBABILISTIC MODELS FOR OPTIMIZATION

USING A PRIORI KNOWLEDGE TO CREATE PROBABILISTIC MODELS FOR OPTIMIZATION USING A PRIORI KNOWLEDGE TO CREATE PROBABILISTIC MODELS FOR OPTIMIZATION Shumeet Baluja 1 School of Computer Science Carnegie Mellon University Abstract Recent studies have examined the effectiveness of

More information

A probabilistic logic incorporating posteriors of hierarchic graphical models

A probabilistic logic incorporating posteriors of hierarchic graphical models A probabilistic logic incorporating posteriors of hierarchic graphical models András s Millinghoffer, Gábor G Hullám and Péter P Antal Department of Measurement and Information Systems Budapest University

More information

GRASP. Greedy Randomized Adaptive. Search Procedure

GRASP. Greedy Randomized Adaptive. Search Procedure GRASP Greedy Randomized Adaptive Search Procedure Type of problems Combinatorial optimization problem: Finite ensemble E = {1,2,... n } Subset of feasible solutions F 2 Objective function f : 2 Minimisation

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 5 Inference

More information

Directed Graphical Models (Bayes Nets) (9/4/13)

Directed Graphical Models (Bayes Nets) (9/4/13) STA561: Probabilistic machine learning Directed Graphical Models (Bayes Nets) (9/4/13) Lecturer: Barbara Engelhardt Scribes: Richard (Fangjian) Guo, Yan Chen, Siyang Wang, Huayang Cui 1 Introduction For

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Graphical Models. David M. Blei Columbia University. September 17, 2014

Graphical Models. David M. Blei Columbia University. September 17, 2014 Graphical Models David M. Blei Columbia University September 17, 2014 These lecture notes follow the ideas in Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. In addition,

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Literature Review On Implementing Binary Knapsack problem

Literature Review On Implementing Binary Knapsack problem Literature Review On Implementing Binary Knapsack problem Ms. Niyati Raj, Prof. Jahnavi Vitthalpura PG student Department of Information Technology, L.D. College of Engineering, Ahmedabad, India Assistant

More information

Genetic Analysis. Page 1

Genetic Analysis. Page 1 Genetic Analysis Page 1 Genetic Analysis Objectives: 1) Set up Case-Control Association analysis and the Basic Genetics Workflow 2) Use JMP tools to interact with and explore results 3) Learn advanced

More information

Missing Data Estimation in Microarrays Using Multi-Organism Approach

Missing Data Estimation in Microarrays Using Multi-Organism Approach Missing Data Estimation in Microarrays Using Multi-Organism Approach Marcel Nassar and Hady Zeineddine Progress Report: Data Mining Course Project, Spring 2008 Prof. Inderjit S. Dhillon April 02, 2008

More information

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

A Parallel Evolutionary Algorithm for Discovery of Decision Rules A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl

More information

USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT

USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT IADIS International Conference Applied Computing 2006 USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT Divya R. Singh Software Engineer Microsoft Corporation, Redmond, WA 98052, USA Abdullah

More information

Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you?

Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you? Gurjit Randhawa Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you? This would be nice! Can it be done? A blind generate

More information

A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs

A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs Jiji Zhang Philosophy Department Carnegie Mellon University Pittsburgh, PA 15213 jiji@andrew.cmu.edu Abstract

More information

8/19/13. Computational problems. Introduction to Algorithm

8/19/13. Computational problems. Introduction to Algorithm I519, Introduction to Introduction to Algorithm Yuzhen Ye (yye@indiana.edu) School of Informatics and Computing, IUB Computational problems A computational problem specifies an input-output relationship

More information

Graph and Digraph Glossary

Graph and Digraph Glossary 1 of 15 31.1.2004 14:45 Graph and Digraph Glossary A B C D E F G H I-J K L M N O P-Q R S T U V W-Z Acyclic Graph A graph is acyclic if it contains no cycles. Adjacency Matrix A 0-1 square matrix whose

More information

Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion

Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion Journal of Machine Learning Research 14 (2013) 1563-1603 Submitted 11/10; Revised 9/12; Published 6/13 Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion Rami Mahdi

More information

1. Why Study Trees? Trees and Graphs. 2. Binary Trees. CITS2200 Data Structures and Algorithms. Wood... Topic 10. Trees are ubiquitous. Examples...

1. Why Study Trees? Trees and Graphs. 2. Binary Trees. CITS2200 Data Structures and Algorithms. Wood... Topic 10. Trees are ubiquitous. Examples... . Why Study Trees? CITS00 Data Structures and Algorithms Topic 0 Trees and Graphs Trees and Graphs Binary trees definitions: size, height, levels, skinny, complete Trees, forests and orchards Wood... Examples...

More information

A genetic algorithm for kidney transplantation matching

A genetic algorithm for kidney transplantation matching A genetic algorithm for kidney transplantation matching S. Goezinne Research Paper Business Analytics Supervisors: R. Bekker and K. Glorie March 2016 VU Amsterdam Faculty of Exact Sciences De Boelelaan

More information

An Information Theory based Approach to Structure Learning in Bayesian Networks

An Information Theory based Approach to Structure Learning in Bayesian Networks An Information Theory based Approach to Structure Learning in Bayesian Networks Gopalakrishna Anantha 9 th October 2006 Committee Dr.Xue wen Chen (Chair) Dr. John Gauch Dr. Victor Frost Publications An

More information

The Acyclic Bayesian Net Generator (Student Paper)

The Acyclic Bayesian Net Generator (Student Paper) The Acyclic Bayesian Net Generator (Student Paper) Pankaj B. Gupta and Vicki H. Allan Microsoft Corporation, One Microsoft Way, Redmond, WA 98, USA, pagupta@microsoft.com Computer Science Department, Utah

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part One Probabilistic Graphical Models Part One: Graphs and Markov Properties Christopher M. Bishop Graphs and probabilities Directed graphs Markov properties Undirected graphs Examples Microsoft

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Graph Algorithms Using Depth First Search

Graph Algorithms Using Depth First Search Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth

More information

Step-by-Step Guide to Advanced Genetic Analysis

Step-by-Step Guide to Advanced Genetic Analysis Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options

More information

Graphical Models. Pradeep Ravikumar Department of Computer Science The University of Texas at Austin

Graphical Models. Pradeep Ravikumar Department of Computer Science The University of Texas at Austin Graphical Models Pradeep Ravikumar Department of Computer Science The University of Texas at Austin Useful References Graphical models, exponential families, and variational inference. M. J. Wainwright

More information

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data

Smoothing Dissimilarities for Cluster Analysis: Binary Data and Functional Data Smoothing Dissimilarities for Cluster Analysis: Binary Data and unctional Data David B. University of South Carolina Department of Statistics Joint work with Zhimin Chen University of South Carolina Current

More information

Evaluating the Explanatory Value of Bayesian Network Structure Learning Algorithms

Evaluating the Explanatory Value of Bayesian Network Structure Learning Algorithms Evaluating the Explanatory Value of Bayesian Network Structure Learning Algorithms Patrick Shaughnessy University of Massachusetts, Lowell pshaughn@cs.uml.edu Gary Livingston University of Massachusetts,

More information

Chapter 2 PRELIMINARIES. 1. Random variables and conditional independence

Chapter 2 PRELIMINARIES. 1. Random variables and conditional independence Chapter 2 PRELIMINARIES In this chapter the notation is presented and the basic concepts related to the Bayesian network formalism are treated. Towards the end of the chapter, we introduce the Bayesian

More information

Graphical Models Part 1-2 (Reading Notes)

Graphical Models Part 1-2 (Reading Notes) Graphical Models Part 1-2 (Reading Notes) Wednesday, August 3 2011, 2:35 PM Notes for the Reading of Chapter 8 Graphical Models of the book Pattern Recognition and Machine Learning (PRML) by Chris Bishop

More information

Bayesian Machine Learning - Lecture 6

Bayesian Machine Learning - Lecture 6 Bayesian Machine Learning - Lecture 6 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 2, 2015 Today s lecture 1

More information

Graphical Models and Bayesian Networks user!2015 tutorial

Graphical Models and Bayesian Networks user!2015 tutorial 1/53 Graphical Models and Bayesian Networks user!2015 tutorial Therese Graversen Department of Mathematical Sciences, University of Copenhagen 30 June 2015 Bayesian Networks 2/53 3/53 Models for discrete

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil

Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil " Generalizing Variable Elimination in Bayesian Networks FABIO GAGLIARDI COZMAN Escola Politécnica, University of São Paulo Av Prof Mello Moraes, 31, 05508-900, São Paulo, SP - Brazil fgcozman@uspbr Abstract

More information

Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Search & Optimization Search and Optimization method deals with

More information

A Genetic Algorithm Applied to Graph Problems Involving Subsets of Vertices

A Genetic Algorithm Applied to Graph Problems Involving Subsets of Vertices A Genetic Algorithm Applied to Graph Problems Involving Subsets of Vertices Yaser Alkhalifah Roger L. Wainwright Department of Mathematical Department of Mathematical and Computer Sciences and Computer

More information

The max-min hill-climbing Bayesian network structure learning algorithm

The max-min hill-climbing Bayesian network structure learning algorithm Mach Learn (2006) 65:31 78 DOI 10.1007/s10994-006-6889-7 The max-min hill-climbing Bayesian network structure learning algorithm Ioannis Tsamardinos Laura E. Brown Constantin F. Aliferis Received: January

More information

Exploration vs. Exploitation in Differential Evolution

Exploration vs. Exploitation in Differential Evolution Exploration vs. Exploitation in Differential Evolution Ângela A. R. Sá 1, Adriano O. Andrade 1, Alcimar B. Soares 1 and Slawomir J. Nasuto 2 Abstract. Differential Evolution (DE) is a tool for efficient

More information

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PAUL BALISTER Abstract It has been shown [Balister, 2001] that if n is odd and m 1,, m t are integers with m i 3 and t i=1 m i = E(K n) then K n can be decomposed

More information

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT - Swarbhanu Chatterjee. Hidden Markov models are a sophisticated and flexible statistical tool for the study of protein models. Using HMMs to analyze proteins

More information

Slides for Faculty Oxford University Press All rights reserved.

Slides for Faculty Oxford University Press All rights reserved. Oxford University Press 2013 Slides for Faculty Assistance Preliminaries Author: Vivek Kulkarni vivek_kulkarni@yahoo.com Outline Following topics are covered in the slides: Basic concepts, namely, symbols,

More information

Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison

Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison Jing Jin, Biplab K. Sarker, Virendra C. Bhavsar, Harold Boley 2, Lu Yang Faculty of Computer Science, University of New

More information

Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014

Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014 Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014 Dynamic programming is a group of mathematical methods used to sequentially split a complicated problem into

More information

As of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be

As of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be 48 Bioinformatics I, WS 09-10, S. Henz (script by D. Huson) November 26, 2009 4 BLAST and BLAT Outline of the chapter: 1. Heuristics for the pairwise local alignment of two sequences 2. BLAST: search and

More information

A Probabilistic Relaxation Framework for Learning Bayesian Network Structures from Data

A Probabilistic Relaxation Framework for Learning Bayesian Network Structures from Data A Probabilistic Relaxation Framework for Learning Bayesian Network Structures from Data by Ahmed Mohammed Hassan A Thesis Submitted to the Faculty of Engineering at Cairo University In Partial Fulfillment

More information

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland Genetic Programming Charles Chilaka Department of Computational Science Memorial University of Newfoundland Class Project for Bio 4241 March 27, 2014 Charles Chilaka (MUN) Genetic algorithms and programming

More information

Causality in Communication: The Agent-Encapsulated Bayesian Network Model

Causality in Communication: The Agent-Encapsulated Bayesian Network Model Causality in Communication: The Agent-Encapsulated Bayesian Network Model Scott Langevin Oculus Info, Inc. Toronto, Ontario, Canada Marco Valtorta mgv@cse.sc.edu University of South Carolina, Columbia,

More information

1 : Introduction to GM and Directed GMs: Bayesian Networks. 3 Multivariate Distributions and Graphical Models

1 : Introduction to GM and Directed GMs: Bayesian Networks. 3 Multivariate Distributions and Graphical Models 10-708: Probabilistic Graphical Models, Spring 2015 1 : Introduction to GM and Directed GMs: Bayesian Networks Lecturer: Eric P. Xing Scribes: Wenbo Liu, Venkata Krishna Pillutla 1 Overview This lecture

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Machine Learning!!!!! Srihari. Chain Graph Models. Sargur Srihari

Machine Learning!!!!! Srihari. Chain Graph Models. Sargur Srihari Chain Graph Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics PDAGs or Chain Graphs Generalization of CRF to Chain Graphs Independencies in Chain Graphs Summary BN versus MN 2 Partially Directed

More information

Integrating locally learned causal structures with overlapping variables

Integrating locally learned causal structures with overlapping variables Integrating locally learned causal structures with overlapping variables Robert E. Tillman Carnegie Mellon University Pittsburgh, PA rtillman@andrew.cmu.edu David Danks, Clark Glymour Carnegie Mellon University

More information

Haplotype Analysis. 02 November 2003 Mendel Short IGES Slide 1

Haplotype Analysis. 02 November 2003 Mendel Short IGES Slide 1 Haplotype Analysis Specifies the genetic information descending through a pedigree Useful visualization of the gene flow through a pedigree A haplotype for a given individual and set of loci is defined

More information

A Comparative Study of Linear Encoding in Genetic Programming

A Comparative Study of Linear Encoding in Genetic Programming 2011 Ninth International Conference on ICT and Knowledge A Comparative Study of Linear Encoding in Genetic Programming Yuttana Suttasupa, Suppat Rungraungsilp, Suwat Pinyopan, Pravit Wungchusunti, Prabhas

More information

Part II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Part II C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Converting Directed to Undirected Graphs (1) Converting Directed to Undirected Graphs (2) Add extra links between

More information

Fortran 90 Two Commonly Used Statements

Fortran 90 Two Commonly Used Statements Fortran 90 Two Commonly Used Statements 1. DO Loops (Compiled primarily from Hahn [1994]) Lab 6B BSYSE 512 Research and Teaching Methods The DO loop (or its equivalent) is one of the most powerful statements

More information

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering Sequence clustering Introduction Data clustering is one of the key tools used in various incarnations of data-mining - trying to make sense of large datasets. It is, thus, natural to ask whether clustering

More information

Escola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil

Escola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil Generalizing Variable Elimination in Bayesian Networks FABIO GAGLIARDI COZMAN Escola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, 05508-900, São Paulo, SP - Brazil fgcozman@usp.br

More information

Factorization with Missing and Noisy Data

Factorization with Missing and Noisy Data Factorization with Missing and Noisy Data Carme Julià, Angel Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona,

More information

Long Read RNA-seq Mapper

Long Read RNA-seq Mapper UNIVERSITY OF ZAGREB FACULTY OF ELECTRICAL ENGENEERING AND COMPUTING MASTER THESIS no. 1005 Long Read RNA-seq Mapper Josip Marić Zagreb, February 2015. Table of Contents 1. Introduction... 1 2. RNA Sequencing...

More information

Introduction to Graphical Models

Introduction to Graphical Models Robert Collins CSE586 Introduction to Graphical Models Readings in Prince textbook: Chapters 10 and 11 but mainly only on directed graphs at this time Credits: Several slides are from: Review: Probability

More information

MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS

MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Mathematical and Computational Applications, Vol. 5, No. 2, pp. 240-247, 200. Association for Scientific Research MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Volkan Uslan and Đhsan Ömür Bucak

More information

Breeding Guide. Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel

Breeding Guide. Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel Breeding Guide Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel www.phenome-netwoks.com Contents PHENOME ONE - INTRODUCTION... 3 THE PHENOME ONE LAYOUT... 4 THE JOBS ICON...

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University September 30, 2016 1 Introduction (These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan.

More information