Network Based Models For Analysis of SNPs
  Yalta Optimization Conference 2010: Network Science
  Zeynep Ertem*, Sergiy Butenko*, Clare Gill**
  *Department of Industrial and Systems Engineering, **Department of Animal Science
  Texas A&M University, College Station, TX 77843-3131
  July 29, 2010
Outline
  1 Introduction to Graph Theory
  2
  3
  4
  5
Introduction to Graph Theory
  G = (V, E) is a simple undirected graph.
  V = {1, 2, ..., n} is the set of vertices.
  E ⊆ V × V is the set of edges (arcs, lines).
  A subset C ⊆ V is called a clique if G(C) is complete, i.e., it has all possible edges.
Introduction to Graph Theory
  A graph without cycles is acyclic.
  A graph is connected if there is a path between every pair of vertices.
  A tree is a simple, undirected, connected, acyclic graph.
Introduction to Graph Theory
  Ḡ = (V, Ē) is the complement graph of G = (V, E), where Ē = {(i, j) : i, j ∈ V, i ≠ j and (i, j) ∉ E}.
  For S ⊆ V, G(S) = (S, E ∩ (S × S)) is the subgraph induced by S.
Introduction to Graph Theory
  A subset C ⊆ V is called a clique if G(C) is complete, i.e., it has all possible edges.
  A subset I ⊆ V is called an independent set (stable set, vertex packing) if G(I) has no edges.
  A clique (independent set) is said to be
    maximal, if it is not a subset of any larger clique (independent set);
    maximum, if there is no larger clique (independent set) in the graph.
k-plex
  Given a positive integer k, a k-plex is a subset of vertices C such that each vertex v ∈ C is adjacent to all but at most k vertices in C (counting v itself), i.e., v has at least |C| - k neighbors in C.
  A 1-plex is a clique.
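The definition above can be checked directly. The following is a minimal sketch, assuming the graph is stored as a dictionary that maps each vertex to the set of its neighbors; the function name `is_k_plex` and the small example graph are hypothetical and not taken from the talk.

```python
def is_k_plex(adj, subset, k):
    """Return True if `subset` is a k-plex of the graph described by `adj`.

    adj: dict mapping each vertex to the set of its neighbors
         (an assumed representation, chosen for illustration).
    """
    members = set(subset)
    for v in members:
        # Number of neighbors of v that lie inside the candidate set.
        inside_degree = len(adj[v] & members)
        # v may be non-adjacent to at most k members, counting v itself.
        if inside_degree < len(members) - k:
            return False
    return True


# Hypothetical example: the 4-cycle 1-2-3-4-1 with no chords.
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}

print(is_k_plex(adj, {1, 2}, 1))        # True: a single edge is a clique
print(is_k_plex(adj, {1, 2, 3, 4}, 1))  # False: the 4-cycle is not a clique
print(is_k_plex(adj, {1, 2, 3, 4}, 2))  # True: each vertex misses itself and one other
```

With k = 1 the test reduces to the clique condition, which matches the remark that a 1-plex is a clique.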
Graph Theory Basics
  α(G): the independence (stability) number of G, i.e., the size of a maximum independent set.
  ω(G): the clique number of G, i.e., the size of a maximum clique.
  VC ⊆ V is a vertex cover if every edge has at least one endpoint in VC.
Graph Theory Basics
  I is a maximum independent set of G
  ⇔ I is a maximum clique of Ḡ
  ⇔ V \ I is a minimum vertex cover of G.
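The equivalence above can be verified by brute force on a small graph. This is only an illustrative sketch: the path graph 1-2-3-4 and all helper names are assumptions chosen for the example, and the exhaustive enumeration is practical only for a handful of vertices.

```python
from itertools import combinations


def subsets(vertices):
    """Yield every subset of `vertices` as a set."""
    items = sorted(vertices)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def is_independent(s, edges):
    return not any(frozenset(p) in edges for p in combinations(s, 2))


def is_clique(s, edges):
    return all(frozenset(p) in edges for p in combinations(s, 2))


def is_vertex_cover(s, edges):
    # Every edge must share at least one endpoint with s.
    return all(e & s for e in edges)


# Hypothetical example graph: the path 1-2-3-4.
vertices = {1, 2, 3, 4}
edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}
# Edge set of the complement graph: all vertex pairs not in `edges`.
co_edges = {frozenset(p) for p in combinations(sorted(vertices), 2)} - edges

alpha = max(len(s) for s in subsets(vertices) if is_independent(s, edges))
omega_co = max(len(s) for s in subsets(vertices) if is_clique(s, co_edges))
min_cover = min(len(s) for s in subsets(vertices) if is_vertex_cover(s, edges))

print(alpha, omega_co, min_cover)  # 2 2 2
```

Here the independence number of G equals the clique number of the complement, and the minimum vertex cover has |V| minus that many vertices, as the slide states.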
Graph Theory Problems
  Maximum clique problem (MC)
  Finding the largest k-plex in a given graph G
  Maximum independent set problem (MIS)
  Minimum vertex cover problem (MVC)
  The MC, MIS and MVC problems are NP-hard.
Graph Theory Problems and SNPs
  Constructing a complete map of all SNPs occurring in the human genome is one of the most important goals.
  WHAT ARE THOSE?
SNP
  Single-Nucleotide Polymorphism
  DNA sequence variation between members of a species or between paired chromosomes in an individual
  Relevant to disease development, response to pathogens, drugs...
  SNPs do not cause diseases; however, they determine susceptibility to them.
  The difference is in a single nucleotide: ATTCGA vs. ATTTGA
  Human DNA contains more than 10 million SNPs; however, only a few million have been discovered so far.
  Types: substitution, deletion, insertion
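The two example sequences differ in exactly one position. A minimal sketch of locating such a substitution is given below; the function name `differing_positions` is hypothetical, and the comparison assumes two already aligned sequences of equal length, so deletions and insertions would need a proper alignment step first.

```python
def differing_positions(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch, 1-indexed.

    Assumes the sequences are already aligned and of equal length,
    so only substitutions can be detected.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(differing_positions("ATTCGA", "ATTTGA"))  # [(4, 'C', 'T')]
```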
  SNPs occur about once every 100 to 300 nucleotides in the genome.
  Two of every three SNPs are substitutions between C and T.
  90% of genetic variation is attributed to SNPs.
SNP in Genomic Sequences
Tag SNPs
  A tag SNP is a representative SNP in a region of the genome with high linkage disequilibrium.
  With the help of tag SNPs, it is possible to identify genetic variation without genotyping every SNP in a chromosomal region.
  This reduces experimental cost.
Linkage Disequilibrium
  Nonrandom association of alleles between two or more loci
  Can be calculated as the difference between observed and expected allelic frequencies
  Alleles: two or more bases may occur at a SNP; these bases are called alleles, e.g., A and C.
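The "observed minus expected" calculation can be sketched for two biallelic loci. The code below uses the standard coefficient D = p_AB - p_A p_B and the squared correlation r² = D² / (p_A (1 - p_A) p_B (1 - p_B)); the input format (a list of two-character haplotype strings), the function name and the sample data are assumptions made for illustration and are not the talk's data.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of 2-character strings; the first character is the
    allele at locus 1, the second the allele at locus 2 (assumed format).
    """
    n = len(haplotypes)
    # Use the alleles of the first haplotype as the reference alleles.
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n  # observed frequency
    d = p_ab - p_a * p_b                            # observed minus expected
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical samples: perfectly associated alleles, then independent ones.
print(linkage_disequilibrium(["AC"] * 4 + ["GT"] * 4))    # (0.25, 1.0)
print(linkage_disequilibrium(["AC", "AT", "GC", "GT"]))   # (0.0, 0.0)
```

An r² of 1 means the allele at one locus fully predicts the allele at the other, while 0 means no association.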
Linkage Disequilibrium
  Generally, LD is high when the distance between two SNPs is small.
  LD is influenced by a number of factors: genetic linkage, selection, the rate of recombination, the rate of mutation, non-random mating, and population structure.
Linkage Disequilibrium
Data
  With the help of the HapMap (Haplotype Map) Project, genomic data can be mapped.
  Cattle chromosome 1
Data
  Different threshold values, ranging from 0.1 to 0.5, are used for the r² values.
  Largest k-plexes are found.
  BUT HOW?
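One plausible way to turn pairwise r² values into a graph for each threshold is sketched below. The construction is an assumption, since the slides do not spell it out: SNPs are taken as vertices, and two SNPs are joined by an edge when their r² is at least the threshold. The SNP names and r² values are made up for illustration.

```python
def threshold_graph(r2, threshold):
    """Build an adjacency dict from pairwise r2 values.

    r2: dict mapping (snp_u, snp_v) pairs to their r2 value.
    An edge is added when r2 >= threshold (assumed rule).
    """
    adj = {}
    for (u, v), value in r2.items():
        # Register both SNPs so isolated vertices are kept.
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if value >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Hypothetical r2 values for three SNPs.
r2 = {("snp1", "snp2"): 0.62, ("snp1", "snp3"): 0.18, ("snp2", "snp3"): 0.41}

for t in (0.1, 0.3, 0.5):
    adj = threshold_graph(r2, t)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    print(t, n_edges)  # 3 edges at 0.1, 2 at 0.3, 1 at 0.5
```

Raising the threshold can only remove edges, so the graphs become sparser as the threshold grows.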
How to find k-plexes?
  k-plexes were introduced in social network analysis by Seidman and Foster (1978).
  A clique is the ideal model of a cohesive subgroup, but it is not practical due to its restrictive nature; k-plexes relax it.
  Östergård's algorithm (1999): branch-and-bound algorithm for the maximum-weight clique problem
  Balasundaram et al. (2009): branch-and-cut algorithm for k-plex detection
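For intuition only, a largest k-plex of a very small graph can be found by exhaustive search. The sketch below is neither Östergård's branch-and-bound nor the branch-and-cut method of Balasundaram et al.; it simply tries vertex subsets from largest to smallest, takes exponential time, and uses a hypothetical five-vertex graph.

```python
from itertools import combinations


def is_k_plex(adj, members, k):
    # Each vertex needs at least |members| - k neighbors inside the set.
    return all(len(adj[v] & members) >= len(members) - k for v in members)


def maximum_k_plex(adj, k):
    """Return one largest k-plex by brute force (tiny graphs only)."""
    vertices = sorted(adj)
    for size in range(len(vertices), 0, -1):
        for combo in combinations(vertices, size):
            members = set(combo)
            if is_k_plex(adj, members, k):
                return members
    return set()


# Hypothetical graph: 4-cycle 1-2-3-4 with chord 1-3, plus vertex 5 attached to 4.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3, 5}, 5: {4}}

print(len(maximum_k_plex(adj, 1)))  # 3: the largest clique is a triangle
print(len(maximum_k_plex(adj, 2)))  # 4: vertices 1, 2, 3, 4 form a 2-plex
```

Relaxing k from 1 to 2 lets the nearly complete set {1, 2, 3, 4} qualify, which is the kind of cohesive but imperfect group that k-plexes are meant to capture.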
Results
  Threshold   # of SNPs in 2-plex   # of SNPs in 3-plex   # of SNPs in 4-plex
  0.1         62                    63                    63
  0.15        57                    58                    58
  0.2         50                    51                    51
  0.25        41                    41                    42
  0.3         40                    41                    41
  0.35        40                    41                    41
  0.4         40                    40                    40
  0.45        40                    39                    39
  0.5         39                    39                    39
Results
  The largest clusters are very stable and well defined in terms of cliques.
  If we increase k, the clusters do not change drastically.
  There is high correlation among the associated SNPs in cattle chromosome 1.
Thank You.