Graph Algorithms in Bioinformatics

1 Graph Algorithms in Bioinformatics Bioinformatics: Issues and Algorithms CSE Fall 2007 Lecture 13 Lopresti Fall 2007 Lecture

2 Outline: Introduction to graph theory. Eulerian & Hamiltonian Cycle Problems. Benzer experiment and interval graphs. DNA sequencing. The Shortest Superstring & Traveling Salesman Problems. Sequencing by hybridization.

3 The Bridge Obsession Problem: Find a tour crossing every bridge just once. Leonhard Euler, 1735. Bridges of Königsberg.

4 Eulerian Cycle Problem: Find a cycle that visits every edge exactly once. The algorithm for the solution has linear time complexity. (Figure: a more complicated Königsberg.)

5 Mapping problems to graphs: Arthur Cayley studied chemical structures of hydrocarbons in the mid-1800s. He used trees (acyclic connected graphs) to enumerate structural isomers.

6 Hamiltonian Cycle Problem: Find a cycle that visits every vertex exactly once. NP-complete (i.e., a very hard problem to solve). Game invented by Sir William Hamilton in 1857.

7 Beginning of graph theory in biology: Benzer's work: developed deletion mapping; proved linearity of the gene; demonstrated internal structure of the gene. Seymour Benzer (1950s).

8 Viruses attack bacteria: Normally bacteriophage T4 kills bacteria. However, if T4 is mutated (e.g., an important gene is deleted), it is disabled and loses the ability to kill bacteria. Suppose a bacterium is infected with two different mutants, each of which is disabled. Would the bacterium still survive? Amazingly, a pair of disabled viruses can kill a bacterium even though each of them is disabled on its own. How can this be explained?

9 Benzer's experiment: Idea: infect bacteria with pairs of mutant T4 bacteriophage (virus). Each T4 mutant has an unknown interval deleted from its genome. If the two intervals overlap: the T4 pair is missing part of its genome and is disabled, so the bacteria survive. If the two intervals do not overlap: the T4 pair has its entire genome and is enabled, so the bacteria die.

10 Benzer's experiment: Complementation between pairs of mutant T4 bacteriophages.

11 Benzer's experiment and interval graphs: Construct an interval graph: each T4 mutant is a vertex; place an edge between mutant pairs where bacteria survived (i.e., the deleted intervals in the pair of mutants overlap). The interval graph structure reveals whether the DNA is linear or branched. (Figure: case for linear genes.)
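The interval-graph construction on slide 11 can be sketched in a few lines of Python. This is only an illustration: Benzer observed the graph (which pairs of mutants failed to kill bacteria), not the intervals, so the mutant names and deletion coordinates below are hypothetical and serve only to show how overlapping intervals turn into edges.

```python
from itertools import combinations

def interval_graph(intervals):
    """Return one edge (as a frozenset) per pair of overlapping intervals.

    `intervals` maps a mutant name to a half-open (start, end) deletion.
    """
    edges = set()
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(intervals.items(), 2):
        # Two half-open intervals overlap when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            edges.add(frozenset((a, b)))
    return edges

# Hypothetical deletions on a linear genome.
deletions = {"m1": (0, 5), "m2": (3, 9), "m3": (8, 12), "m4": (10, 15)}
edges = interval_graph(deletions)
```

With these made-up deletions the graph is a simple chain m1, m2, m3, m4, which is the kind of structure a linear genome can produce.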

12 Benzer's experiment and interval graphs: (Figure: case for branched genes.)

13 Interval graphs: a comparison. (Figure: linear genome vs. branched genome.)

14 DNA sequencing: history. Sanger method (1977): labeled ddNTPs* terminate DNA copying at random points. Gilbert method (1977): chemical method to cleave DNA at specific points (G, G+A, T+C, C). Both methods generate labeled fragments of varying lengths that are further electrophoresed. * Dideoxynucleotides, or ddNTPs, are nucleotides lacking a 3'-hydroxyl (-OH) group on their deoxyribose sugar. Note that the sugar normally referred to as deoxyribose lacks a 2'-OH, while the dideoxyribose lacks hydroxyl groups on both its 2' and 3' carbons. The lack of this hydroxyl group means that, after being added by a DNA polymerase to a growing nucleotide chain, no further nucleotides can be added, as no phosphodiester bond can be created. Thus, these molecules form the basis of the dideoxy chain-termination method of DNA sequencing, which was developed by Frederick Sanger in 1977.

15 Sanger Method: generating a read. Start at primer (restriction site). Grow DNA chain. Include ddNTPs. Stops reaction at all possible points. Separate products by length, using gel electrophoresis.

16 DNA sequencing: Shear DNA into millions of small fragments. Read nucleotides at a time from the small fragments (Sanger method).

17 Fragment assembly: Computational challenge: assemble individual short fragments (reads) into a single genomic sequence. Until the late 1990s, the shotgun fragment assembly of the human genome was viewed as an intractable problem. The Shortest Superstring Problem: given a set of strings, find the shortest string containing all of them. Input: strings s_1, s_2, ..., s_n. Output: a string s that contains all strings s_1, s_2, ..., s_n as substrings, such that the length of s is minimized. The problem is NP-complete (efficient solution unlikely). This formulation does not take into account sequencing errors!

18 Shortest Superstring Problem: an example. Note: this is an approximation, an attractive mathematical model for a real-world biological problem.

19 Reducing SSP to TSP: Define overlap(s_i, s_j) as the length of the longest prefix of s_j that matches a suffix of s_i. (Figure: two copies of aaaggcatcaaatctaaaggcatcaaa, aligned suffix to prefix at different offsets.) What is overlap(s_i, s_j) for these strings? 3? No, 12!

20 Reducing SSP to TSP: Define overlap(s_i, s_j) as the length of the longest prefix of s_j that matches a suffix of s_i. Build a graph with n vertices representing the strings s_1, s_2, ..., s_n. Insert edges of length overlap(s_i, s_j) between vertices s_i and s_j. Find the longest path which visits every vertex exactly once. This is a form of the Traveling Salesman Problem (TSP), with overlaps treated as negative distances, which is also NP-complete.
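A minimal Python sketch of the overlap definition and the overlap graph from slides 19 and 20. The function names `overlap` and `overlap_graph` are illustrative choices, not names from the lecture, and the input strings are borrowed from the example set on slide 22.

```python
def overlap(s_i, s_j):
    """Length of the longest prefix of s_j that matches a suffix of s_i."""
    for k in range(min(len(s_i), len(s_j)), 0, -1):
        if s_i.endswith(s_j[:k]):
            return k
    return 0

def overlap_graph(strings):
    """Directed, weighted graph: one edge (a, b) per ordered pair of distinct strings."""
    return {(a, b): overlap(a, b) for a in strings for b in strings if a != b}

S = ["ATC", "CCA", "CAG", "TCC", "AGT"]
weights = overlap_graph(S)
```

Trying the longest candidate first means the function returns as soon as the maximal overlap is found, which is why the larger of two possible overlaps always wins.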

21 Reducing SSP to TSP: (Figure only.)

22 SSP to TSP: an example. Say that S = {ATC, CCA, CAG, TCC, AGT}. SSP: the shortest superstring is ATCCAGT. TSP: (Figure: overlap graph on the vertices ATC, CCA, CAG, TCC, AGT with edge weights 0, 1, and 2); the corresponding path spells ATCCAGT.
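For a set this small, the example can be checked by brute force. The sketch below tries every ordering of the strings and merges neighbours by their maximal overlap. It assumes no input string is a substring of another, and it is exponential in the number of strings, so it is a sanity check rather than a practical assembler. The function names are illustrative.

```python
from itertools import permutations

def overlap(s_i, s_j):
    """Length of the longest prefix of s_j that matches a suffix of s_i."""
    for k in range(min(len(s_i), len(s_j)), 0, -1):
        if s_i.endswith(s_j[:k]):
            return k
    return 0

def shortest_superstring_bruteforce(strings):
    """Try all orderings; each ordering is a path through the overlap graph."""
    best = None
    for order in permutations(strings):
        merged = order[0]
        for prev, nxt in zip(order, order[1:]):
            # Append only the part of nxt not already covered by the overlap.
            merged += nxt[overlap(prev, nxt):]
        if best is None or len(merged) < len(best):
            best = merged
    return best

result = shortest_superstring_bruteforce(["ATC", "CCA", "CAG", "TCC", "AGT"])
```

For the slide's set, the only ordering in which every consecutive pair overlaps by two letters is ATC, TCC, CCA, CAG, AGT, which gives the 7-letter superstring shown on the slide.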

23 Sequencing by Hybridization (SBH): history. 1988: SBH suggested as an alternative sequencing method. Nobody believed it would ever work. 1991: Light-directed polymer synthesis developed by Steve Fodor and colleagues. 1994: Affymetrix develops first 64-kb DNA microarray. (Figure: first microarray prototype (1989); first commercial DNA microarray prototype w/16,000 features (1994); 500,000 features per chip (2002).)

24 How SBH works: Attach all possible DNA probes of length l to a flat surface, each probe at a distinct and known location. This set of probes is called the DNA array. Apply a solution containing fluorescently labeled DNA fragment to the array. The DNA fragment hybridizes with only those probes that are complementary to substrings of length l of the fragment. Using a spectroscopic detector, determine which probes hybridize to the DNA fragment to obtain the l-mer composition of the target DNA fragment. Apply a combinatorial algorithm (next) to reconstruct the sequence of the target DNA fragment from the l-mer composition.

25 Hybridization on DNA Array: (Figure only.)

26-27 l-mer composition: Spectrum(s, l) is an unordered multiset of all possible (n - l + 1) l-mers in a string s of length n. The order of individual elements in Spectrum(s, l) does not matter. For s = TATGGTGC all of the following are equivalent representations of Spectrum(s, 3): {TAT, ATG, TGG, GGT, GTG, TGC}, {ATG, GGT, GTG, TAT, TGC, TGG}, {TGG, TGC, TAT, GTG, GGT, ATG}. We usually choose the lexicographically ordered representation as the canonical one.
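A short Python sketch of Spectrum(s, l), returning the lexicographically ordered representation. The function name `spectrum` and the choice of a sorted list to stand in for the multiset are assumptions made for illustration.

```python
def spectrum(s, l):
    """All (n - l + 1) l-mers of s, as a lexicographically sorted list."""
    return sorted(s[i:i + l] for i in range(len(s) - l + 1))

canonical = spectrum("TATGGTGC", 3)
```

Because the result is sorted, two strings have the same l-mer composition exactly when their `spectrum` lists compare equal.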

28 Different sequences, same spectrum: Note that different sequences may have the same spectrum. Spectrum(GTATCT, 2) = Spectrum(GTCTAT, 2) = {AT, CT, GT, TA, TC}. The Sequencing by Hybridization (SBH) Problem: reconstruct a string from its l-mer composition. Input: set S representing all l-mers from an unknown string s. Output: string s such that Spectrum(s, l) = S.

29 SBH: Hamiltonian Path approach. S = {ATG, AGG, TGC, TCC, GTC, GGT, GCA, CAG}. Corresponding Hamiltonian Path Problem: graph H has one vertex per l-mer (ATG, AGG, TGC, TCC, GTC, GGT, GCA, CAG). The path visits every vertex once and spells ATGCAGGTCC.

30 SBH: Hamiltonian Path approach. A more complicated graph: S = {ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT}. (Figure: graph H.)

31 SBH: Hamiltonian Path approach. S = {ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT}. Path 1 in H spells ATGCGTGGCA. Path 2 in H spells ATGGCGTGCA.
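The Hamiltonian path formulation can be sketched with a small backtracking search. The edge rule used here (p connects to q when the last l-1 letters of p equal the first l-1 letters of q) is the standard one for this construction, but it is not spelled out in the transcribed slides, so treat it as an assumption. The search is exponential in the worst case and the function names are illustrative.

```python
def hamiltonian_paths(lmers):
    """Return every ordering of lmers in which consecutive l-mers overlap by l-1."""
    succ = {p: [q for q in lmers if q != p and p[1:] == q[:-1]] for p in lmers}
    paths = []

    def extend(path, used):
        if len(path) == len(lmers):
            paths.append(path[:])
            return
        for q in succ[path[-1]]:
            if q not in used:
                used.add(q)
                path.append(q)
                extend(path, used)
                path.pop()
                used.remove(q)

    for start in lmers:
        extend([start], {start})
    return paths

def spell(path):
    """Spell the string given by a path: first l-mer plus the last letter of each later one."""
    return path[0] + "".join(p[-1] for p in path[1:])

simple = ["ATG", "AGG", "TGC", "TCC", "GTC", "GGT", "GCA", "CAG"]
harder = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
simple_strings = {spell(p) for p in hamiltonian_paths(simple)}
harder_strings = {spell(p) for p in hamiltonian_paths(harder)}
```

The first set admits a single reconstruction, while the second yields the two strings listed on slide 31.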

32 SBH: Eulerian Path approach. S = {ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT}. Vertices correspond to (l - 1)-mers: {AT, TG, GC, GG, GT, CA, CG}. Edges correspond to l-mers from S. (Figure: graph on the vertices GT, CG, AT, TG, GC, CA, GG.) The path visits every edge once!

33 SBH: Eulerian Path approach. S = {ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT} corresponds to two different paths, spelling ATGGCGTGCA and ATGCGTGGCA. (Figure: the two numbered edge traversals on the vertices GT, CG, AT, TG, GC, CA, GG.) I.e., there are two acceptable solutions to this SBH problem.
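The graph used in the Eulerian approach can be built directly from the l-mers. In this sketch each l-mer contributes one directed edge from its (l - 1)-letter prefix to its (l - 1)-letter suffix. The name `de_bruijn` is an illustrative label and does not appear in the slides.

```python
from collections import defaultdict

def de_bruijn(lmers):
    """Adjacency lists: prefix (l-1)-mer -> list of suffix (l-1)-mers, one per l-mer."""
    graph = defaultdict(list)
    for lmer in lmers:
        graph[lmer[:-1]].append(lmer[1:])
    return dict(graph)

S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
graph = de_bruijn(S)
# Vertices are all (l-1)-mers that occur as a prefix or a suffix of some l-mer.
vertices = set(graph) | {v for targets in graph.values() for v in targets}
```

The graph has as many edges as there are l-mers, which is what makes "visit every edge once" the right question to ask of it.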

34 Euler Theorem: A graph is balanced if for every vertex, the number of incoming edges equals the number of outgoing edges: in(v) = out(v). Theorem: a connected graph is Eulerian (i.e., has an Eulerian cycle) if and only if each of its vertices is balanced.

35 Euler Theorem: Proof. Two cases to consider: (1) Eulerian ⇒ balanced: for every edge entering v (incoming edge) there exists an edge leaving v (outgoing edge). Therefore in(v) = out(v). (2) Balanced ⇒ Eulerian: ???

36 Algorithm for constructing an Eulerian Cycle: (a) Start with an arbitrary vertex v and form an arbitrary cycle with unused edges until a dead end is reached. Since the graph is Eulerian, the dead end must be the starting point, i.e., vertex v.

37 Algorithm for constructing an Eulerian Cycle: (b) If the cycle from Step (a) is not an Eulerian cycle, it must contain a vertex w which has untraversed edges. Perform Step (a) again, using vertex w as the starting point. Once again, we will end up at the starting vertex w.

38 Algorithm for constructing an Eulerian Cycle: (c) Combine the cycles from Step (a) and Step (b) into a single cycle and iterate Step (b).
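Steps (a) through (c) can be sketched in Python as follows. This is a stack-based variant of the same idea (often attributed to Hierholzer) rather than a literal cycle-splicing implementation: walking until a dead end and backing up to a vertex with unused edges has the effect of merging the sub-cycles. It assumes a connected, balanced directed graph given as adjacency lists, and the small test graph is made up for illustration.

```python
from collections import Counter

def is_balanced(graph):
    """True if in(v) == out(v) for every vertex of the directed graph."""
    in_deg = Counter(w for targets in graph.values() for w in targets)
    vertices = set(graph) | set(in_deg)
    return all(len(graph.get(v, [])) == in_deg[v] for v in vertices)

def eulerian_cycle(graph, start):
    """Return the vertices of an Eulerian cycle, beginning and ending at start."""
    remaining = {v: list(ws) for v, ws in graph.items()}  # unused edges per vertex
    stack, cycle = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            # Keep walking along an unused edge (Step a).
            stack.append(remaining[v].pop())
        else:
            # Dead end: emit the vertex and back up to one with unused edges (Steps b, c).
            cycle.append(stack.pop())
    return cycle[::-1]

# Hypothetical balanced graph with five edges.
g = {"a": ["b", "c"], "b": ["c"], "c": ["a", "a"]}
cycle = eulerian_cycle(g, "a")
```

Each edge is pushed and popped once, which is consistent with the linear running time claimed on slide 4.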

39 Euler Theorem: extension. Theorem: A connected graph has an Eulerian path if and only if it contains at most two semi-balanced vertices and all other vertices are balanced. I.e., one vertex has an extra outgoing edge, while another corresponding vertex has an extra incoming edge.
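Putting the extension to work on the SBH example: the sketch below starts at the semi-balanced vertex with the extra outgoing edge, walks an Eulerian path with the same stack-based traversal, and spells the string. It assumes the l-mers really do come from one string (so the graph is connected and an Eulerian path exists) and returns just one of the possible reconstructions. Function names are illustrative.

```python
from collections import defaultdict

def eulerian_path(edges):
    """Return the vertex sequence of an Eulerian path over the directed edges."""
    out_edges = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for v, w in edges:
        out_edges[v].append(w)
        balance[v] += 1
        balance[w] -= 1
    # Start at the vertex with one extra outgoing edge, if there is one.
    starts = [v for v, b in balance.items() if b == 1]
    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if out_edges[v]:
            stack.append(out_edges[v].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

def reconstruct(lmers):
    """Spell one string whose l-mer composition is the given collection."""
    path = eulerian_path([(m[:-1], m[1:]) for m in lmers])
    return path[0] + "".join(v[-1] for v in path[1:])

S = ["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]
s = reconstruct(S)
```

Which of the two valid strings comes back depends on the order in which edges are popped, which mirrors the ambiguity noted on slide 33.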

40 Some Difficulties with SBH: Fidelity of hybridization: it is difficult to detect differences between probes hybridized with perfect matches and those with one or two mismatches. Array size: the effect of low fidelity can be decreased with longer l-mers, but array size increases exponentially in l. Array size is limited with current technology. Practicality: SBH is still impractical. As DNA microarray technology improves, SBH may become practical in the future. Practicality (again): although impractical, SBH still spearheaded techniques for expression and SNP analysis.

41 Traditional DNA sequencing: (Figure: DNA is shaken into DNA fragments; each fragment is combined with a vector, a circular genome (bacterium, plasmid), at a known location (restriction site).)

42 Different types of vectors:
VECTOR                                Size of insert (bp)
Plasmid                               2,000-10,000
Cosmid                                40,000
BAC (Bacterial Artificial Chromosome) 70, ,000
YAC (Yeast Artificial Chromosome)     > 300,000 (not used much recently)

43 Electrophoresis diagrams: (Figure only.)

44 Challenging to read sometimes: Filtering. Smoothing. Correcting for length compression. Method for deciding nucleotides.

45 Shotgun sequencing: Genomic segment. Cut many times at random (shotgun). Get one or two reads from each segment (~500 bp each).

46 Fragment assembly: Cover region with ~7-fold redundancy. Overlap reads, extend to reconstruct the original genomic region.

47 Read coverage C: Length of genomic segment: L. Number of reads: n. Length of each read: l. Coverage C = (n l) / L. How much coverage is enough? Lander-Waterman model: assuming uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides.
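The coverage arithmetic can be checked numerically. The sketch below uses the simplified Lander-Waterman estimate that the expected number of gaps is about n * exp(-C), which ignores the minimum overlap needed to detect that two reads join. The 500 bp read length is taken from the shotgun sequencing slide, and the function names are illustrative.

```python
import math

def coverage(n, l, L):
    """Coverage C = n * l / L for n reads of length l over a segment of length L."""
    return n * l / L

def expected_gaps(n, l, L):
    """Simplified Lander-Waterman estimate of the number of gaps: n * exp(-C)."""
    return n * math.exp(-coverage(n, l, L))

# 20,000 reads of 500 bp over 1,000,000 nucleotides gives C = 10.
C = coverage(20_000, 500, 1_000_000)
gaps = expected_gaps(20_000, 500, 1_000_000)
```

Under these assumptions the estimate comes out at roughly one gap per million nucleotides, in line with the figure quoted on the slide.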

48 Challenges in fragment assembly: Repeats are a major problem for fragment assembly. More than 50% of the human genome consists of repeats: over 1 million Alu repeats (about 300 bp), about 200,000 LINE repeats (1000 bp and longer). (Figure: three repeat copies; green and blue fragments are interchangeable when assembling repetitive DNA.)

49 Bad things happen when you confuse repeated regions... (Figure: correct assembly, which we don't know starting off, next to a potential mis-assembly that seems just as plausible.)

50 Triazzle: a fun example. The puzzle looks simple BUT there are repeats!!! Repeats make it very difficult. Try it only $7.99 at

51 Repeat types:
Low-complexity DNA (e.g., ATATATATACATA)
Microsatellite repeats: (a_1 ... a_k)^N where k ~ 3-6 (e.g., CAGCAGTAGCAGCACCAG)
Transposons/retrotransposons:
  SINE: Short Interspersed Nuclear Elements (e.g., Alu: ~300 bp long, 10^6 copies)
  LINE: Long Interspersed Nuclear Elements, ~500-5,000 bp long, 200,000 copies
  LTR retroposons: Long Terminal Repeats (~700 bp) at each end
Gene families: genes duplicate and then diverge
Segmental duplications: very long, very similar copies

52 Overlap-Layout-Consensus: Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA. Overlap: find potentially overlapping reads. Layout: merge reads into contigs and contigs into supercontigs. Consensus: derive the DNA sequence and correct read errors (..acgattacaataggtt..).

53 Overlap: Find the best match between the suffix of one read and the prefix of another. Due to sequencing errors, need to use dynamic programming to find the optimal overlap alignment. Apply a filtration method to filter out pairs of fragments that do not share a significantly long common substring.

54 Overlapping reads: Sort all k-mers in reads (k ~ 24). Find pairs of reads sharing a k-mer. Extend to full alignment; throw away if not >95% similar. (Figure: two reads sharing the core TAGATTACACAGATTAC with differing flanks.)
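The k-mer filtration step can be sketched as follows. Note one substitution: the slide sorts all k-mers, whereas this sketch groups them with a hash-based index, which finds the same candidate pairs. The toy reads and the small k are made up for illustration, and the follow-up alignment and 95% similarity check are not implemented here.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(reads, k):
    """Return pairs of read indices (i, j), i < j, that share at least one k-mer."""
    index = defaultdict(set)  # k-mer -> indices of the reads containing it
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Toy reads: the first two share the 8-mer ACAGATTA, the third shares nothing.
reads = ["TAGATTACACAGATTAC", "ACAGATTACTAGA", "GGGGCCCCGGGG"]
pairs = candidate_pairs(reads, 8)
```

Only the surviving pairs would then be passed to the more expensive dynamic-programming alignment described on slide 53.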

55 Overlapping reads and repeats: A k-mer that appears N times initiates N^2 comparisons. For an Alu that appears 10^6 times, this means 10^12 comparisons, which is obviously too much. Solution: discard all k-mers that appear more than t × Coverage times (t ~ 10).
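The discard rule can be sketched with a k-mer counter. The threshold follows the slide (more than t times the coverage); the function name, the toy reads, and the tiny parameter values are illustrative assumptions.

```python
from collections import Counter

def frequent_kmers(reads, k, coverage, t=10):
    """Return the k-mers that occur more than t * coverage times across all reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, count in counts.items() if count > t * coverage}

# Toy input: AA occurs five times, every other 2-mer once.
masked = frequent_kmers(["AAAAAA", "ACGT"], k=2, coverage=1, t=3)
```

K-mers in the returned set would simply be skipped when building the candidate-pair index, so a highly repeated element no longer triggers a quadratic number of comparisons.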

56 Finding overlapping reads: Create local multiple alignments from the overlapping reads. (Figure: stacked reads TAGATTACACAGATTACTGA, TAG, and TTACACAGATTATTGA.)

57 Finding overlapping reads: Correct errors using multiple alignment. Score alignments. Accept alignments with good scores. (Figure: multiple alignment of the reads TAGATTACACAGATTACTGA, TAG, and TTACACAGATTATTGA with per-base quality scores at disputed columns: C: 20, C: 35, T: 30, C: 35, C: 40 and A: 15, A: 25, -, A: 40, A: 25.)

58 Layout: Repeats are a major challenge. Do two aligned fragments really overlap, or are they from two copies of a repeat? Solution: repeat masking, i.e., hide the repeats!!! Masking results in a high rate of misassembly (up to 20%). Misassembly means a lot more work at the finishing step.

59 Merge reads into contigs: Merge reads up to potential repeat boundaries. (Figure: repeat region.)

60 Repeats, errors, and contig lengths: Repeats shorter than read length are okay. Repeats with more base pair differences than the sequencing error rate are okay. To make a smaller portion of the genome appear repetitive: try to increase effective read length; work to decrease sequencing error rate.

61 Role of error correction: Discards ~90% of single-letter sequencing errors. Lowers error rate: decreases effective repeat content, increases contig length.

62 Merge reads into contigs: Ignore non-maximal reads. Merge only maximal reads into contigs. (Figure: repeat region.)

63 Merge reads into contigs: Ignore hanging reads when detecting repeat boundaries. (Figure: reads a and b; repeat boundary??? or sequencing error.)

64 Merge reads into contigs: Insert non-maximal reads whenever unambiguous. (Figure: ambiguous placements marked "?????", one marked "Unambiguous".)

65 Link contigs into supercontigs: (Figure: normal density; too dense: overcollapsed?; inconsistent links: overcollapsed?)

66 Link contigs into supercontigs: Find all links between unique contigs. Connect contigs incrementally, if 2 links.

67 Link contigs into supercontigs: Fill gaps in supercontigs with paths of overcollapsed contigs.

68 Wrap-up: Readings for next time: IBP (hash tables, repeat-finding, exact pattern matching, keyword trees, suffix trees). Remember: come to class having done the readings. Check Blackboard regularly for updates.

10/8/13 Comp 555 Fall

10/8/13 Comp 555 Fall 10/8/13 Comp 555 Fall 2013 1 Find a tour crossing every bridge just once Leonhard Euler, 1735 Bridges of Königsberg 10/8/13 Comp 555 Fall 2013 2 Find a cycle that visits every edge exactly once Linear

More information

10/15/2009 Comp 590/Comp Fall

10/15/2009 Comp 590/Comp Fall Lecture 13: Graph Algorithms Study Chapter 8.1 8.8 10/15/2009 Comp 590/Comp 790-90 Fall 2009 1 The Bridge Obsession Problem Find a tour crossing every bridge just once Leonhard Euler, 1735 Bridges of Königsberg

More information

Sequencing. Computational Biology IST Ana Teresa Freitas 2011/2012. (BACs) Whole-genome shotgun sequencing Celera Genomics

Sequencing. Computational Biology IST Ana Teresa Freitas 2011/2012. (BACs) Whole-genome shotgun sequencing Celera Genomics Computational Biology IST Ana Teresa Freitas 2011/2012 Sequencing Clone-by-clone shotgun sequencing Human Genome Project Whole-genome shotgun sequencing Celera Genomics (BACs) 1 Must take the fragments

More information

Graph Algorithms in Bioinformatics

Graph Algorithms in Bioinformatics Graph Algorithms in Bioinformatics Computational Biology IST Ana Teresa Freitas 2015/2016 Sequencing Clone-by-clone shotgun sequencing Human Genome Project Whole-genome shotgun sequencing Celera Genomics

More information

DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization

DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization Eulerian & Hamiltonian Cycle Problems DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization The Bridge Obsession Problem Find a tour crossing every bridge just

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela and Veli Mäkinen, which are partly from http://bix.ucsd.edu/bioalgorithms/slides.php 58670 Algorithms for Bioinformatics Lecture 5: Graph Algorithms

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela and Veli Mäkinen, which are partly from http://bix.ucsd.edu/bioalgorithms/slides.php 582670 Algorithms for Bioinformatics Lecture 3: Graph Algorithms

More information

Sequence Assembly Required!

Sequence Assembly Required! Sequence Assembly Required! 1 October 3, ISMB 20172007 1 Sequence Assembly Genome Sequenced Fragments (reads) Assembled Contigs Finished Genome 2 Greedy solution is bounded 3 Typical assembly strategy

More information

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly Ben Raphael Sept. 22, 2009 http://cs.brown.edu/courses/csci2950-c/ l-mer composition Def: Given string s, the Spectrum ( s, l ) is unordered multiset

More information

DNA Sequencing. Overview

DNA Sequencing. Overview BINF 3350, Genomics and Bioinformatics DNA Sequencing Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Eulerian Cycles Problem Hamiltonian Cycles

More information

I519 Introduction to Bioinformatics, Genome assembly. Yuzhen Ye School of Informatics & Computing, IUB

I519 Introduction to Bioinformatics, Genome assembly. Yuzhen Ye School of Informatics & Computing, IUB I519 Introduction to Bioinformatics, 2014 Genome assembly Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Contents Genome assembly problem Approaches Comparative assembly The string

More information

DNA Fragment Assembly

DNA Fragment Assembly Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri DNA Fragment Assembly Overlap

More information

Sequence Assembly. BMI/CS 576 Mark Craven Some sequencing successes

Sequence Assembly. BMI/CS 576  Mark Craven Some sequencing successes Sequence Assembly BMI/CS 576 www.biostat.wisc.edu/bmi576/ Mark Craven craven@biostat.wisc.edu Some sequencing successes Yersinia pestis Cannabis sativa The sequencing problem We want to determine the identity

More information

Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner

Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Outline I. Problem II. Two Historical Detours III.Example IV.The Mathematics of DNA Sequencing V.Complications

More information

Genome 373: Genome Assembly. Doug Fowler

Genome 373: Genome Assembly. Doug Fowler Genome 373: Genome Assembly Doug Fowler What are some of the things we ve seen we can do with HTS data? We ve seen that HTS can enable a wide variety of analyses ranging from ID ing variants to genome-

More information

CS681: Advanced Topics in Computational Biology

CS681: Advanced Topics in Computational Biology CS681: Advanced Topics in Computational Biology Can Alkan EA224 calkan@cs.bilkent.edu.tr Week 7 Lectures 2-3 http://www.cs.bilkent.edu.tr/~calkan/teaching/cs681/ Genome Assembly Test genome Random shearing

More information

Eulerian tours. Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck. April 20, 2016

Eulerian tours. Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck.  April 20, 2016 Eulerian tours Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ April 20, 2016 Seven Bridges of Konigsberg Is there a path that crosses each

More information

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department http://cbd.cmu.edu Genome Reconstruction: A Puzzle with a Billion Pieces Phillip Compeau Carnegie Mellon University Computational Biology Department Eternity II: The Highest-Stakes Puzzle in History Courtesy:

More information

Graphs and Puzzles. Eulerian and Hamiltonian Tours.

Graphs and Puzzles. Eulerian and Hamiltonian Tours. Graphs and Puzzles. Eulerian and Hamiltonian Tours. CSE21 Winter 2017, Day 11 (B00), Day 7 (A00) February 3, 2017 http://vlsicad.ucsd.edu/courses/cse21-w17 Exam Announcements Seating Chart on Website Good

More information

Eulerian Tours and Fleury s Algorithm

Eulerian Tours and Fleury s Algorithm Eulerian Tours and Fleury s Algorithm CSE21 Winter 2017, Day 12 (B00), Day 8 (A00) February 8, 2017 http://vlsicad.ucsd.edu/courses/cse21-w17 Vocabulary Path (or walk): describes a route from one vertex

More information

DNA Fragment Assembly

DNA Fragment Assembly SIGCSE 009 Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri DNA Fragment Assembly

More information

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015 Mapping de Novo Assembly Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #2 WS 2014/2015 Today Genome assembly: the basics Hamiltonian and Eulerian

More information

(for more info see:

(for more info see: Genome assembly (for more info see: http://www.cbcb.umd.edu/research/assembly_primer.shtml) Introduction Sequencing technologies can only "read" short fragments from a genome. Reconstructing the entire

More information

RESEARCH TOPIC IN BIOINFORMANTIC

RESEARCH TOPIC IN BIOINFORMANTIC RESEARCH TOPIC IN BIOINFORMANTIC GENOME ASSEMBLY Instructor: Dr. Yufeng Wu Noted by: February 25, 2012 Genome Assembly is a kind of string sequencing problems. As we all know, the human genome is very

More information

Genome Sequencing Algorithms

Genome Sequencing Algorithms Genome Sequencing Algorithms Phillip Compaeu and Pavel Pevzner Bioinformatics Algorithms: an Active Learning Approach Leonhard Euler (1707 1783) William Hamilton (1805 1865) Nicolaas Govert de Bruijn (1918

More information

DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats

DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats Ching Li San Jose State University

More information

Genome Assembly and De Novo RNAseq

Genome Assembly and De Novo RNAseq Genome Assembly and De Novo RNAseq BMI 7830 Kun Huang Department of Biomedical Informatics The Ohio State University Outline Problem formulation Hamiltonian path formulation Euler path and de Bruijin graph

More information

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics 27626 - Next Generation Sequencing Analysis Generalized NGS analysis Data size Application Assembly: Compare

More information

Purpose of sequence assembly

Purpose of sequence assembly Sequence Assembly Purpose of sequence assembly Reconstruct long DNA/RNA sequences from short sequence reads Genome sequencing RNA sequencing for gene discovery Amplicon sequencing But not for transcript

More information

BMI/CS 576 Fall 2015 Midterm Exam

BMI/CS 576 Fall 2015 Midterm Exam BMI/CS 576 Fall 2015 Midterm Exam Prof. Colin Dewey Tuesday, October 27th, 2015 11:00am-12:15pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.

More information

Bioinformatics-themed projects in Discrete Mathematics

Bioinformatics-themed projects in Discrete Mathematics Bioinformatics-themed projects in Discrete Mathematics Art Duval University of Texas at El Paso Joint Mathematics Meeting MAA Contributed Paper Session on Discrete Mathematics in the Undergraduate Curriculum

More information

by the Genevestigator program (www.genevestigator.com). Darker blue color indicates higher gene expression.

by the Genevestigator program (www.genevestigator.com). Darker blue color indicates higher gene expression. Figure S1. Tissue-specific expression profile of the genes that were screened through the RHEPatmatch and root-specific microarray filters. The gene expression profile (heat map) was drawn by the Genevestigator

More information

Introduction to Genome Assembly. Tandy Warnow

Introduction to Genome Assembly. Tandy Warnow Introduction to Genome Assembly Tandy Warnow 2 Shotgun DNA Sequencing DNA target sample SHEAR & SIZE End Reads / Mate Pairs 550bp 10,000bp Not all sequencing technologies produce mate-pairs. Different

More information

BLAST & Genome assembly

BLAST & Genome assembly BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies May 15, 2014 1 BLAST What is BLAST? The algorithm 2 Genome assembly De novo assembly Mapping assembly 3

More information

Reducing Genome Assembly Complexity with Optical Maps

Reducing Genome Assembly Complexity with Optical Maps Reducing Genome Assembly Complexity with Optical Maps Lee Mendelowitz LMendelo@math.umd.edu Advisor: Dr. Mihai Pop Computer Science Department Center for Bioinformatics and Computational Biology mpop@umiacs.umd.edu

More information

Bioinformatics: Fragment Assembly. Walter Kosters, Universiteit Leiden. IPA Algorithms&Complexity,

Bioinformatics: Fragment Assembly. Walter Kosters, Universiteit Leiden. IPA Algorithms&Complexity, Bioinformatics: Fragment Assembly Walter Kosters, Universiteit Leiden IPA Algorithms&Complexity, 29.6.2007 www.liacs.nl/home/kosters/ 1 Fragment assembly Problem We study the following problem from bioinformatics:

More information

Solutions Exercise Set 3 Author: Charmi Panchal
