Genome Sequencing Algorithms

1 Genome Sequencing Algorithms. Phillip Compeau and Pavel Pevzner, Bioinformatics Algorithms: An Active Learning Approach. Leonhard Euler, William Hamilton, Nicolaas Govert de Bruijn.

2 The Genome Sequencing Problem. Determining the order of nucleotides in a genome. The human genome contains about 3 billion nucleotides. Amoeba dubia and Paris japonica contain 200 times more! Applications in medicine, agriculture, biotechnology, ...

3 The Genome Sequencing Problem. There is no technology to read a genome from one end to the other. Only short snippets, called reads, can be identified. The location of a read within the genome is not known. Assembling individual reads into the entire genome is akin to solving a giant overlapping puzzle (the newspaper explosion analogy).

4 History of Genome Sequencing. 1977: Walter Gilbert and Frederick Sanger developed independent DNA sequencing methods. 1990: Human Genome Project, Francis Collins. 1997: Celera Genomics, Craig Venter. 2000: Human genome is sequenced.

5 Next Generation Sequencing. Illumina sequences human genomes for $10,000. Complete Genomics sequences 100s of genomes per month. The Beijing Genomics Institute has 100s of sequencing machines and is the world's biggest sequencing center.

6 Next Generation Sequencing. Identification of mutations in personal genomes for health diagnosis. Genome 10K project.

7 Genome Assembly: The Computational Problem. [Figure: multiple copies of a sequence, each labeled CTA.]

8 Genome Assembly: The Computational Problem. A sequencing machine generates reads. Assembling them is a string reconstruction problem.

9 The Genome Sequencing Problem. Reconstruct a genome from reads. Input: A collection of strings, Reads. Output: A string, Genome, reconstructed from all the Reads.

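10 k-mer Composition. Composition_3(TATT) is the collection of all 3-mers of the genome. The 3-mers are then listed in lexicographic order, which discards their order in the genome. [Figure: the 3-mers in genome order and in lexicographic order.]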
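The k-mer composition can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function name `composition` and the example string are assumptions chosen for the sketch.

```python
def composition(text, k):
    """Return every k-mer of text (repeats included) in lexicographic order."""
    return sorted(text[i:i + k] for i in range(len(text) - k + 1))


# Illustrative string (an assumption, not necessarily the slides' example).
genome = "TAATGCCATGGGATGTT"
kmers = composition(genome, 3)
print(len(kmers), kmers)
```

Sorting mirrors the point of the slide: once the k-mers are in lexicographic order, nothing in the list reveals where each one came from in the genome.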

11 The String Reconstruction Problem. Reconstruct a string from its k-mer composition. Input: A collection of k-mers. Output: A string Genome such that Composition_k(Genome) is equal to the collection of k-mers.

12 Naive String Reconstruction Approach. [Figure: 3-mers chained one after another by overlap.] The chain gets stuck: no 3-mer begins with TT!

13 Representing a Genome as a Path. The 3-mers of Composition_3(TATT) are the nodes of a graph. Connect k-mer_1 to k-mer_2 if suffix(k-mer_1) = prefix(k-mer_2). [Figure: the genome as a path through its 3-mers.]

14 Path turns into a Graph. Connect k-mer_1 to k-mer_2 if suffix(k-mer_1) = prefix(k-mer_2). [Figure]

15 Path turns into a Graph. Connect k-mer_1 to k-mer_2 if suffix(k-mer_1) = prefix(k-mer_2).
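The suffix/prefix rule can be sketched as follows. This is a hedged illustration: `overlap_graph` is a hypothetical name, nodes are list indices so that repeated k-mers stay distinct, and self-loops are skipped for simplicity.

```python
from collections import defaultdict


def overlap_graph(kmers):
    """Map each k-mer index i to the indices j with suffix(kmers[i]) == prefix(kmers[j])."""
    by_prefix = defaultdict(list)
    for j, kmer in enumerate(kmers):
        by_prefix[kmer[:-1]].append(j)  # group k-mers by their (k-1)-prefix
    graph = {}
    for i, kmer in enumerate(kmers):
        # Follow kmer with any other k-mer whose prefix equals its (k-1)-suffix.
        graph[i] = [j for j in by_prefix.get(kmer[1:], []) if j != i]
    return graph


kmers = ["TAA", "AAT", "ATG", "TGC"]  # illustrative input
for i, targets in overlap_graph(kmers).items():
    print(kmers[i], "->", [kmers[j] for j in targets])
```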

16 Path turns into a Graph. Nodes are ordered lexicographically. How does one find the genome string?

17 Genome Path in the Graph. The genome string is a Hamiltonian path in the graph. [Figure: the genome path through the lexicographically ordered 3-mers.]

18 Hamiltonian Path Problem. Find a Hamiltonian path in a graph. Input: A graph. Output: A path visiting every node in the graph exactly once. Hamiltonian path: a path in a graph that traverses every node exactly once. William R. Hamilton.
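To make the problem statement concrete, here is a brute-force sketch that searches for a Hamiltonian path by backtracking. It is an illustration under stated assumptions (the name `hamiltonian_path` and the toy graph are hypothetical), and it takes exponential time in the worst case, so it is only usable on tiny graphs.

```python
def hamiltonian_path(graph):
    """Return a list visiting every node of graph exactly once, or None.

    graph: dict mapping each node to a list of successor nodes.
    """
    nodes = list(graph)

    def extend(path, visited):
        if len(path) == len(nodes):
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                found = extend(path + [nxt], visited | {nxt})
                if found:
                    return found
        return None  # dead end: backtrack

    for start in nodes:
        found = extend([start], {start})
        if found:
            return found
    return None


# Toy overlap graph on 3-mers (illustrative).
graph = {"TAA": ["AAT"], "AAT": ["ATG"], "ATG": ["TGC"], "TGC": []}
path = hamiltonian_path(graph)
print(path[0] + "".join(kmer[-1] for kmer in path[1:]))
```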

19 A Different Path. [Figure: two panels, "3-mers as nodes" and "3-mers as edges".]

20 A Different Path. 3-mers as edges, and nodes as prefixes and suffixes of the corresponding 3-mers. [Figure]

21 Glue Identical Nodes [Figure]

22 Glue Identical Nodes [Figure]

23 Glue Identical Nodes [Figure]

24 De Bruijn Graph of the Genome [Figure]

25 De Bruijn Graph of the Genome. The genome string is an Eulerian path in the de Bruijn graph. [Figure]

26 Eulerian Path Problem. Find an Eulerian path in a graph. Input: A graph. Output: A path visiting every edge in the graph exactly once. Eulerian path: a path in a graph that traverses every edge exactly once. Leonhard Euler.

27 Hamiltonian Path vs. Eulerian Path [Figure]

28 Hamiltonian Path vs. Eulerian Path. Euler presented an efficient solution to the Eulerian Path Problem. No fast algorithm is known for the Hamiltonian Path Problem, which is NP-complete. [Figure]

29 The Objective [Figure]

30 To Do... [Figure]

31 Constructing De Bruijn Graph. The composition of the genome is known. [Figure]

32 Constructing De Bruijn Graph [Figure]

33 Constructing De Bruijn Graph [Figure]

34 Constructing De Bruijn Graph [Figure]

35 Constructing De Bruijn Graph [Figure]

36 Glue Identical Nodes [Figure]

37 Glue Identical Nodes [Figure]

38 Glue Identical Nodes [Figure]

39 De Bruijn Graph of the Genome Composition. DeBruijnGraph(Genome Composition) = DeBruijnGraph(Genome). [Figure]

40 Constructing the De Bruijn Graph. De Bruijn graph of a collection of k-mers: represent every k-mer as an edge between its prefix and suffix, then glue all nodes with identical labels.
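The two construction steps can be sketched together: keying a dictionary by node label performs the gluing automatically, because identical labels land on the same key. The function name `de_bruijn` and the input list are assumptions for illustration.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Build the de Bruijn graph as a dict: (k-1)-mer -> list of successor (k-1)-mers.

    Each k-mer contributes one edge from its prefix to its suffix;
    repeated k-mers yield parallel edges.
    """
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Illustrative 3-mers; ATG appears twice, giving two parallel edges AT -> TG.
print(de_bruijn(["TAA", "AAT", "ATG", "ATG", "TGC"]))
```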

41 Eulerian Cycle Problem. Find an Eulerian cycle in a graph. Input: A graph. Output: A cycle visiting every edge in the graph exactly once.

42 The Königsberg Bridges

43 Eulerian Graph. A graph is Eulerian if it contains an Eulerian cycle. A directed graph is balanced if every node has equal indegree and outdegree. Every balanced and strongly connected graph is Eulerian.
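The criterion above can be checked directly. The following is a sketch, assuming the graph is a dict from node to successor list; `is_eulerian` is a hypothetical helper name. Strong connectivity is tested by searching from one node in the graph and in its reverse.

```python
from collections import Counter


def is_eulerian(graph):
    """True if the directed graph is balanced and strongly connected."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    if not nodes:
        return False
    indeg = Counter(v for targets in graph.values() for v in targets)
    if any(indeg[u] != len(graph.get(u, [])) for u in nodes):
        return False  # some node is unbalanced

    def reaches_all(adj):
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for v in adj.get(stack.pop(), []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen == nodes

    reverse = {}
    for u, targets in graph.items():
        for v in targets:
            reverse.setdefault(v, []).append(u)
    # Strongly connected: everything is reachable from start, and start from everything.
    return reaches_all(graph) and reaches_all(reverse)


print(is_eulerian({"A": ["B"], "B": ["C"], "C": ["A"]}))
```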

44 Algorithm to Find the Eulerian Cycle

45 Algorithm to Find the Eulerian Cycle

46 Algorithm to Find the Eulerian Cycle
EulerianCycle(Graph)
    form a cycle Cycle by randomly walking in Graph
    while there are unexplored edges in Graph
        select a node newStart in Cycle with unexplored edges
        form Cycle' by traversing Cycle (starting at newStart) and then randomly walking
        Cycle ← Cycle'
    return Cycle
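A runnable counterpart to the pseudocode is sketched below. It is not a line-by-line transcription: instead of rebuilding Cycle' explicitly, it uses the common stack-based form of Hierholzer's algorithm, which splices in detours as the walk backtracks and visits each edge once. The function name and the example graph are assumptions, and the input is assumed to be balanced and strongly connected.

```python
def eulerian_cycle(graph):
    """Return an Eulerian cycle as a list of nodes (first node repeated at the end).

    graph: dict node -> list of successors, assumed balanced and strongly connected.
    """
    remaining = {u: list(vs) for u, vs in graph.items()}  # copy; edges get consumed
    start = next(iter(remaining))
    stack, cycle = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # keep walking along an unexplored edge
        else:
            cycle.append(stack.pop())  # no edges left here: emit node while backtracking
    return cycle[::-1]


# Illustrative balanced, strongly connected graph with 12 edges.
example = {0: [3], 1: [0], 2: [1, 6], 3: [2], 4: [2],
           5: [4], 6: [5, 8], 7: [9], 8: [7], 9: [6]}
print(eulerian_cycle(example))
```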

47 From Reads to De Bruijn Graph to Genome [Figure]

48 Multiple Eulerian Paths. A de Bruijn graph may contain more than one Eulerian path, and each path spells a candidate genome. [Figure]

49 Multiple Eulerian Paths [Figure]
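The whole pipeline, from k-mers to de Bruijn graph to Eulerian path to string, can be sketched in one function. This is an illustration under assumptions: `reconstruct` is a hypothetical name, the input must be non-empty and admit an Eulerian path, and the example string is chosen for the sketch. Because several Eulerian paths may exist, the result is only guaranteed to have the same k-mer composition as the input, not to equal the original genome.

```python
from collections import defaultdict


def reconstruct(kmers):
    """Spell a string whose k-mer composition equals kmers, via an Eulerian path."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
        indeg[kmer[1:]] += 1
    # Start at the node with one more outgoing than incoming edge, if there is one.
    start = next((u for u in graph if len(graph[u]) > indeg[u]), next(iter(graph)))
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


genome = "TAATGCCATGGGATGTT"  # illustrative string (an assumption)
kmers = sorted(genome[i:i + 3] for i in range(len(genome) - 2))
text = reconstruct(kmers)
print(text, text == genome)
```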

50 DNA Sequencing with Read-pairs. A read-pair is a pair of reads separated by a fixed distance d. Genome: TATT. A (3,1) read-pair consists of two 3-mers separated by a gap of 1 nucleotide; it represents a path in the original de Bruijn graph.

51 DNA Sequencing with Read-pairs. Composition_3(TATT) lists the 3-mers of the genome; PairedComposition_3,1(TATT) lists its (3,1) read-pairs. [Figure: the 3-mers and the read-pairs of the genome.]
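The paired composition can be sketched as follows, assuming that a (k,d) read-pair means two k-mers separated by exactly d nucleotides. The function name and the example string are illustrative assumptions.

```python
def paired_composition(text, k, d):
    """Return all (k,d) read-pairs of text as (kmer1, kmer2) tuples, sorted."""
    span = 2 * k + d  # total length covered by one read-pair
    return sorted(
        (text[i:i + k], text[i + k + d:i + span])
        for i in range(len(text) - span + 1)
    )


genome = "TAATGCCATGGGATGTT"  # illustrative string (an assumption)
pairs = paired_composition(genome, 3, 1)
for first, second in pairs:
    print(first + "|" + second)
```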

52 Paired Composition. The read-pairs are listed in lexicographic order. [Figure]

53 String Reconstruction from Read-pairs. Input: A collection of paired k-mers. Output: A string Text such that PairedComposition(Text) is equal to the collection of paired k-mers.

54 Paired De Bruijn Graphs [Figure]

55 Paired De Bruijn Graphs. Combine nodes with identical labels. [Figure]

56 Paired De Bruijn Graphs [Figure]

57 Paired De Bruijn Graphs. The paired de Bruijn graphs obtained from the paired composition and from the genome are identical. [Figure]

58 Paired De Bruijn Graphs. Unique genome string: TATT. [Figure]
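Reconstruction from read-pairs can be sketched along the same lines as the unpaired case. The assumptions are labeled: nodes are pairs of (k-1)-mers, each read-pair is one edge, an Eulerian path is found with a stack-based walk, and the two spelled strings are merged only if they agree on their overlap. The function name and the example string are hypothetical, and the sketch returns None rather than trying other Eulerian paths when the overlap check fails.

```python
from collections import defaultdict


def reconstruct_from_pairs(pairs, k, d):
    """Spell a string from (k,d) read-pairs via the paired de Bruijn graph, or None."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    for a, b in pairs:
        src, dst = (a[:-1], b[:-1]), (a[1:], b[1:])  # paired prefix -> paired suffix
        graph[src].append(dst)
        indeg[dst] += 1
    start = next((u for u in graph if len(graph[u]) > indeg[u]), next(iter(graph)))
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Spell the string of first reads and the string of second reads separately.
    first = path[0][0] + "".join(node[0][-1] for node in path[1:])
    second = path[0][1] + "".join(node[1][-1] for node in path[1:])
    shift = k + d  # the second string starts k + d positions after the first
    if first[shift:] != second[:-shift]:
        return None  # this path does not spell a consistent string
    return first[:shift] + second


genome = "TAATGCCATGGGATGTT"  # illustrative string (an assumption)
pairs = [(genome[i:i + 3], genome[i + 4:i + 7]) for i in range(len(genome) - 6)]
print(reconstruct_from_pairs(pairs, 3, 1))
```

For this example the paired graph has a single Eulerian path, so the original string is recovered even though the unpaired 3-mers alone allow more than one reconstruction.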

DNA Sequencing. Overview

DNA Sequencing. Overview BINF 3350, Genomics and Bioinformatics DNA Sequencing Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Eulerian Cycles Problem Hamiltonian Cycles

More information

Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner

Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Genome Reconstruction: A Puzzle with a Billion Pieces Phillip E. C. Compeau and Pavel A. Pevzner Outline I. Problem II. Two Historical Detours III.Example IV.The Mathematics of DNA Sequencing V.Complications

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela and Veli Mäkinen, which are partly from http://bix.ucsd.edu/bioalgorithms/slides.php 582670 Algorithms for Bioinformatics Lecture 3: Graph Algorithms

More information

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015

Read Mapping. de Novo Assembly. Genomics: Lecture #2 WS 2014/2015 Mapping de Novo Assembly Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #2 WS 2014/2015 Today Genome assembly: the basics Hamiltonian and Eulerian

More information

02-711/ Computational Genomics and Molecular Biology Fall 2016

02-711/ Computational Genomics and Molecular Biology Fall 2016 Literature assignment 2 Due: Nov. 3 rd, 2016 at 4:00pm Your name: Article: Phillip E C Compeau, Pavel A. Pevzner, Glenn Tesler. How to apply de Bruijn graphs to genome assembly. Nature Biotechnology 29,

More information

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department

Genome Reconstruction: A Puzzle with a Billion Pieces. Phillip Compeau Carnegie Mellon University Computational Biology Department http://cbd.cmu.edu Genome Reconstruction: A Puzzle with a Billion Pieces Phillip Compeau Carnegie Mellon University Computational Biology Department Eternity II: The Highest-Stakes Puzzle in History Courtesy:

More information

DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization

DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization Eulerian & Hamiltonian Cycle Problems DNA Sequencing The Shortest Superstring & Traveling Salesman Problems Sequencing by Hybridization The Bridge Obsession Problem Find a tour crossing every bridge just

More information

10/15/2009 Comp 590/Comp Fall

10/15/2009 Comp 590/Comp Fall Lecture 13: Graph Algorithms Study Chapter 8.1 8.8 10/15/2009 Comp 590/Comp 790-90 Fall 2009 1 The Bridge Obsession Problem Find a tour crossing every bridge just once Leonhard Euler, 1735 Bridges of Königsberg

More information

Genome 373: Genome Assembly. Doug Fowler

Genome 373: Genome Assembly. Doug Fowler Genome 373: Genome Assembly Doug Fowler What are some of the things we ve seen we can do with HTS data? We ve seen that HTS can enable a wide variety of analyses ranging from ID ing variants to genome-

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela and Veli Mäkinen, which are partly from http://bix.ucsd.edu/bioalgorithms/slides.php 58670 Algorithms for Bioinformatics Lecture 5: Graph Algorithms

More information

I519 Introduction to Bioinformatics, Genome assembly. Yuzhen Ye School of Informatics & Computing, IUB

I519 Introduction to Bioinformatics, Genome assembly. Yuzhen Ye School of Informatics & Computing, IUB I519 Introduction to Bioinformatics, 2014 Genome assembly Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Contents Genome assembly problem Approaches Comparative assembly The string

More information

Graph Algorithms in Bioinformatics

Graph Algorithms in Bioinformatics Graph Algorithms in Bioinformatics Computational Biology IST Ana Teresa Freitas 2015/2016 Sequencing Clone-by-clone shotgun sequencing Human Genome Project Whole-genome shotgun sequencing Celera Genomics

More information

10/8/13 Comp 555 Fall

10/8/13 Comp 555 Fall 10/8/13 Comp 555 Fall 2013 1 Find a tour crossing every bridge just once Leonhard Euler, 1735 Bridges of Königsberg 10/8/13 Comp 555 Fall 2013 2 Find a cycle that visits every edge exactly once Linear

More information

How to apply de Bruijn graphs to genome assembly

How to apply de Bruijn graphs to genome assembly PRIMER How to apply de Bruijn graphs to genome assembly Phillip E C Compeau, Pavel A Pevzner & lenn Tesler A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling

More information

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly

CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly CSCI2950-C Lecture 4 DNA Sequencing and Fragment Assembly Ben Raphael Sept. 22, 2009 http://cs.brown.edu/courses/csci2950-c/ l-mer composition Def: Given string s, the Spectrum ( s, l ) is unordered multiset

More information

Sequence Assembly Required!

Sequence Assembly Required! Sequence Assembly Required! 1 October 3, ISMB 20172007 1 Sequence Assembly Genome Sequenced Fragments (reads) Assembled Contigs Finished Genome 2 Greedy solution is bounded 3 Typical assembly strategy

More information

Sequencing. Computational Biology IST Ana Teresa Freitas 2011/2012. (BACs) Whole-genome shotgun sequencing Celera Genomics

Sequencing. Computational Biology IST Ana Teresa Freitas 2011/2012. (BACs) Whole-genome shotgun sequencing Celera Genomics Computational Biology IST Ana Teresa Freitas 2011/2012 Sequencing Clone-by-clone shotgun sequencing Human Genome Project Whole-genome shotgun sequencing Celera Genomics (BACs) 1 Must take the fragments

More information

Introduction to Genome Assembly. Tandy Warnow

Introduction to Genome Assembly. Tandy Warnow Introduction to Genome Assembly Tandy Warnow 2 Shotgun DNA Sequencing DNA target sample SHEAR & SIZE End Reads / Mate Pairs 550bp 10,000bp Not all sequencing technologies produce mate-pairs. Different

More information

Sequence Assembly. BMI/CS 576 Mark Craven Some sequencing successes

Sequence Assembly. BMI/CS 576  Mark Craven Some sequencing successes Sequence Assembly BMI/CS 576 www.biostat.wisc.edu/bmi576/ Mark Craven craven@biostat.wisc.edu Some sequencing successes Yersinia pestis Cannabis sativa The sequencing problem We want to determine the identity

More information

Graph Algorithms in Bioinformatics

Graph Algorithms in Bioinformatics Graph Algorithms in Bioinformatics Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 13 Lopresti Fall 2007 Lecture 13-1 - Outline Introduction to graph theory Eulerian & Hamiltonian Cycle

More information

RESEARCH TOPIC IN BIOINFORMANTIC

RESEARCH TOPIC IN BIOINFORMANTIC RESEARCH TOPIC IN BIOINFORMANTIC GENOME ASSEMBLY Instructor: Dr. Yufeng Wu Noted by: February 25, 2012 Genome Assembly is a kind of string sequencing problems. As we all know, the human genome is very

More information

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis

de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics Next Generation Sequencing Analysis de novo assembly Simon Rasmussen 36626: Next Generation Sequencing analysis DTU Bioinformatics 27626 - Next Generation Sequencing Analysis Generalized NGS analysis Data size Application Assembly: Compare

More information

(for more info see:

(for more info see: Genome assembly (for more info see: http://www.cbcb.umd.edu/research/assembly_primer.shtml) Introduction Sequencing technologies can only "read" short fragments from a genome. Reconstructing the entire

More information

CS 173, Lecture B Introduction to Genome Assembly (using Eulerian Graphs) Tandy Warnow

CS 173, Lecture B Introduction to Genome Assembly (using Eulerian Graphs) Tandy Warnow CS 173, Lecture B Introduction to Genome Assembly (using Eulerian Graphs) Tandy Warnow 2 Shotgun DNA Sequencing DNA target sample SHEAR & SIZE End Reads / Mate Pairs 550bp 10,000bp Not all sequencing technologies

More information

DNA Fragment Assembly

DNA Fragment Assembly Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri DNA Fragment Assembly Overlap

More information

EECS 203 Lecture 20. More Graphs

EECS 203 Lecture 20. More Graphs EECS 203 Lecture 20 More Graphs Admin stuffs Last homework due today Office hour changes starting Friday (also in Piazza) Friday 6/17: 2-5 Mark in his office. Sunday 6/19: 2-5 Jasmine in the UGLI. Monday

More information

Graphs and Genetics. Outline. Computational Biology IST. Ana Teresa Freitas 2015/2016. Slides source: AED (MEEC/IST); Jones and Pevzner (book)

Graphs and Genetics. Outline. Computational Biology IST. Ana Teresa Freitas 2015/2016. Slides source: AED (MEEC/IST); Jones and Pevzner (book) raphs and enetics Computational Biology IST Ana Teresa Freitas / Slides source: AED (MEEC/IST); Jones and Pevzner (book) Outline l Motivacion l Introduction to raph Theory l Eulerian & Hamiltonian Cycle

More information

CSCI 1820 Notes. Scribes: tl40. February 26 - March 02, Estimating size of graphs used to build the assembly.

CSCI 1820 Notes. Scribes: tl40. February 26 - March 02, Estimating size of graphs used to build the assembly. CSCI 1820 Notes Scribes: tl40 February 26 - March 02, 2018 Chapter 2. Genome Assembly Algorithms 2.1. Statistical Theory 2.2. Algorithmic Theory Idury-Waterman Algorithm Estimating size of graphs used

More information

Performance analysis of parallel de novo genome assembly in shared memory system

Performance analysis of parallel de novo genome assembly in shared memory system IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Performance analysis of parallel de novo genome assembly in shared memory system To cite this article: Syam Budi Iryanto et al 2018

More information

BLAST & Genome assembly

BLAST & Genome assembly BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies May 15, 2014 1 BLAST What is BLAST? The algorithm 2 Genome assembly De novo assembly Mapping assembly 3

More information

CS681: Advanced Topics in Computational Biology

CS681: Advanced Topics in Computational Biology CS681: Advanced Topics in Computational Biology Can Alkan EA224 calkan@cs.bilkent.edu.tr Week 7 Lectures 2-3 http://www.cs.bilkent.edu.tr/~calkan/teaching/cs681/ Genome Assembly Test genome Random shearing

More information

Omega: an Overlap-graph de novo Assembler for Metagenomics

Omega: an Overlap-graph de novo Assembler for Metagenomics Omega: an Overlap-graph de novo Assembler for Metagenomics B a h l e l H a i d e r, Ta e - H y u k A h n, B r i a n B u s h n e l l, J u a n j u a n C h a i, A l e x C o p e l a n d, C h o n g l e Pa n

More information

Genome Assembly and De Novo RNAseq

Genome Assembly and De Novo RNAseq Genome Assembly and De Novo RNAseq BMI 7830 Kun Huang Department of Biomedical Informatics The Ohio State University Outline Problem formulation Hamiltonian path formulation Euler path and de Bruijin graph

More information

Purpose of sequence assembly

Purpose of sequence assembly Sequence Assembly Purpose of sequence assembly Reconstruct long DNA/RNA sequences from short sequence reads Genome sequencing RNA sequencing for gene discovery Amplicon sequencing But not for transcript

More information

BMI/CS 576 Fall 2015 Midterm Exam

BMI/CS 576 Fall 2015 Midterm Exam BMI/CS 576 Fall 2015 Midterm Exam Prof. Colin Dewey Tuesday, October 27th, 2015 11:00am-12:15pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.

More information

Sequence Design Problems in Discovery of Regulatory Elements

Sequence Design Problems in Discovery of Regulatory Elements Sequence Design Problems in Discovery of Regulatory Elements Yaron Orenstein, Bonnie Berger and Ron Shamir Regulatory Genomics workshop Simons Institute March 10th, 2016, Berkeley, CA Differentially methylated

More information

DNA Fragment Assembly

DNA Fragment Assembly SIGCSE 009 Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri DNA Fragment Assembly

More information

Questions? You are given the complete graph of Facebook. What questions would you ask? (What questions could we hope to answer?)

Questions? You are given the complete graph of Facebook. What questions would you ask? (What questions could we hope to answer?) P vs. NP What now? Attribution These slides were prepared for the New Jersey Governor s School course The Math Behind the Machine taught in the summer of 2011 by Grant Schoenebeck Large parts of these

More information

of Nebraska - Lincoln

of Nebraska - Lincoln University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln MAT Exam Expository Papers Math in the Middle Institute Partnership 7-2008 De Bruijn Cycles Val Adams University of Nebraska

More information

Eulerian tours. Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck. April 20, 2016

Eulerian tours. Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck.  April 20, 2016 Eulerian tours Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ April 20, 2016 Seven Bridges of Konigsberg Is there a path that crosses each

More information

A THEORETICAL ANALYSIS OF SCALABILITY OF THE PARALLEL GENOME ASSEMBLY ALGORITHMS

A THEORETICAL ANALYSIS OF SCALABILITY OF THE PARALLEL GENOME ASSEMBLY ALGORITHMS A THEORETICAL ANALYSIS OF SCALABILITY OF THE PARALLEL GENOME ASSEMBLY ALGORITHMS Munib Ahmed, Ishfaq Ahmad Department of Computer Science and Engineering, University of Texas At Arlington, Arlington, Texas

More information

6.2. Paths and Cycles

6.2. Paths and Cycles 6.2. PATHS AND CYCLES 85 6.2. Paths and Cycles 6.2.1. Paths. A path from v 0 to v n of length n is a sequence of n+1 vertices (v k ) and n edges (e k ) of the form v 0, e 1, v 1, e 2, v 2,..., e n, v n,

More information

The Value of Mate-pairs for Repeat Resolution

The Value of Mate-pairs for Repeat Resolution The Value of Mate-pairs for Repeat Resolution An Analysis on Graphs Created From Short Reads Joshua Wetzel Department of Computer Science Rutgers University Camden in conjunction with CBCB at University

More information

Reducing Genome Assembly Complexity with Optical Maps

Reducing Genome Assembly Complexity with Optical Maps Reducing Genome Assembly Complexity with Optical Maps Lee Mendelowitz LMendelo@math.umd.edu Advisor: Dr. Mihai Pop Computer Science Department Center for Bioinformatics and Computational Biology mpop@umiacs.umd.edu

More information

PATH FINDING AND GRAPH TRAVERSAL

PATH FINDING AND GRAPH TRAVERSAL GRAPH TRAVERSAL PATH FINDING AND GRAPH TRAVERSAL Path finding refers to determining the shortest path between two vertices in a graph. We discussed the Floyd Warshall algorithm previously, but you may

More information

Worksheet 28: Wednesday November 18 Euler and Topology

Worksheet 28: Wednesday November 18 Euler and Topology Worksheet 28: Wednesday November 18 Euler and Topology The Konigsberg Problem: The Foundation of Topology The Konigsberg Bridge Problem is a very famous problem solved by Euler in 1735. The process he

More information

Intermediate Math Circles Wednesday, February 22, 2017 Graph Theory III

Intermediate Math Circles Wednesday, February 22, 2017 Graph Theory III 1 Eulerian Graphs Intermediate Math Circles Wednesday, February 22, 2017 Graph Theory III Let s begin this section with a problem that you may remember from lecture 1. Consider the layout of land and water

More information

Eulerian Tours and Fleury s Algorithm

Eulerian Tours and Fleury s Algorithm Eulerian Tours and Fleury s Algorithm CSE21 Winter 2017, Day 12 (B00), Day 8 (A00) February 8, 2017 http://vlsicad.ucsd.edu/courses/cse21-w17 Vocabulary Path (or walk): describes a route from one vertex

More information

IDBA A Practical Iterative de Bruijn Graph De Novo Assembler

IDBA A Practical Iterative de Bruijn Graph De Novo Assembler IDBA A Practical Iterative de Bruijn Graph De Novo Assembler Yu Peng, Henry C.M. Leung, S.M. Yiu, and Francis Y.L. Chin Department of Computer Science, The University of Hong Kong Pokfulam Road, Hong Kong

More information

IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler

IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler Yu Peng, Henry Leung, S.M. Yiu, Francis Y.L. Chin Department of Computer Science, The University of Hong Kong Pokfulam Road, Hong Kong {ypeng,

More information

IE 102 Spring Routing Through Networks - 1

IE 102 Spring Routing Through Networks - 1 IE 102 Spring 2017 Routing Through Networks - 1 The Bridges of Koenigsberg: Euler 1735 Graph Theory began in 1735 Leonard Eüler Visited Koenigsberg People wondered whether it is possible to take a walk,

More information

BLAST & Genome assembly

BLAST & Genome assembly BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies November 17, 2012 1 Introduction Introduction 2 BLAST What is BLAST? The algorithm 3 Genome assembly De

More information

Graphs and Puzzles. Eulerian and Hamiltonian Tours.

Graphs and Puzzles. Eulerian and Hamiltonian Tours. Graphs and Puzzles. Eulerian and Hamiltonian Tours. CSE21 Winter 2017, Day 11 (B00), Day 7 (A00) February 3, 2017 http://vlsicad.ucsd.edu/courses/cse21-w17 Exam Announcements Seating Chart on Website Good

More information

Chapter 3: Paths and Cycles

Chapter 3: Paths and Cycles Chapter 3: Paths and Cycles 5 Connectivity 1. Definitions: Walk: finite sequence of edges in which any two consecutive edges are adjacent or identical. (Initial vertex, Final vertex, length) Trail: walk

More information

DNA Sequencing Error Correction using Spectral Alignment

DNA Sequencing Error Correction using Spectral Alignment DNA Sequencing Error Correction using Spectral Alignment Novaldo Caesar, Wisnu Ananta Kusuma, Sony Hartono Wijaya Department of Computer Science Faculty of Mathematics and Natural Science, Bogor Agricultural

More information

IDBA - A practical Iterative de Bruijn Graph De Novo Assembler

IDBA - A practical Iterative de Bruijn Graph De Novo Assembler IDBA - A practical Iterative de Bruijn Graph De Novo Assembler Speaker: Gabriele Capannini May 21, 2010 Introduction De Novo Assembly assembling reads together so that they form a new, previously unknown

More information

Description of a genome assembler: CABOG

Description of a genome assembler: CABOG Theo Zimmermann Description of a genome assembler: CABOG CABOG (Celera Assembler with the Best Overlap Graph) is an assembler built upon the Celera Assembler, which, at first, was designed for Sanger sequencing,

More information

Characterization of Graphs with Eulerian Circuits

Characterization of Graphs with Eulerian Circuits Eulerian Circuits 3. 73 Characterization of Graphs with Eulerian Circuits There is a simple way to determine if a graph has an Eulerian circuit. Theorems 3.. and 3..2: Let G be a pseudograph that is connected

More information

Eulerian Cycle (2A) Young Won Lim 4/26/18

Eulerian Cycle (2A) Young Won Lim 4/26/18 Eulerian Cycle (2A) Copyright (c) 2015 2018 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any

More information

Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture

Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture Dong-hyeon Park, Jon Beaumont, Trevor Mudge University of Michigan, Ann Arbor Genomics Past Weeks ~$3 billion Human Genome

More information

Path Finding in Graphs. Problem Set #2 will be posted by tonight

Path Finding in Graphs. Problem Set #2 will be posted by tonight Path Finding in Graphs Problem Set #2 will be posted by tonight 1 From Last Time Two graphs representing 5-mers from the sequence "GACGGCGGCGCACGGCGCAA" Hamiltonian Path: Eulerian Path: Each k-mer is a

More information

Week 11: Eulerian and Hamiltonian graphs; Trees. 15 and 17 November, 2017

Week 11: Eulerian and Hamiltonian graphs; Trees. 15 and 17 November, 2017 (1/22) MA284 : Discrete Mathematics Week 11: Eulerian and Hamiltonian graphs; Trees http://www.maths.nuigalway.ie/~niall/ma284/ 15 and 17 November, 2017 Hamilton s Icosian Game (Library or the Royal Irish

More information

Constrained traversal of repeats with paired sequences

Constrained traversal of repeats with paired sequences RECOMB 2011 Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq) 26-27 March 2011, Vancouver, BC, Canada; Short talk: 2011-03-27 12:10-12:30 (presentation: 15 minutes, questions: 5 minutes)

More information

Fundamental Properties of Graphs

Fundamental Properties of Graphs Chapter three In many real-life situations we need to know how robust a graph that represents a certain network is, how edges or vertices can be removed without completely destroying the overall connectivity,

More information

CS 68: BIOINFORMATICS. Prof. Sara Mathieson Swarthmore College Spring 2018

CS 68: BIOINFORMATICS. Prof. Sara Mathieson Swarthmore College Spring 2018 CS 68: BIOINFORMATICS Prof. Sara Mathieson Swarthmore College Spring 2018 Outline: Jan 31 DBG assembly in practice Velvet assembler Evaluation of assemblies (if time) Start: string alignment Candidate

More information

Next Generation Sequencing Workshop De novo genome assembly

Next Generation Sequencing Workshop De novo genome assembly Next Generation Sequencing Workshop De novo genome assembly Tristan Lefébure TNL7@cornell.edu Stanhope Lab Population Medicine & Diagnostic Sciences Cornell University April 14th 2010 De novo assembly

More information

DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats

DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats Ching Li San Jose State University

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 13. An Introduction to Graphs

Discrete Mathematics for CS Spring 2008 David Wagner Note 13. An Introduction to Graphs CS 70 Discrete Mathematics for CS Spring 2008 David Wagner Note 13 An Introduction to Graphs Formulating a simple, precise specification of a computational problem is often a prerequisite to writing a

More information

Crossing bridges. Crossing bridges Great Ideas in Theoretical Computer Science. Lecture 12: Graphs I: The Basics. Königsberg (Prussia)

Crossing bridges. Crossing bridges Great Ideas in Theoretical Computer Science. Lecture 12: Graphs I: The Basics. Königsberg (Prussia) 15-251 Great Ideas in Theoretical Computer Science Lecture 12: Graphs I: The Basics February 22nd, 2018 Crossing bridges Königsberg (Prussia) Now Kaliningrad (Russia) Is there a way to walk through the

More information

Ma/CS 6a Class 8: Eulerian Cycles

Ma/CS 6a Class 8: Eulerian Cycles Ma/CS 6a Class 8: Eulerian Cycles By Adam Sheffer The Bridges of Königsberg Can we travel the city while crossing every bridge exactly once? 1 How Graph Theory was Born Leonhard Euler 1736 Eulerian Cycle

More information

Eulerian Cycle (2A) Walk : vertices may repeat, edges may repeat (closed or open) Trail: vertices may repeat, edges cannot repeat (open)

Eulerian Cycle (2A) Walk : vertices may repeat, edges may repeat (closed or open) Trail: vertices may repeat, edges cannot repeat (open) Eulerian Cycle (2A) Walk : vertices may repeat, edges may repeat (closed or open) Trail: vertices may repeat, edges cannot repeat (open) circuit : vertices my repeat, edges cannot repeat (closed) path

More information

Euler and Hamilton paths. Jorge A. Cobb The University of Texas at Dallas

Paths and the adjacency matrix. The powers of the adjacency matrix A^r (with normal, not boolean multiplication) contain the number

Hybrid Parallel Programming

Hybrid Parallel Programming for Massive Graph Analysis Kamesh Madduri KMadduri@lbl.gov Computational Research Division Lawrence Berkeley National Laboratory SIAM Annual Meeting 2010 July 12, 2010 Hybrid

Reducing Genome Assembly Complexity with Optical Maps Mid-year Progress Report

Lee Mendelowitz LMendelo@math.umd.edu Advisor: Dr. Mihai Pop Computer Science Department Center for Bioinformatics and Computational

DELL EMC POWER EDGE R940 MAKES DE NOVO ASSEMBLY EASIER

Genome Assembly on Deep Sequencing data with SOAPdenovo2 ABSTRACT De novo assemblies are memory intensive since the assembly algorithms need to compare

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 8

CS 70. An Introduction to Graphs. Formulating a simple, precise specification of a computational problem is often a prerequisite

Chapter 14 Section 3 - Slide 1

Chapter 14 Graph Theory WHAT YOU WILL LEARN Graphs, paths and circuits The Königsberg bridge problem Euler paths and Euler circuits Hamilton

Introduction III. Graphs. Motivations I. Introduction IV

Introduction I Graphs Computer Science & Engineering 235: Discrete Mathematics Christopher M. Bourke cbourke@cse.unl.edu Graph theory was introduced in the 18th century by Leonhard Euler via the Königsberg

3 Euler Tours, Hamilton Cycles, and Their Applications

3.1 Euler Tours and Applications 3.1.1 Euler tours Carefully review the definition of (closed) walks, trails, and paths from Section 1... Definition

Darwin: A Genomic Co-processor gives up to 15,000X speedup on long read assembly (To appear in ASPLOS 2018)

Yatish Turakhia EE PhD candidate Stanford University Prof. Bill Dally (Electrical Engineering

MATH 113 Section 9.2: Topology

Prof. Jonathan Duncan Walla Walla College Winter Quarter, 2007 Outline 1 Introduction to Topology 2 Topology and Children's Drawings 3 Networks 4 Conclusion Geometric Topology

Suffix Arrays CMSC 423

Even though Suffix Trees are O(n) space, the constant hidden by the big-oh notation is somewhat big : 0 bytes / character in good implementations. If you have a 0Gb genome,

Sarah Will Math 490 December 2, 2009

Euler Circuits INTRODUCTION Euler wrote the first paper on graph theory. It was a study and proof that it was impossible to cross the seven bridges of Königsberg once

Graph Theory: Starting Out

Administrivia To read: Chapter 7, Sections 1-3 (Ensley/Crawley) Problem Set 5 sent out; due Monday 12/8 in class. There will be two review days next week (Wednesday and Friday)

De-Bruijn Sequence and Application in Graph theory.

International Journal of Progressive Sciences and Technologies (IJPSAT) ISSN: 2509-0119 Vol. 3 No. 1 June 2016, pp.04-17 2016 International Journals of Sciences and High Technologies http://ijpsat.ijsht-journals.org

Mapping Reads to Reference Genome

DNA carries genetic information DNA is a double helix of two complementary strands formed by four nucleotides (bases): Adenine, Cytosine, Guanine and Thymine Gene

Alignment of Long Sequences

BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2009 Mark Craven craven@biostat.wisc.edu Pairwise Whole Genome Alignment: Task Definition Given a pair of genomes (or other large-scale

Computational Methods for de novo Assembly of Next-Generation Genome Sequencing Data

Rayan Chikhi ENS Cachan Brittany / IRISA (Genscale team) Advisor: Dominique Lavenier INTRODUCTION, YEAR 2000

Elements of Graph Theory

Quick review of Chapters 9.1-9.5, 9.7 (studied in Mt1348/2008) = all basic concepts must be known New topics we will mostly skip shortest paths (Chapter 9.6), as that was covered

CHAPTER 10 GRAPHS AND TREES. Alessandro Artale UniBZ - artale/z

http://www.inf.unibz.it/ artale/z SECTION 10.1 Graphs: Definitions and Basic Properties Copyright Cengage Learning. All rights reserved. Graphs: Definitions

An Introduction to Graph Theory

Evelyne Smith-Roberge University of Waterloo March 22, 2017 What is a graph? Definition A graph G is: a set V(G) of objects called vertices together with: a set E(G), of

#30: Graph theory May 25, 2009

Graph theory is the study of graphs. But not the kind of graphs you are used to, like a graph of y = x^2; graph theory graphs are completely different from graphs of functions.

Reducing Genome Assembly Complexity with Optical Maps

AMSC 663 Mid-Year Progress Report 12/13/2011 Lee Mendelowitz Lmendelo@math.umd.edu Advisor: Mihai Pop mpop@umiacs.umd.edu Computer Science Department

Graph Traversals. CSC 1300 Discrete Structures Villanova University. Villanova CSC Dr Papalaskari 1

Graph traversals: Euler circuit/path Major Themes Every edge exactly once Hamilton circuit/path

Walking with Euler through Ostpreußen and RNA

Mark Muldoon February 4, 2010 Königsberg (1652) Kaliningrad (2007)? The Königsberg Bridge problem asks whether it is possible to walk around the old city in

Genome 373: Mapping Short Sequence Reads I. Doug Fowler

Two different strategies for parallel amplification BRIDGE PCR EMULSION PCR

Parameterized Edge Hamiltonicity

Joint with Valia Mitsou JST ERATO Minato Project Michael Lampis Université Paris Dauphine Kazuhisa Makino RIMS, Kyoto University Yushi Uno Osaka Prefecture University Edge-Hamiltonian

GPU based Eulerian Assembly of Genomes

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at George Mason University By Syed Faraz Mahmood Bachelor of Science

Next generation sequencing: de novo assembly. Overview

Laurent Falquet, Vital-IT Helsinki, June 4, 2010 Overview What is de novo assembly? Methods Greedy OLC de Bruijn Tools Issues File formats Paired-end vs mate-pairs

7.36/7.91 recitation. DG Lectures 5 & 6 2/26/14

Announcements project specific aims due in a little more than a week (March 7) Pset #2 due March 13, start early! Today: library complexity BWT and read