DNA Sequencing. Overview
BINF 3350, Genomics and Bioinformatics

DNA Sequencing

Young-Rae Cho
Associate Professor
Department of Computer Science
Baylor University

Overview
- Backgrounds
- Eulerian Cycles Problem
- Hamiltonian Cycles Problem
- Fragment Assembly for Genome Reconstruction
Genome Sequencing (1)
- Genome
  - The entire genetic material of an organism, including all of its genes
  - A genome can be represented as a book written in an alphabet containing only 4 letters, called nucleotides: A, T, C, and G
  - A human genome has roughly 3 billion nucleotides
  - Example (excerpt): ...ctgatgatggactacgctactactgctagctgtattacgatcagctaccacatcgtagctacgatgcattagcaagctatcgatcgatcgatcgatt...
- Genome sequencing
  - The process of determining the sequence of nucleotides that make up a genome

Genome Sequencing (2)
- Features of human genomes
  - Different people have slightly different genomes
  - All humans share 99.9% of the same genetic code
  - The 0.1% difference accounts for height, eye color, high cholesterol susceptibility, etc.
- Example: the same region in two individuals (the sequences differ at one position in each of lines 2, 4, and 5)

  Individual 1:
  CTGATGATGGACTACGCTACTACTGCTAGCTGTATTACGA
  TCAGCTACCACATCGTAGCTACGATGCATTAGCAAGCTAT
  CGATCGATCGATCGATTATCTACGATCGATCGATCGATCA
  CTATACGAGCTACTACGTACGTACGATCGCGGGACTATTA
  TCGACTACAGATAAAACATGCTAGTACAACAGTATACATA
  GCTGCGGGATACGATTAGCTAATAGCTGACGATATCCGAT

  Individual 2:
  CTGATGATGGACTACGCTACTACTGCTAGCTGTATTACGA
  TCAGCTACAACATCGTAGCTACGATGCATTAGCAAGCTAT
  CGATCGATCGATCGATTATCTACGATCGATCGATCGATCA
  CTATACGAGCTACTACGTACGTACGATCGCGTGACTATTA
  TCGACTACAGATGAAACATGCTAGTACAACAGTATACATA
  GCTGCGGGATACGATTAGCTAATAGCTGACGATATCCGAT
Types of Genome Sequencing
- Species sequencing
  - Determine the consensus genome of an entire species
  - Compare various species (e.g. human and chimpanzee) to understand how their genes function
  - Reveal evolutionary relationships between species
  - Determine the genetic makeup of our evolutionary ancestors
- Individual sequencing
  - Determine how an individual differs from its species
  - Unearth the genetic basis of many diseases
  - Forensics applications

Brief History of Genome Sequencing (1)
- Late 1970s
  - The first sequencing methods were developed independently by Walter Gilbert and Frederick Sanger
  - They shared the Nobel Prize in Chemistry in 1980
  - Still, their sequencing methods were too expensive for large genomes (at a cost of $1 per nucleotide, it would cost $3 billion to sequence the human genome)
Brief History of Genome Sequencing (2)
- 1990s
  - High-throughput methods were used for the Human Genome Project
  - The draft of the human genome was completed simultaneously by the Human Genome Project Consortium (a public project) and Celera Genomics (a private firm)
- Since then
  - Advanced next-generation sequencing techniques were introduced, with a dramatic reduction in cost
  - Many mammalian genomes have been sequenced

Problem of Genome Sequencing
- Ideal situation
  - When we read a book, we can read the entire book one letter at a time from the beginning to the end
- Real problem
  - Modern sequencing machines cannot read an entire genome one nucleotide at a time from beginning to end; they can only shred the genome and read the short pieces
  - They identify very short fragments of DNA, called reads
  - We have no idea which genomic positions these reads come from
  - We have to figure out how to put the reads back together to assemble a genome
Process of Genome Sequencing (1)
- (Step 1) Read generation
  - Experimental technique
  - Generate many reads from multiple copies of the same genome
- (Step 2) Fragment assembly
  - Computational technique
  - Use the reads to algorithmically put the genome back together

Process of Genome Sequencing (2)
- Figure: multiple (unsequenced) genome copies -> read generation -> reads -> fragment assembly -> sequenced genome (GGCATGCGTCAGAAACTATCATAGCTAGATCGTACGTAGCC)
Königsberg Bridges Problem
- The people of Königsberg, Prussia, wondered if they could walk through the city, cross each bridge exactly once, and return to where they started
Solving Königsberg Bridges Problem
- Solution by Leonhard Euler
  - In 1735, Leonhard Euler developed an approach that answers this question, even for a city with a million islands

Eulerian Cycles
- Eulerian cycle
  - A cycle that traverses each edge exactly once
- Eulerian graph
  - A graph containing an Eulerian cycle
- Figure: two example graphs, each captioned "Eulerian graph?"
Eulerian Cycles in a Directed Graph
- Balanced graph
  - indegree(v) = the number of edges leading into vertex v
  - outdegree(v) = the number of edges leading out of vertex v
  - A graph is balanced if indegree(v) = outdegree(v) for every vertex v
- Eulerian graph
  - A directed graph is Eulerian if it is connected and balanced
- Example (figure): vertices labeled with (indegree, outdegree): (1, 2), (2, 1), (0, 2), (1, 1), (1, 0), (1, 1), (2, 1)

Eulerian Cycles in an Undirected Graph
- Eulerian graph
  - An undirected graph is Eulerian if it is connected and every vertex has even degree
- Eulerian cycle vs. Eulerian path
  - Eulerian cycle: every vertex must have even degree
  - Eulerian path (that is not a cycle): exactly two vertices must have odd degree, and all others must have even degree
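The directed-graph criterion above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function names and the two small test graphs are illustrative assumptions. It treats a graph as a list of (v, w) edges and checks connectivity while ignoring edge direction, which is sufficient once the graph is balanced.

```python
from collections import defaultdict

def is_balanced(edges):
    """True if indegree(v) == outdegree(v) for every vertex v."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for v, w in edges:
        outdeg[v] += 1
        indeg[w] += 1
    vertices = set(indeg) | set(outdeg)
    return all(indeg[v] == outdeg[v] for v in vertices)

def is_connected(edges):
    """True if all vertices that touch an edge lie in one piece (direction ignored)."""
    adj = defaultdict(set)
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def is_eulerian(edges):
    """Directed graph is Eulerian if it is connected and balanced."""
    return is_balanced(edges) and is_connected(edges)

# Hypothetical test graphs (not from the slides).
triangle = [("a", "b"), ("b", "c"), ("c", "a")]      # balanced cycle
unbalanced = [("a", "b"), ("b", "c"), ("a", "c")]    # a has outdegree 2, indegree 0

print(is_eulerian(triangle))    # True
print(is_eulerian(unbalanced))  # False
```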
Icosian Game
- William Hamilton designed a game consisting of a board representing 20 islands connected by bridges
- He wanted to find a walk that visits every island exactly once and returns to where it started
Solving Icosian Game
- Solution?
  - Mathematicians still do not know an efficient way to solve this problem for an arbitrary set of islands and bridges

Hamiltonian Cycles (1)
- Hamiltonian cycle
  - A cycle that visits each vertex exactly once
- Hamiltonian graph
  - A graph containing a Hamiltonian cycle
- Figure: example graphs captioned "Hamiltonian cycle?" and "Hamiltonian path?"
Hamiltonian Cycles (2)
- Examples of Hamiltonian graphs
  - Complete graph
- Algorithms to solve the Hamiltonian Cycles Problem (HCP)
  - The exhaustive algorithm is not efficient
  - No one has found an algorithm that is fundamentally more efficient than exhaustive search (no polynomial-time algorithm is known)
  - The HCP has been classified as NP-complete
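To give a feel for why exhaustive search does not scale, here is a small Python sketch (an illustration added here, not from the slides). It assumes the simplest exhaustive strategy: fix a starting vertex and try every ordering of the remaining n - 1 vertices, which gives (n - 1)! candidate cycles in a directed graph. The function name is hypothetical.

```python
import math

def orderings_to_check(n):
    """Candidate vertex orderings for a cycle through n vertices with a fixed start."""
    return math.factorial(n - 1)

for n in (5, 10, 20):
    print(n, orderings_to_check(n))
```

For the 20 islands of the Icosian game this count is 19!, roughly 1.2 x 10^17, which is why a smarter criterion (such as the degree test for Eulerian cycles) matters so much.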
Genome Reconstruction
- Process
  - (1) Read sequencing
  - (2) Creating all possible k-mers (substrings of length k) from reads
  - (3) Assembly of k-mers
- Assumptions
  - Reads are error-free
  - Every k-mer occurring in the genome occurs exactly once
  - A genome consists of a single circular chromosome
- Example
  - Read: AGATCGAGTG
  - 3-mers: AGA GAT ATC TCG CGA GAG AGT GTG

Fragment Assembly by Hamiltonian Cycles (1)
- (Step 1) Create a vertex for every k-mer
  - ATG CGT GGC AAT GTG TGG TGC CAA GCA GCG
- (Step 2) Connect vertex v to vertex w with a directed edge if the suffix of v (its last k-1 letters) matches the prefix of w (its first k-1 letters)
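Steps 1 and 2 above can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the lecture; the function names `kmers` and `overlap_graph` are assumptions, and the graph is stored as a plain dictionary mapping each k-mer to the k-mers it points to.

```python
def kmers(read, k):
    """All substrings of length k, in order of appearance."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def overlap_graph(kmer_list):
    """Edge v -> w when the last k-1 letters of v equal the first k-1 letters of w.

    Self-loops are skipped; under the slide's assumption that every k-mer
    occurs once they play no role in a Hamiltonian cycle.
    """
    return {
        v: [w for w in kmer_list if w != v and v[1:] == w[:-1]]
        for v in kmer_list
    }

# The read from the example above.
print(kmers("AGATCGAGTG", 3))
# ['AGA', 'GAT', 'ATC', 'TCG', 'CGA', 'GAG', 'AGT', 'GTG']

# The ten 3-mers used as vertices in Step 1.
slide_kmers = ["ATG", "CGT", "GGC", "AAT", "GTG", "TGG", "TGC", "CAA", "GCA", "GCG"]
graph = overlap_graph(slide_kmers)
print(graph["ATG"])  # ['TGG', 'TGC']
```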
Fragment Assembly by Hamiltonian Cycles (2)
- (Step 2) Continued
  - Figure: the overlap graph on the vertices ATG CGT GGC AAT GTG TGG TGC CAA GCA GCG

Fragment Assembly by Hamiltonian Cycles (3)
- (Step 3) Search for a Hamiltonian cycle
  - ATG -> TGG -> GGC -> GCG -> CGT -> GTG -> TGC -> GCA -> CAA -> AAT -> ATG
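Step 3 can be illustrated with a simple backtracking search. The sketch below is an assumption about how one might code it, not the lecturer's implementation: it rebuilds the overlap graph for the ten 3-mers and extends a path one unvisited vertex at a time, undoing a choice whenever it reaches a dead end. Its worst-case running time is exponential, in line with the NP-completeness of the problem.

```python
def overlap_graph(kmer_list):
    """Edge v -> w when the (k-1)-suffix of v equals the (k-1)-prefix of w."""
    return {
        v: [w for w in kmer_list if w != v and v[1:] == w[:-1]]
        for v in kmer_list
    }

def hamiltonian_cycle(graph, start):
    """Return a cycle visiting every vertex exactly once, or None if none exists."""
    path = [start]
    visited = {start}

    def extend():
        v = path[-1]
        if len(path) == len(graph):
            # All vertices used: the cycle closes only if an edge leads back to start.
            return start in graph[v]
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                if extend():
                    return True
                visited.remove(w)   # dead end: undo and try the next neighbor
                path.pop()
        return False

    return path + [start] if extend() else None

slide_kmers = ["ATG", "CGT", "GGC", "AAT", "GTG", "TGG", "TGC", "CAA", "GCA", "GCG"]
cycle = hamiltonian_cycle(overlap_graph(slide_kmers), "ATG")
print(" ".join(cycle))
# ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT ATG
```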
Fragment Assembly by Hamiltonian Cycles (4)
- (Step 4) Construct a genome
  - Cycle: ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT ATG
  - Genome: ATGGCGTGCAATG
- Problem?

Fragment Assembly by Eulerian Cycles (1)
- (Step 1) Create a vertex for each distinct prefix and suffix of length (k-1) from the k-mers
  - k-mers: ATG, CGT, GGC, AAT, GTG, TGG, TGC, CAA, GCA, GCG
  - Vertices: GT CG GG AT TG GC CA AA
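Step 4 of the Hamiltonian approach amounts to gluing consecutive k-mers together, adding one new letter per k-mer. A minimal Python sketch follows; the function names are illustrative assumptions. Because the slides assume a circular chromosome, the second helper also drops the final k letters of the spelled string, which come from the closing repeat of the starting k-mer and its wrap-around overlap.

```python
def spell_cycle(kmer_cycle):
    """Glue consecutive k-mers that overlap by k-1 letters into one string."""
    text = kmer_cycle[0]
    for kmer in kmer_cycle[1:]:
        if text[-(len(kmer) - 1):] != kmer[:-1]:
            raise ValueError(f"{kmer} does not overlap the text so far")
        text += kmer[-1]
    return text

def circular_genome(kmer_cycle):
    """Circular genome spelled by a closed k-mer cycle (first k-mer repeated at the end)."""
    k = len(kmer_cycle[0])
    return spell_cycle(kmer_cycle)[:-k]

cycle = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA", "CAA", "AAT", "ATG"]
print(spell_cycle(cycle))      # ATGGCGTGCAATG
print(circular_genome(cycle))  # ATGGCGTGCA
```

The circular genome has exactly ten letters, one per distinct 3-mer, which matches the assumption that every k-mer occurs exactly once.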
Fragment Assembly by Eulerian Cycles (2)
- (Step 2) Connect vertex v to vertex w with a directed edge if there is a k-mer whose prefix is v and whose suffix is w
  - k-mers: ATG, CGT, GGC, AAT, GTG, TGG, TGC, CAA, GCA, GCG
  - First edges drawn: CG -> GT (CGT), AT -> TG (ATG)

Fragment Assembly by Eulerian Cycles (3)
- (Step 2) Continued: the complete De Bruijn graph
  - AT -> TG (ATG)
  - TG -> GG (TGG)
  - GG -> GC (GGC)
  - GC -> CG (GCG)
  - CG -> GT (CGT)
  - GT -> TG (GTG)
  - TG -> GC (TGC)
  - GC -> CA (GCA)
  - CA -> AA (CAA)
  - AA -> AT (AAT)
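The De Bruijn graph construction in Steps 1 and 2 can be sketched as follows. This is an illustrative Python sketch, with `de_bruijn` as an assumed function name: each k-mer contributes one edge from its (k-1)-letter prefix to its (k-1)-letter suffix, so k-mers become edges rather than vertices.

```python
from collections import defaultdict

def de_bruijn(kmer_list):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmer_list:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

slide_kmers = ["ATG", "CGT", "GGC", "AAT", "GTG", "TGG", "TGC", "CAA", "GCA", "GCG"]
graph = de_bruijn(slide_kmers)

for prefix, suffixes in graph.items():
    print(prefix, "->", ", ".join(suffixes))
```

With the ten 3-mers above this yields eight vertices and ten edges; TG and GC each have two outgoing and two incoming edges, and every other vertex has one of each, so the graph is balanced.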
Fragment Assembly by Eulerian Cycles (4)
- (Step 3) Search for an Eulerian cycle
  - Edges in the order they are traversed:
    1. AT -> TG (ATG)
    2. TG -> GG (TGG)
    3. GG -> GC (GGC)
    4. GC -> CG (GCG)
    5. CG -> GT (CGT)
    6. GT -> TG (GTG)
    7. TG -> GC (TGC)
    8. GC -> CA (GCA)
    9. CA -> AA (CAA)
    10. AA -> AT (AAT)
  - Cycle: ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT ATG

Algorithm to Search Eulerian Cycle
- Algorithm
  - (1) Start with an arbitrary node
  - (2) If there is an outgoing edge whose removal does not disconnect the graph, select that edge, move along it, and remove it
  - (3) If there is no such edge, select a remaining outgoing edge, move along it, and remove it
  - (4) Repeat (2) and (3) until no edges remain; the walk then ends at the starting node
- Runtime?
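As a hedged illustration of Step 3, the sketch below uses Hierholzer's algorithm rather than the edge-by-edge procedure listed above. That is a deliberate substitution: Hierholzer's algorithm avoids the repeated "does this edge disconnect the graph?" test and runs in time linear in the number of edges. The function names are assumptions, and the input is assumed to be a connected, balanced graph.

```python
from collections import defaultdict

def de_bruijn(kmer_list):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmer_list:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def eulerian_cycle(graph, start):
    """Hierholzer's algorithm; assumes the graph is connected and balanced."""
    remaining = {v: list(ws) for v, ws in graph.items()}  # unused outgoing edges
    stack = [start]
    cycle = []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())         # v is finished: emit it
    return cycle[::-1]

slide_kmers = ["ATG", "CGT", "GGC", "AAT", "GTG", "TGG", "TGC", "CAA", "GCA", "GCG"]
graph = de_bruijn(slide_kmers)
cycle = eulerian_cycle(graph, "AT")

# Turn the vertex cycle back into the k-mers (edges) it traverses.
edges_used = [v + w[-1] for v, w in zip(cycle, cycle[1:])]
print(" ".join(edges_used))
```

The cycle found this way uses every k-mer exactly once, but it need not visit the edges in the same order as the numbered list above, since a graph can have more than one Eulerian cycle.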
Fragment Assembly by Eulerian Cycles (5)
- (Step 4) Construct a genome
  - Cycle: ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT ATG
  - Genome: ATGGCGTGCAATG
- Problem?
  - Multiple genome candidates
  - k-mer multiplicity, e.g., ATCGATCG
- Solution?

Questions?
- Lecture slides are found on the course website, web.ecs.baylor.edu/faculty/cho/
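The k-mer multiplicity problem can be made concrete by counting how often each k-mer occurs in the example string ATCGATCG. The Python sketch below is an illustration added here; the function name is an assumption, and the string is treated as linear for simplicity.

```python
from collections import Counter

def kmer_counts(text, k):
    """Count occurrences of every k-mer in a linear string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

counts = kmer_counts("ATCGATCG", 3)
repeated = {kmer: n for kmer, n in counts.items() if n > 1}
print(repeated)  # {'ATC': 2, 'TCG': 2}
```

Because ATC and TCG each occur twice, the assumption that every k-mer occurs exactly once fails for this string, and a set of distinct 3-mers alone no longer determines the genome.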
More informationDegenerate Coding and Sequence Compacting
ESI The Erwin Schrödinger International Boltzmanngasse 9 Institute for Mathematical Physics A-1090 Wien, Austria Degenerate Coding and Sequence Compacting Maya Gorel Kirzhner V.M. Vienna, Preprint ESI
More informationReducing Genome Assembly Complexity with Optical Maps
Reducing Genome Assembly Complexity with Optical Maps Lee Mendelowitz LMendelo@math.umd.edu Advisor: Dr. Mihai Pop Computer Science Department Center for Bioinformatics and Computational Biology mpop@umiacs.umd.edu
More informationGenome Assembly and De Novo RNAseq
Genome Assembly and De Novo RNAseq BMI 7830 Kun Huang Department of Biomedical Informatics The Ohio State University Outline Problem formulation Hamiltonian path formulation Euler path and de Bruijin graph
More informationCS681: Advanced Topics in Computational Biology
CS681: Advanced Topics in Computational Biology Can Alkan EA224 calkan@cs.bilkent.edu.tr Week 7 Lectures 2-3 http://www.cs.bilkent.edu.tr/~calkan/teaching/cs681/ Genome Assembly Test genome Random shearing
More informationMohammad A. Yazdani, Ph.D. Abstract
Utilizing Euler s Approach in Solving Konigsberg Bridge Problem to Identify Similar Traversable Networks in a Dynamic Geometry Teacher Education Environment: An Instructional Activity Mohammad A. Yazdani,
More informationThe Traveling Salesman Problem
The Traveling Salesman Problem Hamilton path A path that visits each vertex of the graph once and only once. Hamilton circuit A circuit that visits each vertex of the graph once and only once (at the end,
More informationGraph Theory CS/Math231 Discrete Mathematics Spring2015
1 Graphs Definition 1 A directed graph (or digraph) G is a pair (V, E), where V is a finite set and E is a binary relation on V. The set V is called the vertex set of G, and its elements are called vertices
More informationPrecept 4: Traveling Salesman Problem, Hierarchical Clustering. Qian Zhu 2/23/2011
Precept 4: Traveling Salesman Problem, Hierarchical Clustering Qian Zhu 2/23/2011 Agenda Assignment: Traveling salesman problem Hierarchical clustering Example Comparisons with K-means TSP TSP: Given the
More informationGraph (1A) Young Won Lim 4/19/18
Graph (1A) Copyright (c) 2015 2018 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version
More informationGraphs. Reading Assignment. Mandatory: Chapter 3 Sections 3.1 & 3.2. Peeking into Computer Science. Jalal Kawash 2010
Graphs Mandatory: hapter 3 Sections 3.1 & 3.2 Reading ssignment 2 Graphs bstraction of ata 3 t the end of this section, you will be able to: 1.efine directed and undirected graphs 2.Use graphs to model
More informationarxiv: v1 [cs.dc] 31 May 2017
Extreme-Scale De Novo Genome Assembly Evangelos Georganas 1, Steven Hofmeyr 2, Rob Egan 3, Aydın Buluç 2, Leonid Oliker 2, Daniel Rokhsar 3, Katherine Yelick 2 arxiv:1705.11147v1 [cs.dc] 31 May 2017 1
More information(for more info see:
Genome assembly (for more info see: http://www.cbcb.umd.edu/research/assembly_primer.shtml) Introduction Sequencing technologies can only "read" short fragments from a genome. Reconstructing the entire
More informationSuffix Arrays CMSC 423
Suffix Arrays CMSC Suffix Arrays Even though Suffix Trees are O(n) space, the constant hidden by the big-oh notation is somewhat big : 0 bytes / character in good implementations. If you have a 0Gb genome,
More information