Genome 373: Genome Assembly. Doug Fowler
1 Genome 373: Genome Assembly Doug Fowler
2 What are some of the things we've seen we can do with HTS data?
3 We've seen that HTS can enable a wide variety of analyses, ranging from identifying variants to genome-wide biology! Li Ding et al. Hum. Mol. Genet. 2010;19:R188-R196
4 But We Always Need One Thing! After we get the HTS reads, there is a common first step for all these analyses. What is it? We've just assumed that we were given one critical piece of data. What is it?
5 But We Always Need One Thing! After we get the HTS reads, there is a common first step for all these analyses. What is it? Read mapping! We've just assumed that we were given one critical piece of data. What is it?
6 But We Always Need One Thing! After we get the HTS reads, there is a common first step for all these analyses. What is it? Read mapping! We've just assumed that we were given one critical piece of data. What is it? The reference genome!
7 Outline: De novo genome assembly introduction. State-of-the-art assembly with short reads: the De Bruijn graph. Complete course evaluations.
8 Acquiring Data How would you guys go about acquiring sequencing data for genome assembly?
9 Shotgun Sequencing Genomic DNA First, isolate genomic DNA Commins, J., Toft, C., Fares, M. A. Biol. Procedures Online (2009).
10 Shotgun Sequencing Genomic DNA Generate defined-length fragments Fragment by sonication, nuclease or transposase Commins, J., Toft, C., Fares, M. A. Biol. Procedures Online (2009).
11 Shotgun Sequencing Genomic DNA Generate defined-length fragments Sequence and assemble Commins, J., Toft, C., Fares, M. A. Biol. Procedures Online (2009).
12 Shotgun Sequencing Genomic DNA Generate defined-length fragments Sequence fragments Assemble fragment sequences Commins, J., Toft, C., Fares, M. A. Biol. Procedures Online (2009).
13 Reads are the Basic Unit of Assembly All we start with at the beginning of the assembly process is a read.
14 Reads are the Basic Unit of Assembly All we start with at the beginning of the assembly process is a read. Read length is a key parameter in de novo assembly.
15 Reads are the Basic Unit of Assembly All we start with at the beginning of the assembly process is a read. Typical read length (nt) by technology: Sanger ~1,000; HTS ~100.
16 Assembling a Genome Once we have reads from randomly sheared DNA, what is our next step?
17 Assembling a Genome Align the reads to find ones that have overlaps
18 Assembling a Genome Why is this a hard problem?
19 Assembling a Genome Because it's an all-by-all comparison of the reads. We've seen how using hashing and seeds can help.
20 Assembling a Genome What's next, now that we know which pairs of reads contain overlaps?
21 Assembling a Genome Assemble the overlapping reads into contigs
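To make the cost of the all-by-all comparison concrete, here is a minimal sketch of naive suffix-prefix overlap detection. It is an illustration only, not the hashing or seeding approach the slide mentions. The function names, the `min_len` threshold, and the second and third example reads are assumptions; only the read ATGGCGT appears in the slides.

```python
from itertools import permutations

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the longest match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def all_overlaps(reads, min_len=3):
    """Compare every ordered pair of reads: n * (n - 1) comparisons for n reads."""
    found = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n > 0:
            found[(a, b)] = n
    return found

# Hypothetical reads; ATGGCGT is the read used later in the slides.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
overlaps = all_overlaps(reads)
```

Because every ordered pair is examined, the work grows quadratically with the number of reads, which is why this step dominates for realistic read counts.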
22 Dealing with Gaps Between Contigs Contig #1 Contig #2 Gap! Now we have a problem: gaps between contigs. How can we deal with these? Hint: we have to change our experimental design
23 Paired End Reads to Connect Contigs Contig #1 Contig #2 Gap! Sequencing paired ends enables us to bridge gaps
24 Paired End Reads to Connect Contigs Contig #1 Contig #2 Gap! If we know the length between read pairs, even better!
25 Paired End Reads to Connect Contigs Contig #1 Contig #2 Gap! How would we actually do this?
26 How Would We Actually Do This? 1) Fragment genome 2) Isolate defined sizes on a gel 3) Clone into a vector (or append HTS adapters)
27 Contigs are Assembled into Scaffolds Scaffolds are large units of assembly.
28 Contigs are Assembled into Scaffolds Of course, even this strategy won't be a complete one. What regions are we likely to miss?
29 Repeat Regions are Problematic Of course, even this strategy won't be a complete one. What regions are we likely to miss?
30 Assessing Assembly Quality How should we assess the quality of our assembly?
31 Assessing Assembly Quality N50 is a simple statistic for assessing assembly quality
32 Assessing Assembly Quality N50 is defined as the length of the shortest scaffold in the smallest set of largest scaffolds that together cover 50% of the genome
33 Assessing Assembly Quality We arrange the scaffolds from biggest to smallest
34 Assessing Assembly Quality Then identify the length of the smallest scaffold needed to cover 50% of the genome (here N50 = 30)
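The N50 procedure described above can be sketched in a few lines. The scaffold lengths below are hypothetical values chosen so that the result is 30; they are not the lengths from the slide's figure. The optional `genome_size` argument is also an assumption: when it is omitted, the total assembly length stands in for the genome size.

```python
def n50(scaffold_lengths, genome_size=None):
    """Length of the smallest scaffold needed to cover 50% of the genome.

    Scaffolds are taken from largest to smallest until their cumulative
    length reaches half of `genome_size` (or half of the assembly length
    when no genome size is given).
    """
    total = genome_size if genome_size is not None else sum(scaffold_lengths)
    covered = 0
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered * 2 >= total:
            return length
    return 0  # the scaffolds never reach 50% of the stated genome size

# Hypothetical scaffold lengths (total 200): 50 + 40 + 30 = 120 >= 100.
lengths = [50, 40, 30, 20, 20, 10, 10, 10, 5, 5]
result = n50(lengths)
```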
35 Outline: De novo genome assembly introduction. State-of-the-art assembly with short reads: the De Bruijn graph. Complete course evaluations.
36 Using Graphs to Represent Assembly Imagine we acquire short reads from a small circular genome
37 Using Graphs to Represent Assembly We can represent the traditional assembly process we just talked about as a directed graph where each edge represents the best alignment between two reads
38 Using Graphs to Represent Assembly Walking the graph corresponds to assembling the genome
39 Breaking Reads into k-mers ATGGCGT In practice, we break reads into short k-mers to ensure that all k-mers in a genome are represented
40 Breaking Reads into k-mers The k = 3 k-mers for the first read in our example, ATGGCGT: ATG, TGG, GGC, GCG, CGT
41 Breaking Reads into k-mers ATG Given a k-mer we define its suffix as the string formed by all nucleotides except the first
42 Breaking Reads into k-mers ATG Given a k-mer we define its prefix as the string formed by all nucleotides except the last
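The k-mer, prefix, and suffix definitions above translate directly into string slicing. This is a small sketch; the function names are chosen here for illustration.

```python
def kmers(read: str, k: int) -> list:
    """All length-k substrings of `read`, in order of position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def prefix(kmer: str) -> str:
    """All nucleotides of the k-mer except the last."""
    return kmer[:-1]

def suffix(kmer: str) -> str:
    """All nucleotides of the k-mer except the first."""
    return kmer[1:]

read_kmers = kmers("ATGGCGT", 3)  # the first read from the slides, k = 3
```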
43 A Graph With k-mers as Nodes ATG TGG We connect one k-mer to another using a directed edge when the suffix of the first k-mer equals the prefix of the second k-mer
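As a sketch of the k-mers-as-nodes graph, the code below connects k-mer `a` to k-mer `b` whenever the suffix of `a` equals the prefix of `b`. It assumes the small circular genome is ATGGCGTGCA (the sequence spelled out at the end of the lecture) and derives the k-mers from it directly rather than from reads; the helper names are illustrative.

```python
def circular_kmers(genome: str, k: int) -> list:
    """All k-mers of a circular genome, wrapping around the end."""
    wrapped = genome + genome[:k - 1]
    return [wrapped[i:i + k] for i in range(len(genome))]

def kmer_node_graph(kmer_list):
    """Directed graph with k-mers as nodes.

    An edge a -> b exists when suffix(a) == prefix(b).
    """
    return {
        a: sorted(b for b in kmer_list if b != a and a[1:] == b[:-1])
        for a in kmer_list
    }

node_kmers = circular_kmers("ATGGCGTGCA", 3)
node_graph = kmer_node_graph(node_kmers)
```

Note that ATG has two successors (TGC and TGG), so a walk through this graph has to choose between them.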
44 Using Graphs to Represent Assembly The assembled genome can be found by visiting each node once and only once
45 Using Graphs to Represent Assembly This is equivalent to the "align all reads to each other and find the optimal assembly" problem
46 Using Graphs to Represent Assembly Also known as finding a Hamiltonian cycle, it's computationally very difficult
47 Euler and the 7 Bridges of Königsberg Euler showed that we can find a path that goes through all edges of a connected graph exactly once, provided that every vertex has equal in-degree and out-degree
48 Euler and the 7 Bridges of Königsberg It also turns out that finding an Eulerian path through all edges, if it exists, is much less computationally difficult than finding a Hamiltonian path
49 Euler and the 7 Bridges of Königsberg How can we take advantage of Euler's observation?
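The degree condition is cheap to check, which hints at why the Eulerian formulation is tractable. The sketch below tests only whether every vertex has equal in-degree and out-degree; it assumes the graph is connected and does not verify that. The function name and the edge-list representation are illustrative choices.

```python
from collections import Counter

def is_balanced(edges) -> bool:
    """True when every vertex has equal in-degree and out-degree.

    `edges` is an iterable of (source, target) pairs of a directed graph.
    Connectivity is assumed, not checked.
    """
    edges = list(edges)
    out_degree = Counter(u for u, _ in edges)
    in_degree = Counter(v for _, v in edges)
    vertices = set(out_degree) | set(in_degree)
    return all(in_degree[x] == out_degree[x] for x in vertices)

triangle = [("a", "b"), ("b", "c"), ("c", "a")]  # every vertex: in = out = 1
open_path = [("a", "b"), ("b", "c")]             # "a" has no incoming edge
```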
50 Representing k-mers as Edges ATG Recast the graph so that edges represent k-mers
51 Representing k-mers as Edges AT ATG TG A prefix and suffix are joined by an edge when they represent an observed k-mer
52 Representing k-mers as Edges AT ATG TG Finding an Eulerian path through such a graph gives us the genome assembly
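Recasting k-mers as edges can be sketched as follows: each k-mer contributes one directed edge from its prefix to its suffix. The ten 3-mers used here are those of the circular sequence ATGGCGTGCA from the lecture example; the function name and the adjacency-list representation are assumptions made for illustration.

```python
from collections import defaultdict

def de_bruijn_graph(kmer_list):
    """Map each (k-1)-mer node to the list of nodes it points to.

    Every k-mer adds one edge: prefix(kmer) -> suffix(kmer).
    """
    graph = defaultdict(list)
    for kmer in kmer_list:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

three_mers = ["ATG", "TGG", "GGC", "GCG", "CGT",
              "GTG", "TGC", "GCA", "CAA", "AAT"]
dbg = de_bruijn_graph(three_mers)
```

The result has 8 nodes and 10 edges, and the nodes TG and GC each have two outgoing edges.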
53 Finding an Eulerian Path: Hierholzer's Algorithm [Figure: graph with nodes AA, AT, TG, GG, GC, CG, GT, CA and edges AAT, ATG, TGG, GGC, GCG, CGT, GTG, TGC, GCA, CAA] Here is a graph representing the same reads we've been working with. We want to find an Eulerian path through the graph
54 Finding an Eulerian Path: Hierholzer's Algorithm [Figure: same graph] 1) Start with any node
55 Finding an Eulerian Path: Hierholzer's Algorithm [Figure: same graph] 1) Start with any node 2) Walk an arbitrary path of edges back to the start node
56 Finding an Eulerian Path: Hierholzer's Algorithm [Figure: same graph] 1) Start with any node 2) Walk an arbitrary path of edges back to the start node 3) If any node has edges not part of the current path, start another walk from that node, following unused edges and returning to the node. Append this second path to the first
57 Finding an Eulerian Path: Hierholzer's Algorithm [Figure: same graph] The algorithm is guaranteed to give an Eulerian path if one exists
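The three steps can be sketched with the common stack-based formulation of Hierholzer's algorithm, which performs the "walk, then splice in further walks" procedure implicitly: a node is emitted only once all of its outgoing edges are used. This is one standard way to write it, not necessarily how the lecture implemented it. The adjacency lists below encode the ten edges of the example graph, and the function name is illustrative.

```python
def eulerian_cycle(graph, start):
    """Return a node sequence that uses every edge exactly once.

    Assumes the graph is connected and every node has equal
    in-degree and out-degree, so an Eulerian cycle exists.
    """
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges
    stack, cycle = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            # Keep walking along an unused edge.
            stack.append(remaining[u].pop())
        else:
            # Dead end: all edges out of u are used, so u is finalized.
            cycle.append(stack.pop())
    return cycle[::-1]

graph = {
    "AA": ["AT"], "AT": ["TG"], "TG": ["GG", "GC"], "GG": ["GC"],
    "GC": ["CG", "CA"], "CG": ["GT"], "GT": ["TG"], "CA": ["AA"],
}
cycle = eulerian_cycle(graph, "AT")
```

Which of the valid cycles comes back depends on the order of the adjacency lists, so the sketch makes no claim about returning the biologically correct one.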
58 Picking the Best Eulerian Path [Figure: two copies of the graph, with edges numbered 1 to 10 in two different traversal orders] You may have noticed that we could make multiple distinct Eulerian paths for this graph, each of which would correspond to a distinct genome assembly
59 Picking the Best Eulerian Path [Figure: same two traversals] This problem arises because, when we converted reads to k-mers, we lost linkage information across reads
60 Picking the Best Eulerian Path [Figure: same two traversals] One of our reads, ATGGCGT, spans the ambiguous region
61 Picking the Best Eulerian Path [Figure: same two traversals] One of our reads, ATGGCGT, spans the ambiguous region, and we can use it to pick the right path
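A minimal sketch of using the spanning read to choose between candidate assemblies: keep only the circular sequences that contain the read. The two candidates are the sequences spelled by the two Eulerian cycles of the example graph (worked out here from the ten 3-mers, not copied from the slide figure), and the function name is an assumption.

```python
def read_fits(circular_genome: str, read: str) -> bool:
    """True when `read` occurs in the circular genome, allowing wrap-around."""
    wrapped = circular_genome + circular_genome[:len(read) - 1]
    return read in wrapped

# The two sequences spelled by the two Eulerian cycles of the example graph.
candidates = ["ATGGCGTGCA", "ATGCGTGGCA"]
read = "ATGGCGT"  # the read that spans the ambiguous region

supported = [g for g in candidates if read_fits(g, read)]
```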
62 The Eulerian Path Through the Graph Gives the Sequence [Figure: graph with edges numbered 1 to 10 along the chosen path] So, now we have the right Eulerian path through the graph
63 The Eulerian Path Through the Graph Gives the Sequence [Figure: same numbered graph] Sequence so far: A
64 The Eulerian Path Through the Graph Gives the Sequence [Figure: same numbered graph] Sequence so far: AT
65 The Eulerian Path Through the Graph Gives the Sequence [Figure: same numbered graph] Complete sequence: ATGGCGTGCA
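Reading the sequence off the path can be sketched as taking the first nucleotide of each node visited, skipping the final node because it closes the circle. The node order below is reconstructed from the read ATGGCGT and the final sequence ATGGCGTGCA; the function name is illustrative.

```python
def spell_circular(node_cycle) -> str:
    """Spell a circular genome from a closed walk over (k-1)-mer nodes.

    The last node repeats the first, so it is skipped.
    """
    return "".join(node[0] for node in node_cycle[:-1])

# Nodes in the order visited by edges 1 through 10 of the chosen path.
chosen_cycle = ["AT", "TG", "GG", "GC", "CG", "GT", "TG", "GC", "CA", "AA", "AT"]
genome = spell_circular(chosen_cycle)
```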
66 References (de novo assembly): Compeau, Pevzner and Tesler, How to apply de Bruijn graphs to genome assembly, Nature Biotechnology, 2011. Baker, De novo genome assembly: what every biologist should know, Nature Methods, 2012. Jones and Pevzner, An Introduction to Bioinformatics Algorithms, Chapter 8.
67 Outline: De novo genome assembly introduction. State-of-the-art assembly with short reads: the De Bruijn graph. Complete course evaluations: https://uw.iasystem.org/survey/142151
CS 70 Discrete Mathematics for CS Spring 2008 David Wagner Note 13 An Introduction to Graphs Formulating a simple, precise specification of a computational problem is often a prerequisite to writing a
More informationAssembly in the Clouds
Assembly in the Clouds Michael Schatz October 13, 2010 Beyond the Genome Shredded Book Reconstruction Dickens accidentally shreds the first printing of A Tale of Two Cities Text printed on 5 long spools
More informationIntroduction and tutorial for SOAPdenovo. Xiaodong Fang Department of Science and BGI May, 2012
Introduction and tutorial for SOAPdenovo Xiaodong Fang fangxd@genomics.org.cn Department of Science and Technology @ BGI May, 2012 Why de novo assembly? Genome is the genetic basis for different phenotypes
More information6 Anhang. 6.1 Transgene Su(var)3-9-Linien. P{GS.ry + hs(su(var)3-9)egfp} 1 I,II,III,IV 3 2I 3 3 I,II,III 3 4 I,II,III 2 5 I,II,III,IV 3
6.1 Transgene Su(var)3-9-n P{GS.ry + hs(su(var)3-9)egfp} 1 I,II,III,IV 3 2I 3 3 I,II,III 3 4 I,II,II 5 I,II,III,IV 3 6 7 I,II,II 8 I,II,II 10 I,II 3 P{GS.ry + UAS(Su(var)3-9)EGFP} A AII 3 B P{GS.ry + (10.5kbSu(var)3-9EGFP)}
More informationCombinatorial Pattern Matching. CS 466 Saurabh Sinha
Combinatorial Pattern Matching CS 466 Saurabh Sinha Genomic Repeats Example of repeats: ATGGTCTAGGTCCTAGTGGTC Motivation to find them: Genomic rearrangements are often associated with repeats Trace evolutionary
More informationCS681: Advanced Topics in Computational Biology
CS681: Advanced Topics in Computational Biology Can Alkan EA224 calkan@cs.bilkent.edu.tr Week 7 Lectures 2-3 http://www.cs.bilkent.edu.tr/~calkan/teaching/cs681/ Genome Assembly Test genome Random shearing
More informationAlignment of Long Sequences
Alignment of Long Sequences BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2009 Mark Craven craven@biostat.wisc.edu Pairwise Whole Genome Alignment: Task Definition Given a pair of genomes (or other large-scale
More informationTutorial. Aligning contigs manually using the Genome Finishing. Sample to Insight. February 6, 2019
Aligning contigs manually using the Genome Finishing Module February 6, 2019 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com
More informationCharacterization of Graphs with Eulerian Circuits
Eulerian Circuits 3. 73 Characterization of Graphs with Eulerian Circuits There is a simple way to determine if a graph has an Eulerian circuit. Theorems 3.. and 3..2: Let G be a pseudograph that is connected
More information11.2 Eulerian Trails
11.2 Eulerian Trails K.. onigsberg, 1736 Graph Representation A B C D Do You Remember... Definition A u v trail is a u v walk where no edge is repeated. Do You Remember... Definition A u v trail is a u
More informationFinishing Circular Assemblies. J Fass UCD Genome Center Bioinformatics Core Thursday April 16, 2015
Finishing Circular Assemblies J Fass UCD Genome Center Bioinformatics Core Thursday April 16, 2015 Assembly Strategies de Bruijn graph Velvet, ABySS earlier, basic assemblers IDBA, SPAdes later, multi-k
More informationChapter 3: Paths and Cycles
Chapter 3: Paths and Cycles 5 Connectivity 1. Definitions: Walk: finite sequence of edges in which any two consecutive edges are adjacent or identical. (Initial vertex, Final vertex, length) Trail: walk
More informationAMOS Assembly Validation and Visualization
AMOS Assembly Validation and Visualization Michael Schatz Center for Bioinformatics and Computational Biology University of Maryland April 7, 2006 Outline AMOS Introduction Getting Data into AMOS AMOS
More information7.36/7.91 recitation. DG Lectures 5 & 6 2/26/14
7.36/7.91 recitation DG Lectures 5 & 6 2/26/14 1 Announcements project specific aims due in a little more than a week (March 7) Pset #2 due March 13, start early! Today: library complexity BWT and read
More information