GATB programming day
1 GATB programming day G.Rizk, R.Chikhi Genscale, Rennes 15/06/2016 G.Rizk, R.Chikhi (Genscale, Rennes) GATB workshop 15/06/ / 41
2 GATB INTRODUCTION
NGS technologies produce terabytes of data
Efficient and fast algorithms are essential to analyze such data
>read 1
ACGACGACGTAGACGACTAGC
AAAACTACGATCGACTAT
>read 2
ACTACTACGATCGATGGTCGC
GCTGCTCGCTCTCTCGCT...
>read
TCTCCTAGCGCGGCGTATACG
CTCGCTAGCTACGTAGCT...
The Genome Assembly Tool Box (GATB)
1. Open-source software developed by GENSCALE
2. Easy way to develop efficient and fast NGS tools
3. Based on data structure with a very low memory footprint
4. Complex genomes can be processed on desktop computers
3 GATB INTRODUCTION
The GATB philosophy: a 3-layer construction to analyze NGS datasets
1. GATB-CORE: a C++ library holding all the services needed for developing software dedicated to NGS data
2. GATB-TOOLS: a set of elementary NGS tools mainly built upon the GATB library (k-mer counter, contigger, scaffolder, variant detection, etc.)
3. GATB-PIPELINE: a set of NGS pipelines that link together tools from the previous layer
Layer diagram (top to bottom): GATB-PIPELINE, GATB-TOOLS, GATB-CORE, THIRD PARTIES
4 GATB INTRODUCTION
GATB-CORE structure for NGS data: compact de Bruijn graph
- Encodes most of the information from the sequencing reads
- Graph with low memory footprint
- Complex genomes can be processed on a desktop computer
- A whole human genome handled with 6 GBytes
- The raspberry genome can be assembled on a Raspberry Pi
Figure: example de Bruijn graph with nodes AGC, CGC, GCT, GAG, CCG, TCC, CTA, CGA, AAA, GGA, TGG, TAT, ATC, ATT, TTG
5 GATB FROM READS TO DE BRUIJN GRAPH
Each read is split into words named kmers
A kmer has a fixed size K
Example for K=11
>read 1
ACGACGACGTAGTAAACTACGATCGACTAT
ACGACGACGTA kmer 1
CGACGACGTAG kmer 2
GACGACGTAGT kmer 3
ACGACGTAGTA kmer 4
...
ACGATCGACTA kmer 19
CGATCGACTAT kmer 20
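The splitting step can be sketched in plain C++ without GATB. This is only an illustration on std::string; the function name extract_kmers is hypothetical, and GATB itself works on compact encoded k-mers, not on strings. A read of length L yields L-K+1 k-mers, so the 30-nucleotide example read gives 20 k-mers for K=11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of GATB): return every substring of
// length k, in read order. A read shorter than k yields no k-mers.
std::vector<std::string> extract_kmers(const std::string& read, std::size_t k) {
    assert(k > 0);
    std::vector<std::string> kmers;
    if (read.size() < k) return kmers;
    kmers.reserve(read.size() - k + 1);
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        kmers.push_back(read.substr(i, k));   // window starting at position i
    }
    return kmers;
}
```

Calling extract_kmers("ACGACGACGTAGTAAACTACGATCGACTAT", 11) returns ACGACGACGTA first and CGATCGACTAT last.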
6 GATB FROM READS TO DE BRUIJN GRAPH
Count kmers and keep solid kmers (i.e. present at least N times)
Each solid kmer is inserted as a node of a de Bruijn graph
Nodes A and B are connected <=> suffix(A, K-1) = prefix(B, K-1)
Two reads that overlap by at least K-1 nucleotides will be connected in the de Bruijn graph
>read 1
ACGACGACGTAGTAAACTACGATCGACTAT
>read 2
CTACGATCGACTATTAGTGATGATAGATAGAT
Color legend: kmers specific to read 1, kmers common to read 1 and read 2, kmers specific to read 2
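As a rough illustration of the two rules on this slide (solidity threshold and edge condition), here is a small sketch in plain C++. It is not how GATB counts k-mers: it keeps everything in an in-memory hash map, ignores reverse complements, and the names solid_kmers and is_edge are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Count every k-mer of every read and keep those seen at least
// min_abundance times (the "solid" k-mers).
std::unordered_set<std::string> solid_kmers(const std::vector<std::string>& reads,
                                            std::size_t k, unsigned min_abundance) {
    assert(k > 0);
    std::unordered_map<std::string, unsigned> counts;
    for (const std::string& read : reads)
        for (std::size_t i = 0; i + k <= read.size(); ++i)
            ++counts[read.substr(i, k)];
    std::unordered_set<std::string> solid;
    for (const auto& entry : counts)
        if (entry.second >= min_abundance) solid.insert(entry.first);
    return solid;
}

// de Bruijn edge rule: a -> b iff the last k-1 letters of a
// equal the first k-1 letters of b.
bool is_edge(const std::string& a, const std::string& b) {
    if (a.size() != b.size() || a.size() < 2) return false;
    return a.compare(1, a.size() - 1, b, 0, b.size() - 1) == 0;
}
```

With the reads ACGTA, ACGTC and ACGTG, K=4 and a threshold of 3, only ACGT is solid: the three k-mers that end with the differing last base are each seen once and are discarded.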
7 GATB FROM READS TO DE BRUIJN GRAPH
In a nutshell
1. Split the reads into kmers
2. Count kmers and keep the solid ones
3. Insert the solid kmers into a de Bruijn graph
>read 1
ACGACGACGTAGACGACTAGCTAGCAATGCTA
GCTAGGATCAAAACTACGATCGACTAT
>read 2
ACTACTACGATCGATGGTCGAGGGCGAGCTAG
CTAGCTGACGCTGCTCGCTCTCTCGCT...
>read
TCTCCTAGCGCGGCGTATACGCGCTAAGCTAG
CTCTCGCTGCTCGCTAGCTACGTAGCT...
8 GATB FROM READS TO DE BRUIJN GRAPH
GATB-CORE only stores the nodes of the de Bruijn graph; the edges can be computed on the fly when needed.
The nodes are stored in a Bloom filter, a space-efficient structure.
=> Two reasons for a very low memory footprint
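To make the two ideas concrete (nodes in a Bloom filter, edges recomputed on demand), here is a toy sketch in plain C++. It is an assumption-laden illustration, not GATB's implementation: the BloomFilter class, its salted std::hash scheme and the successors function are all hypothetical, and k-mers are kept as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter: nhashes bit positions are set per inserted key.
class BloomFilter {
public:
    BloomFilter(std::size_t nbits, unsigned nhashes)
        : bits_(nbits, false), nhashes_(nhashes) {
        assert(nbits > 0 && nhashes > 0);
    }
    void insert(const std::string& key) {
        for (unsigned i = 0; i < nhashes_; ++i) bits_[index(key, i)] = true;
    }
    // true  => key is possibly present (false positives can occur)
    // false => key was definitely never inserted
    bool contains(const std::string& key) const {
        for (unsigned i = 0; i < nhashes_; ++i)
            if (!bits_[index(key, i)]) return false;
        return true;
    }
private:
    // The i-th hash is obtained by salting the key; an illustrative choice only.
    std::size_t index(const std::string& key, unsigned i) const {
        return std::hash<std::string>{}(key + static_cast<char>('0' + i)) % bits_.size();
    }
    std::vector<bool> bits_;
    unsigned nhashes_;
};

// Edges are not stored: try the four possible one-letter extensions
// and keep those the filter reports as nodes.
std::vector<std::string> successors(const BloomFilter& nodes, const std::string& kmer) {
    std::vector<std::string> out;
    for (char c : std::string("ACGT")) {
        std::string next = kmer.substr(1) + c;
        if (nodes.contains(next)) out.push_back(next);
    }
    return out;
}
```

Because the filter can report k-mers that were never inserted, successors may occasionally return a spurious neighbor; inserted k-mers are never missed.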
9 GATB FROM READS TO DE BRUIJN GRAPH
So, GATB-CORE transforms a set of reads into a compact de Bruijn graph
The graph is saved in HDF5 format
Such a graph can be used by tools developed with the GATB-CORE C++ library
Diagram: Reads (FASTA) -> GATB-CORE (API, Library, Binaries) -> Graph (HDF5) -> GATB-TOOLS (Tool 1, Tool 2, Tool 3)
10 GATB THE GATB-CORE API
GATB-CORE C++ library
- Fast development of efficient tools
- Easy and fast learning curve (full documentation with numerous code samples)
From the developer's point of view:
- no need to bother with the de Bruijn graph construction
- focus on your own algorithms
GATB-CORE high level API
11 GATB THE GATB-CORE API
- The system package holds all the operations related to the operating system: file management, memory management and thread management.
- The tools package offers generic operations used throughout user application code, but not specific to genomics.
- The bank package provides operations related to standard genomic sequence dataset management. Using this package lets you write algorithms independently of the input format.
- The kmer package is dedicated to fine-grained manipulation of k-mers.
- The debruijn package provides high-level functions to manipulate the de Bruijn graph data structure.
12 GATB THE GATB-CORE API
Details on the graph API (class Graph)
- Iterate all the nodes of a graph
- Iterate branching nodes of a graph
- Get neighbor nodes/edges from a node
- Get in/out degree from a node
- Depth First Search from a node
- Breadth First Search from a node
Two use cases of de Bruijn graph navigation
- Assembly
- Detection of patterns in the graph
14 GATB-core Coding session
- Basic input/output
- 15 min break
- Kmerization / graph API
- Lunch
- Mini tool: error correction
15 Documentation
Open the GATB documentation page
- How to use the library?: many code examples.
- Use the top right search field to find the doc for a class.
- Sources in the gatb-core library examples/
- General design of library, documentation of API.
16 GATB input/output
GATB can:
- read files in fasta or fastq format.
- read gzipped files directly.
- detect the format itself: no need to tell which format it is.
- read a list of files.
- write fasta/fastq files.
17 Introduction: code io1.cpp
    #include <gatb/gatb_core.hpp>

    int main (int argc, char* argv[])
    {
        // Open the bank given on the command line (format is detected)
        IBank* inbank = Bank::open (argv[1]);
        Iterator<Sequence>* it = inbank->iterator ();
        for (it->first (); !it->isDone (); it->next ())
        {
            // Shortcut
            Sequence& seq = it->item ();
            // We dump the data size and the comment
            std::cout << "[" << seq.getDataSize () << "]" << seq.getComment () << std::endl;
            // We dump the data
            std::cout << seq.toString () << std::endl;
        }
    }
18 First project
Go to gatb/devel/samples
Compile the first example with: make io1
Try with: ./io1 read.fastq
20 IO: Task 1, read a file
Open the GATB documentation page
Have a look at the IBank interface (type IBank in the search field).
Scroll down to find the methods estimateNbItems and estimateSequencesSize.
Modify the code to print some bank information. It should look like this:
    int64_t nbseq = inbank->estimateNbItems ();
    u_int64_t totalsize = inbank->estimateSequencesSize ();
    std::cout << "nbseq : " << nbseq << " totalsize " << totalsize << std::endl;
21 IO: Task 2, write a file
Create a new Bank with the BankFasta constructor:
    IBank * BankFasta (const std::string& filename, bool output_fastq = false, bool output_gz = false)
Insert sequences into this bank with the insert method.
Do not forget to flush the bank at the end with flush().
=> You built a fastq/fasta converter.
22 IO: Task 2, solution
    #include <gatb/gatb_core.hpp>

    int main (int argc, char* argv[])
    {
        IBank* inbank = Bank::open (argv[1]);
        IBank* outbank = new BankFasta ("outbank");
        Iterator<Sequence>* it = inbank->iterator ();
        for (it->first (); !it->isDone (); it->next ())
        {
            // Shortcut
            Sequence& seq = it->item ();
            // Copy the sequence into the output bank
            outbank->insert (seq);
        }
        // Make sure everything is written to disk
        outbank->flush ();
    }
23 IO: Task 2
The BankFasta constructor has options to output in fastq or gz, try this out!
The input bank can be any of fasta, fastq, fasta.gz, fastq.gz, or a list of files.
With a comma separated list:
    ./io1 read.fasta,file2.fastq,file3.fasta.gz
Or in a file of files, one filename per line:
    > cat list_of_files
    read.fasta
    file2.fasta
    > ./io1 list_of_files
In that case the iterator on IBank is the concatenation of banks.
24 IO: Task 3
This template can be used to filter/transform sequences.
Explore the Sequence API to trim 20 nt at the end of the sequence, and filter out sequences that have more than 20 N.
On a larger file, a progress bar would be nice! Go to the bank5.cpp code snippet (in the gatb doc) to get an example.
Hint: use seq.setDataRef (&seq.getData(), ...) to trim and seq.getDataBuffer() to count N's
25 IO: Task 3, solution
    #include <gatb/gatb_core.hpp>

    int main (int argc, char* argv[])
    {
        IBank* inbank = Bank::open (argv[1]);
        IBank* outbank = new BankFasta ("outbank");
        // Same iteration as before, with a progress bar
        ProgressIterator<Sequence>* it = new ProgressIterator<Sequence> (*inbank, "Iterating sequences");
        for (it->first (); !it->isDone (); it->next ())
        {
            Sequence& seq = it->item ();
            // trim 20 nt at the end
            seq.setDataRef (&seq.getData (), 0, seq.getDataSize () - 20);
            // count the N's in the trimmed sequence
            int nbn = 0;
            char* data = seq.getDataBuffer ();
            for (size_t i = 0; i < seq.getDataSize (); i++)
            {
                if (data[i] == 'N') { nbn++; }
            }
            if (nbn < 20) outbank->insert (seq);
        }
        outbank->flush ();
    }
26 Kmerization
Everything in GATB is kmer based! API for kmer extraction.
Open file kmer1.cpp
    // We declare a kmer model
    Kmer<span>::ModelDirect model (kmersize);
    // and a kmer iterator
    Kmer<span>::ModelDirect::Iterator itkmer (model);
Compile: make kmer1
Then launch: ./kmer1 ../data/read3.fastq
Observe the output.
27 Canonical kmer
Reads can come from the forward or the reverse strand, and we do not know which: each kmer may appear in two forms.
A widespread convention is to keep only one of the two: the kmer or its reverse complement.
By convention, the minimum (in lexicographic order) between a kmer and its reverse complement is the canonical kmer.
Accessible through the ModelCanonical API.
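The convention can be sketched on plain strings as follows. This is only an illustration: the function names are hypothetical, and GATB's ModelCanonical works on encoded k-mers, so its ordering may not coincide with the string ordering used here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Watson-Crick complement of one nucleotide; anything else maps to 'N'.
char complement(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Reverse the k-mer, then complement each letter.
std::string reverse_complement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complement);
    return rc;
}

// Canonical form: the smaller of the k-mer and its reverse complement
// in lexicographic (string) order.
std::string canonical(const std::string& kmer) {
    assert(!kmer.empty());
    return std::min(kmer, reverse_complement(kmer));
}
```

For example, AAAC and GTTT are reverse complements of each other, so both map to the canonical form AAAC.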
28 Kmerization
Replace ModelDirect with ModelCanonical
See doc KmerCanonical.
Try itkmer->value(), forward(), revcomp()
29 Parallelization
GATB-CORE provides a way to easily parallelize iterations.
Concept of a Dispatcher that takes an iterator as input and creates N threads; each thread receives and processes iterated items from the iterator through a functor object.
We need to modify the previous code a little bit.
30 Parallelization
31 Parallelization
32 Graph creation
33 Graph creation
From the command line:
- Create a graph with the command ./dbgh5 -in read.fasta
- Run ./dbgh5 without arguments to get the option list.
- Use dbginfo -in graph.h5 to print information.
Within code:
- Graph::create or Graph::load.
- Print graph information with: cout << graph.getInfo ();
34 Minitool: read correction
Remember: the graph nodes are the set of solid kmers, and solid kmers are assumed to be error free.
The contains method queries the graph for a specific kmer.
In this example, the graph is only used as the set of solid kmers (to test membership); edges will not be used here.
Warning: the GATB structure answers set membership queries exactly only for nodes in the graph. For other kmers the structure is probabilistic (but this is what makes it great): there might be false positives.
35 Read correction: theory
36 Read correction: in the .tar file
- data/ : example data
- solution/ : solutions
- samples/ : your working directory
GATB-core is already installed in the VM.
37 Read correction: task 1
Open minicorrec1.cpp
See the documentation for Graph
Create a Graph with
    Graph graph = Graph::create ("-in %s -abundance-min %d -kmer-size %d -debloom original", argv[1], 3, kmersize);
Iterate over the canonical kmers of the reads.
38 Read correction: task 2
Query the solidity of each kmer.
Print read profiles: a 1 for a solid kmer, a 0 for a weak kmer.
Hint: build a graph node with
    // build node from a kmer
    Node node (Node::Value (itkmer->value ()));
Query node solidity with graph.contains(node)
39 Read correction, task 2 solution
    Kmer<span>::ModelCanonical model (kmersize);
    Kmer<span>::ModelCanonical::Iterator itkmer (model);
    // We loop over sequences.
    for (it->first (); !it->isDone (); it->next ())
    {
        itkmer.setData ((*it)->getData ());
        // loop over kmers of the sequence
        for (itkmer.first (); !itkmer.isDone (); itkmer.next ())
        {
            // build node from a kmer
            Node node (Node::Value (itkmer->value ()));
            // 1 for a solid kmer, 0 for a weak one
            if (graph.contains (node)) { printf ("1"); }
            else { printf ("0"); }
        }
        // one profile line per read
        printf ("\n");
    }
40 Read correction: task 2
Sample output:
TGGCTGCCATCAGGAAGAGTTATAACAGGCATTTTATATCCTTATTTGCAGTGGTGACCCACACGAAAGATCACATACAAAGAAAAATTTGTTTATTAAC
GGCACCAATAATTTGCCCATCCACAACAACCGGTACGCCGCCTTCCAGCGACGTTAATTTCGGCGCAGTCACGAACGCGGTACGTCCGTTGTTCACCATC
CAGATCAATGTGCCGGAAGATAACGACATGGTGCTCGCAATGTATGAACGCCTGGGATATGAACACGCCGACGAGCTGAGTCTGGGTAAGCGTTTGATTG
We'll focus on simple cases: an isolated error generates a gap of k weak kmers.
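The "gap of k weak kmers" can be reproduced with a small GATB-free sketch, under simplifying assumptions: the solid set is an exact std::unordered_set of strings (no false positives, no canonical form), and the names KmerSet, kmer_set and solidity_profile are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

using KmerSet = std::unordered_set<std::string>;

// Hypothetical helper: collect all k-mers of a trusted sequence,
// standing in for the set of solid k-mers.
KmerSet kmer_set(const std::string& seq, std::size_t k) {
    assert(k > 0);
    KmerSet out;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) out.insert(seq.substr(i, k));
    return out;
}

// One character per k-mer of the read: '1' if solid, '0' if weak.
std::string solidity_profile(const std::string& read, const KmerSet& solid, std::size_t k) {
    assert(k > 0);
    std::string profile;
    for (std::size_t i = 0; i + k <= read.size(); ++i)
        profile += solid.count(read.substr(i, k)) ? '1' : '0';
    return profile;
}
```

With k=3 and the trusted sequence ACGGTCATTG, the read ACGGACATTG (one substitution at 0-based position 4) gives the profile 11000111: exactly k=3 weak kmers, the ones covering the erroneous base.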
41 Read correction: task 3
Start from solution/minicorrec2.cpp if lost :)
- Detect stretches of k non-solid kmers.
- Infer the position of the error.
- Change the nucleotide at this position, build a kmer and query it in the graph, to find which nucleotide is the correct one.
Hint:
- You can get the sequence as a char* with seq.getDataBuffer();
- Build a Kmer from a char* with model.codeSeed()
- Build a Node from a kmer with: Node (Node::Value (putative_corrected_kmer.value ()))
You may start from the easier file minicorrec2_easier.cpp
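The three steps of this task can be sketched without GATB as below. This is not the workshop solution: it assumes an exact string set of solid kmers, ignores reverse complements, handles only runs of exactly k weak kmers, and all names (KmerSet, kmer_set, is_solid, correct_isolated_errors) are hypothetical. Unlike the hint, which tests one rebuilt kmer, this sketch accepts a candidate nucleotide only if all k kmers covering the position become solid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

using KmerSet = std::unordered_set<std::string>;

// Hypothetical helper: all k-mers of a trusted sequence (the "solid" set).
KmerSet kmer_set(const std::string& seq, std::size_t k) {
    KmerSet out;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) out.insert(seq.substr(i, k));
    return out;
}

bool is_solid(const std::string& s, std::size_t pos, std::size_t k, const KmerSet& solid) {
    return solid.count(s.substr(pos, k)) > 0;
}

// Correct isolated substitutions: each shows up as a run of exactly k weak k-mers.
std::string correct_isolated_errors(std::string read, const KmerSet& solid, std::size_t k) {
    assert(k > 0);
    if (read.size() < k) return read;
    const std::size_t nk = read.size() - k + 1;   // number of k-mers in the read
    std::size_t i = 0;
    while (i < nk) {
        if (is_solid(read, i, k, solid)) { ++i; continue; }
        // Measure the run [i, j) of consecutive weak k-mers.
        std::size_t j = i;
        while (j < nk && !is_solid(read, j, k, solid)) ++j;
        if (j - i == k) {
            // The k weak k-mers share exactly one base: the last base of the first one.
            const std::size_t pos = i + k - 1;
            const char original = read[pos];
            bool fixed = false;
            for (char c : std::string("ACGT")) {
                if (c == original) continue;
                read[pos] = c;
                bool all_solid = true;
                for (std::size_t m = i; m < j; ++m)
                    if (!is_solid(read, m, k, solid)) { all_solid = false; break; }
                if (all_solid) { fixed = true; break; }
            }
            if (!fixed) read[pos] = original;      // no candidate worked: leave as is
        }
        i = j;
    }
    return read;
}
```

With k=3 and the trusted sequence ACGGTCATTG, the read ACGGACATTG is corrected back to ACGGTCATTG. Errors closer than k-1 bases to either end of the read produce shorter runs and are left untouched by this sketch.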
42 Read correction
Going further:
- correction of close SNPs, and of SNPs near the beginning or end of a read.
- validate a correction with multiple overlapping kmers.
- many algorithms / heuristics are possible.
Bloocoo (G.Benoit, G.Rizk, C.Lemaitre) > short read correction
LoRDEC (L. Salmela, E. Rivals) > hybrid PacBio correction
Introduction to Visual Basic and Visual C++ Introduction to Java Lesson 13 Overview I154-1-A A @ Peter Lo 2010 1 I154-1-A A @ Peter Lo 2010 2 Overview JDK Editions Before you can write and run the simple
More informationFinancial computing with C++
Financial Computing with C++, Lecture 6 - p1/24 Financial computing with C++ LG Gyurkó University of Oxford Michaelmas Term 2015 Financial Computing with C++, Lecture 6 - p2/24 Outline Linked lists Linked
More informationECE 2036 Lab 1: Introduction to Software Objects
ECE 2036 Lab 1: Introduction to Software Objects Assigned: Aug 24/25 2015 Due: September 1, 2015 by 11:59 PM Reading: Deitel& Deitel Chapter 2-4 Student Name: Check Off/Score Part 1: Check Off/Score Part
More informationImporting sequence assemblies from BAM and SAM files
BioNumerics Tutorial: Importing sequence assemblies from BAM and SAM files 1 Aim With the BioNumerics BAM import routine, a sequence assembly in BAM or SAM format can be imported in BioNumerics. A BAM