Using Pipeline Output Data for Whole Genome Alignment


FOR RESEARCH ONLY

Topics

Introduction
  Pipeline
  Maq
  GBrowse
  Hardware Requirements
Workflow
Preparing to Run Maq
  UNIX/Linux Environment
  Testing PERL
  Installing Maq
  Getting Reference Sequences
  Reference Genome with Multiple Chromosomes
Output File from Pipeline
  Required Pipeline Output File
  Format of Sequence.txt File
  Quality Values
Getting Consensus, Identifying SNPs and Indels
  Building Consensus
  Extracting Consensus Information
  SNP Calling
  Indel Discovery
Viewing SNPs and Indels with GBrowse
  GBrowse
  Reformatting Data
  Using GBrowse
Appendix A: Installing Maq Yourself
Appendix B: Quality Value Tables
  Illumina Symbolic ASCII Quality Values
  Sanger Symbolic ASCII Quality Values

Part # , Rev. A May 2008

This publication and its contents are proprietary to Illumina, Inc., and are intended solely for the contractual use of its customers and for no other purpose than to operate the system described herein. This publication and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed, or reproduced in any way whatsoever without the prior written consent of Illumina, Inc.

For the proper operation of this system and/or all parts thereof, the instructions in this guide must be strictly and explicitly followed by experienced personnel. All of the contents of this guide must be fully read and understood prior to operating the system or any of the parts thereof.

FAILURE TO COMPLETELY READ AND FULLY UNDERSTAND AND FOLLOW ALL OF THE CONTENTS OF THIS GUIDE PRIOR TO OPERATING THIS SYSTEM, OR PARTS THEREOF, MAY RESULT IN DAMAGE TO THE EQUIPMENT, OR PARTS THEREOF, AND INJURY TO ANY PERSONS OPERATING THE SAME.

Illumina, Inc. does not assume any liability arising out of the application or use of any products, component parts, or software described herein. Illumina, Inc. further does not convey any license under its patent, trademark, copyright, or common-law rights nor the similar rights of others. Illumina, Inc. further reserves the right to make any changes in any processes, products, or parts thereof, described herein without notice. While every effort has been made to make this guide as complete and accurate as possible as of the publication date, no warranty or fitness is implied, nor does Illumina accept any liability for damages resulting from the information contained in this guide.

Illumina, Inc. All rights reserved. Illumina, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, iScan, and GenomeStudio are registered trademarks or trademarks of Illumina. All other brands and names contained herein are the property of their respective owners.

Introduction

The Genome Analyzer can generate several Gb of data a week. Converting these huge amounts of sequence data into usable information requires fast and efficient downstream analysis. This document describes how to align Genome Analyzer Pipeline sequence data to a known genome using the Mapping and Assembly with Quality (Maq) application. Results can then be assessed by opening the output files, or imported into a GBrowse implementation to view them in their genomic context.

NOTE: This guide does not explain how to use Pipeline, and provides only limited information on the use of Maq and GBrowse. The main goal is to provide a path to efficiently use Pipeline output for whole genome alignment.

The key sections of this guide are:

Preparing to Run Maq: gives information on installing Maq.
Output File from Pipeline: describes the fields in the relevant Pipeline files and the various metrics.
Getting Consensus, Identifying SNPs and Indels: explains how to get a consensus sequence, SNPs and indels from Maq.
Viewing SNPs and Indels with GBrowse: explains how to use GBrowse to view SNPs and indels.

Pipeline

The Genome Analyzer Pipeline software is a highly customizable workflow engine capable of taking the raw image data generated by the Genome Analyzer and producing intensity scores, base calls, quality metrics, and quality scored alignments. This software is the result of extensive collaborations with many of the world's leading sequencing centers.

Maq

Maq is a third-party open source software tool that builds mapping assemblies from short reads generated by next-generation sequencing machines. Maq was developed specifically for the Genome Analyzer by Heng Li and Richard Durbin from the Sanger Institute. Maq runs on UNIX/Linux, so you will need a computer that uses Linux or UNIX as the operating system.

GBrowse

GBrowse is an open source genome viewer, generated as part of the Generic Model Organism Database project (GMOD). Many genome centers and universities have implemented GBrowse to enable you to view their genomic data.

Hardware Requirements

At minimum, you will need 1 GB of memory. This should be enough to map 2 million reads to a bacterial genome, though 4 GB is preferable. For mammalian-sized genome alignments, you will need to map many batches of about 2 million reads, and you will be better served with 16 GB of memory.

Workflow

The workflow for generating consensus, SNPs and indels is illustrated in Figure 1.

Figure 1 Workflow Generating Consensus, SNPs and Indels

Pipeline to Maq to GBrowse

Preparing to Run Maq

Before you can install Maq, you need to fulfill a number of requirements. This section lists these requirements and gives some options for meeting them.

UNIX/Linux Environment

You need to install Maq in an environment that runs UNIX or Linux (a UNIX-like operating system).

Workstation

Your best option is to run Maq on a dedicated UNIX or Linux workstation. See if you can find such a workstation in your department where you can install and run Maq. You may need to install Linux on a computer from scratch. Talk to your IT department to see what is required, and whether they can help.

Linux Distributions

If you do not have access to a workstation running UNIX/Linux and you need to install Linux, there are many different distributions of Linux available, paid or free. Good choices are Red Hat Linux (paid) and Fedora Linux (free), but others should work too. Use the documentation provided with your Linux distribution for installation.

Testing PERL

Maq uses a number of scripts that are written in the programming language Perl. Many UNIX/Linux distributions already have Perl installed, so first check whether Perl is installed in your UNIX/Linux environment:

1. Go to your UNIX/Linux environment.
2. At the command prompt, enter:
    perl -v
3. Evaluate whether you have Perl installed:
   If Perl has been installed, you will get a message stating the version of Perl, copyright and other information. Continue with the section Installing Maq.
   If Perl is not installed yet, you will get a message like this:
    perl: command not found

If Perl is not installed yet, install the most recent fully released version of Perl for Linux and your hardware configuration.

Installing Maq

When your Linux environment is set up, ask your IT department to install Maq. The download is available from maq.sourceforge.net (Figure 2). We used Maq versions and to test the application.

Figure 2 Maq Home Page (callouts: Maq User's Manual, Maq Reference Manual, Maq FAQ, Download page, Maq Wiki)

NOTE: If you have to install Maq yourself, refer to Appendix A: Installing Maq Yourself.

Getting Reference Sequences

You need to download a reference genome for the organism you sequenced, so that you can compare your reads to it. Many reference genomes are available from the NCBI website.

1. Open your browser and navigate to the NCBI website.
2. Click on the link Genomic Biology in the left navigation bar.
3. Browse to your species under Genome Projects Database in the right navigation bar.
4. Navigate to or search for the species you are looking for, and click on Project data Genomic.
5. Download the genomic files in fasta format (*.fasta, *.fa or *.fna). Download each chromosome of your organism.
6. Make sure to keep track of the exact build of the genome you are using. You can find this in the genbank file, in the Comments section.

NOTE: Another good source for reference genomes is UCSC (hgdownload.cse.ucsc.edu).

Reference Genome with Multiple Chromosomes

If you use a reference genome with multiple chromosomes, you may find it only as one fasta file per chromosome. You need to combine these fasta files into a single reference genome file; otherwise your alignment scores may be affected. Perform the following:

1. Open the command line (Terminal) in Linux.
2. Go to the directory containing the downloaded reference genome files using the cd command.
3. Enter the following:
    cat chr1.fa chr2.fa chr3.fa >ref.fa
   where:
   chr1.fa, chr2.fa and chr3.fa are the fasta input files.
   ref.fa is the fasta reference genome output file.
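The cat command above joins the files byte for byte. As an alternative, the following minimal Python sketch does the same job and adds two checks that cat does not perform: it makes sure every input line ends with a newline, and it refuses duplicate sequence names. The function name combine_fasta and the duplicate check are illustrative assumptions, not part of Maq or Pipeline.

```python
def combine_fasta(chromosome_files, output_file):
    """Concatenate per-chromosome FASTA files into one reference file.

    Hypothetical helper, not part of Maq. Returns the sequence names
    (first word of each '>' header line) in the order they were written.
    """
    names = []
    with open(output_file, "w") as out:
        for path in chromosome_files:
            with open(path) as src:
                for line in src:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # skip empty lines
                    if line.startswith(">"):
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in names:
                            raise ValueError("duplicate sequence name: " + name)
                        names.append(name)
                    # Always terminate the line, even if the input file
                    # did not end with a newline.
                    out.write(line + "\n")
    return names
```

Calling combine_fasta(["chr1.fa", "chr2.fa", "chr3.fa"], "ref.fa") would write the same ref.fa as the cat command, provided the input files are well formed.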

Output File from Pipeline

After you have called the bases in Pipeline, Pipeline saves files containing the sequence information. This section specifies which file you need from Pipeline for alignment in Maq, and explains the different elements in this file.

Required Pipeline Output File

The Pipeline output file you should use for alignment in Maq has the following naming scheme:

    s_n_r_sequence.txt (for paired-end sequence files) or
    s_n_sequence.txt (for single-read sequence files)

where:
n stands for the lane.
r stands for the read, in case of paired-end sequencing.

For example, the file s_3_2_sequence.txt contains information from read 2 of lane 3.

Format of Sequence.txt File

The s_n_r_sequence.txt file contains sequence and quality information for one read from one sequencing lane. The files are in FASTQ format. An example of an entry for one read is shown below:

    @SLXA-B3_604:2:1:512:767/1
    GCCTAACCTTTCTGAACCTCATGCGGAAAAACTGTTT
    +SLXA-B3_604:2:1:512:767/1
    ccccccccccccchkhcchcu`]`lpvrtinksnlaa

Every entry contains the following lines:

Read Identifier: The line @SLXA-B3_604:2:1:512:767/1 contains the read identifier, which has the following elements:

    Description                                Element
    Abbreviated run name                       SLXA-B3_604
    Lane                                       2
    Tile                                       1
    Coordinates of the cluster on the tile     512,767
    Indicates the read of a paired-end run     /1

The read identifier line starts with an '@', which indicates that this line is followed by a sequence line.

Sequence: The line GCCTAACCTTTCTGAACCTCATGCGGAAAAACTGTTT contains the called sequence for this entry.

Read Identifier: The line +SLXA-B3_604:2:1:512:767/1 contains the same read identifier as above, but this time the line starts with a '+', which indicates that it is followed by a quality score line.

Quality scores: The line ccccccccccccchkhcchcu`]`lpvrtinksnlaa contains the quality scores for this entry. Every base call in an entry has a corresponding quality score, i.e., the nth position in the quality scores line corresponds to the nth nucleotide in the sequence line.

Quality Values

The quality scores are in Illumina symbolic ASCII format, according to the following formula:

    Quality value = (ASCII character code) - 64

The values of the characters in the Illumina symbolic ASCII format are listed in Appendix B, section Illumina Symbolic ASCII Quality Values. For a single base call, a Q value of 30 is great, Q20 is a good score, and Q10 is still usable.

Difference of Illumina and Phred Scoring Scheme

The Illumina quality scoring scheme and the Phred quality scoring scheme are different:

    Illumina: Q = 10 x log10((1 - e) / e)
    Phred:    Q = -10 x log10(e)

where e is the error probability. The two definitions round to the same value from approximately Q15 and above; however, Illumina scores can go as low as -5.

Difference of Illumina and Sanger FASTQ

The Sanger FASTQ format, which is used by Maq, differs slightly from the Illumina FASTQ format. The main difference is that the quality of the base calls is scored using different scales (Illumina versus Phred quality scores). Maq comes with tools to convert Illumina FASTQ (also often called Solexa FASTQ) to Sanger FASTQ; see Preparing to Run Maq and the Maq documentation for more information.
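The entry format and the two quality scales described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the offsets 64 and 33 come from the formulas and tables in this guide, and the conversion simply solves the Illumina formula for e and inserts it into the Phred formula. It is not claimed to be identical to what maq sol2sanger does internally.

```python
import math

ILLUMINA_OFFSET = 64  # Illumina symbolic ASCII: Q = character code - 64
SANGER_OFFSET = 33    # Sanger symbolic ASCII:   Q = character code - 33


def parse_read_id(line):
    """Split an identifier line such as '@SLXA-B3_604:2:1:512:767/1'."""
    body = line.strip().lstrip("@+")
    name, _, read = body.partition("/")
    run, lane, tile, x, y = name.split(":")
    return {
        "run": run,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read) if read else None,  # None for single-read runs
    }


def illumina_scores(quality_line):
    """Decode an Illumina symbolic ASCII quality line into integers."""
    return [ord(char) - ILLUMINA_OFFSET for char in quality_line]


def illumina_to_phred(q_illumina):
    """Convert Q = 10*log10((1-e)/e) into Q = -10*log10(e)."""
    # From the Illumina formula: (1-e)/e = 10^(Q/10), so e = 1/(1 + 10^(Q/10)).
    error = 1.0 / (1.0 + 10 ** (q_illumina / 10.0))
    return -10.0 * math.log10(error)


def to_sanger_line(quality_line):
    """Re-encode an Illumina quality line in Sanger symbolic ASCII."""
    return "".join(
        chr(round(illumina_to_phred(q)) + SANGER_OFFSET)
        for q in illumina_scores(quality_line)
    )
```

For high scores the two scales agree after rounding (an Illumina Q of 40 stays 40), while low scores shift: an Illumina Q of 0 corresponds to a Phred Q of about 3.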

Getting Consensus, Identifying SNPs and Indels

Maq aligns your sequence reads to a reference sequence, builds a consensus, and calls single nucleotide polymorphisms (SNPs). It can also identify insertions/deletions (indels) if you have performed paired-end sequencing. This section briefly explains how to perform these actions, and which output files you get when you call SNPs and identify indels.

Much of this information has been summarized from the Maq user's manual and the Maq reference manual, available at maq.sourceforge.net (see Figure 2). For more detailed instructions and comprehensive descriptions of the commands in Maq, see these documents; additional information is available in the FAQ section and in the Maq Wiki.

Generating Analysis Folder

You need to generate a folder in which you run the analysis. Copy the following files to this folder:

Read files (Illumina FASTQ format).
Reference sequence file (FASTA format).

All output files that Maq generates will be stored in this folder (unless you specifically direct Maq to another folder).

Building Consensus

The first thing you need to do is align the reads to the reference and build a consensus. This is described in this section.

NOTE: For small sequencing projects (1 lane of sequence data from a prokaryote), many of these steps can be combined as a batch using the easyrun command. See the Maq user's manual for information.

Converting Illumina FASTQ to Sanger FASTQ

As described in Quality Values, the FASTQ format used by Maq is different from the Illumina FASTQ format. To use Maq, you first need to convert the format of all read files by entering:

    maq sol2sanger s_n_r_sequence.txt s_n_r_sequence.fastq

where:
s_n_r_sequence.txt is the Illumina read sequence file.
s_n_r_sequence.fastq is the output file in Sanger FASTQ.

Converting Sanger FASTQ to BFQ

Next you need to convert Sanger FASTQ to binary FASTQ (bfq) for all read files by entering:

    maq fastq2bfq s_n_r_sequence.fastq s_n_r_sequence.bfq

where:
s_n_r_sequence.fastq is the Sanger FASTQ read sequence file.
s_n_r_sequence.bfq is the output file in binary FASTQ.
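When a run has several lanes, the two conversion commands above have to be repeated for every read file. The following Python sketch only builds the command lines, using the file naming scheme from this guide; the function names are hypothetical, and run_commands assumes that the maq executable is on your PATH.

```python
import subprocess


def conversion_commands(lane, paired=True):
    """Build the sol2sanger and fastq2bfq commands for one lane.

    Hypothetical helper: file names follow the s_n_r_sequence.txt
    (paired-end) and s_n_sequence.txt (single-read) naming scheme.
    """
    if paired:
        stems = ["s_%d_%d_sequence" % (lane, read) for read in (1, 2)]
    else:
        stems = ["s_%d_sequence" % lane]
    commands = []
    for stem in stems:
        # Illumina FASTQ -> Sanger FASTQ
        commands.append(["maq", "sol2sanger", stem + ".txt", stem + ".fastq"])
        # Sanger FASTQ -> binary FASTQ
        commands.append(["maq", "fastq2bfq", stem + ".fastq", stem + ".bfq"])
    return commands


def run_commands(commands):
    """Run the commands in order; stops at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, run_commands(conversion_commands(3)) would convert both read files of lane 3, assuming they are in the current analysis folder.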

Converting Reference FASTA to BFA

Next you need to convert FASTA to binary FASTA (bfa) for the reference sequence by entering:

    maq fasta2bfa ref.fasta ref.bfa

where:
ref.fasta is the FASTA reference sequence file.
ref.bfa is the output reference file in binary FASTA.

Aligning Reads to Reference

For single-read sequencing, you align the reads from one file to the reference sequence by entering:

    maq map s_n_sequence.map ref.bfa s_n_sequence.bfq

For paired-end sequencing, you align the reads from two matching paired-end files to the reference sequence by entering:

    maq map s_n_sequence.map ref.bfa s_n_1_sequence.bfq s_n_2_sequence.bfq

where:
s_n_sequence.map is the mapped alignment output file.
ref.bfa is the reference file in binary FASTA.
s_n_sequence.bfq is the single-read file in binary FASTQ.
s_n_1_sequence.bfq is the paired-end first read file in binary FASTQ.
s_n_2_sequence.bfq is the paired-end second read file in binary FASTQ.

NOTE: When you align paired-end reads, you will get a message that indicates the success of the pairing:

    (total, ispe, mapped, paired) = ( , 1, , 6142)

The number of mapped reads should be close to the number of paired reads. If the number of paired reads is very low (6142 in the example above), and you have done long distance paired-end reads, you need to specify the maximum fragment length (which should be slightly longer than the average paired-end fragment length). For example, for paired-end reads from 500 bp fragments, set a maximum fragment length of 550 bp by adding the argument -a 550, i.e. enter the following:

    maq map -a 550 s_n_sequence.map ref.bfa s_n_1_sequence.bfq s_n_2_sequence.bfq

Merging Map Files

Maq works best with 1 to 3 million reads as input when aligning reads to the reference sequence. If you have a big sequencing project with multiple lanes, you should perform the alignment per lane first, and then combine the map files using mapmerge. So if you used multiple lanes to sequence the same sample, you can combine the mapped alignments now by entering:

    maq mapmerge s_123_sequence.map s_1_sequence.map s_2_sequence.map s_3_sequence.map

where:
s_123_sequence.map is the combined mapped alignment output file for lanes 1, 2, and 3.
s_n_sequence.map is the mapped alignment file for lane n.

Building Consensus

Now you can assemble the consensus from the (merged) map files:

    maq assemble s123.cns ref.bfa s_123_sequence.map

where:
s123.cns is the consensus output file.
ref.bfa is the reference file in binary FASTA.
s_123_sequence.map is the merged mapped alignment file.

Extracting Consensus Information

Once you have built the consensus, you can extract the new consensus sequence in FASTA format, or in FASTQ format (containing Sanger quality scores).

Extracting Consensus in FASTA Format

To extract the consensus in FASTA format, enter the following:

    maq cns2ref s123.cns >s123.cns.fasta

where:
s123.cns is the consensus file.
s123.cns.fasta is the output consensus file in FASTA.

Extracting Consensus in FASTQ Format

To extract the consensus in Sanger FASTQ format, enter the following:

    maq cns2fq s123.cns >s123.cns.fastq

where:
s123.cns is the consensus file.
s123.cns.fastq is the output consensus file in FASTQ.

The files are saved in the Sanger FASTQ format (see Quality Values for the differences with the Illumina quality scheme). The quality scores are in Sanger symbolic ASCII format, according to the following formula:

    Quality value = (ASCII character code) - 33

The values of the characters in the Sanger symbolic ASCII format are listed in Appendix B, section Sanger Symbolic ASCII Quality Values.

SNP Calling

Extracting SNP Calls

Once you have built the consensus, extract SNPs the following way:

    maq cns2snp s123.cns >s123.snp

where:
s123.cns is the consensus file.
s123.snp is the tab-delimited output SNP file.

SNP File

To view the SNP calls, open the SNP file in Excel (Figure 3).

Figure 3 SNP File Opened in Excel (callouts: Chromosome/Reference, Position, Reference Base, Consensus Base, Consensus Quality, Read Depth, Average # Hits, Highest Mapping Quality, Quality Difference)

The columns contain the following information:

Column A, Chromosome / Reference: Chromosome or reference sequence.
Column B, Position: Position of the SNP on the reference sequence.
Column C, Reference Base: The base as present in the reference sequence.
Column D, Consensus Base: The base called in the consensus of your sequencing reads.
Column E, Consensus Quality: The quality of the base called in the consensus. This is the Sanger quality, which is different from the Illumina quality scores (see Difference of Illumina and Phred Scoring Scheme).
Column F, Read Depth: The number of reads covering the position.
Column G, Average # Hits: The average number of hits of reads covering this position, which roughly equals the copy number of the flanking region in the reference genome.

Column H, Highest Mapping Quality: The highest mapping quality of the reads covering the position.
Column I, Quality Difference: The quality difference between the strong allele and the weak allele. If the quality difference is close to the highest mapping quality, you may be looking at a read error.

For the consensus bases, heterozygotes are designated using IUB codes:

    IUB code    Bases
    A           A
    C           C
    G           G
    T           T
    M           A/C
    K           G/T
    Y           C/T
    R           A/G
    W           A/T
    S           G/C
    D           A/G/T
    B           C/G/T
    H           A/C/T
    V           A/C/G
    N           A/C/G/T

Improving SNP Quality

In addition, the following commands are useful for filtering SNP calls:

SNPfilter. SNPfilter removes SNPs that are covered by just one read, fall in a repetitive region, or fall in a 10 bp region with at least 3 SNPs. Enter the following:

    perl maq.pl SNPfilter s123.snp >s123.filtered.snp

where:
s123.snp is the input SNP file.

s123.filtered.snp is the tab-delimited output filtered SNP file.

rmdup. Rmdup removes pairs with identical ends, which could have been caused by PCR during sample prep. Removing duplicates may improve SNP calling accuracy. This filter needs to be applied before the consensus is assembled (see Building Consensus); use it as follows:

    maq rmdup s_123_rmdup.map s_123_sequence.map

where:
s_123_rmdup.map is the output filtered mapped alignment file.
s_123_sequence.map is the input mapped alignment file.

Indel Discovery

Extracting Indels

Once you have built the consensus, you can extract the indels the following way:

    maq indelpe ref.bfa s_123_sequence.map >s_123_sequence.indelpe

where:
ref.bfa is the reference file in binary FASTA.
s_123_sequence.map is the merged mapped alignment file.
s_123_sequence.indelpe is the tab-delimited output indel file.

NOTE: You can only find indels using Maq with paired-end data.

Indel File

To view the indels found, open the indel file in Excel (Figure 4).

Figure 4 Indel File Opened in Excel (callouts: Chromosome/Reference, Position, Indel Type, # Ref Reads, Indel Size, Forward Reads, Reverse Reads)

The columns contain the following information:

Column A, Chromosome / Reference: Chromosome or reference sequence.
Column B, Start Position: Start position of the indel on the reference sequence.
Column C, Indel Type:
    * indicates the indel is confirmed by reads from both strands.
    + means the indel is hit by at least two reads, but from the same strand.
    - shows the indel is only found on one read.
    . means the indel is too close to another indel and is filtered out.
Column D, # Ref Reads: The number of reads across the indel.
Column E, Indel Size: Size of the indel.
Column F, Forward Reads: Number of reads on the forward strand confirming the consensus.
Column G, Reverse Reads: Number of reads on the reverse strand confirming the consensus.

NOTE: If you want to concentrate on the most promising indels, filter the indel file in Excel for * in the Indel Type field.
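The filter described in the note above can also be applied outside Excel. The following Python sketch is a hedged illustration: it assumes the indel file is tab-delimited and has exactly the seven columns, in the order listed in this section, and the column keys and function names are hypothetical.

```python
import csv

# Column keys are illustrative; the order follows columns A to G above.
INDEL_COLUMNS = [
    "reference", "start", "indel_type", "ref_reads",
    "indel_size", "forward_reads", "reverse_reads",
]


def read_indels(handle):
    """Read tab-delimited indel rows from an open text file handle."""
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(INDEL_COLUMNS, row)) for row in reader if row]


def confirmed_on_both_strands(indels):
    """Keep only indels whose Indel Type is '*'."""
    return [indel for indel in indels if indel["indel_type"] == "*"]
```

A typical use would be to open s_123_sequence.indelpe, pass the file handle to read_indels, and write the rows returned by confirmed_on_both_strands to a new file.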

Viewing SNPs and Indels with GBrowse

Once you have files with SNPs and indels, you may want to view them in a genomic context. Many genome centers have implemented GBrowse, an open source genome viewer. This section helps you view your results in a GBrowse viewer. You will need to perform the following steps:

1. Find a GBrowse implementation for the organism and build you are interested in.
2. Convert your SNP or indel data to the proper file format.
3. Upload the file to GBrowse.

Now you are ready to look at your SNPs and indels as annotations in a genomic context.

GBrowse

GBrowse is an open source genome viewer, generated as part of the Generic Model Organism Database project (GMOD). Many genome centers and universities have implemented GBrowse to enable you to view their genomic data.

Finding Suitable GBrowse Implementation

Lists of implementations are available on the web. Browse through these lists and see if there is a GBrowse implementation for the organism and build you are interested in. These lists are not comprehensive; if you can't find one you can use, try entering GBrowse and your particular build in Google, and see if you can find an appropriate implementation that way.

Alternative Solutions

If no suitable implementation of GBrowse exists, you can do one of two things:

Redo your alignments with a build that is supported in a GBrowse implementation.
Install GBrowse locally. This is possible, but requires more work and skill. See the GBrowse documentation for instructions.

Reformatting Data

The SNP and indel files do not have a format that GBrowse recognizes. Fortunately, they are usually not very big and can be handled in Excel, so you do not need a Perl script to change the format. This section explains how to reformat your SNP or indel data.

Annotation File Format

GBrowse can read a number of different file formats. Here we explain the annotation file format that works well with our data (Figure 5).

Figure 5 GBrowse File

The annotation file is a text file, and has to start with the following line:

    reference=landmark name

The reference line has the following properties:

The line starts with reference= (in lowercase).
The line refers to the chromosome (reference=chr1) or the accession number of the organism (reference=nc_000913). No spaces are allowed.
The reference applies to all entries below it, until a new reference is found. Multiple reference lines are allowed.

The reference line is followed by data lines, which have the following fields:

Column A, Feature Type: In our case SNP or INDEL.
Column B, Feature Name: A unique name for each entry.
Column C, Feature Position: One or more ranges, each given as start-stop (the reformatting steps in this guide build this field as position-position).
Column D, Description (optional): A description that will be displayed in the viewer.
Column E, URL (optional): If you have a hyperlink, provide it here.

NOTE: Do not use spaces, unless you put quotation marks around the field entry.
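For larger SNP files, the annotation lines described above can be generated with a short script instead of a spreadsheet. The following Python sketch is an assumption-laden illustration: it takes SNP rows with the first five columns of the SNP file described in this guide (reference, position, reference base, consensus base, consensus quality), builds the position and description fields the same way as the spreadsheet formulas in the section Reformatting SNP Files, and separates the fields with tabs. The function name is hypothetical.

```python
def snp_annotation_lines(snp_rows):
    """Build annotation file lines from SNP rows.

    Each row is a sequence whose first five items are: reference,
    position, reference base, consensus base, consensus quality.
    A reference= line is written whenever the reference changes.
    """
    lines = []
    current_reference = None
    for number, row in enumerate(snp_rows, start=1):
        reference, position, ref_base, cons_base, quality = row[:5]
        if reference != current_reference:
            lines.append("reference=%s" % reference)
            current_reference = reference
        # Same layout as =CONCATENATE(C1,">",D1,",Q",E1,",",B1)
        description = "%s>%s,Q%s,%s" % (ref_base, cons_base, quality, position)
        # Same layout as =CONCATENATE(B1,"-",B1)
        feature_position = "%s-%s" % (position, position)
        lines.append("\t".join(
            ["SNP", "SNP%d" % number, feature_position, description]))
    return lines
```

The rows can be obtained by splitting each line of the SNP file on tabs; writing the returned lines to a text file, one per line, gives a file with the layout described in this section.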

Reformatting SNP Files

To reformat the SNP file, perform the following steps:

1. Open the SNP file in Excel.
2. To get a unique SNP name, enter SNP1 in the top field of the empty column J.
3. You need a range of nucleotides for the feature position field. In the top field of the empty column K, enter:
    =CONCATENATE(B1,"-",B1)
4. You need one field with an informative description for every SNP. In the top field of the empty column L, enter:
    =CONCATENATE(C1,">",D1,",Q",E1,",",B1)
   The SNP description will consist of the following information:
    reference base>consensus base,quality score,position
5. To copy all formulas and calculate values for every entry:
   a. Select fields J1, K1, and L1.
   b. Drag down the selected fields by the bottom right corner (Figure 6).

   Figure 6 Drag Down Bottom Right Corner (callouts: Select Bottom Right Corner, Drag Down to Last Entry)

   The values in columns K and L should automatically recalculate, and column J should be filled with unique names (SNP1, SNP2, and so on).
6. Save the file in Excel format (*.xls).
7. Open a new book. This will be the annotation file.
8. Copy the values from columns J, K and L of the modified SNP file to columns B, C and D of the annotation file (paste values only).
9. Enter SNP in the top field of the empty column A of the annotation file. Copy SNP all the way down to the last data line.
10. Select the first row and insert an empty line by pressing Ctrl, Shift and + together.
11. Enter the reference line in field A1, for example reference=chr1 or reference=nc_

NOTE: You can refer to multiple chromosomes per file; just insert a reference line with the new chromosome above the data line where the next chromosome starts. The reference applies to all entries below it, until a new reference is found.

The SNP annotation file should look like this (Figure 7):

Figure 7 SNP Annotation File

12. Save the SNP annotation file as a text (tab delimited) file (*.txt).

Reformatting Indel Files

To reformat the indel file, perform the following steps:

1. Open the indel file in Excel.

NOTE: If you want to concentrate on the most promising indels, filter the indel file in Excel for * in the Indel Type field (column C), and copy all the promising indels to a new book.

2. To get a unique indel name, enter INDEL1 in the top field of the empty column H.
3. You need a range of nucleotides for the feature position field. In the top field of the empty column I, enter:
    =CONCATENATE(B1,"-",B1)
4. You need one field with an informative description for every indel. In the top field of the empty column J, enter:
    =CONCATENATE(C1,",",E1,",f",F1,",r",G1)
   The indel description will consist of the following information:
    indel type,indel size,f forward reads,r reverse reads
5. To copy all formulas and calculate values for every entry:
   a. Select fields H1, I1, and J1.
   b. Drag down the selected fields by the bottom right corner (Figure 6).
   The values in columns I and J should automatically recalculate, and column H should be filled with unique names (INDEL1, INDEL2, and so on).

6. Save the file in Excel format (*.xls).
7. Open a new book. This will be the annotation file.
8. Copy the values from columns H, I, and J of the modified indel file to columns B, C and D of the annotation file (paste values only).
9. Enter INDEL in the top field of the empty column A of the annotation file. Copy INDEL all the way down to the last data line.
10. Select the first row and insert an empty line by pressing Ctrl, Shift and + together.
11. Enter the reference line in field A1, for example reference=chr1 or reference=nc_

NOTE: You can refer to multiple chromosomes per file; just insert a reference line with the new chromosome above the data line where the next chromosome starts. The reference applies to all entries below it, until a new reference is found.

The indel annotation file should look like this (Figure 8):

Figure 8 Indel Annotation File

12. Save the indel annotation file as a text (tab delimited) file (*.txt).

Using GBrowse

When you have generated your annotation file and found a suitable GBrowse implementation, you can start viewing your indels or SNPs in a genomic context. For comprehensive GBrowse help, FAQs and a tutorial, see the GBrowse documentation.

Upload the Annotation File

1. Navigate your web browser to the website running GBrowse.
2. Scroll down to the bottom of the page, where you can upload your own annotations (Figure 9). Different GBrowse implementations may look slightly different.

Figure 9 Upload Annotation File (callouts: Browse to File, Upload File)

3. Click Browse, go to the annotation file, select the file, and click Open.
4. Click Upload.

Viewing SNPs and Indels

Once your annotation file is uploaded, you will see the file appear with the separate features (Figure 10).

Figure 10 Uploaded Annotation File (callouts: Annotation Check Box, Uploaded Annotation File, Edit File, Clickable Features)

Make sure the annotation check box is selected. You can now edit the uploaded annotation file, or click on the separate features (SNPs or indels). This will display the feature in the viewer panel (Figure 11 and Figure 12).

Figure 11 Your Favorite SNP in the GBrowse Viewer (callouts: Zoom and Browse Area, Gene Information, Published SNPs, Your Favorite SNP)

Figure 12 Your Favorite Indels in the GBrowse Viewer (callouts: Zoom and Browse Area, Gene Information, Your Favorite Indels)

Appendix A: Installing Maq Yourself

If you decide to install Maq yourself, do the following:

1. Open your browser in Linux and navigate to maq.sourceforge.net.
2. Click on the link download page (see Figure 2).
3. Click on the link Download for the most recent version of Maq.
4. Click on the package for your Linux and hardware configuration. If you are not sure which one is best, choose platform independent.
5. Click Save to download the package.
6. Repeat steps 3 to 5 for Maqview and Maq-Data.
7. Open the command line (Terminal).
8. Go to the directory containing the downloaded files using the cd command. The exact location depends on how your Linux is set up.
9. To unzip the packages, type the following in the command line:
    bunzip2 *.bz2
10. List the directory contents by using the ls command.
11. To extract the files from the archive, type the following for every *.tar file in the directory:
    tar xvf name.tar
    You should get three new directories (check by using the ls command).
12. Go to the directory containing the Maq files:
    cd maq-x.x.x
13. Install the package by entering the following three commands in succession:
    ./configure
    make
    make install
14. If you get a message that access is denied to the default install directory, you need to specify a directory that you do have access to. Enter the following two commands (with /home/share/yourfolder your accessible directory):
    ./configure --prefix=/home/share/yourfolder
    make install
15. Go one directory up:
    cd ..
16. Test whether Maq is working by entering:
    maq
    You should get a message explaining Maq usage. If the command maq is not recognized, try the second method described in the Maq User Manual, or ask a Linux expert for help.
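Before starting an analysis, it can be convenient to check in one step whether the required programs can be found. The following Python sketch uses the standard library function shutil.which to look up executables on the PATH; the function name missing_tools and the default tool list (perl and maq) are assumptions made for this illustration.

```python
import shutil


def missing_tools(tools=("perl", "maq")):
    """Return the tools from the list that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list returned by missing_tools() suggests that both perl and maq can be started from the command line; a tool that appears in the list is either not installed or installed in a directory that is not on your PATH (for example, a custom --prefix directory).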

Appendix B: Quality Value Tables

Illumina Symbolic ASCII Quality Values

The quality values of the characters in the Illumina symbolic ASCII quality values are listed in the table below:

Table 1 Quality Value of Characters in the Illumina Symbolic ASCII Format

(Char = character code, Qual = quality value)

Char  Qual    Char  Qual    Char  Qual    Char  Qual    Char  Qual    Char  Qual
;      -5     C       3     K      11     S      19     [      27     c      35
<      -4     D       4     L      12     T      20     \      28     d      36
=      -3     E       5     M      13     U      21     ]      29     e      37
>      -2     F       6     N      14     V      22     ^      30     f      38
?      -1     G       7     O      15     W      23     _      31     g      39
@       0     H       8     P      16     X      24     `      32     h      40
A       1     I       9     Q      17     Y      25     a      33
B       2     J      10     R      18     Z      26     b      34
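The entries in Table 1 follow a fixed pattern: each quality value equals the character's ASCII code minus 64 (for example, ';' is code 59 and maps to -5, and 'h' is code 104 and maps to 40). The following is a minimal Python sketch of that lookup, assuming the offset of 64 holds for every character in the table. The function name and the sample string are illustrative only and are not part of the Pipeline or Maq.

```python
# Offset inferred from Table 1 (assumption): 'A' (ASCII 65) maps to quality value 1.
ILLUMINA_OFFSET = 64


def illumina_quality_values(quality_string):
    """Return the Table 1 quality value for each character of the string."""
    values = []
    for ch in quality_string:
        q = ord(ch) - ILLUMINA_OFFSET
        # Table 1 only covers values from -5 (';') to 40 ('h').
        if not -5 <= q <= 40:
            raise ValueError(f"character {ch!r} is outside the range of Table 1")
        values.append(q)
    return values


print(illumina_quality_values(";@Ah"))  # [-5, 0, 1, 40]
```

Rejecting characters outside the table range is a design choice of this sketch; it makes it easier to notice a quality string that is actually in the Sanger format.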

Sanger Symbolic ASCII Quality Values

The quality values of the characters in the Sanger Symbolic ASCII Quality Values are listed in the table below:

Table 2 Quality Value of Characters in the Sanger Symbolic ASCII Format

(Char = character code, Qual = quality value)

Char  Qual    Char  Qual    Char  Qual    Char  Qual    Char  Qual    Char  Qual    Char  Qual
!       0     /      14     =      28     K      42     Y      56     g      70     u      84
"       1     0      15     >      29     L      43     Z      57     h      71     v      85
#       2     1      16     ?      30     M      44     [      58     i      72     w      86
$       3     2      17     @      31     N      45     \      59     j      73     x      87
%       4     3      18     A      32     O      46     ]      60     k      74     y      88
&       5     4      19     B      33     P      47     ^      61     l      75     z      89
'       6     5      20     C      34     Q      48     _      62     m      76     {      90
(       7     6      21     D      35     R      49     `      63     n      77     |      91
)       8     7      22     E      36     S      50     a      64     o      78     }      92
*       9     8      23     F      37     T      51     b      65     p      79     ~      93
+      10     9      24     G      38     U      52     c      66     q      80
,      11     :      25     H      39     V      53     d      67     r      81
-      12     ;      26     I      40     W      54     e      68     s      82
.      13     <      27     J      41     X      55     f      69     t      83
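The entries in Table 2 follow the same kind of pattern with a different offset: each quality value equals the character's ASCII code minus 33 (for example, '!' is code 33 and maps to 0, and 'K' is code 75 and maps to 42). The following is a minimal Python sketch under that assumption. The function name and the sample string are illustrative only and are not part of Maq.

```python
# Offset inferred from Table 2 (assumption): '!' (ASCII 33) maps to quality value 0.
SANGER_OFFSET = 33


def sanger_quality_values(quality_string):
    """Return the Table 2 quality value for each character of the string."""
    values = []
    for ch in quality_string:
        q = ord(ch) - SANGER_OFFSET
        # Table 2 covers values from 0 ('!') to 93 ('~').
        if not 0 <= q <= 93:
            raise ValueError(f"character {ch!r} is outside the range of Table 2")
        values.append(q)
    return values


print(sanger_quality_values("!/=K~"))  # [0, 14, 28, 42, 93]
```

The same character decodes to different values in the two formats (for example, 'A' is 1 in Table 1 and 32 in Table 2), so it matters which table a given file follows.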

Illumina, Inc Towne Centre Drive San Diego, CA ILMN (4566) (outside North America)

EcoStudy Software User Guide

EcoStudy Software User Guide EcoStudy Software User Guide FOR RESEARCH USE ONLY What is EcoStudy? 3 Setting Up a Study 4 Specifying Analysis Settings for your Study 6 Reviewing the Data in your Study 8 Exporting Study Data to a Report

More information

GenomeStudio Software Release Notes

GenomeStudio Software Release Notes GenomeStudio Software 2009.2 Release Notes 1. GenomeStudio Software 2009.2 Framework... 1 2. Illumina Genome Viewer v1.5...2 3. Genotyping Module v1.5... 4 4. Gene Expression Module v1.5... 6 5. Methylation

More information

Illumina Next Generation Sequencing Data analysis

Illumina Next Generation Sequencing Data analysis Illumina Next Generation Sequencing Data analysis Chiara Dal Fiume Sr Field Application Scientist Italy 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense Out of Life,

More information

Sequence Genotyper Reference Guide

Sequence Genotyper Reference Guide Sequence Genotyper Reference Guide For Research Use Only. Not for use in diagnostic procedures. Introduction 3 Installation 4 Dashboard Overview 5 Projects 6 Targets 7 Samples 9 Reports 12 Revision History

More information

AutoLoader 2.x User Guide

AutoLoader 2.x User Guide AutoLoader 2.x User Guide FOR RESEARCH USE ONLY Topics 3 Introduction 4 Supported Configurations 7 AutoLoader 2.x Components 9 Process Overview 10 Powering Up the AutoLoader 2.x 11 Starting the AutoLoader

More information

mtdna Variant Processor v1.0 BaseSpace App Guide

mtdna Variant Processor v1.0 BaseSpace App Guide mtdna Variant Processor v1.0 BaseSpace App Guide For Research Use Only. Not for use in diagnostic procedures. Introduction 3 Workflow Diagram 4 Workflow 5 Log In to BaseSpace 6 Set Analysis Parameters

More information

ASAP - Allele-specific alignment pipeline

ASAP - Allele-specific alignment pipeline ASAP - Allele-specific alignment pipeline Jan 09, 2012 (1) ASAP - Quick Reference ASAP needs a working version of Perl and is run from the command line. Furthermore, Bowtie needs to be installed on your

More information

Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page.

Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. In this page you will learn to use the tools of the MAPHiTS suite. A little advice before starting : rename your

More information

Indexed Sequencing. Overview Guide

Indexed Sequencing. Overview Guide Indexed Sequencing Overview Guide Introduction 3 Single-Indexed Sequencing Overview 3 Dual-Indexed Sequencing Overview 4 Dual-Indexed Workflow on a Paired-End Flow Cell 4 Dual-Indexed Workflow on a Single-Read

More information

BlueFuse Multi v4.4 Installation Guide

BlueFuse Multi v4.4 Installation Guide BlueFuse Multi v4.4 Installation Guide For Research Use Only. Not for use in diagnostic procedures. Revision History 3 Introduction 4 Supported Operating Systems 5 Hardware Requirements 6 Deployment Modes

More information

Agilent Genomic Workbench Lite Edition 6.5

Agilent Genomic Workbench Lite Edition 6.5 Agilent Genomic Workbench Lite Edition 6.5 SureSelect Quality Analyzer User Guide For Research Use Only. Not for use in diagnostic procedures. Agilent Technologies Notices Agilent Technologies, Inc. 2010

More information

MiSeq Reporter TruSight Tumor 15 Workflow Guide

MiSeq Reporter TruSight Tumor 15 Workflow Guide MiSeq Reporter TruSight Tumor 15 Workflow Guide For Research Use Only. Not for use in diagnostic procedures. Introduction 3 TruSight Tumor 15 Workflow Overview 4 Reports 8 Analysis Output Files 9 Manifest

More information

NGS Data Visualization and Exploration Using IGV

NGS Data Visualization and Exploration Using IGV 1 What is Galaxy Galaxy for Bioinformaticians Galaxy for Experimental Biologists Using Galaxy for NGS Analysis NGS Data Visualization and Exploration Using IGV 2 What is Galaxy Galaxy for Bioinformaticians

More information

Helpful Galaxy screencasts are available at:

Helpful Galaxy screencasts are available at: This user guide serves as a simplified, graphic version of the CloudMap paper for applicationoriented end-users. For more details, please see the CloudMap paper. Video versions of these user guides and

More information

HiSeq Instrument Software Release Notes

HiSeq Instrument Software Release Notes HiSeq Instrument Software Release Notes HCS v2.0.12 RTA v1.17.21.3 Recipe Fragments v1.3.61 Illumina BaseSpace Broker v2.0.13022.1628 SAV v1.8.20 For HiSeq 2000 and HiSeq 1000 Systems FOR RESEARCH USE

More information

Designing Custom GoldenGate Genotyping Assays

Designing Custom GoldenGate Genotyping Assays Designing Custom GoldenGate Genotyping Assays Guidelines for efficiently creating and ordering high-quality custom GoldenGate Genotyping Assays using the Illumina Assay Design Tool. Introduction The Illumina

More information

Bioinformatics in next generation sequencing projects

Bioinformatics in next generation sequencing projects Bioinformatics in next generation sequencing projects Rickard Sandberg Assistant Professor Department of Cell and Molecular Biology Karolinska Institutet March 2011 Once sequenced the problem becomes computational

More information

RNA-Seq in Galaxy: Tuxedo protocol. Igor Makunin, UQ RCC, QCIF

RNA-Seq in Galaxy: Tuxedo protocol. Igor Makunin, UQ RCC, QCIF RNA-Seq in Galaxy: Tuxedo protocol Igor Makunin, UQ RCC, QCIF Acknowledgments Genomics Virtual Lab: gvl.org.au Galaxy for tutorials: galaxy-tut.genome.edu.au Galaxy Australia: galaxy-aust.genome.edu.au

More information

Pre-processing and quality control of sequence data. Barbera van Schaik KEBB - Bioinformatics Laboratory

Pre-processing and quality control of sequence data. Barbera van Schaik KEBB - Bioinformatics Laboratory Pre-processing and quality control of sequence data Barbera van Schaik KEBB - Bioinformatics Laboratory b.d.vanschaik@amc.uva.nl Topic: quality control and prepare data for the interesting stuf Keep Throw

More information

KaryoStudio v1.4 User Guide

KaryoStudio v1.4 User Guide KaryoStudio v1.4 User Guide FOR RESEARCH USE ONLY ILLUMINA PROPRIETARY Part # 11328837 Rev. C June 2011 Notice This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"),

More information

EpiGnome Methyl Seq Bioinformatics User Guide Rev. 0.1

EpiGnome Methyl Seq Bioinformatics User Guide Rev. 0.1 EpiGnome Methyl Seq Bioinformatics User Guide Rev. 0.1 Introduction This guide contains data analysis recommendations for libraries prepared using Epicentre s EpiGnome Methyl Seq Kit, and sequenced on

More information

Using Genome Analyzer Sequencing Control Software Version 2.5

Using Genome Analyzer Sequencing Control Software Version 2.5 Using Genome Analyzer Sequencing Control Software Version 2.5 FOR RESEARCH USE ONLY Topics 3 Introduction 4 Run Parameters Window 8 Data Collection Software Interface 12 Recipe Viewer 13 Reagent Tracking

More information

De novo genome assembly

De novo genome assembly BioNumerics Tutorial: De novo genome assembly 1 Aims This tutorial describes a de novo assembly of a Staphylococcus aureus genome, using single-end and pairedend reads generated by an Illumina R Genome

More information

User Guide. SLAMseq Data Analysis Pipeline SLAMdunk on Bluebee Platform

User Guide. SLAMseq Data Analysis Pipeline SLAMdunk on Bluebee Platform SLAMseq Data Analysis Pipeline SLAMdunk on Bluebee Platform User Guide Catalog Numbers: 061, 062 (SLAMseq Kinetics Kits) 015 (QuantSeq 3 mrna-seq Library Prep Kits) 063UG147V0100 FOR RESEARCH USE ONLY.

More information

Designing Custom GoldenGate Genotyping Assays

Designing Custom GoldenGate Genotyping Assays Designing Custom GoldenGate Genotyping Assays Guidelines for efficiently creating and ordering high-quality custom GoldenGate Genotyping Assays using the Illumina Assay Design Tool. Introduction The Illumina

More information

Local Run Manager Resequencing Analysis Module Workflow Guide

Local Run Manager Resequencing Analysis Module Workflow Guide Local Run Manager Resequencing Analysis Module Workflow Guide For Research Use Only. Not for use in diagnostic procedures. Overview 3 Set Parameters 4 Analysis Methods 6 View Analysis Results 8 Analysis

More information

Reference & Track Manager

Reference & Track Manager Reference & Track Manager U SoftGenetics, LLC 100 Oakwood Avenue, Suite 350, State College, PA 16803 USA * info@softgenetics.com www.softgenetics.com 888-791-1270 2016 Registered Trademarks are property

More information

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL QIAseq DNA V3 Panel Analysis Plugin USER MANUAL User manual for QIAseq DNA V3 Panel Analysis 1.0.1 Windows, Mac OS X and Linux January 25, 2018 This software is for research purposes only. QIAGEN Aarhus

More information

Genome Browsers - The UCSC Genome Browser

Genome Browsers - The UCSC Genome Browser Genome Browsers - The UCSC Genome Browser Background The UCSC Genome Browser is a well-curated site that provides users with a view of gene or sequence information in genomic context for a specific species,

More information

Performing a resequencing assembly

Performing a resequencing assembly BioNumerics Tutorial: Performing a resequencing assembly 1 Aim In this tutorial, we will discuss the different options to obtain statistics about the sequence read set data and assess the quality, and

More information

Peter Schweitzer, Director, DNA Sequencing and Genotyping Lab

Peter Schweitzer, Director, DNA Sequencing and Genotyping Lab The instruments, the runs, the QC metrics, and the output Peter Schweitzer, Director, DNA Sequencing and Genotyping Lab Overview Roche/454 GS-FLX 454 (GSRunbrowser information) Evaluating run results Errors

More information

Sequencing Analysis Viewer Software User Guide

Sequencing Analysis Viewer Software User Guide Sequencing Analysis Viewer Software User Guide FOR RESEARCH USE ONLY Revision History 3 Introduction 4 Setting Up Sequencing Analysis Viewer Software 5 Data Availability 8 Loading Data 9 Analysis Tab 10

More information

Tutorial 1: Exploring the UCSC Genome Browser

Tutorial 1: Exploring the UCSC Genome Browser Last updated: May 12, 2011 Tutorial 1: Exploring the UCSC Genome Browser Open the homepage of the UCSC Genome Browser at: http://genome.ucsc.edu/ In the blue bar at the top, click on the Genomes link.

More information

RNA-Seq data analysis software. User Guide 023UG050V0200

RNA-Seq data analysis software. User Guide 023UG050V0200 RNA-Seq data analysis software User Guide 023UG050V0200 FOR RESEARCH USE ONLY. NOT INTENDED FOR DIAGNOSTIC OR THERAPEUTIC USE. INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT NOTICE. Lexogen

More information

Next Generation Sequence Alignment on the BRC Cluster. Steve Newhouse 22 July 2010

Next Generation Sequence Alignment on the BRC Cluster. Steve Newhouse 22 July 2010 Next Generation Sequence Alignment on the BRC Cluster Steve Newhouse 22 July 2010 Overview Practical guide to processing next generation sequencing data on the cluster No details on the inner workings

More information

Genetics 211 Genomics Winter 2014 Problem Set 4

Genetics 211 Genomics Winter 2014 Problem Set 4 Genomics - Part 1 due Friday, 2/21/2014 by 9:00am Part 2 due Friday, 3/7/2014 by 9:00am For this problem set, we re going to use real data from a high-throughput sequencing project to look for differential

More information

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines 454 GS Junior,

More information

Performing whole genome SNP analysis with mapping performed locally

Performing whole genome SNP analysis with mapping performed locally BioNumerics Tutorial: Performing whole genome SNP analysis with mapping performed locally 1 Introduction 1.1 An introduction to whole genome SNP analysis A Single Nucleotide Polymorphism (SNP) is a variation

More information

INTRODUCTION AUX FORMATS DE FICHIERS

INTRODUCTION AUX FORMATS DE FICHIERS INTRODUCTION AUX FORMATS DE FICHIERS Plan. Formats de séquences brutes.. Format fasta.2. Format fastq 2. Formats d alignements 2.. Format SAM 2.2. Format BAM 4. Format «Variant Calling» 4.. Format Varscan

More information

Next generation Confirmation (NGC) module

Next generation Confirmation (NGC) module QUICK REFERENCE Next generation Confirmation (NGC) module Catalog Number A28221 Pub. No. MAN0015891 Rev. A.0 Product description The Applied Biosystems Next generation Confirmation (NGC) module analyzes

More information

Data Walkthrough: Background

Data Walkthrough: Background Data Walkthrough: Background File Types FASTA Files FASTA files are text-based representations of genetic information. They can contain nucleotide or amino acid sequences. For this activity, students will

More information

RNA-Seq data analysis software. User Guide 023UG050V0210

RNA-Seq data analysis software. User Guide 023UG050V0210 RNA-Seq data analysis software User Guide 023UG050V0210 FOR RESEARCH USE ONLY. NOT INTENDED FOR DIAGNOSTIC OR THERAPEUTIC USE. INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT NOTICE. Lexogen

More information

Resequencing Analysis. (Pseudomonas aeruginosa MAPO1 ) Sample to Insight

Resequencing Analysis. (Pseudomonas aeruginosa MAPO1 ) Sample to Insight Resequencing Analysis (Pseudomonas aeruginosa MAPO1 ) 1 Workflow Import NGS raw data Trim reads Import Reference Sequence Reference Mapping QC on reads Variant detection Case Study Pseudomonas aeruginosa

More information

EXERCISE: GETTING STARTED WITH SAV

EXERCISE: GETTING STARTED WITH SAV Sequencing Analysis Viewer (SAV) Overview 1 EXERCISE: GETTING STARTED WITH SAV Purpose This exercise explores the following topics: How to load run data into SAV How to explore run metrics with SAV Getting

More information

Image Analysis and Base Calling Sarah Reid FAS

Image Analysis and Base Calling Sarah Reid FAS Image Analysis and Base Calling Sarah Reid FAS For Research Use Only. Not for use in diagnostic procedures. 2016 Illumina, Inc. All rights reserved. Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse,

More information

CLC Server. End User USER MANUAL

CLC Server. End User USER MANUAL CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark

More information

Local Run Manager Amplicon Analysis Module Workflow Guide

Local Run Manager Amplicon Analysis Module Workflow Guide Local Run Manager Amplicon Analysis Module Workflow Guide For Research Use Only. Not for use in diagnostic procedures. Overview 3 Set Parameters 4 Analysis Methods 6 View Analysis Results 9 Analysis Report

More information

Tutorial. Variant Detection. Sample to Insight. November 21, 2017

Tutorial. Variant Detection. Sample to Insight. November 21, 2017 Resequencing: Variant Detection November 21, 2017 Map Reads to Reference and Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com

More information

When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame

When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame 1 When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from

More information

Copyright 2014 Regents of the University of Minnesota

Copyright 2014 Regents of the University of Minnesota Quality Control of Illumina Data using Galaxy August 18, 2014 Contents 1 Introduction 2 1.1 What is Galaxy?..................................... 2 1.2 Galaxy at MSI......................................

More information

Hands-on Instruction in Sequence Assembly

Hands-on Instruction in Sequence Assembly 1 Botany 2010 Workshop: An Introduction to Next-Generation Sequencing Hands-on Instruction in Sequence Assembly Part 1. Download sequence files in fastq format from GenBank Sequence Read Archive. 1. Go

More information

MiSeq Reporter Amplicon DS Workflow Guide

MiSeq Reporter Amplicon DS Workflow Guide MiSeq Reporter Amplicon DS Workflow Guide For Research Use Only. Not for use in diagnostic procedures. Introduction 3 Amplicon DS Workflow Overview 4 Optional Settings for the Amplicon DS Workflow 7 Analysis

More information

Next generation sequencing: assembly by mapping reads. Laurent Falquet, Vital-IT Helsinki, June 3, 2010

Next generation sequencing: assembly by mapping reads. Laurent Falquet, Vital-IT Helsinki, June 3, 2010 Next generation sequencing: assembly by mapping reads Laurent Falquet, Vital-IT Helsinki, June 3, 2010 Overview What is assembly by mapping? Methods BWT File formats Tools Issues Visualization Discussion

More information

MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping. Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September

MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping. Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September 27 2014 Static Dynamic Static Minimum Information for Reporting

More information

Copyright 2014 Regents of the University of Minnesota

Copyright 2014 Regents of the University of Minnesota Quality Control of Illumina Data using Galaxy Contents September 16, 2014 1 Introduction 2 1.1 What is Galaxy?..................................... 2 1.2 Galaxy at MSI......................................

More information

Agilent Genomic Workbench 6.5

Agilent Genomic Workbench 6.5 Agilent Genomic Workbench 6.5 Product Overview Guide For Research Use Only. Not for use in diagnostic procedures. Agilent Technologies Notices Agilent Technologies, Inc. 2010, 2015 No part of this manual

More information

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines: Illumina MiSeq,

More information

User's guide to ChIP-Seq applications: command-line usage and option summary

User's guide to ChIP-Seq applications: command-line usage and option summary User's guide to ChIP-Seq applications: command-line usage and option summary 1. Basics about the ChIP-Seq Tools The ChIP-Seq software provides a set of tools performing common genome-wide ChIPseq analysis

More information

m6aviewer Version Documentation

m6aviewer Version Documentation m6aviewer Version 1.6.0 Documentation Contents 1. About 2. Requirements 3. Launching m6aviewer 4. Running Time Estimates 5. Basic Peak Calling 6. Running Modes 7. Multiple Samples/Sample Replicates 8.

More information

Cyverse tutorial 1 Logging in to Cyverse and data management. Open an Internet browser window and navigate to the Cyverse discovery environment:

Cyverse tutorial 1 Logging in to Cyverse and data management. Open an Internet browser window and navigate to the Cyverse discovery environment: Cyverse tutorial 1 Logging in to Cyverse and data management Open an Internet browser window and navigate to the Cyverse discovery environment: https://de.cyverse.org/de/ Click Log in with your CyVerse

More information

Browser Exercises - I. Alignments and Comparative genomics

Browser Exercises - I. Alignments and Comparative genomics Browser Exercises - I Alignments and Comparative genomics 1. Navigating to the Genome Browser (GBrowse) Note: For this exercise use http://www.tritrypdb.org a. Navigate to the Genome Browser (GBrowse)

More information

Genomic Files. University of Massachusetts Medical School. October, 2015

Genomic Files. University of Massachusetts Medical School. October, 2015 .. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

ChIP-Seq Tutorial on Galaxy

ChIP-Seq Tutorial on Galaxy 1 Introduction ChIP-Seq Tutorial on Galaxy 2 December 2010 (modified April 6, 2017) Rory Stark The aim of this practical is to give you some experience handling ChIP-Seq data. We will be working with data

More information

Genomic Files. University of Massachusetts Medical School. October, 2014

Genomic Files. University of Massachusetts Medical School. October, 2014 .. Genomic Files University of Massachusetts Medical School October, 2014 2 / 39. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

Analysis of high-throughput sequencing data. Simon Anders EBI

Analysis of high-throughput sequencing data. Simon Anders EBI Analysis of high-throughput sequencing data Simon Anders EBI Outline Overview on high-throughput sequencing (HTS) technologies, focusing on Solexa's GenomAnalyzer as example Software requirements to works

More information

Galaxy Platform For NGS Data Analyses

Galaxy Platform For NGS Data Analyses Galaxy Platform For NGS Data Analyses Weihong Yan wyan@chem.ucla.edu Collaboratory Web Site http://qcb.ucla.edu/collaboratory Collaboratory Workshops Workshop Outline ü Day 1 UCLA galaxy and user account

More information

Agilent Genomic Workbench 7.0

Agilent Genomic Workbench 7.0 Agilent Genomic Workbench 7.0 Data Viewing User Guide Agilent Technologies Notices Agilent Technologies, Inc. 2012, 2015 No part of this manual may be reproduced in any form or by any means (including

More information

BaseSpace User Guide FOR RESEARCH USE ONLY

BaseSpace User Guide FOR RESEARCH USE ONLY BaseSpace User Guide FOR RESEARCH USE ONLY Introduction 3 How Do I Start 7 BaseSpace User Interface 11 How To Use BaseSpace 21 Workflow Reference 44 Data Reference 50 Technical Assistance ILLUMINA PROPRIETARY

More information

Lecture 8. Sequence alignments

Lecture 8. Sequence alignments Lecture 8 Sequence alignments DATA FORMATS bioawk bioawk is a program that extends awk s powerful processing of tabular data to processing tasks involving common bioinformatics formats like FASTA/FASTQ,

More information

Tutorial: De Novo Assembly of Paired Data

Tutorial: De Novo Assembly of Paired Data : De Novo Assembly of Paired Data September 20, 2013 CLC bio Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 Fax: +45 86 20 12 22 www.clcbio.com support@clcbio.com : De Novo Assembly

More information

Indexed Sequencing. Overview Guide

Indexed Sequencing. Overview Guide Indexed Sequencing Overview Guide Introduction 3 Single-Indexed Sequencing Overview 3 Dual-Indexed Sequencing Overview 4 Dual-Indexed Workflow on a Paired-End Flow Cell 4 Dual-Indexed Workflow on a Single-Read

More information

RNA-Seq data analysis software. User Guide 023UG050V0100

RNA-Seq data analysis software. User Guide 023UG050V0100 RNA-Seq data analysis software User Guide 023UG050V0100 FOR RESEARCH USE ONLY. NOT INTENDED FOR DIAGNOSTIC OR THERAPEUTIC USE. INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT NOTICE. Lexogen

More information

Perl for Biologists. Practical example. Session 14 June 3, Robert Bukowski. Session 14: Practical example Perl for Biologists 1.

Perl for Biologists. Practical example. Session 14 June 3, Robert Bukowski. Session 14: Practical example Perl for Biologists 1. Perl for Biologists Session 14 June 3, 2015 Practical example Robert Bukowski Session 14: Practical example Perl for Biologists 1.2 1 Session 13 review Process is an object of UNIX (Linux) kernel identified

More information

Importing your Exeter NGS data into Galaxy:

Importing your Exeter NGS data into Galaxy: Importing your Exeter NGS data into Galaxy: The aim of this tutorial is to show you how to import your raw Illumina FASTQ files and/or assemblies and remapping files into Galaxy. As of 1 st July 2011 Illumina

More information

Biostatistics and Bioinformatics Molecular Sequence Databases

Biostatistics and Bioinformatics Molecular Sequence Databases . 1 Description of Module Subject Name Paper Name Module Name/Title 13 03 Dr. Vijaya Khader Dr. MC Varadaraj 2 1. Objectives: In the present module, the students will learn about 1. Encoding linear sequences

More information

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013

ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 ITMO Ecole de Bioinformatique Hands-on session: smallrna-seq N. Servant 21 rd November 2013 1. Data and objectives We will use the data from GEO (GSE35368, Toedling, Servant et al. 2011). Two samples were

More information

Preparation of alignments for variant calling with GATK: exercise instructions for BioHPC Lab computers

Preparation of alignments for variant calling with GATK: exercise instructions for BioHPC Lab computers Preparation of alignments for variant calling with GATK: exercise instructions for BioHPC Lab computers Data used in the exercise We will use D. melanogaster WGS paired-end Illumina data with NCBI accessions

More information

SAM / BAM Tutorial. EMBL Heidelberg. Course Materials. Tobias Rausch September 2012

SAM / BAM Tutorial. EMBL Heidelberg. Course Materials. Tobias Rausch September 2012 SAM / BAM Tutorial EMBL Heidelberg Course Materials Tobias Rausch September 2012 Contents 1 SAM / BAM 3 1.1 Introduction................................... 3 1.2 Tasks.......................................

More information

Genome Browsers Guide

Genome Browsers Guide Genome Browsers Guide Take a Class This guide supports the Galter Library class called Genome Browsers. See our Classes schedule for the next available offering. If this class is not on our upcoming schedule,

More information

Fusion Detection Using QIAseq RNAscan Panels

Fusion Detection Using QIAseq RNAscan Panels Fusion Detection Using QIAseq RNAscan Panels June 11, 2018 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com ts-bioinformatics@qiagen.com

More information

Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING)

Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) Reporting guideline statement for HLA and KIR genotyping data generated via Next Generation Sequencing (NGS) technologies and analysis

More information

Public Repositories Tutorial: Bulk Downloads

Public Repositories Tutorial: Bulk Downloads Public Repositories Tutorial: Bulk Downloads Almost all of the public databases, genome browsers, and other tools you have explored so far offer some form of access to rapidly download all or large chunks

More information

User Manual. Ver. 3.0 March 19, 2012

User Manual. Ver. 3.0 March 19, 2012 User Manual Ver. 3.0 March 19, 2012 Table of Contents 1. Introduction... 2 1.1 Rationale... 2 1.2 Software Work-Flow... 3 1.3 New in GenomeGems 3.0... 4 2. Software Description... 5 2.1 Key Features...

More information

PRACTICAL SESSION 5 GOTCLOUD ALIGNMENT WITH BWA JAN 7 TH, 2014 STOM 2014 WORKSHOP HYUN MIN KANG UNIVERSITY OF MICHIGAN, ANN ARBOR

PRACTICAL SESSION 5 GOTCLOUD ALIGNMENT WITH BWA JAN 7 TH, 2014 STOM 2014 WORKSHOP HYUN MIN KANG UNIVERSITY OF MICHIGAN, ANN ARBOR PRACTICAL SESSION 5 GOTCLOUD ALIGNMENT WITH BWA JAN 7 TH, 2014 STOM 2014 WORKSHOP HYUN MIN KANG UNIVERSITY OF MICHIGAN, ANN ARBOR GOAL OF THIS SESSION Assuming that The audiences know how to perform GWAS

More information

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima

ChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima ChIP-seq practical: peak detection and peak annotation Mali Salmon-Divon Remco Loos Myrto Kostadima March 2012 Introduction The goal of this hands-on session is to perform some basic tasks in the analysis

More information

!"#$%&$'()#$*)+,-./).01"0#,23+3,303456"6,&((46,7$+-./&((468,

!#$%&$'()#$*)+,-./).010#,23+3,3034566,&((46,7$+-./&((468, !"#$%&$'()#$*)+,-./).01"0#,23+3,303456"6,&((46,7$+-./&((468, 9"(1(02)1+(',:.;.4(*.',?9@A,!."2.4B.'#A,C(;.

More information

Handling sam and vcf data, quality control

Handling sam and vcf data, quality control Handling sam and vcf data, quality control We continue with the earlier analyses and get some new data: cd ~/session_3 wget http://wasabiapp.org/vbox/data/session_4/file3.tgz tar xzf file3.tgz wget http://wasabiapp.org/vbox/data/session_4/file4.tgz

More information

Supplementary Figure 1. Fast read-mapping algorithm of BrowserGenome.

Supplementary Figure 1. Fast read-mapping algorithm of BrowserGenome. Supplementary Figure 1 Fast read-mapping algorithm of BrowserGenome. (a) Indexing strategy: The genome sequence of interest is divided into non-overlapping 12-mers. A Hook table is generated that contains

More information

4.1. Access the internet and log on to the UCSC Genome Bioinformatics Web Page (Figure 1-

4.1. Access the internet and log on to the UCSC Genome Bioinformatics Web Page (Figure 1- 1. PURPOSE To provide instructions for finding rs Numbers (SNP database ID numbers) and increasing sequence length by utilizing the UCSC Genome Bioinformatics Database. 2. MATERIALS 2.1. Sequence Information

More information
