

DRAGEN™ Quick Start Guide
Mar 2017
Edico Genome Inc, North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

Notice

Contents of this document and associated software and hardware are Copyright (c) Edico Genome Corporation. This document is proprietary to Edico Genome and contains confidential information.

Proprietary & Confidential Page 1 of 15 Edico Genome Inc.

Table of Contents

Notice
1 Introduction
2 Hardware & Software Installation/Upgrade
3 Running the Self-Test
4 Running Your Own Test
  4.1 Generating a Reference (AKA Hash Table)
  4.2 Loading a Reference (AKA Hash Table)
  4.3 Process Your Input Data
    4.3.1 End-To-End Aligning and Variant Calling Examples
    4.3.2 Alignment Only Examples
    4.3.3 RNA Map/Align Only Examples
    4.3.4 Epigenome Map/Align Examples
    4.3.5 Variant Calling Only Examples
    4.3.6 Somatic Examples
    4.3.7 gVCF and Joint Calling Examples
    4.3.8 BCL Input Examples
5 Troubleshooting

1 Introduction

This Quick Start Guide will help you start processing data as quickly as possible. It assumes the server is powered on and that you are logged in. The full User's Guide can be found on the DRAGEN Portal website.

2 Hardware & Software Installation/Upgrade

If you are already running the latest version of the DRAGEN software and hardware, you can skip ahead to Section 3: Running the Self-Test.

Query the current version of software and hardware with the command:

    dragen_info -b

To find out just the software version, run:

    rpm -q edico

To install a new version of software and/or hardware, first download the package from the DRAGEN Portal website onto your DRAGEN server. The preferred installation method is the self-extracting .run file:

    sudo sh DRAGEN_ run

During installation, if you are prompted to switch to a new hardware version, enter y. It is extremely important that the hardware upgrade process is not interrupted. When it is complete, you must halt and power cycle the server. A reboot command will not update the hardware version; you must issue a halt command and power the server off and on.

3 Running the Self-Test

Run the command:

    /opt/edico/self_test/self_test.sh

This performs a thorough test of the hardware and takes about 15 minutes. When complete, it should output:

    SELF TEST RESULT : PASS

If there is any failure, please contact Edico Genome support. You can ignore any tests which mention NON MANDATY TEST SKIPPED.

4 Running Your Own Test

Below, we outline how to optionally generate a reference (5-15 minutes), load a reference (<1 minute), and process your own data.

4.1 Generating a Reference (AKA Hash Table)

If you do not have a reference, you can generate one using these instructions. Run a dragen --build-hash-table command (example below) and pass in the location of your reference FASTA file. You can specify a set of parameters when building your hash table (see the DRAGEN User Guide for more details), but for the quick start, you can run the example shell script or the simple commands below. These examples assume your FASTA file is in /staging/human/reference/hg19/hg19.fa.

    /opt/edico/examples/build_hash_table.sh

    mkdir -p /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
    cd /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
    dragen --build-hash-table true \
        --ht-reference /staging/human/reference/hg19/hg19.fa \
        --output-dir /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149

The dragen --build-hash-table command is multithreaded. It defaults to 8 threads and takes about 15 minutes. You can use --ht-num-threads with a value up to 32 if your server supports that many threads, in which case the command can finish in as little as 5 minutes.

Note that the hash table directory name lists the key default parameter values that were used during the hash table build. We strongly recommend following this practice when you generate your own hash tables, changing the directory name to match the parameters you used.

4.2 Loading a Reference (AKA Hash Table)

Once the binary reference is loaded into memory on the DRAGEN board, it can be used for processing any number of input data sets. You will not need to reload the reference unless you restart the system or wish to switch to a different reference/hash table.

The reference is loaded automatically the first time you process data with it. To load the reference genome manually onto the board, use this example shell script or command (where the reference directory in this example is /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149):

    /opt/edico/examples/load_reference.sh

    dragen -l

This should take less than 1 minute and should return:

    DRAGEN finished normally

If a manual or automatic system reset occurs, the reference you specify on the command line is automatically reloaded the next time you try to process data.
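The directory-naming practice recommended in Section 4.1 can be sketched as a small shell helper. This is only an illustration: the function name hash_dir_name is hypothetical, and the labels k, f, and m are treated as opaque tags copied from the example directory name, since this guide does not spell out which build parameter each one records.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DRAGEN): compose a hash table directory
# name that records the build parameters, following the pattern of the
# example directory used in this guide.
# Arguments: FASTA path, then the values for the k, f, and m labels.
hash_dir_name() {
    local fasta="$1" k="$2" f="$3" m="$4"
    printf '%s.k_%s.f_%s.m_%s\n' "$fasta" "$k" "$f" "$m"
}

# Reproduces the directory name used in the examples of this guide.
hash_dir_name /staging/human/reference/hg19/hg19.fa 21 16 149
```

Deriving the name from the values you actually pass to the build keeps the directory name and the build parameters from drifting apart.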
The same automatic reload applies if you reboot the system.

4.3 Process Your Input Data

Once you have loaded your reference, it is time to process your input FASTQ data. Pick the example below that best matches your data sets. These commands can take up to approximately 40 minutes to run on a 24-core server with SSD drives on a 30x coverage whole human genome when running end-to-end (FASTQ input to VCF output). Run time scales with input size, so a 60x coverage genome would take twice as long. Exome data takes a fraction of the time. Future releases will run even faster.

A successful result is indicated by:

    DRAGEN finished normally

followed by a block of metrics such as read count and performance. If there is any problem with the command-line arguments, an error is displayed, followed by help usage. If your terminal window is short, you may need to scroll up to see the error. The DRAGEN log can be redirected to a file to keep a record for future reference.

Notes:

- To get help on dragen command-line options, run: dragen -h
- These example commands are formatted for visual display and include line feeds, and some characters (such as the dash and double-dash) may have been changed by MS Word. To avoid copy-paste errors, each example command is contained in an individual shell script in /opt/edico/examples/.
- All commands can accept either FASTQ or gzipped FASTQ (fastq.gz). DRAGEN automatically determines which file type it is.
- All of these sample commands include the -f option, which forces the output file to be overwritten if it already exists.
- These commands all assume that your DRAGEN reference (hash table) directory is /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149, and your FASTA reference file is /staging/human/reference/hg19/hg19.fa. Replace those with the correct references if needed.
- These examples assume that the example data package is present in /staging/examples (in particular, the fastq and fastq.gz files are expected to be in /staging/examples/reads).

4.3.1 End-To-End Aligning and Variant Calling Examples

NOTE: In all the examples below in which the DRAGEN Variant Caller is enabled, a parameter named --vc-reference is specified. It requires the path to the FASTA reference file that was used when you built the hash tables. This is a temporary requirement and will be removed in the next release.

1. Paired-End FASTQ Input, VCF Output (Default)

    /opt/edico/examples/paired_fastq_in_vcf_out.sh

This command should take about 6 minutes on a 24-core server.
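The advice above about keeping the DRAGEN log and looking for the success line can be sketched as a small wrapper. The function name run_and_check and the log file name are hypothetical; the only DRAGEN-specific detail assumed is the "DRAGEN finished normally" line described in this section.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of DRAGEN): run a command, keep a copy of
# its output in a log file, and report success only if the log contains the
# "DRAGEN finished normally" line.
# Usage: run_and_check LOGFILE COMMAND [ARGS...]
run_and_check() {
    local log="$1"
    shift
    # Show the output on the terminal and save it to the log at the same time.
    "$@" 2>&1 | tee "$log"
    # The function's exit status is that of grep: 0 if the line was found.
    grep -q 'DRAGEN finished normally' "$log"
}

# Example (commented out because it needs a DRAGEN server):
# run_and_check run1.log /opt/edico/examples/paired_fastq_in_vcf_out.sh \
#     && echo "run succeeded" || echo "check run1.log for errors"
```

Because the check is made against the saved log, the error message remains available even if it has scrolled off a short terminal window.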

Example 1 illustrates the minimum parameters that must be specified to perform an end-to-end run. Note that by default, duplicate marking is not performed; if you want duplicate marking, see example 2. Note also that no BAM output is produced by default; if you want that along with the VCF file, see example 3. You may combine any of these options to suit your use case.

2. Paired-End FASTQ Input, Sorted and Duplicate-Marked, VCF Output

    /opt/edico/examples/paired_fastq_in_dupmark_vcf_out.sh

    --enable-duplicate-marking true

3. Paired-End FASTQ Input, Sorted BAM and VCF Output

    /opt/edico/examples/paired_fastq_in_dupmark_bam_and_vcf_out.sh

    --enable-duplicate-marking true
    --enable-map-align-output true

4. Paired-End FASTQ Input, Sorted SAM and VCF Output

    /opt/edico/examples/paired_fastq_in_dupmark_sam_and_vcf_out.sh

    --enable-duplicate-marking true
    --enable-map-align-output true
    --output-format SAM

5. Paired-End FASTQ Input, Sorted CRAM and VCF Output

    /opt/edico/examples/paired_fastq_in_dupmark_cram_and_vcf_out.sh

    --enable-duplicate-marking true
    --enable-map-align-output true
    --output-format CRAM
    --cram-reference /staging/human/reference/hg19/hg19.fa

4.3.2 Alignment Only Examples

All of the variations for performing alignment shown in these examples can be used in the end-to-end case as well.

1. Map/Align Single-Ended FASTQ Input, Sorted BAM Output (Default)

    /opt/edico/examples/single_fastq_in_bam_out.sh

    dragen -f -1 /staging/examples/reads/sra056922_30x_rand1_100k.fastq \
        --output-file-prefix SRA056922_30x_rand1_100K

2. Map/Align Single-Ended FASTQ Input, Sorted, Duplicate-Marked BAM Output

    /opt/edico/examples/single_fastq_in_dupmark_bam_out.sh

    dragen -f -1 /staging/examples/reads/sra056922_30x_rand1_100k.fastq \
        --output-file-prefix SRA056922_30x_rand1_100K_dup_marked \
        --enable-duplicate-marking true

3. Map/Align Paired-End FASTQ Input, Sorted BAM Output (Default)

    /opt/edico/examples/paired_fastq_in_bam_out.sh

    dragen -f

4. Map/Align Paired-End FASTQ Input, Sorted CRAM Output

    /opt/edico/examples/paired_fastq_in_cram_out.sh

    dragen -f --cram-reference /staging/human/reference/hg19/hg19.fa \
        --output-format CRAM

5. Map/Align Paired-End FASTQ Input, Sorted Uncompressed BAM Output

    /opt/edico/examples/paired_fastq_in_uncompressed_bam_out.sh

    --output-file-prefix uncompressed_sra056922_30x_e10_50m
    --enable-bam-compression false

6. Map/Align Paired-End FASTQ Input, Sorted SAM Output

    /opt/edico/examples/paired_fastq_in_sam_out.sh

    --output-format SAM

7. Map/Align Paired-End FASTQ Input, Unsorted BAM Output

    /opt/edico/examples/paired_fastq_in_unsorted_bam_out.sh

    --output-file-prefix unsorted_sra056922_30x_e10_50m
    --enable-sort false

8. Map/Align Interleaved Paired-End FASTQ Input, BAM Output

    /opt/edico/examples/interleaved_fastq_in_bam_out.sh

    dragen -f -1 /staging/examples/reads/sra056922_pe_30x_rand1_10k_interleaved.fastq \
        --interleaved \
        --output-file-prefix SRA056922_PE_30x_rand1_10K_interleaved

4.3.3 RNA Map/Align Only Examples

Any of the Map/Align Only examples can be used for RNA. The only difference is to add the option --enable-rna true to the command line. DRAGEN then automatically picks up the RNA-specific hash tables and uses the RNA spliced aligner in its processing.

1. RNA Map/Align Paired-End FASTQ Input, BAM Output

    dragen -f --enable-rna true

4.3.4 Epigenome Map/Align Examples

Before performing an epigenome (methylation) Map/Align run with bisulfite sequencing data, you must first create methylation-specific reference hash tables:

    mkdir -p /staging/human/reference/hg19_epigenome
    dragen --build-hash-table true \
        --ht-reference /staging/human/reference/hg19/hg19.fa \
        --ht-max-seed-freq 64 \
        --ht-seed-len 27 \
        --ht-methylated true \
        --output-directory /staging/human/reference/hg19_epigenome

The above DRAGEN command produces two hash table directories under /staging/human/reference/hg19_epigenome: GA_converted and CT_converted. The CT_converted hash table is produced by converting each C base to T in the reference sequences. Similarly, the GA_converted hash table is produced from the G->A base-converted reference sequences. The base-converted references have less complexity, and to compensate we typically increase the hash table seed length argument (--ht-seed-len) to 27 for mammalian genomes (the default seed length is 21).

1. Epigenome Map/Align, Directional Protocol, Single-Ended FASTQ Input, BAM Output

The directional (Lister) protocol produces reads from two of the four possible bisulfite sequencing strands (see Section 6 of the User Guide). Consequently, when the --methylation-protocol=directional argument is used, DRAGEN aligns each read or read pair twice, with different constraints corresponding to the two possible strands. The following DRAGEN command produces two separate BAM files:

    mkdir -p /staging/epigenome/directional
    dragen --output-directory /staging/epigenome/directional \
        --methylation-protocol=directional \
        -r /staging/human/reference/hg19_epigenome \
        --fastq-file1=/staging/epigenome/reads/sample_1_r1.fastq.gz \
        --RGID=rg1 --RGSM=samp1 --RGPL=illumina \
        --output-file-prefix=sample_1

2. Epigenome Map/Align, Non-Directional Protocol, Paired-End FASTQ Input, BAM Output

As described in Section 6 of the User Guide, the non-directional protocol produces reads from all four possible bisulfite sequencing strands. Consequently, when the --methylation-protocol=non-directional argument is used, DRAGEN aligns each read four times and produces four separate BAM files.
    mkdir -p /staging/epigenome/non-directional
    dragen --output-directory /staging/epigenome/non-directional \
        --methylation-protocol=non-directional \
        -r /staging/human/reference/hg19_epigenome \
        --fastq-file1=/staging/epigenome/reads/sample_10_r1.fastq.gz \
        --fastq-file2=/staging/epigenome/reads/sample_10_r2.fastq.gz \
        --RGID=rg10 --RGSM=samp10 --RGPL=illumina \
        --output-file-prefix=sample_

4.3.5 Variant Calling Only Examples

The examples in this section demonstrate how to pass an existing aligned BAM or CRAM file directly to the DRAGEN Variant Caller. By default, the BAM/CRAM file passes through the sorting stage prior to variant calling. If it is already sorted, you can save some time by disabling the sort step.

NOTE: If you need to duplicate-mark your BAM file before running the DRAGEN Variant Caller, you will need to use a separate tool for that step. The DRAGEN Duplicate Marker depends on information provided by the Mapper/Aligner which does not exist in BAM files. To take advantage of the DRAGEN Duplicate Marker, use DRAGEN in end-to-end mode.

Note: The BAM/CRAM files used as input to these example commands are not included in the example data set. They are generated by the example commands in the Alignment Only Examples above.

1. Unsorted BAM Input, VCF Output (Default)

    /opt/edico/examples/unsorted_bam_in_vcf_out.sh

    -b /staging/human/unsorted_sra056922_30x_e10_50m.bam
    --output-file-prefix unsorted_output_sra056922_30x_e10_50m

2. Sorted BAM Input, VCF Output

    /opt/edico/examples/sorted_bam_in_vcf_out.sh

    -b /staging/human/sra056922_30x_e10_50m.bam
    --output-file-prefix sorted_output_sra056922_30x_e10_50m
    --enable-sort false

3. Sorted CRAM Input, VCF Output

    /opt/edico/examples/sorted_cram_in_vcf_out.sh

    --output-file-prefix sorted_output_sra056922_30x_e10_50m
    --enable-sort false
    --cram-reference /staging/human/reference/hg19/hg19.fa
    --cram-input /staging/human/sra056922_30x_e10_50m.cram
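Whether to pass --enable-sort false depends on whether the input is already sorted. One way to decide is to inspect the BAM header, as sketched below. The helper name sort_option_for_header is hypothetical, samtools is a separate tool that is not part of DRAGEN, and the sketch assumes that "sorted" here means coordinate-sorted as recorded in the SO field of the @HD header line; confirm that assumption against the DRAGEN User Guide before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DRAGEN): read SAM/BAM header text on stdin
# and print "--enable-sort false" when the @HD line declares coordinate order.
# Prints nothing otherwise, so DRAGEN's default sorting stage stays enabled.
# Assumption: a "sorted" input means SO:coordinate in the @HD header line.
sort_option_for_header() {
    if grep -q '^@HD.*SO:coordinate'; then
        echo '--enable-sort false'
    fi
}

# Example (commented out because it needs samtools and a BAM file):
# samtools view -H /staging/human/sra056922_30x_e10_50m.bam | sort_option_for_header
```

Printing nothing for unsorted or undeclared input is the conservative choice: the file then goes through the default sorting stage, which costs time but does not depend on the header being accurate.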

4.3.6 Somatic Examples

1. Paired-End FASTQ Input

    --tumor-fastq1 /staging/examples/reads/sra056922_30x_shuffle16k_e10_50m_1.fastq.gz
    --tumor-fastq2 /staging/examples/reads/sra056922_30x_shuffle16k_e10_50m_2.fastq.gz

2. Sorted BAM Input

    --tumor-bam-input /staging/human/sra056922_30x_e10_50m.bam
    --output-file-prefix sorted_output_sra056922_30x_e10_50m

4.3.7 gVCF and Joint Calling Examples

1. Paired-End FASTQ Input, gVCF Output

    --vc-emit-ref-confidence GVCF

2. Joint Calling with gVCF Input

    --enable-joint-genotyping true
    --output-file-prefix Joint_SRA056922_30x_e10_50M
    --variant /staging/examples/sra056922_30x_e10_50m.gvcf

4.3.8 BCL Input Examples

This section demonstrates how to use DRAGEN to process Illumina's BCL format files. DRAGEN can use BCL input to produce FASTQ files very quickly. With some limitations, it can also use BCL input directly to perform Map-Align and, optionally, Variant Calling, saving the time and space required to convert to FASTQ.

Note: The BCL directory in these examples is not included in the example data package. Please replace /mnt/san/131022_hsxten008_0123_fc543 with your own BCL directory.

1. BCL to FASTQ Conversion with Minimal Settings

This example shows how to convert data from the BCL format to FASTQ files. Note that DRAGEN produces multiple files per sample with names like <SampleName>_001.fastq, <SampleName>_002.fastq, etc. There is no need to concatenate these files before performing Map-Align using DRAGEN: specifying the first file in the series causes DRAGEN to read all of them as if they were concatenated into one file.

    dragen --bcl-conversion-only=true \
        --bcl-input-dir /mnt/san/131022_hsxten008_0123_fc543 \
        --bcl-output-dir /staging/examples/

2. Map/Align BCL Lane 1 Input, Sorted BAM Output (Default)

This example performs the Map-Align operation directly from BCL, outputting a sorted BAM file. Note that a single lane must be specified, and that lane must have a single entry in the SampleSheet.csv file (non-indexed BCL).

    dragen --bcl-input-dir /mnt/san/131022_hsxten008_0123_fc543 \
        --bcl-only-lane 1 \
        --output-file-prefix SRA056922_30x_rand1_100K

3. BCL Lane 3 Input, VCF Output (Default)

This full-pipeline run is subject to the same BCL streaming limitations as the example above: a single, non-indexed BCL lane.

    dragen --bcl-input-dir /mnt/san/131022_hsxten008_0123_fc543 \
        --bcl-only-lane 3 \
        --output-file-prefix SRA056922_30x_rand1_100K

5 Troubleshooting

The DRAGEN software automatically resets the board if any problems are encountered.
In the rare case that this doesn't occur automatically, you can issue this command:

    dragen_reset

If this does not resolve the issue, please use the DRAGEN Portal to create a support ticket and attach the results produced by the following command:

    sudo sosreport --batch

This tool takes several minutes to execute and reports the location in /tmp where it has saved the diagnostic information. For more details, please see the DRAGEN User Guide, which is available from the DRAGEN Portal.
