Molecular Identifier (MID) Analysis for TAM-ChIP Paired-End Sequencing

Catalog Nos.: 53126 & 53127
Name: TAM-ChIP antibody conjugate

Description

Active Motif's TAM-ChIP technology combines antibody-directed genomic targeting and NGS library preparation in one step. First, a ChIP-validated antibody is used to target a genomic region of interest, such as a histone or transcription factor binding site. Then, a TAM-ChIP anti-species antibody that is conjugated to barcoded sequencing adapters and a TsTn5 transposase is added to each reaction. Activation of the transposase cuts the nearby DNA and pastes the antibody-associated adapters into the DNA sequence. Following immunoprecipitation, enriched DNA is ready for library amplification and sequencing using Illumina platforms.

The inclusion of unique molecular identifiers (MIDs, also known as UMIs) in the TAM-ChIP Index primers enables accurate removal of PCR duplicates from sequencing data. This helps to increase the number of unique alignments for more accurate data sets. The TAM-ChIP antibody conjugate also contains a unique barcode. Through bioinformatic analysis, individual samples are identified and PCR duplicates can be removed from the data set, while fragmentation duplicates are preserved.

[Figure: Schematic of TAM-ChIP.]

Application Notes

TAM-ChIP uses dual index sequencing that can be run as single-read or paired-end and is compatible with Illumina platforms and sequencing reagents. It is recommended to sequence 30 million reads per sample. Different Illumina instruments may require different sample sheet set-up for correct processing. Guidelines are provided below to create a sample sheet template for your specific instrument using the Illumina Experiment Manager (IEM). Modifications will need to be made to the template to add custom i7 and i5 index sequences.
Due to the tethering of the TsTn5 transposase to the TAM-ChIP antibody conjugate, there is a high likelihood that the transposase will insert the sequencing adapters into similar regions of the genome across your sample population. By utilizing the Molecular Identifiers (MIDs), sequencing reads that start at the same position but carry different MIDs can be identified as biological replicates instead of PCR duplicates. With the MID de-duplication tool from Active Motif, these biological replicates are retained and added back into the sequencing analysis. This can increase the number of reads retained for analysis by up to 3-fold.

This document contains instructions on how to prepare FASTQ files for de-duplication of the MIDs that are incorporated into the TAM-ChIP Index primers (i7 index). The Perl script to run the de-duplication can be obtained by contacting Active Motif's technical support team at tech_service@activemotif.com.

North America 877 222 9543 Europe 32 (0)2 653 0001 Japan 81 (0)3 5225 3638 www.activemotif.com
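The difference between position-only and MID-aware duplicate marking can be illustrated with a minimal Python sketch. This is not the Active Motif Perl script, whose logic is not described in this document. It assumes that two reads are PCR duplicates only when chromosome, start position, strand, and MID all match exactly, and it ignores sequencing errors in the MID. The `Alignment` type and the example reads are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified view of one aligned read."""
    name: str
    chrom: str
    start: int
    strand: str
    mid: str  # 8 bp molecular identifier


def mark_duplicates(alignments):
    """Map read name -> True when an earlier read shares position AND MID."""
    seen = set()
    flags = {}
    for aln in alignments:
        key = (aln.chrom, aln.start, aln.strand, aln.mid)
        flags[aln.name] = key in seen
        seen.add(key)
    return flags


reads = [
    Alignment("r1", "chr1", 1000, "+", "ACTAGTAC"),
    Alignment("r2", "chr1", 1000, "+", "ACTAGTAC"),  # same position, same MID
    Alignment("r3", "chr1", 1000, "+", "GGTTCAAG"),  # same position, new MID
    Alignment("r4", "chr1", 2000, "-", "ACTAGTAC"),
]

flags = mark_duplicates(reads)

# Position-only de-duplication would keep one read per start site.
position_only_unique = len({(a.chrom, a.start, a.strand) for a in reads})
# MID-aware de-duplication also keeps r3, which came from a different molecule.
mid_aware_unique = sum(not is_dup for is_dup in flags.values())
```

In this toy input, only `r2` is flagged; `r3` shares a start site with `r1` but is kept because its MID differs.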

Sequencing Recommendations:

Prepare the sequencing libraries using the TAM-ChIP Index primers provided with the TAM-ChIP antibody conjugate (Catalog Nos. 53126 & 53127), following the instructions provided on the lot-specific product data sheet. For sequencing, we recommend 30 million reads per sample on an Illumina platform.

Table 1: Instrument Selection

Instrument        Max Reads*                             Max Read Length     i5 Index Orientation                  Application
NovaSeq           3.3 B                                  2 x 150             Forward                               NovaSeq FASTQ only
HiSeq 3000/4000   2.5 B                                  2 x 150             PE: Reverse Complement; SR: Forward   HiSeq FASTQ only
HiSeq 2500        HO: 1.5 B (v3) - 2 B (v4); RR: 400 M   2 x 125; 2 x 250    Forward                               HiSeq FASTQ only
NextSeq           HO: 400 M; MO: 130 M                   2 x 150; 2 x 150    Reverse Complement                    NextSeq FASTQ only

*Output and read calculations are based on a single flowcell.

Each TAM-ChIP Index primer (i7) contains an 8 bp random molecular identifier followed by a 3 bp sample barcode (the last 3 bases of each sequence in Table 2 below), for a total of 11 bp.

Table 2: TAM-ChIP i7 Index

Index               Forward Sequence
TAM-ChIP Index 1    NNNNNNNNTCG
TAM-ChIP Index 2    NNNNNNNNGAC
TAM-ChIP Index 3    NNNNNNNNACG
TAM-ChIP Index 4    NNNNNNNNGCT
TAM-ChIP Index 5    NNNNNNNNTGA
TAM-ChIP Index 6    NNNNNNNNCGT
TAM-ChIP Index 7    NNNNNNNNCTA
TAM-ChIP Index 8    NNNNNNNNGTA
TAM-ChIP Index 9    NNNNNNNNAGT
TAM-ChIP Index 10   NNNNNNNNATC
TAM-ChIP Index 11   NNNNNNNNTAG
TAM-ChIP Index 12   NNNNNNNNCAG
TAM-ChIP Index 13   NNNNNNNNTAC
TAM-ChIP Index 14   NNNNNNNNGTC
TAM-ChIP Index 15   NNNNNNNNCAC
TAM-ChIP Index 16   NNNNNNNNACA

The antibody index (i5) is an 8 bp sequence, of which 3 bp are used for analysis. Each anti-species TAM-ChIP conjugate has its own i5 index to enable multiplexing of both the anti-rabbit and anti-mouse conjugates within the same sequencing reaction.

Table 3: Antibody i5 Index

Antibody conjugate     Forward Sequence   Reverse Complement
TAM-ChIP anti-rabbit   GTAAGGAG           CTCCTTAC
TAM-ChIP anti-mouse    ACTGCATA           TATGCAGT
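As a rough illustration of how Tables 2 and 3 translate into sample sheet values, the following Python sketch looks up the 11 bp i7 forward sequence for a given index number and returns the i5 sequence in the orientation required by the instrument. The function and dictionary names are hypothetical; the sequences are copied from the tables above, and the orientation has to be chosen by the user from Table 1.

```python
# 3 bp sample barcodes from Table 2, keyed by TAM-ChIP Index number.
TAM_CHIP_I7_BARCODES = {
    1: "TCG", 2: "GAC", 3: "ACG", 4: "GCT",
    5: "TGA", 6: "CGT", 7: "CTA", 8: "GTA",
    9: "AGT", 10: "ATC", 11: "TAG", 12: "CAG",
    13: "TAC", 14: "GTC", 15: "CAC", 16: "ACA",
}
MID_LENGTH = 8  # random bases preceding the barcode in the i7 primer

# Forward i5 sequences from Table 3.
I5_FORWARD = {"anti-rabbit": "GTAAGGAG", "anti-mouse": "ACTGCATA"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def i7_forward_sequence(index_number):
    """Return the 11 bp i7 entry as written in Table 2 (8 N's + barcode)."""
    return "N" * MID_LENGTH + TAM_CHIP_I7_BARCODES[index_number]


def i5_sequence(conjugate, orientation):
    """Return the i5 index for a conjugate in the requested orientation."""
    forward = I5_FORWARD[conjugate]
    if orientation == "forward":
        return forward
    if orientation == "reverse_complement":
        return reverse_complement(forward)
    raise ValueError(f"unknown orientation: {orientation!r}")


# Example: Index 10 with the anti-mouse conjugate on an instrument that
# reads i5 as the reverse complement (NextSeq in Table 1).
example_i7 = i7_forward_sequence(10)
example_i5 = i5_sequence("anti-mouse", "reverse_complement")
```

Computing the reverse complement rather than storing it also serves as a consistency check against the second column of Table 3.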

We recommend running the sequencing instrument in standalone mode to allow index reads of different lengths (11 bp on i7 and 8 bp on i5).

Note: BaseSpace does not allow index reads of different lengths, so you will need to use 8 bp and 8 bp in this configuration.

Guidelines For Creating a Sample Sheet Template

We suggest using the Illumina Experiment Manager (IEM) to create a sample sheet template (.csv file). IEM can be downloaded as a separate software program from Illumina at: https://support.illumina.com/sequencing/sequencing_software/experiment_manager/downloads.html.

1. In IEM, select Create Sample Sheet.
2. Select the appropriate instrument.
3. Select the application shown in Table 1 for your instrument (e.g. FASTQ only).
4. Fill in the information in the Sample Sheet Wizard - Workflow Parameters.
   a. Reagent Kit Barcode is a required field. If you don't know your reagent kit barcode, type any text to create the template; this text will also serve as the file name.
   b. TAM-ChIP uses dual index sequencing and requires a dual index library prep kit. Select the Nextera DNA selection to create your sample sheet template, since this is a dual index library kit.
   c. Select Nextera Index Kit, 24 Indexes if creating a sample sheet template with the Nextera DNA kit. Additional samples can be added later in the Excel template.
   d. Select 2 (Dual) Index Reads.
   e. Fill out the rest of the experimental details, including Read Type and Cycles Read, based on your experimental design.
   f. If using the MID de-duplication tool (which is strongly encouraged), uncheck the box(es) for Adapter Trimming and Use Adapter Trimming Read 2, as the MID de-duping tool will trim the adapter sequences as part of its script.
   g. Select Next.
5. In Sample Sheet Wizard - Sample Selection, on the right-hand pane, select Add Blank Row. To create a template, you can add 2 rows here and then modify the .csv file later to customize it for all the samples. Click the Maximize box in the upper right corner to view the sample sheet layout in full screen.
   a. Fill in a unique name for the Sample ID in each row.
   b. Select unique options from the drop-down menu for Index 1 (i7) and Index 2 (i5) for each row. The actual selections do not matter since this is a template. We will overwrite the data with the custom TAM-ChIP i7 and i5 information in the Excel file.
   c. If the information is entered correctly, the bottom left corner will show Sample Sheet Status: Valid.
   d. Select Finish to complete the Sample Sheet Template and create a .csv file. You can then view and modify the template in Excel.
6. Open the Sample Sheet Template in Excel. Do not modify any of the [Header], [Reads], or [Settings] fields. Only modify the content of the [Data] portion. If other fields need to be adjusted, create a new template using steps 1-5 above.
   a. Overwrite the Excel values for I7_Index_ID with the correct TAM-ChIP Index identification. Overwrite the i7 index sequence using the 11 bp sequence provided in Table 2: TAM-ChIP i7 Index.
   b. Overwrite the Excel values for I5_Index_ID with a unique name (e.g. Rabbit1, Rabbit2). Overwrite the i5 index sequence with the correct 8 bp antibody conjugate information provided in Table 3: Antibody i5 Index. Use the appropriate i5 index orientation (forward or reverse complement sequence) based on the recommendations for your instrument as shown in Table 1.
   c. Copy and paste additional rows to your Sample Sheet Template as needed for your sample run. Overwrite data to make sure each row contains unique values.
7. Once the edits to the template are completed, save it as a .csv file to your designated folder location in your run directory.
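When many samples are involved, the rows of the [Data] portion can be generated and checked programmatically instead of being edited by hand. The Python sketch below is one possible approach, not part of the Active Motif protocol. It assumes the column names Sample_ID, I7_Index_ID, index, I5_Index_ID, and index2, which are typical of IEM-generated sample sheets; a real template may contain further columns that must be kept. The sample names are hypothetical, and the length checks assume the standalone configuration (11 bp i7, 8 bp i5).

```python
import csv
import io

# Assumed subset of [Data] columns; check them against your own template.
DATA_COLUMNS = ["Sample_ID", "I7_Index_ID", "index", "I5_Index_ID", "index2"]


def check_rows(rows, i7_length=11, i5_length=8):
    """Return a list of problems found in the [Data] rows (empty if none)."""
    problems = []
    for column in ("Sample_ID", "I5_Index_ID"):
        values = [row[column] for row in rows]
        if len(values) != len(set(values)):
            problems.append(f"{column} values are not unique")
    pairs = [(row["index"], row["index2"]) for row in rows]
    if len(pairs) != len(set(pairs)):
        problems.append("an index/index2 combination is repeated")
    for row in rows:
        if len(row["index"]) != i7_length:
            problems.append(f"{row['Sample_ID']}: i7 is not {i7_length} bp")
        if len(row["index2"]) != i5_length:
            problems.append(f"{row['Sample_ID']}: i5 is not {i5_length} bp")
    return problems


def data_rows_csv(rows):
    """Render a column header plus rows as CSV text for the [Data] portion."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DATA_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two hypothetical anti-rabbit samples; i5 is given as the reverse complement.
rows = [
    {"Sample_ID": "Sample1", "I7_Index_ID": "TAM-ChIP Index 1",
     "index": "NNNNNNNNTCG", "I5_Index_ID": "Rabbit1", "index2": "CTCCTTAC"},
    {"Sample_ID": "Sample2", "I7_Index_ID": "TAM-ChIP Index 2",
     "index": "NNNNNNNNGAC", "I5_Index_ID": "Rabbit2", "index2": "CTCCTTAC"},
]

problems = check_rows(rows)
csv_text = data_rows_csv(rows)
```

An empty `problems` list corresponds to the "unique values in each row" requirement of step 6; it does not replace the validity check performed by IEM.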

Guidelines to Prepare FASTQ File for MID De-Duplication:

1. Prepare SampleSheet.csv as described above and run sequencing.

   Note: Remember to remove adapter sequences from the sample sheet to avoid trimming by bcl2fastq.

2. This protocol is based on using Illumina bcl2fastq Conversion Software v2.20. If using another version of the software, please contact Active Motif's technical support team at tech_service@activemotif.com to obtain the necessary changes to the protocol. Run bcl2fastq v2.20 to generate de-multiplexed FASTQ files for each sample.

    bcl2fastq \
      --use-bases-mask Y*,I3n*,I3n*,Y* \
      --minimum-trimmed-read-length 0 \
      --barcode-mismatches 0 \
      --mask-short-adapter-reads 0 \
      --no-lane-splitting \
      -R $nextseq_run_dir \
      -o $nextseq_run_dir/fastq \
      --interop-dir $nextseq_run_dir/interop \
      --reports-dir $nextseq_run_dir/reports \
      --stats-dir $nextseq_run_dir/reports/html

3. Remove or rename SampleSheet.csv and run bcl2fastq v2.20 again to generate four non-de-multiplexed FASTQ files, making sure to change the base mask and the paths to the output directories as shown below.

    bcl2fastq \
      --use-bases-mask Y*,Y*,Y*,Y* \
      --minimum-trimmed-read-length 0 \
      --barcode-mismatches 0 \
      --mask-short-adapter-reads 0 \
      --no-lane-splitting \
      -R $nextseq_run_dir \
      -o $nextseq_run_dir/fastq_concat \
      --interop-dir $nextseq_run_dir/interop_concat \
      --reports-dir $nextseq_run_dir/reports \
      --stats-dir $nextseq_run_dir/reports/html_concat

4. For each read (r1 and r2), combine three non-de-multiplexed FASTQ files (the read file plus the i7 and i5 index files) into a single FASTQ file. After Step 3 above, there should be two FASTQ files containing the r1 and r2 read sequence data, and two additional FASTQ files containing the i7 and i5 index sequence data, respectively. The first lines of these four FASTQ files will look similar to the example below:

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 1:N:0:0

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 2:N:0:0
    GATACTAGTAC
    AAAAAEEEEE6

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 3:N:0:0
    TATGCAGT
    AAAAAEEE

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 4:N:0:0

   Using simple programs (such as a Perl script), extract the sequence of each i7 and i5 index and write these into the end of the first line of the r1 and r2 read sequence FASTQ headers to create two new output FASTQ files that will look similar to the example below:

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 1:N:0:GATACTAGTACTATGCAGT

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 2:N:0:GATACTAGTACTATGCAGT

5. Reformat the FASTQ files for each sample to incorporate the MID DNA sequence into the FASTQ header. After Step 4 above, there should be two FASTQ files for each sample (read 1 & read 2), plus two FASTQ files for the entire non-de-multiplexed run (read 1 & read 2). The header line of the de-multiplexed FASTQ samples (created in Step 2) will look similar to the example below:

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 1:N:0:GATTAT

   The header line of the corresponding data from the non-de-multiplexed FASTQ file (created in Step 4) will look similar to the example below:

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054 1:N:0:GATACTAGTACTATGCAGT

   Note the difference at the end of the header line: the de-multiplexed version contains truncated i7 and i5 index sequences, whereas the non-de-multiplexed version contains them in full. Using simple programs (such as seqtk and Perl scripts), extract each de-multiplexed sample from the non-de-multiplexed data (thereby restoring the original non-truncated header line to the de-multiplexed data). Then modify each header to add the full i7 index sequence to the end of the SEQID with an underscore separator. The final result will look like the example below:

    @NS500375:446:HG3Y3BGX2:1:11101:8018:1054_GATACTAGTAC 1:N:0:GATACTAGTACTATGCAGT

   This is necessary because it preserves the molecular ID sequence in the QNAME field of the BAM file after mapping.

6. Trim adapters using the tool of your choice (example shown):

    trim_galore \
      --adapter $AdapterSeq \
      --adapter2 $AdapterSeq \
      --path_to_cutadapt $cutadapt_path/cutadapt \
      --paired \
      INPUT_r1.fastq \
      INPUT_r2.fastq

7. Map the FASTQ data to the reference genome to create a BAM file using the tool of your choice (e.g. BWA).
8. Sort the BAM file using the tool of your choice (e.g. SAMtools).
9. Run MID de-duping to mark duplicates in the BAM file. Please contact Active Motif Technical Support at 877-222-9543 or tech_service@activemotif.com to obtain the MID de-duping Perl script.
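The header manipulations in Steps 4 and 5 can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the script used by Active Motif: all function names are hypothetical, headers are assumed to have the `SEQID read:filtered:control:index` layout shown in the examples, and `restore_headers` keeps every header of the non-de-multiplexed run in memory, which may be impractical for a full run (seqtk or an indexed lookup would be alternatives).

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    iterator = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def seq_id(header):
    """Return the SEQID, i.e. the header text before the first space."""
    return header.split(" ", 1)[0]


def demultiplex_key(i7_read, i5_read, used_bases=3):
    """Index bases kept by the I3n* mask: first 3 of each index read."""
    return i7_read[:used_bases] + i5_read[:used_bases]


def with_index_field(header, i7_seq, i5_seq, read_number):
    """Step 4: write the i7 + i5 sequences into the last header field."""
    name, comment = header.split(" ", 1)
    _, is_filtered, control, _ = comment.split(":")
    return f"{name} {read_number}:{is_filtered}:{control}:{i7_seq}{i5_seq}"


def with_mid_in_seqid(header, i7_length=11):
    """Step 5: append the full i7 sequence to the SEQID after an underscore."""
    name, comment = header.split(" ", 1)
    index_field = comment.rsplit(":", 1)[1]
    return f"{name}_{index_field[:i7_length]} {comment}"


def restore_headers(sample_records, run_records):
    """Step 5: swap each de-multiplexed header for its full-length version."""
    full_headers = {seq_id(rec[0]): rec[0] for rec in run_records}
    for rec in sample_records:
        yield (with_mid_in_seqid(full_headers[seq_id(rec[0])]),) + rec[1:]


# Headers and index reads taken from the examples in Steps 4 and 5.
r2_header = "@NS500375:446:HG3Y3BGX2:1:11101:8018:1054 4:N:0:0"
step4_header = with_index_field(r2_header, "GATACTAGTAC", "TATGCAGT", read_number=2)
step5_header = with_mid_in_seqid(step4_header)
truncated_index = demultiplex_key("GATACTAGTAC", "TATGCAGT")
```

Because most aligners copy only the SEQID into the BAM QNAME field, moving the i7 sequence in front of the first space is what lets the MID survive mapping.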

Technical Services

If you need assistance at any time, please call Active Motif Technical Service at one of the numbers listed below.

Active Motif North America
1914 Palomar Oaks Way, Suite 150
Carlsbad, CA 92008 USA
Toll Free: 877 222 9543
Telephone: 760 431 1263
Fax: 760 431 1351
E-mail: tech_service@activemotif.com

Active Motif Europe
Avenue Reine Astrid, 92
B-1330 La Hulpe, Belgium
UK Free Phone: 0800 169 31 47
France Free Phone: 0800 90 99 79
Germany Free Phone: 0800 181 99 10
Telephone: 32 (0)2 653 0001
Fax: 32 (0)2 653 0050
E-mail: eurotech@activemotif.com

Active Motif Japan
Azuma Bldg, 7th Floor
2-21 Ageba-Cho, Shinjuku-Ku
Tokyo, 162-0824, Japan
Telephone: 81 3 5225 3638
Fax: 81 3 5261 8733
E-mail: japantech@activemotif.com

Active Motif China
787 Kangqiao Road
Building 10, Suite 202, Pudong District
Shanghai, 201315, China
Telephone: (86)-21-20926090
Hotline: 400-018-8123
E-mail: techchina@activemotif.com

Visit Active Motif on the worldwide web at http://www.activemotif.com

At this site:
   Read about who we are, where we are, and what we do
   Review data supporting our products and the latest updates
   Enter your name into our mailing list to receive our catalog, MotifVations newsletter and notification of our upcoming products
   Share your ideas and results with us
   View our job opportunities

Don't forget to bookmark our site for easy reference!