SAMtools: SAM/BAM, mapping, BAM sort & indexing (ex: IGV), SNP call


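SAMtools
http://samtools.sourceforge.net/

A toolkit for handling SAM/BAM files, the formats in which mapping results are stored.
SAM <-> BAM conversion.
BAM sort & indexing, needed before viewing the mapping (ex: IGV).
SNP calling from the mapping results.
SAMtools is a standard tool in NGS analysis.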

    Program: samtools (Tools for alignments in the SAM format)
    Version: 0.1.19

    Usage:   samtools <command> [options]

    Command: view        SAM<->BAM conversion
             sort        sort alignment file
             mpileup     multi-way pileup
             depth       compute the depth
             faidx       index/extract FASTA
             tview       text alignment viewer
             index       index alignment
             idxstats    BAM index stats
             fixmate     fix mate information
             flagstat    simple stats
             calmd       recalculate MD/NM tags and '=' bases
             merge       merge sorted alignments
             rmdup       remove PCR duplicates
             reheader    replace BAM header
             cat         concatenate BAMs
             bedcov      read depth per BED region
             targetcut   cut fosmid regions (for fosmid pool only)
             phase       phase heterozygotes
             bamshuf     shuffle and group alignments by name

samtools view

    Usage:   samtools view [options] <in.bam>|<in.sam> [region1 [...]]

    Options: -b       output BAM
             -h       print header for the SAM output
             -H       print header only (no alignments)
             -S       input is SAM
             -u       uncompressed BAM output (force -b)
             -x       output FLAG in HEX (samtools-c specific)
             -X       output FLAG in string (samtools-c specific)
             -c       print only the count of matching records
             -t FILE  list of reference names and lengths (force -S) [null]
             -T FILE  reference sequence file (force -S) [null]
             -o FILE  output file name [stdout]
             -R FILE  list of read groups to be outputted [null]
             -f INT   required flag, 0 for unset [0]
             -F INT   filtering flag, 0 for unset [0]
             -q INT   minimum mapping quality [0]
             -l STR   only output reads in library STR [null]
             -r STR   only output reads in read group STR [null]
             -?       longer help
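The -f, -F and -q options listed above select records by the FLAG (column 2) and MAPQ (column 5) fields of each SAM line. The following is a minimal Python sketch of that selection logic, not the samtools implementation itself: the function name `keep_record` and the three example records are hypothetical, made up only for illustration.

```python
def keep_record(sam_line, required=0, excluded=0, min_mapq=0):
    """Mimic the record selection of `samtools view -f/-F/-q` on one SAM line."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: FLAG
    mapq = int(fields[4])  # column 5: MAPQ
    if (flag & required) != required:  # -f: every requested bit must be set
        return False
    if flag & excluded:                # -F: none of these bits may be set
        return False
    return mapq >= min_mapq            # -q: minimum mapping quality


# Hypothetical alignment records (not taken from ex1.sam).
records = [
    "r1\t99\tgn:buc\t100\t60\t5M\t=\t200\t105\tACGTA\tHHHHH",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tHHHHH",
    "r3\t83\tgn:buc\t300\t10\t5M\t=\t150\t-155\tACGTA\tHHHHH",
]

# Like `samtools view -F 4`: drop unmapped reads.
mapped = [r.split("\t")[0] for r in records if keep_record(r, excluded=4)]

# Like `samtools view -f 2 -q 20`: properly paired reads with MAPQ >= 20.
confident = [r.split("\t")[0] for r in records
             if keep_record(r, required=2, min_mapq=20)]

print(mapped)
print(confident)
```

On these three records the first filter keeps r1 and r3, and the second keeps only r1, because r3 has MAPQ 10.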

Q1. less ex1.sam
Q2. less ex1.bam
Q3. samtools
Q4. samtools view (run without arguments to see the usage of samtools view)
Q5. samtools view ex1.bam
Q6. Use samtools view to convert ex1.sam to BAM, save it as ex1_myself.bam, and compare it with ex1.bam.
Q7. ls
*Q8. samtools view -f on ex1_myself.bam

BAM index

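Q1. samtools index (run without arguments to see the usage of samtools index)
Q2. Index ex1_myself.bam. Indexing requires a sorted BAM.
Q3. Sort ex1_myself.bam with samtools.
Q4. Index the sorted BAM as in Q2.
*Q5. Extract the alignments in gn:buc 1000-1200:

    > samtools view file_sorted.bam gn:buc:1000-1200

FLAG (column 2 of SAM)

Each bit of the FLAG answers a Yes/No question as 1/0. The resulting binary number is written in decimal, for example 000001010011 = 83.

    bit   value  meaning
    2^0       1  read is paired in sequencing
    2^1       2  read is mapped in a proper pair
    2^2       4  read is unmapped
    2^3       8  mate read is unmapped
    2^4      16  read is mapped to the reverse strand
    2^5      32  mate read is mapped to the reverse strand
    2^6      64  Read1 (first in pair)
    2^7     128  Read2 (second in pair)

Example: 83 = 64 + 16 + 2 + 1, so a read with FLAG 83 is Read1 of a properly mapped pair and is mapped to the reverse strand.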

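FLAG values of properly paired reads relative to the reference (ref):

    binary    FLAG  read   strand
    01100011    99  Read1  forward (mate on reverse)
    10010011   147  Read2  reverse
    01010011    83  Read1  reverse
    10100011   163  Read2  forward (mate on reverse)

*Q6.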
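The FLAG arithmetic above can be checked mechanically. The sketch below decodes a FLAG value into its lowest eight bits, using the bit meanings defined in the SAM format specification; the names `FLAG_BITS` and `decode_flag` are illustrative choices, not part of SAMtools.

```python
# Lowest eight FLAG bits as defined in the SAM format specification.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "properly paired"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read on reverse strand"),
    (0x20, "mate on reverse strand"),
    (0x40, "first in pair (Read1)"),
    (0x80, "second in pair (Read2)"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# The four FLAG values discussed in the text.
for flag in (99, 147, 83, 163):
    print(flag, format(flag, "08b"), ", ".join(decode_flag(flag)))
```

Running it prints the binary form of each FLAG next to the list of set bits, so 83 shows up as 01010011 with the Read1 and reverse-strand bits set.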

flagstat, depth

flagstat: collect some statistics about the alignment

    $ samtools flagstat NA12878.chr16p.bam
    2253834 + 0 in total (QC-passed reads + QC-failed reads)
    131828 + 0 duplicates
    2175422 + 0 mapped (96.52%:nan%)
    1907026 + 0 paired in sequencing
    953675 + 0 read1
    953351 + 0 read2
    1589213 + 0 properly paired (83.33%:nan%)
    1750199 + 0 with itself and mate mapped
    78415 + 0 singletons (4.11%:nan%)
    47076 + 0 with mate mapped to a different chr
    27432 + 0 with mate mapped to a different chr (mapq>=5)

depth: compute the per-base coverage (depth)

    $ samtools depth NA12878.chr16p.bam | head
    16  47999937  1
    16  47999938  1

Q1. samtools flagstat, samtools depth (input: ex1_myself.sort.bam)
Q2. Run flagstat on ex1_myself.bam.
Q3. Run depth on ex1_myself.bam.
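The three-column output of samtools depth (reference name, position, depth) is easy to summarize in a script. The following is a small sketch, assuming the output has been read into a list of text lines; `mean_depth` is a hypothetical helper name, and the average is taken only over the positions that appear in the output, since positions with zero coverage are not reported by default.

```python
def mean_depth(lines):
    """Average the depth column over all positions reported by `samtools depth`."""
    total = 0
    positions = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        _chrom, _pos, depth = line.split()[:3]
        total += int(depth)
        positions += 1
    return total / positions if positions else 0.0


# The two lines shown in the depth example above.
sample = [
    "16\t47999937\t1",
    "16\t47999938\t1",
]
print(mean_depth(sample))
```

In practice the lines would come from a file or a pipe, for example by saving the depth output to a text file and reading it with `open(...)`.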

mpileup

    Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

    Input options:
           -6           assume the quality is in the Illumina-1.3+ encoding
           -A           count anomalous read pairs
           -B           disable BAQ computation
           -b FILE      list of input BAM files [null]
           -C INT       parameter for adjusting mapq; 0 to disable [0]
           -d INT       max per-bam depth to avoid excessive memory usage [250]
           -E           extended BAQ for higher sensitivity but lower specificity
           -f FILE      faidx indexed reference sequence file [null]
           -G FILE      exclude read groups listed in FILE [null]
           -l FILE      list of positions (chr pos) or regions (BED) [null]
           -M INT       cap mapping quality at INT [60]
           -r STR       region in which pileup is generated [null]
           -R           ignore RG tags
           -q INT       skip alignments with mapq smaller than INT [0]
           -Q INT       skip bases with baseq/baq smaller than INT [13]

    Output options:
           -D           output per-sample DP in BCF (require -g/-u)
           -g           generate BCF output (genotype likelihoods)
           -O           output base positions on reads (disabled by -g/-u)
           -s           output mapping quality (disabled by -g/-u)
           -S           output per-sample strand bias P-value in BCF (require -g/-u)
           -u           generate uncompress BCF output

    SNP/INDEL genotype likelihoods options (effective with `-g' or `-u'):
           -e INT       Phred-scaled gap extension seq error probability [20]
           -F FLOAT     minimum fraction of gapped reads for candidates [0.002]
           -h INT       coefficient for homopolymer errors [100]
           -I           do not perform indel calling
           -L INT       max per-sample depth for INDEL calling [250]
           -m INT       minimum gapped reads for indel candidates [1]
           -o INT       Phred-scaled gap open sequencing error probability [40]
           -P STR       comma separated list of platforms for indels [all]

    Notes: Assuming diploid individuals.

    > samtools mpileup ex1_myself.bam
    gn:buc  69656  N  32  t$t$tttttttttttttttttttttttttttttt  HHEHFGIDHFCH?15HHHGHIH
    gn:buc  69657  N  30  tttttttttttttttttttttttttttttt  EHHFG@HGDHF)BHHHHHGHDHEHHHHHEG
    gn:buc  69658  N  30  tttttttttttttttttttttttttttttt  DHGGGBH?DHF1CHHHFHGHGHFHHHHGEF
    gn:buc  69659  N  30  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  EHFCFBHFDHD;5FHHGHEHDF?HHHHGDF
    gn:buc  69660  N  30  tttttttttttttttttttttttttttttt  6HDG=BHFDCE+;BHBFH?H?FFHGHHHEB
    gn:buc  69661  N  30  cccccccccccccccccccccccccccccc  GGFEG8HGGHH,;HHHFHGHEGEHGHHHFH
    gn:buc  69662  N  30  cccccccccccccccccccccacccccccc  EHEEH8HHGEE0/HDHFHDHA?FHHHHHGH
    gn:buc  69663  N  30  c$ccccccccccccccccccccccccccccc  EHGEH<HGFHHA=HHHCHDF<FFHHHHHEH
    gn:buc  69664  N  29  ccccccccccccccccccccccccccccc  HFCH<HFFGHCEHGHGHCHAHFHHHHH@H
    gn:buc  69665  N  29  t$tttttttttttttttttttttttttttt  HHEE5HDEEFFEHHHFHFHF?DHHHGHFH
    gn:buc  69666  N  28  CCccCCCcccccCCCCcCCcCccCCCcc  HFH5H?EHHG;HHHEHGHFF?HHHHHBH
    gn:buc  69667  N  28  CCccCCCcccccCCCCcCCcCccCCCcc  HEH<HEEHHAEHHHGHGHGHEHHHHHGH
    gn:buc  69668  N  28  AAaaAAAaaaaaAAAAaAAaAaaAAAaa  HEH5G<DHHDFHGGGGGHFHCGHHHHEH
    gn:buc  69669  N  28  A$AaaAAAaaaaaAAAAaAAaAaaAAAaa  EBH:HEEGHDBHHHHHGHFHFHHHHHGH
    gn:buc  69670  N  27  T$ttTTTtttttTTTTtTTtTttTTTtt  <H>HEEHF4EHHHHHHFGHFHHHHHFH
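The last column of each pileup line holds one base quality per read, written as an ASCII character with an offset of 33 (Phred+33). A minimal sketch of the decoding follows, applied to the quality string of position 69657 above; the function name `phred33` and the variable names are illustrative only.

```python
def phred33(qual_string):
    """Convert an ASCII quality string (Phred+33) into integer Phred scores."""
    return [ord(c) - 33 for c in qual_string]


# Quality column of the pileup line for gn:buc 69657 (depth 30).
quals = phred33("EHHFG@HGDHF)BHHHHHGHDHEHHHHHEG")

# mpileup skips bases with base quality below -Q (default 13).
usable = [q for q in quals if q >= 13]

print(len(quals), min(quals), max(quals), len(usable))
```

Here 'H' decodes to 39 and ')' to 8, so one of the 30 bases at this position falls below the default -Q threshold of 13.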

Q1. Run mpileup with the reference sequence buc.genome.fasta.
Q2. Create an index of the fasta file with samtools faidx, then run Q1 again with the indexed reference.

    gn:buc  49461  A  28  ,$,..,,.,.,,...,..,..,.,  AHHBHHHHHHEHHHDHHHDFHHHHHE<C
    gn:buc  49462  G  27  ,..,,.,.,,...,..,..,.,  HHEGHHHHHEHHHGFHDEFHGGHHE8<
    gn:buc  49463  A  28  ,..,,.,.,,...,..,..,.,^K,  HHEHHHHHH>HHHHHHGCEHHHHH@<<<
    gn:buc  49464  A  28  ,..,,.,.,,...,..,..,.,,  HG6HHHHHH:HHHHHHHDFHHGHHE>C=
    gn:buc  49465  A  28  ,..,,.,.,,...,..,..,.,,  HEEGHHHHH@HHHEHHHDEHHGHGE6@>
    gn:buc  49466  A  29  ,..,,.,.,,...,..,..,.,,^K.  HHEHHHHHH*HHHGHHHDFHHGDGE7/>8
    gn:buc  49467  A  29  ,..,,.,.,,...,..,..,.,,.  HEEHHHHHH@HHHEHHHD?HHGEH<6??8
    gn:buc  49468  A  29  ,..,,.,.,,...,..,..,.,,.  FHEFHHHGH8HHGHGHHDDEHFDH70CA9
    gn:buc  49469  G  29  ccccccccccccccccccccccccccccc  HFEHHHHHHEHHHHHHHCDEHBEH@;DB;
    gn:buc  49470  A  29  ,..,,.,.,,...,..,..,.,,.

mpileup -> bcftools

SAMtools and BCFtools together work as a variant caller: the output of mpileup is passed to BCFtools, which calls the variants and writes them as VCF.
http://samtools.sourceforge.net/mpileup.shtml
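When a reference is given, the read-base column uses '.' and ',' for a match on the forward and reverse strand, a letter for a mismatch, '^' plus one character at the start of a read, '$' at the end of a read, and '+N...' or '-N...' for indels. The sketch below counts the bases at one position under those rules. It is an illustration of the pileup notation, not a variant caller; `count_bases` and the first example string are hypothetical.

```python
import re
from collections import Counter


def count_bases(ref, bases):
    """Count the observed bases in the read-base column of one pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # read start: the next character encodes mapping quality
            i += 2
        elif c == "$":      # read end marker carries no base
            i += 1
        elif c in "+-":     # indel: sign, length, then that many inserted/deleted bases
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if c in ".,":   # match to the reference on either strand
                counts[ref.upper()] += 1
            else:           # mismatch letter, or '*' for a deleted base
                counts[c.upper()] += 1
            i += 1
    return dict(counts)


# Hypothetical read-base string exercising every kind of symbol.
print(count_bases("A", "..,^K.$cC+2AG,-1t*"))

# Position 49469 above: reference G, 29 reverse-strand reads all showing c.
print(count_bases("G", "c" * 29))
```

A position such as 49469, where every read disagrees with the reference, is the kind of site that the mpileup -> bcftools step would be expected to report as a variant.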

Q1. Create a vcf from ex1_myself.bam.
Q2. Look at the vcf with less.

Other SAMtools commands

    tview     text alignment viewer
    fixmate   fix mate information
    merge     merge sorted alignments (merges BAM files)
    rmdup     remove PCR duplicates

Q1. Run rmdup on ex1_myself.sort.bam.