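SAMtools
http://samtools.sourceforge.net/

A toolkit for handling mapping results in SAM/BAM format.
  BAM is the binary form of SAM; SAMtools converts between SAM and BAM.
  BAM sort & indexing, needed by viewers (ex: IGV).
  SNP calling from the mapping results.
SAMtools is a basic tool for NGS analysis.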
Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19

Usage:   samtools <command> [options]

Command: view        SAM<->BAM conversion
         sort        sort alignment file
         mpileup     multi-way pileup
         depth       compute the depth
         faidx       index/extract FASTA
         tview       text alignment viewer
         index       index alignment
         idxstats    BAM index stats
         fixmate     fix mate information
         flagstat    simple stats
         calmd       recalculate MD/NM tags and '=' bases
         merge       merge sorted alignments
         rmdup       remove PCR duplicates
         reheader    replace BAM header
         cat         concatenate BAMs
         bedcov      read depth per BED region
         targetcut   cut fosmid regions (for fosmid pool only)
         phase       phase heterozygotes
         bamshuf     shuffle and group alignments by name

samtools view

Usage:   samtools view [options] <in.bam>|<in.sam> [region1 [...]]

Options: -b       output BAM
         -h       print header for the SAM output
         -H       print header only (no alignments)
         -S       input is SAM
         -u       uncompressed BAM output (force -b)
         -x       output FLAG in HEX (samtools-C specific)
         -X       output FLAG in string (samtools-C specific)
         -c       print only the count of matching records
         -t FILE  list of reference names and lengths (force -S) [null]
         -T FILE  reference sequence file (force -S) [null]
         -o FILE  output file name [stdout]
         -R FILE  list of read groups to be outputted [null]
         -f INT   required flag, 0 for unset [0]
         -F INT   filtering flag, 0 for unset [0]
         -q INT   minimum mapping quality [0]
         -l STR   only output reads in library STR [null]
         -r STR   only output reads in read group STR [null]
         -?       longer help
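Exercises: samtools view

Q1. Open ex1.sam with less (less ex1.sam).
Q2. Open ex1.bam with less (less ex1.bam).
Q3. Run samtools with no arguments.
Q4. Run samtools view with no arguments to see the usage of samtools view.
Q5. Look at ex1.bam with samtools view (samtools view ex1.bam).
Q6. Use samtools view to convert ex1.sam to bam, save it as ex1_myself.bam, and compare it with ex1.bam.
Q7. Check the files with ls.
*Q8. Use the -f option of samtools view on ex1_myself.bam.

BAM index

An index gives fast access to a region of a BAM file.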
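Q1. Run samtools index with no arguments to see the usage of samtools index.
Q2. Index ex1_myself.bam. (Indexing requires a sorted bam.)
Q3. Sort ex1_myself.bam with samtools.
Q4. Index the sorted bam, as in Q2.
*Q5. Extract the alignments in gn:buc 1000-1200.
     > samtools view file_sorted.bam gn:buc:1000-1200

SAM flag (column 2 of sam)

Each bit answers a Yes/No question as 1/0. The binary number is written in decimal:
  000001010011 (binary) = 83 (decimal)

  2^0  read is paired in sequencing
  2^1  read is mapped in a proper pair
  2^2  read is unmapped
  2^3  mate is unmapped
  2^4  read is on the reverse strand
  2^5  mate is on the reverse strand
  2^6  Read1 (first read of the pair)
  2^7  Read2 (second read of the pair)

Example: 83 = 64 + 16 + 2 + 1, so the read is Read1, mapped on the reverse strand, in a proper pair.

Flags of properly paired reads mapped to the ref:
  Read1 forward, Read2 reverse:  Read1 01100011 = 99,  Read2 10010011 = 147
  Read1 reverse, Read2 forward:  Read1 01010011 = 83,  Read2 10100011 = 163

*Q6.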
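The flag arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of SAMtools: the bit meanings follow the FLAG field of the SAM specification, while the short label names and the helper functions `decode_flag` and `flag_binary` are made up for this illustration.

```python
# Bit values from the SAM specification; the label strings are hypothetical.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def flag_binary(flag, width=8):
    """Return the flag as a zero-padded binary string."""
    return format(flag, "0{}b".format(width))


# The four flags of a properly paired Read1/Read2.
for value in (99, 147, 83, 163):
    print(value, flag_binary(value), ",".join(decode_flag(value)))
```

The same decoding also explains the -f and -F options of samtools view: -f keeps reads in which all the given bits are set, and -F drops reads in which any of them is set.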
flagstat, depth

flagstat: Collect some statistics about alignment

$ samtools flagstat NA12878.chr16p.bam
2253834 + 0 in total (QC-passed reads + QC-failed reads)
131828 + 0 duplicates
2175422 + 0 mapped (96.52%:nan%)
1907026 + 0 paired in sequencing
953675 + 0 read1
953351 + 0 read2
1589213 + 0 properly paired (83.33%:nan%)
1750199 + 0 with itself and mate mapped
78415 + 0 singletons (4.11%:nan%)
47076 + 0 with mate mapped to a different chr
27432 + 0 with mate mapped to a different chr (mapq>=5)

depth: compute the depth
Prints the coverage (depth) at each base position.

$ samtools depth NA12878.chr16p.bam | head
16      47999937        1
16      47999938        1

Q1. Check the usage of samtools flagstat and samtools depth (use ex1_myself.sort.bam).
Q2. Run flagstat on ex1_myself.bam.
Q3. Run depth on ex1_myself.bam.
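To see where the percentages in the flagstat output come from, the numbers can be recomputed by hand. The sketch below parses the flagstat text shown above and divides the counts. From the arithmetic, "mapped" appears to be taken relative to the total, and "properly paired" and "singletons" relative to "paired in sequencing". The function names (`parse_flagstat`, `percent`, `mean_depth`) are hypothetical helpers, not SAMtools commands.

```python
import re

FLAGSTAT_TEXT = """\
2253834 + 0 in total (QC-passed reads + QC-failed reads)
131828 + 0 duplicates
2175422 + 0 mapped (96.52%:nan%)
1907026 + 0 paired in sequencing
953675 + 0 read1
953351 + 0 read2
1589213 + 0 properly paired (83.33%:nan%)
1750199 + 0 with itself and mate mapped
78415 + 0 singletons (4.11%:nan%)
47076 + 0 with mate mapped to a different chr
27432 + 0 with mate mapped to a different chr (mapq>=5)
"""

DEPTH_TEXT = """\
16 47999937 1
16 47999938 1
"""


def parse_flagstat(text):
    """Map each flagstat label to its QC-passed read count."""
    counts = {}
    for line in text.strip().splitlines():
        passed, _plus, _failed, label = line.split(None, 3)
        # Drop a trailing percentage such as "(96.52%:nan%)" from the label.
        label = re.sub(r"\s*\([^()]*%\)$", "", label)
        counts[label] = int(passed)
    return counts


def percent(part, whole):
    """Percentage rounded to two decimals, as flagstat prints it."""
    return round(100.0 * part / whole, 2)


def mean_depth(text):
    """Mean of the third column (depth) over the positions that are listed."""
    depths = [int(line.split()[2]) for line in text.strip().splitlines()]
    return sum(depths) / len(depths)


counts = parse_flagstat(FLAGSTAT_TEXT)
total = counts["in total (QC-passed reads + QC-failed reads)"]
paired = counts["paired in sequencing"]

print("mapped %:", percent(counts["mapped"], total))
print("properly paired %:", percent(counts["properly paired"], paired))
print("singletons %:", percent(counts["singletons"], paired))
print("mean depth:", mean_depth(DEPTH_TEXT))
```

Note that samtools depth by default lists only positions covered by at least one read, so a mean computed this way is the mean over covered positions, not over the whole reference.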
mpileup

Usage: samtools mpileup [options] in1.bam [in2.bam [...]]

Input options:

       -6           assume the quality is in the Illumina-1.3+ encoding
       -A           count anomalous read pairs
       -B           disable BAQ computation
       -b FILE      list of input BAM files [null]
       -C INT       parameter for adjusting mapQ; 0 to disable [0]
       -d INT       max per-BAM depth to avoid excessive memory usage [250]
       -E           extended BAQ for higher sensitivity but lower specificity
       -f FILE      faidx indexed reference sequence file [null]
       -G FILE      exclude read groups listed in FILE [null]
       -l FILE      list of positions (chr pos) or regions (BED) [null]
       -M INT       cap mapping quality at INT [60]
       -r STR       region in which pileup is generated [null]
       -R           ignore RG tags
       -q INT       skip alignments with mapQ smaller than INT [0]
       -Q INT       skip bases with baseQ/BAQ smaller than INT [13]

Output options:

       -D           output per-sample DP in BCF (require -g/-u)
       -g           generate BCF output (genotype likelihoods)
       -O           output base positions on reads (disabled by -g/-u)
       -s           output mapping quality (disabled by -g/-u)
       -S           output per-sample strand bias P-value in BCF (require -g/-u)
       -u           generate uncompress BCF output

SNP/INDEL genotype likelihoods options (effective with `-g' or `-u'):

       -e INT       Phred-scaled gap extension seq error probability [20]
       -F FLOAT     minimum fraction of gapped reads for candidates [0.002]
       -h INT       coefficient for homopolymer errors [100]
       -I           do not perform indel calling
       -L INT       max per-sample depth for INDEL calling [250]
       -m INT       minimum gapped reads for indel candidates [1]
       -o INT       Phred-scaled gap open sequencing error probability [40]
       -P STR       comma separated list of platforms for indels [all]

Notes: Assuming diploid individuals.
> samtools mpileup ex1_myself.bam
gn:buc 69656 N 32 t$t$tttttttttttttttttttttttttttttt HHEHFGIDHFCH?15HHHGHIH
gn:buc 69657 N 30 tttttttttttttttttttttttttttttt EHHFG@HGDHF)BHHHHHGHDHEHHHHHEG
gn:buc 69658 N 30 tttttttttttttttttttttttttttttt DHGGGBH?DHF1CHHHFHGHGHFHHHHGEF
gn:buc 69659 N 30 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa EHFCFBHFDHD;5FHHGHEHDF?HHHHGDF
gn:buc 69660 N 30 tttttttttttttttttttttttttttttt 6HDG=BHFDCE+;BHBFH?H?FFHGHHHEB
gn:buc 69661 N 30 cccccccccccccccccccccccccccccc GGFEG8HGGHH,;HHHFHGHEGEHGHHHFH
gn:buc 69662 N 30 cccccccccccccccccccccacccccccc EHEEH8HHGEE0/HDHFHDHA?FHHHHHGH
gn:buc 69663 N 30 c$ccccccccccccccccccccccccccccc EHGEH<HGFHHA=HHHCHDF<FFHHHHHEH
gn:buc 69664 N 29 ccccccccccccccccccccccccccccc HFCH<HFFGHCEHGHGHCHAHFHHHHH@H
gn:buc 69665 N 29 t$tttttttttttttttttttttttttttt HHEE5HDEEFFEHHHFHFHF?DHHHGHFH
gn:buc 69666 N 28 CCccCCCcccccCCCCcCCcCccCCCcc HFH5H?EHHG;HHHEHGHFF?HHHHHBH
gn:buc 69667 N 28 CCccCCCcccccCCCCcCCcCccCCCcc HEH<HEEHHAEHHHGHGHGHEHHHHHGH
gn:buc 69668 N 28 AAaaAAAaaaaaAAAAaAAaAaaAAAaa HEH5G<DHHDFHGGGGGHFHCGHHHHEH
gn:buc 69669 N 28 A$AaaAAAaaaaaAAAAaAAaAaaAAAaa EBH:HEEGHDBHHHHHGHFHFHHHHHGH
gn:buc 69670 N 27 T$ttTTTtttttTTTTtTTtTttTTTtt <H>HEEHF4EHHHHHHFGHFHHHHHFH
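Each mpileup line holds the chromosome, position, reference base, depth, read bases and base qualities. The read-base column mixes base letters with markers: an uppercase letter is a base on the forward strand, a lowercase letter a base on the reverse strand, `$` marks the end of a read, `^` followed by one character marks the start of a read, and when a reference is given `.` and `,` stand for a match on the forward and reverse strand. The following is a rough sketch of how that column could be tallied. `count_pileup_bases` is a hypothetical helper written for this illustration, and it does not treat every symbol of the pileup format specially.

```python
from collections import Counter


def count_pileup_bases(bases, ref="N"):
    """Count (base, strand) pairs in the read-base column of one pileup line."""
    counts = Counter()
    ref = ref.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Start of a read: skip the marker and the mapping quality character.
            i += 2
        elif ch == "$":
            # End of a read: marker only, no base.
            i += 1
        elif ch in "+-":
            # Insertion or deletion: "+2AG" or "-1T"; skip the length and sequence.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j]) if j > i + 1 else 0
            i = j + length
        else:
            if ch == ".":
                counts[(ref, "+")] += 1       # match, forward strand
            elif ch == ",":
                counts[(ref, "-")] += 1       # match, reverse strand
            elif ch == "*":
                counts[("*", None)] += 1      # placeholder for a deleted base
            elif ch.isupper():
                counts[(ch, "+")] += 1        # base on the forward strand
            else:
                counts[(ch.upper(), "-")] += 1  # base on the reverse strand
            i += 1
    return counts


# Read-base column of position 69670 from the output above (depth 27).
counts = count_pileup_bases("T$ttTTTtttttTTTTtTTtTttTTTtt")
print(sum(counts.values()), dict(counts))
```

The total of the counts should agree with the depth column; comparing forward and reverse counts per base is a quick way to spot strand bias at a position.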
Q1. Run mpileup with buc.genome.fasta as the reference.
Q2. The reference needs an index: use samtools faidx to index the fasta, then run Q1 again with the indexed reference.

gn:buc 49461 A 28 ,$,..,,.,.,,...,..,..,., AHHBHHHHHHEHHHDHHHDFHHHHHE<C
gn:buc 49462 G 27 ,..,,.,.,,...,..,..,., HHEGHHHHHEHHHGFHDEFHGGHHE8<
gn:buc 49463 A 28 ,..,,.,.,,...,..,..,.,^K, HHEHHHHHH>HHHHHHGCEHHHHH@<<<
gn:buc 49464 A 28 ,..,,.,.,,...,..,..,.,, HG6HHHHHH:HHHHHHHDFHHGHHE>C=
gn:buc 49465 A 28 ,..,,.,.,,...,..,..,.,, HEEGHHHHH@HHHEHHHDEHHGHGE6@>
gn:buc 49466 A 29 ,..,,.,.,,...,..,..,.,,^K. HHEHHHHHH*HHHGHHHDFHHGDGE7/>8
gn:buc 49467 A 29 ,..,,.,.,,...,..,..,.,,. HEEHHHHHH@HHHEHHHD?HHGEH<6??8
gn:buc 49468 A 29 ,..,,.,.,,...,..,..,.,,. FHEFHHHGH8HHGHGHHDDEHFDH70CA9
gn:buc 49469 G 29 ccccccccccccccccccccccccccccc HFEHHHHHHEHHHHHHHCDEHBEH@;DB;
gn:buc 49470 A 29 ,..,,.,.,,...,..,..,.,,.

mpileup -> bcftools

SAMtools and BCFtools are used together as a variant caller: the output of mpileup is passed to BCFtools, which calls the variants and writes them as vcf.
http://samtools.sourceforge.net/mpileup.shtml
Q1. Create a vcf from ex1_myself.bam.
Q2. Look at the vcf with less.

Other SAMtools commands

tview    text alignment viewer; a viewer for the alignments.
fixmate  fix mate information.
merge    merge sorted alignments; merges several BAM files.
rmdup    remove PCR duplicates.
Q1. Run rmdup on ex1_myself.sort.bam.

SAMtools and NGS