

BSseq data analysis

Adriaan van der Graaf*

* Groningen BioInformatics Center, Rijksuniversiteit Groningen, Groningen, the Netherlands

Contents

1 Introduction
2 Steps performed before the practical
    2.1 Isolation of DNA
    2.2 Sequencing library creation
    2.3 Bisulfite conversion and amplification
    2.4 Sequencing
3 The practical
    3.1 Data preprocessing
    3.2 Alignment to the genome
    3.3 Steps not performed in the practical
    3.4 Initial data for methylome construction
    3.5 Determine conversion rate
    3.6 Calculate p values per cytosine position
    3.7 Determination of p-value cutoff using the false discovery rate
    3.8 Creation of a methylation map
    3.9 Further analysis
4 Conclusion

1 Introduction

This practical provides an overview of how methylcytosines are detected and analyzed using bisulfite sequencing (BSseq). BSseq is the gold standard for the detection of cytosine methylation. Methylation maps with single base pair resolution were first created in 2008 using BSseq.

BSseq relies on the chemical conversion of cytosine into uracil under the influence of sodium bisulfite, and on the absence of this reaction when the cytosine is methylated (5-methylcytosine). After conversion, the DNA can be sequenced with a sequencing method of choice. However, special steps need to be taken during library construction (the use of methylated adapters) and during alignment to the genome.

The goal of this practical is for the user to create and analyse a methylation map based on bisulfite sequencing data. The manual provides a method for the analysis of bisulfite-treated data, from sample to full methylation map. Because time in the practical is limited, the participant starts with a subset of the sequencing data to perform the preprocessing and alignment steps. After that, the full set of aligned data is provided to create a methylation map and to do some basic analysis.

2 Steps performed before the practical

In this practical we start out with raw sequencing data. The steps taken beforehand to obtain these data are described in this section. Although these steps are informative to perform, they are very time- and resource-consuming, so given the limitations of the practical we will not perform them here.

2.1 Isolation of DNA

DNA needs to be isolated before sequencing. Because the bisulfite treatment in later steps degrades DNA, a large sample is used. In our case Arabidopsis thaliana was grown for 21 days, leaf tissue was harvested and flash frozen, and DNA was isolated using commercially available kits.

2.2 Sequencing library creation

The Illumina platform used for sequencing the samples requires adapters to be ligated to DNA fragments of a specific length. To make sure these adapters retain their sequence during bisulfite treatment, they are synthesized with methylated cytosines, which ensures proper binding to the flow cell. The DNA is fragmented (in our case by sonication) so that the average fragment length is slightly longer than the eventual number of sequencing cycles. Afterwards, the methylated adapters are ligated to the DNA, resulting in sequencing libraries.

2.3 Bisulfite conversion and amplification

The DNA libraries are bisulfite treated, which converts cytosines into uracils, while methylated cytosines are not converted. After conversion, the libraries are PCR amplified using the PCR primer sites integrated in the adapters. During PCR amplification the uracils base pair with adenines and are therefore amplified as thymines; in subsequent cycles these thymines again base pair with adenines.

2.4 Sequencing

Sequencing is done using the normal protocols available from standard vendors. However, because of the bisulfite conversion, the sequenced reads will be low in GC content when the quality of the .fastq files is assessed. Reads are stored in the .fastq format. The sequencing in this practical was performed on an Illumina instrument using single-end sequencing, with 101 bp of sequencing data per read.
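The effect of sections 2.3 and 2.4 on a single strand can be illustrated with a minimal Python sketch. The function name and the toy sequence are hypothetical and only serve as illustration; the sketch assumes perfect conversion, which real experiments do not reach (see the conversion rate in section 3.5).

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence read after bisulfite treatment and PCR.

    Unmethylated C becomes U, which is amplified and read as T.
    Methylated C (positions given as 0-based indices) stays C.
    Assumes 100% conversion efficiency (illustration only).
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


original = "ACGTCCGA"
# Only the cytosine at index 1 is methylated in this toy example.
converted = bisulfite_convert(original, methylated_positions=[1])
print(original, "->", converted)
```

Only the methylated cytosine survives as C, which is why converted reads are depleted in cytosines.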

3 The practical

Introduction

This section takes a closer look at the whole data analysis pipeline, from raw sequencing data to a full methylation map in Arabidopsis. The participant will use a subset of the .fastq data to walk through preprocessing and alignment (copy removal is described but not performed). Afterwards, fully processed aligned data are used to create a methylation map.

Implementation

Before we start, please log in to the popeye server using ssh with the credentials provided by the workshop organization. In your home directory, use the following bash command to create the folders we will use in later steps:

    mkdir temp align TRIM_CUT

Now copy the raw .fastq (subset) data to your directory using the cp command:

    cp /home/allbio-ba-2014/line_69_rep1_subset.fastq line_69_rep1.fastq

The main .fastq file is now in your directory. You can look at the structure of the .fastq file with the head command:

    head line_69_rep1.fastq

Please observe that .fastq files are composed of records of 4 lines. The first line is an identifier starting with the @ character, the second line is the sequence read, the third line is a delimiter (+), optionally followed by the identifier again, and the fourth line contains the quality data as Phred scores, a logarithmic scale of the base-calling error probability. If you'd like to know more about Phred scores, there is a good Wikipedia article outlining the specifics.
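The 4-line record structure and the Phred scale can be made concrete with a small Python sketch. It assumes Phred+33 quality encoding; older Illumina pipelines used an offset of 64, so check which applies to your data. The function names and the example record are hypothetical.

```python
import io


def read_fastq(handle, offset=33):
    """Yield (identifier, sequence, phred_scores) per 4-line FASTQ record.

    offset=33 (Phred+33) is an assumption about the quality encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' delimiter line
        quality = handle.readline().rstrip("\n")
        phred = [ord(ch) - offset for ch in quality]
        yield header[1:], sequence, phred  # header[1:] drops the '@'


def error_probability(q):
    """Convert a Phred score into a base-calling error probability."""
    return 10 ** (-q / 10)


example = io.StringIO("@read_1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
for identifier, sequence, phred in records:
    print(identifier, sequence, phred)
```

A Phred score of 20 corresponds to an error probability of 1 in 100, and a score of 30 to 1 in 1000.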

3.1 Data preprocessing

Introduction

Raw sequencing reads contain some contamination caused by the bisulfite treatment, adapters and sequencing errors. This contamination is removed from the raw sequencing data in two steps: removal of low-quality bases and removal of adapter sequences. There are multiple publicly available software packages that can do this. The data for this practical were preprocessed using the publicly available cutadapt tool, with a quality cutoff of 5 and with adapters removed from the 3' end of the sequence (the fastq format is always written 5' to 3', so adapters are removed from the end of the sequence). Reads shorter than 20 bp were removed from further analysis.

Implementation

Cutadapt is already installed on the server. All you need to do is run the following command in the terminal:

    cutadapt -a AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG --minimum-length=20 -q 5 line_69_rep1.fastq -o TRIM_CUT/TRIM_CUT_line_69_rep1.fastq > TRIM_CUT/cutadapt_stats.txt

Cutadapt removes the sequence given with -a from the 3' end of the sequence portion of the fastq file. The -q option with value 5 trims the sequences based on quality using the following algorithm: subtract the given cutoff from all qualities; compute partial sums from all indices to the end of the sequence; cut the sequence at the index at which the sum is minimal. If, after adapter removal and quality trimming, the length of the read is less than 20, it is discarded, because short reads are more likely to align to the wrong position in the genome.

After cutadapt has finished (this takes some minutes, depending on disk I/O), it provides some basic statistics on what has been cut. You can view these by typing in the shell:

    less TRIM_CUT/cutadapt_stats.txt

Check that neither too little nor too much sequence has been removed, and that the removed stretches are not too long. We are using an adapter sequence of 34 bases. If the adapter is frequently found in the first bases of the reads, something went wrong during adapter ligation, for example adapters that were ligated to each other without a library fragment in between. Once you have assessed the data, you can close the less program by pressing q.
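The quality trimming rule quoted above can be sketched in a few lines of Python. This follows the description in the text literally and is only an illustration; the actual cutadapt implementation may differ in detail. The function names and the example qualities are hypothetical.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut the 3' end of a read.

    Subtract the cutoff from every quality, take partial sums from each
    index to the end of the read, and cut where that sum is minimal.
    Returns len(qualities) when no suffix has a negative sum.
    """
    best_index = len(qualities)
    best_sum = 0
    running = 0
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running < best_sum:
            best_sum = running
            best_index = i
    return best_index


def trim_read(sequence, qualities, cutoff=5, min_length=20):
    """Trim the 3' end and discard reads that end up too short."""
    cut = quality_trim_index(qualities, cutoff)
    if cut < min_length:
        return None  # read is discarded
    return sequence[:cut], qualities[:cut]


quals = [30, 30, 30, 4, 30, 2, 3]
print(quality_trim_index(quals, cutoff=5))
print(trim_read("ACGTACG", quals, cutoff=5, min_length=3))
```

Note that the isolated low-quality base at index 3 is kept, because the high-quality base that follows it outweighs it in the partial sum.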

3.2 Alignment to the genome

Introduction

Bisulfite-treated DNA can be aligned with a variety of methods. We used the alignment software BS Seeker 2, which aligns the bisulfite-treated data to 3-base (adenine, thymine and guanine) reference genomes. Four reference genomes are created: forward and backward with C converted into T, and forward and backward with G converted into A. The reads are then aligned to these reference genomes using the bowtie aligner. When mismatches are counted, a read C opposite a reference T is not considered a mismatch. The same holds on the reverse strand, where a read G opposite a reference A is considered valid, while the opposite combinations are counted as mismatches. Eventually, all positions where a methylated read matches the reference are retained and considered methylated positions.

Other bisulfite alignment methods may instead mask T positions as C in reads where a C is expected based on the reference sequence. This information is retained, and the reads that are methylated are counted against the reads that are not methylated.

Our data were aligned with the standard settings of BSseeker2 using the bowtie 1 aligner, allowing for a maximum of 4 mismatches per read. The bisulfite-converted reference genome has already been built, so this step does not need to be performed.

Implementation

To align the reads to the genome, we need to specify a minimum of four things: which data you want to align, which reference you want to use, where you want the aligned data to go and which aligner to use.

    python /opt/bsseeker2/bs_seeker2-align.py -i TRIM_CUT/TRIM_CUT_line_69_rep1.fastq -g /opt/bsseeker2/bs_utils/reference_genomes/tair10.fasta_bowtie/tair10.fasta --aligner=bowtie --temp_dir=temp/ --output-format=bs_seeker1 --bt-p 4 --output=align/bs_align_line_69_rep1.txt > align/bs_stats.txt

BSseeker2 is run with python. The -i option is the input data, in our case the data produced by cutadapt. The -g option is the reference genome, which is located inside the BSseeker2 folder. The --aligner option selects the aligner (here bowtie). --temp_dir is used to store a subset of the fastq data to limit memory consumption, and --output-format is the format in which we want the data to be written. --output is the file to which the aligned data are written. After the process is finished (10-20 minutes, depending on disk I/O), please read the stats file using the less command:

    less align/bs_stats.txt

Scroll down to the last part of the file using the page down key. Here you will find the basic statistics of the alignment run. If less than 50% of the reads aligned to the genome, something may be wrong. Now open the alignment file and take a look at the aligned data:

    less align/bs_align_line_69_rep1.txt

What BS Seeker essentially does is provide the coordinates of where reads are placed and which cytosines are methylated or unmethylated, along with some other data that are less important to us. The output file uses a column structure with the following data per column:

1. Read id (the same id as the Illumina id).
2. The number of mismatches between the reference and the actual sequence.
3. Which strand the mapping is done on.
4. The coordinates (the chromosome, followed by the position on that chromosome).
5. The genomic sequence of the mapped sequence + 2 bases on both sides.
6. The BS sequence from 5' to 3'.
7. The summarized sequence, with capital X, Y and Z marking methylated cytosines in context (CG, CHG and CHH, in that order).
8. An index for subsequent CG reads.

Please take a look at the data and familiarize yourself with how bisulfite alignment is documented in these files. From this file we are able to construct methylation maps. However, due to time constraints we will take a small jump in the data.
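The four 3-base reference genomes mentioned in the introduction of this section can be illustrated with a short Python sketch. This only shows the idea on a toy sequence. It assumes that "backward" means the reverse complement, and the dictionary keys are hypothetical names, not identifiers used by BS Seeker 2.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_letter_references(genome):
    """Build the four converted references: C->T and G->A, both strands."""
    forward = genome.upper()
    backward = reverse_complement(forward)  # assumption: backward = reverse complement
    return {
        "forward_C2T": forward.replace("C", "T"),
        "backward_C2T": backward.replace("C", "T"),
        "forward_G2A": forward.replace("G", "A"),
        "backward_G2A": backward.replace("G", "A"),
    }


refs = three_letter_references("ACGTCG")
for name, seq in refs.items():
    print(name, seq)
```

Because the converted references no longer contain C (or G), a fully converted read can be aligned to them with an ordinary short-read aligner.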

3.3 Steps not performed in the practical

Copy removal

The PCR amplification described in section 2.3 causes some sequences of the library to be amplified more than others. After sequencing, highly amplified fragments will be read many times by the sequencing instrument. This amplification is detrimental to the determination of methylation, as it overrepresents a sequence when considering whether a position is methylated or not. To counter this amplification bias, aligned reads that map to the same position are removed. Only the longest read with the fewest mismatches is retained in the analysis. For our data this was done using custom software, although many publicly available packages can be expected to do this as well. We will not implement this step because the data used here are a subset, so not many reads will map to the same position.

Restructuring of data

Now we take a small jump, data-wise. The folder /home/allbio-ba-2014/methylomeAnalysis contains the data we will use later in the practical. These files contain the number of methylated reads and the coverage of each cytosine. Please copy this folder to your own directory using the command below, and move on to the next section, which describes the data in more detail.

    cp -r /home/allbio-ba-2014/methylomeAnalysis methylomeanalysis
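The copy removal step was done with custom software that is not shown in this manual. As an illustration only, a minimal Python sketch of the rule "keep the longest read with the fewest mismatches per mapped position" could look as follows. The record layout, the choice to rank by length first and mismatches second, and the use of the strand as part of the position key are all assumptions.

```python
def remove_copies(alignments):
    """Keep one alignment per (chromosome, strand, position).

    Ranking (assumption): longer reads win; ties are broken by
    the lower number of mismatches.
    """
    best = {}
    for aln in alignments:
        key = (aln["chrom"], aln["strand"], aln["pos"])
        rank = (len(aln["seq"]), -aln["mismatches"])
        current = best.get(key)
        if current is None or rank > (len(current["seq"]), -current["mismatches"]):
            best[key] = aln
    return list(best.values())


alignments = [
    {"id": "r1", "chrom": "chr1", "strand": "+", "pos": 100, "seq": "ACGTACGT", "mismatches": 1},
    {"id": "r2", "chrom": "chr1", "strand": "+", "pos": 100, "seq": "ACGTACGT", "mismatches": 0},
    {"id": "r3", "chrom": "chr1", "strand": "+", "pos": 100, "seq": "ACGT", "mismatches": 0},
    {"id": "r4", "chrom": "chr1", "strand": "+", "pos": 250, "seq": "ACGT", "mismatches": 2},
]
kept = remove_copies(alignments)
print([aln["id"] for aln in kept])
```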

3.4 Initial data for methylome construction

Introduction

Our initial data consist of 7 chromosomes: 5 autosomes (chromosomes in the nucleus), the mitochondrial chromosome and the chloroplast chromosome. In our analysis we will not use the mitochondrial chromosome, but we will use the chloroplast, as it is unmethylated and therefore provides a measure of bisulfite conversion, and with it a false positive rate for methylation in the downstream analysis.

The experimental data are stored as plain text in a two-column structure, where each line represents a position in the genome. The first column is the number of methylated reads, the second column is the total number of reads for that position. Positions in the genome that are not cytosines have coverage 0. The data are stored in the files Meth_Total_line_69_rep1_chrX.txt, where X is the name of the chromosome.

This section will familiarize you with the initial data and some standard functions of R. In this section the participant will determine the average coverage of all C positions in the mitochondrial chromosome.

Implementation

You can start the R interpreter using the bash command R, and close it using the command q() in the interpreter or the key combination ctrl-d. After you have started the R interpreter, set the working directory:

    setwd("$path") # where $path is the location of the directory where you have copied the data in methylomeanalysis; you can find this by using the pwd command in the linux shell

All the files can be listed by typing:

    list.files() # show all the filenames in the directory

The current working directory can be checked by typing:

    getwd()

When you are in the correct working directory, please load the file containing the mitochondrial data into an object called chrm and determine some basic characteristics of how R handles the data:

    chrm <- as.matrix(read.csv("meth_total_line_69_rep1_chrm.txt", header=FALSE, sep=" ")) # load the file
    summary(chrm)      # per-column summary of the mitochondrion
    dim(chrm)          # dimensions (number of rows and number of columns) of the object
    head(chrm, n=100)  # the first 100 rows of the object

To determine the average coverage of all the cytosine positions, we need to determine which positions are Cs. Remember, positions in the genome that are not C positions have coverage 0, as do cytosine positions that are not covered. Therefore we need to remove all non-C positions. The files named pos_c_chr[x].txt define which positions are cytosines and which aren't. There are 3 columns in these files: the first signifies which strand the cytosine is on, the second provides the position of the specific cytosine and the third signifies the context of the cytosine: X=CG, Y=CHG and Z=CHH (where H is any base but guanine). We need to load this file into R, identify the positions that are cytosines and retrieve their coverage from the chrm object.

    chrmposc <- read.table("pos_c_chrm.txt", header=FALSE) # load the C positions file
    Cpositions <- chrmposc[,2] != "-" # logical vector, TRUE where a position is a cytosine, based on the second column ([,2])

    mean(chrm[Cpositions,2]) # mean cytosine coverage for the chrm object (second column)
    mean(chrm[Cpositions,1]) # mean number of methylated reads per cytosine for the chrm object (first column)

Now that we have determined the mean coverage and the mean methylation of the mitochondrial chromosome, we will save all the cytosine positions in the .rdata format for fast loading when we need them.

    chrm <- chrm[Cpositions,] # retain all the C positions in chrm
    save(chrm, file="meth_total_onlyc_chrm.rdata") # save all the C positions in the .rdata format

Whenever we want to load .rdata files, we can do so with the load() command. The object in the file is directly available under the name it was saved with (you do not have to assign it a name; the name is retained in the file).

We will now create .rdata files for the 5 autosomes and the chloroplast genome. This will take about 20 minutes, so please have a cup of coffee. Because save() stores each object under its own name, every chromosome is first assigned to an object named after it (chr1 to chr5 and chrc) before it is saved. Copy and paste the following lines into the interpreter:

    starttime <- proc.time()

    ## 5 autosomes and 1 chloroplast genome
    for (chr in c("1", "2", "3", "4", "5", "c")) {
      name <- paste0("chr", chr)
      counts <- as.matrix(read.csv(paste0("meth_total_line_69_rep1_", name, ".txt"), header=FALSE, sep=" ")) # load the count data
      PosC <- read.csv(paste0("pos_c_", name, ".txt"), header=FALSE, sep="\t") # read C positions
      Cpositions <- PosC[,2] != "-"      # find the C positions for this chromosome
      assign(name, counts[Cpositions,])  # retain only C positions, stored as chr1, chr2, ..., chrc
      save(list=name, file=paste0("meth_total_onlyc_", name, ".rdata")) # save in the .rdata format
      rm(list=c(name, "counts", "PosC", "Cpositions")) # memory management
      cat("timesincestart (secs) ", proc.time()[3]-starttime[3], "\n") # timing
    }
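The filtering step of this section (select the rows that are cytosines, then average the two columns) is sketched below in Python on a toy example. The data values are made up, and the "-" marker for non-cytosine rows simply mirrors the comparison used in the R code above.

```python
def mean_over_cytosines(counts, position_column):
    """Return (mean methylated reads, mean coverage) over cytosine rows.

    counts: list of (methylated_reads, total_reads), one per genome position.
    position_column: second column of the positions file, "-" for non-cytosines.
    """
    rows = [row for row, pos in zip(counts, position_column) if pos != "-"]
    n = len(rows)
    mean_methylated = sum(r[0] for r in rows) / n
    mean_coverage = sum(r[1] for r in rows) / n
    return mean_methylated, mean_coverage


# Toy data: five genome positions, three of which are cytosines.
counts = [(0, 0), (1, 4), (0, 0), (3, 6), (0, 2)]
position_column = ["-", "2", "-", "4", "5"]
mean_methylated, mean_coverage = mean_over_cytosines(counts, position_column)
print(mean_methylated, mean_coverage)
```

Without the filter, the zero rows of non-cytosine positions would pull both means down.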

3.5 Determine conversion rate

Introduction

Bisulfite treatment of DNA is a chemical process, and it is difficult to convert a genome fully while still retaining DNA integrity. In plants the chloroplast does not contain methylcytosines in its DNA, which makes it a suitable sequence for determining the conversion rate. Animals do not have chloroplasts, so bisulfite sequencing of these organisms requires the addition of some unmethylated DNA (usually phage DNA) to the sample to determine the conversion rate. The bisulfite conversion rate is important in the downstream determination of methylation for ambiguously methylated cytosines. The non-conversion rate is calculated by dividing the sum of the unconverted (apparently methylated) reads in the chloroplast by the total number of reads; the conversion rate is one minus this value.

Implementation

Close and restart the R console (R is not very memory efficient; restarting wipes the retained memory and lets you start fresh), set your working directory and load the chloroplast genome using the setwd() and load() functions. The conversion rate relates the number of false positives we observe in the chloroplast to the total number of reads, and can thus be seen as a kind of false positive measure. To realize this in our data, consider the data structure: the first column contains all methylated reads for C positions and the second column contains the total coverage. We can simply take the sum of both columns, divide the first by the second and subtract the result from one:

    convrate <- 1 - sum(chrc[,1])/sum(chrc[,2]) # one minus the total sum of methylated reads ([,1]) divided by the total sum of reads ([,2]) in the chloroplast

Save the conversion rate in the following file:

    save(convrate, file="conversionrate_line_69_rep1.rdata")
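The same calculation on a toy chloroplast, as a minimal Python sketch (the counts are made up for illustration):

```python
def conversion_rate(chloroplast_counts):
    """Return 1 - (unconverted reads / total reads) over all cytosines.

    chloroplast_counts: list of (methylated_reads, total_reads) per cytosine.
    Every "methylated" read in the unmethylated chloroplast is a conversion failure.
    """
    unconverted = sum(m for m, _ in chloroplast_counts)
    total = sum(t for _, t in chloroplast_counts)
    return 1 - unconverted / total


# Toy data: 1 unconverted read out of 100 reads in total.
chloroplast = [(0, 50), (1, 40), (0, 10)]
rate = conversion_rate(chloroplast)
print(rate)
```

Summing over all positions before dividing weights each cytosine by its coverage, which matches the R one-liner above.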

3.6 Calculate p values per cytosine position

Introduction

After calculating the bisulfite conversion rate, we are able to consider whether a position is methylated or not. A one-tailed binomial test is performed on every covered cytosine, using the conversion rate calculated in section 3.5 to set the expected probability. The resulting p-value is the probability of observing at least as many unconverted reads at this position as were actually observed, if the cytosine were unmethylated and the unconverted reads were only due to incomplete conversion. A low p-value thus indicates that the position is methylated, while a high p-value indicates that the position is not methylated. In this part every covered cytosine receives a p-value based on a binomial test.

Implementation

We will determine the p-value of every cytosine using the binom.test function. To achieve this, we iterate over every row in the raw data and calculate a p-value accordingly, using the binomvec function. Load it with:

    source("/home/allbio-ba-2014/code/binomvecfunction.r") # load the binomial vector function

or copy and paste the content of the file into the interpreter manually. The function outputs a vector of p-values, with NA values for positions that were not covered in sequencing. You use the function by calling it with the matrix of methylated versus covered reads (chrX) and the value of the conversion rate, convrate. We run it on only a subset of the total data, because the calculation takes around an hour per chromosome; already prepared data will be used in later steps.

    load("meth_total_onlyc_chr1.rdata")
    pvalvecsubset <- binomvec(chr1[1:100000,], convrate) # perform on the first 100,000 cytosines in chromosome 1

Please take some time to see how the p-values are structured:

    summary(pvalvecsubset)       # provide a summary
    head(pvalvecsubset, n=300)   # get the first 300 p-values
    table(pvalvecsubset)         # get a frequency table

The p-values for all chromosomes have already been produced beforehand in the folder /home/allbio-ba-2014/p-values. Exit the R interpreter and copy the files to your folder in the shell; we will load them when determining the p-value cutoff:

    cp /home/allbio-ba-2014/p-values/* methylomeanalysis/
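The source of the binomvec function is not shown in this manual. A minimal Python sketch of a one-tailed binomial test per cytosine is given below for illustration. It assumes that the expected probability of an unconverted read at an unmethylated cytosine is one minus the conversion rate; the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_value(methylated, coverage, conv_rate):
    """One-tailed p-value for a single cytosine, or None if it is not covered.

    Assumption: the null probability of an unconverted read is 1 - conv_rate.
    """
    if coverage == 0:
        return None
    return binomial_upper_tail(methylated, coverage, 1 - conv_rate)


conv_rate = 0.99
for methylated, coverage in [(0, 10), (1, 10), (8, 10), (0, 0)]:
    print(methylated, coverage, p_value(methylated, coverage, conv_rate))
```

A cytosine with 0 unconverted reads always gets a p-value of 1, while 8 unconverted reads out of 10 is extremely unlikely under a 1% non-conversion rate and yields a very small p-value.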

3.7 Determination of p-value cutoff using the false discovery rate

Introduction

To account for multiple hypothesis testing we use a false discovery rate (FDR) method to determine whether a cytosine is methylated or not. As the name implies, the FDR method limits the proportion of false positive discoveries when assessing many tests. The method requires a list (or vector) of p-values and, based on a stepwise procedure, produces a cutoff value. In our case, positions with a p-value lower than or equal to the FDR cutoff are called methylated, whereas positions that do not meet this criterion are considered unmethylated. The FDR method searches for the largest index k in an ordered list of p-values P that satisfies equation 1:

\[
P_{(k)} \le \frac{k}{m \, c(m)} \, \alpha \qquad (1)
\]

where m is the total number of tests (the length of the list) and α is the user-defined FDR value. c(m) = 1 when considering independent observations. Under dependence,

\[
c(m) = \sum_{i=1}^{m} \frac{1}{i},
\]

which for large m is approximately ln(m) + 0.577. Cytosine methylation is not randomly distributed over the genome, so we use the FDR under dependence.

Implementation

Make sure the R interpreter is running and the proper working directory (your methylomeanalysis directory) is set using the setwd() command. The false discovery rate needs to be determined genome wide, so we concatenate all p-value vectors (one per chromosome) into one. We do this with the c() command after loading each p-value vector into the interpreter.

    pvalfiles <- list.files(pattern="^pvalvec") # the p-value files of all chromosomes

    pvalvecfull <- NULL # initialize the full p-value vector

    for(file in pvalfiles){
      load(file) # loads the object pvalvec for this chromosome
      pvalvecfull <- c(pvalvecfull, pvalvec) # here the p-values are concatenated
    }

We have now added all p-values into one big vector describing them genome wide. The FDR determination is done using the compute.fdr function, which takes two arguments: the p-value vector and the desired false discovery rate. We set this last value to 0.05. You can load the function using the source() function or by copying it from the computefdr.r file.

    # load the fdr file
    source("/home/allbio-ba-2014/code/computefdr.r")

    # determine FDR
    FDRcutoff <- compute.fdr(pvalvecfull, 0.05)
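The content of computefdr.r is not shown in this manual. A minimal Python sketch of the stepwise procedure in equation 1 is given below for illustration. It assumes that the cutoff returned is the largest ordered p-value that satisfies the bound, and that uncovered positions (None here, NA in R) are excluded from the number of tests; the function name and example values are hypothetical.

```python
def fdr_cutoff(p_values, alpha, dependent=True):
    """Return the largest p-value P(k) with P(k) <= k / (m * c(m)) * alpha.

    dependent=True uses c(m) = sum_{i=1..m} 1/i (FDR under dependence),
    dependent=False uses c(m) = 1. Returns 0.0 if no p-value qualifies.
    """
    tested = sorted(p for p in p_values if p is not None)
    m = len(tested)
    c_m = sum(1 / i for i in range(1, m + 1)) if dependent else 1.0
    cutoff = 0.0
    for k, p in enumerate(tested, start=1):
        if p <= k / (m * c_m) * alpha:
            cutoff = p  # keep the largest qualifying p-value
    return cutoff


p_values = [0.041, 0.001, None, 0.6, 0.009, 0.039]
print(fdr_cutoff(p_values, 0.05, dependent=False))
print(fdr_cutoff(p_values, 0.05, dependent=True))
```

In this toy example the cutoff under dependence is stricter than under independence, because dividing by c(m) lowers every threshold.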

3.8 Creation of a methylation map

Introduction

In the previous steps we have created all the information necessary to determine the methylation status per cytosine. In this step we will determine the methylation status of all positions. When considering the methylation status of a cytosine, there are 3 options: methylated, unmethylated and uncovered. We will use the logical type in R to represent this.

Implementation

We now have all the ingredients to compute a methylation map. We look at which positions have a p-value lower than or equal to the FDR cutoff; a single line is all we need to produce the final methylation map:

    methylationmap <- pvalvecfull <= FDRcutoff # this is compared over all positions in the p-value vector

Each element of the methylationmap object takes one of 3 values: a position with a coverage of 0 is NA, a methylated position is TRUE and an unmethylated position is FALSE. This is enough to do all the methylome analysis.

3.9 Further analysis

Introduction and implementation

We will end this tutorial with a small analysis of the methylation status in certain contexts and annotations. For this, please load the previously prepared annotation file:

    load("/home/allbio-ba-2014/annotationallchromosomes.rdata")

This file contains several objects, all logical vectors of the same length as the methylationmap, and logical comparison provides all the data necessary for these basic analyses. The objects give the location of all sequence contexts (CG, CHG and CHH) and of the gene, 1.5 kb upstream, noncoding and transposable element annotations. Now determine the total number of cytosines that are covered, methylated and unmethylated:

    sum(!is.na(methylationmap))         # total covered
    sum(methylationmap, na.rm=TRUE)     # total methylated
    sum(!methylationmap, na.rm=TRUE)    # total unmethylated

Now determine the genome-wide methylation proportion of this Arabidopsis line by dividing the number of methylated cytosines by the total number of covered cytosines. To determine the amount of methylation in an annotation (in this example CG), use the following notation:

    sum(methylationmap & CG, na.rm=TRUE) # methylated in CG
    sum(CG, na.rm=TRUE)                  # total number of CG

You can now determine the methylation proportions for all available annotations (object names in R): CG, CHG, CHH, gene, upstream, noncoding, TE. Do this by taking the total number of methylated cytosines in an annotation or context and dividing it by the total number of cytosines in that annotation or context, which gives the basic statistics. If you are interested in other questions, you are fully able to investigate them.
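The three-valued methylation map and the per-annotation proportions can be sketched in Python as follows, with None playing the role of NA. The toy p-values and the CG vector are made up. The covered_only switch is an addition for illustration: the text divides by all cytosines in an annotation, whereas the genome-wide proportion divides by covered cytosines only.

```python
def methylation_map(p_values, cutoff):
    """True = methylated, False = unmethylated, None = not covered."""
    return [None if p is None else p <= cutoff for p in p_values]


def proportion_methylated(meth_map, annotation, covered_only=False):
    """Proportion of methylated cytosines within an annotation.

    covered_only=False divides by all cytosines in the annotation,
    covered_only=True divides by the covered cytosines in the annotation.
    """
    selected = [m for m, a in zip(meth_map, annotation) if a]
    methylated = sum(1 for m in selected if m is True)
    if covered_only:
        denominator = sum(1 for m in selected if m is not None)
    else:
        denominator = len(selected)
    return methylated / denominator


p_values = [0.0001, 0.5, None, 0.0002, 0.9, 0.0003]
cg = [True, True, True, False, False, False]
everything = [True] * len(p_values)

meth = methylation_map(p_values, cutoff=0.001)
print(meth)
print(proportion_methylated(meth, cg))
print(proportion_methylated(meth, everything, covered_only=True))
```

Which denominator is appropriate depends on the question: dividing by covered cytosines avoids counting unobserved positions as unmethylated.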

4 Conclusion

This concludes the tutorial for BSseq. The goal of this tutorial was to determine a methylation map and obtain some basic statistics from Arabidopsis. I hope you have found it interesting. If you have any questions regarding the tutorial at a later time, you may email me at: adriaan.vd.graaf@gmail.com

Thank you for your time.
