Manual of MiRDeepFinder for EST or GSS


Index

1. Description
2. Requirement
    2.1 Requirement for Windows system
        2.1.1 Perl
        2.1.2 Install the module DBI
        2.1.3 BLAST+
    2.2 Requirement for Linux system
        2.2.1 Perl
        2.2.2 DBI, Getopt::Std, File::Temp, Math::CDF, and Spreadsheet::WriteExcel
        2.2.3 R
        2.2.4 MySQL
3. MiRDeepFinder installation
4. Work flow
    4.1 Data preparation and create related databases
        4.1.1 Short read sequence
        4.1.2 Databases of EST or GSS
        4.1.3 cDNA database or EST database for target identification
        4.1.4 All plant reference protein database
        4.1.5 miRBase
        4.1.6 Gene Ontology (GO) database
        4.1.7 KEGG database
        4.1.8 Rfam database
        4.1.9 Other short read dataset files
    Configure MiRDeepFinder parameters file
    Input GO database to MySQL
    Input miRNA database (miRBase) to MySQL
    Create database schema for miRNA deep sequencing analysis
5. Run MiRDeepFinder
    5.1 miRNA analysis
    5.2 Further check the generated secondary hairpin structures manually
    5.3 Filter miRNAs with imperfect hairpin structure
    5.4 Group predicted miRNAs
    5.5 Update the miRNA names in MySQL
    5.6 Target analysis
    5.7 Remove repeated target
    5.8 Analysis of GO and KEGG
    5.9 Report all result data
    5.10 Detect novel miRNA in other small RNA datasets from same species
6. Degradome analysis by CleaveLand and targetfinder
7. Contact us

1. Description

The MiRDeepFinder package provides an entire workflow for analyzing data from plant microRNA (miRNA) deep sequencing. It includes identification of miRNAs and their targets, miRNA family classification, degradome sequencing analysis for miRNA target identification, analysis of GO (Gene Ontology), and pathway enrichment of KEGG (Kyoto Encyclopedia of Genes and Genomes). This document is an introduction to the MiRDeepFinder package.

MiRDeepFinder is developed in Perl (V5.8 or higher) and MySQL. Most of the programs should be run on a Linux system, while Target-Align for target prediction should be run on Windows. Besides Target-Align, MiRDeepFinder also offers users another miRNA target search engine, targetfinder (targetfinder 1.6).

2. Requirement

MiRDeepFinder incorporates BLAST+ (blastn, blastp, blastx, and makeblastdb), RNAfold from the Vienna RNA package, WATER from EMBOSS, Bowtie, FASTA35, targetfinder, CleaveLand, and ps2pdfwr. MiRDeepFinder offers users two engine options for miRNA target search, Target-Align and targetfinder. Target-Align was developed in C# and should be run on a Windows system. Thus, except for miRNA target identification with Target-Align, all steps of MiRDeepFinder should be run on a Linux system.

2.1 Requirement for Windows system

If you use Target-Align to identify miRNA targets, Perl, DBI, and BLAST+ should be installed on Windows. They are set up as follows.

2.1.1 Perl

Perl (V5.8 or V5.10) can be downloaded from its web site. After installation, make sure perl.exe can be found through the PATH environment variable. Here we assume Perl is installed under the directory C:/perl/. The PATH environment variable is set as follows:

1. Right click the "My Computer" icon;
2. Select Properties;
3. Select Advanced;
4. Click on Environment variables;
5. Select PATH and click Edit;
6. Add ;C:/perl/bin/ (do not include quotes) to the end of PATH and then click Confirm.

2.1.2 Install the module DBI

Type the command ppm search DBD-mysql in a DOS command line. The search result will look like this:

    1: DBD-mysql
       A MySQL driver for the Perl5 Database Interface (DBI)
       Version:
       Released:
       Repo: ActiveState Package Repository
    2: DBD-mysql
       A MySQL driver for the Perl5 Database Interface (DBI)
       Version:
       Repo: Kobes

Find the number X in front of the module DBD-mysql and then type the command ppm install X. Taking the search result shown above as an example, the command is ppm install 1 or ppm install 2.

2.1.3 BLAST+

ncbi-blast is available from ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.9/ and ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.24/. Please make sure the bin directory of BLAST is in the PATH environment variable (set it the same way as for Perl).

2.2 Requirement for Linux system

On a Linux system, Perl, the Perl modules (DBI, Getopt::Std, File::Temp, Math::CDF, and Spreadsheet::WriteExcel), and R should be set up.

2.2.1 Perl

Perl (V5.8 or V5.10) can be downloaded from its web site and set up according to its documentation.

2.2.2 DBI, Getopt::Std, File::Temp, Math::CDF, and Spreadsheet::WriteExcel

Type the following commands in a Linux terminal (e.g. Ubuntu):

    sudo cpan
    install DBI
    install Getopt::Std
    install File::Temp
    install Math::CDF
    install Spreadsheet::WriteExcel

2.2.3 R

    sudo apt-get install r-base
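After installing the Perl modules listed in 2.2.2, it may be worth confirming that Perl can actually load them. The following is a minimal sketch, not part of MiRDeepFinder: it relies on the standard Perl invocation perl -M<module> -e 1, which exits with a non-zero status when the module cannot be loaded. The function names are hypothetical helpers introduced here for illustration.

```python
import shutil
import subprocess

# Perl modules named in this manual's Linux requirements.
REQUIRED_MODULES = [
    "DBI",
    "Getopt::Std",
    "File::Temp",
    "Math::CDF",
    "Spreadsheet::WriteExcel",
]


def module_check_command(module, perl="perl"):
    """Build the command `perl -M<module> -e 1` for one module."""
    return [perl, "-M" + module, "-e", "1"]


def missing_modules(modules=REQUIRED_MODULES, perl="perl"):
    """Return the modules that Perl fails to load (hypothetical helper)."""
    if shutil.which(perl) is None:
        raise FileNotFoundError("perl was not found on PATH")
    missing = []
    for module in modules:
        result = subprocess.run(
            module_check_command(module, perl),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling missing_modules() on the Linux machine returns an empty list when all five modules load; any names it returns can be installed again through cpan.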

2.2.4 MySQL

Install the MySQL server with the command:

    sudo apt-get install mysql-server

If your Linux system supports an X/GNOME graphical interface, you can also install a MySQL GUI client to operate the database more conveniently. The command is:

    sudo apt-get install mysql-client

Suppose you configure a user root with the password password for the MySQL server.

If you want to use Target-Align for miRNA target search, you need to configure the MySQL server so that the Windows computer can access it. Edit the /etc/mysql/my.cnf file to make MySQL listen for connections from network hosts. Comment out the bind-address line, which binds the server to the IP of your local Linux machine:

    #bind-address =

This allows MySQL to accept connections from other machines with different IPs.

To grant an account remote access to the MySQL database from the Windows system, open a terminal window on the Linux system and type the following commands (assuming the user name is root and the password is password):

    mysql -uroot -ppassword
    grant all privileges on *.* to 'root'@'%' identified by 'password';

This grants all privileges to any machine that accesses the MySQL server with the user name root and the password password.

3. MiRDeepFinder installation

Unzip mirdeepfinder1.0.tar.gz:

    tar -zxvf mirdeepfinder1.0.tar.gz

Suppose the unzipped directory is /home/mirna/mirdeepfinder1.0/. Make sure all files are executable. If they are not, use the command:

    chmod -R 777 /home/mirna/mirdeepfinder1.0/

4. Work flow

I strongly recommend that you create a new directory for your data analysis and put your prepared data in this directory. Suppose your current workplace directory is /home/mirna/deepsequencing/.

4.1 Data preparation and create related databases

4.1.1 Short read sequence

MiRDeepFinder supports several kinds of short reads, such as original raw sequencing data or clean small RNA datasets downloaded from NCBI.
For original sequencing data that include both sequences and sequencing quality, run Adapter_trim.pl in the directory removeadapter to remove adaptor sequences and low-quality sequences and to group the remaining sequences.

Usage:

    perl Adapter_trim.pl [options] >outputfile

Options:

    -i <file>  Short reads file in fastq format
    -n <str>   Sample name; default="sample"
    -x <str>   5' adaptor sequence; default="gttcagagttctacagtccgacgatc"
    -y <str>   3' adaptor sequence; default="tcgtatgccgtcttctgcttg"
    -f <int>   Fastq file format: 1=Sanger format; 2=Solexa/Illumina 1.0 format; 3=Illumina 1.3+ format; default=2
    -h         Help

Examples:

    perl /home/mirna/mirdeepfinder1.0/removeadapter/adapter_trim.pl -i sample.fq -n "newid" -f 1 >outputfile
    perl /home/mirna/mirdeepfinder1.0/removeadapter/adapter_trim.pl -i sample.fq -x "ATCGGGCT" -y "TCGTAT" -f 3 >outputfile
    perl /home/mirna/mirdeepfinder1.0/removeadapter/adapter_trim.pl -i sra_data.fastq -x "GTTCAGAGTTCTACAGTCCGACGATC" -y "TCGTATGCCGTCTTCTGCTTG" -f 2 >outputfile

For short reads with adaptor sequences from 454 sequencing, run Remove454Adaptor.pl in the directory removeadapter to remove adaptor sequences and group these sequences. The format of the short read file is as follows:

    >SequenceName
    ATCGTAGGCACCTGAAACGCGGGTTCCCTAACTACCACGGATGTC

Usage:

    perl /home/mirna/mirdeepfinder1.0/removeadapter/remove454adaptor.pl [short read file] [5' primer adaptor] [3' primer adaptor] [result file]

Example:

    perl /home/mirna/mirdeepfinder1.0/removeadapter/remove454adaptor.pl seq.fa ATCGTAGGCACCTGAAA ATTGATGGTGCCTACAG resultfile

For clean short reads downloaded from NCBI, run CleanSequence.pl to further clean the sequences. The final format of the short read file is as follows:

    (Sequence) (Read count)
    GTTCAGAGTTCTACAGTCCGACGATC 458

Usage:

    perl ./cleansequence.pl [short read file] [result file]

Example:

    perl /home/mirna/mirdeepfinder1.0/removeadapter/cleansequence.pl seq.fa resultfile.fa

MiRDeepFinder assumes the short reads have been pre-processed in one of the ways described above, since it only recognizes the final short sequence format (sequence read_count) and inputs these short reads to MySQL for further analysis.
If the format of your data is different, either modify the file mirnaanalysis/1inputshortread.pl or transform the data to this fixed format. Users who have problems modifying the code or transforming the data are welcome to contact us for help.
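For data that arrive as plain FASTA reads with no counts, the transformation to the fixed format (sequence read_count) amounts to counting identical sequences. The sketch below illustrates that idea only; it is not the code of CleanSequence.pl or 1inputshortread.pl. The tab separator, the upper-casing of sequences, and the function names are assumptions made for this example.

```python
from collections import Counter


def read_fasta_sequences(lines):
    """Yield one upper-case sequence per FASTA record ('>' starts a record)."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if parts:
                yield "".join(parts)
                parts = []
        else:
            parts.append(line.upper())
    if parts:
        yield "".join(parts)


def collapse_reads(sequences):
    """Count identical sequences; most abundant first, ties sorted by sequence."""
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_collapsed(collapsed, sep="\t"):
    """Render 'sequence<sep>read_count' lines (separator is an assumption)."""
    return ["{}{}{}".format(seq, sep, count) for seq, count in collapsed]
```

For example, passing the lines of a FASTA file through read_fasta_sequences, collapse_reads, and format_collapsed gives one output line per distinct read, which can then be written to the result file.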

4.1.2 Databases of EST or GSS

MiRDeepFinder only supports EST and GSS in the Fasta format of NCBI, like

    >gi gb JK JK Xh_LDF_23H073 Leaf Dehydration (LD) Xerophyta humilis cdna clone.

If you have multiple EST or GSS files, combine them all into one file. Put the file in a directory, like /home/mirna/deepsequencing/estorgss/

4.1.3 cDNA database or EST database for target identification

MiRDeepFinder only supports a cDNA database or EST database in the Fasta format of NCBI, like

    >gi gb JK JK Xh_LDF_23H073 Leaf Dehydration (LD) Xerophyta humilis cdna clone.

Put all your cDNA or EST databases in a directory, like /home/mirna/deepsequencing/targetestorcdna/

4.1.4 All plant reference protein database

Go to NCBI and select the Protein database. Input the keyword plant and search. Then click the RefSeq link and download the result as the all plant reference protein database for detecting non-coding genes. Put the plant reference protein database in a directory, like /home/mirna/deepsequencing/allplantrefproteindb/

4.1.5 miRBase

miRBase is also integrated into the local MySQL. All database files of miRBase are downloaded from ftp://mirbase.org/pub/mirbase/current/database_files/ and then unzipped to a directory, like /home/mirna/deepsequencing/mirbase/

4.1.6 Gene Ontology (GO) database

The newest GO database is obtained from the directory latest-lite at the address ftp://ftp.geneontology.org/godatabase/archive/. The file is named with the date.
All the zipped files are downloaded to a local directory and unzipped to a directory, such as /home/mirna/deepsequencing/go/

4.1.7 KEGG database

The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database is available from ftp://ftp.genome.jp/pub/kegg/pathway/ and is saved to a directory, like /home/mirna/deepsequencing/kegg/

4.1.8 Rfam database

The Rfam database (10.0) is downloaded from ftp://ftp.sanger.ac.uk/pub/databases/rfam/10.0/ and saved to a directory, like /home/mirna/deepsequencing/rfam/

4.1.9 Other short read dataset files

If you want to detect whether the final analyzed miRNAs exist in other short read dataset files, you should create a directory named otherdataset in the current directory, like /home/mirna/deepsequencing/otherdataset/

Configure MiRDeepFinder parameters file

Copy the configuration template file of MiRDeepFinder from the installed directory, /home/mirna/mirdeepfinder1.0/config-template, to the current workplace directory /home/mirna/deepsequencing/ and rename it to config. Open the file config with vim, notepad, or gedit, and edit it. A line beginning with -- is an annotation line. For example, in the line

    server: localhost

server is the parameter's key and localhost is the parameter's value. The key and its value are separated by a colon plus a space. Caution: the key names must not be changed. Users should only edit the values of the parameters, according to the meaning of each key, and should use absolute paths for values that are file paths.

Input GO database to MySQL

We assume the GO database files are unzipped to the directory /home/mirna/deepsequencing/GO/. Find the file go_ schema-mysql.sql and edit it with notepad or another editor. Add the lines below to the top of the file go_ schema-mysql.sql:

    drop database if exists GO;
    create database GO;
    use GO;

Save and close the file. Log in to MySQL from a Linux terminal as follows (assuming the user name is root and the password is password):

    mysql -uroot -ppassword

Then run the following commands one by one (make sure the current directory is right, otherwise you need to add the path to the following files):

    source go_ schema-mysql.sql;
    use GO;
    source go_ termdb-data;
    source go_ assocdb-data;
    source go_ seqdb-data;

This takes a relatively long time to finish. Be patient!

Input miRNA database (miRBase) to MySQL

Open the SQL script file mirbasecommand.sql in the directory /home/mirna/mirdeepfinder1.0/table/ and edit the file path. For the line

    source ../mirbase/tables.sql;

if the miRBase files are not in the directory ../mirbase/, replace ../mirbase/ with the correct path of your miRBase files. For example, replace ../mirbase/ with /home/mirna/deepsequencing/mirbase/.

This step only marks the miRNAs from plants, using the SQL command listed below:

    update mirbase.mirna a, mirbase.mirna_mature b, mirbase.mirna_pre_mature c, mirbase.mirna_species d
    set b.source='plant'
    where a.auto_mirna=c.auto_mirna and b.auto_mature=c.auto_mature and a.auto_species=d.auto_id and d.taxonomy like '%Magnoliophyta%';

The command is contained in mirbasecommand.sql. Save the file mirbasecommand.sql and close it. Run the SQL script file in the MySQL command line with the command source mirbasecommand.sql (see the steps for setting up the GO database by running MySQL SQL files).

Create database schema for miRNA deep sequencing analysis

Edit the SQL script file mirdeepfinder.sql in the directory table. For instance:

    drop database if exists mirdeepfinder;
    create database mirdeepfinder;
    use mirdeepfinder;

You can change the database name to a name of your choice. The default name is mirdeepfinder. Log in to MySQL and run the SQL script file mirdeepfinder.sql (see the steps for setting up the GO database by running MySQL SQL files). Next, run the two SQL script files go.sql and kegg.sql for the GO and KEGG databases, respectively.

5. Run MiRDeepFinder

After data preparation, database creation, and configuration, make sure config is configured with the right parameters. In particular, use absolute (full) paths for parameters related to file paths. The config file is assumed to be in the current directory.

5.1 miRNA analysis

Run perl /home/mirna/mirdeepfinder1.0/1deep.pl. It will run for a relatively long time. All miRNA short read sequences are input, and all results are stored in the MySQL database. miRNAs are categorized into conserved miRNAs and novel miRNAs (non-conserved miRNAs).
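Because every script in this section reads the config file, a quick check of that file before a long run can save time. The following sketch parses the format described earlier (annotation lines beginning with --, and key and value separated by a colon plus a space) and reports path parameters that are not absolute. It is only an illustration: apart from server, which appears in the manual, the key names used in the example (est_dir, go_dir) are hypothetical, and the real keys are those in config-template.

```python
import posixpath


def parse_config(lines):
    """Parse 'key: value' lines, skipping blank lines and '--' annotation lines."""
    params = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("--"):
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            if line.endswith(":"):
                # A key with an empty value, e.g. "password:".
                key, value = line[:-1], ""
            else:
                raise ValueError(
                    "line {}: expected 'key: value', got {!r}".format(number, line)
                )
        params[key.strip()] = value.strip()
    return params


def relative_path_keys(params, path_keys):
    """Return the given keys whose values are not absolute Linux paths."""
    return [
        key
        for key in path_keys
        if key in params and not posixpath.isabs(params[key])
    ]
```

A typical use would be to call parse_config on the lines of the config file in the workplace directory and then pass the names of the path-valued keys from the template to relative_path_keys; any key it returns should be changed to a full path before the scripts are run.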
Secondary hairpin structures of conserved miRNA precursors and novel miRNA precursors are output as pdf files to /home/mirna/deepsequencing/mirdeepfinder_output/report/conservedpdf and /home/mirna/deepsequencing/mirdeepfinder_output/report/nonconservedpdf, respectively. The conserved miRNA clusters are output to /home/mirna/deepsequencing/mirdeepfinder_output/report/conservedclusterreportfile.txt

5.2 Further check the generated secondary hairpin structures manually

Go to the directories /home/mirna/deepsequencing/mirdeepfinder_output/report/conservedpdf and /home/mirna/deepsequencing/mirdeepfinder_output/report/nonconservedpdf, open every pdf file, and check whether the corresponding hairpin structure is a perfect or near-perfect hairpin-loop structure. The mature miRNA sequence is located at either the 5' end or the 3' end.

[Figure: example hairpin structures]

The right end is the 3' end and the left end is the 5' end. Most of these hairpin structures are normal. The structures need to be double-checked because the length of plant miRNA precursors varies more than that of animal precursors, so it is hard to judge a hairpin structure reliably from the folding information of the RNAfold program alone.

The criteria for checking a hairpin structure are as follows:

1. There are no more than six mismatched nucleotides between the predicted mature miRNA sequence and its opposite miRNA* sequence in the secondary structure;
2. There are no obvious loops or breaks in the miRNA:miRNA* complex.

If a hairpin structure is judged to be imperfect, write its file name, without the extension, on a line of its own in /home/mirna/deepsequencing/mirdeepfinder_output/query/imperfectfile.txt. The names of both conserved miRNAs and non-conserved miRNAs with imperfect hairpin structures should be recorded in the file imperfectfile.txt.

5.3 Filter miRNAs with imperfect hairpin structure

Run the Perl script file 2RemoveImperfect.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/2removeimperfect.pl
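When many structures are rejected in step 5.2, typing the names into imperfectfile.txt by hand is error-prone. The sketch below shows one way to produce the file from a list of rejected pdf files: it strips the directory and the extension and writes one name per line. It is an assumption-laden illustration: the helper names are hypothetical, the example file names are invented, and overwriting the file (rather than appending to it) is a choice made for this example.

```python
import os


def names_without_extension(pdf_files):
    """Return base names without extension, in order, without duplicates."""
    names = []
    for path in pdf_files:
        name, _extension = os.path.splitext(os.path.basename(path))
        if name and name not in names:
            names.append(name)
    return names


def write_imperfect_file(pdf_files, out_path):
    """Write one name per line to out_path, replacing any existing content."""
    with open(out_path, "w") as handle:
        for name in names_without_extension(pdf_files):
            handle.write(name + "\n")
```

The out_path argument would be the imperfectfile.txt path given in 5.2, and pdf_files the list of structures judged imperfect in both the conservedpdf and nonconservedpdf directories.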

5.4 Group predicted miRNAs and rename your miRNAs

Run the Perl script file 3GroupNovelFamily.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/3groupnovelfamily.pl

Novel miRNA groups will be output to /home/mirna/deepsequencing/mirdeepfinder_output/report/novelmirnafamilygroupreportfile.txt. All miRNAs will be output to the file /home/mirna/deepsequencing/mirdeepfinder_output/report/allmirnafilename.xls. According to the miRNA family group information, assign a miRNA name in the column mirnaname. Copy the two columns (mirnaid and mirnaname) without the column titles and paste them into the file /home/mirna/deepsequencing/mirdeepfinder_output/query/renameallmirna.txt. The format is as below:

    4 CC-1
    17 CC0
    18 CC1
    23 CC2

5.5 Update the miRNA names in MySQL

Run the Perl script file 4UpdateAllMiRNAname.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/4UpdateAllMiRNAname.pl

5.6 Target analysis

Run the Perl script file 5InputEST.pl with the command below to input cDNA to the MySQL database for target prediction:

    perl /home/mirna/mirdeepfinder1.0/5InputEST.pl

MiRDeepFinder provides users two search engines for miRNA target identification, Target-align (Xie, F. and Zhang, B. (2010) Target-align: a tool for plant microRNA target identification. Bioinformatics, 26) and targetfinder (targetfinder 1.6).

If you use Target-align: since Target-align is developed in C#, it should be run on a Windows system. Copy the whole directory /home/mirna/mirdeepfinder1.0/targetforwindow to the Windows system. Configure the parameter file config in targetforwindow. The server parameter should be the IP address of the Linux system, which differs from the parameter configuration on Linux. Then run the Perl script in DOS command mode:

    perl ./6target.pl

If you use targetfinder: run the Perl script file 6runTargetFinder.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/6runtargetfinder.pl
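The rename file prepared in step 5.4 is pasted by hand, so a leftover column title or a duplicated id is easy to miss before the names are written to MySQL. The sketch below checks the two-column layout shown in 5.4. It assumes, for illustration only, that the columns are separated by whitespace, that mirnaid is an integer, and that names contain no spaces; parse_rename_lines is a hypothetical helper, not a MiRDeepFinder function.

```python
def parse_rename_lines(lines):
    """Return {mirnaid: mirnaname}; raise ValueError on malformed lines."""
    mapping = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                "line {}: expected 2 columns, got {}".format(number, len(fields))
            )
        mirna_id, name = fields
        if not mirna_id.isdigit():
            # Catches a column title row such as "mirnaid mirnaname".
            raise ValueError(
                "line {}: id {!r} is not a number".format(number, mirna_id)
            )
        if int(mirna_id) in mapping:
            raise ValueError("line {}: duplicate id {}".format(number, mirna_id))
        mapping[int(mirna_id)] = name
    return mapping
```

Reading renameallmirna.txt and passing its lines to parse_rename_lines either returns the id-to-name mapping or points at the first line that needs fixing.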

5.7 Remove repeated target

Run the Perl script file 7target.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/7target.pl

5.8 Analysis of GO and KEGG

Run the Perl script file 8GOandKEGG.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/8GOandKEGG.pl

5.9 Report all result data

Run the Perl script file 9ReportResult.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/9reportresult.pl

A series of result files will be generated.

Conserved miRNAs: /home/mirna/deepsequencing/mirdeepfinder_output/report/mirna/conservedmirnareportfile.xls
Some basic statistics for conserved miRNAs: /home/mirna/deepsequencing/mirdeepfinder_output/report/mirna/conservedmirnastatisticsFile.xls
Details of conserved miRNA: /home/mirna/deepsequencing/mirdeepfinder_output/report/mirna/conservedmirnafilename.xls
Details of novel miRNA: /home/mirna/deepsequencing/mirdeepfinder_output/report/mirna/novelmirnafilename.xls
Target report file: /home/mirna/deepsequencing/mirdeepfinder_output/report/target/targetresultreportfilename.xls
GO-cellular component: /home/mirna/deepsequencing/mirdeepfinder_output/report/go/cellular_componentfilename.xls
GO-biological process: /home/mirna/deepsequencing/mirdeepfinder_output/report/go/biological_processfilename.xls
GO-molecular function: /home/mirna/deepsequencing/mirdeepfinder_output/report/go/molecular_functionfilename.xls

KEGG: /home/mirna/deepsequencing/mirdeepfinder_output/report/kegg/keggreportfilename.xls
Conserved miRNA hairpin structure directory: /home/mirna/deepsequencing/mirdeepfinder_output/report/conservedpdf/
Novel (non-conserved) miRNA hairpin structure directory: /home/mirna/deepsequencing/mirdeepfinder_output/report/nonconservedpdf/

5.10 Detect novel miRNA in other small RNA datasets from same species

If an identified novel miRNA also exists in other small RNA datasets from the same species, it is very likely to be a real miRNA. Run the Perl script file detectotherdatasets.pl with the command:

    perl /home/mirna/mirdeepfinder1.0/detectotherdatasets.pl

The final comparison result will be output to /home/mirna/deepsequencing/mirdeepfinder_output/report/resultfileofotherdatasets.txt

6. Degradome analysis by CleaveLand and targetfinder

Prepare your miRNAs in a fasta format file, like matur_mirna.fasta. Run the Perl script with the command:

    perl /home/mirna/mirdeepfinder1.0/10runcleaveland.pl matur_mirna.fasta

A series of result files will be generated.

Bowtie index files: /home/mirna/deepsequencing/mirdeepfinder_output/cleveland/bowtie/
Target identification files by targetfinder: /home/mirna/deepsequencing/mirdeepfinder_output/cleveland/targetfinder/
Target identification files by degradome analysis: /home/mirna/deepsequencing/mirdeepfinder_output/cleveland/cleveland/targetanalysis/
Target t-plot files ("target plot" showing the degradome data plotted against transcript position): /home/mirna/deepsequencing/mirdeepfinder_output/cleveland/cleveland/t_plotter/

7. Contact us

For questions about MiRDeepFinder or to report bugs, you are welcome to contact Fuliang Xie at fulxie@gmail.com.

Author: Fuliang Xie, Department of Biology, East Carolina University, Greenville, NC
