Software review. Analysis for free: Comparing programs for sequence analysis

Similar documents
Geneious 5.6 Quickstart Manual. Biomatters Ltd

Tour Guide for Windows and Macintosh

MacVector for Mac OS X

MacVector for Mac OS X. The online updater for this release is MB in size

Bioinformatics explained: BLAST. March 8, 2007

Release Notes. Version Gene Codes Corporation

CLC Server. End User USER MANUAL

INTRODUCTION TO BIOINFORMATICS

MacVector for Mac OS X

Filogeografía BIOL 4211, Universidad de los Andes 25 de enero a 01 de abril 2006

MacVector for Mac OS X. ClustalW can now be run using just chromatogram files as input.

DNASIS MAX V2.0. Tutorial Booklet

CodonCode Aligner User Manual

Tutorial for Windows and Macintosh. Trimming Sequence Gene Codes Corporation

Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA.

INTRODUCTION TO BIOINFORMATICS

Introduction to Bioinformatics Software on Bio-Linux

FASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.

Principles of Bioinformatics. BIO540/STA569/CSI660 Fall 2010

Biostatistics and Bioinformatics Molecular Sequence Databases

Tutorial for Windows and Macintosh SNP Hunting

The Kodon quickguide

Bioinformatics for Biologists

CS313 Exercise 4 Cover Page Fall 2017

Additional Alignments Plugin USER MANUAL

When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame

Performing whole genome SNP analysis with mapping performed locally

Gegenees genome format...7. Gegenees comparisons...8 Creating a fragmented all-all comparison...9 The alignment The analysis...

Tutorial: De Novo Assembly of Paired Data

Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014

2) NCBI BLAST tutorial This is a users guide written by the education department at NCBI.

Database Searching Using BLAST

Categorized software tools: (this page is being updated and links will be restored ASAP. Click on one of the menu links for more information)

Lecture 4: January 1, Biological Databases and Retrieval Systems

CLC Sequence Viewer 6.5 Windows, Mac OS X and Linux

3. Open Vector NTI 9 (note 2) from desktop. A three pane window appears.

Geneious Prime User Manual. Biomatters Ltd

Nevada Genomics Center

Unipro UGENE Manual. Version 1.29

Uploading sequences to GenBank

Wilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment

Tutorial 1: Exploring the UCSC Genome Browser

TBtools, a Toolkit for Biologists integrating various HTS-data

BLAST, Profile, and PSI-BLAST

Page 1.1 Guidelines 2 Requirements JCoDA package Input file formats License. 1.2 Java Installation 3-4 Not required in all cases

Basic Local Alignment Search Tool (BLAST)

Unipro UGENE Manual. Version 1.31

Comparative Sequencing

Software review. Shopping in the genome market with EnsMart

Sequence alignment theory and applications Session 3: BLAST algorithm

Lab 8: Using POY from your desktop and through CIPRES

As of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be

CLC Sequence Viewer USER MANUAL

Tutorial for Windows and Macintosh SNP Hunting

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:

Tutorial 4 BLAST Searching the CHO Genome

GENE CONSTRUCTION KIT

Tutorial: chloroplast genomes

Tutorial. Variant Detection. Sample to Insight. November 21, 2017

Copyright. (1) Copyright. (2) Trademark

INTRODUCTION TO CONSED

Browser Exercises - I. Alignments and Comparative genomics

JET 2 User Manual 1 INSTALLATION 2 EXECUTION AND FUNCTIONALITIES. 1.1 Download. 1.2 System requirements. 1.3 How to install JET 2

PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology

Bioinformatics explained: Smith-Waterman

Blast2GO Command Line User Manual

BIR pipeline steps and subsequent output files description STEP 1: BLAST search

Data Walkthrough: Background

Sequence Alignment & Search

TECH NOTE Improving the Sensitivity of Ultra Low Input mrna Seq

MacVector for Mac OS X. Sequence Confirmation Tutorial

Sequence Alignment. GBIO0002 Archana Bhardwaj University of Liege

Phylogeny Yun Gyeong, Lee ( )

Tutorial. De Novo Assembly of Paired Data. Sample to Insight. November 21, 2017

ChIP-seq Analysis Practical

Scientific Programming Practical 10

How to use KAIKObase Version 3.1.0

Tutorial: How to use the Wheat TILLING database

Working with AppleScript

MetaPhyler Usage Manual

Bioinformatics. Sequence alignment BLAST Significance. Next time Protein Structure

HORIZONTAL GENE TRANSFER DETECTION

QIAseq DNA V3 Panel Analysis Plugin USER MANUAL

Module 1 Artemis. Introduction. Aims IF YOU DON T UNDERSTAND, PLEASE ASK! -1-

NGS Data Visualization and Exploration Using IGV

ICB Fall G4120: Introduction to Computational Biology. Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology

Lecture 5 Advanced BLAST

Introduction to Bioinformatics Problem Set 3: Genome Sequencing

COS 551: Introduction to Computational Molecular Biology Lecture: Oct 17, 2000 Lecturer: Mona Singh Scribe: Jacob Brenner 1. Database Searching

Software review. Biomolecular Interaction Network Database

BIOINFORMATICS A PRACTICAL GUIDE TO THE ANALYSIS OF GENES AND PROTEINS

Bioinformatics workflows in the cloud with Unipro UGENE. Mikhail Fursov Konstantin Okonechnikov

Computational Molecular Biology

Proteome Comparison: A fine-grained tool for comparative genomics

BioEdit version 5.0.6

Managing big biological sequence data with Biostrings and DECIPHER. Erik Wright University of Wisconsin-Madison

Similarity Searches on Sequence Databases

Tutorial. Aligning contigs manually using the Genome Finishing. Sample to Insight. February 6, 2019

BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha

Using many concepts related to bioinformatics, an application was created to

Transcription:

Analysis for free: Comparing programs for sequence analysis Keywords: sequence comparison tools, alignment, annotation, freeware, sequence analysis Abstract Programs to import, manage and align sequences and to analyse the properties of DNA, RNA and proteins are essential for every biological laboratory. This review describes two different freeware (BioEdit and pdraw for MS Windows) and a commercial program (Sequencher for MS Windows and Apple MacOS). Bioedit and Sequencher offer functions such as sequence alignment and editing plus reading of sequence trace files. pdraw is a very comfortable visualisation tool with a variety of analysis functions. While Sequencher impresses with a very user-friendly interface and easy-to-use tools, BioEdit offers the largest and most customisable variety of tools. The strength of pdraw is drawing and analysis of single sequences for priming and restriction sites and virtual cloning. It has a database function for user-specific oligonucleotides and restriction enzymes. INTRODUCTION Analysing sequences belongs to a molecular biologist s everyday life, whether a newly sequenced gene needs to be examined, cloning of a gene needs to be confirmed or just the sequence of DNA, cdna, RNA or proteins needs to be viewed. Design, analysis and managing of primers, sequences, clones and domains keep a whole industry running; each company attempts to offer a program better than their competitor. Speed of analysis, especially when many sequences are managed, is essential. The authors of programs have mainly two choices: write a complex, suite-like program that performs a variety of analyses, or design a simpler program to solve a particular (biological) problem. There are varieties of free web-based solutions such as primer design available. However, many users may prefer a single all-in-one program installed on their own computer. The main factor that most biologists are interested in is a simple graphical user interface (GUI) to ease communication to command line based programs, and security to use the program in the future. Another important interest is usually the option of running a local search (such as BLAST) to avoid waiting time when the NCBI server is heavily used, or because private sequence data should not be sent over public internet channels via http (hypertext transport protocol). Three programs described in this review can help solve these problems: Sequencher, Bioedit and pdraw. Gene Codes Sequencher is a common program in most molecular biology laboratories, which is often simply used as a sequence displayer with some aligning functions. However, it is a very powerful tool that can be used for analysing sequences as for example from an ABI sequencer, and aligning them. According to company information, its features include restriction enzyme mapping, heterozygote detection, cdna to genomic DNA large gap alignment, support for confidence scoring, comparative sequencing, and open 82 & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 82 87. MARCH 2004

reading frame (ORF), motif and single nucleotide polymorphism (SNP) analysis. Originally written for Macintoshes (MacOS version 9), it is nowadays also available for MS Windows. A MacOSX version will be released in spring 2004, according Gene Codes customer support. It has a Macintosh-derived GUI and is state-of-the-art concerning userfriendliness. However, the need to run each version with a dongle (hardware licence verification tool) limits the usage in big laboratories. Therefore this review discusses two freely available programs that offer a variety of tools to do the same job at no cost: pdraw and BioEdit. pdraw is a freeware program for sequence visualisation written by Kjeld Olesen who runs Acaclone Software. pdraw can do sequence annotation, editing and analysis and produce a graphical output of sequences or constructs. The author calls the program beer ware, meaning you are encouraged to buy him a beer when you meet him somewhere. This is most likely to be on a yeast genetics conference, according to his website. BioEdit is an older freeware sequence analysis tool (latest version of 2001) that can call up other applications to analyse given sequences. It is, according to its creator Tom Hall, a sequence analysis program designed and written by a graduate student who was frustrated by using word processors and command line programs for sequence analysis. Therefore, he created a program that made full use of Windows GUI. He assembled a very useful modular package with an extremely easy, graphical interface for sequence manipulation and editing that is likely to fulfil your basic needs and help you with most sequencerelated problems. DETAILS OF USE Installing the different programs is not a difficult task; they all come with an installer. BioEdit can be downloaded in its latest version (5.09) from the website of Tom Hall s department at North Carolina University. 1 The file (8.2 MB) is a zipped installer that will install BioEdit in a folder on your C: drive named BioEdit. The system recommendations are a Windows operating system (95 to 2000) and a Pentium I CPU with at least 32 MB and approximately 40 MB hard disc space, which is a very low-end configuration nowadays. The installer works without problems under Windows 98 to XP and should work with Windows 95 as well. Treeview, 2 a program to view tree files, has to be installed additionally in the applications folder. The newest version can be downloaded from the homepage of Roderic Page. 3 pdraw is frequently (approximately 10 times in 2003) and easily updated (just replace the current copy with the new one). pdraw can be downloaded from Acaclone s homepage; 4 the set-up file is only 2.6 MB. The program is designed to run under Windows and the current version (1.1.68 at the time of writing) supports MS Windows XP. It has also been reported to run on an Apple ibook under SoftWindows 95 emulation, and even with different Linux distributions using WINE emulation. According to Acaclone, pdraw32 can open sequences to up to two giga base pairs. A PC with 32 MB RAM is sufficient to analyse one mega base pairs and the author states that short sequences can even be analysed on a 486 PC with 16 MB. The program runs on older machines (Pentium 2) and on the newest Pentium 4 equipped computers without problems. A demo version (Version 3.0 for Macintosh, 4.1.4 for Windows) of Sequencher can be downloaded from Gene Code s ftp server. 5 After unzipping, the installer will create a folder called Gene Codes in your program file folder, which uses approximately 12 MB hard disk space. The demo version would not run as a full version even with a Sequencher dongle (4.05) installed. The full version (referring to version 4.05 for Windows) is delivered on three floppies & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 82 87. MARCH 2004 83

or a CD-ROM. Installing takes less than a minute on a Pentium 3. System requirements are MS Windows 98 to XP, 96 MB RAM and 20 MB hard disk space. Once all the programs are installed you can start to work with them immediately; none requires rebooting. All programs have their own project files: SPF, PDW and BIO. They are not exchangeable. The user must work with common formats. Bioedit in detail BioEdit is a freeware sequence analysis tool that can call up other applications to further analyse given sequences. It accepts a wide range of sequence formats (as scl, seq, txt or fas) and uses Don Gilbert s ReadSeq 6 to import the majority of sequences. If a sequence is not recalled, BioEdit offers the option to open it as text. They can also be directly copied from the clipboard where they have been stored from another program or NCBI website. Most of the applications run without problems under WinXP; however, to use ClustalW 7 a newer version 1.83 (to be downloaded from the website 8 ) has to be placed in the helper application folder. The majority of the helper applications (ClustalX, formatdb, localblast) 9 are freely available and can be downloaded from various sources on the web (ie NCBI). The integration is limited and adding new applications is a laborious task. On the other hand, users might appreciate the main advantage, a GUI for formatdb and BLAST stand-alones. The help file describes in detail how to add programs by the example of Treeview. The online help was not updated in the last version; it refers to an earlier version of BioEdit (5.06). Reading and writing of alignment files can be organised with the BioEdit project file. Editing of sequences is an easy task and there are many options for viewing, annotating and editing multiple sequence alignments. Cut and paste between BioEdit and Sequencher or pdraw is easy. Sequences can be transformed via translations, complementation and reversing sequences. Doing a local BLAST or NCBI search is possible. Bioedit uses ClustalW for multiple sequence alignment; the output can later be fed into PHYLIP programs to generate tree files that are displayed in TreeView. 3 Dot plots and pairwise alignments are both possible. Sequences can easily be analysed for composition with a graphical output. BioEdit features a list for Internet bookmarks. It can store up to 500 different bookmarks; some installed links are obsolete in the meantime. Individual links have to be added manually; the help file explains the syntax. The defaults include a link to Webcutter, 10 which might be a very useful addition to BioEdit s restriction analysis. Primers cannot be designed. BioEdit reads the files from, for example, an ABI sequencer (and.scf files) and can perform some basic manipulation, such as increasing the scale of the base calls. Single base calls can be marked and changed. Export to FastA and text is possible. Plasmids can be drawn, but not to the extent available in pdraw. The software does not accept gaps in the sequence; they have to be removed first or the plasmid tool will not start. Be aware that there is no checking of memory requirements before opening an alignment; this can cause problems. Sequencher in detail Sequencher was introduced by Gene Codes in 1991 and has since become a major program for DNA sequencing. The Windows version was introduced in 1998. It has won several prizes over the years. Being primarily a sequence analysis tool for your sequences from a sequencer, it has useful analysis tools to offer beyond that. This can be of special interest for laboratories that cannot or do not want to buy different software. A simple GUI for assembling sequences offers the choice between three categories: dirty data, clean data and large gap. Two sliders allow choosing between minimum match (percentage 60 100) and minimum overlap (0 100). The technology behind this automatic 84 & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 82 87. MARCH 2004

assembly is based on an optimised Smith Waterman algorithm according to company information. The procedure is an outstanding example for userfriendliness. If you are in a hurry, this program will deliver results with a few mouse clicks. The assembled contig is visualised with arrows in different colours, and buttons allow you to switch to sequence view. Changing of sequences is easily performed and the user can examine the original trace data of sequences. The manipulation options for chromatograms are better than in Bioedit and allow the choice of nucleotides to be displayed. This program is well designed for analysing sequences. To clean up data Sequencher allows you to trim your data, an option from the Sequence menu. This is in order to remove low-quality parts of the data (remains of vectors, or just the first 10 30 base pairs) before you assemble them to a contig. Although the user has the choice to open the raw trace files from the sequence read and individually adjust ambiguous bases, this automated step is very helpful. To shorten time it is possible to create a reference sequence to which the single contigs are compared. The manual offers detailed advice for the procedure. The Tour guide provided with the evaluation version is a very helpful source of information and answers most questions. PDRAW in detail The main DNA sequence drawing features many options: giving the sequence a name, optional comments and additional annotations (domains, inserts, origin of replication) are easily made. The display of the adequate primers, ORF, guanine and cytosine (GC) content and your personal annotations is switched on and off by buttons that are easy to understand also for technological laypeople. Once the sequence has been read, pdraw automatically analyses it for restriction enzyme sites, methylation sites and ORFs (based on standard or non-standard genetic code). In addition, it handles oligonucleotide-priming sites of the primers the user has entered into their database. This is a text-based flat file that can also be edited by the user with a standard text editor such as WordPad, which can be useful since there is no sorting of the entries. They appear in the order they have been entered. For further analysis, the users will appreciate the calculation of the optimal polymerisation chain reaction (PCR) annealing temperature based on the melting temperature (T m ) of the chosen primers. For a given sequence, parameters such as primer or salt concentration can be chosen. Choosing the option internal stability always crashes the program; however, since it is a helper application, pdraw continues. An additional feature is the virtual gel plot allowing the user to digest their DNA and analyse the generated fragments on a virtual gel. Agarose concentration of the gel, different commercial markers and running length can be selected. Although this feature cannot replace laboratory work, it is helpful for the user to know what digestion pattern to expect. This is a tool found in very professional software such as Informax s Vector NTI. The sequence view offers a choice of options to be displayed. The virtual cloning procedure was not tested. Using the programs The main task for using these programs is pasting your own sequences into one of them and analysing their physical properties or their relation to other known sequences. The amount of sequenced genomes is growing every day so a lot of information is at hand without the need for laborious DNA extraction, cdna bank fishing, screening, etc. Are you looking for a specific gene? Just go to GenBank, EMBL/EBI or one of the genome websites and get the sequence. But what then? Biological sequence software should recognise plain text, GenBank, FastA and PIR (Protein Information Resource) format files and & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 82 87. MARCH 2004 85

the sequence chromatogram files to name minimum requirements. The good news is that all programs are easy to use for copy and paste, but not all recognise the common FastA format properly. Creating a test.seq and a test.txt file with a FastA formatted sequence of 1000 bp leads to different results. While Sequencher usually imports the test files without problems and BioEdit most of the time, pdraw includes the FastA header starting with a. into the sequence, ignoring numbers and nonnucleotide letters. This can be corrected in the edit sequence and header menu. The program quotes a missing divider. Adding Sequence:: before your sequence should solve the problem. BioEdit usually starts the ReadSeq import engine that works reliable, but it is not always reliable. Trying to open, for example, pdraw or similar text files can lead either to opening a text file revealing all the additional information or to a sequence file, assuming the sequence is an RNA sequence or amino acid. Since it stores this information along with the sequence, this can later lead to trouble even if the sequence was corrected. If a BLAST search is performed with the sequence(s), the program will refer to the original assumption and automatically set the pull-down menu to Blastp. Since no further check is done, the user will realise the problem only when they do not get any results after a while. After finding a gene of interest, how do you cut it out and fuse it to a reporter or a promoter? All programs offer a restriction map. However, the pdraw way is the easiest: It features a full list with most common enzymes and allows you to show only the ones you selected. With a click in the plot view, you see all possible restriction sites in your chosen sequence. This can be quite complex and the author admits that the graphical description is not always accurate when many restriction sites have to be plotted. The parameters for each enzymes can be changed (for example a minimum of cutting sites). Finally, you can have a detailed overview in the analysis menu where all restriction sites are listed as well as all chosen enzymes that did not cut. Sequencher comes with a full list called default enzymes whose name should not be changed. Instead, you are encouraged to modify this list to your needs and save a copy as default enzymes in the same folder. A visual restriction map for a single sequence can be created easily with cut map. Single enzymes can be chosen. Bioedit restriction mapping is also simple: mark a sequence and find the restriction map entry under sequence. The output offers a text file displaying the sites for the cutting enzymes schematically and in a table. The table with the enzymes can be updated with a new gcgenz enzyme file from the REBASE website. 11 However, the handling of individual enzymes is not very comfortable, since the enzymes are listed according to manufacturer. Choosing individual enzymes seems to be easier in pdraw and Sequencher. DISCUSSION Sequencher is a highly professional userfriendly tool that will handle your sequences, easily align them with no further ado and check where your different sequences make a contig. Nevertheless, it can be abused to serve as a real sequence analyser: The user can check where primers will align to the sequence or do a restriction enzyme analysis. This and the original application, checking the sequences that have been cloned or amplified, are done easily. The Macintosh look is still visible in version 4 for Windows. A user does not need any further knowledge about sequence comparison; simple copy and paste or import commands are enough. The default settings for aligning are sufficient for high homology or high-quality sequence data. However for genomescale assembly one has to use the PHRED/PHRAP package. 12,13 Users would surely appreciate a BLAST search option and an optimised help function. pdraw has its limitations, but is excellent at what it was created for: paste 86 & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 82 87. MARCH 2004

a sequence, annotate and display; get the GC content, ORF and the positions of the primers in stock. The program is often updated and runs under the newest Windows versions, and reportedly in Windows emulation. The main disadvantages concern import of other file formats (at least FastA should be standard), an automated file saving, removal of bugs (such as the T m calculator/primer analyser) and a better organisation for the oligonucleotide and enzyme database files. However, it is the champion of value for money in this review, a professional software that comes at virtually no price. BioEdit is a nicely assembled package that is ready to use for editing and reading sequences and plasmid drawing. The import engine works very well, it accepts a large variety of formats and is fairly bug free. In combination with ClustalW and other NCBI tools, it allows multiple alignments and local or distant BLASTs directly from the alignment window. It is highly customisable and has a pleasant user interface. Its limitations are the lack of tools compared with commercial software, the need to manually set up helper applications and lacking support for Windows XP. However, it is free. If a programmer does a few revisions and adds new programs, it could be marketed as shareware. Each program offers advantages and has features the others lack. All programs could improve the integrated help function. On the other hand, Bioedit s help has many entries and users of Sequencher can contact the customer support. Although the freeware programs cannot offer the functionality of commercial software in all aspects, their distributed at no charge feature might outweigh some of their drawbacks such as limited integration of the different tools or bugs in some modules. If BioEdit is updated to a fully operational Windows XP version and a wider range of functions is added to pdraw (such as sequence alignment), they could become true alternatives to existing products on the market. In the present state, they are too limited to be a full match for Sequencher and other commercial software, although they can be very useful additions to other programs. References Helge-Friedrich Tippmann Plant Research Department, Risø National Laboratory, DK-4000 Roskilde, Denmark Tel: 0045 4677 4291 E-mail: helge.tippmann@tu-berlin.de 1. BioEdit (URL: http://www.mbio.ncsu.edu/ BioEdit/). 2. Page, R. D. M. (1996), TREEVIEW: An application to display phylogenetic trees on personal computers, Comput. Appl. Biosci., Vol. 12, pp. 357 358. 3. Treeview (URL: http:// taxonomy.zoology.gla.ac.uk/rod/rod.html). 4. pdraw (URL: http://www.acaclone.com/). 5. Sequencer, GeneCodes Corporation (URL: http://www.genecodes.com). 6. ReadSeq (URL: http://iubio.bio.indiana.edu/ soft/molbio/readseq/). 7. Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994), CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice, Nucleic Acids Res., Vol. 22, pp. 4673 4680. 8. ClustalW (URL: http://www.imtech.res.in/ pub/mirror_sites/ebi/dos/clustalw). 9. Altschul, S. F., Madden, T. L., Alejandro, A. et al. (1997), Gapped BLAST and PSI- BLAST: A new generation of protein database search programs, Nucleic Acids Res., Vol. 25, pp. 3389 3402. 10. Webcutter (URL: http://firstmarket.com/ firstmarket/cutter/). 11. Rebase (URL: http://www.neb.com/rebase/). 12. Ewing, B., Hillier, L. D., Wendl, M. C. and Green, P. (1998), Base-calling of automated sequencer traces using phred. I. Accuracy assessment, Genome Res., Vol. 8, pp. 175 185. 13. Ewing, B. and Green, P. (1998), Base-calling of automated sequencer traces using phred. II. Error probabilities, Genome Res., Vol. 8, pp. 186 194. & HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 5. NO 1. 82 87. MARCH 2004 87