Background and Strategy Smitha, Adrian, Devin, Jeff, Ali, Sanjeev, Karthikeyan
What is a genome browser?
A web- or desktop-based graphical tool for rapid and reliable display of any requested portion of the genome at any scale, integrated with a large collection of annotations.
A browser can be configured to display:
- Genome sequence
- Contigs
- Assembly
- mRNA
- ESTs
- Poly A sites
- Splicing boundaries
- Non-coding RNAs
- Multiple gene predictions
- Gene expression
- qPCR primers
- Origin of replication
- Conserved sites
- Cross-species homologies
- SNPs
- Indels
- CNVs
- Inversions
- Transposons
- Repeats
- Microsatellites
- DNase hypersensitivity sites
- TF binding sites
- DNA methylation sites
- Literature
- GWAS catalog
- Mutations (OMIM)
Text and sequence-based searches provide quick and precise access to any region of specific interest.
Secondary links from individual features lead to sequence details and supplementary off-site databases.
Representing all data along a single coordinate axis is the unique selling point (USP) of a genome browser.
*The Human Genome Browser at UCSC. W. James Kent, Charles W. Sugnet, Terrence S. Furey, et al. Genome Res. 2002 12: 996-1006
The browser history ;)
- C. elegans database (ACEDB) (Eeckman and Durbin, 1995)
- The Saccharomyces Genome Database (SGD) (Cherry et al, 1998)
- N. meningitidis browser (Comp Genomics et al, 2014)
- Ensembl (Birney et al, 2001)
- UCSC genome browser (Kent et al, 2002)
Genome Browser Examples UCSC UTGB Dalliance GBrowse JBrowse
UCSC
UCSC (University of California, Santa Cruz)
- Shows annotations for chromosomal regions of variable size
- Designed to handle large volumes of complex data quickly
- Overkill
- Each FASTA file requires several converted files and many database tables
UTGB
UTGB (University of Tokyo Genome Browser)
- Uses AJAX-based web interfaces to avoid excessive reloading of web pages
- For easy installation, ships with a stand-alone web server and a database management system for querying genome databases
- Generated genome browsers work on Windows, Mac and Linux and can be deployed to remote web servers
- Latest update: December 1, 2011
Dalliance
- Lightweight visualization tool
- HTML5
- DAS (Distributed Annotation System)
- Deals badly with searches that match more than one region of the genome
GBrowse
- Developed by the Generic Model Organism Database Project (GMOD) as the Generic Genome Browser (GBrowse)
- Perl based
- Customizable plug-ins: BLAST, dump and import of many formats
- Glyph library
JBrowse
- Fast, smooth navigation and zooming
- Can handle multi-gigabase genomes and deep-coverage sequencing
- Supports BED, GFF3, FASTA, BioLLDB, Chado, WIG, BAM, BigWig, UCSC (intron/exon structure, name lookups, quantitative plots)
- Relatively easy to get running
- Capable of thousands of track selections
- Very lightweight: needs few server resources (no back-end server code, just data file formatting tools; the formatted files are read directly over HTTP)
JBrowse Details
Storing/Querying Data
Contig1 GENE 33165 34499 . - . id=g1135;name=glmm;signature="glmm: phosphoglucosamine mutase",...
Contig1 GENE 34629 35480 . - . id=g469;name=folp; Synonyms=dhpS;signature="Pterin binding enzyme",...
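To make the record layout above concrete, here is a minimal parsing sketch. It assumes each record has eight whitespace-separated columns (sequence ID, feature type, start, end, and three single-character columns read here as score, strand and phase, followed by a `key=value;...` attribute column). That column reading, the `Feature` type and the `parse_feature` helper are illustrative assumptions, not part of any browser's actual loader, and the sample line is a shortened copy of the first record with the elided tail dropped.

```python
from typing import Dict, NamedTuple


class Feature(NamedTuple):
    """One annotation record (hypothetical container for this sketch)."""
    seqid: str
    kind: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_feature(line: str) -> Feature:
    # Split into at most 8 fields so spaces inside the attribute column survive.
    seqid, kind, start, end, _score, strand, _phase, attrs = line.split(None, 7)
    attributes = {}
    # Assumption: attribute values never contain ';' themselves.
    for pair in attrs.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip().strip('"')
    return Feature(seqid, kind, int(start), int(end), strand, attributes)


record = ('Contig1 GENE 33165 34499 . - . '
          'id=g1135;name=glmm;signature="glmm: phosphoglucosamine mutase"')
feature = parse_feature(record)
print(feature.start, feature.end, feature.attributes["name"])
```

The start and end coordinates parsed this way are the interval that the query methods on the following slides operate on.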
Storing/Querying Data
Ordered by Start: ABCDEFGHIJKLMNOP
Ordered by End: ABEDFHGJICKMLNPO
SELECT * FROM Intervals WHERE Interval.Start < Query.End AND Interval.End > Query.Start
Alekseyenko AV, Lee CJ. Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases. Bioinformatics. 2007;23(11):1386-93.
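The overlap condition in the query above can be sketched in a few lines of Python. This is a naive linear scan, shown only to pin down the semantics; the function names and the sample intervals are made up for illustration. Because the start order and the end order of the intervals differ, a single sorted index cannot bound both sides of the condition, which is the problem the indexing schemes on the next slides address.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def overlaps(interval: Interval, query: Interval) -> bool:
    # Same test as the WHERE clause: Interval.Start < Query.End
    # AND Interval.End > Query.Start.
    return interval[0] < query[1] and interval[1] > query[0]


def naive_query(intervals: List[Interval], query: Interval) -> List[Interval]:
    # Checks every interval, so the cost grows linearly with the dataset.
    return [iv for iv in intervals if overlaps(iv, query)]


features = [(0, 10), (5, 15), (20, 30)]
print(naive_query(features, (10, 20)))
```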
Nested Containment List
- Stores data as a tree in which each interval keeps a sublist of the intervals it contains
- Because no interval in a sublist contains another, each sublist is sorted by start AND end simultaneously
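A minimal in-memory sketch of the idea follows. It is not the layout used by JBrowse or by the original NCList implementation; the `Node` class and the function names are illustrative assumptions, and intervals are treated as half-open `(start, end)` pairs. Within every sublist the starts and the ends both increase, so a binary search on end finds the first candidate and a forward scan on start finds the rest.

```python
from typing import List, Tuple


class Node:
    """An interval plus the sublist of intervals it contains."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end
        self.children: List["Node"] = []


def build_nclist(intervals: List[Tuple[int, int]]) -> List[Node]:
    # Sort by start, longest first on ties, so containers precede their contents.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top: List[Node] = []
    stack: List[Node] = []
    for start, end in ordered:
        node = Node(start, end)
        # Pop intervals that end too early to contain the new one.
        while stack and stack[-1].end < end:
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def query_nclist(nodes: List[Node], q_start: int, q_end: int) -> List[Tuple[int, int]]:
    # Binary search for the first node whose end lies past the query start.
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end > q_start:
            hi = mid
        else:
            lo = mid + 1
    hits = []
    # Starts are sorted too, so stop at the first node starting past the query.
    while lo < len(nodes) and nodes[lo].start < q_end:
        hits.append((nodes[lo].start, nodes[lo].end))
        hits.extend(query_nclist(nodes[lo].children, q_start, q_end))
        lo += 1
    return hits


tree = build_nclist([(0, 100), (10, 20), (15, 18), (30, 40), (120, 130), (125, 128)])
print(sorted(query_nclist(tree, 16, 35)))
```

Sublists of intervals that do not overlap the query are never visited, since everything they hold lies inside a non-overlapping parent.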
Storing/Querying Data
- JBrowse: NC List
- UCSC: Binning
- GBrowse: R-Tree
Alekseyenko AV, Lee CJ. Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases. Bioinformatics. 2007;23(11):1386-93.
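For comparison with the NC List, the binning approach can be sketched as below. The idea is to assign each feature to the smallest fixed-size bin that fully contains it, so a range query only has to inspect a handful of bins. The constants are the commonly cited values for the UCSC standard scheme (five levels, 128 kb smallest bins, each level 8 times larger); treat them as an assumption and check them against the UCSC source before relying on them. The function name is illustrative.

```python
# Assumed constants for the standard five-level scheme (covers up to 512 Mb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin fully containing the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")


print(bin_from_range(33165, 34499))
```

Storing the bin number as an indexed column lets a relational query restrict itself to the bins that can overlap the requested region before applying the exact start/end test.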
Storing/Querying Data
JBrowse DOES NOT use a relational database!
- The Nested Containment List had not been implemented in a relational database when JBrowse development started
- It has since been implemented, but JBrowse does not use that implementation
Wiley LK, Sivley RM, Bush WS. Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists. Database (Oxford). 2013;2013:bat056.
Extra Features
- BLAST
- BLAST ATLAS (Genewiz, Brigs)
- TREES
- ALIGNMENTS (MAUVE)
- SNP CALLING
- MULTIPLE SEQUENCE ALIGNMENT