How to submit nucleotide sequence data to the EMBL Data Library: Information for Authors
The EMBL Data Library, Postfach , D-6900 Heidelberg, Federal Republic of Germany

January

1 The first step in getting an accession number

Before doing anything else, authors should get a copy of a sequence data submission form. This form solicits all of the information needed to make a database entry; that is, the primary sequence data together with descriptive information such as the source of the sequenced segment (e.g., organism, strain, tissue) and the location of interesting regions within the sequence (e.g., coding regions, regulatory signals). It also contains information about data formats. The data submission form exists in both a paper and a computer-readable version; the latter can be completed using a text editor. These versions are available from the following sources:

(a) Paper form: printed at the end of this article, from the Development editorial office and available upon request from EMBL, GenBank and the DNA Databank of Japan (DDBJ) at the addresses given in Appendix II.
(b) Computer-readable form:
(1) With all releases of the EMBL and GenBank databases since January 1987 and with DDBJ releases since January .
(2) From EMBL by electronic mail (computer network) via our file server. Anyone with access to BITNET (either directly or via a gateway) can send a request to the EMBL file server, which will automatically return a copy of the data submission form by electronic mail. Instructions for using the EMBL file server are given in Appendix I.
(3) From EMBL, on Macintosh or IBM-compatible (5¼" or 3½") floppy diskettes. Complete information on how to contact the EMBL Data Library is given in Appendix II.
(4) From GenBank via electronic mail or on floppy diskette. For information on requesting the form from GenBank via Telenet, contact David Benton ( ). Researchers in Japan can obtain the form by dialing up the DDBJ computer system ( ).

2 What to submit to the EMBL Data Library

A data submission should include the following (for further details, see the data submission form itself):

(a) the sequence itself, in computer-readable form (computer network mail, magnetic tape or IBM-compatible or Macintosh floppy diskette). Printouts will be accepted only if the authors have no access to a computer.
(b) a completed data submission form for each submitted sequence. The form is available from the sources listed in section 1(a).
(c) a computer network address, a telex number or a telefax number (advisable, to help speed things up, but not required).

3 How to send data to the EMBL Data Library

Data can be sent to the Data Library in one of several ways:

(a) Electronic file transfer: files can be sent via computer network to DATASUBS@EMBL.BITNET. This BITNET address can be reached directly (by people at BITNET sites) or via various gateways from Arpanet, Usenet, JANET, etc. Ask your local network expert for help or phone us ( ).
(b) Telefax to Data Submissions, EMBL Data Library. Our fax number is:
(c) Normal post. See address given in Appendix II.

4 How long will it take to get an accession number?

We will process data submissions within 7 working days of receipt and send authors notification of either what accession number(s) their data have been assigned or what additional information is needed. There are several things authors can do to minimise the time it takes to get an accession number:

(a) Be sure that submissions include all the necessary materials and that all relevant questions on the data submission form have been answered.
(b) Check the data to be sure that they do not contain inconsistencies/errors (e.g., a stop codon in the middle of a region listed on the form as an exon).
(c) Be sure to include either a computer network address or a telex or telefax number. If this information is not provided, notification of accession numbers will be sent by regular post. Telephoning is costly and time-consuming, and the Data Library will therefore not attempt to contact authors by phone.

Although we will process data submissions as quickly as we can, we strongly encourage authors to submit their data at or before the time they begin writing the manuscript, rather than once it is finished. This way we can process the data while the manuscript is being written, and authors will not have to delay submission of their manuscript while they wait for notification of their accession number. It should be emphasised that authors are responsible for communicating their accession number(s) to the journal at the time they submit their manuscript; the Data Library will not contact the journal.

5 Data security

The data submission form asks authors whether their submitted data can be made available to the public immediately or whether it should be withheld until publication.

6 Updating your data

Once a database entry has been created from a submission, a copy is sent to the submitter for his/her reference and for comments or corrections. However, it often happens that the entry is correct when it is created but, with the passage of time, becomes out of date: the authors may make corrections to the sequence itself, or may discover new features of the sequence. Since such findings are generally not published, the only way to keep entries correct and up to date is for the authors to communicate their new findings to the database. This can be done by normal post or electronic mail to the address given in Appendix II.
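The consistency check recommended in section 4(b), looking for a stop codon in the middle of a region listed as coding, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the region is assumed to start in frame and to be given with 1-based inclusive coordinates, and only the three stop codons of the standard genetic code are considered.

```python
# Hypothetical helper, not part of the submission form or any database tool.
# Assumes the standard genetic code and a region that starts in frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_codons(sequence, start, end):
    """Return 1-based positions of in-frame stop codons found before the
    final codon of the region start..end (both inclusive)."""
    # Treat RNA (U) and DNA (T) input alike, then cut out the region.
    region = sequence.upper().replace("U", "T")[start - 1:end]
    positions = []
    # Step through the region codon by codon; the last codon is skipped
    # because a stop codon is expected there.
    for offset in range(0, len(region) - 3, 3):
        if region[offset:offset + 3] in STOP_CODONS:
            positions.append(start + offset)
    return positions
```

For example, `internal_stop_codons("ATGAAATAGCCCTAA", 1, 15)` returns `[7]`, because the third codon (TAG) is a stop codon that occurs before the end of the region. An empty list means no internal stop codon was found in that frame.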
One type of update which merits separate mention is that relating to citations. Most submissions represent data that have not yet been accepted for publication, and therefore the journal citation is not available when the entry is created. Adding this information at a later date requires that the database staff identify which submissions correspond to which publications; while this is often straightforward, it can also be problematic, especially if the journal does not print an accession number in the article, or if the submitted and the published data are not identical. We therefore strongly encourage researchers to let us know when and where data they have submitted to us are published.

Appendix I. EMBL network file server

Computer users with access to BITNET (directly or via a gateway) can obtain copies of the data submission form, or of database entries, by sending commands to a file server running on the VAXcluster at EMBL. The file server facility is provided free of charge, though users may have to meet some or all of the communication costs, depending on the accounting system of their local computer service.

To use this facility, send file server commands (as electronic mail) to the address NETSERV@EMBL.BITNET. Each line of the mail message should consist of a single file server command, and nothing else. The mail can be sent over BITNET, or from any other network which has a gateway into BITNET (e.g., JANET in the UK or ARPANET in the USA).

The most important file server command, to get users started, is HELP. If the file server receives this command, it will return a help file to the sender, explaining in some detail how to use the facility.

In order to send electronic mail to a BITNET address, users must find out which command they have to use on their own local machine and how they should format the address NETSERV@EMBL.BITNET.
Users who don't already know how to do this should contact their local computer service, or if all else fails, contact the Data Library and we will do our best to help.

Below are some examples which illustrate how to send commands to the file server using a VAX/VMS system that is a BITNET node running JNET software. To send commands to the file server, you could use the operating system command MAIL as follows:

    $ MAIL <filename> "JNET% ""NETSERV@EMBL"""

where <filename> is the name of a file containing file server commands. To request help information the file should contain the following command:

    HELP

To request a copy of the data submission form, it should contain the following GET command:

    GET DATALIB: DATASUB.TXT

Users can also request specific sequences via the file server. Information on how to do this is provided in the HELP file.

Appendix II. How to contact the nucleotide sequence databases

EMBL Data Library:
(a) Computer network: datasubs@embl.bitnet (for data submissions); datalib@embl.bitnet (for questions requiring a personal response)
(b) Postal address: Data Submissions, EMBL Data Library, Postfach , 6900 Heidelberg, Federal Republic of Germany
(c) Telephone:
(d) Telefax:
(e) Telex: (embl d)

GenBank:
(a) Computer network address: gb-subs@lanl.gov
(b) Postal address: GenBank Submissions, Mail Stop K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(c) Telephone:
(d) Telefax:

DNA Databank of Japan:
(a) Computer network: ddbjsub@ddbj.nig.ac.jp (for data submissions); ddbj@ddbj.nig.ac.jp (for other enquiries)
(b) Postal address: Laboratory of Genetic Information Analysis, Center for Genetic Information Research, National Institute of Genetics, Mishima, Shizuoka 411, Japan
(c) Telephone: x647
(d) Telefax:
Sequence Data Submission Form

This form solicits the information needed for a nucleotide or amino acid sequence database entry. By completing and returning it to us promptly you help us to enter your data in the database accurately and rapidly. These data will be shared among the following databases: EMBL Data Library (Heidelberg, Federal Republic of Germany); GenBank (Los Alamos, NM, U.S.A. and Mountain View, CA, U.S.A.); DNA Data Bank of Japan (DDBJ; Mishima, Japan); National Biomedical Research Foundation Protein Identification Resource (NBRF-PIR; Washington, D.C., U.S.A.); Martinsried Institute for Protein Sequence Data (MIPS; Martinsried, Federal Republic of Germany) and International Protein Information Database in Japan (JEPID; Noda, Japan).

Please answer all questions which apply to your data. If you submit 2 or more non-contiguous sequences, copy and fill out this form for each additional sequence. Please include in your submission any additional sequence data which is not reported in your manuscript but which has been reliably determined (for example, introns or flanking sequences). When submitting nucleic acid sequences containing protein coding regions, also include a translation (SEPARATELY from the nucleic acid sequence). Then send (1) this form, (2) a copy of your manuscript (if available) and (3) your sequence data (in machine readable form) to the address shown below. Information about the various ways you can send us your data and about formats for the sequence data is given in the following two sections. Thank you.

SUBMITTING DATA TO THE EMBL DATA LIBRARY

We are happy to accept data submitted in any of the following ways:

(1) Electronic file transfer: files can be sent via computer network to: DATASUBS@EMBL.EARN. This BITNET/EARN address can be reached via various gateways from Arpanet, Usenet, JANET, etc. Ask your local network expert for help or phone us.
Please ensure that each line in your file is not longer than 80 characters; longer lines often get truncated when they are sent.
(2) Floppy disks: we can read Macintosh and IBM-compatible diskettes. Please use the 'save as text only' feature of your editor to save your sequence file, as otherwise we might have difficulty processing it.
(3) Magnetic tapes: 9-track only (fixed-length records preferred); 800, 1600 or 6250 bpi (any blocksize); ASCII or EBCDIC character codes; any label type or unlabelled.

Our address is:

EMBL Data Library Submissions
Postfach
D-6900 Heidelberg
Federal Republic of Germany
Computer network: DATASUBS@EMBL.BITNET
Telefax: (+49)
Telephone: (+49)

When we receive your data we will assign them an accession number, which serves as a reference that permanently identifies them in the database. We will inform you what accession number your data have been given and we recommend that you cite this number when referring to these data in publications. If your manuscript has already been accepted for publication, the accession number can be included at the galley proof stage as a note added in proof. So that we can process your data and inform you of your accession number before you receive the galley proofs, please return this form to us as soon as possible. We suggest that the note added in proof should read approximately as follows: "The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number ."

A computer-readable version of this form is available on the distribution tapes of the EMBL Data Library from Release 11 onwards and on GenBank Releases 48 onwards. The BIONET National Computer Resource for Molecular Biology (Mountain View, CA, U.S.A.) also has a copy. Feel free to use the computer-readable form rather than this printed one. In this case, the form should be filled out with a text editor and sent via computer network or normal post to the address indicated above.
FORMATS FOR SUBMITTED DATA

We would appreciate receiving the sequence data in a form which conforms as closely as possible to the following standards. Each sequence should include the names of the authors. Each distinct sequence should be listed separately using the same number of bases/residues per line. The length of each sequence in bases/residues should be clearly indicated. Enumeration should begin with a "1" and continue in the direction 5' to 3' (or amino- to carboxy-terminus). Amino acid sequences should be listed using the one-letter code. Translations of protein coding regions in nucleotide sequences should be submitted in a separate computer file from the nucleotide sequences themselves. The code for representing the sequence characters should conform to the IUPAC-IUB standards, which are described in: Nucl. Acids Res. 13: (1985) (for nucleic acids) and J. Biol. Chem. 243: (1968) and Eur. J. Biochem. 5: (1968) (for amino acids).

El.5/11.89
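Several of the format requests above (lines of at most 80 characters, the same number of bases on every line, IUPAC-IUB characters only, a clearly stated length) can be checked mechanically before a file is sent. The following Python sketch is one possible way to do so for a nucleotide sequence; the function names are hypothetical, the input is assumed to hold sequence lines only (no author names or numbering), and the last line is allowed to be shorter than the others.

```python
# Hypothetical pre-submission check; not an official EMBL tool.
# IUPAC-IUB nucleotide symbols, including the ambiguity codes.
NUCLEOTIDE_CODES = set("ACGTUMRWSYKVHDBN")
MAX_LINE_LENGTH = 80


def check_sequence_lines(lines):
    """Return a list of human-readable problems found in the sequence lines."""
    problems = []
    # Ignore blank lines and trailing line breaks.
    kept = [line.rstrip("\r\n") for line in lines if line.strip()]
    for number, line in enumerate(kept, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        # Remove all whitespace so that spaced blocks of bases are accepted.
        bases = "".join(line.split()).upper()
        unexpected = sorted(set(bases) - NUCLEOTIDE_CODES)
        if unexpected:
            problems.append(f"line {number}: unexpected characters {''.join(unexpected)}")
    # Every line except the last should carry the same number of bases.
    counts = {len("".join(line.split())) for line in kept[:-1]}
    if len(counts) > 1:
        problems.append("lines do not all contain the same number of bases")
    return problems


def sequence_length(lines):
    """Total number of bases, to be stated alongside the sequence."""
    return sum(len("".join(line.split())) for line in lines)
```

An empty list from `check_sequence_lines` means none of these particular problems were found; it does not replace reading the form's own instructions. A peptide file would need the one-letter amino acid alphabet in place of `NUCLEOTIDE_CODES`.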
I. GENERAL INFORMATION

Your last name
First name
Middle initials
Institution
Address
Computer mail address
Telephone
Telex number
Telefax number

On what medium and in what format are you sending us your sequence data? (see instructions on front page)
[ ] electronic mail
[ ] diskette: computer, operating system, editor
[ ] magnetic tape: record length, blocksize, label type; density [ ] 800 [ ] 1600 [ ] 6250; character code [ ] ASCII [ ] EBCDIC
[ ] printed copy (please, ONLY if it is impossible to send us machine-readable data)

II. CITATION INFORMATION

These data are [ ] published [ ] in press [ ] submitted [ ] in preparation [ ] no plans to publish
authors
title of paper
journal, volume, first-last pages, year

Do you agree that these data can be made available in the database before they appear in print?
[ ] yes
[ ] no, they should be made available only after publication (estimated date: )

Does the sequence which you are sending with this form include data that does not appear in the above citation?
[ ] no
[ ] yes, from position ____ to ____ [ ] base pairs OR [ ] amino acid residues
(If your sequence contains 2 or more such spans, use the feature table in section IV to indicate their positions)

If so, how should these data be cited in the database?
[ ] published [ ] in press [ ] submitted [ ] in preparation [ ] no plans to publish
authors
address (if different from that given in section I)
title of paper
journal, volume, first-last pages, year

List references to papers and/or database entries which report sequences overlapping with that submitted here.
first author; journal, vol., pages, year and/or database, accession number

C2J/I1.89
III. DESCRIPTION OF SEQUENCED SEGMENT

Wherever possible, please use standard nomenclature or conventions. If a question is not applicable to your sequence, answer by writing N.A.; if the information is relevant but not available, write a question mark (?).

What kind of molecule did you sequence? (check all boxes which apply)
[ ] genomic DNA [ ] genomic RNA [ ] virus [ ] provirus
[ ] cDNA to mRNA [ ] cDNA to genomic RNA
[ ] organelle DNA [ ] organelle RNA (please specify organelle)
[ ] tRNA [ ] rRNA [ ] snRNA [ ] scRNA
[ ] other nucleic acid (please specify)
[ ] peptide:
[ ] sequence assembled by [ ] overlap of sequenced fragments [ ] homology with related sequence [ ] other (please specify)
[ ] partial: [ ] N-terminal or [ ] C-terminal or [ ] internal fragment

length of sequence [ ] base pairs or [ ] amino acid residues
gene name(s) (e.g., lacZ)
gene product name(s) (e.g., beta-D-galactosidase)
Enzyme Commission number (e.g., EC )
gene product subunit structure (e.g., hemoglobin )

The following items refer to the original source of the molecule you have sequenced.
organism (species) name (e.g., Escherichia coli; Mus musculus)
sub-species
strain (e.g., K12; BALB/c)
name/number of individual or isolate (e.g., patient 123; influenza virus A/PR/8#4)
developmental stage
[ ] germ line [ ] rearranged
haplotype
tissue type
cell type

The following items refer to the immediate experimental source of the submitted sequence.
name of cell line (e.g., HeLa; 3T3-L1)
library (type; name)
clone(s)

The following items refer to the position of the submitted sequence in the genome.
chromosome (or segment) name/number
map position; units: [ ] genome % or [ ] nucleotide number or [ ] other

Using single words or short phrases, describe the properties of the sequence in terms of: its associated phenotype(s); the biological/enzymatic activity of its product; the general functional classification of the gene and/or gene product; macromolecules to which the gene product can bind (e.g., DNA, calcium, other proteins); subcellular localization of the gene product; any other relevant information.

Example (for viral erbB nucleotide sequence): transforming capacity, EGF receptor-related; tyrosine kinase; oncogene; transmembrane protein.

C3.1/2.88
IV. FEATURES OF THE SEQUENCE

Please list below the types and locations of all significant features experimentally identified within the sequence. Be sure that your sequence is numbered beginning with "1."

In the column marked, fill in:
feature: type of feature (see information below)
from: number of first base/amino acid in the feature
to: number of last base/amino acid in the feature
bp: x, if your numbers refer to positions of base pairs in a nucleotide sequence
aa: x, if your numbers refer to positions of amino acid residues in a peptide sequence
id: method by which the feature was identified. E = experimentally; S = by similarity with known sequence or to an established consensus sequence; P = by similarity to some other pattern, such as an open reading frame
comp: x, if feature is located on the nucleic acid strand complementary to that reported here

Significant features include:
regulatory signals (e.g., promoters, attenuators, enhancers)
transcribed regions (e.g., mRNA, rRNA, tRNA) (indicate reading frame if start and stop codons are not present)
regions subject to post-transcriptional modification (e.g., introns, modified bases)
translated regions
extent of signal peptide, prepropeptide, propeptide, mature peptide
regions subject to post-translational modification (e.g., glycosylated or phosphorylated sites)
other domains/sites of interest (e.g., extracellular domain, DNA-binding domain, active site, inhibitory site)
sites involved in bonding (disulfide, thiolester, intrachain, interchain)
regions of protein secondary structure (e.g., alpha helix or beta sheet)
conflicts with sequence data reported by other authors
variations and polymorphisms

The first 2 lines of the table are filled in with examples. If you think you will need more space than the table below provides, please photocopy this page before you fill it out.

Numbering for features on the sequence submitted here [ ] matches paper [ ] does not match paper

feature             from   to   bp   aa   id   comp
EXAMPLE TATA box    1      8
EXAMPLE exon

C4.1/2.88
More informationNCBI News, November 2009
Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved
More information(DNA#): Molecular Biology Computation Language Proposal
(DNA#): Molecular Biology Computation Language Proposal Aalhad Patankar, Min Fan, Nan Yu, Oriana Fuentes, Stan Peceny {ap3536, mf3084, ny2263, oif2102, skp2140} @columbia.edu Motivation Inspired by the