Bioinformatics Resources - Swissprot
1 Bioinformatics Resources - Swissprot - Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Institut für Informatik I12
2 Putative Schedule
- Apr. 22nd: Intro, General Overview (1. sh.)
- Apr. 29th: Sequence Databases (2. sh.)
- May 6th: No lecture
- May 13th: Sequence Databases (3. sh.)
- May 20th: Structure Databases (4. sh.)*
- May 27th: SQL (5. sh.)
- Jun 3rd: SQL (6. sh.)
- Jun 10th: No-SQL (7. sh.)
- Jun 17th: No-SQL (8. sh.)*
- Jun 24th: JavaScript / UI (9. sh.)
- Jul 1st: Web Services (10. sh.)
- Jul 8th: Bioinformatics Suites / Forums
- Jul 15th: Wrap Up, Q&A
- Jul 28th: Exam, 10:30-12:00, MW1050
* These exercises can earn you a bonus
3 XML Infusion (in 10 sec)
compilation from http://
- XML is a software- and hardware-independent tool to store and to transport data
- XML stands for eXtensible Markup Language
- designed to store and transport data
- designed to be self-descriptive
- W3C recommendation
- it does NOT DO anything
4 About Tags
- XML tags are not predefined like HTML tags
- everybody can/has to invent their own tags
- new tags can be added any time
- the author has to define content and structure of the document
- everything is plain text
5 Document Structure
    <?xml version="1.0" encoding="utf-8"?>
    <bookstore>

      <book category="cooking">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>

      <book category="children">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>

      ...
    </bookstore>
taken from http://
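To make the tree structure of this document concrete, here is a minimal Python sketch that reads it with the standard library module xml.etree.ElementTree. The function name list_books and the tuple layout are illustrative choices, not part of the lecture, and the "..." placeholder from the slide is left out so that the text parses.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide, without the "..." placeholder.
BOOKSTORE_XML = """<?xml version="1.0" encoding="utf-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""


def list_books(xml_text):
    """Return (category, title, author, year, price) for every <book> child."""
    # Parsing bytes lets the parser honour the encoding named in the prolog.
    root = ET.fromstring(xml_text.encode("utf-8"))
    books = []
    for book in root.findall("book"):
        books.append((
            book.get("category"),          # attribute of the element
            book.findtext("title"),        # text content of a child element
            book.findtext("author"),
            int(book.findtext("year")),
            float(book.findtext("price")),
        ))
    return books


books = list_books(BOOKSTORE_XML)
for category, title, author, year, price in books:
    print(category, title, author, year, price)
```

The root element bookstore is the parent of both book elements, and each attribute (category, lang) is reached through get() rather than through a child lookup.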
6 Syntax Rules
- elements are defined using tags: <tagname>...</tagname> or <tagname/>
- elements can be nested (contain other elements: parent and child nodes, sibling nodes)
- elements can have text content
- each document must contain ONE root element that is the parent of all other elements
7 Syntax Refined
- the prolog line <?xml...> is optional
- tags must be closed or self-closed
- tags are case sensitive
- tags must be properly nested:
  Wrong: <a><b>...</a></b>
  Right: <a><b>...</b></a>
8 Syntax Refined
- tags may have attributes
- attribute values must always be quoted
- some special characters cannot be used directly -> coded by entity references:
  &lt;    <  less than
  &gt;    >  greater than
  &amp;   &  ampersand
  &apos;  '  apostrophe
  &quot;  "  quotation mark
- comments: <!-- ... -->
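As a small illustration of the entity references above, the following Python sketch uses escape and unescape from the standard library module xml.sax.saxutils. By default these handle only &, < and >, so the apostrophe and quotation mark are passed in explicitly; the helper names encode_text and decode_text are made up for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; the two quote characters are added here.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def encode_text(raw):
    """Replace the five special characters by their entity references."""
    return escape(raw, EXTRA_ENTITIES)


def decode_text(encoded):
    """Turn the five entity references back into plain characters."""
    reverse = {entity: char for char, entity in EXTRA_ENTITIES.items()}
    return unescape(encoded, reverse)


encoded = encode_text('a < b & "c"')
print(encoded)
print(decode_text(encoded))
```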
9 Tag Names
- case sensitive
- must start with a letter or underscore
- must not start with the letters xml in any case
- can contain: letters, digits, hyphens, underscores and periods
- cannot contain spaces
- apply common sense and a consistent style
- avoid: minus (-), period (.), colon (:), non-English characters for compatibility reasons
10 XML Element
- everything between the start and the end tag
- tags are included
- can contain:
  - text
  - attributes
  - other elements
  - a mix of all
- are extensible
11 XML Attributes
- values must be quoted: single or double quotes
- the quote character not used as delimiter can be used inside the value
- there is no fixed rule for choosing between attribute and element, but:
  - attributes cannot contain multiple values
  - attributes cannot contain tree structures
  - attributes are not easily expandable
- useful to store metadata, like element id, etc.
12 A Glimpse of Namespaces
- prevent tag name collisions between different authors/applications/domains
- implemented by the introduction of prefixes
- defined as an attribute: xmlns:prefix="URI"
- usage: <prefix:tagname>
- the URI only needs to be unique
- used to integrate other specifications, e.g. XSLT
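The following Python sketch shows how two elements with the same local name stay distinguishable once they carry different namespace prefixes. The document, the prefixes h and f, and the example.org URIs are invented for illustration; only the xml.etree.ElementTree calls are standard.

```python
import xml.etree.ElementTree as ET

# Two different "table" elements, kept apart by their namespace prefixes.
# The URIs are arbitrary; they only have to be unique.
DOC = """<root xmlns:h="http://example.org/html"
      xmlns:f="http://example.org/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

# Prefix-to-URI map used in the search paths below.
NS = {"h": "http://example.org/html", "f": "http://example.org/furniture"}

root = ET.fromstring(DOC)
html_tables = root.findall("h:table", NS)
furniture_tables = root.findall("f:table", NS)

# Internally the prefix is replaced by the URI in curly braces.
print(html_tables[0].tag)
print(furniture_tables[0].find("f:name", NS).text)
```

The parser stores the URI, not the prefix, which is why the prefix itself can be chosen freely in each document.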
13 Levels of Correctness
- well formed: a document obeys the syntax rules:
  - root element
  - closing tags
  - case sensitive
  - properly nested
  - attribute values quoted
- valid documents: in addition to being well formed they also conform to a document type definition (format specification)
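Well-formedness can be checked mechanically by any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which reports syntax violations as ParseError; the function name is_well_formed is an illustrative choice. Note that this module only checks well-formedness and does not validate a document against a DTD or an XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any syntax violation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>text</b></a>"))   # properly nested
print(is_well_formed("<a><b>text</a></b>"))   # wrong nesting
print(is_well_formed("<a><B>text</b></a>"))   # tag names differ in case
print(is_well_formed("<a/><b/>"))             # more than one root element
print(is_well_formed("<a id=1/>"))            # unquoted attribute value
```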
14 Document Type Definitions
two ways to specify a document structure:
- DTD: Document Type Definition
- XML Schema: XML-based alternative to DTD
15 Example
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE note SYSTEM "Note.dtd">
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend! &copyright;</body>
    </note>
16 Example
    <!DOCTYPE note [
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
    <!ENTITY copyright "Copyright by..">
    ]>
17 XML DTD
- referenced from a document with: <!DOCTYPE note SYSTEM "Note.dtd">
- !DOCTYPE defines the root element
- !ELEMENT defines the structure of the elements
- #PCDATA means parsed character data, i.e. parseable text
- !ENTITY defines special characters or strings
18 XML Schema
alternative to DTD
    <xs:element name="note">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="to" type="xs:string"/>
          <xs:element name="from" type="xs:string"/>
          <xs:element name="heading" type="xs:string"/>
          <xs:element name="body" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
- support of data types and namespaces
- written in XML and extensible
19 Names and Other Complications
- Amos Bairoch (image taken from http://web.expasy.org/images/people/amos_bairoch.jpg)
- Ioannis Xenarios (image taken from http:// Ioannis.Xenarios)
20 History
- 1986: A. Bairoch created Swiss-Prot at the University of Geneva, since 1988 in collaboration with EMBL/EBI
- 1993: launch of ExPASy, together with Ron Appel
- 1998: Foundation of SIB (Swiss Institute of Bioinformatics)
- 2002: Foundation of the UniProt consortium by EBI, SIB and PIR
21 UniProt Components
- UniProtKB:
  - UniProtKB/Swiss-Prot
  - UniProtKB/TrEMBL
- UniParc: pure sequence archive, no annotations
- UniRef: consists of three databases of clustered sets of protein sequences (UniRef100, UniRef90, UniRef50) using the CD-HIT algorithm
- UniMES: data from metagenomic and environmental samples, not in UniProtKB
22 ExPASy
http://
- Expert Protein Analysis System (1993)
- now: SIB ExPASy Bioinformatics Resources Portal
Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, and Stockinger H. ExPASy: SIB bioinformatics resource portal, Nucleic Acids Res, 40(W1):W597-W603, 2012.
23 Expasy Categories
- Proteomics
- Genomics
- Structural Bioinformatics
- Systems biology
- Phylogeny/evolution
24 Expasy Categories
- Population genetics
- Transcriptomics
- Biophysics
- Imaging
- Drug Design
25 Resource Description
1. Resource name and description
2. Maintaining SIB group
3. Scientific category
4. Keywords: a controlled vocabulary is used to tag the resource
26 Resource Description
5. URL for the web interface and for the download if available
6. Software type: website, command line interface, GUI, etc.
7. Status: green checkbox if currently available
27 UniProt/SwissProt Statistics
Release 2016_05, May 11th
taken from http://web.expasy.org/docs/relnotes/relstat.html
sequence entries ( in 2015_05) / amino acids ( in 2015_05)
28 UniProt/SwissProt Statistics
Growth over one year: 2016_5 vs 2015_5

Protein existence (PE)            Entries      %
1. Evidence at protein level      (85.419)     16.8 (15.6)
2. Evidence at transcript level   (61.814)     10.5 (11.3)
3. Inferred from homology         ( )          70.3 (70.7)
4. Predicted                      (11.526)     2.1 (2.1)
5. Uncertain                      (1.962)      0.4 (0.4)
29 Development
taken from http://web.expasy.org/docs/relnotes/relstat1.png for release 2015_5
30 More Numbers (rel. 2015_5)
- Represented species:
- Top 20 species: sequences, i.e. 21.3% of the total number of sequences
[Table: Entries / No of Species]
31 Species Representation (rel. 2015_5)
Top species by frequency:
- Homo sapiens (Human)
- Mus musculus (Mouse)
- Arabidopsis thaliana (Mouse-ear cress)
- Rattus norvegicus (Rat)
- Saccharomyces cerevisiae (Baker's yeast)
- Bos taurus (Bovine)
- Schizosaccharomyces pombe (Fission yeast)
- Escherichia coli K
- Bacillus subtilis
- Dictyostelium discoideum (Slime mold)
32 Representation of the Divisions (rel. 2015_5)
Viruses (3%), Archaea (4%), Eukaryota (33%), Bacteria (61%)
33 Distribution of Eukaryota (rel. 2015_5)
- Nematoda (2%), 4417
- Insecta (5%), 8781
- Fungi (17%)
- Viridiplantae (20%)
- Other (8%)
- Human (11%)
- Other Mammalia (26%)
- Other Vertebrata (10%), 17823
34 Length Distribution (rel. 2015_5)
35 Amino Acid Composition (rel. 2015_5)
figure taken from
Legend: gray=aliphatic, red=acidic, green=small hydroxy, blue=basic, black=aromatic, white=amide, yellow=sulfur
36 SwissProt Annotation Process
- defined in http:// sop_manual_curation.pdf
- explained in http://
37 Annotation Phases
1. Sequence curation
2. Sequence analysis
3. Literature curation
4. Family-based curation
5. Evidence attribution
6. Quality assurance, integration and update
38 Sequence Curation
- more than 95% are translated CDS from INSDC
- other sources: PDB, direct protein sequencing, projects not submitting to INSDC
- sequences are selected according to curation priorities (http://)
- results in the canonical sequence for a gene/species pair
39 Steps toward the canonical sequence
- Entry selection
- Run BLAST similarity searches to identify additional sequences for the same gene
- Identify homologs by reciprocal BLAST and phylogeny-based resources
- Lock selected entries for other curators to prevent duplication
40 Steps toward the canonical sequence
- Prepare sequence alignments with T-Coffee, Muscle, Clustal W
- Merge into the canonical sequence:
  - most prevalent
  - most similar to orthologous sequences found in other species
  - based on length and aa composition it allows the clearest description
  - default: longest
- Record conflicts and variations
41 Sequence Analysis
Several analysis programs are applied to the sequences for:
- topological features
- post-translational modifications
- domains
All results are manually checked and included in or excluded from the annotation.
42 Topological Analysis

Tools       Prediction
SignalP     Presence and location of signal peptides
TargetP     Presence and location of transit peptides
Predotar    Mitochondrial, plastid or ER targeting sequences
ESKW        Transmembrane domains
MEMSAT      Transmembrane domains
TMHMM       Transmembrane domains
Phobius     Discriminates transmembrane and signal regions
43 Post-translational Modification Analysis

Tools           Prediction
GPI-predictor   GPI lipid anchor sites
NetNGlyc        N-glycosylation sites
NetOGlyc        O-glycosylation sites
NMT Predictor   N-terminal myristoylation sites
Sulfinator      Tyrosine sulfatation sites
44 Domain Analysis

Tools      Prediction
ps_scan    internal PROSITE profile, pattern and rule scanning
InterPro   retrieves non-PROSITE motif matches using InterPro database or InterProScan
Coils      Coiled-coil regions
polyaa     internal program which identifies homopolymeric stretches of amino acids
REPEAT     identifies the following repeats: Ankyrin, Armadillo, HAT, HEAT, Kelch, Leucine-rich, PFTA, PFTB, RCC1, TPR, WD40
45 taken from Figure 1. UniProtKB sequence analysis results displayed in graphical interface
46 Literature Curation
Identification of relevant scientific literature from:
- literature and text mining resources (PubMed, Europe PMC, iHOP, TextPresso)
- additions from other sources made by the curator
Information is extracted from the full text:
- general annotations (not position specific)
- position specific annotations
47 General Annotations
http:// general_annotation
- position-independent
- contains mostly general biological information like: functions, catalytic activity, cofactor, enzyme regulation, subunit structure, pathway, ...
48 Sequence Annotations
http:// sequence_annotation
- position-dependent
- regions or sites of interest like post-translational modifications, binding sites, active sites, etc.
- contains several subsections: molecule processing, regions, sites, amino acid modifications, natural variants, experimental info, secondary structure
49 Family-based Curation
- Evaluation and curation of homologs as described above
- Standardization of annotation of homologs
- Propagation of annotation across the homologs to ensure consistency
50 Evidence Attribution
- Every annotation is attributed to its original source
- Every annotation can be traced back and evaluated
- To distinguish kinds of evidence, 7 codes from the Evidence Code Ontology (ECO) are used for manually curated entries (http://)
- Additional GO term annotation
51 ECO code / Term name / Usage
- ECO: experimental evidence used in manual assertion
  Information for which there is published experimental evidence
- ECO: non-traceable author statement used in manual assertion
  Information based on author statements in scientific articles for which there is no experimental support
- ECO: sequence similarity evidence used in manual assertion
  Information which has been propagated from a related experimentally characterised protein
- ECO: imported information used in manual assertion
  Information which has been imported from another database and manually verified
- ECO: curator inference used in manual assertion
  Information which has been inferred by a curator based on his/her scientific knowledge or on the scientific content of an article
- ECO: match to sequence model evidence used in manual assertion
  Information originating from the UniProt automatic annotation systems or any of the sequence analysis programs used during the manual curation process and which has been manually verified
- ECO: combinatorial evidence used in manual assertion
  Information which is manually curated based on a combination of experimental and computational evidence
taken from
52 Quality Control and Integration
- Finished entries run through a series of rule-based checks, especially concerning positions and regions
- All errors are corrected
- Each entry is manually reviewed by a senior curator
- Finally the entry is integrated into the database
- The finished entries are unlocked for further curation
53 Demonstration
http:// P62756#section_features
54 The Swiss-Prot Flat File
http://web.expasy.org/docs/userman.html
- An entry is composed of different line types
- Line types have their own format
- Follows the EMBL Nucleotide Sequence Database format as closely as possible
- 2 sections:
  - core data (sequence data, citation info, taxonomy)
  - annotations (function, modification, domains, secondary and quaternary structure, disease associations, conflicts, etc.)
55 The following table lists the available two-letter line codes. Each code is followed by three blanks.

Line code  Content                       Occurrence in an entry
ID         Identification                Once; starts the entry
AC         Accession number(s)           Once or more
DT         Date                          Three times
DE         Description                   Once or more
GN         Gene name(s)                  Optional
OS         Organism species              Once or more
OG         Organelle                     Optional
OC         Organism classification       Once or more
OX         Taxonomy cross-reference      Once
OH         Organism host                 Optional
--continued--
56
Line code  Content                       Occurrence in an entry
RN         Reference number              Once or more
RP         Reference position            Once or more
RC         Reference comment(s)          Optional
RX         Reference cross-reference(s)  Optional
RG         Reference group               Once or more (Optional if RA line)
RA         Reference authors             Once or more (Optional if RG line)
RT         Reference title               Optional
RL         Reference location            Once or more
CC         Comments or notes             Optional
DR         Database cross-references     Optional
PE         Protein existence             Once
KW         Keywords                      Optional
FT         Feature table data            Once or more in Swiss-Prot, optional in TrEMBL
SQ         Sequence header               Once
(blanks)   Sequence data                 Once or more
//         Termination line              Once; ends the entry
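Because every line starts with a two-letter code followed by three blanks, a flat-file entry can be split into line types with very little code. The Python sketch below groups the lines of one entry by their code; the function names and the sample text are illustrative, and the sample is a toy fragment assembled from lines shown on these slides, not a real database entry.

```python
def split_line(line):
    """Split one flat-file line into (two-letter code, data after the blanks)."""
    return line[:2], line[5:].rstrip()


def group_by_code(entry_text):
    """Collect the data of each line type, stopping at the // termination line."""
    groups = {}
    for line in entry_text.splitlines():
        if not line.strip():
            continue                      # ignore empty lines
        if line.startswith("//"):
            break                         # termination line ends the entry
        code, data = split_line(line)
        groups.setdefault(code, []).append(data)
    return groups


# Toy fragment for illustration only (not a real entry).
SAMPLE = (
    "ID   CYC_BOVIN Reviewed; 104 AA.\n"
    "DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.\n"
    "DT   15-OCT-2000, sequence version 2.\n"
    "DT   15-DEC-2004, entry version 5.\n"
    "KW   Transmembrane; Zinc.\n"
    "//\n"
    "ID   IGNORED_AFTER Reviewed; 1 AA.\n"
)

groups = group_by_code(SAMPLE)
for code, lines in groups.items():
    print(code, len(lines))
```

Sequence data lines begin with blanks, so in this sketch they would end up under a code consisting of two spaces.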
57 Fields in More Detail
ID line: ID EntryName Status; SequenceLength.
- EntryName: up to 11 uppercase alphanumeric characters, X_Y
  - X is a mnemonic code of at most 5 alphanumeric characters
  - Y is a mnemonic species identification code of at most 5 alphanumeric characters
Example: ID CYC_BOVIN Reviewed; 104 AA.
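The ID line layout can be captured in a single regular expression. The following Python sketch follows the rules exactly as stated on the slide (X and Y each at most 5 uppercase alphanumeric characters); the pattern and the function name parse_id_line are assumptions made for illustration, and entry names that do not fit the slide's rule would be rejected.

```python
import re

# ID  EntryName  Status;  SequenceLength.   with EntryName = X_Y
ID_LINE = re.compile(
    r"^ID\s+(?P<mnemonic>[A-Z0-9]{1,5})_(?P<species>[A-Z0-9]{1,5})"
    r"\s+(?P<status>\w+);\s+(?P<length>\d+) AA\.$"
)


def parse_id_line(line):
    """Return the fields of an ID line as a dict, or None if it does not match."""
    match = ID_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["name"] = fields["mnemonic"] + "_" + fields["species"]
    fields["length"] = int(fields["length"])
    return fields


print(parse_id_line("ID   CYC_BOVIN               Reviewed;         104 AA."))
```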
58 AC line: AC AC_number_1;[ AC_number_2;]...[ AC_number_N;]
Accession number: 6 or 10 characters

[A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]
[O,P,Q]    [0-9]  [A-Z,0-9]  [A-Z,0-9]  [A-Z,0-9]  [0-9]
[A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]  [A-Z]  [A-Z,0-9]  [A-Z,0-9]  [0-9]

RegEx: [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}
Examples: P12345, Q1AAA9, A0A022YWF9
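The regular expression for accession numbers can be tried out directly. This Python sketch compiles it with the two alternatives joined by "|" and tests whole strings with fullmatch; the helper names is_accession and parse_ac_line are illustrative.

```python
import re

# Either the [OPQ] form (6 characters) or the [A-NR-Z] form (6 or 10 characters).
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text):
    """True if the whole string is a syntactically valid accession number."""
    return ACCESSION.fullmatch(text) is not None


def parse_ac_line(line):
    """Return the accession numbers listed on one AC line."""
    body = line[2:].strip()
    return [item.strip() for item in body.split(";") if item.strip()]


for candidate in ["P12345", "Q1AAA9", "A0A022YWF9", "P1234", "p12345"]:
    print(candidate, is_accession(candidate))
```

fullmatch is used instead of search so that a valid accession embedded in a longer string is not accepted by accident.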
59 DT line: date, DD-MMM-YYYY
- always one of the biweekly release dates
- always three lines:
  - date of integration
  - date of sequence version, sequence version X
  - date of entry version, entry version X
Example:
DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.
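A DT line splits naturally into a date and the event it describes. The Python sketch below parses the DD-MMM-YYYY date with a fixed month table instead of strptime, so that it does not depend on the locale; the function name parse_dt_line is an illustrative choice.

```python
from datetime import date

# Fixed English month abbreviations as used in the flat file.
MONTHS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}


def parse_dt_line(line):
    """Return (date, event text) for one DT line."""
    body = line[2:].strip().rstrip(".")
    date_part, _, event = body.partition(", ")
    day, month, year = date_part.split("-")
    return date(int(year), MONTHS[month], int(day)), event


for dt_line in [
    "DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.",
    "DT   15-OCT-2000, sequence version 2.",
    "DT   15-DEC-2004, entry version 5.",
]:
    print(parse_dt_line(dt_line))
```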
60 DE lines:
- three categories and additional subcategories
- contains a recommended name
- besides: full name, short name, EC number
- alternative names: e.g. as an allergen or in biotechnology, ...
61
DE   RecName: Full=Annexin A5;
DE            Short=Annexin-5;
DE   AltName: Full=Annexin V;
DE   AltName: Full=Lipocortin V;
DE   AltName: Full=Endonexin II;
DE   AltName: Full=Calphobindin I;
DE   AltName: Full=CBP-I;
DE   AltName: Full=Placental anticoagulant protein I;
DE            Short=PAP-I;
DE   AltName: Full=PP4;
DE   AltName: Full=Thromboplastin inhibitor;
DE   AltName: Full=Vascular anticoagulant-alpha;
DE            Short=VAC-alpha;
DE   AltName: Full=Anchorin CII;

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: Full=Filgrastim;
DE   AltName: Full=Lenograstim;
DE   Flags: Precursor;
62 OS line: originating organism
OS   Homo sapiens (Human).
OS   Rous sarcoma virus (strain Schmidt-Ruppin A) (RSV-SRA) (Avian leukosis
OS   virus-RSA).
OC lines: contain the taxonomic classification of the source organism according to ( http://
OC   Node[; Node...].
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
OC   Homo.
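Since the classification is spread over several OC lines, the lines have to be joined before the nodes are split at the semicolons. A minimal Python sketch, with parse_lineage as an illustrative function name:

```python
def parse_lineage(oc_lines):
    """Join the OC lines of an entry and return the taxonomic nodes in order."""
    text = " ".join(line[2:].strip() for line in oc_lines)
    # The list is terminated by a period and separated by semicolons.
    return [node.strip() for node in text.rstrip(".").split(";") if node.strip()]


OC_LINES = [
    "OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;",
    "OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;",
    "OC   Homo.",
]

lineage = parse_lineage(OC_LINES)
print(" > ".join(lineage))
```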
63 RN, RP, RC, RX, RG, RA, RT, RL
- can occur multiple times
- the order within a block is fixed
e.g.:
RN   [1]
RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND C), FUNCTION, INTERACTION
RP   WITH PKC-3, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL
RP   STAGE, AND MUTAGENESIS OF PHE-175 AND PHE-221.
RC   STRAIN=Bristol N2;
RX   PubMed= ; DOI= /jbc.M ;
RA   Zhang L., Wu S.-L., Rubin C.S.;
RT   "A novel adapter protein employs a phosphotyrosine binding domain and
RT   exceptionally basic N-terminal domains to capture and localize an
RT   atypical protein kinase C: characterization of Caenorhabditis elegans
RT   C kinase adapter 1, a protein that avidly binds protein kinase C3.";
RL   J. Biol. Chem. 276: (2001).
64 CC lines
- free text
- contains most of the annotated information
CC   -!- TOPIC: First line of a comment block;
CC       second and subsequent lines of a comment block.
- structured by predefined topics like: Allergen, Alternative Products, ..., Cofactor, ..., Disease, ..., Domain, ..., Function, Interaction, ...
65
CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of
CC       bovine dander.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative initiation; Named isoforms=2;
CC       Name=Alpha;
CC         IsoId=P ; Sequence=Displayed;
CC       Name=Beta;
CC         IsoId=P ; Sequence=VSP_018696;
CC   -!- SUBCELLULAR LOCATION: Cell membrane {ECO: }; Peripheral
CC       membrane protein {ECO: }. Secreted {ECO: }. Note=The
CC       last 22 C-terminal amino acids may participate in cell membrane
CC       attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO: }.
66 Cross References
- too many to enumerate
- extensive references with nucleotide databases, e.g.:
in EMBL:
FT   CDS
FT                   /protein_id="CAA
FT                   /db_xref="SWISS-PROT:P26345"
FT                   /gene="recA"
FT                   /product="RecA protein"
in Swiss-Prot:
DR   EMBL; AJ297977; CAC ; -; Genomic_DNA.
DR   EMBL; X56491; CAA ; ALT_FRAME; mRNA.
67 Key Words / Feature Table
KW   Keyword[; Keyword...].
- helps to search and index the database
- no limit on the number of keywords:
KW   3D-structure; Alternative splicing; Alzheimer disease; Amyloid;
KW   Apoptosis; Cell adhesion; Coated pits; Copper;
KW   Direct protein sequencing; Disease mutation; Endocytosis;
KW   Glycoprotein; Heparin-binding; Iron; Membrane; Metal-binding;
KW   Notch signaling pathway; Phosphorylation; Polymorphism;
KW   Protease inhibitor; Proteoglycan; Serine protease inhibitor; Signal;
KW   Transmembrane; Zinc.
Feature table: like GenBank/EMBL/DDBJ
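Keywords may contain blanks ("Alzheimer disease"), so the KW lines have to be split at the semicolons rather than at whitespace. The short Python sketch below collects the keywords of an entry into a set for quick lookup; the function name parse_keywords is illustrative and only two of the lines above are used as input.

```python
def parse_keywords(kw_lines):
    """Return the set of keywords found on the KW lines of one entry."""
    text = " ".join(line[2:].strip() for line in kw_lines)
    return {kw.strip() for kw in text.rstrip(".").split(";") if kw.strip()}


KW_LINES = [
    "KW   3D-structure; Alternative splicing; Alzheimer disease; Amyloid;",
    "KW   Transmembrane; Zinc.",
]

keywords = parse_keywords(KW_LINES)
print(sorted(keywords))
print("Alzheimer disease" in keywords)
```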
68 Programmatic Access
http:// programmatic_access (remember this link!)
- several use cases are documented, but not as an API
- best way: use the web interface to construct/refine your query first before you try to automate the process
69 Retrieving an Individual Entry
- uses a simple URL which can be bookmarked for individual entries: http://
- default result is a web page
- alternative formats: txt, xml, rdf, fasta, gff
- the format is specified via a suffix appended to the accession
- structured formats like xml or rdf can include referenced entries
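The URL scheme described here (accession plus an optional format suffix) can be sketched in a few lines of Python. The base URL is deliberately a parameter, because the actual address is not given in this transcript; https://example.org/uniprot/ below is a placeholder, and entry_url is a hypothetical helper name.

```python
# Formats listed on the slide; the web page is the default when no suffix is given.
FORMATS = {"txt", "xml", "rdf", "fasta", "gff"}


def entry_url(base_url, accession, fmt=None):
    """Build the URL of one entry, optionally with a format suffix."""
    url = base_url.rstrip("/") + "/" + accession
    if fmt is None:
        return url
    if fmt not in FORMATS:
        raise ValueError("unknown format: " + fmt)
    return url + "." + fmt


# Placeholder base URL, not the real service address.
BASE = "https://example.org/uniprot/"
print(entry_url(BASE, "P12345"))
print(entry_url(BASE, "P12345", "fasta"))
```

The resulting string could then be passed to any HTTP client, for example urllib.request.urlopen from the standard library.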
70 Using the ID mapping service
http:// programmatic_access#batch_retrieval_perl_example
- uses the HTTP POST method
- converts between different database IDs
- you have to know the specific abbreviation for the respective databases
71 Retrieving Entries via Queries
- uses the HTTP GET method, i.e. the query string is part of the URL
- the structure might be quite complex
- use the browser to configure the query string
- more settings are available via the query builder http://
- the URL length might be limited to 1000 characters
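With GET, all parameters have to be encoded into the URL, and the result should stay below the length limit mentioned above. The Python sketch below does both with urllib.parse.urlencode. The base URL and the parameter names query and format are placeholders chosen for illustration, not the documented interface of the service.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 1000  # limit mentioned on the slide


def build_query_url(base_url, params):
    """Encode the parameters into the URL and enforce the length limit."""
    url = base_url + "?" + urlencode(params)
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("query URL has %d characters, limit is %d"
                         % (len(url), MAX_URL_LENGTH))
    return url


# Placeholder address and parameter names.
url = build_query_url(
    "https://example.org/uniprot/",
    {"query": "reviewed:yes AND organism:9606", "format": "tab"},
)
print(url)
```

Queries that exceed the limit are better split into several requests or sent through a POST-based service.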
72 Examples
http://
http://
http:// UniRef90_P04259.xml
http:// UniRef90_P04259.rdf
http:// UniRef90_P04259.fasta
http:// UniRef90_P04259.tab
Bioinforma)cs Resources XML / Web Access
Bioinforma)cs Resources XML / Web Access Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 XML Infusion (in 10 sec) compila)on from hkp://www.w3schools.com/xml/default.asp
More informationAutomatic annotation in UniProtKB using UniRule, and Complete Proteomes. Wei Mun Chan
Automatic annotation in UniProtKB using UniRule, and Complete Proteomes Wei Mun Chan Talk outline Introduction to UniProt UniProtKB annotation and propagation Data increase and the need for Automatic Annotation
More informationmarkup language carry data define your own tags self-descriptive W3C Recommendation
XML intro What is XML? XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define
More informationXML (Extensible Markup Language)
Basics of XML: What is XML? XML (Extensible Markup Language) XML stands for Extensible Markup Language XML was designed to carry data, not to display data XML tags are not predefined. You must define your
More informationBioinforma)cs Resources
Bioinforma)cs Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Bioinforma)cs Resources Organiza)on Schedule Overview Organiza)on Lecture: Friday 9-12, i.e.
More informationData formats. { "firstname": "John", "lastname" : "Smith", "age" : 25, "address" : { "streetaddress": "21 2nd Street",
Data formats { "firstname": "John", "lastname" : "Smith", "age" : 25, "address" : { "streetaddress": "21 2nd Street", "city" : "New York", "state" : "NY", "postalcode" : "10021" }, CSCI 470: Web Science
More informationXML extensible Markup Language
extensible Markup Language Eshcar Hillel Sources: http://www.w3schools.com http://java.sun.com/webservices/jaxp/ learning/tutorial/index.html Tutorial Outline What is? syntax rules Schema Document Object
More informationSoftware Engineering Methods, XML extensible Markup Language. Tutorial Outline. An Example File: Note.xml XML 1
extensible Markup Language Eshcar Hillel Sources: http://www.w3schools.com http://java.sun.com/webservices/jaxp/ learning/tutorial/index.html Tutorial Outline What is? syntax rules Schema Document Object
More informationSimilarity searches in biological sequence databases
Similarity searches in biological sequence databases Volker Flegel september 2004 Page 1 Outline Keyword search in databases General concept Examples SRS Entrez Expasy Similarity searches in databases
More informationEBI patent related services
EBI patent related services 4 th Annual Forum for SMEs October 18-19 th 2010 Jennifer McDowall Senior Scientist, EMBL-EBI EBI is an Outstation of the European Molecular Biology Laboratory. Overview Patent
More information11. EXTENSIBLE MARKUP LANGUAGE (XML)
11. EXTENSIBLE MARKUP LANGUAGE (XML) Introduction Extensible Markup Language is a Meta language that describes the contents of the document. So these tags can be called as self-describing data tags. XML
More informationThe Use of WWW in Biological Research
The Use of WWW in Biological Research Introduction R.Doelz, Biocomputing Basel T.Etzold, EMBL Heidelberg Information in Biology grows rapidly. Initially, biological retrieval systems used conventional
More informationUniProt - The Universal Protein Resource
UniProt - The Universal Protein Resource Claire O Donovan Pre-UniProt Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI; TrEMBL: created at the EBI in November 1996
More informationExtensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013
Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013 2 Outline Introduction XML Structure Document Type Definition (DTD) XHMTL Formatting XML CSS Formatting XSLT Transformations
More informationWeb Services Part I. XML Web Services. Instructor: Dr. Wei Ding Fall 2009
Web Services Part I Instructor: Dr. Wei Ding Fall 2009 CS 437/637 Database-Backed Web Sites and Web Services 1 XML Web Services XML Web Services = Web Services A Web service is a different kind of Web
More information12. Key features involved in building biological 3databases
12. Key features involved in building biological 3databases Central to the discipline of bioinformatics is the need to store biological information systematically in structured databases. The first databases
More informationINTRODUCTION TO BIOINFORMATICS
Molecular Biology-2017 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain
More informationBiostatistics and Bioinformatics Molecular Sequence Databases
. 1 Description of Module Subject Name Paper Name Module Name/Title 13 03 Dr. Vijaya Khader Dr. MC Varadaraj 2 1. Objectives: In the present module, the students will learn about 1. Encoding linear sequences
More informationBioinformatics resources for data management. Etienne de Villiers KEMRI-Wellcome Trust, Kilifi
Bioinformatics resources for data management Etienne de Villiers KEMRI-Wellcome Trust, Kilifi Typical Bioinformatic Project Pose Hypothesis Store data in local database Read Relevant Papers Retrieve data
More informationXML for Android Developers. partially adapted from XML Tutorial by W3Schools
XML for Android Developers partially adapted from XML Tutorial by W3Schools Markup Language A system for annotating a text document in way that is syntactically distinguishble from the content. Motivated
More informationINTERNET PROGRAMMING XML
INTERNET PROGRAMMING XML Software Engineering Branch / 4 th Class Computer Engineering Department University of Technology OUTLINES XML Basic XML Advanced 2 HTML & CSS & JAVASCRIPT & XML DOCUMENTS HTML
More informationWeb Computing. Revision Notes
Web Computing Revision Notes Exam Format The format of the exam is standard: Answer TWO OUT OF THREE questions Candidates should answer ONLY TWO questions The time allowed is TWO hours Notes: You will
More informationmpmorfsdb: A database of Molecular Recognition Features (MoRFs) in membrane proteins. Introduction
mpmorfsdb: A database of Molecular Recognition Features (MoRFs) in membrane proteins. Introduction Molecular Recognition Features (MoRFs) are short, intrinsically disordered regions in proteins that undergo
More informationBioinformatics Hubs on the Web
Bioinformatics Hubs on the Web Take a class The Galter Library teaches a related class called Bioinformatics Hubs on the Web. See our Classes schedule for the next available offering. If this class is
More information20.453J / 2.771J / HST.958J Biomedical Information Technology Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 20.453J / 2.771J / HST.958J Biomedical Information Technology Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationBioinforma)cs Resources
Bioinforma)cs Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Bioinforma)cs Resources Organiza)on Schedule Overview Organiza)on Lecture: Friday 9-12, i.e.
More informationEXtensible Markup Language XML
EXtensible Markup Language XML 1 What is XML? XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined.
More informationWhat is XML? XML is designed to transport and store data.
What is XML? XML stands for extensible Markup Language. XML is designed to transport and store data. HTML was designed to display data. XML is a markup language much like HTML XML was designed to carry
More informationBioinforma)cs Resources
Bioinforma)cs Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Bioinforma)cs Resources Organiza)on Schedule Overview Organiza)on Lecture: Friday 9-12, i.e.
More informationWeb services CSCI 470: Web Science Keith Vertanen Copyright 2011
Web services CSCI 470: Web Science Keith Vertanen Copyright 2011 Web services What does that mean? Why are they useful? Examples! Major interac>on types REST SOAP Data formats: XML JSON Overview 2 3 W3C
More informationRESTRUCTURING GPSDB UNIVERSITY OF GENEVA MASTER IN PROTEOMICS AND BIOINFORMATICS. Written by Emilie Pasche
UNIVERSITY OF GENEVA MASTER IN PROTEOMICS AND BIOINFORMATICS RESTRUCTURING GPSDB Written by Emilie Pasche Supervisors: Violaine Pillet, Anne-Lise Veuthey, Céline Hernandez Swiss Institute of Bioinformatics,
More informationTopics of the talk. Biodatabases. Data types. Some sequence terminology...
Topics of the talk Biodatabases Jarno Tuimala / Eija Korpelainen CSC What data are stored in biological databases? What constitutes a good database? Nucleic acid sequence databases Amino acid sequence
More informationIntroduction to Sequence Databases. 1. DNA & RNA 2. Proteins
Introduction to Sequence Databases 1. DNA & RNA 2. Proteins 1 What are Databases? A database is a structured collection of information. A database consists of basic units called records or entries. Each
More informationIntroduc)on to annota)on with Artemis. Download presenta.on and data