Bioinforma)cs Resources - Swissprot -

Size: px
Start display at page:

Download "Bioinforma)cs Resources - Swissprot -"

Transcription

1 Bioinforma)cs Resources - Swissprot - Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12

2 Puta)ve Schedule Apr. 22 nd Intro, General Overview (1. sh.) Apr. 29 th Sequence Databases (2. sh.) May 6 th No lecture May 13 th Sequence Databases (3. sh.) May 20 th Structure Databases (4. sh)* May 27 th SQL (5. sh.) Jun 3 rd SQL (6. sh) Jun 10 th No-SQL (7.sh.) Jun 17 th No-SQL (8.sh.)* Jun 24 th JavaScript / UI (9.sh.) Jul 1 st Web Services (10.sh.) Jul 8 th Bioinformatics Suites / Forums Jul 15 th Wrap Up, Q&A Jul 28 th Exam, 10:30-12:00 MW1050 * These exercises can earn you a bonus

3 XML Infusion (in 10 sec) compila)on from hmp:// XML is a soqware- and hardware-independent tool to store and to transport data XML stands for extensible Markup Language designed to store and transport data designed to be self-descrip)ve W3C recommenda)on it does NOT DO anything

4 About Tags XML tags are not predefined like HTML tags everybody can/has to invent his own tags new tags can be added any )me the author has to define content and structure of the document everything is plain text

5 Document Structure <?xml version="1.0" encoding="utf-8"?>! <bookstore>!! <book category="cooking">! <title lang="en">everyday Italian</title>! <author>giada De Laurentiis</author>! <year>2005</year>! <price>30.00</price>! </book>!! <book category="children">! <title lang="en">harry Potter</title>! <author>j K. Rowling</author>! <year>2005</year>! <price>29.99</price>! </book>!!...! </bookstore>!! taken from hmp://

6 Syntax Rules elements are defined using tags: <tagname>... </tagname> or <tagname/>! elements can be nested (contain other elements - parent and child nodes, sibling nodes) elements can have text content each document must contain ONE root element that is the parent of all other elements

7 Syntax Refined prolog line <?xml...> is op)onal tags must be (self-) closed tag are case sensi)ve tags must be properly nested: <a><b>...</a></b> <a><b>...</b></a>! Wrong! Right!

8 Syntax Refined tags may have amributes amribute values must always be quoted some special characters cannot be used directly -> coded by en)ty references: < < less than > > greater than & & ampersand &apos; apostrophe " quota)on mark comments: <! >!

9 case sensi)ve Tag Names must start with a lemer or underscore must not start with the lemers xml in any case can contain: lemers, digits, hyphens, underscores and periods cannot contain spaces apply common sense and a consistent style avoid: minus (-), period (.), colon (:), non-english characters for compa)bility reasons

10 XML Element everything between the start and the end tag tags are included can contain: - text - amributes - other elements - a mix of all are extensible

11 XML AMributes values must be quoted: single or double quotes the unused character can be used inside the value decision for amribute or element undecided, but: - amributes cannot contain mul)ple values - amributes cannot contain tree structures - amributes are not easily expandable useful to store meta data, like element id, etc.

12 A Glimpse of Namespaces allow to prevent tag name collisions between different authors/applica)ons/domains implemented by the introduc)on of prefixes defined as an amribute: xmlns:prefix= URI! usage: <prefix:tagname>! the URI is only needed to be unique used to integrate other specifica)ons, e.g. XSLT

13 Levels of Correctness well formed: a document obey the syntax rules: - root element - closing tag - case sensi)ve - properly nested - amribute values quoted valid documents: in add)on to being valid the also conform to a document type defini)on (format specifica)on)

14 Document Type Defini)ons two ways to specify a document structure: DTD: Document Typ Defini)on XML Schema: XML based alterna)ve to DTD

15 Example <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE note SYSTEM "Note.dtd > <note> <to>tove</to> <from>jani</from> <heading>reminder</heading> <body>don't forget me this weekend! &copyright; </body> </note>!

16 Example <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> <!ENTITY copyright Copyright by.. > ]>!

17 XML DTD referenced from a document with: <!DOCTYPE note SYSTEM "Note.dtd">!!DOCTYPE defines the root element!element defines the structure of the elements #PCDTA means parse-able text data!entity defines special characters or strings

18 XML Schema alterna)ve to DTD <xs:element name="note > <xs:complextype> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="heading" type="xs:string"/> <xs:element name="body" type="xs:string"/> </xs:sequence> </xs:complextype> </xs:element>! support of data types and namespaces wrimen in XML and extensible!

19 Names and Other Complica)ons Amos Bairoch Ioannis Xenarios taken from hmp://web.expasy.org/ images/people/amos_bairoch.jpg taken from hmp:// Ioannis.Xenarios

20 History 1986 A. Bairoch created Swiss-Prot at the University of Geneva, since 1988 in collabora)on with EMBL/EBI 1993 together with Ron Appel launch of ExPASy 1998 Founda)on of SIB (Swiss Ins)tute of Bioinforma)cs) 2002 Founda)on of the UniProt consor)um by EBI, SIB and PIR

21 UniProtKB: UniProt Components: - UniProtKB/Swiss-Prot - UniProtKB/TrEMBL UniParc: pure sequence archive, no annota)ons UniRef: consists fo three databases of clustered sets of protein sequences (UniRef100, UniRef90, UniRef50) using the CD-HIT algorithm UniMes: data from metagenomic and environmental samples, not in UniProtKB

22 ExPASy hmp:// Expert Protein Analysis System (1993) now: SIB ExPASy Bioinforma)cs Resources Portal Ar)mo P, Jonnalagedda M, Arnold K, Bara)n D, Csardi G, de Castro E, Duvaud S, Flegel V, For)er A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liech) R, Moreo S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, and Stockinger H. ExPASy: SIB bioinforma9cs resource portal, Nucleic Acids Res, 40(W1):W597-W603, 2012.

23 Expasy Categories Proteomics Genomics Structural Bioinforma)cs Systems biology Phylogeny/evolu)on

24 Expasy Categories Popula)on gene)cs Transcriptomics Biophysics Imaging Drug Design

25 Resource Descrip)on 1. Resource name and descrip)on 2. Maintaining SIB group 3. Scien)fic category 4. Keywords: a controlled vocabulary is used to tag the resource

26 Resource Descrip)on 5. URL for the web interface and for the download if available 6. SoQware type: website, command line interface, GUI, etc 7. Status: green checkbox if currently available

27 UniProt/SwissProt Sta)s)cs Release 2016_05, May. 11 th taken from hmp://web.expasy.org/docs/ relnotes/relstat.html sequence entries ( in 2015_05)/ amino acids ( in 2015_05)

28 UniProt/SwissProt Sta)s)cs Growth over one year: 2016_5 vs 2015_5 Protein existence (PE) Entries % 1. Evidence at protein level (85.419) 16.8 (15.6) 2. Evidence at transcript level (61.814) 10.5 (11.3) 3. Inferred from homology ( ) 70.3 (70.7) 4. Predicted (11.526) 2.1 (2.1) 5. Uncertain (1.962) 0.4 (0.4)

29 Development taken from hmp://web.expasy.org/docs/relnotes/relstat1.png for release 2015_5

30 More Numbers (rel. 2015_5) Represented species: Top 20 species: sequences, i.e. 21.3% of the total number of sequences Entries No of Species Entries No of Species >

31 Species Representa)on (rel. 2015_5) Top Frequency Species Homo sapiens (Human) Mus musculus (Mouse) Arabidopsis thaliana (Mouse-ear cress) Rattus norvegicus (Rat) Saccharomyces cerevisiae (Baker s yest) Bos taurus (Bovine) Schizosaccheromyces pombe (Fission yeast) Escherichia coli K Bacillus subtilis Dictyostelium discoideum (Slime mold)

32 Representa)on of the Divisions (rel. 2015_5) Viruses (3%), Archaea (4%), Eukaryota (33%), Bacteria (61%),

33 Distribu)on of Eukaryota (rel. 2015_5) Nematoda (2%), 4417 Insecta (5%), 8781 Fungi (17%), Viridiplantae (20%), Other (8%), Human (11%), Other Mammalia (26%), Other Vertebrata (10%), 17823

34 Length Distribu)on (rel. 2015_5)

35 Amino Acid Composi)on (rel. 2015_5) figure taken from gray=aliphatic, red=acidic, green=small hydroxy, blue=basic, black=aromatic, white=amide, yellow=sulfur

36 SwissProt Annota)on Process defined in hmp:// sop_manual_cura)on.pdf explained in hmp://

37 Annota)on Phases 1. Sequence cura)on 2. Sequence analysis 3. Literature cura)on 4. Family-based cura)on 5. Evidence amribu)on 6. Quality assurance, integra)on and update

38 Sequence Cura)on more than 95% are translated CDS from INSDC other sources: PDB, direct protein sequencing, projects not submiong to INSDC sequences are selected according to cura)on priori)es (hmp:// results in the canonical sequence for a gene/ species pair

39 Steps toward the canonical sequence Entry selec)on Run BLAST similarity searches to iden)fy addi)onal sequences for the same gene Iden)fy homologs by reciprocal BLAST and phylogeny based resources Lock selected entries for other curators to prevent duplica)on

40 Steps toward the canonical sequence Prepare sequence alignments with T-Coffee, Muscle, Clustal W Merge into the canonical sequence: - most prevalent - most similar to orthologs sequences found in other species - based on length and aa composi)on it allows the clearest descrip)on - default: longest record conflicts and varia)ons

41 Sequence Analysis Several analysis programs are applied to the sequences for: - topological features - post-transla)onal modifica)ons - domains all results are manually checked and in- or excluded for annota)on

42 Topological Analysis Tools Prediction Signal P Presence and location of signal peptides TargetP Presence and location of transit peptides Predotar Mitochondrial, plastid or ER targeting sequences ESKW Transmembrane domains MEMSAT Transmembrane domains TMHMM Transmembrane domains Phobius Discriminates transmembrane and signal regions

43 Post-transla)onal modifica)on Analysis Tools Prediction GPI-predictor GPI lipid anchor sites NetNGlyc N-glycosylation sites NetOGlyc O-glycosylation sites NMT Predictor N-terminal myristoylation sites Sulfinator Tyrosine sulfatation sites

44 Domain Analysis Tools Prediction ps_scan internal PROSITE profile, pattern and rule scanning InterPro retrieves non-prosite motif matches using InterPro database or InterProScan Coils Coiled-coils regions polyaa internal program which identifies homopolymeric stretches of amino acids REPEAT identifies the following repeats: Ankyrin, Armadillo, HAT, HEAT, Kelch, Leucine-rich, PFTA, PFTB, RCC1, TPR, WD40

45 taken from Figure 1. UniProtKB sequence analysis results displayed in graphical interface

46 Literature Cura)on Iden)fica)on of relevant scien)fic literature from - literature and text mining resources (PubMed, Europe PMC, ihop, TextPresso) - addi)ons from other sources made by the curator Informa)on is extracted form the full text: - general annota)ons (not posi)on specific) - posi)on specific annota)ons

47 General Annota)ons hmp:// general_annota)on posi)on-independent contains mostly general biological informa)on like: func)ons, cataly)c ac)vity, cofactor, enzyme regula)on, subunit structure, pathway,...

48 Sequence Annota)ons posi)on dependent hmp:// sequence_annota)on regions or sites of interest like post-transla)onal modifica)ons, binding sites, ac)ve sites, etc. contains several subsec)ons: molecule processing, regions, sites, amino acid modifica)ons, natural variants, experimental info, secondary structure

49 Family-based Cura)on Evalua)on and cura)on of homologs as described above Standardiza)on of annota)on of homologs Propaga)on of annota)on across the homologs to ensure consistency

50 Evidence AMribu)on Every annota)on is amributed to its original source Every annota)on can be traced back and evaluated For evidence dis)nc)on there are 7 codes from the Evidence Code Ontology (ECO) used for manually curated entries hmp:// Addi)onal GO term annota)on

51 ECO code Term name Usage ECO: experimental evidence used in manual assertion Information for which there is published experimental evidence ECO: non-traceable author statement used in manual assertion Information based on author statements in scientific articles for which there is no ECO: ECO: ECO: ECO: ECO: sequence similarity evidence used in manual assertion imported information used in manual assertion curator inference used in manual assertion match to sequence model evidence used in manual assertion combinatorial evidence used in manual assertion taken from experimental support Information which has been propagated from a related experimentally characterised protein Information which has been imported from another database and manually verified Information which has been inferred by a curator based on his/her scientific knowledge or on the scientific content of an article Information originating from the UniProt automatic annotation systems or any of the sequence analysis programs used during the manual curation process and which has been manually verified Information which is manually curated based on a combination of experimental and computational evidence

52 Quality Control and Integra)on Finished entries run through a series of rulebased checked concerning especially posi)ons and regions All errors are corrected Manually reviewed by a senior curator Finally it is integrated into the database Unlock the finished entries for further cura)on

53 Demostra)on hmp:// P62756#sec)on_features

54 The Swiss-Prot Flat File hmp://web.expasy.org/docs/userman.html An entry is composed by different line types Line types have their own format Follows EMBL Nucleo)de Sequence Database format as close as possible 2 sec)ons: - core data (sequence data, cita)on info, taxonomy) - annota)ons (func)on, modifica)on, domains, sec and quart structure, disease associa)ons, conflicts, asf)

55 The following table lists the available two-letter line codes. Each code is followed by three blanks. Line Code Content Occurence in an entry ID Identification Once; starts the entry AC Accession number(s) Once or more DT Date Three times DE Description Once or more GN Gene name(s) Optional OS Organism species Once or more OG Organelle Optional OC Organism classification Once or more OX Taxonomy cross-reference Once OH Organism host Optional --continued--

56 Line Code Content Occurence in an entry RN Reference number Once or more RP Reference position Once or more RC Reference comment(s) Optional RX Reference cross-reference(s) Optional RG Reference group Once or more (Optional if RA line) RA Reference authors Once or more (Optional if R line) RT Reference title Optional RL Reference location Once or more CC Comments or notes Optional DR Database cross-references Optional PE Protein existence Once KW Keywords Optional FT Feature table data Once or more in Swiss-Prot, optional in TrEMBL SQ Sequence header Once (blanks) Sequence data Once or more // Termination line Once; ends the entry

57 Fields in More Detail ID line: ID EntryName Status; SequenceLength. EntryName: up to 11 uppercase alphanumeric characters X_Y - X is a mnemonic code of at most 5 alphanumeric characters - Y is a mnemonic species iden)fica)on code of at most 5 alphanumeric characters ID CYC_BOVIN Reviewed; 104 AA.

58 AC line: AC AC_number_1;[ AC_number_2;]...[ AC_number_N;] Accession number: 6 or 10 characters [A-N,R-Z] [0-9] [A-Z] [A-Z, 0-9] [A-Z, 0-9] [0-9] [O,P,Q] [0-9] [A-Z, 0-9] [A-Z, 0-9] [A-Z, 0-9] [0-9] [A-N,R-Z] [0-9] [A-Z] [A-Z, 0-9] [A-Z, 0-9] [0-9] [A-Z] [A-Z,0-9] [A-Z,0-9] [0-9] RegEx: [OPQ][0-9][A-Z0-9]{3}[0-9] [A-NR-Z][0-9] ([A-Z][A-Z0-9]{2}[0-9]){1,2} Examples: P12345, Q1AAA9, A0A022YWF9

59 DT line: date, DD-MMM-YYYY always one of the biweekly release dates always three lines: - date of integra)on - date of sequence version, sequence version X - date of entry version, entry version X Example: DT 01-FEB-1999, integrated into UniProtKB/TrEMBL. DT 15-OCT-2000, sequence version 2. DT 15-DEC-2004, entry version 5.

60 DE lines: - three categories and addi)onal subcategories - contains a recommended name - besides: full name, short name, EC number - alterna)ve names: e.g. as an allergen or in biotechnology,...

61 DE RecName: Full=Annexin A5; DE Short=Annexin-5; DE AltName: Full=Annexin V; DE AltName: Full=Lipocor)n V; DE AltName: Full=Endonexin II; DE AltName: Full=Calphobindin I; DE AltName: Full=CBP-I; DE AltName: Full=Placental an)coagulant protein I; DE Short=PAP-I; DE AltName: Full=PP4; DE AltName: Full=Thromboplas)n inhibitor; DE AltName: Full=Vascular an)coagulant-alpha; DE Short=VAC-alpha; DE AltName: Full=Anchorin CII; DE RecName: Full=Granulocyte colony-s)mula)ng factor; DE Short=G-CSF; DE AltName: Full=Pluripoie)n; DE AltName: Full=Filgras)m; DE AltName: Full=Lenogras)m; DE Flags: Precursor;

62 OS line: origina)ng organism OS Homo sapiens (Human). OS Rous sarcoma virus (strain Schmidt-Ruppin A) (RSV-SRA) (Avian leukosis OS virus-rsa). OC lines: contain the taxonomic classifica)on of the source organism according to ( hmp:// OC Node[; Node...]. OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; OC Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; OC Homo.

63 RN, RP, RC, RX, RG, RA, RT, RL can occur mul)ple )me order in block fixed e.g: RN [1] RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND C), FUNCTION, INTERACTION RP WITH PKC-3, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL RP STAGE, AND MUTAGENESIS OF PHE-175 AND PHE-221. RC STRAIN=Bristol N2; RX PubMed= ; DOI= /jbc.M ; RA Zhang L., Wu S.-L., Rubin C.S.; RT "A novel adapter protein employs a phosphotyrosine binding domain and RT excep)onally basic N-terminal domains to capture and localize an RT atypical protein kinase C: characteriza)on of Caenorhabdi)s elegans RT C kinase adapter 1, a protein that avidly binds protein kinase C3. ; RL J. Biol. Chem. 276: (2001).

64 CC lines free text contains most of the annotated informa)on CC -!- TOPIC: First line of a comment block; CC second and subsequent lines of a comment block. structured by predefined topics like: Allergen, Alterna)ve Products,.., Cofactor,..., Disease,.. Domain,..., Func)on, Interac)on,...

65 CC CC CC CC CC CC CC CC CC CC CC CC CC -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of! bovine dander.! -!- ALTERNATIVE PRODUCTS:! Event=Alternative initiation; Named isoforms=2;! Name=Alpha;! IsoId=P ; Sequence=Displayed;! Name=Beta;! IsoId=P ; Sequence=VSP_018696;! -!- SUBCELLULAR LOCATION: Cell membrane {ECO: }; Peripheral! membrane protein {ECO: }. Secreted {ECO: }. Note=The! last 22 C-terminal amino acids may participate in cell membrane! attachment.! -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO: }.!!!

66 Cross References too many to enumerate extensive references with nucleo)de databases, e.g.: in EMBL FT CDS FT /protein_id="caa FT /db_xref="swiss-prot:p26345 FT /gene="reca FT /product="reca protein in Swiss=Prot DR EMBL; AJ297977; CAC ; -; Genomic_DNA. DR EMBL; X56491; CAA ; ALT_FRAME; mrna.

67 Key Words / Feature Table KW Keyword[; Keyword...]. helps to search resp. index the database no limits: KW 3D-structure; Alterna)ve splicing; Alzheimer disease; Amyloid; KW Apoptosis; Cell adhesion; Coated pits; Copper; KW Direct protein sequencing; Disease muta)on; Endocytosis; KW Glycoprotein; Heparin-binding; Iron; Membrane; Metal-binding; KW Notch signaling pathway; Phosphoryla)on; Polymorphism; KW Protease inhibitor; Proteoglycan; Serine protease inhibitor; Signal; KW Transmembrane; Zinc. Feature table like GenBank/EMBL/DDBJ

68 Programma)c Access hmp:// programma)c_access (remember this link!) several use cases documented, but not as an API best way: use the web interface to construct/ refine your query first before you try to automate the process

69 Retrieving an Individual Entry uses simple URL which can be bookmarked for individual entries: hmp:// default result is a web page alterna)ve formats: txt, xml, rdf, fasta, gff specified via the accession suffix structured formats like xml or rdf can include referenced entries

70 Using the ID mapping service hmp:// programma)c_access#batch_retrieval_perl_exa mple uses hmp POST method converts between different database IDs you have to know the specific abbrevia)on for the respec)ve databases

71 Retrieving Entries via Queries uses hmp GET method i.e. the query string is part of the URL structure might be quite complex use the browser to configure the query string more seong are available via the query builder hmp:// the URL length might be limited to 1000 characters

72 Examples hmp:// hmp:// hmp:// UniRef90_P04259.xml hmp:// UniRef90_P04259.rdf hmp:// UniRef90_P04259.fasta hmp:// UniRef90_P04259.tab

Bioinforma)cs Resources XML / Web Access

Bioinforma)cs Resources XML / Web Access Bioinforma)cs Resources XML / Web Access Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 XML Infusion (in 10 sec) compila)on from hkp://www.w3schools.com/xml/default.asp

More information

Automatic annotation in UniProtKB using UniRule, and Complete Proteomes. Wei Mun Chan

Automatic annotation in UniProtKB using UniRule, and Complete Proteomes. Wei Mun Chan Automatic annotation in UniProtKB using UniRule, and Complete Proteomes Wei Mun Chan Talk outline Introduction to UniProt UniProtKB annotation and propagation Data increase and the need for Automatic Annotation

More information

markup language carry data define your own tags self-descriptive W3C Recommendation

markup language carry data define your own tags self-descriptive W3C Recommendation XML intro What is XML? XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined. You must define

More information

XML (Extensible Markup Language)

XML (Extensible Markup Language) Basics of XML: What is XML? XML (Extensible Markup Language) XML stands for Extensible Markup Language XML was designed to carry data, not to display data XML tags are not predefined. You must define your

More information

Bioinforma)cs Resources

Bioinforma)cs Resources Bioinforma)cs Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Bioinforma)cs Resources Organiza)on Schedule Overview Organiza)on Lecture: Friday 9-12, i.e.

More information

Data formats. { "firstname": "John", "lastname" : "Smith", "age" : 25, "address" : { "streetaddress": "21 2nd Street",

Data formats. { firstname: John, lastname : Smith, age : 25, address : { streetaddress: 21 2nd Street, Data formats { "firstname": "John", "lastname" : "Smith", "age" : 25, "address" : { "streetaddress": "21 2nd Street", "city" : "New York", "state" : "NY", "postalcode" : "10021" }, CSCI 470: Web Science

More information

XML extensible Markup Language

XML extensible Markup Language extensible Markup Language Eshcar Hillel Sources: http://www.w3schools.com http://java.sun.com/webservices/jaxp/ learning/tutorial/index.html Tutorial Outline What is? syntax rules Schema Document Object

More information

Software Engineering Methods, XML extensible Markup Language. Tutorial Outline. An Example File: Note.xml XML 1

Software Engineering Methods, XML extensible Markup Language. Tutorial Outline. An Example File: Note.xml XML 1 extensible Markup Language Eshcar Hillel Sources: http://www.w3schools.com http://java.sun.com/webservices/jaxp/ learning/tutorial/index.html Tutorial Outline What is? syntax rules Schema Document Object

More information

Similarity searches in biological sequence databases

Similarity searches in biological sequence databases Similarity searches in biological sequence databases Volker Flegel september 2004 Page 1 Outline Keyword search in databases General concept Examples SRS Entrez Expasy Similarity searches in databases

More information

EBI patent related services

EBI patent related services EBI patent related services 4 th Annual Forum for SMEs October 18-19 th 2010 Jennifer McDowall Senior Scientist, EMBL-EBI EBI is an Outstation of the European Molecular Biology Laboratory. Overview Patent

More information

11. EXTENSIBLE MARKUP LANGUAGE (XML)

11. EXTENSIBLE MARKUP LANGUAGE (XML) 11. EXTENSIBLE MARKUP LANGUAGE (XML) Introduction Extensible Markup Language is a Meta language that describes the contents of the document. So these tags can be called as self-describing data tags. XML

More information

The Use of WWW in Biological Research

The Use of WWW in Biological Research The Use of WWW in Biological Research Introduction R.Doelz, Biocomputing Basel T.Etzold, EMBL Heidelberg Information in Biology grows rapidly. Initially, biological retrieval systems used conventional

More information

UniProt - The Universal Protein Resource

UniProt - The Universal Protein Resource UniProt - The Universal Protein Resource Claire O Donovan Pre-UniProt Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI; TrEMBL: created at the EBI in November 1996

More information

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013

Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013 Extensible Markup Language (XML) Hamid Zarrabi-Zadeh Web Programming Fall 2013 2 Outline Introduction XML Structure Document Type Definition (DTD) XHMTL Formatting XML CSS Formatting XSLT Transformations

More information

Web Services Part I. XML Web Services. Instructor: Dr. Wei Ding Fall 2009

Web Services Part I. XML Web Services. Instructor: Dr. Wei Ding Fall 2009 Web Services Part I Instructor: Dr. Wei Ding Fall 2009 CS 437/637 Database-Backed Web Sites and Web Services 1 XML Web Services XML Web Services = Web Services A Web service is a different kind of Web

More information

12. Key features involved in building biological 3databases

12. Key features involved in building biological 3databases 12. Key features involved in building biological 3databases Central to the discipline of bioinformatics is the need to store biological information systematically in structured databases. The first databases

More information

INTRODUCTION TO BIOINFORMATICS

INTRODUCTION TO BIOINFORMATICS Molecular Biology-2017 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain

More information

Biostatistics and Bioinformatics Molecular Sequence Databases

Biostatistics and Bioinformatics Molecular Sequence Databases . 1 Description of Module Subject Name Paper Name Module Name/Title 13 03 Dr. Vijaya Khader Dr. MC Varadaraj 2 1. Objectives: In the present module, the students will learn about 1. Encoding linear sequences

More information

Bioinformatics resources for data management. Etienne de Villiers KEMRI-Wellcome Trust, Kilifi

Bioinformatics resources for data management. Etienne de Villiers KEMRI-Wellcome Trust, Kilifi Bioinformatics resources for data management Etienne de Villiers KEMRI-Wellcome Trust, Kilifi Typical Bioinformatic Project Pose Hypothesis Store data in local database Read Relevant Papers Retrieve data

More information

XML for Android Developers. partially adapted from XML Tutorial by W3Schools

XML for Android Developers. partially adapted from XML Tutorial by W3Schools XML for Android Developers partially adapted from XML Tutorial by W3Schools Markup Language A system for annotating a text document in way that is syntactically distinguishble from the content. Motivated

More information

INTERNET PROGRAMMING XML

INTERNET PROGRAMMING XML INTERNET PROGRAMMING XML Software Engineering Branch / 4 th Class Computer Engineering Department University of Technology OUTLINES XML Basic XML Advanced 2 HTML & CSS & JAVASCRIPT & XML DOCUMENTS HTML

More information

Web Computing. Revision Notes

Web Computing. Revision Notes Web Computing Revision Notes Exam Format The format of the exam is standard: Answer TWO OUT OF THREE questions Candidates should answer ONLY TWO questions The time allowed is TWO hours Notes: You will

More information

mpmorfsdb: A database of Molecular Recognition Features (MoRFs) in membrane proteins. Introduction

mpmorfsdb: A database of Molecular Recognition Features (MoRFs) in membrane proteins. Introduction mpmorfsdb: A database of Molecular Recognition Features (MoRFs) in membrane proteins. Introduction Molecular Recognition Features (MoRFs) are short, intrinsically disordered regions in proteins that undergo

More information

Bioinformatics Hubs on the Web

Bioinformatics Hubs on the Web Bioinformatics Hubs on the Web Take a class The Galter Library teaches a related class called Bioinformatics Hubs on the Web. See our Classes schedule for the next available offering. If this class is

More information

20.453J / 2.771J / HST.958J Biomedical Information Technology Fall 2008

20.453J / 2.771J / HST.958J Biomedical Information Technology Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 20.453J / 2.771J / HST.958J Biomedical Information Technology Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Bioinforma)cs Resources

Bioinforma)cs Resources Bioinforma)cs Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Bioinforma)cs Resources Organiza)on Schedule Overview Organiza)on Lecture: Friday 9-12, i.e.

More information

EXtensible Markup Language XML

EXtensible Markup Language XML EXtensible Markup Language XML 1 What is XML? XML stands for EXtensible Markup Language XML is a markup language much like HTML XML was designed to carry data, not to display data XML tags are not predefined.

More information

What is XML? XML is designed to transport and store data.

What is XML? XML is designed to transport and store data. What is XML? XML stands for extensible Markup Language. XML is designed to transport and store data. HTML was designed to display data. XML is a markup language much like HTML XML was designed to carry

More information

Bioinforma)cs Resources

Bioinforma)cs Resources Bioinforma)cs Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Bioinforma)cs Resources Organiza)on Schedule Overview Organiza)on Lecture: Friday 9-12, i.e.

More information

Web services CSCI 470: Web Science Keith Vertanen Copyright 2011

Web services CSCI 470: Web Science Keith Vertanen Copyright 2011 Web services CSCI 470: Web Science Keith Vertanen Copyright 2011 Web services What does that mean? Why are they useful? Examples! Major interac>on types REST SOAP Data formats: XML JSON Overview 2 3 W3C

More information

RESTRUCTURING GPSDB UNIVERSITY OF GENEVA MASTER IN PROTEOMICS AND BIOINFORMATICS. Written by Emilie Pasche

RESTRUCTURING GPSDB UNIVERSITY OF GENEVA MASTER IN PROTEOMICS AND BIOINFORMATICS. Written by Emilie Pasche UNIVERSITY OF GENEVA MASTER IN PROTEOMICS AND BIOINFORMATICS RESTRUCTURING GPSDB Written by Emilie Pasche Supervisors: Violaine Pillet, Anne-Lise Veuthey, Céline Hernandez Swiss Institute of Bioinformatics,

More information

Topics of the talk. Biodatabases. Data types. Some sequence terminology...

Topics of the talk. Biodatabases. Data types. Some sequence terminology... Topics of the talk Biodatabases Jarno Tuimala / Eija Korpelainen CSC What data are stored in biological databases? What constitutes a good database? Nucleic acid sequence databases Amino acid sequence

More information

Introduction to Sequence Databases. 1. DNA & RNA 2. Proteins

Introduction to Sequence Databases. 1. DNA & RNA 2. Proteins Introduction to Sequence Databases 1. DNA & RNA 2. Proteins 1 What are Databases? A database is a structured collection of information. A database consists of basic units called records or entries. Each

More information

Introduc)on to annota)on with Artemis. Download presenta.on and data

Introduc)on to annota)on with Artemis. Download presenta.on and data Introduc)on to annota)on with Artemis Download presenta.on and data Annota)on Assign an informa)on to genomic sequences???? Genome annota)on 1. Iden.fying genomic elements by: Predic)on (structural annota.on

More information

Protein Information Tutorial

Protein Information Tutorial Protein Information Tutorial Relevant websites: SMART (normal mode): SMART (batch mode): HMMER search: InterProScan: CBS Prediction Servers: EMBOSS: http://smart.embl-heidelberg.de/ http://smart.embl-heidelberg.de/smart/batch.pl

More information

Automatic GO Term Prediction on UniProtKB

Automatic GO Term Prediction on UniProtKB Automatic GO Term Prediction on UniProtKB Diploma Thesis at University of Applied Sciences Weihenstephan Department of Biotechnology and Bioinformatics Barbara Meier 23. Mar. 2007 conducted at European

More information

Wilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment

Wilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment An Introduction to NCBI BLAST Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment Resources: The BLAST web server is available at https://blast.ncbi.nlm.nih.gov/blast.cgi

More information

Finding and Exporting Data. BioMart

Finding and Exporting Data. BioMart September 2017 Finding and Exporting Data Not sure what tool to use to find and export data? BioMart is used to retrieve data for complex queries, involving a few or many genes or even complete genomes.

More information

INTRODUCTION TO BIOINFORMATICS

INTRODUCTION TO BIOINFORMATICS Molecular Biology-2019 1 INTRODUCTION TO BIOINFORMATICS In this section, we want to provide a simple introduction to using the web site of the National Center for Biotechnology Information NCBI) to obtain

More information

but XML goes far beyond HTML: it describes data

but XML goes far beyond HTML: it describes data The XML Meta-Language 1 Introduction to XML The father of markup languages: XML = EXtensible Markup Language is a simplified version of SGML Originally created to overcome the limitations of HTML the HTML

More information

Web Programming Paper Solution (Chapter wise)

Web Programming Paper Solution (Chapter wise) What is valid XML document? Design an XML document for address book If in XML document All tags are properly closed All tags are properly nested They have a single root element XML document forms XML tree

More information

2) NCBI BLAST tutorial This is a users guide written by the education department at NCBI.

2) NCBI BLAST tutorial   This is a users guide written by the education department at NCBI. Web resources -- Tour. page 1 of 8 This is a guided tour. Any homework is separate. In fact, this exercise is used for multiple classes and is publicly available to everyone. The entire tour will take

More information

Protein Sequence Database

Protein Sequence Database Protein Sequence Database A protein is a large molecule manufactured in the cell of a living organism to carry out essential functions within the cell. The primary structure of a protein is a sequence

More information

Trad DDBJ. DNA Data Bank of Japan

Trad DDBJ. DNA Data Bank of Japan Trad DDBJ DNA Data Bank of Japan LOCUS HUMIL2HOM 397 bp DNA linear HUM 27-APR-1993 DEFINITION Human interleukin 2 (IL-2)-like DNA. ACCESSION M13784 VERSION M13784.1 KEYWORDS. SOURCE Homo sapiens (human)

More information

XML. Presented by : Guerreiro João Thanh Truong Cong

XML. Presented by : Guerreiro João Thanh Truong Cong XML Presented by : Guerreiro João Thanh Truong Cong XML : Definitions XML = Extensible Markup Language. Other Markup Language : HTML. XML HTML XML describes a Markup Language. XML is a Meta-Language. Users

More information

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML

Introduction Syntax and Usage XML Databases Java Tutorial XML. November 5, 2008 XML Introduction Syntax and Usage Databases Java Tutorial November 5, 2008 Introduction Syntax and Usage Databases Java Tutorial Outline 1 Introduction 2 Syntax and Usage Syntax Well Formed and Valid Displaying

More information

XML: a "skeleton" for creating markup languages you already know it! <element attribute="value">content</element> languages written in XML specify:

XML: a skeleton for creating markup languages you already know it! <element attribute=value>content</element> languages written in XML specify: 1 XML What is XML? 2 XML: a "skeleton" for creating markup languages you already know it! syntax is identical to XHTML's: content languages written in XML specify:

More information

Protein Sequence Database

Protein Sequence Database Protein Sequence Database A protein is a large molecule manufactured in the cell of a living organism to carry out essential functions within the cell. The primary structure of a protein is a sequence

More information

XML CSC 443: Web Programming

XML CSC 443: Web Programming 1 XML CSC 443: Web Programming Haidar Harmanani Department of Computer Science and Mathematics Lebanese American University Byblos, 1401 2010 Lebanon What is XML? 2 XML: a "skeleton" for creating markup

More information

Bioinforma)cs Resources - Genbank -

Bioinforma)cs Resources - Genbank - Bioinforma)cs Resources - Genbank - Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Preliminary Schedule April 13 th Intro, General Overview (1. sh.) April 20 th Sequence

More information

Mining the Biomedical Research Literature. Ken Baclawski

Mining the Biomedical Research Literature. Ken Baclawski Mining the Biomedical Research Literature Ken Baclawski Data Formats Flat files Spreadsheets Relational databases Web sites XML Documents Flexible very popular text format Self-describing records XML Documents

More information

CAP BIOINFORMATICS Su-Shing Chen CISE. 8/19/2005 Su-Shing Chen, CISE 1

CAP BIOINFORMATICS Su-Shing Chen CISE. 8/19/2005 Su-Shing Chen, CISE 1 CAP 5510-2 BIOINFORMATICS Su-Shing Chen CISE 8/19/2005 Su-Shing Chen, CISE 1 Building Local Genomic Databases Genomic research integrates sequence data with gene function knowledge. Gene ontology to represent

More information

CSE 421 Algorithms. Sequence Alignment

CSE 421 Algorithms. Sequence Alignment CSE 421 Algorithms Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming Algorithm 2 Sequence Alignment Goal: position characters in two strings to best line up identical/similar ones

More information

CACAO Training. Jim Hu and Suzi Aleksander Spring 2016

CACAO Training. Jim Hu and Suzi Aleksander Spring 2016 CACAO Training Jim Hu and Suzi Aleksander Spring 2016 1 What is CACAO? Community Assessment of Community Annotation with Ontologies (CACAO) Annotation of gene function Competition Within a class Between

More information

Wilson Leung 05/27/2008 A Simple Introduction to NCBI BLAST

Wilson Leung 05/27/2008 A Simple Introduction to NCBI BLAST A Simple Introduction to NCBI BLAST Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment Resources: The BLAST web server is available at http://www.ncbi.nih.gov/blast/

More information

CSC Web Technologies, Spring Web Data Exchange Formats

CSC Web Technologies, Spring Web Data Exchange Formats CSC 342 - Web Technologies, Spring 2017 Web Data Exchange Formats Web Data Exchange Data exchange is the process of transforming structured data from one format to another to facilitate data sharing between

More information

Extensible Markup Language (XML) What is XML? Structure of an XML document. CSE 190 M (Web Programming), Spring 2007 University of Washington

Extensible Markup Language (XML) What is XML? Structure of an XML document. CSE 190 M (Web Programming), Spring 2007 University of Washington Page 1 Extensible Markup Language (XML) CSE 190 M (Web Programming), Spring 2007 University of Washington Reading: Sebesta Ch. 8 sections 8.1-8.3, 8.7-8.8, 8.10.3 What is XML? a specification for creating

More information

BovineMine Documentation

BovineMine Documentation BovineMine Documentation Release 1.0 Deepak Unni, Aditi Tayal, Colin Diesh, Christine Elsik, Darren Hag Oct 06, 2017 Contents 1 Tutorial 3 1.1 Overview.................................................

More information

Information Resources in Molecular Biology Marcela Davila-Lopez How many and where

Information Resources in Molecular Biology Marcela Davila-Lopez How many and where Information Resources in Molecular Biology Marcela Davila-Lopez (marcela.davila@medkem.gu.se) How many and where Data growth DB: What and Why A Database is a shared collection of logically related data,

More information

E-Applications. XML and DOM in Javascript. Michail Lampis

E-Applications. XML and DOM in Javascript. Michail Lampis E-Applications XML and DOM in Javascript Michail Lampis michail.lampis@dauphine.fr Acknowledgment Much of the material on these slides follows the tutorial given in: http://www.w3schools.com/dom/ XML XML

More information

CSE 312 Autumn More Algorithms: Dynamic Programming for Sequence Alignment

CSE 312 Autumn More Algorithms: Dynamic Programming for Sequence Alignment CSE 312 Autumn 2011 More Algorithms: Dynamic Programming for Sequence Alignment 1 An Important Algorithm Design Technique Dynamic Programming Give a solution to a problem using smaller sub-problems, e.g.

More information

PPI Network Alignment Advanced Topics in Computa8onal Genomics

PPI Network Alignment Advanced Topics in Computa8onal Genomics PPI Network Alignment 02-715 Advanced Topics in Computa8onal Genomics PPI Network Alignment Compara8ve analysis of PPI networks across different species by aligning the PPI networks Find func8onal orthologs

More information

What is Internet COMPUTER NETWORKS AND NETWORK-BASED BIOINFORMATICS RESOURCES

What is Internet COMPUTER NETWORKS AND NETWORK-BASED BIOINFORMATICS RESOURCES What is Internet COMPUTER NETWORKS AND NETWORK-BASED BIOINFORMATICS RESOURCES Global Internet DNS Internet IP Internet Domain Name System Domain Name System The Domain Name System (DNS) is a hierarchical,

More information

Software review. Biomolecular Interaction Network Database

Software review. Biomolecular Interaction Network Database Biomolecular Interaction Network Database Keywords: protein interactions, visualisation, biology data integration, web access Abstract This software review looks at the utility of the Biomolecular Interaction

More information

Blast2GO User Manual. Blast2GO Ortholog Group Annotation May, BioBam Bioinformatics S.L. Valencia, Spain

Blast2GO User Manual. Blast2GO Ortholog Group Annotation May, BioBam Bioinformatics S.L. Valencia, Spain Blast2GO User Manual Blast2GO Ortholog Group Annotation May, 2016 BioBam Bioinformatics S.L. Valencia, Spain Contents 1 Clusters of Orthologs 2 2 Orthologous Group Annotation Tool 2 3 Statistics for NOG

More information

COMP9321 Web Application Engineering. Extensible Markup Language (XML)

COMP9321 Web Application Engineering. Extensible Markup Language (XML) COMP9321 Web Application Engineering Extensible Markup Language (XML) Dr. Basem Suleiman Service Oriented Computing Group, CSE, UNSW Australia Semester 1, 2016, Week 4 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2442

More information

Lecture 5 Advanced BLAST

Lecture 5 Advanced BLAST Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 5 Advanced BLAST BLAST Recap Sequence Alignment Complexity and indexing BLASTN and BLASTP Basic parameters

More information

Extensible Markup Language (XML) What is XML? An example XML file. CSE 190 M (Web Programming), Spring 2008 University of Washington

Extensible Markup Language (XML) What is XML? An example XML file. CSE 190 M (Web Programming), Spring 2008 University of Washington Extensible Markup Language (XML) CSE 190 M (Web Programming), Spring 2008 University of Washington Except where otherwise noted, the contents of this presentation are Copyright 2008 Marty Stepp and Jessica

More information

Bioinforma)cs Resources - NoSQL -

Bioinforma)cs Resources - NoSQL - Bioinforma)cs Resources - NoSQL - Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Ins)tut für Informa)k I12 Short SQL Recap schema typed data tables defined layout space consump)on is computable

More information

M359 Block5 - Lecture12 Eng/ Waleed Omar

M359 Block5 - Lecture12 Eng/ Waleed Omar Documents and markup languages The term XML stands for extensible Markup Language. Used to label the different parts of documents. Labeling helps in: Displaying the documents in a formatted way Querying

More information

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers

Exercises. Biological Data Analysis Using InterMine workshop exercises with answers Exercises Biological Data Analysis Using InterMine workshop exercises with answers Exercise1: Faceted Search Use HumanMine for this exercise 1. Search for one or more of the following using the keyword

More information

Editing Pathway/Genome Databases

Editing Pathway/Genome Databases Editing Pathway/Genome Databases By Ron Caspi This presentation can be found at http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/ 1 Pathway Tools in Editing Mode The database is separate from

More information

MANAGING INFORMATION (CSCU9T4) LECTURE 2: XML STRUCTURE

MANAGING INFORMATION (CSCU9T4) LECTURE 2: XML STRUCTURE MANAGING INFORMATION (CSCU9T4) LECTURE 2: XML STRUCTURE Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE XML Elements vs. Attributes Well-formed vs. Valid XML documents Document Type Definitions (DTDs)

More information

Java EE 7: Back-end Server Application Development 4-2

Java EE 7: Back-end Server Application Development 4-2 Java EE 7: Back-end Server Application Development 4-2 XML describes data objects called XML documents that: Are composed of markup language for structuring the document data Support custom tags for data

More information

Using the Distributed Annotation System

Using the Distributed Annotation System Using the Distributed Annotation System http://www.ebi.ac.uk Introduction This half day course is designed for those with a biological background that are relatively new to the use of the Distributed Annotation

More information

Complex Query Formulation Over Diverse Information Sources Using an Ontology

Complex Query Formulation Over Diverse Information Sources Using an Ontology Complex Query Formulation Over Diverse Information Sources Using an Ontology Robert Stevens, Carole Goble, Norman Paton, Sean Bechhofer, Gary Ng, Patricia Baker and Andy Brass Department of Computer Science,

More information

Supplementary Note 1: Considerations About Data Integration

Supplementary Note 1: Considerations About Data Integration Supplementary Note 1: Considerations About Data Integration Considerations about curated data integration and inferred data integration mentha integrates high confidence interaction information curated

More information

Facilitating Semantic Alignment of EBI Resources

Facilitating Semantic Alignment of EBI Resources Facilitating Semantic Alignment of EBI Resources 17 th March, 2017 Tony Burdett Technical Co-ordinator Samples, Phenotypes and Ontologies Team www.ebi.ac.uk What is EMBL-EBI? Europe s home for biological

More information

GFF3sort: an efficient tool to sort GFF3 files for tabix indexing

GFF3sort: an efficient tool to sort GFF3 files for tabix indexing biorxiv preprint first posted online Jun., 0; doi: http://dx.doi.org/0.0/. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC.0

More information

Integrated Access to Biological Data. A use case

Integrated Access to Biological Data. A use case Integrated Access to Biological Data. A use case Marta González Fundación ROBOTIKER, Parque Tecnológico Edif 202 48970 Zamudio, Vizcaya Spain marta@robotiker.es Abstract. This use case reflects the research

More information

Editing Pathway/Genome Databases

Editing Pathway/Genome Databases Editing Pathway/Genome Databases By Ron Caspi ron.caspi@sri.com This presentation can be found at http://bioinformatics.ai.sri.com/ptools/tutorial/sessions/ curation/curation of genes, enzymes and Pathways/

More information

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide Bioinformatics Resources.

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide Bioinformatics Resources. 1 of 12 9/10/2003 11:15 AM Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide Bioinformatics Resources. When and Where---Wednesdays at 1pm Room 438

More information

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2017 - Week 4 Dr Nick Hayward XML - recap first version of XML became a W3C Recommendation in 1998 a useful format for data storage and exchange config files,

More information

Master Thesis. Andreas Schlicker

Master Thesis. Andreas Schlicker Master Thesis A Global Approach to Comparative Genomics: Comparison of Functional Annotation over the Taxonomic Tree by Andreas Schlicker A Thesis Submitted to the Center for Bioinformatics of Saarland

More information

Finding homologous sequences in databases

Finding homologous sequences in databases Finding homologous sequences in databases There are multiple algorithms to search sequences databases BLAST (EMBL, NCBI, DDBJ, local) FASTA (EMBL, local) For protein only databases scan via Smith-Waterman

More information

1. HPC & I/O 2. BioPerl

1. HPC & I/O 2. BioPerl 1. HPC & I/O 2. BioPerl A simplified picture of the system User machines Login server(s) jhpce01.jhsph.edu jhpce02.jhsph.edu 72 nodes ~3000 cores compute farm direct attached storage Research network

More information

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML

Copyright 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 7 XML Chapter 7 XML 7.1 Introduction extensible Markup Language Developed from SGML A meta-markup language Deficiencies of HTML and SGML Lax syntactical rules Many complex features that are rarely used HTML

More information

HymenopteraMine Documentation

HymenopteraMine Documentation HymenopteraMine Documentation Release 1.0 Aditi Tayal, Deepak Unni, Colin Diesh, Chris Elsik, Darren Hagen Apr 06, 2017 Contents 1 Welcome to HymenopteraMine 3 1.1 Overview of HymenopteraMine.....................................

More information

EBI services. Jennifer McDowall EMBL-EBI

EBI services. Jennifer McDowall EMBL-EBI EBI services Jennifer McDowall EMBL-EBI The SLING project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 226073 (Integrating

More information

Viewing Molecular Structures

Viewing Molecular Structures Viewing Molecular Structures Proteins fulfill a wide range of biological functions which depend upon their three dimensional structures. Therefore, deciphering the structure of proteins has been the quest

More information

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward

Comp 336/436 - Markup Languages. Fall Semester Week 4. Dr Nick Hayward Comp 336/436 - Markup Languages Fall Semester 2018 - Week 4 Dr Nick Hayward XML - recap first version of XML became a W3C Recommendation in 1998 a useful format for data storage and exchange config files,

More information

Introduction to XML. Chapter 133

Introduction to XML. Chapter 133 Chapter 133 Introduction to XML A. Multiple choice questions: 1. Attributes in XML should be enclosed within. a. single quotes b. double quotes c. both a and b d. none of these c. both a and b 2. Which

More information

Database Similarity Searching

Database Similarity Searching Database Similarity Searching Why search databases? To find out if a new DNA sequence shares similari?es with sequences already deposited in the databanks. To find proteins homologous to a puta?ve coding

More information

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11 !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... 7:4 @import Directive... 9:11 A Absolute Units of Length... 9:14 Addressing the First Line... 9:6 Assigning Meaning to XML Tags...

More information

BioExtract Server User Manual

BioExtract Server User Manual BioExtract Server User Manual University of South Dakota About Us The BioExtract Server harnesses the power of online informatics tools for creating and customizing workflows. Users can query online sequence

More information

COMPARATIVE MICROBIAL GENOMICS ANALYSIS WORKSHOP. Exercise 2: Predicting Protein-encoding Genes, BlastMatrix, BlastAtlas

COMPARATIVE MICROBIAL GENOMICS ANALYSIS WORKSHOP. Exercise 2: Predicting Protein-encoding Genes, BlastMatrix, BlastAtlas COMPARATIVE MICROBIAL GENOMICS ANALYSIS WORKSHOP Exercise 2: Predicting Protein-encoding Genes, BlastMatrix, BlastAtlas First of all connect once again to the CBS system: Open ssh shell client. Press Quick

More information

How to use KAIKObase Version 3.1.0

How to use KAIKObase Version 3.1.0 How to use KAIKObase Version 3.1.0 Version3.1.0 29/Nov/2010 http://sgp2010.dna.affrc.go.jp/kaikobase/ Copyright National Institute of Agrobiological Sciences. All rights reserved. Outline 1. System overview

More information

W3C XML XML Overview

W3C XML XML Overview Overview Jaroslav Porubän 2008 References Tutorials, http://www.w3schools.com Specifications, World Wide Web Consortium, http://www.w3.org David Hunter, et al.: Beginning, 4th Edition, Wrox, 2007, 1080

More information

The components of a basic XML system.

The components of a basic XML system. XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML XML is a software- and hardware-independent tool for carrying information. XML is easy to learn. XML was designed

More information

Introduction to XML. M2 MIA, Grenoble Université. François Faure

Introduction to XML. M2 MIA, Grenoble Université. François Faure M2 MIA, Grenoble Université Example tove jani reminder dont forget me this weekend!

More information