Protein Data Bank Japan

Similar documents
The PDB and experimental data

mpmorfsdb: A database of Molecular Recognition Features (MoRFs) in membrane proteins. Introduction

ISO INTERNATIONAL STANDARD. Health informatics Genomic Sequence Variation Markup Language (GSVML)

Bioinformatics Hubs on the Web

Activities of PDBj and wwpdb: A new PDB format, Data Deposition, Validation, and Data Integration

Protein Data Bank: An open access resource enabling basic and applied research and education in biology and medicine

Viewing Molecular Structures

Software review. Biomolecular Interaction Network Database

Introduction to the Protein Data Bank Master Chimie Info Roland Stote Page #

Min Wang. April, 2003

Grid computing and bioinformatics development. A case study on the Oryza sativa (rice) genome*

PDB-Metrics: a Web tool for exploring the PDB contents

Structural Bioinformatics

Integrating Data with Publications: Greater Interactivity and Challenges for Long-Term Preservation of the Scientific Record

3DProIN: Protein-Protein Interaction Networks and Structure Visualization

Evaluation of structure quality tutorial, using RCSB PDB tools

Towards a Semantic Wiki-Based Japanese Biodictionary

Guide for the EFI-Database (EFI-DB)

Using Protein Data Bank and Astex Viewer to Study Protein Structure

Data Mining Technologies for Bioinformatics Sequences

Deliverable D5.5. D5.5 VRE-integrated PDBe Search and Query API. World-wide E-infrastructure for structural biology. Grant agreement no.

Life Sciences Oracle Based Solutions. June 2004

User Guide for DNAFORM Clone Search Engine

PROJECT FINAL REPORT. Tel: Fax:

VirusPKT: A Search Tool For Assimilating Assorted Acquaintance For Viruses

Homology Modeling FABP

Bioinformatics explained: BLAST. March 8, 2007

XML in the bipharmaceutical

Topics of the talk. Biodatabases. Data types. Some sequence terminology...

LinkDB: A Database of Cross Links between Molecular Biology Databases

) I R L Press Limited, Oxford, England. The protein identification resource (PIR)

Virginia Bioinformatics Institute. Extension / Research IT Network DIVERSE API

Constructing Digital Archive of Architectural Material with ontology

Bioinformatics explained: Smith-Waterman

Untangling the semantic web:

Ontology-Based Mediation in the. Pisa June 2007

Information Resources in Molecular Biology Marcela Davila-Lopez How many and where

Bioinformatics Data Distribution and Integration via Web Services and XML

FOR IMMEDIATE RELEASE

What is Internet COMPUTER NETWORKS AND NETWORK-BASED BIOINFORMATICS RESOURCES

Human-Centered Design Approach for Middleware

The beginning of this guide offers a brief introduction to the Protein Data Bank, where users can download structure files.

How to store and visualize RNA-seq data

wwpdb Processing Procedures and Policies Document Section B: wwpdb Policies Authored by the wwpdb annotation staff November 2017 version 4.

Measuring inter-annotator agreement in GO annotations

LEARNING OBJECT METADATA IN A WEB-BASED LEARNING ENVIRONMENT

APPENDIX TO THE CHARTER OF THE WORLDWIDE PDB (wwpdb)

Lezione 7. Bioinformatica. Mauro Ceccanti e Alberto Paoluzzi

Langara College Spring archived

Enabling Open Science: Data Discoverability, Access and Use. Jo McEntyre Head of Literature Services

efip online Help Document

The CALBC RDF Triple store: retrieval over large literature content

Semantic Correspondence in Federated Life Science Data Integration Systems

Language Grid Toolbox: Open Source Multi-language Community Site

STATISTICS (STAT) Statistics (STAT) 1

BLAST, Profile, and PSI-BLAST

An Online Ontology: WiktionaryZ

Computer Science & Engineering (CSE)

Hitachi Announces Executive Changes

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

PERSONALIZED ANNOTATION AND INFORMATION SHARING IN PROTEIN SCIENCE WITH INFORMATION-SLIPS

Bio3D: Interactive Tools for Structural Bioinformatics.

Biostatistics and Bioinformatics Molecular Sequence Databases

20.453J / 2.771J / HST.958J Biomedical Information Technology Fall 2008

Server DBMS M Server DBMS 2. Application Program. Electric Gene Dictionary Sequence. Interface Process. Local DBMS N. Application Program

THE SCIENCE PERSPECTIVE In 1971, a small group of scientists interested in protein structure attending

Quick Review of Graphs

Lezione 7. BioPython. Contents. BioPython Installing and exploration Tutorial. Bioinformatica. Mauro Ceccanti e Alberto Paoluzzi

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization

EBI is an Outstation of the European Molecular Biology Laboratory.

COLLEGE OF IMAGING ARTS AND SCIENCES. Medical Illustration

Bachelor of Science in. Computer Science. Advising Brochure Department of. Computer Science & Engineering College of Arts & Sciences

Literature Databases

Searchable. Readable. Relatable. E-Journal Platform for Japanese Academic Societies

Trilateral Search Guidebook in Biotechnology. [Ver.1 Publication ]

LG CNS and HITACHI Agreed to Jointly Develop a Information System for University

Properties of Biological Networks

Lezione 7. BioPython. Contents. BioPython Installing and exploration Tutorial First Course Project First Start First Start with Biopython

Automated Bioinformatics Analysis System on Chip ABASOC. version 1.1

Multiple Sequence Alignment

An Adaptive Query Processing Method according to System Environments in Database Broadcasting Systems

The Electron Microscopy Data Bank and OME


Metadata Discovery and Integration to Support Repurposing of Heterogeneous Data using the OpenFurther Platform

DNA Tutorial for Arcimboldo

org.hs.ipi.db November 7, 2017 annotation data package

Applied Bioinformatics

7.36/7.91/20.390/20.490/6.802/6.874 PROBLEM SET 3. Gibbs Sampler, RNA secondary structure, Protein Structure with PyRosetta, Connections (25 Points)

Integrated Access to Biological Data. A use case

Reliability Measure of 2D-PAGE Spot Matching using Multiple Graphs

Data Standards and Protocols for Biological Collections Data

Easy manual for SIRD interface

Customisable Curation Workflows in Argo

Domain Specific Search Engine for Students

Depositing small-angle scattering data and models to the Small-Angle Scattering Biological Data Bank (SASBDB).

Complex Query Formulation Over Diverse Information Sources Using an Ontology

Annual Report for the Utility Savings Initiative

Worldwide Protein Data Bank biocuration supporting open access to high-quality 3D structural biology data

Retrieving factual data and documents using IMGT-ML in the IMGT information system

Second Session (only if there should still be places available after the first one)

Transcription:

Protein Data Bank Japan http://www.pdbj.org/ PDBj Today gene information for many species is just at the point of being revealed. To make use of this information, it is necessary to look at the proteins for which the genes code. In particular, elucidation of an individual protein structure can reveal its specific function, and so structures are the key to understanding gene evolution and biological function on a molecular level. We, PDBj (Protein Data Bank Japan), maintain the database for these protein structures with financial assistance from the Institute for Bioinformatics Research and Development of Japan Science and Technology Corporation (BIRD-JST), collaborating with the Research Collaboratory for Structural Bioinformatics (RCSB) and the MSD in the European Bioinformatics Institute (EBI) in EU. All three organizations serve as deposition, data processing and distribution sites. For further information, see our Web site, http://www.pdbj.org/. PDB Deposition Statistics As of May 31st, 2003 the number of registered structures in the PDB (Protein Data Bank) is 21055, and the contribution from PDBj is ever increasing. On the PDBj Web page, the average number of access requests per day is 6,894, and the average number of page views delivered per day is 717. Both of these figures are higher than in the previous fiscal year. 1

http://www.pdbj.org/ XML description of PDB data and advanced search systems We developed our own preliminary XML (extensible Markup Language) description of PDB data along several fronts since the last fiscal year, keeping the data contents in coincidence with that described by the mmcif format. Using this preliminary development of the XML database, RCSB, EBI and PDBj have made efforts to establish a new canonical XML description of PDB data by June, 2003. The new schema file with the fully tagged data is now open at http: //deposit.pdb.org/pdbml/pdbx-v0.904.xsd, and the schema for the alternative description with separated coordinate files is available at http://deposit.pdb.org/pdbml/pdbx-v0.904-alt.xsd. Using the extensible nature of the XML format, we have also made efforts to include more information relating to individual proteins. Currently, PDB files are lacking a detailed description of function, experimental conditions, and the like. We add this information to the XML files by extracting it from journal articles that reference the PDB entries. The number of these annotated PDB entries is now over 2000. Based on the schema of the canonical XML description, we have prepared an experimental XML database (XML-DB_PDBj), including the additional information above. For querying the XML database, we provide multiple search strategies using X- Path. The best way to make use of an XML database is with Web Services. This is implemented using a protocol called SOAP (Simple Object Access Protocol). For information on our experimental services, see our Web site, http://www.pdbj.org/xml-db_pdbj/. ef-site database At ef-site (electrostatic molecular surface of Functional-Site), we probe the relationship between structure and function with an emphasis on the surface properties of the functional site. This is because proteins interact primarily at the molecular surface. Since the important properties that characterize a given functional site are the electrostatic potential and the geometry at the surface, we display these properties graphically, and enable surface-based database queries. The functional site information is grouped in five categories: antibody, prosite, active site, membrane proteins, and binding site. We have also recently added two new categories: mononucleotides and DNA. Moreover, to enhance convenience for the user, we developed our own web-based viewer (PDBjViewer) that enables visualization of the molecular surfaces of functional sites and the surface properties at the same time with conventional molecular graphics facilities. In the near future, ef-site will display a more exhaustive binding site database, consisting of the targets to all registered small molecule-binding proteins in the PDB database. 2

http://www.pdbj.org/ ProMode database ProMode is a database of normal mode analysis (NMA) of proteins, developed by Prof. Hiroshi Wako at Waseda Univ. Dynamics simulations of protein molecules are important to understand the principles of protein structure and function. NMA is based on the simple harmonic approximation, and it has been confirmed that the results from NMA are reasonable qualitatively in most cases. NMA is much less time-consuming in computation and more systematic and adequate for the analysis carried out routinely for many proteins than Monte Carlo or Molecular Dynamics simulations. The most time-consuming part of the NMA is regularization and energy minimization of the original structures in the PDB. ProMode uses the program FEDER developed by the group of Prof. Nobuhiro Go at Japan Atomic Energy Research Institute for rapid and efficient computations. (About FEDER, see T. Noguti and N. Go (1983) J. Phys. Soc. Jpn 52, 3685; H. Wako and N. Go (1987) J. Comp. Chem. 8, 625) Now, 80 proteins are available, and much more numbers of proteins will soon be analyzed and registered. Three-dimensional comparison tool: ASH "Structural Alignment", in parallel with other residue-based properties, makes use of protein conformational information. The reliability of the extracted information is significantly enhanced by simultaneously aligning many structures (multiple structure alignment). The team of Prof. Hiroyuki Toh at Kyoto Univ. has developed the protein structural multiple alignment system -ASH- based on the double dynamic programming algorithm that was originally proposed by Taylor and Orengo in 1989. This system utilizes the "distance cut-off approximation", introduced by Prof. Toh in 1997, as well as the "two-step alignment method" and the secondary structural forward alignment technique. These improvements have not only increased the processing speed, but have also improved the accuracy of the resulting alignments. The program can be accessed on the web at: http://timpani.genome.ad.jp /~ash/ and also on the PDBj home page. 3

http://www.pdbj.org/ PDB REMARK transcoder The team of Prof. Takenao Okawa at Osaka Univ. has developed "PDB REMARK transcoder", which is an effective translation tool for automatically extracting relevant information from PDB file REMARK lines and converting the information to XML format. At first, the PDB RE- MARK transcoder memorized description patterns automatically based on REMARK data from 100 entries in the PDB. With this data, 8,906 entries (about 5.2 million words) were converted to XML, and the accuracy of tagging for 10 entries that were selected at random was examined. The tagging is ranked fifth-level automatically based on the distance of property value from cluster to token when extracting description patterns. The highest reliable level accounts for 85% of all 8,906 entries. Thus PDB REMARK transcoder is significantly effective at the preliminary step toward the final artificial adjustment. We have now been optimizing the reliability of the transcoder by training on known data. eprots The PDB is aimed at experts in structural biology and is described only in English. This format may not be very convenient for users who are not specialists of structural biology, or for Japanese students. For this reason, we have started to develop the encyclopedia of Protein Structures "eprots". This database takes advantage of XML by summarizing important genetic information in a very comprehensive form. The resulting document serves as an educational resource for non-experts. We currently introduce about a hundred proteins in this manner, both in English and Japanese. 4

PDBj Staff Director Nakamura, Haruki (Prof., IPR, Osaka Univ.) Group for PDB Database Curation Kusunoki, Masami (Associate Prof., IPR, Osaka Univ.) Kosada, Takashi (Technical Assistant, IPR, Osaka Univ.) Igarashi, Reiko (Research Assistant, JST) Kengaku, Yumiko (Research Assistant, JST) Saeki, Shinobu (Research Assistant, JST) Morita, Yasuyo (Research Assistant, JST) Group for Development of XML description Ito, Nobutoshi (Prof., School of Biomedical Science, Tokyo Medical and Dental Univ.) Standley, M. Daron (Researcher, JST) Kobayashi, Kaori (Technical Assistant, JST) Sakamoto, Hisashi (Technical Assistant, JST) Shimizu, Yukiko (Research Assistant, JST) Takahashi, Aki (Research Assistant, JST) ef-site database Kinoshita, Kengo (Instructor, Grad., Sch., of Integrated Sci., Yokohama City Univ.) ProMode database Wako, Hiroshi (Prof., School of Social Sciences, Waseda Univ.) Three-dimentional comparison tool: ASH Toh, Hiroyuki (Prof., Bioinformatics Center, Institute for Chem. Research, Kyoto Univ.) PDB REMARK transcoder Okawa, Takenao (Associate Prof., Grad. Sch. of Informatics Sci., Osaka Univ.) Kaneda Yoshikazu(Assistant Researcher, JST) Secretary Group for NMR database (NMR) Akutsu, Hideo (Prof., IPR, Osaka Univ.) Matsuki, Yoh (Researcher, JST) Nakatani, Eiichi (Technical Assistant, JST) Kamada, Chisa (JST) Office address PDBj Research Center for Structural and Functional Proteomics, Institute for Protein Research (IPR), Osaka University 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan TEL (PDBj office): +81-(0)6-6879-4311 TEL (PDBj deposition office): +81-(0)6-6879-8636 FAX: +81-(0)6-6879-8636 E-mail: pdbj@protein.osaka-u.ac.jp