SureChem and ChEMBL. ACS CINF webinar. John P. Overington & Nicko Goncharoff


SureChem and ChEMBL. ACS CINF webinar. John P. Overington & Nicko Goncharoff, 8th April 2014

ChEMBL: Data for Drug Discovery
1. Scientific facts
2. Organization, integration, curation and standardization of pharmacology data
3. Insight, tools and resources for translational drug discovery
[Figure: an assay/target (the thrombin protein sequence, shown in FASTA format) linked to a compound and its bioactivity data: Ki = 4.5 nM, APTT = 11 min.]

Overview of EMBL-EBI Chemistry Resources
ChEBI: Structures, metadata for metabolites. Chemical ontology
ChEMBL: Bioactivity data from literature and depositions
SureChEMBL: Ligand structures from patent literature
PDBe: Ligand structures from structurally defined protein complexes
Atlas: Ligand-induced transcript response
UniChem: InChI-based resolver (full + relaxed "lenses"), ~70M

ChEMBL
The world's largest primary public database of medicinal chemistry data
~1.4 million compounds, ~9,000 targets, ~12 million bioactivities
Truly Open Data: CC-BY-SA license
Many download/access formats
Semantic Web: RDF download, SPARQL endpoint at http://rdf.ebi.ac.uk/chembl
ChEMBL Appliances: myChEMBL (Linux VM), ChEMpi (Raspberry Pi)
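As a rough illustration of how the SPARQL endpoint mentioned on this slide might be queried, the sketch below builds a GET request URL following the standard SPARQL protocol, where the query text travels in a `query` parameter. It assumes the URL given on the slide accepts such requests directly, which the slide does not confirm; the helper name `build_sparql_url` is hypothetical, and the query avoids any ChEMBL-specific vocabulary because none is given in the talk. No request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint URL as printed on the slide. Assumption: it accepts SPARQL
# protocol GET requests directly; the slide does not confirm this.
ENDPOINT = "http://rdf.ebi.ac.uk/chembl"


def build_sparql_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the query in the standard `query` parameter.

    Hypothetical helper for illustration; it only builds the URL.
    """
    return endpoint + "?" + urlencode({"query": query})


# Vocabulary-free query: list five arbitrary triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

url = build_sparql_url(QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client, typically with an `Accept` header asking for a SPARQL results format.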

SureChEMBL
EMBL-EBI acquired the SureChem product from Digital Science
State-of-the-art chemistry patent product
15 million chemical structures
Chemical structures automatically extracted from full-text patents
Research community wants open access to patent data
Patent literature is 2-3 years ahead of published literature
Better competitive position
Plan to provide an ongoing free, open resource to the entire community

SureChEMBL Overview
[Diagram: data flow through the SureChem system, hosted on Amazon Web Services.]
Patent Offices: WO; US Applications & Granted; EP Applications & Granted; JP Abstracts
Processing: Patent PDFs; Processed patents; Entity Recognition
Structure extraction: Name to Structure (five methods); Image to Structure (one method); Molfiles in patent
Storage and access: Chemistry Database; Database; Application Server; API; Users

Immediate Priorities
Migrate working pipeline across to EMBL-EBI servers
Establish new account system
Migrate current user accounts
Offer GUI access at SureChem Pro equivalent level
Turn off API access and refactor new API in OpenPHACTS framework
Partners in OpenPHACTS will get early test access and input into development pipeline
Build RDF version of SureChEMBL

Future Plans
Dependent on funding and interest!
Add sequence searching
Add disease term, animal disease model, etc. indexing
KNIME/Pipeline Pilot nodes
Add links to/from Europe PMC
Extend image extraction retrospectively from 2006 (spot pricing compute from AWS)
Provide weekly/monthly feed of patent structures to PubChem and ChemSpider
Add chemical structure tagging & search to full text content of Europe PMC
Develop UniChem VM for in-house private patent alerting using feed of SureChEMBL data

The search interface: http://www.surechembl.org/
Screenshot labels: Keyword search; Patent number search; Filter by authority; Filter by date; Help; Structure sketch; Types of chemistry search; Paste SMILES, MOL, name; Filter by document section

Keyword-based search
Example searches:
roche OR novartis
C07D048704
sterili?e
kinase*
Pfizer C07D kinase inhibitor
pn: WO2011058149A1
pa:(bayer OR astra OR Genentech OR merck) AND desc:(chemotherap* AND (Phosphoinositide kinases~3 OR Pi3K))
Field names and further examples: http://support.surechem.com/knowledgebase/articles/92016-lucene-query-field-names-and-examples
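The longest example query on this slide combines fields, grouping and logical operators. The sketch below shows one way such a query string could be assembled programmatically. The field names `pa` and `desc` are taken from the slide; the helper functions `any_of`, `all_of` and `fielded` are hypothetical names introduced here, and the sketch does no escaping of Lucene special characters.

```python
def any_of(*terms: str) -> str:
    """Group terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*clauses: str) -> str:
    """Group clauses with AND inside parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def fielded(name: str, expression: str) -> str:
    """Restrict an expression to one field, e.g. pa:(...) or desc:(...)."""
    return f"{name}:{expression}"


# Rebuild the slide's example: assignee is one of four companies AND the
# description mentions chemotherapy together with a PI3K-related term.
query = " AND ".join([
    fielded("pa", any_of("bayer", "astra", "Genentech", "merck")),
    fielded("desc", all_of(
        "chemotherap*",
        any_of("Phosphoinositide kinases~3", "Pi3K"),
    )),
])

print(query)
```

Composing the string from small helpers keeps the parentheses balanced, which is the part most easily broken when long queries are typed by hand.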

Fielded keyword search
Screenshot labels: Keyword search; Filter by document section; Logical operators

Patent number search


Chemistry-based search
Screenshot labels: Types of search; Structure sketch; Filter by MW range; Paste SMILES, MOL, name; Filter by document section

Example searches
Retrieve all antimalarial small molecule US patents: ic:c07d AND ic:a61p003306 AND pnctry:us
Retrieve a specific patent: pn:wo2011058149a1
Similarity search (sildenafil nearest neighbours): paste CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
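The slides do not say how SureChEMBL scores structural similarity. As a toy illustration of what a nearest-neighbour search does, the sketch below ranks a few molecules by Tanimoto coefficient, a commonly used similarity measure for chemical fingerprints. The "fingerprint" here is just the set of three-character substrings of the SMILES string, which is an assumption made to keep the example dependency-free and is not chemically meaningful; a real system would use a cheminformatics toolkit. All function names and the candidate list are hypothetical.

```python
def ngram_fingerprint(smiles: str, n: int = 3) -> frozenset:
    """Toy fingerprint: the set of length-n substrings of a SMILES string."""
    return frozenset(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def nearest_neighbours(query: str, candidates: dict, k: int = 3) -> list:
    """Return the k best (score, name) pairs, highest score first."""
    query_fp = ngram_fingerprint(query)
    scored = [
        (tanimoto(query_fp, ngram_fingerprint(smiles)), name)
        for name, smiles in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Sildenafil SMILES as given on the slide; the other entries are
# illustrative candidates, not data from the talk.
SILDENAFIL = "CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4"
CANDIDATES = {
    "sildenafil": SILDENAFIL,
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "ethanol": "CCO",
}

for score, name in nearest_neighbours(SILDENAFIL, CANDIDATES):
    print(f"{name}: {score:.2f}")
```

The query molecule always scores 1.0 against itself, so it appears first; the remaining scores only reflect shared text fragments and should not be read as chemical similarity.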

Example search

Review the hits


Select a subset of hits

Export hits (Pro user)
Screenshot labels: Property range filters; Count filters

Select a subset of hits

Review patent documents

Retrieve patent families

Review patent documents

Retrieve chemistry (Pro user)
Screenshot labels: Property range filters; Count filters

Summary
Searching capabilities:
Free text keywords and Lucene fields
Patent IDs & bibliographic information
Patent authority & date
Structure
Retrieving capabilities:
Retrieve chemistry (with additional filters)
Retrieve patent family information
Retrieve annotated full patent text

Any questions?
http://chembl.blogspot.co.uk/
http://chembl.blogspot.co.uk/search/label/webinar
surechembl-help@ebi.ac.uk