SureChem and ChEMBL: ACS CINF webinar. John P. Overington & Nicko Goncharoff, 8th April 2014
ChEMBL Data for Drug Discovery: 1. Scientific facts (assay/target, compound, bioactivity data; e.g. Ki = 4.5 nM, APTT = 11 min); 2. Organization, integration, curation and standardization of pharmacology data; 3. Insight, tools and resources for translational drug discovery. Example target sequence: >Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
Overview of EMBL-EBI Chemistry Resources: ChEBI (structures, metadata for metabolites; chemical ontology); ChEMBL (bioactivity data from literature and depositions); SureChEMBL (ligand structures from patent literature); PDBe (ligand structures from structurally defined protein complexes); Atlas (ligand-induced transcript response); UniChem (InChI-based resolver, full + relaxed lenses), ~70M
ChEMBL: the world's largest primary public database of medicinal chemistry data; ~1.4 million compounds, ~9,000 targets, ~12 million bioactivities; truly Open Data (CC-BY-SA license); many download/access formats; Semantic Web: RDF download, SPARQL endpoint at http://rdf.ebi.ac.uk/chembl; ChEMBL appliances: myChEMBL (Linux VM), ChEMpi (Raspberry Pi)
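As a rough illustration of what querying the SPARQL endpoint could look like, the sketch below builds a GET URL following the standard SPARQL protocol convention of passing the query text in a `query` parameter. The base URL is the one on the slide; whether the service accepts queries at exactly that path is an assumption, and the function name `build_sparql_get_url` is hypothetical. The query itself is deliberately schema-agnostic and does not rely on any ChEMBL-specific vocabulary.

```python
from urllib.parse import urlencode

# Base URL as given on the slide. Whether queries are accepted at exactly
# this path (rather than a sub-path) is an assumption.
CHEMBL_SPARQL_ENDPOINT = "http://rdf.ebi.ac.uk/chembl"

# Schema-agnostic query: fetch any five triples from the store.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_get_url(query: str, endpoint: str = CHEMBL_SPARQL_ENDPOINT) -> str:
    """Return a GET URL that carries the SPARQL text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(build_sparql_get_url(SAMPLE_QUERY))
```

The resulting URL can be handed to any HTTP client; the response format would normally be negotiated with an `Accept` header.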
SureChEMBL: EMBL-EBI acquired the SureChem product from Digital Science; state-of-the-art chemistry patent product; 15 million chemical structures, automatically extracted from full-text patents; research community wants open access to patent data; patent literature is 2-3 years ahead of published literature; better competitive position; plan to provide an ongoing free, open resource to the entire community
SureChEMBL Overview (architecture diagram): Patent offices (WO; US applications & granted; EP applications & granted; JP abstracts) feed the SureChem system on Amazon Web Services. Processing steps: entity recognition; Image to Structure (one method); Name to Structure (five methods); Molfiles in patent. Stores and outputs: processed patents, chemistry database, database, patent PDFs. Access: application server and API, serving users.
Immediate Priorities: migrate working pipeline across to EMBL-EBI servers; establish new account system; migrate current user accounts; offer GUI access at SureChem Pro equivalent level; turn off API access and refactor new API in OpenPHACTS framework (partners in OpenPHACTS will get early test access and input into development pipeline); build RDF version of SureChEMBL
Future Plans (dependent on funding and interest!): add sequence searching; add disease term, animal disease model, etc. indexing; KNIME/Pipeline Pilot nodes; add links to/from Europe PMC; extend image extraction retrospectively from 2006 (spot pricing compute from AWS); provide weekly/monthly feed of patent structures to PubChem and ChemSpider; add chemical structure tagging & search to full-text content of Europe PMC; develop UniChem VM for in-house private patent alerting using feed of SureChEMBL data
The search interface (http://www.surechembl.org/): keyword search; patent number search; structure sketch; paste SMILES, MOL, name; types of chemistry search; filter by authority; filter by date; filter by document section; help
Keyword-based search. Example searches: roche OR novartis; C07D048704; sterili?e; kinase*; Pfizer C07D kinase inhibitor; pn:WO2011058149A1; pa:(bayer OR astra OR Genentech OR merck) AND desc:(chemotherap* AND ("Phosphoinositide kinases"~3 OR Pi3K)). Field names and further examples: http://support.surechem.com/knowledgebase/articles/92016-lucene-query-field-names-and-examples
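The fielded example above can be assembled programmatically. The following is a minimal sketch, assuming only the field prefixes shown on the slides (`pn`, `pa`, `desc`); the helper names `field`, `group`, `any_of` and `all_of` are hypothetical and not part of any SureChEMBL client. The phrase is quoted so that `~3` is read as a proximity operator, which is standard Lucene syntax.

```python
def group(expr: str) -> str:
    """Wrap an expression in parentheses so it is treated as one unit."""
    return f"({expr})"


def any_of(*terms: str) -> str:
    """Join terms with the Lucene OR operator."""
    return " OR ".join(terms)


def all_of(*terms: str) -> str:
    """Join terms with the Lucene AND operator."""
    return " AND ".join(terms)


def field(name: str, expr: str) -> str:
    """Prefix an expression with a field name, e.g. pa:(bayer OR merck).

    Multi-term expressions are parenthesised so the field applies to all
    of them rather than only to the first term.
    """
    return f"{name}:{group(expr)}" if " " in expr else f"{name}:{expr}"


# Assignee clause: any of four companies.
assignee = field("pa", any_of("bayer", "astra", "Genentech", "merck"))

# Description clause: a wildcard term AND (proximity phrase OR abbreviation).
description = field(
    "desc",
    all_of("chemotherap*", group(any_of('"Phosphoinositide kinases"~3', "Pi3K"))),
)

query = all_of(assignee, description)

if __name__ == "__main__":
    print(query)
    print(field("pn", "WO2011058149A1"))
```

Building queries from small helpers like these keeps the parenthesisation consistent, which is where hand-typed Lucene queries most often go wrong.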
Fielded keyword search: keyword search; filter by document section; logical operators
Patent number search
Chemistry-based search: types of search; structure sketch; paste SMILES, MOL, name; filter by MW range; filter by document section
Example searches. Retrieve all antimalarial small molecule US patents: ic:c07d AND ic:a61p003306 AND pnctry:us. Retrieve a specific patent: pn:wo2011058149a1. Similarity search (sildenafil nearest neighbours): paste CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
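To give a feel for what a "nearest neighbours" similarity search does, the sketch below ranks a tiny library against sildenafil using the Tanimoto coefficient, a commonly used similarity measure. The slides do not say which measure or fingerprint SureChEMBL uses, so both are assumptions here. The fingerprint is a toy stand-in (character n-grams of the SMILES string), not a real chemical fingerprint, and the helper names and the two comparison molecules are illustrative only.

```python
def toy_fingerprint(smiles: str, n: int = 3) -> frozenset:
    """Toy stand-in for a chemical fingerprint: the set of character n-grams.

    A real system would derive features from the molecular graph instead.
    """
    return frozenset(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared features divided by all features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Sildenafil SMILES from the slide, with the line-break space removed.
SILDENAFIL = "CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4"

# Illustrative mini-library; a real search runs against millions of structures.
LIBRARY = {
    "sildenafil": SILDENAFIL,
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}


def rank(query_smiles: str, library: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar."""
    query_fp = toy_fingerprint(query_smiles)
    scores = {
        name: tanimoto(query_fp, toy_fingerprint(smiles))
        for name, smiles in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(SILDENAFIL, LIBRARY):
        print(f"{name}: {score:.2f}")
```

An identical structure always scores 1.0; scores from the toy fingerprint for other molecules carry no chemical meaning and only demonstrate the ranking step.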
Example search
Review the hits
Select a subset of hits
Export hits (Pro user): property range filters; count filters
Select a subset of hits
Review patent documents
Retrieve patent families
Review patent documents
Retrieve chemistry (Pro user): property range filters; count filters
Summary. Searching capabilities: free-text keywords and Lucene fields; patent IDs & bibliographic information; patent authority & date; structure. Retrieving capabilities: retrieve chemistry (with additional filters); retrieve patent family information; retrieve annotated full patent text.
Any questions? http://chembl.blogspot.co.uk/ http://chembl.blogspot.co.uk/search/label/webinar surechembl-help@ebi.ac.uk