New generation of patent sequence databases. Information Sources in Biotechnology, Japan

EBI is an Outstation of the European Molecular Biology Laboratory.

Patent-related resources: Patents and Patent Resources are reached from the EBI home page (http://www.ebi.ac.uk/).

Patent resources at EBI: http://www.ebi.ac.uk/patentdata/

Patent resources at EBI
Patent proteins: EPO, USPTO, JPO, KIPO
Patent nucleotides: ENA (EPO, USPTO, JPO, KIPO)
The same sequences (EPO, USPTO, JPO, KIPO) are also offered as non-redundant sequence data, classified by patent family and enriched with patent information.

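Sequence data from the patent literature: JPO, USPTO, KIPO, EPO and other patent offices supply sequences to the three INSDC members, NCBI GenBank, NIG DDBJ and EBI EMBL-Bank. Under the INSDC agreement, access is free and unrestricted, and all data are exchanged daily. At EBI, the patent data are used to build the NR patent sequence databases.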

Non-redundant patent databases (http://www.ebi.ac.uk/patentdata/)
Level-1: NRNL1 (non-redundant nucleotide level-1) and NRPL1 (non-redundant protein level-1) group together 100% identical patent sequences.
Level-2: NRNL2 (non-redundant nucleotide level-2) and NRPL2 (non-redundant protein level-2) group identical sequences by patent family.
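The Level-1 idea of grouping 100% identical sequences can be sketched as exact-match clustering keyed on the sequence itself. This is a minimal sketch, not EBI's actual pipeline: the normalization rule (ignore case and whitespace), the record layout and the accessions and patent numbers are all hypothetical. Hashing the normalized sequence with SHA-256 is a design choice that keeps dictionary keys small for long sequences; keying on the normalized string directly would give the same grouping.

```python
import hashlib
from collections import defaultdict


def normalize(seq: str) -> str:
    """Assumption: identity ignores letter case and whitespace."""
    return "".join(seq.split()).upper()


def level1_groups(records):
    """Group (accession, patent, sequence) tuples whose sequences are 100% identical.

    Returns a dict mapping the SHA-256 digest of each normalized sequence
    to the (accession, patent) pairs that share it.
    """
    groups = defaultdict(list)
    for accession, patent, seq in records:
        key = hashlib.sha256(normalize(seq).encode("ascii")).hexdigest()
        groups[key].append((accession, patent))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records: ACC1 and ACC2 carry the same sequence
    records = [
        ("ACC1", "EP0000001", "ATGCGT ACGT"),
        ("ACC2", "US0000002", "atgcgtacgt"),
        ("ACC3", "JP0000003", "ATGCGTACGA"),
    ]
    groups = level1_groups(records)
    print(len(records), "records ->", len(groups), "level-1 entries")
    for members in groups.values():
        print(members)
```

Running it prints `3 records -> 2 level-1 entries`, followed by the two member lists.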

Patent sequence record in NRNL1: the record lists the patents containing the 100% identical sequence, followed by the sequence itself.

Patent sequence record in NRNL2: the record shows the patent equivalents, the sequence record in ENA, the priority number and date, the patent literature, the translation and the sequence.

How the non-redundant patent databases are built (www.ebi.ac.uk): starting from the redundant EMBL patent data, sequence redundancy is removed to give Level-1 NR. Level-1 entries are then grouped by patent family and given additional annotation, including priority dates for patent families, to give Level-2 NR.

Patent sequence records at EBI (approximate counts)
Nucleotide: ENA, ~23.9 M PAT sequences (out of >230 M in total); NRNL1, ~12.2 M sequences; NRNL2, ~15.5 M sequences
Protein: Patent Proteins, ~6.5 M PRT sequences (out of >32 M in total); NRPL1, ~2.5 M sequences; NRPL2, ~3.8 M sequences
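The approximate counts on this slide can be turned into a quick redundancy check. This small sketch uses only the slide's figures (in millions). It shows that Level-1 keeps about 51% of the patent nucleotide records and about 38% of the patent protein records, and that Level-2 holds roughly 1.3x (nucleotide) and 1.5x (protein) as many entries as Level-1. That is what one would expect if a single identical sequence can occur in several patent families, since each family then gets its own Level-2 entry.

```python
# Approximate counts from the slide, in millions of sequences
COUNTS = {
    "nucleotide": {"patent": 23.9, "level1": 12.2, "level2": 15.5},
    "protein": {"patent": 6.5, "level1": 2.5, "level2": 3.8},
}


def summarize(counts):
    """Per molecule type: fraction of patent records kept at Level-1,
    and the number of Level-2 entries per Level-1 entry."""
    out = {}
    for kind, c in counts.items():
        out[kind] = {
            "level1_kept": c["level1"] / c["patent"],
            "level2_per_level1": c["level2"] / c["level1"],
        }
    return out


if __name__ == "__main__":
    for kind, r in summarize(COUNTS).items():
        print(f"{kind}: Level-1 keeps {r['level1_kept']:.0%} of patent records; "
              f"Level-2 has {r['level2_per_level1']:.2f}x the Level-1 entries")
```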

Sequence search

Sequence searching: from http://www.ebi.ac.uk/, the search tools are listed under Tools, Sequence Similarity & Analysis.

Sequence searching: a wide variety of search tools is available at www.ebi.ac.uk/tools/sss/.

Choosing the right search engine
BLAST: general search engine.
FASTA: better general search engine.
SSEARCH: sensitive but slow; good for short sequences.
GGSEARCH: forces full-length matches over both the query and the subject.
GLSEARCH: matches the full query to part of the subject; use it to match domains or patterns to a protein, or an oligo to a gene.
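The difference between the local (SSEARCH, Smith-Waterman), global-global (GGSEARCH) and global-local (GLSEARCH) modes comes down to where an alignment is allowed to start and end. The sketch below computes only the optimal score, with a linear gap penalty and a toy match/mismatch scheme chosen for illustration; the real FASTA programs use substitution matrices and affine gap penalties, so treat this as a teaching sketch rather than a reimplementation of those tools.

```python
def align_score(query: str, subject: str, mode: str = "local",
                match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal alignment score by dynamic programming (linear gap penalty).

    mode="local"  : Smith-Waterman; any segment of query and subject (SSEARCH-like)
    mode="global" : Needleman-Wunsch; full length of both sequences (GGSEARCH-like)
    mode="glocal" : full length of the query, any segment of the subject (GLSEARCH-like)
    """
    if mode not in ("local", "global", "glocal"):
        raise ValueError(f"unknown mode: {mode}")
    n, m = len(query), len(subject)
    # H[i][j] = best score of an alignment ending at query[:i] and subject[:j]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Skipping leading query residues costs gaps unless the mode is local
        H[i][0] = 0 if mode == "local" else i * gap
    for j in range(1, m + 1):
        # Skipping leading subject residues is free unless the mode is global
        H[0][j] = j * gap if mode == "global" else 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if mode == "local":
                h = max(h, 0)  # a local alignment may restart anywhere
                best = max(best, h)
            H[i][j] = h
    if mode == "local":
        return best
    if mode == "global":
        return H[n][m]
    # glocal: the whole query is used, but the subject may end anywhere
    return max(H[n])


if __name__ == "__main__":
    oligo, gene = "ACGT", "TTACGTTT"
    for mode in ("local", "glocal", "global"):
        print(mode, align_score(oligo, gene, mode))
```

With these toy scores, the oligo embedded in the longer subject scores 8 in local and global-local mode but 0 in global mode, because global mode charges for the unmatched flanks of the subject. Conversely, a query longer than the matching region ("ACGTAAAA" against "ACGT") still scores 8 locally but only 0 in global-local mode, since the whole query must be aligned.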

Search a variety of databases: protein. Several patent databases are available. Caution: selecting all 6 patent databases returns the results in triplicate!

Search a variety of databases: nucleotide. Several patent data sets are available. Caution: selecting all 3 patent databases returns the results in triplicate!

Let's look at an example.

Searching a redundant database. Example: search a patent protein sequence against the Patent proteins database (http://www.ebi.ac.uk/tools/sss/).

Results from a redundant database: more than 260 identical results, too many to analyze.

The Level-1 NR patent sequence database removes redundancy, leaving fewer results to analyze and less chance of missing important results.

Searching the NR level-1 patent database. Example: search a patent protein sequence against NR patent level-1 (http://www.ebi.ac.uk/tools/sss/).

Results from the NR level-1 database: each hit is unique.

Results from the NR level-1 database: each hit lists all patents containing the sequence, the earliest publication date, a link to the sequence entry and a link to the patent documentation.

Patent families. A simple patent family is a group of patents that relate to the same invention and are based on the same originating application. Families arise when an invention is patented in multiple countries. Grouping patents into families reduces multinational results to a single representative member.

Patent families: in the example figure, 100% identical sequences (GM671154, ADA42650, CS017585, ACQ13114, DI603183, HB492658, AAR79155, DD649656), filed at EP, WO, US and JP, fall into one patent family (invention A) and a second patent family (invention B). The same sequence can therefore appear multiple times in a database, either because the same invention was filed multiple times in different offices (same patent family), or because different inventors used the same sequence in different contexts (different patent families).
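Level-2 grouping can be sketched as grouping records by the pair (sequence, patent family) and keeping one representative per group. Everything here is hypothetical: the record layout, the family identifiers, the invented data (not the accessions in the figure) and the rule that the earliest-published member represents its family. The sketch also shows why one Level-1 sequence can yield several Level-2 entries.

```python
from collections import defaultdict
from datetime import date


def level2_groups(records):
    """Group (accession, patent, family_id, pub_date, sequence) records by
    (normalized sequence, family_id) and return {key: representative record}.

    Hypothetical rule: the earliest-published member represents its family.
    """
    groups = defaultdict(list)
    for rec in records:
        _accession, _patent, family_id, _pub_date, seq = rec
        groups[(seq.upper(), family_id)].append(rec)  # upper() as a simple normalization
    return {key: min(members, key=lambda r: r[3]) for key, members in groups.items()}


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # one 100% identical sequence, invented for the example
    records = [
        ("ACC1", "EP0000001", "F1", date(2002, 5, 1), seq),
        ("ACC2", "WO0000001", "F1", date(2001, 11, 8), seq),
        ("ACC3", "US0000001", "F1", date(2003, 2, 4), seq),
        ("ACC4", "US0000009", "F2", date(2006, 7, 20), seq),
    ]
    reps = level2_groups(records)
    print(len({r[4] for r in records}), "level-1 entry,", len(reps), "level-2 entries")
    for (_, family), rep in reps.items():
        print(family, "represented by", rep[1])
```

Here the four records collapse to one Level-1 entry but two Level-2 entries, with WO0000001 representing family F1 and US0000009 representing family F2.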

The Level-2 NR patent sequence database groups identical sequences by patent family and provides the earliest priority date for each family.

Searching the NR level-2 patent database. Example: search a patent protein sequence against NR patent level-2 (http://www.ebi.ac.uk/tools/sss/).

Results from the NR level-2 database: each hit is one patent family.

Results from the NR level-2 database: each hit shows the patent equivalents, the earliest publication date in the family and the earliest active priority date in the family.
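The two dates reported for each Level-2 hit are minima taken over the family members. The sketch below assumes each member carries a publication date and a list of priority dates; the `FamilyMember` class and its fields are hypothetical. The distinction between active and inactive priorities is not modeled, so this simply takes the earliest of all listed priority dates.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class FamilyMember:
    """Hypothetical record for one patent in a family."""
    patent: str
    publication_date: date
    priority_dates: List[date] = field(default_factory=list)


def family_summary(members: List[FamilyMember]) -> Tuple[List[str], date, Optional[date]]:
    """Return (patent equivalents, earliest publication date, earliest priority date)."""
    if not members:
        raise ValueError("a patent family needs at least one member")
    equivalents = sorted(m.patent for m in members)
    earliest_publication = min(m.publication_date for m in members)
    priorities = [d for m in members for d in m.priority_dates]
    earliest_priority = min(priorities) if priorities else None
    return equivalents, earliest_publication, earliest_priority


if __name__ == "__main__":
    family = [  # invented example data
        FamilyMember("EP0000001", date(2002, 5, 1), [date(2000, 12, 20)]),
        FamilyMember("WO0000001", date(2001, 6, 28), [date(2000, 6, 15), date(2000, 12, 20)]),
        FamilyMember("US0000001", date(2004, 3, 9), [date(2000, 6, 15)]),
    ]
    print(family_summary(family))
```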

Results from the NR level-2 database: the hit lists the patents in the same family, with a link to the sequence entry and a link to the patent documentation.

Text search

SRS: advanced text search (http://www.ebi.ac.uk/srs/). First, select the resources to search; second, create the query.

SRS: advanced text search, step by step
1) Select the Library tab. SRS can search more than 100 databases, including NR patent DNA (NRNL1 and NRNL2) and NR patent proteins (NRPL1 and NRPL2). In this example, the NR level-1 patent DNA database is selected.
2) Select the resources to search: select a field, then type in the text. Here, the patent number field is selected.
3) Create the query. The result lists the non-redundant nucleotide sequences from WO0146262.
4) Open a sequence record. The NRNL1 record for a WO0146262 nucleotide sequence details which other patents also claim this sequence; with NRNL2, these patents would be grouped by family.
5) From the NRNL1 sequence record, follow the link to the WO0146262 patent literature (http://www.ebi.ac.uk/srs/).

SRS: advanced text search, summary
EMBL-Bank: find all sequences associated with a patent.
NRNL1: find all sequences associated with a patent, and identify all patents associated with each sequence.
NRNL2: find all sequences associated with a patent, and identify all patents in the same family associated with each sequence.
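The three lookups in this summary differ only in how far they follow links from a patent. The sketch below models that with plain dictionaries and an inverted index; the data, the level labels "embl", "nrnl1" and "nrnl2", and the `lookup` function are all hypothetical, and SRS itself is not called.

```python
from collections import defaultdict
from typing import Dict, List

# Invented toy data: patent -> sequences it contains, and patent -> family
PATENT_SEQS: Dict[str, List[str]] = {
    "P1": ["seqA", "seqB"],
    "P2": ["seqA"],  # same family as P1
    "P3": ["seqA"],  # unrelated family
}
FAMILY: Dict[str, str] = {"P1": "F1", "P2": "F1", "P3": "F2"}


def build_index(patent_seqs: Dict[str, List[str]]) -> Dict[str, set]:
    """Invert patent -> sequences into sequence -> set of patents."""
    index = defaultdict(set)
    for patent, seqs in patent_seqs.items():
        for s in seqs:
            index[s].add(patent)
    return index


SEQ_INDEX = build_index(PATENT_SEQS)


def lookup(patent: str, level: str) -> Dict[str, List[str]]:
    """Map each sequence of `patent` to the other patents reported with it.

    "embl":  sequences only, no other patents
    "nrnl1": every other patent containing the identical sequence
    "nrnl2": only those other patents in the same family
    """
    if level not in ("embl", "nrnl1", "nrnl2"):
        raise ValueError(f"unknown level: {level}")
    family = FAMILY.get(patent)
    result = {}
    for s in PATENT_SEQS.get(patent, []):
        others = set() if level == "embl" else SEQ_INDEX[s] - {patent}
        if level == "nrnl2":
            others = {p for p in others if family is not None and FAMILY.get(p) == family}
        result[s] = sorted(others)
    return result


if __name__ == "__main__":
    for level in ("embl", "nrnl1", "nrnl2"):
        print(level, lookup("P1", level))
```

For patent P1, the "embl" lookup returns only its two sequences, "nrnl1" adds P2 and P3 as co-claimants of seqA, and "nrnl2" keeps only P2, the member of the same family.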

For more information on the non-redundant databases: http://www.ebi.ac.uk/patentdata/

For more information, see the user manual and the publication.

Help contacts: http://www.ebi.ac.uk/support/