The Text Analytics Challenge BioCreative V - Extraction of causal network information in BEL

1 The Text Analytics Challenge BioCreative V - Extraction of causal network information in BEL Fabio Rinaldi

2 Outline Biomedical text mining, motivation Competitive evaluations: BioNLP, BioCreative The BEL BioCreative V Outlook

3 Motivation The purpose of biomedical curation activities is to help the Life Sciences community to make sense of all the data that is accumulating. A. Bairoch, The future of annotation/biocuration. Nature Precedings 2009.

4 Growth of PubMed citations from 1986 to Lu Z Database 2011;2011:baq036 The Author(s) Published by Oxford University Press.

5 Motivation The purpose of biomedical curation activities is to help the Life Sciences community to make sense of all the data that is accumulating. Nobody will ever be able to manually annotate all the macromolecular biological entities that exist on this planet, and consequently automatization is the only solution. A. Bairoch, The future of annotation/biocuration. Nature Precedings 2009.

6 Why text mining? Massive amount of published material; human curation is impossible. Text mining can assist: database curation; targeted searches by scientists; identification of research targets by industry; building systemic networks. Text mining technologies are regularly evaluated through community assessments.

7 Goals of community assessments Determine state of the art Monitor improvements Investigation of different approaches Evaluation of new strategies Identification of positive / negative features Scientific forum to stimulate progress in research

8 Competitive evaluations BioCreative BioNLP BioASQ i2b2 (medical) CALBC CLEF-ER QA4MRE Semeval

9 BioASQ Three editions so far: 2013, 2014, 2015 Two tasks: a. Large-scale online biomedical semantic indexing: annotate PubMed abstracts with classes from the MeSH hierarchy b. Introductory biomedical semantic QA: questions to be answered with relevant concepts (from designated terminologies and ontologies), relevant articles (in English, from designated article repositories), relevant snippets (from the relevant articles), and relevant RDF triples (from designated ontologies).

10 BioNLP shared task

11 BioNLP 2009 Task 1. Core event extraction (mandatory) Unary events: 70%; binding and regulation: 40% Task 2. Event enrichment (optional) phosphorylation of TRAF2 (Type:Phosphorylation, Theme:TRAF2) localization of beta-catenin into nucleus (Type:Localization, Theme:beta-catenin, ToLoc:nucleus) Task 3. Negation and speculation recognition (optional) TRADD did not interact with TES2 (Negation (Type:Binding, Theme:TRADD, Theme:TES2)) Total number of participants: 24
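The event notation on this slide can be read as a small data structure. The following is a minimal sketch, assuming a hypothetical `Event` class (not part of any BioNLP tooling) that stores a type, a tuple of role/filler pairs, and a negation flag, and renders them in the slide's notation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical container for one event in the slide's (Type, arguments) notation."""
    type: str
    args: Tuple[Tuple[str, str], ...]  # (role, filler) pairs, e.g. ("Theme", "TRAF2")
    negated: bool = False

    def render(self) -> str:
        # Build "Type:X, Role:Y, ..." and wrap it in a Negation marker if needed.
        parts = [f"Type:{self.type}"] + [f"{role}:{filler}" for role, filler in self.args]
        body = "(" + ", ".join(parts) + ")"
        return f"Negation {body}" if self.negated else body


phosphorylation = Event("Phosphorylation", (("Theme", "TRAF2"),))
localization = Event("Localization", (("Theme", "beta-catenin"), ("ToLoc", "nucleus")))
no_binding = Event("Binding", (("Theme", "TRADD"), ("Theme", "TES2")), negated=True)

for event in (phosphorylation, localization, no_binding):
    print(event.render())
```

Keeping negation as a flag on the event, rather than as a separate wrapper object, is only one possible design; it mirrors how Task 3 adds information on top of an event already found in Task 1.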

12 BioNLP 2011 [GE] GENIA; p: 15, F: 53% (full text), F: 57% (abstracts) [EPI] Epigenetics and Post-translational Modifications; p: 7, F: 53% [ID] Infectious Diseases; p: 7, F: 56% Bacteria Track: [BB] Biotopes (p:3, F: 45%), [BI] Gene Interactions (p: 1, F: 77.0%), [REN] Bacteria Gene Renaming (p:3, F: 87.0%) [CO] Protein/Gene Coreference Task, p:6, F: 34.1% [REL] Entity Relations Supporting Task, p: 4, F: 57.7%

13 BioNLP 2013 [GE] Genia Event Extraction; p: 10, F: 51% [CG] Cancer Genetics; p: 2, F: 55.4% [PC] Pathway Curation; p: 2, F: 52.8% [GRO] Corpus Annotation with Gene Regulation Ontology; p: 1, F: 22% [GRN] Gene Regulation Network in Bacteria; p: 5, SER: 73% [BB] Bacteria Biotopes (semantic annotation by an ontology); P: 5, entities SER: 46%, relations: 40%, events: 14%

14 BioCreative

15 BioCreative I (2004) BioCreative I 27 Teams, Granada, Spain Hirschman et al. Overview of BioCreative: critical assessment of information extraction for biology. BMC Bioinformatics (2005), 6:S1 Tracks: identification of gene mentions in text and linking protein database entries to abstracts. extraction of human gene product annotations with GO terms

16 BioCreative II (2006) BioCreative II 44 teams, Madrid, Spain Krallinger et al. Evaluation of text-mining systems in Biology: overview of the Second BioCreative community challenge, Genome Biology (2008), 9:S1 Tracks Gene mention tagging [GM] Gene normalization [GN] Extraction of protein-protein interactions from text [PPI]: IAS (article), IPS (pair), IMS (methods), ISS (evidence)

17 BioCreative II: GM, GN

18 BioCreative II: PPI

19 Species

20 BioCreative II.5 (2009) Challenge run through web services Participants: 16 teams Corpus: FEBS Letter 2007 Goal: Reproduce the Structured Digital Abstract Subtasks: [ACT] Article classification; AUC: 67.8% [INT] Interactor Normalization; AUC: 43.5% [IPT] Interaction Pair; AUC: 22.2%


22 Structured Digital Abstracts

23 BioCreative III (2010) Tasks: gene mentions (GM); p: 14, TAP-10: 34.6% protein-protein interactions article classification (ACT); p: 10, AUC: 68% experimental Methods (IMT); p: 9, AUC: 53% interactive task (IAT); p: 6

24 PPI-IMT

25 BioCreative IV (2013) Task 1: BioC (PyBioC); p: 9 Task 2: CHEMDNER (chemicals); p: 27; best F-score: 87.39% CEM, 88.20% CDI Task 3: CTD web service; p: 7; gene: 61%, chemical: 74%, disease: 51%, act: 54% Task 4: GO annotations; p: 8; Task A (evidence text), best F: 0.27 (exact) / 0.38; Task B (predict GO terms), best F: 0.13 / 0.34 (hier.) Task 5: Interactive task; p: 9

26 BioCreative V (2015) Collaborative Biocurator Assistant Task (BioC) CHEMDNER patents Chemical-disease relation (CDR) task Extraction of causal network information in Biological Expression Language (BEL) Interactive Curation (IAT)

27 The BEL BioCreative V

28 BEL Track: Timeline Oct 2014: preparation of proposal for Task Nov 2014: approval of proposal Dec 2014: administrative and contractual arrangements Jan 2015: official start of supported activity Feb 2015: release sample set Mar 2015: release training set Mar-May 2015: preparation of evaluation framework and supporting data Jun 15-18: release of test set and official evaluation

29 Datasets and supporting material Sample set Training set 295 BEL statements with evidence BEL statements with evidence Supporting material: BioC version Structural graphs Fragments (tsv representation of BEL statements) Entities (list of entities contained in BEL statement)
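As an illustration of the "Entities" supporting file, the list of entities in a BEL statement can be approximated with a regular expression over `NAMESPACE:value` pairs. This is a minimal sketch, not the organizers' tooling: the function name `extract_entities`, the pattern, and the example statement are assumptions made for illustration, and the actual file format distributed with the task is not described on the slide.

```python
import re

# A namespace in capitals, a colon, then either a quoted value or a bare token.
ENTITY_RE = re.compile(r'([A-Z]+):("[^"]*"|[^\s,()]+)')


def extract_entities(statement: str):
    """Return (namespace, value) pairs in order of appearance (illustrative helper)."""
    return [(ns, value.strip('"')) for ns, value in ENTITY_RE.findall(statement)]


# Example statement invented for illustration; it is not taken from the task data.
example = 'a(CHEBI:"nitric oxide") decreases p(HGNC:TNF)'
for namespace, value in extract_entities(example):
    print(namespace, value, sep="\t")
```

A regular expression is enough for flat statements like this one; nested or unusual statements would call for a real BEL parser.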

30 Tasks Task 1 Given textual evidence for a BEL statement, generate the corresponding BEL statement Data: 100 Sentences Accept 3 runs per participant Accept up to 10 BEL statements or fragments per sentence Task 2 Given a BEL statement, provide at most 10 additional evidence sentences Data: 100 BEL statements, verified to have evidence in PubMed Accept 1 run per participant, each with 10 sentences ranked by relevance
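The Task 1 submission limits stated above (3 runs per participant, at most 10 BEL statements or fragments per sentence) can be checked mechanically before submitting. The sketch below assumes a hypothetical in-memory layout, one dict per run mapping a sentence id to its list of predicted statements; the real submission format is not given on the slide.

```python
from typing import Dict, List

MAX_RUNS = 3                      # Task 1: runs accepted per participant
MAX_STATEMENTS_PER_SENTENCE = 10  # Task 1: statements or fragments per sentence


def oversized_sentences(run: Dict[str, List[str]]) -> List[str]:
    """Return the ids of sentences that exceed the per-sentence limit."""
    return [sid for sid, statements in run.items()
            if len(statements) > MAX_STATEMENTS_PER_SENTENCE]


def check_submission(runs: List[Dict[str, List[str]]]) -> List[str]:
    """Collect human-readable problems; an empty list means the limits are respected."""
    problems = []
    if len(runs) > MAX_RUNS:
        problems.append(f"{len(runs)} runs submitted, at most {MAX_RUNS} accepted")
    for index, run in enumerate(runs, start=1):
        for sid in oversized_sentences(run):
            problems.append(f"run {index}: sentence {sid} has more than "
                            f"{MAX_STATEMENTS_PER_SENTENCE} statements")
    return problems
```

For example, `check_submission([{"s1": ["p(HGNC:TNF)"] * 11}])` reports one problem for the hypothetical sentence id `s1`.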

31 Simplifications Selection of statements: non-nested Selection of relationships: decreases/directlyDecreases, increases/directlyIncreases Namespaces: six namespaces considered (HGNC, MGI, EGID, GOBP, MESHD, CHEBI) Equivalence between HGNC, MGI and EGID Simplification of abundance functions for gene/protein: p() can be used instead of g(), m() and r() Restrictions and equivalences of functions Simplification of Abundance Modifier Functions: cellular locations are not requested; P is used as default argument to pmod(), i.e. pmod(P) Simplification of functions: act() is used instead of cat(), tscript(), kin(), gtp(), sec(), surf() etc.
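The function equivalences on this slide amount to a rewrite of function names before comparison. The following is a minimal sketch, assuming short-form BEL function names only and covering just the functions the slide names explicitly; the helper `normalize_functions` is hypothetical, and the namespace equivalence between HGNC, MGI and EGID is left out because it would need an identifier mapping table.

```python
import re

# Function names collapsed by the simplifications listed on the slide.
FUNCTION_EQUIVALENTS = {
    "g": "p", "m": "p", "r": "p",  # gene, microRNA, RNA abundance -> p()
    "cat": "act", "tscript": "act", "kin": "act",
    "gtp": "act", "sec": "act", "surf": "act",  # listed as replaced by act()
}

# Match a listed function name only as a whole word directly before "(".
FUNCTION_RE = re.compile(r"\b(" + "|".join(FUNCTION_EQUIVALENTS) + r")\(")


def normalize_functions(statement: str) -> str:
    """Rewrite simplified-away function names to their equivalents (illustrative)."""
    return FUNCTION_RE.sub(lambda m: FUNCTION_EQUIVALENTS[m.group(1)] + "(", statement)


# Example statement invented for illustration.
print(normalize_functions("kin(p(HGNC:AKT1)) increases r(HGNC:TNF)"))
```

The word-boundary anchor keeps names such as `pmod(` or `deg(` untouched, since their final letters are not standalone function names.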

32 Documentation BEL track initial pages at biocreative.org BEL track extended description at openbel.org: wiki.openbel.org/display/bioc/biocreative+home Setup of the task Sample and Training data Evaluation details

33 Information to participants Broadcast calls to several mailing lists Announcements on the BioCreative mailing list Set up dedicated Google group: 13 registered users; used to deliver targeted information about the challenge setup Evaluation web SCAI

34 Evaluation metrics Primary: Term (T), Function (F), Relationship (R), Full BEL statement (S). Secondary: Function (Fs), Relationship (Rs) 2nd stage includes gold standard entities
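Scoring at any one of these levels (term, function, relationship, full statement) can be sketched as a set comparison between predicted and gold items. The slide does not give the formulas, so the precision, recall and F-measure below are an assumption based on common practice in shared tasks, not the official scorer; the function name `precision_recall_f` and the example items are illustrative.

```python
from typing import Iterable, Tuple


def precision_recall_f(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and balanced F-measure over two item sets."""
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Term-level example with invented items: one of two predictions is correct.
predicted_terms = ["p(HGNC:TNF)", "p(HGNC:IL6)"]
gold_terms = ["p(HGNC:TNF)", 'a(CHEBI:"nitric oxide")']
print(precision_recall_f(predicted_terms, gold_terms))
```

The same function applies to each level once the statements have been broken down into the corresponding items, which is where most of the real evaluation effort lies.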

35 Format conversions Definition of BEL/BioC format (collaboration with NLM/NCBI) Parsing of BEL statements via Ruby, conversion of BEL into BioC (and RDF) Visualization of BEL structure via graph


37 Timeline: next steps Jul 2015: Feedback to participants on evaluation results Preparation of proceedings: Evaluation Kaggle Aug 2015: Arrangements for workshop Possible revisions of overview paper and participants' papers Sep 2015: overview paper, collection of participants' papers Workshop, September 9-11, Sevilla, Spain Sep-Dec: Writing of journal paper (DATABASE)

38 BEL task: challenges One sentence is a very limited context Disambiguation of named entities is context dependent One sentence often does not offer sufficient context, in particular for species identification No negative set Several levels of analysis: entities, functions, relations Multiple / large namespaces

39 Outlook A considerable investment in terms of time and resources Yet few participants expected, why? Novelty of the task (it takes time to adapt tools or create new ones) Short time available for development, due to late start and early evaluation Investment made so far will pay off in future editions Participants will become gradually more familiar with the nature of the task Evaluation framework can be reused Documentation can be partially reused (extended and adapted) Workshop raises attention to BEL in the text mining community

40 BEL task on kaggle.com? How well can a general-purpose machine learning community solve a biomedical text mining task? Challenges and Benefits Provide the evidence in a support data format which lowers the NLP requirements for participants Model the task as a multi-label, multi-class prediction problem NER output, chunking, dependency parsing Representing structured outcomes in this way seems to be interesting for the ML community (e.g. Meka) Get more ML approaches tested on the BEL task data set Kaggle has an active and innovative community

41 Summary Text mining in biology: essential for coping with the information deluge Competitive evaluations: provide rigorous evaluation in a controlled environment BEL challenge: a novel task with great potential However: more time is needed to allow participants to become familiar with such a complex framework and develop useful systems

42 Acknowledgments BEL task: Juliane Fluck, Sumit Madan, Tilia Ellendorff, Simon Clematide, Adrian van der Lek, Sam Ansari, Julia Hoeng, Manuel Peitsch BioCreative: Martin Krallinger (CNIO), Florian Leitner (CNIO), Alfonso Valencia (CNIO), Lynette Hirschman (MITRE)

Overview of BioCreative VI Precision Medicine Track

Overview of BioCreative VI Precision Medicine Track Overview of BioCreative VI Precision Medicine Track Mining scientific literature for protein interactions affected by mutations Organizers: Rezarta Islamaj Dogan (NCBI) Andrew Chatr-aryamontri (BioGrid)

More information

A Framework for BioCuration (part II)

A Framework for BioCuration (part II) A Framework for BioCuration (part II) Text Mining for the BioCuration Workflow Workshop, 3rd International Biocuration Conference Friday, April 17, 2009 (Berlin) Martin Krallinger Spanish National Cancer

More information

Text mining tools for semantically enriching the scientific literature

Text mining tools for semantically enriching the scientific literature Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the

More information

Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si

Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si Information Retrieval, Information Extraction, and Text Mining Applications for Biology Slides by Suleyman Cetintas & Luo Si 1 Outline Introduction Overview of Literature Data Sources PubMed, HighWire

More information

Projects Tools BLAH proposal Conclusion. OntoGene/BioMeXT

Projects Tools BLAH proposal Conclusion. OntoGene/BioMeXT OntoGene/BioMeXT The Bio Term Hub and OGER Lenz Furrer, Nico Colic, Fabio Rinaldi University of Zurich and Swiss Institute of Bioinformatics January 10, 2018 Outline Projects Tools BLAH proposal Conclusion

More information

The CALBC RDF Triple store: retrieval over large literature content

The CALBC RDF Triple store: retrieval over large literature content The CALBC RDF Triple store: retrieval over large literature content Samuel Croset, Christoph Grabmüller, Chen Li, Silverstras Kavaliauskas, Dietrich Rebholz-Schuhmann croset@ebi.ac.uk 10 th December 2010,

More information

BioC: a minimalist approach to interoperability for biomedical text processing. Don Comeau

BioC: a minimalist approach to interoperability for biomedical text processing. Don Comeau BioC: a minimalist approach to interoperability for biomedical text processing Don Comeau Outline Background and origin of BioC What is BioC? Available Tools and Corpora 2 BioCreative Critical Assessment

More information

Customisable Curation Workflows in Argo

Customisable Curation Workflows in Argo Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:

More information

The user interactive task (IAT) in BioCreative Challenges BioCreative Workshop on Text Mining Applications April 7, 2014

The user interactive task (IAT) in BioCreative Challenges BioCreative Workshop on Text Mining Applications April 7, 2014 The user interactive task (IAT) in BioCreative Challenges BioCreative Workshop on Text Mining Applications April 7, 2014 N., PhD Research Associate Professor Protein Information Resource CBCB, University

More information

TEES 2.2: Biomedical Event Extraction for Diverse Corpora

TEES 2.2: Biomedical Event Extraction for Diverse Corpora RESEARCH Open Access TEES 2.2: Biomedical Event Extraction for Diverse Corpora Jari Björne 1,2*, Tapio Salakoski 1,2 From BioNLP Shared Task 2013 Sofia, Bulgaria. 9 August 2013 Abstract Background: The

More information

Chemical name recognition with harmonized feature-rich conditional random fields

Chemical name recognition with harmonized feature-rich conditional random fields Chemical name recognition with harmonized feature-rich conditional random fields David Campos, Sérgio Matos, and José Luís Oliveira IEETA/DETI, University of Aveiro, Campus Universitrio de Santiago, 3810-193

More information

Improving Interoperability of Text Mining Tools with BioC

Improving Interoperability of Text Mining Tools with BioC Improving Interoperability of Text Mining Tools with BioC Ritu Khare, Chih-Hsuan Wei, Yuqing Mao, Robert Leaman, Zhiyong Lu * National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda,

More information

Benchmarking biomedical text mining web servers at BioCreative V.5: the technical Interoperability and Performance of annotation Servers - TIPS track

Benchmarking biomedical text mining web servers at BioCreative V.5: the technical Interoperability and Performance of annotation Servers - TIPS track Benchmarking biomedical text mining web servers at BioCreative V.5: the technical Interoperability and Performance of annotation Servers - TIPS track Martin Pérez-Pérez 1,2, Gael Pérez-Rodríguez 1,2, Aitor

More information

A curation pipeline and web-services for PDF documents

A curation pipeline and web-services for PDF documents A curation pipeline and web-services for PDF documents André Santos 1, Sérgio Matos 1, David Campos 2 and José Luís Oliveira 1 1 DETI/IEETA, University of Aveiro, 3810-193 Aveiro, Portugal {aleixomatos,andre.jeronimo,jlo}@ua.pt

More information

Measuring inter-annotator agreement in GO annotations

Measuring inter-annotator agreement in GO annotations Measuring inter-annotator agreement in GO annotations Camon EB, Barrell DG, Dimmer EC, Lee V, Magrane M, Maslen J, Binns ns D, Apweiler R. An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.

More information

@Note2 tutorial. Hugo Costa Ruben Rodrigues Miguel Rocha

@Note2 tutorial. Hugo Costa Ruben Rodrigues Miguel Rocha @Note2 tutorial Hugo Costa (hcosta@silicolife.com) Ruben Rodrigues (pg25227@alunos.uminho.pt) Miguel Rocha (mrocha@di.uminho.pt) 23-01-2018 The document presents a typical workflow using @Note2 platform

More information

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

What is Text Mining? Sophia Ananiadou National Centre for Text Mining   University of Manchester National Centre for Text Mining www.nactem.ac.uk University of Manchester Outline Aims of text mining Text Mining steps Text Mining uses Applications 2 Aims Extract and discover knowledge hidden in text

More information

RLIMS-P Website Help Document

RLIMS-P Website Help Document RLIMS-P Website Help Document Table of Contents Introduction... 1 RLIMS-P architecture... 2 RLIMS-P interface... 2 Login...2 Input page...3 Results Page...4 Text Evidence/Curation Page...9 URL: http://annotation.dbi.udel.edu/text_mining/rlimsp2/

More information

Extraction of biomedical events using case-based reasoning

Extraction of biomedical events using case-based reasoning Extraction of biomedical events using case-based reasoning Mariana L. Neves Biocomputing Unit Centro Nacional de Biotecnología - CSIC C/ Darwin 3, Campus de Cantoblanco, 28049, Madrid, Spain mlara@cnb.csic.es

More information

Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction

Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction Pavel P. Kuksa, Rutgers University Yanjun Qi, Bing Bai, Ronan Collobert, NEC Labs Jason Weston, Google Research NY Vladimir

More information

This document contains information about the annotation workflow for the Full BioCreative interactive task.

This document contains information about the annotation workflow for the Full BioCreative interactive task. BioCreative IV-User Interactive Task RLIMS-P Annotation Task This document contains information about the annotation workflow for the Full BioCreative interactive task. Annotation Workflow using RLIMS-P

More information

A new methodology for gene normalization using a mix of taggers, global alignment matching and document similarity disambiguation

A new methodology for gene normalization using a mix of taggers, global alignment matching and document similarity disambiguation A new methodology for gene normalization using a mix of taggers, global alignment matching and document similarity disambiguation Mariana Neves 1, Monica Chagoyen 1, José M Carazo 1, Alberto Pascual-Montano

More information

Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus

Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus Donald C. Comeau *, Haibin Liu, Rezarta Islamaj Doğan and W. John Wilbur National Center

More information

Ranking of CTD articles and interactions using the OntoGene pipeline

Ranking of CTD articles and interactions using the OntoGene pipeline Ranking of CTD articles and interactions using the OntoGene pipeline Fabio Rinaldi, Simon Clematide and Simon Hafner Institute of Computational Linguistics, University of Zurich {rinaldi,siclemat}@cl.uzh.ch,{hafnersimon@gmail.com}

More information

Update: MIRIAM Registry and SBO

Update: MIRIAM Registry and SBO Update: MIRIAM Registry and SBO Nick Juty, EMBL-EBI 3rd Sept, 2011 Overview MIRIAM Registry MIRIAM Guidelines.. MIRIAM Registry content URIs (URN form), example Summary/current developments SBO Purpose

More information

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix

Exploring and Exploiting the Biological Maze. Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Exploring and Exploiting the Biological Maze Presented By Vidyadhari Edupuganti Advisor Dr. Zoe Lacroix Motivation An abundance of biological data sources contain data about scientific entities, such as

More information

New Concept for Article 36 Networking and Management of the List

New Concept for Article 36 Networking and Management of the List New Concept for Article 36 Networking and Management of the List Kerstin Gross-Helmert, AFSCO 28 th Meeting of the Focal Point Network EFSA, MTG SEAT 00/M08-09 THE PRESENTATION Why a new concept? What

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

SAPIENT Automation project

SAPIENT Automation project Dr Maria Liakata Leverhulme Trust Early Career fellow Department of Computer Science, Aberystwyth University Visitor at EBI, Cambridge mal@aber.ac.uk 25 May 2010, London Motivation SAPIENT Automation Project

More information

Acquiring Experience with Ontology and Vocabularies

Acquiring Experience with Ontology and Vocabularies Acquiring Experience with Ontology and Vocabularies Walt Melo Risa Mayan Jean Stanford The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended

More information

Bio wikis. Paolo Romano Bioinformatics, National Cancer Research Institute, Genova

Bio wikis. Paolo Romano Bioinformatics, National Cancer Research Institute, Genova Bio wikis Paolo Romano (paolo.romano@istge.it) Bioinformatics, National Cancer Research Institute, Genova Outline o Wiki systems: aims and technologies o Working with wikis: practical issues for setting

More information

Learning to Answer Biomedical Factoid & List Questions: OAQA at BioASQ 3B

Learning to Answer Biomedical Factoid & List Questions: OAQA at BioASQ 3B Learning to Answer Biomedical Factoid & List Questions: OAQA at BioASQ 3B Zi Yang, Niloy Gupta, Xiangyu Sun, Di Xu, Chi Zhang, Eric Nyberg Language Technologies Institute School of Computer Science Carnegie

More information

EVENT EXTRACTION WITH COMPLEX EVENT CLASSIFICATION USING RICH FEATURES

EVENT EXTRACTION WITH COMPLEX EVENT CLASSIFICATION USING RICH FEATURES Journal of Bioinformatics and Computational Biology Vol. 8, No. 1 (2010) 131 146 c 2010 The Authors DOI: 10.1142/S0219720010004586 EVENT EXTRACTION WITH COMPLEX EVENT CLASSIFICATION USING RICH FEATURES

More information

SciMiner User s Manual

SciMiner User s Manual SciMiner User s Manual Copyright 2008 Junguk Hur. All rights reserved. Bioinformatics Program University of Michigan Ann Arbor, MI 48109, USA Email: juhur@umich.edu Homepage: http://jdrf.neurology.med.umich.edu/sciminer/

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

Connecting Text Mining and Pathways using the PathText Resource

Connecting Text Mining and Pathways using the PathText Resource Connecting Text Mining and Pathways using the PathText Resource Sætre, Kemper, Oda, Okazaki a, Matsuoka b, Kikuchi c, Kitano d, Tsuruoka, Ananiadou, Tsujii e a Computer Science, University of Tokyo, Hongo

More information

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision Guoqian Jiang 1,* Harold R. Solbrig 1 and Christopher G. Chute 1 1 Department of

More information

Development of Text Mining Tools for Information Retrieval from Patents

Development of Text Mining Tools for Information Retrieval from Patents Development of Text Mining Tools for Information Retrieval from Patents Tiago Alves 1,2(B),Rúben Rodrigues 1, Hugo Costa 2, and Miguel Rocha 1 1 Centre Biological Engineering, University of Minho, 4710-057

More information

EFFICIENT AUTOMATED PROCESSING OF BIOMEDICAL LITERATURE

EFFICIENT AUTOMATED PROCESSING OF BIOMEDICAL LITERATURE EFFICIENT AUTOMATED PROCESSING OF BIOMEDICAL LITERATURE NICO COLIC 1. Introduction The rate at which biomedical research papers are published is ever increasing. Because of this, professionals rely on

More information

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger Mahmoud El-Haj Paul Rayson Scott Piao Jo Knight Origin and Outcomes Currently funded through a Wellcome Trust Seed award Collaboration

More information

Scholarly Big Data: Leverage for Science

Scholarly Big Data: Leverage for Science Scholarly Big Data: Leverage for Science C. Lee Giles The Pennsylvania State University University Park, PA, USA giles@ist.psu.edu http://clgiles.ist.psu.edu Funded in part by NSF, Allen Institute for

More information

Document Retrieval using Predication Similarity

Document Retrieval using Predication Similarity Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research

More information

Turning Text into Insight: Text Mining in the Life Sciences WHITEPAPER

Turning Text into Insight: Text Mining in the Life Sciences WHITEPAPER Turning Text into Insight: Text Mining in the Life Sciences WHITEPAPER According to The STM Report (2015), 2.5 million peer-reviewed articles are published in scholarly journals each year. 1 PubMed contains

More information

COURSE LISTING. Courses Listed. Training for Database & Technology with Modeling in SAP HANA. 20 November 2017 (12:10 GMT) Beginner.

COURSE LISTING. Courses Listed. Training for Database & Technology with Modeling in SAP HANA. 20 November 2017 (12:10 GMT) Beginner. Training for Database & Technology with Modeling in SAP HANA Courses Listed Beginner HA100 - SAP HANA Introduction Advanced HA300 - SAP HANA Certification Exam C_HANAIMP_13 - SAP Certified Application

More information

Text-mining-assisted biocuration workflows in Argo

Text-mining-assisted biocuration workflows in Argo Database, 2014, 1 14 doi: 10.1093/database/bau070 Original article Original article Text-mining-assisted biocuration workflows in Argo Rafal Rak 1, *, Riza Theresa Batista-Navarro 1,2, Andrew Rowley 1,

More information

EVIDENCE FOR SHOWING GENE/PROTEIN NAME SUGGESTIONS IN BIOSCIENCE LITERATURE SEARCH INTERFACES

EVIDENCE FOR SHOWING GENE/PROTEIN NAME SUGGESTIONS IN BIOSCIENCE LITERATURE SEARCH INTERFACES EVIDENCE FOR SHOWING GENE/PROTEIN NAME SUGGESTIONS IN BIOSCIENCE LITERATURE SEARCH INTERFACES ANNA DIVOLI, MARTI A. HEARST, MICHAEL A. WOOLDRIDGE School of Information, UC Berkeley {divoli,hearst,mikew}@.ischool.berkeley.edu

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural

More information

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK

The Final Updates. Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University of Oxford, UK The Final Updates Supported by the NIH grant 1U24 AI117966-01 to UCSD PI, Co-Investigators at: Philippe Rocca-Serra Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Oxford e-research Centre, University

More information

Integrated Access to Biological Data. A use case

Integrated Access to Biological Data. A use case Integrated Access to Biological Data. A use case Marta González Fundación ROBOTIKER, Parque Tecnológico Edif 202 48970 Zamudio, Vizcaya Spain marta@robotiker.es Abstract. This use case reflects the research

More information

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala

Master Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT PROJECT PERIODIC REPORT Grant Agreement number: 257403 Project acronym: CUBIST Project title: Combining and Uniting Business Intelligence and Semantic Technologies Funding Scheme: STREP Date of latest

More information

Lecture 5. Functional Analysis with Blast2GO Enriched functions. Kegg Pathway Analysis Functional Similarities B2G-Far. FatiGO Babelomics.

Lecture 5. Functional Analysis with Blast2GO Enriched functions. Kegg Pathway Analysis Functional Similarities B2G-Far. FatiGO Babelomics. Lecture 5 Functional Analysis with Blast2GO Enriched functions FatiGO Babelomics FatiScan Kegg Pathway Analysis Functional Similarities B2G-Far 1 Fisher's Exact Test One Gene List (A) The other list (B)

More information

A STACKED GRAPHICAL MODEL FOR ASSOCIATING SUB-IMAGES WITH SUB-CAPTIONS

A STACKED GRAPHICAL MODEL FOR ASSOCIATING SUB-IMAGES WITH SUB-CAPTIONS A STACKED GRAPHICAL MODEL FOR ASSOCIATING SUB-IMAGES WITH SUB-CAPTIONS ZHENZHEN KOU, WILLIAM W. COHEN, AND ROBERT F. MURPHY Machine Learning Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh,

More information
