Projects Tools BLAH proposal Conclusion. OntoGene/BioMeXT

1 OntoGene/BioMeXT

2 The Bio Term Hub and OGER
Lenz Furrer, Nico Colic, Fabio Rinaldi
University of Zurich and Swiss Institute of Bioinformatics
January 10, 2018

3 Outline: Projects, Tools, BLAH proposal, Conclusion

4 Topic Projects Tools BLAH proposal Conclusion

5 VetMine

10 PsyMine

11 Goal: discover causal interactions

12 From disorders to etiological factors

13 Creation of a reference corpus

14 An application: temporal trends

15 Author name disambiguation

16 SwissMADE: The challenge of clinical text
SwissMADE (Monitoring of Adverse Drug Events)
- older patients (aged ≥ 65 years)
- antithrombotic drugs
- using structured and unstructured parts of the EHRs
- involves five hospitals

17 MedMon
Mining the web and social networks for mentions of Adverse Drug Reactions
Collaboration with a major Pharma Company and another Swiss University
I urgently need a junior PostDoc!
- PhD in Computational Linguistics, Computer Science or a related field
- Good programming skills, and proven expertise in Python
- Experience with Information Extraction and Text Mining
- Experience with machine learning approaches

18 Assisted curation
The OntoGene/BioMeXT group has been active in assisted curation since 2010 with the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature). Since 2013 we have been collaborating with the RegulonDB database in a project aimed at testing and gradually introducing assisted curation techniques in their curation pipeline. RegulonDB is a database of the regulatory network of Escherichia coli K-12.

19 Example
"We additionally found that expression of the mntP gene is upregulated by manganese through MntR."
Given: MntR [+] mntP
To identify: condition [manganese]

21 OxyR experiment
TOPIC: oxidative stress by OxyR
CORPUS: 46 papers, curated in RegDB
METHODS: automated annotations of entities via OntoGene, selection of sentences via ODIN filters, manual validation
RESULTS:
- 100% of RIs retrieved, including TF, EFFECT and their TG
- Identified the growth conditions for 15 of the 20 RIs of OxyR
- checking only a limited set of sentences (about 10% of the article is read)
[Gama-Castro et al., 2014]

23 Topic Projects Tools BLAH proposal Conclusion

24 Bio Term Hub

27 BTH: Term Statistics

28 BTH: Term confusion matrix

29 OGER

30 OGER: annotation service
The OntoGene's Biomedical Entity Recogniser (OGER)
- RESTful web service, using BTH terminologies
- Allows annotation of a collection of documents
- Evaluated in the BioCreative/TIPS challenge on bio text mining services, with the best results according to several of the evaluation metrics

31 OGER: annotation service
- Annotates input text with entities from the BTH (except EntrezGene)
- Can be used as a web demo (for annotation of single articles) or as a web service (batch)
- Input: PubMed, PubMed Central, free text
- Formats: text (I), BioC (I/O), pxml (I), tsv (O), brat (O), odin-xml (O)
- Note: user-provided terminologies can be used, but this is not yet supported by the interface and web service.
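
To illustrate what "annotating text with entities from a terminology" produces, here is a minimal Python sketch of dictionary lookup that returns spans and concept IDs. It is not OGER's actual algorithm; the tiny `TERMS` dictionary and its identifiers are made up for the example.

```python
import re

# Hypothetical mini-terminology: lower-cased term -> made-up concept ID.
TERMS = {
    "manganese": "CHEM:demo-1",
    "mntr": "GENE:demo-2",
    "oxidative stress": "PROC:demo-3",
}

def annotate(text, terms=TERMS):
    """Return (start, end, matched text, concept ID) for each dictionary hit."""
    # Longest terms first, so multi-word terms win over their parts.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), terms[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

if __name__ == "__main__":
    for hit in annotate("MntR responds to manganese."):
        # One tab-separated line per annotation, similar in spirit to a TSV output.
        print("\t".join(str(field) for field in hit))
```

A real system additionally has to deal with spelling variants and ambiguous terms, which this sketch ignores.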

32 BioCreative V.5 / TIPS

33 OGER in TIPS

35 Term ambiguity

39 CRAFT corpus

41 Results [Anna Jancso, Lenz Furrer, Fabio Rinaldi, in preparation]

42 Previous history...
[2006] BioCreative II: PPI (3rd), IMT (best)
[2009] BioCreative II.5 PPI (best results); BioNLP
[2010] BioCreative III: ACT, IMT, IAT
[2011] CALBC (large scale entity extraction), BioNLP
[2012] CTD task at BioCreative 2012
[2013] BioCreative IV: BioC, CTD, IAT

43 Topic Projects Tools BLAH proposal Conclusion

44 Use BTH/OGER through web API
- integration of BTH in another dictionary-based annotation platform
- usage of OGER web services
Suggestion: integration with PubDictionaries/PubAnnotations

45 BTH: REST API
The Bio Term Hub can currently be accessed publicly through a web interface or (if installed locally) used through a command-line interface. To ease integration into automatic workflows, a REST API should be added.

46 BTH: JSON output
The Bio Term Hub currently produces plain-text output (a TSV table). This format is dense (compared to, e.g., XML) and straightforward to parse in text-based processing environments, but for some applications JSON might be more suitable. We propose to evaluate different possible JSON representations and implement the best one.
License: BSD 2-clause
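
As a rough illustration of what "different possible JSON representations" could mean, the Python sketch below converts a small TSV table into two candidate layouts: one flat object per term, and one object per concept with its synonyms grouped. The column names in `COLUMNS` and the sample rows are assumptions made for the example, not the actual BTH export format.

```python
import csv
import io
import json

# Hypothetical column layout; the real BTH table may differ.
COLUMNS = ["concept_id", "term", "entity_type", "resource"]

def tsv_to_flat_json(tsv_text):
    """Term-centred layout: one JSON object per TSV row."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # terms may contain quote characters
    )
    return [dict(row) for row in reader]

def tsv_to_nested_json(tsv_text):
    """Concept-centred layout: synonyms grouped under their concept ID."""
    concepts = {}
    for row in tsv_to_flat_json(tsv_text):
        entry = concepts.setdefault(
            row["concept_id"],
            {"entity_type": row["entity_type"],
             "resource": row["resource"],
             "terms": []},
        )
        entry["terms"].append(row["term"])
    return concepts

# Made-up sample rows, for illustration only.
SAMPLE = (
    "C1\taspirin\tchemical\tdemo\n"
    "C1\tacetylsalicylic acid\tchemical\tdemo\n"
    "C2\tOxyR\tgene/protein\tdemo\n"
)

if __name__ == "__main__":
    print(json.dumps(tsv_to_flat_json(SAMPLE), indent=2))
    print(json.dumps(tsv_to_nested_json(SAMPLE), indent=2))
```

The flat layout stays close to the TSV and is easy to stream; the nested layout avoids repeating concept-level fields but requires grouping the whole table first.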

47 OGER: BioC/JSON
OGER supports BioC XML as both input and output. Recently, a JSON version of the BioC format has been defined. We propose to add support for this new format. While a possible approach would be to use the converter provided by the NCBI, it is preferable to use a solution with less overhead with respect to speed and memory consumption.
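
A lightweight alternative to a separate converter could be to build the JSON structure directly from the annotations. The Python sketch below shows the idea for a single-passage document. The field names mirror the BioC XML elements (collection, document, passage, annotation, location) and are assumed to match the JSON version; they should be checked against the published format. The function name and the concept IDs are hypothetical.

```python
import json

def make_bioc_json(doc_id, text, annotations):
    """Build a BioC-style collection (as a dict) for one document.

    `annotations` is a list of (start, end, concept_id, entity_type) tuples
    with character offsets into `text`.
    """
    passage_annotations = []
    for number, (start, end, concept_id, entity_type) in enumerate(annotations, 1):
        passage_annotations.append({
            "id": str(number),
            "infons": {"type": entity_type, "identifier": concept_id},
            "text": text[start:end],
            "locations": [{"offset": start, "length": end - start}],
        })
    passage = {
        "offset": 0,
        "infons": {},
        "text": text,
        "sentences": [],
        "annotations": passage_annotations,
        "relations": [],
    }
    document = {"id": doc_id, "infons": {}, "passages": [passage], "relations": []}
    return {"source": "example", "date": "", "key": "", "infons": {},
            "documents": [document]}

TEXT = "Expression of mntP is upregulated by manganese through MntR."

if __name__ == "__main__":
    start = TEXT.index("mntP")
    # "GENE:demo-1" is a made-up identifier for illustration.
    collection = make_bioc_json("doc1", TEXT, [(start, start + 4, "GENE:demo-1", "gene")])
    print(json.dumps(collection, indent=2))
```

Because the result is a plain dictionary, it can be serialised with the standard `json` module without an intermediate XML step.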

48 OGER: Format options in the web UI
Only a fraction of the API's options is exposed in the web interface. The interface only allows specifying input documents through an ID or by typing/pasting plain text into a text box. The output is always an embedded HTML fragment with the annotations highlighted in color, which cannot easily be downloaded. We propose to extend the available choices to the full range of input and output formats.

49 Topic Projects Tools BLAH proposal Conclusion

50 Conclusions
- Bio Term Hub: a one-stop site for obtaining up-to-date biomedical terminological resources
- OGER: an efficient text annotation tool using the BTH terminologies
- Provides spans and IDs (NER and CR)
- OGER-CR shown to be state-of-the-art
- But: disambiguation not yet included in web services

51 Thank you / どうもありがとうございます
