Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si
1 Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si
2 Outline: Introduction; Overview of Literature Data Sources (PubMed, HighWire Press, Google Scholar, Other Sources); Structure of Biomedical Language; Biological Terminology; Lexical and Semantic Sources for Biology; Biomedical Literature Processing Applications; Beyond BioCreative: Advanced Applications; Summary; References
3 Introduction: Life-science research produces large and heterogeneous biological data in the form of protein and genomic sequence data, expression profiles, and protein structures. Yet a significant amount of information exists only in natural language: most discoveries are communicated via publications, patents, reports, and e-texts on the web. Controlled vocabulary terms are used for other biological sources: gene product annotations (e.g., Gene Ontology [GO] terms) and database records (e.g., UniProt) containing comments, keywords, descriptions, etc.
4 Introduction: Structured database entries enable efficient data retrieval, exchange, and analysis. There is a recent tendency to enrich annotation records, and general annotation databases such as UniProt (134K citations as of 2008) are of great practical value. Yet they cover only a small fraction of biological context information, cannot capture the richness of scientific information and argumentation in the literature, and struggle to keep up with the rapid accumulation of new publications. Text mining can help link database entries to the evidence and argumentation in the literature.
5 Introduction: Online literature collections, e.g., PubMed: 70 million queries every month and >20 million publications (as of 2010); of crucial importance to experimental biologists, biomedical researchers, database curators, etc.; facing double-exponential growth rates (due to new journals and an increasing number of journal articles). Different needs: the scientific community needs efficient and effective information retrieval for targeted literature searches; the pharmaceutical industry uses text-mining systems for competitive intelligence; government institutions use such tools to gain a global view of the current state of research.
6 Overview of Literature Data Sources: Several efforts aim to make medical and life-science journal information electronically accessible to the public through the World Wide Web. They can be grouped into three categories: I) centralized institutional (PubMed) or academic (HighWire Press & Hollis) repositories of peer-reviewed articles or abstracts; II) article collection repositories by publishers (e.g., BioMed Central, EMBASE); III) access to indexed scholarly articles (e.g., Google Scholar, Scirus) via web crawlers.
8 PubMed: The most important resource for text mining applications. Includes citations (i.e., title, abstract, authors, and source information) supplied by participating publishers; maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). Basic search: PubMed can be accessed online through Entrez, a text-based search and retrieval system. Entrez improves on basic keyword searches by translating the user query into Medical Subject Heading (MeSH) terms. MeSH: controlled vocabulary terms covering the medical domain, chemicals, genes, proteins, etc.
9 PubMed Growth of PubMed citations between
10 PubMed: [Figure: technology development timeline for PubMed (light green) and other biomedical literature search tools (light orange)]
11 PubMed: Programmatic access: PubMed also offers programmatic access to its content through the Entrez Programming Utilities and through open-source projects (BioPerl, Biopython, BioJava, etc.) for biologist programmers. NCBI also provides the My NCBI service, which periodically retrieves new PubMed publications matching a predefined user query; the requester is then notified via an alert system.
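As a rough illustration of the programmatic access mentioned on this slide, the sketch below builds (but does not send) an ESearch request for the Entrez Programming Utilities. The helper name `build_esearch_url` and the example query are assumptions made for illustration and do not come from the slides; NCBI's usage guidelines should be consulted before issuing real requests.

```python
from urllib.parse import urlencode

# Base address of the Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL asking PubMed for PMIDs that match `term`.

    `build_esearch_url` is a hypothetical helper; only the URL is built,
    no network request is made.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example query invented for illustration.
    print(build_esearch_url("VRK1 AND c-jun", retmax=5))
```

Keeping URL construction separate from the request makes the query easy to inspect and test offline; the resulting URL could then be fetched with any HTTP client.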
12 PubMed: Local PubMed: it is possible to keep a local relational database of all PubMed citations by obtaining a licensed copy of the whole of PubMed as XML-formatted citation records from NLM/NCBI. Mobile access: Txt2MEDLINE uses SMS to access PubMed; PubMed Informer is a Web-based PubMed monitoring tool that facilitates PDA downloads and RSS feeds.
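The local-database idea can be sketched as follows: XML citation records are parsed and loaded into a relational table. This is a minimal sketch, not the NLM loading procedure. The sample record is made up, the table layout and the function name `load_citations` are assumptions, and only a few fields (PMID, title, abstract) are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up record that imitates the nesting of PubMed citation XML.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0000001</PMID>
      <Article>
        <ArticleTitle>A made-up title about VRK1 and c-jun</ArticleTitle>
        <Abstract><AbstractText>Made-up abstract text.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def load_citations(xml_text: str, conn: sqlite3.Connection) -> int:
    """Insert every citation found in `xml_text` into a `citation` table.

    Returns the number of records processed. Hypothetical helper.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation "
        "(pmid TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    count = 0
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        abstract = article.findtext(
            "MedlineCitation/Article/Abstract/AbstractText", default=""
        )
        # Re-loading the same PMID overwrites the earlier row.
        conn.execute(
            "INSERT OR REPLACE INTO citation VALUES (?, ?, ?)",
            (pmid, title, abstract),
        )
        count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_citations(SAMPLE_XML, connection), "record(s) loaded")
```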
13 Google Scholar: An alternative to PubMed. Covers not only peer-reviewed articles but also other scholarly texts such as theses, books, and preprint repositories. Often returns larger retrieval sets (yet with a substantial number of link-outs to PubMed records). Does not offer the advanced search functions that PubMed offers.
14 HighWire Press: An alternative to PubMed and an initiative of Stanford University; a complementary resource to PubMed. Gives access to peer-reviewed articles, providing a search interface to over 1160 journals and 4.8 million full-text articles (with over 1.9 million articles available free from HighWire partner publishers). Shares many search characteristics with PubMed (though each also has its own differences). HighWire additionally offers a graphical representation of articles (citation map) and lets the user specify where to conduct the search (title, abstract, etc.).
15 Other resources: PubMed Central: free access to full-text articles (not only to abstracts); contains articles published before 1966. Publishers have also developed platforms of searchable article repositories, such as EMBASE and BioMed Central, to improve access to their articles.
16 Structure of Biomedical Language: A collection of homologous protein sequences often shares a common structural fold and tends to exhibit a similar function. Likewise, in natural language a particular meaning may be expressed using different but largely synonymous expressions. Natural language processing (NLP) is used to decode human language by exploiting the regularities and constraints that occur at multiple levels of human language. The four levels are words, syntax, semantics, and pragmatics.
17 Structure of Biomed. Lang.: Words: Tokenization and morphology: identification of words in biology text. In English, word boundaries are marked by whitespace and sentence boundaries by "." (period or full stop), but there are many complications. The JULIE (Jena University Language and Information Engineering) laboratory provides tools for token and sentence boundary detection.
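To make the "complications" concrete, here is a deliberately naive splitter that follows only the two rules stated on the slide (whitespace for words, period for sentences). The function names and the example text are invented for illustration; this is not how the JULIE tools work.

```python
import re


def naive_sentences(text: str) -> list:
    """Split after every period that is followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


def naive_tokens(sentence: str) -> list:
    """Split on whitespace only; punctuation stays glued to words."""
    return sentence.split()


if __name__ == "__main__":
    # Invented example text with an abbreviated species name and a decimal.
    text = "VRK1 activates c-jun in E. coli extracts. Binding was measured at pH 7.4."
    for sentence in naive_sentences(text):
        print(naive_tokens(sentence))
```

On this two-sentence input the naive rules produce three "sentences", because the period in "E. coli" is taken as a boundary, and tokens such as "extracts." keep their trailing punctuation. Cases like these are why dedicated biomedical boundary detectors exist.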
18 Structure of Biomed. Lang.: Words: Tokenization and morphology: identification of words in biology text is a very important stage. Gene mention identification (BioCreative II gene mention task): some teams explored the integration of publicly available gene mention taggers, e.g., the ABNER application or the LingPipe system. Gene normalization: linking these mentions to specific entries in biological resources. Stemming: converts words to their roots to reduce variability; general stemmers (e.g., the Porter stemmer) and specific biomedical stemmers exist.
19 Structure of Biomed. Lang.: Syntax: The syntax or grammar of a language controls how words are grouped into meaningful phrases. Words can be associated with part-of-speech (POS) tags. POS taggers are based on machine learning algorithms (e.g., hidden Markov models) trained on a manually annotated corpus. The biomedical POS distribution differs slightly from that of general English, hence special taggers for the biomedical domain: the MedPost tagger and dTagger. POS tagging can be useful to detect textual patterns expressing protein interactions and to locate gene and protein mentions.
20 Structure of Biomed. Lang.: Semantics & Pragmatics: Semantics captures the meaning: e.g., "c-jun is activated by VRK1" can be represented as an operator activate(vrk1, c-jun); the semantic representation abstracts away the syntax. Pragmatics captures the larger context and its contribution to meaning: text mining systems often rely on sentences as the basic processing unit for extracting associations between biological entities, but descriptions of those relations often go beyond sentence boundaries and make use of referring expressions.
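The mapping from the passive sentence to activate(vrk1, c-jun) can be sketched with a single pattern. This is a toy illustration under strong assumptions: only the template "X is VERBed by Y" is handled, the small verb table is invented, and real systems rely on parsing rather than one regular expression.

```python
import re
from typing import Optional, Tuple

# Invented lookup from passive participle to predicate name.
PREDICATES = {
    "activated": "activate",
    "inhibited": "inhibit",
    "phosphorylated": "phosphorylate",
}

PASSIVE = re.compile(r"(\S+) is (activated|inhibited|phosphorylated) by (\S+)")


def to_predicate(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (predicate, agent, patient) for 'X is VERBed by Y', else None."""
    match = PASSIVE.search(sentence)
    if match is None:
        return None
    patient, participle, agent = match.groups()
    agent = agent.rstrip(".,;")  # drop sentence punctuation stuck to the name
    return PREDICATES[participle], agent.lower(), patient.lower()


if __name__ == "__main__":
    print(to_predicate("c-jun is activated by VRK1."))
```

The active form "VRK1 activates c-jun" would need a second pattern to yield the same tuple, which is the sense in which the semantic representation abstracts away the syntax.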
21 Structure of Biomedical Language: [Figure: main NLP levels, from word tokenization to semantics]
22 Biological Terminology: Biological literature is characterized by heavy use of domain-specific terminology: ~12% of all terms in biochemistry publications are technical terms. Hence the need to recognize medical terms and their variations automatically. Two main challenges: the constant formation of new terms and new short forms, and ambiguity or polysemy (multiple meanings of the same word).
23 Biological Terminology: Ambiguity or polysemy: text mining tools must select the correct sense of a word, using the surrounding context for disambiguation. Gene names are a problem because they are often shared across species. Ambiguity rates: general English 0.57%, medical terms 1.01%, gene names 14.20%. Biomedical and life-science literature depends heavily on short forms, which adds further ambiguity. Online tools for acronym/full-name pairs: ADAM, the Abbreviation Server, and AcroMine.
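A small sketch of how acronym/full-name pairs might be collected from text, and how one short form can end up with several expansions. The heuristic (match the initials of the words right before a parenthesized upper-case token) is a simplification chosen for illustration; it is not the method used by ADAM, the Abbreviation Server, or AcroMine, and the example sentences are invented.

```python
import re
from collections import defaultdict


def find_acronym_pairs(text: str) -> list:
    """Return (short form, long form) pairs of the shape 'long form (SF)'."""
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        short = match.group(1)
        # Words that precede the parenthesis, hyphens treated as separators.
        words = re.findall(r"[A-Za-z]+", text[: match.start()])
        candidate = words[-len(short):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == short:
            pairs.append((short, " ".join(candidate)))
    return pairs


def expansions_by_short_form(pairs: list) -> dict:
    """Group long forms under their short form to expose ambiguity."""
    grouped = defaultdict(set)
    for short, long_form in pairs:
        grouped[short].add(long_form.lower())
    return dict(grouped)


if __name__ == "__main__":
    text = (
        "The androgen receptor (AR) was studied. "
        "Patients with allergic rhinitis (AR) were excluded."
    )
    print(expansions_by_short_form(find_acronym_pairs(text)))
```

Here the short form AR collects two unrelated long forms, so a downstream tool would need the surrounding context to pick the intended sense.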
24 Lexical and Semantic Sources for Biology: Domain-specific technical terms are used for expressing functional descriptions of bio-entities, relevant biological processes, and experimental techniques. Terminological repositories and dictionaries are important resources for interpreting scientific articles, and many have been developed. Ontologies have been developed for various subfields of biology; the Gene Ontology (GO) is widely used as a controlled vocabulary to describe biologically relevant aspects of gene products.
25 Lexical and Semantic Sources for Biology: Ontologies: Gene Ontology (GO): although primarily designed for annotation purposes, it can also be used as a lexical resource for indexing, as in the GoPubMed application. GOAnnotator allows extraction of text-based GO annotations for a given protein identifier (Swiss-Prot accession number). The GO annotation task in BioCreative I showed that automatic detection of GO terms is more efficient for short terms.
26 Lexical and Semantic Sources for Biology: Word level: SwissProt: biological annotation database. BioThesaurus: widely used resource combining gene and protein names from multiple sources. TerMine: developed at the National Centre for Text Mining (NaCTeM); integrates an automatic term recognition approach using linguistic and statistical analysis of candidate terms.
27 Biomedical Literature Processing Applications: Provide access to information in scientific articles at various levels of granularity. Building blocks for biomedical text processing can be grouped with respect to the BioCreative tasks: Document retrieval: core of the interaction article subtask, which selects articles about protein-protein interactions. Entity mention: identification of mentions of biological entities. Entity normalization: linking biological entities (e.g., genes, proteins) to biological resources (e.g., SwissProt, Entrez Gene).
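Entity normalization can be pictured as a lookup from surface forms to database identifiers. The sketch below is a minimal dictionary-based version. The identifiers (`DEMO:0001`, `DEMO:0002`) are placeholders invented for illustration, not real SwissProt or Entrez Gene accessions, and a real normalizer must also handle ambiguity and species.

```python
import re
from typing import Optional

# Hypothetical lexicon: canonicalized name variants -> placeholder identifier.
LEXICON = {
    "vrk1": "DEMO:0001",
    "vacciniarelatedkinase1": "DEMO:0001",
    "cjun": "DEMO:0002",
    "jun": "DEMO:0002",
}


def canonical(mention: str) -> str:
    """Lower-case the mention and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", mention.lower())


def normalize(mention: str) -> Optional[str]:
    """Return the placeholder identifier for a mention, or None if unknown."""
    return LEXICON.get(canonical(mention))


if __name__ == "__main__":
    for mention in ["VRK1", "vaccinia-related kinase 1", "c-Jun", "unknown gene"]:
        print(mention, "->", normalize(mention))
```

Canonicalizing before lookup lets spelling variants such as "c-Jun" and "cJun" share one entry, at the cost of occasionally merging names that differ only in punctuation.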
28 BioMed. Lit. Proc. Apps: Document Retrieval: Requires the ability to process and index massive volumes of data (e.g., the entire MEDLINE collection); must be robust and efficient with respect to space and time. Keyword-based retrieval: look for keywords that characterize a collection of papers, based on keyword frequency; this is the basis of neighbor searches in MEDLINE (the predecessor of eTBLAST), still the most heavily used system. Statistical analysis of word occurrences: many current literature mining systems rely on it; the statistics are calculated over the whole PubMed database, resulting in weighted associations between biological entities.
29 BioMed. Lit. Proc. Apps: Document Retrieval: Statistical analysis of word occurrences: the underlying assumption is that if two biological entities frequently co-occur, they should have some biological relationship. This can provide high recall, but human interpretation is a challenge because the approach lacks semantic information on the type of biological association. CoPub Mapper system: provides online access to ranked co-occurrence associations extracted from PubMed (between genes and biological terms). PubGene system: generates a graphical protein interaction network based on protein-protein literature co-occurrences.
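The co-occurrence assumption can be illustrated by counting how often two entity names appear in the same abstract. This is a minimal sketch with invented abstracts and plain substring matching; it is not the scoring used by CoPub Mapper or PubGene, which weight and rank associations.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts: list, entities: list) -> Counter:
    """Count, per entity pair, the abstracts in which both names occur."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Case-insensitive substring match; crude, and prone to false hits.
        present = sorted(e for e in entities if e.lower() in lowered)
        for first, second in combinations(present, 2):
            counts[(first, second)] += 1
    return counts


if __name__ == "__main__":
    # Invented abstracts for illustration.
    abstracts = [
        "VRK1 phosphorylates c-jun.",
        "c-jun and VRK1 were both detected.",
        "p53 was measured alone.",
    ]
    print(cooccurrence_counts(abstracts, ["VRK1", "c-jun", "p53"]))
```

The count says that VRK1 and c-jun are mentioned together, but nothing about how they are related, which is the missing semantic information the slide refers to.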
30 BioMed. Lit. Proc. Apps: Document Retrieval: Stemming: converts words into standardized forms (stems); an essential component of IR systems and search engines. One common shortcoming: two semantically different words can be collapsed to a common stem. Used by systems such as eTBLAST to quantify the similarity between documents. CoPub system: detects over-represented terms from multiple abstract collections. eTBLAST: ranks retrieved PubMed records given an input article.
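The shortcoming named on this slide is easy to reproduce with a crude suffix stripper. Note that this is not the Porter stemmer: the suffix list and the minimum stem length below are assumptions picked only to keep the example short.

```python
# Invented suffix list, checked in order.
SUFFIXES = ("ization", "ation", "ity", "ing", "ed", "es", "s", "e")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for word in ["binding", "binds", "organization", "organ", "university", "universe"]:
        print(word, "->", crude_stem(word))
```

"binding" and "binds" are usefully merged into one stem, but "organization" and "organ", like "university" and "universe", also end up sharing a stem even though their meanings differ.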
31 BioMed. Lit. Proc. Apps: Document Retrieval: Clustering algorithms: used to group genes according to their expression profiles in microarray experiments; clustering based on document similarity calculation is used by PubClust and McSyBi. [Figure: list of systems for clustering and similarity ranking]
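Document similarity and a very simple clustering step can be sketched as follows. The code uses cosine similarity over raw term counts and a greedy single-pass grouping with an arbitrary threshold; both are assumptions chosen for brevity and are not the algorithms behind PubClust or McSyBi. The documents are invented.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term counts from whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def greedy_clusters(texts: list, threshold: float = 0.5) -> list:
    """Assign each text to the first cluster whose seed is similar enough."""
    vectors = [tf_vector(text) for text in texts]
    clusters = []  # each cluster is a list of indices into `texts`
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])  # no similar cluster: start a new one
    return clusters


if __name__ == "__main__":
    docs = [
        "kinase phosphorylates substrate protein",
        "kinase phosphorylates target protein",
        "microarray expression profile clustering",
    ]
    print(greedy_clusters(docs))
```

The same `cosine` function could also rank records against an input article, which is the similarity-ranking use case listed alongside clustering.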
32 Biomedical Literature Processing Applications: Gene Mention & Gene Normalization: Biologists search the annotation databases using gene/protein names or symbols as queries. These names have been manually extracted from the literature, which is too time-consuming and unable to cover all synonyms or naming variants used by biologists. Automatic detection of protein and gene mentions: improves the coverage of annotation databases; enables semantically refined literature search; constitutes a crucial initial step for other text mining systems; was the focus of the BioCreative gene mention task (performance of 90% F-measure, with training data of 15K sentences and 5K test sentences).
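For readers unfamiliar with the F-measure quoted above, the sketch below computes precision, recall, and the balanced F-measure (F1) over predicted mention spans. Exact span matching and the example spans are assumptions made for illustration; the official task scoring may treat boundaries differently.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of mention spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented (start, end) character offsets of gene mentions.
    gold_spans = {(0, 4), (10, 15), (20, 24)}
    predicted_spans = {(0, 4), (10, 15), (30, 33)}
    print(precision_recall_f1(gold_spans, predicted_spans))
```

F1 is the harmonic mean of precision and recall, so a tagger cannot score well by maximizing only one of the two.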
33 Biomedical Literature Processing Applications: Gene Mention & Gene Normalization: Most current bio-entity recognition systems (e.g., GAPSCORE, ABGENE) can label text for protein or gene mentions; other systems such as ABNER also identify cell lines or cell types. Chemical compound mentions are another set of biological entities of interest: Oscar, an open-source system for chemical entity recognition, integrates a dictionary of compound names and uses regular expressions, heuristics, and certain word combinations to find chemical names in text.
34 Biomedical Literature Processing Applications: Gene Mention & Gene Normalization: Mentions of species and taxonomic names are important for the emerging field of biodiversity and are a crucial step in linking gene mentions to the corresponding source organism. Detecting bio-entity mentions alone is often not enough to retrieve informative sentences: the BioIE system detects (for a given query keyword) only sentences related to protein families, functions, etc.; other applications such as iHOP, given a gene or protein, map it to its corresponding database identifier and retrieve related sentences with definition information, etc.
35 Biomedical Literature Processing Applications: Gene Mention & Gene Normalization: Detecting bio-entity mentions alone is often not enough to retrieve informative sentences. The EBIMed and FACTA systems present, for a given query protein, a summary table of co-occurring concepts based on PubMed abstracts. FABLE retrieves co-occurring gene and protein mentions for a query keyword; results can be downloaded in XML or Excel format. For searching functional information on gene products, search with protein sequences is possible through the METIS and MedBlast systems: the query sequence is linked to the corresponding database record, and the associated literature is retrieved afterwards.
36 Beyond BioCreative: Advanced Applications: iHOP and InfoPubMed allow retrieval of protein interaction sentences from PubMed. Chilibot finds supporting relationship evidence between two predefined entities of interest (genes, proteins, keywords). MutationFinder extracts amino-acid mutation mentions from large text collections. MarkerInfoFinder detects information related to sequence variants of human genes.
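Mutation mention extraction can be approximated, for the simplest notations, with a regular expression. The sketch below recognizes only point mutations written as wild-type residue, position, mutant residue in one-letter (A123T) or three-letter (Gly45Ser) form. The pattern is an assumption written for illustration and is far less complete than the rules of a dedicated tool such as MutationFinder.

```python
import re

ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Either A123T-style or Gly45Ser-style, delimited by word boundaries.
MUTATION = re.compile(
    rf"\b(?:([{ONE_LETTER}])(\d+)([{ONE_LETTER}])"
    rf"|({THREE_LETTER})(\d+)({THREE_LETTER}))\b"
)


def find_mutations(text: str) -> list:
    """Return (wild type, position, mutant) tuples in order of appearance."""
    hits = []
    for match in MUTATION.finditer(text):
        if match.group(1):
            wild, position, mutant = match.group(1, 2, 3)
        else:
            wild, position, mutant = match.group(4, 5, 6)
        hits.append((wild, int(position), mutant))
    return hits


if __name__ == "__main__":
    # Invented sentence for illustration.
    print(find_mutations("The A123T and Gly45Ser substitutions reduced activity."))
```

A pattern this simple also matches strings that merely look like mutations, for example the cell-line name T47D, so practical extractors add context checks on top of the surface pattern.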
37 Beyond BioCreative: Advanced Applications: PepBank database (of peptide sequences): a text-mining system was used to automatically detect and extract peptide sequences from abstracts and full-text papers. Phospho.ELM database: integrated a text-mining system to detect S/T/Y phosphorylation sites from the literature. MeInfoText & PubMeth: use text mining to provide detailed information on gene methylation and its association with cancer. EpiLoc system: a text-based subcellular location prediction tool (complementing alternative sequence-based localization algorithms).
38 Summary: Biological Text Mining Applications from the Biology User Perspective: protein relations; function annotation & localization relations; gene group & list analysis
39 Summary: Biological Text Mining Applications from the Biology User Perspective: acronym and term extraction; gene-disease association
40 Summary: Biological Text Mining Applications from the Biology User Perspective: gene-disease association; bio-entity tagging; text retrieval, classification, clustering, similarity ranking
41 Summary: Biological Text Mining Applications from the Biology User Perspective: protein sequence; gene group & list analysis
42 References: Main references: M. Krallinger, A. Valencia, L. Hirschman. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol. 2008; 9:S8. Z. Lu. PubMed and beyond: a survey of web tools for searching biomedical literature. Database. For original images and references to the mentioned tools, please either conduct an online search with their names or refer to the original articles above.
43 Questions? Please let us know in case of any questions/issues! Further info: {scetinta, 43
More informationAustralian Journal of Basic and Applied Sciences. Named Entity Recognition from Biomedical Abstracts An Information Extraction Task
ISSN:1991-8178 Australian Journal of Basic and Applied Sciences Journal home page: www.ajbasweb.com Named Entity Recognition from Biomedical Abstracts An Information Extraction Task 1 N. Kanya and 2 Dr.
More informationFinal Project Discussion. Adam Meyers Montclair State University
Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...
More informationISSN: [Sugumar * et al., 7(4): April, 2018] Impact Factor: 5.164
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IMPROVED PERFORMANCE OF STEMMING USING ENHANCED PORTER STEMMER ALGORITHM FOR INFORMATION RETRIEVAL Ramalingam Sugumar & 2 M.Rama
More informationHumboldt-University of Berlin
Humboldt-University of Berlin Exploiting Link Structure to Discover Meaningful Associations between Controlled Vocabulary Terms exposé of diploma thesis of Andrej Masula 13th October 2008 supervisor: Louiqa
More informationTurning Text into Insight: Text Mining in the Life Sciences WHITEPAPER
Turning Text into Insight: Text Mining in the Life Sciences WHITEPAPER According to The STM Report (2015), 2.5 million peer-reviewed articles are published in scholarly journals each year. 1 PubMed contains
More informationSemantic Knowledge Discovery OntoChem IT Solutions
Semantic Knowledge Discovery OntoChem IT Solutions OntoChem IT Solutions GmbH Blücherstr. 24 06120 Halle (Saale) Germany Tel. +49 345 4780472 Fax: +49 345 4780471 mail: info(at)ontochem.com Get the Gold!
More informationPrecise Medication Extraction using Agile Text Mining
Precise Medication Extraction using Agile Text Mining Chaitanya Shivade *, James Cormack, David Milward * The Ohio State University, Columbus, Ohio, USA Linguamatics Ltd, Cambridge, UK shivade@cse.ohio-state.edu,
More informationRevealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization
Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Katsuya Masuda *, Makoto Tanji **, and Hideki Mima *** Abstract This study proposes a framework to access to the
More informationSciVerse Scopus. 1. Scopus introduction and content coverage. 2. Scopus in comparison with Web of Science. 3. Basic functionalities of Scopus
Prepared by: Jawad Sayadi Account Manager, United Kingdom Elsevier BV Radarweg 29 1043 NX Amsterdam The Netherlands J.Sayadi@elsevier.com SciVerse Scopus SciVerse Scopus 1. Scopus introduction and content
More informationNCBI News, November 2009
Peter Cooper, Ph.D. NCBI cooper@ncbi.nlm.nh.gov Dawn Lipshultz, M.S. NCBI lipshult@ncbi.nlm.nih.gov Featured Resource: New Discovery-oriented PubMed and NCBI Homepage The NCBI Site Guide A new and improved
More informationThe GENIA corpus Linguistic and Semantic Annotation of Biomedical Literature. Jin-Dong Kim Tsujii Laboratory, University of Tokyo
The GENIA corpus Linguistic and Semantic Annotation of Biomedical Literature Jin-Dong Kim Tsujii Laboratory, University of Tokyo Contents Ontology, Corpus and Annotation for IE Annotation and Information
More informationOn Topic Categorization of PubMed Query Results
On Topic Categorization of PubMed Query Results Andreas Kanavos 1, Christos Makris 1 and Evangelos Theodoridis 1,2 1.Computer Engineering and Informatics Department University of Patras Rio, Patras, Greece,
More informationText Mining: A Burgeoning technology for knowledge extraction
Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.
More informationQuery Difficulty Prediction for Contextual Image Retrieval
Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.
More informationDocument Retrieval using Predication Similarity
Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationAutomatically Constructing a Directory of Molecular Biology Databases
Automatically Constructing a Directory of Molecular Biology Databases Luciano Barbosa Sumit Tandon Juliana Freire School of Computing University of Utah {lbarbosa, sumitt, juliana}@cs.utah.edu Online Databases
More informationConnecting Text Mining and Pathways using the PathText Resource
Connecting Text Mining and Pathways using the PathText Resource Sætre, Kemper, Oda, Okazaki a, Matsuoka b, Kikuchi c, Kitano d, Tsuruoka, Ananiadou, Tsujii e a Computer Science, University of Tokyo, Hongo
More informationSciMiner User s Manual
SciMiner User s Manual Copyright 2008 Junguk Hur. All rights reserved. Bioinformatics Program University of Michigan Ann Arbor, MI 48109, USA Email: juhur@umich.edu Homepage: http://jdrf.neurology.med.umich.edu/sciminer/
More informationPubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search
Bioinformatics (2006), accepted. PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search Jing Ding Department of Electrical and Computer Engineering, Iowa State University, Ames, IA
More informationHow to Create a Reference Answer Set
How to Create a Reference Answer Set Find references quickly and easily In SciFinder, you are searching the world s largest, publicly available reference database for chemistry and related sciences as
More informationData and Information Integration: Information Extraction
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Data and Information Integration: Information Extraction Varnica Verma 1 1 (Department of Computer Science Engineering, Guru Nanak
More informationSemantic Scholar. ICSTI Towards a More Efficient Review of Research Literature 11 September
Semantic Scholar ICSTI Towards a More Efficient Review of Research Literature 11 September 2018 Allen Institute for Artificial Intelligence (https://allenai.org/) Non-profit Research Institute in Seattle,
More informationA Semantic Model for Concept Based Clustering
A Semantic Model for Concept Based Clustering S.Saranya 1, S.Logeswari 2 PG Scholar, Dept. of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India 1 Associate Professor, Dept. of
More informationAlternative Tools for Mining The Biomedical Literature
Yale University From the SelectedWorks of Rolando Garcia-Milian May 14, 2014 Alternative Tools for Mining The Biomedical Literature Rolando Garcia-Milian, Yale University Available at: https://works.bepress.com/rolando_garciamilian/1/
More informationGenescene: Biomedical Text and Data Mining
Claremont Colleges Scholarship @ Claremont CGU Faculty Publications and Research CGU Faculty Scholarship 5-1-2003 Genescene: Biomedical Text and Data Mining Gondy Leroy Claremont Graduate University Hsinchun
More informationMedLingMap: A growing resource mapping the Bio-Medical NLP field
MedLingMap: A growing resource mapping the Bio-Medical NLP field Marie Meteer, Bensiin Borukhov, Michael Crivaro, Michael Shafir, Attapol Thamrongrattanarit {mmeteer, bborukhov, mcrivaro, mshafir, tet}@brandeis.edu
More informationScholarly Big Data: Leverage for Science
Scholarly Big Data: Leverage for Science C. Lee Giles The Pennsylvania State University University Park, PA, USA giles@ist.psu.edu http://clgiles.ist.psu.edu Funded in part by NSF, Allen Institute for
More informationCRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools
CRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools Wahed Hemati, Alexander Mehler, and Tolga Uslu Text Technology Lab, Goethe Universitt
More informationCross Language Information Retrieval for Biomedical Literature
Cross Language Information Retrieval for Biomedical Literature Martijn Schuemie Erasmus MC m.schuemie@erasmusmc.nl Dolf Trieschnigg University of Twente trieschn@ewi.utwente.nl Wessel Kraaij TNO kraaijw@acm.org
More informationAn Algebra for Protein Structure Data
An Algebra for Protein Structure Data Yanchao Wang, and Rajshekhar Sunderraman Abstract This paper presents an algebraic approach to optimize queries in domain-specific database management system for protein
More informationUniversity of Eastern Finland Library Heikki Laitinen UEF // University of Eastern Finland
Information Retrieval in Health-Related Natural Sciences Applied physics, biomedicine, environmental sciences, medical engineering and computing, nutrition, pharmacy, toxicology University of Eastern Finland
More informationQuick Reference Guide. Biomedical Answers
Quick Reference Guide Biomedical Answers www.embase.com .... 3 - Homepage... 4.... 5 - Search Forms... 6 - Refine... 8 - Using Emtree... 9 3.... - Reviewing Records... - Preview Abstracts and Index Terms...
More informationBio wikis. Paolo Romano Bioinformatics, National Cancer Research Institute, Genova
Bio wikis Paolo Romano (paolo.romano@istge.it) Bioinformatics, National Cancer Research Institute, Genova Outline o Wiki systems: aims and technologies o Working with wikis: practical issues for setting
More informationefip online Help Document
efip online Help Document University of Delaware Computer and Information Sciences & Center for Bioinformatics and Computational Biology Newark, DE, USA December 2013 K K S I K K Table of Contents INTRODUCTION...
More informationYou need to start your research and most people just start typing words into Google, but that s not the best way to start.
Academic Research Using Google Worksheet This worksheet is designed to have you examine using various Google search products for research. The exercise is not extensive but introduces you to things that
More informationChapter 6: ISAR Systems: Functions and Design
Chapter 6: ISAR Systems: Functions and Design Information Search And Retrieval is a system which allow end users to communicate with the system. Every one will use the ISAR system in a different way. Each
More informationHistorical Text Mining:
Historical Text Mining Historical Text Mining, and Historical Text Mining: Challenges and Opportunities Dr. Robert Sanderson Dept. of Computer Science University of Liverpool azaroth@liv.ac.uk http://www.csc.liv.ac.uk/~azaroth/
More informationGeneral Arc of a Search. 1. Define information need, get vocabulary. 2. Choose information source
General Arc of a Search 1. Define information need, get vocabulary 2. Choose information source 3. Decide on your search strategy (keyword/author, citation analysis, related item search, etc.) 4. Construct
More informationMedlineDB: An integrated biological text mining framework
MedlineDB: An integrated biological text mining framework Bryan Cardillo and Lyle Ungar University of Pennsylvania Philadelphia PA 19104 Abstract. MedlineDB is a schema and framework for integrating Medline
More informationGenomic Information Retrieval through Selective Extraction and Tagging by the ASU-BioAI Group
Genomic Information Retrieval through Selective Extraction and Tagging by the ASU-BioAI Group Lian Yu, Syed Toufeeq Ahmed, Graciela Gonzalez, Brandon Logsdon, Mutsumi Nakamura, Shawn Nikkila, Kalpesh Shah,
More informationMeSH : A Thesaurus for PubMed
Resources and tools for bibliographic research MeSH : A Thesaurus for PubMed What is MeSH? Who uses MeSH? Why use MeSH? Searching by using the MeSH Database What is MeSH? http://www.ncbi.nlm.nih.gov/mesh
More informationChallenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio
Case Study Use Case: Recruiting Segment: Recruiting Products: Rosette Challenge CareerBuilder, the global leader in human capital solutions, operates the largest job board in the U.S. and has an extensive
More informationAnalyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM)
Analyzer of Bio-resource Citations World Data Center of Microorganisms(WDCM) http://abc.wdcm.org/ Outlines Introduction of ABC Homepage and function of ABC Text mining for microorganism : classification,
More information