Study of an automatic indexing tool: NLM Medical Text Indexer Rita Pinal Fuentes and Eva Lorenzo Iglesias

Abstract. In this paper we study the architecture of the MTI (Medical Text Indexer) and evaluate which parameterization options of this tool yield the best MESH terms. MESH is a controlled vocabulary used to index databases such as MEDLINE. We show that semantic relations are more important than the syntactic structure of words in a document.

Keywords: Scientific document indexing, MTI, MESH, MEDLINE, PubMed.

1 Introduction

The past decade has seen a growth in the amount of experimental and computational biomedical data, accompanied by an accelerated increase in the number of biomedical publications [8]. This growth has driven the emergence of institutions like the NLM (National Library of Medicine) [15], which help give health professionals access to the information necessary for research, health care and education.

MEDLINE is a bibliographic database of the NLM. Its scope includes topics as diverse as microbiology, delivery of health care, nutrition, pharmacology and environmental health. MEDLINE is the most comprehensive source of biomedical bibliographic information in existence. It contains over 18 million journal citations and abstracts for biomedical literature from around the world, from 1948 to the present, and the number continues to grow steeply, with over … citations added in 2008 alone [16].

In order to index, search and catalog these citations, the NLM employs a vocabulary of controlled terminology, the Medical Subject Headings (MESH) [18]. MESH terms are used as keywords for archiving, storing and later locating the MEDLINE documents that correspond to certain keywords. Assigning MESH terms to new citations is labor-intensive, and the development and evaluation of automated approaches to assist with this task have been the subject of intensive research.
Much of this research [19] has been conducted under the auspices of the NLM's Indexing Initiative, which has produced the Medical Text Indexer (MTI) system. The MTI automatically generates ordered lists of MESH suggestions and is currently used by human curators at the NLM as an assistive tool.

The goal of this work is to examine the methods used by the MTI tool to annotate documents for inclusion in the public database MEDLINE. MTI is composed of modules that implement different annotation techniques, with particular emphasis on the recognition of MESH terms. Therefore, in this paper we study the architecture of the MTI and evaluate the parameterization options of this tool for obtaining the best MESH candidates. Using controlled vocabularies such as MESH increases precision and recall in searching by identifying equivalent terms.

2 Indexing

Indexing is the task of assigning to a document a limited number of terms denoting concepts that are substantively discussed in the document. This type of indexing is useful for retrieval purposes, but it also has a strong semantic descriptive value, in that the set of terms chosen to describe a document serves as a synopsis of the subject matter discussed in it. Indexing is a crucial step in any information retrieval system.

In this paper, we focus on the particular controlled indexing task of assigning indexing terms from the MESH thesaurus to the biomedical texts referenced in MEDLINE, also known as citations. For this database it has been observed that adding the MESH terms to the text improves retrieval performance [12].

Human indexing is an expensive and intensive activity that consists of reviewing the complete text of an article and assigning MESH terms in the following way: (1) select main headings (or descriptors) to represent concepts that are substantively discussed in the article (approximately twelve main headings are selected, but the number may vary depending on the article's length and complexity); (2) attach the appropriate subheadings to the main headings selected, where subheadings (also known as qualifiers) afford a convenient means of grouping together those citations which are concerned with a particular aspect of a subject; (3) mark the most substantively discussed concepts as major (marked with *); and (4) make sure appropriate checktags are selected, all the while (5) complying with the instructions detailed in the indexing manual.

As more and more documents become available in electronic form, and as more organizations develop digital libraries for their collections, the exploration of automated indexing techniques becomes both feasible and necessary to continue to provide adequate access to information.
2.1 Medical Text Indexer

The Indexing Initiative System (IIS) at the NLM was begun several years ago to address the indexing problem by exploring semi-automated indexing methods. Its ultimate goals are to improve access to bibliographic information and to provide a number of methods for automatically computing MESH terms that could be added to a document prior to standard MESH indexing. Some information retrieval experiments have shown that MTI's indexing produces retrieval results that are almost as good as those produced by human indexing [13]. Several promising techniques were studied and formed into a prototype indexing system which eventually became the Medical Text Indexer (MTI).

The MTI system consists of software for applying alternative methods of discovering MESH headings and then combining them into an ordered list of recommended indexing terms, as shown in Fig. 1. The top portion of the diagram consists of three paths, or methods, for creating a list of recommended indexing terms: MetaMap Indexing, Trigram Phrase Matching, and PubMed Related Citations. The two left paths actually compute Unified Medical Language System (UMLS) Metathesaurus concepts [6], which are passed to the Restrict to MESH method. Results from each path are weighted and combined using the Clustering method. The system is highly parameterized, not only by path weights but also by several internal parameters specific to the Restrict to MESH and Clustering methods. Next we briefly explain each component.

Fig. 1. This figure (partially obtained from [13]) shows the MTI structure and its principal components: MetaMap, Trigrams, PubMed Related Citations, Restrict to MESH, and Clustering and Ranking.

MetaMap Indexing. MMI [1] is a method of discovering UMLS concepts [12]. It consists of applying the MetaMap program [3] to a body of text and then ordering the resulting concepts using a ranking function. The steps are:

1. Parsing. Arbitrary text is parsed into simple noun phrases using the SPECIALIST parser [2].
2. Variant Generation. For each phrase, variants are generated, where a variant consists of one or more consecutive phrase words together with all of their acronyms, abbreviations, synonyms, inflectional variants and meaningful combinations of these.
3. Candidate Retrieval. The candidate set of all UMLS strings containing at least one of the variants is retrieved.
4. Candidate Evaluation. Each UMLS candidate is evaluated against the input text by first computing a mapping from the phrase words to the candidate's words and then calculating the strength of the mapping using a linguistically principled evaluation function consisting of a weighted average of four metrics: centrality (involvement of the head of the input phrase), variation, coverage and cohesiveness. The candidates are ordered according to mapping strength.
5. Mapping Construction. Complete mappings are constructed by combining candidates involved in disjoint parts of the phrase, and the strength of the complete mappings is computed just as for candidate mappings. The highest-scoring complete mappings represent MetaMap's best interpretation of the original phrase.

Trigram Phrase Matching. This is a method of identifying phrases that have a high probability of being synonyms. It represents each phrase by the set of character trigrams extracted from that phrase. The character trigrams are used as key terms in a representation of the phrase, much as words are used as key terms to represent a document. The similarity of phrases is then computed using the vector cosine similarity measure according to the following algorithm:

1. Break the title and abstract of a document up into all possible phrases consisting of one to six contiguous words without internal punctuation.
2. For each phrase produced in step 1, compute the similarity score against all phrases in UMLS and record the phrase that obtains the highest score.
3. For each word in the title and abstract, record the phrase containing that word which receives the highest overall score against the UMLS, and record also the UMLS phrase that produced that highest score.
4. For each phrase pair obtained in step 3, where one element is a phrase in the document and the other is a phrase in UMLS, count how many times the pair appears in different places in the document and return the pair, its score, and the count.

Like MMI, the Trigram Phrase Matching algorithm produces UMLS concepts which are subsequently restricted to MESH headings by the Restrict to MESH method.

Restrict to MESH. This method is based on the observation that the representation of meaning in the UMLS is organized according to the principle of semantic locality [11], in which several means of representing relationships between concepts produce a cluster of semantically related terms.
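Returning to the Trigram Phrase Matching steps listed above, the following is a minimal sketch of steps 1 and 2, not MTI's actual implementation. The function names are hypothetical, the vocabulary is a tiny stand-in for the UMLS phrase list, raw trigram counts are used as vector components (the source does not say how trigrams are weighted), and punctuation handling is omitted.

```python
import math
from collections import Counter


def char_trigrams(phrase: str) -> Counter:
    """Count the overlapping character trigrams of a lower-cased phrase."""
    text = " ".join(phrase.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Vector cosine similarity between two trigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_phrases(text: str, max_words: int = 6):
    """Step 1: yield every run of 1..max_words contiguous words."""
    words = text.split()
    for start in range(len(words)):
        for length in range(1, max_words + 1):
            if start + length <= len(words):
                yield " ".join(words[start:start + length])


def best_match(phrase: str, vocabulary: list[str]) -> tuple[float, str]:
    """Step 2: return (score, vocabulary phrase) with the highest similarity."""
    phrase_vec = char_trigrams(phrase)
    return max((cosine(phrase_vec, char_trigrams(v)), v) for v in vocabulary)


if __name__ == "__main__":
    # Hypothetical stand-in for the UMLS phrase list.
    vocab = ["Blood Flow Velocity", "Regression Analysis", "Blood Flow"]
    for p in candidate_phrases("measured blood flows", max_words=2):
        print(p, "->", best_match(p, vocab))
```

Because similarity is computed on character trigrams rather than whole words, small surface differences such as "flows" versus "flow" still produce a high score, which is what makes the method useful for spotting likely synonyms.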
In the Indexing Initiative, three of these phenomena are used to find the MESH terms most closely related to any given UMLS concept: synonyms, interconcept relationships, and categorization [8]. The overall strategy for restricting a given UMLS term to the semantically closest MESH term involves the following four steps:

1. Choose a MESH term that is a synonym of the source concept.
2. Choose an associated expression which is a translation of the source concept.
3. Select MESH terms from concepts hierarchically related to the source concept.
4. Base the selection on the non-hierarchically related concepts of the source concept.

The algorithm stops at the first step that succeeds.

PubMed Related Citations (RC). This method [20] directly computes a ranked list of MESH headings based on a given title and abstract. The neighbours of a pending document (its related citations) are those documents in the database that are most similar to it. The similarity between documents is measured using the words in the title and abstract. Stopwords are eliminated from processing and a limited amount of stemming is done, but no thesaurus is used. Having obtained the set of terms that represent each document, the next step is to assign global and local weights to each term. The global weight is used in weighting the term throughout the database, and it is greater for less frequent terms. The local weight is log(n+1), where n is the number of times the term occurs in a document. The weight of the term is the product of the two weights. The similarity of two documents is computed from these term weights and is an instance of the vector cosine measure. Recommended index terms are extracted from the MESH fields of the documents most similar to a given document.

Clustering and Ranking. This task [12] produces a single list of recommended MESH terms by combining the recommendations of the methods described above. It computes a rank score for each suggested indexing term using term weights, co-occurrence information, and estimates of the importance of the term based on where and how the term arose. The result of the clustering process constitutes the output of the MTI system.

The Clustering and Ranking task assigns a weight to the confidence, or strength of belief, in each assignment and ranks the suggested headings accordingly. A number of factors play a role in that confidence:

- The path: a weight can be assigned to the overall method of finding the MESH term (PathWeight).
- The goodness of the match: how much confidence there is in the way the method found the MESH term. This depends on the method used to find the heading. Each time a MESH heading is suggested, a weighting can be given to that suggestion using both a MapScore and a NavScore. The MapScore reflects the confidence in the mapping to a UMLS term; the NavScore reflects the confidence in navigating from a UMLS term to a MESH heading. The possibilities are:
  - A phrase identified in a text is an exact match to a MESH term. Equivalently, it might be a match to a UMLS term that is a synonym of a MESH term.
  - Of lesser significance is an exact match to a UMLS term that is then mapped to a MESH heading using the Restrict to MESH method.
  - Another possibility is that the phrase is an inexact, or approximate, match to a UMLS term which is either a synonym of a MESH heading or mapped to MESH.
- The location in the text of the noun phrase that led to the suggestion: a heading suggested by a phrase occurring in the title should be given more weight, because concepts mentioned in the title of an article are probably more important than other concepts mentioned in the article.
- The semantic consistency: semantic consistency can be identified through the relationships that a suggested heading has with another one. These relationships might be either occurrence in the same hierarchy (as parents or siblings) or known co-occurrence of the headings in MEDLINE. The latter evidence needs to be weighted according to a normalized frequency of the co-occurrence, which is explained below.

The four steps involved in this clustering and ranking process are:

1. Creating the normalized frequency scores for the co-occurring concepts. MTI creates a normalized frequency database of co-occurring concepts using the UMLS Metathesaurus. Co-occurrences are concepts that occur together in the same "entries" in some information source (e.g. MEDLINE). Co-occurrence relationships may exist between similar concepts (e.g., "Atrial Fibrillation" and "Arrhythmia"), between very different concepts that nevertheless have some important connection in the field of biomedicine (e.g., "Atrial Fibrillation" and "Digoxin"), or between a primary concept and a qualifier (e.g., "Lithotripsy" and "instrumentation"). A co-occurrence relationship can exist between two concepts that have no other apparent relationship, although the frequency of such co-occurrences will be small. From MEDLINE, co-occurrence was computed for concepts that were designated as principal or main points in the same article; i.e., the co-occurrence counts do not include articles in which either or both of the concepts were present and indexed in MEDLINE but not designated as main points. (A concept is considered to be a main point if the * is attached to the main heading or any of its subheadings.) The following steps calculate the normalized frequency score for co-occurring concepts:

1.1 Summarize all of the records by combining the counts of identical pairings of CUIs (CUI is the Concept Unique Identifier of UMLS). See the example in Table 1.

Table 1. Example of clustering and ranking creating Normalized Frequency (step 1.1)
CUI 1 | CUI 2 | COF (Co-occurrence factor)
C… | C… | …
C… | C… | …
C… | C… | …
C… | C… | …
C… | C… | …

1.2 Determine the total frequency counts for each CUI1 (Table 2). This is done by summing the COF for each CUI1 and CUI2 combination, giving a total frequency count for each CUI1 and CUI2 pairing.

Table 2. Example of clustering and ranking creating Normalized Frequency (step 1.2)
CUI 1 | CUI 2 | COF (Co-occurrence factor)
C… | C… | …
C… | C… | …
C… | C… | …
C… | C… | …

1.3 Create a temporary file containing a single line for each unique CUI1 concept (Table 3). This line contains the total frequency count for that particular CUI1.

Table 3. Example of clustering and ranking creating Normalized Frequency (step 1.3)
CUI 2 | Total frequency
C… | …

1.4 Combine 1.2 and 1.3 in a file containing all of the records of 1.2 with the total frequency count from 1.3 appended to the end of each line (see Table 4).

Table 4. Example of clustering and ranking creating Normalized Frequency (step 1.4)
CUI 1 | CUI 2 | COF (Co-occurrence factor) | Frequency
C… | C… | … | …
C… | C… | … | …
C… | C… | … | …
C… | C… | … | …

1.5 Normalize the frequency counts of each record by dividing the individual record's frequency count (field 3) by the CUI1's total frequency count (field 4). See the example in Table 5.

Table 5. Example of clustering and ranking creating Normalized Frequency (step 1.5)
CUI 1 | CUI 2 | Frequency normalized
C… | C… | …/1190 = …
C… | C… | …/1190 = …
C… | C… | …/1190 = …
C… | C… | …/1190 = …

2. Load and summarize the individual path results, calculating the term weights. The TermWeight for each MESH heading is the summation of all entries for a MESH term (identified by MH) from each of the paths used (MetaMap and PubMed Related Citations). The TermWeight for each MH, regardless of path, is calculated using Eq. 1, where i indexes the single occurrences of the suggestion of one MESH heading:

\[ \mathrm{TermWeight} = TW = \sum_{i=1}^{n} \left( \mathrm{PathWeight}_i \cdot \mathrm{MapScore}_i \cdot \mathrm{NavScore}_i \right) \tag{1} \]

The following steps are done for each MESH heading to calculate the TermWeight:

2.1 The weight of each item is provided by the individual paths along with the navigational string information. The example in Table 6 shows the items returned for the concept Blood Flow Velocity via both the MMI and RC pathways.

Table 6. Example of clustering phase calculating term weights (step 2.1)
Path | CUI | Individual MapScore | Navigational string | Concept Name
MMI | C… | … | G/P | Blood Flow Velocity
MMI | C… | … | O | Blood Flow Velocity
RC | C… | … | NIM | Blood Flow Velocity
RC | C… | … | NIM | Blood Flow Velocity

In the first row of Table 6 we have an item coming from the MMI pathway with a MapScore of 118 out of a possible perfect score of 1000 and a navigational string of G/P (Parent/Broader; see Section 2.3, Parameter tuning). In the third row we have an item coming from the RC pathway with a MapScore of … out of a possible perfect score of 255 and a navigational string of NIM (MESH Heading). The perfect score is 1000 if the path is MMI or Trigram, and 255 if it is RC.

2.2 The MMI items are loaded into the program before the RC terms. To calculate the PathWeight used for each item, the individual path weight (assigned by the user) is divided by the path-scoring factor (1000 for MMI or Trigram, and 255 for RC) (see Table 7). The path-scoring factor is used to equalize the different scoring methods.

Table 7. Example of clustering phase calculating term weights (step 2.2)
Path | User value | Path-scoring factor | PathWeight
MMI | … | 1000 | …/1000 = …
RC | … | 255 | …/255 = …

2.3 Calculate the individual item weights as (PathWeight * MapScore * NavScore), where the NavScore depends on the navigational string (see Table 8).

Table 8. Example of clustering phase calculating term weights (step 2.3)
Path | PathWeight | Individual MapScore | Navigational string | Total
MMI | … | … | G/P (0.90) | 118*0.0070*0.90 = …
MMI | … | … | O (0.50) | 118*0.0070*0.50 = …
RC | … | … | NIM (0.80) | …*0.0078*0.80 = …
RC | … | … | NIM (0.80) | …*0.0078*0.80 = …
Note: navigational strings are explained in Section 2.3, Parameter tuning.

2.4 Sum all of the individual item weights to get the final TermWeight. For the Blood Flow Velocity example, the TermWeight obtained is …. The five different path entries are summarized into a single term containing the concept name, CUI, score (which is zero at this point and will be calculated in the clustering step), and the calculated TermWeight (see Table 9).
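Steps 2.2 to 2.4 can be sketched in a few lines of code. This is an illustration under stated assumptions rather than MTI's implementation: the names `Suggestion`, `path_weight` and `term_weight` are hypothetical, the user path weights 7 (MMI) and 2 (RC) are inferred from the PathWeights 0.0070 and 0.0078 that appear in Table 8, and the RC MapScores are invented because the example values did not survive in the text.

```python
from dataclasses import dataclass

# Path-scoring factors stated in step 2.2.
PATH_SCORING_FACTOR = {"MMI": 1000, "TRIGRAM": 1000, "RC": 255}

# NavScores as they appear in the Table 8 example rows.
NAV_SCORE = {"G/P": 0.90, "O": 0.50, "NIM": 0.80}


@dataclass
class Suggestion:
    path: str          # "MMI", "TRIGRAM" or "RC"
    map_score: float   # raw score reported by the path
    nav_string: str    # navigational string, e.g. "G/P"


def path_weight(path: str, user_value: float) -> float:
    """Step 2.2: divide the user-assigned weight by the path-scoring factor."""
    return user_value / PATH_SCORING_FACTOR[path]


def term_weight(suggestions: list[Suggestion], user_weights: dict[str, float]) -> float:
    """Steps 2.3 and 2.4 (Eq. 1): sum PathWeight * MapScore * NavScore over items."""
    return sum(
        path_weight(s.path, user_weights[s.path]) * s.map_score * NAV_SCORE[s.nav_string]
        for s in suggestions
    )


if __name__ == "__main__":
    items = [
        Suggestion("MMI", 118, "G/P"),
        Suggestion("MMI", 118, "O"),
        Suggestion("RC", 100, "NIM"),  # hypothetical MapScore
        Suggestion("RC", 50, "NIM"),   # hypothetical MapScore
    ]
    # Assumed user weights, inferred from the PathWeights shown in Table 8.
    print(term_weight(items, {"MMI": 7, "RC": 2}))
```

Dividing by the path-scoring factor puts MapScores from different paths on a comparable scale before the user weights express how much each path is trusted.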

Table 9. Example of clustering phase calculating term weights (step 2.4)
Concept name | CUI | Score | TermWeight
Blood Flow Velocity | C… | … | …

2.5 The summarized list for all processed items is stored in a file called mt_table, as follows:

Table 10. Example of clustering phase calculating term weights (step 2.5)
Entry | Concept name | CUI | Score | TermWeight
mt_table[0] | DNA-Binding Proteins | C… | … | …
mt_table[1] | Transcription Factors | C… | … | …
mt_table[2] | SEF1 protein | C… | … | …
mt_table[3] | Blood Circulation Time | C… | … | …
mt_table[4] | Blood Flow Velocity | C… | … | …
…
mt_table[88] | Regression Analysis | C… | … | …

3. Clustering the results, determining which of them are related. In the clustering phase every item in the summarized, term-weighted list is checked against the other items in the list to find those that co-occur with it or are related to it via the MESH tree structure. The results of the clustering process are divided into co-occurring terms (COT) and MESH tree relationship terms. The MESH tree relationships are in turn divided into Parent, Child, or Sibling (PAR/CHD/SIB), called treerel, and Broader, Narrower, or Other (RB/RN/RO), called othrel.

4. Ranking the results, using the information obtained in 1 and 2 to compute the rank of each item. This is the final stage, in which a final RankScore is calculated for each item based on the TermWeight, the normalized frequency count, and user-specified constants for COT, REL, Title, and PathWeight. The formula for the RankScore is shown in Eq. 2:

\[ \mathrm{RankScore} = TW \cdot \left[ F \cdot \left[ 1 + \sum_{j} \left( \mathrm{COT}_j \cdot TW_j \right) + \sum_{k} \left( \mathrm{REL} \cdot TW_k \right) \right] \right] \tag{2} \]

where j ranges over the co-occurring terms, k ranges over the related terms (see Table 15) and F is the Path Factor (see Table 16).

2.2 Filtering

The MTI system has three selectable levels of filtering to help remove inappropriate recommendations before they are presented to a user or returned to a program.

1. Base filtering: The base layer of filtering is a collection of four rules that cover (1) the addition and (2) the removal of MESH headings, check tags, or subheadings based on recommended terms from the two pathways, (3) the boosting of certain MESH headings based on the recommended terms from the two pathways, and (4) the substitution of subheadings for certain MESH headings. Base filtering provides a mixed list of good and bad recommendations with a fair number of good recommendations near the top of the list.

2. Medium filtering: The MetaMap (MM) method tends to provide more general terms, and the very nature of the PubMed Related Citations (RC) method tends to provide a small number of spurious terms that are not related to the article being indexed. Medium filtering uses a sequence of ten heuristics to balance the results from the MM and RC methods so that the terms from each reinforce the other. It uses the general terms from the MM method to remove spurious RC terms: any RC term for which there is not at least one more general MM term is removed. Medium filtering then removes any more general MM term when a more specific RC term is found. The specificity of the terms is usually determined using the MESH tree hierarchy, but for longer terms it may also be determined by terms being substrings of one another. This balancing of the results from the two methods allows medium filtering to filter out the more general terms and also to reduce the number of unrelated terms. Medium filtering provides a good-sized list with mostly correct recommendations.

3. Strict filtering: Strict filtering is very simple: if a term was not recommended by both the MetaMap and PubMed Related Citations pathways, the term is removed. This filtering provides very high precision at the expense of ignoring good terms which were recommended by only one of the pathways. In the extreme case, no results are returned when the RC pathway finds no related articles. Strict filtering is not currently used in any NLM indexing environment.
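Of the three levels, strict filtering is the only one whose rule is fully stated in the text, so it is the only one sketched here. The function name `strict_filter` and its arguments are hypothetical; the sketch only assumes that the ranked list and the per-path term lists are available as plain sequences of MESH heading names.

```python
def strict_filter(ranked_terms, mm_terms, rc_terms):
    """Keep only terms recommended by both pathways, preserving rank order."""
    recommended_by_both = set(mm_terms) & set(rc_terms)
    return [term for term in ranked_terms if term in recommended_by_both]


if __name__ == "__main__":
    ranked = ["Blood Flow Velocity", "Regression Analysis", "Transcription Factors"]
    from_mm = ["Blood Flow Velocity", "Transcription Factors"]
    from_rc = ["Blood Flow Velocity", "Regression Analysis"]
    print(strict_filter(ranked, from_mm, from_rc))
    # The extreme case described above: RC finds nothing, so nothing survives.
    print(strict_filter(ranked, from_mm, []))
```

The second call illustrates why this level trades recall for precision: a single empty pathway empties the whole recommendation list.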
Base filtering and medium filtering are appropriate for most needs: base filtering produces better recall and medium filtering produces better precision. Base filtering is used to assist indexers in indexing MEDLINE, and medium filtering is used to provide fully automated indexing for collections of abstracts.

2.3 Parameter tuning

The overall RankScore can be altered by changing any of the constants (COT, REL, and PathWeight) or by changing the method by which the weight is calculated (NavScore and MapScore). Altering these values allows a number of experiments to be performed to evaluate the robustness of the weighting scheme and to establish reasonable values for the constants. Tables 11 to 19 list the parameters used in calculating the TermWeight along with their default values.

Table 11. PathWeight parameters
Abbreviation | Full Name | Notes | Default value | Range of values
MMI | MetaMap Indexing | Path Weight for MetaMap | … | …
RC | Related Citations | Path Weight for Related Citations | … | …
T | Trigram | Path Weight for Trigram | … | …

Table 12. NavScore parameters
Abbreviation | Full Name | Notes | Default value | Range of values
I | Direct Match Navigational String | Relevance scoring for a term identified as having a Direct Match to a MESH Heading. | … | …
A | ATX (Associated Expression) Navigational String | Relevance scoring for a term identified as having an Associated Expression relationship to a MESH Heading. | … | …
G/P | Parent/Broader Navigational String | Relevance scoring for a term identified as having a Parent or Broader relationship. | … | …
G/C | Child/Narrower Navigational String | Relevance scoring for a term identified as having a Child or Narrower relationship. | … | …
G/S | Sibling Navigational String | Relevance scoring for a term identified as having a Sibling relationship to the MESH Heading. | … | …
O | Other Related Navigational String | Relevance scoring for a term identified as having an Other Related relationship (not synonymous, narrower or broader) to the MESH Heading. | … | …

NavScore parameters express the level of confidence in the link between a UMLS term and a MESH term. UMLS is organized in three parts: 1) a list of word forms with their lemmas, part-of-speech and morphological information; 2) a Metathesaurus, which assigns a unique identifier (CUI) to each term and represents relationships between terms; and 3) a semantic network, which groups concepts into semantic types according to their meaning. The relationships in the Metathesaurus are either hierarchical: PAR (parent), CHD (child), RB (broader), RN (narrower); hierarchically related: SIB (sibling); or non-hierarchical, essentially associative: O (other).

Table 13. Related Citations parameters used to calculate NavScore
Abbreviation | Full Name | Notes | Default value | Range of values
IM | MESH Major Topic Navigational String | Relevance scoring for MESH major topic items returned from the Related Citations method. | … | …
NIM | MESH Heading Navigational String | Relevance scoring for normal MESH items returned from the Related Citations method. | … | …
NC | Number of citations | Number of related citations to use from PubMed (0 turns off the RC path). | … | …

Table 14. MapScore parameters
Full Name | Default value | Tunable by user
Best possible score for items returned by the MMI path (MapScore) | 1000 | No
Best possible score for items returned by the RC path (MapScore) | 255 | No
Best possible score for items returned by the Trigram path (MapScore) | 1000 | No

Table 15. RankScore parameters tunable by users
Abbreviation | Full Name | Notes | Default value | Range of values
COT | Co-occurrences factor | Relevance scoring for terms identified as co-occurring with another term. Co-occurrence is identified using the UMLS MRCOC file. | … | …
REL | Related Term Factor | Relevance scoring for terms identified as being related via the MESH tree structure. This is used during the clustering phase and figures into the overall RankScore for an item. | … | …
TF | Title Factor | This parameter has been superseded by the Emphasize Titles factor, which is a defined doubling of the score for items found in the Title field of the citation. This emphasis is applied after ranking and clustering. | Not currently used | …

For each pair of MESH terms, the frequency of co-occurrence in MEDLINE citations is recorded in the UMLS and can be used as a surrogate for the strength of the relationship. Co-occurrences are therefore an important source of knowledge that has the potential to complement the limited set of symbolic relationships, although their semantics should be characterized before they can be fully exploited.
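To make the role of the COT and REL constants concrete, Eq. 2 can be written as a small function. This is a sketch under assumptions: the function names and argument layout are hypothetical, the numeric values in the usage example are invented for illustration, and the Path Factor rule (F = 2 when an item comes from MetaMap or Trigrams and also from PubMed Related Citations, otherwise F = 1) is the one given in Table 16.

```python
def path_factor(from_text_path: bool, from_related_citations: bool) -> int:
    """Path Factor F: 2 if the item came from MetaMap/Trigram AND from RC, else 1."""
    return 2 if (from_text_path and from_related_citations) else 1


def rank_score(tw, f, cooccurring, related_tws, rel):
    """Eq. 2: TW * [F * [1 + sum_j(COT_j * TW_j) + sum_k(REL * TW_k)]].

    cooccurring -- list of (COT_j, TW_j) pairs for co-occurring terms
    related_tws -- list of TW_k values for terms related via the MESH tree
    rel         -- the user-specified REL constant
    """
    cot_sum = sum(cot_j * tw_j for cot_j, tw_j in cooccurring)
    rel_sum = sum(rel * tw_k for tw_k in related_tws)
    return tw * (f * (1 + cot_sum + rel_sum))


if __name__ == "__main__":
    f = path_factor(from_text_path=True, from_related_citations=True)
    # Illustrative numbers only; they are not MTI defaults.
    print(rank_score(tw=1.0, f=f, cooccurring=[(0.5, 2.0)], related_tws=[1.0], rel=0.25))
```

A term with no co-occurring or related neighbours in the list keeps a RankScore of TW * F, so the two sums only ever raise the score of terms that are supported by other recommendations.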

Table 16. RankScore parameters not tunable by users

Full Name | Default value | Tunable by user
TW: Term Weight | - | No
F: Path Factor (F = 2 if the item comes from MetaMap or Trigrams AND also from PubMed Related Citations; otherwise F = 1) | 1 or 2 | No

Table 17. Filtering level parameters

Full Name | Notes | Default value
Medium Filtering | Remove items from the list of recommendations based on specific heuristics. |
String Filtering | Remove all items from the list of recommendations that are not recommended by both MetaMap and PubMed Related Citations. |
Base Filtering | Basic processing based on the default values for all options. |

Table 18. Post-processing parameters

Full Name | Notes | Default value
Star MESH that come from Title | Add * to each MESH term that was identified from the Title. |
Add CheckTags | Add from a list of CheckTags based on review of actual text and the list of CheckTags. |
Add Geographics | Add from a list of Geographic Locations based on review of actual text and the list of Geographics. |
Remove Do Not Index With Terms | Remove MESH Terms which have been indicated as Do Not Index With from our list of recommendations and prior to scoring. |
Show Headings Mapped to (HM) | Display MESH Headings that are in fact Headings Mapped to with a HM notation versus the normal MESH Headings MH notation. |
Show Entry Terms (ET) | Replace MESH Headings with their corresponding Entry Term where applicable. |
Show Treecodes | In the detailed outputs, add in the treecodes for each result if we have them. |
Show Term Unique Identifiers | Normally only used in II overnight DCMS processing. |
Perform Aged/Human Review | Make sure we don't add age related CheckTags if we already have the CheckTag Animals set and Humans not set. If Animals is not set, and we have age related CheckTags recommended, we need to add Humans. Age related CheckTags include: Adolescent, Adult, Aged, Child, Infant and Infant Newborn. If Animals is set, we will remove any of these age related CheckTags. |
Bypass Related Citations Results Exclusion | Do not process the results obtained via the PubMed Related Citations through our MH_exclude list. |
Limit Recommendations via Publication Types | Reduce the number of recommendations from the default when a citation is identified by specific Publication Types. Currently this is set as follows: if the PT equals Review or News, we limit the number of returned terms to 14. If the PT equals Editorial we limit to 9, and if the PT equals Letter, we limit to 8. |
Limit Recommendations for Title Only Citations | Reduce the number of recommendations from the default when a citation only has a title field and no abstract. This is currently calculated based on the number of words in the title: 0-2 words limits the number to 7, 3 or 4 limit to 12, … limit to 13, … limit to 14, and anything larger than 21 words in the title is limited to 13 items. |
Rank Score Filtering for Title Only Citations | If this is a title only citation AND the term is ranked 11 or below on the list of recommendations AND the score is less than 190, we will stop the list. |
Rank Score Filtering for Title & Abstract Citations | If this is a title AND abstract citation AND the term is ranked 14 or below on the list of recommendations AND the score is less than 203, we will stop the list. |
Use Latest Supplemental Concepts | Every Monday morning the MESH Vocabulary is updated. This usually only involves the Supplemental Concepts. This option says to use this updated lookup list and apply any relevant changes. |
Show MESH DUIs | In the detailed output, add in the MESH Unique Identifier for each result if we have it. |
Use Word Sense Disambiguation (WSD) | This option turns on the WSD option for the MetaMap path to MTI. MetaMap uses WSD to limit ambiguous UMLS Concepts it finds in the text being processed. |

Table 19. Output options

Full Name | Notes | Default value
Simple | Simple display with only the names of the MESH Headings, CheckTags, and SubHeadings being displayed in scoring order and with annotations. |
Detailed | Detailed display showing all relevant information about all of the top-n recommendations. This includes: name of the item, CUI, final score, type, where the item was found in the text, and who recommended the term. In the case of CheckTags and SubHeadings, the field after the type (CT/SH) contains the triggering information that caused this item to be included in the recommendations. Recommendations are displayed in scoring order. |
Expanded Detail | The fields are the same as Detailed above except here we add in the Text Trigger field, which gives us a mapping of concepts to actual triggering text within the document. |
Full Listing with Detailed | The Full Listing format is similar to the Detailed format outlined above. The differences are that the Full Listing shows the entire list and includes a number showing the list position for each recommendation. |
Just The Facts | The fields are the same as Detailed above except here we limit to just the first four fields: PMID, Term, CUI, Score. |
DCMS List | The DCMS List output format is a single line showing the PMID followed by zero or more recommended MESH Terms and their associated data type. |
Show NO_TERMS List | This is the same as the DCMS List above except if we have zero recommendations for a given PMID, we will print NO_TERMS. |
XML | In the XML output format, we enclose all the terms with XML tags. |

3 Experimental context

The high degree of parameterization of the MTI allows us to test the components for their relative contribution to the results. It is possible, for example, to compare the same method using different parameter settings, or the same settings across different methods. Such experiments were performed to determine optimal system parameterization values using a randomly selected sample of 1000 MEDLINE citations. This test corpus was obtained by searching PubMed, with the search limited to the last 1000 items added between January and April. The results were exported as MEDLINE format records. This format includes the MESH terms assigned by NLM indexers, with which we compare our results.

Each experiment consists of processing the citations with a given set of parameters. The recommended indexing is then compared with the terms assigned by NLM indexers. As reported by Lancaster [4], it is difficult to adequately evaluate the quality of indexing because, even in the case of controlled indexing, there is no unique correct indexing set to use as a reference. However, we used the existing MEDLINE indexing as the gold standard indexing for a citation.
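As an illustration of how the gold standard could be read from the exported records, the following minimal sketch collects the MH (MeSH heading) fields of each citation from MEDLINE-format text. It is not part of MTI or of the authors' tooling: the function names parse_medline and heading_name are hypothetical, and the sketch assumes the usual tagged layout of the MEDLINE display format (a field tag of up to four characters, padding, then "- " and the value), with major-topic asterisks and subheadings stripped so that only heading names are compared.

```python
import re

# One tagged line of a MEDLINE-format record, e.g. "PMID- 123" or "MH  - Humans".
# Assumption: tag of 2-4 capital letters, optional padding, then "- " and the value.
TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*- (.*)$")


def heading_name(mh_value):
    """Reduce an MH value such as '*Neoplasms/therapy' to the bare heading name."""
    return mh_value.split("/")[0].lstrip("*").strip()


def parse_medline(text):
    """Return {pmid: set of MeSH heading names} for every record in the text.

    Hypothetical helper: only PMID and MH fields are read; all other fields
    and continuation lines (which start with spaces) are ignored.
    """
    gold = {}
    pmid = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "PMID":
            pmid = value
            gold[pmid] = set()
        elif tag == "MH" and pmid is not None:
            gold[pmid].add(heading_name(value))
    return gold


if __name__ == "__main__":
    sample = (
        "PMID- 1\n"
        "TI  - An example title\n"
        "MH  - Humans\n"
        "MH  - *Neoplasms/therapy\n"
    )
    gold = parse_medline(sample)
    recommended = {"Humans", "Adult"}  # made-up recommendations for the example
    print(sorted(recommended & gold["1"]))  # terms shared with the MEDLINE indexing
```

Keeping the indexing of each citation as a set makes the later comparison with the recommended terms a plain set intersection.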
Throughout the study, precision, recall and F-measure are used for quantitative evaluation of the results. Recall is the number of recommended pairs that were also in the MEDLINE indexing divided by the total number of correct pairs according to the MEDLINE indexing. Precision is the number of recommended pairs that were also in the MEDLINE indexing divided by the total number of pairs recommended. The F-measure (or balanced F-score) is the harmonic mean of precision and recall. It is computed as shown in Eq. 3, where P is precision and R is recall:

$$\text{F-measure} = \frac{\beta \cdot P \cdot R}{P + R} \qquad (3)$$

We selected this measure because the β=2 version of the F-measure gives recall twice the weight of precision. This corresponds to the observation that indexers will tolerate some inappropriate terms as long as many useful terms are presented to them. This weighting also ameliorates the handicap of always recommending 25 terms when we know that the normal number of MESH terms assigned is closer to 12.

Recall, precision and F-measure are calculated for each citation, and the average, the median and the standard deviation over all the citations in an experiment are reported. The average is strongly influenced by atypical values (non-homogeneous data), whereas the median is not, so both measures are used. The standard deviation indicates how tightly the values are clustered around the mean. Summary results of an evaluation performed in 2007 by the MTI team on 200 MEDLINE documents can be consulted in [14].

3.1 Results

All experiments use the same values for filtering, post-processing (Table 20) and output options (Table 21); only the parameters involved in the calculation of PathWeight, RankScore and NavScore, used in the MTI Clustering phase, were modified.

Table 20. Fixed options for post-processing

Full Name | Default value
Star MESH that come from Title |
Add CheckTags |
Add Geographics |
Remove Do Not Index With Terms |
Show Headings Mapped to (HM) |
Show Entry Terms (ET) |
Show Treecodes |
Show Term Unique Identifiers |
Perform Aged/Human Review |
Bypass Related Citations Results Exclusion |
Limit Recommendations via Publication Types |
Limit Recommendations for Title Only Citations |
Rank Score Filtering for Title Only Citations |
Rank Score Filtering for Title & Abstract Citations |
Use Latest Supplemental Concepts |
Show MESH DUIs |
Use Word Sense Disambiguation (WSD) |

Table 21. Fixed options for output

Full Name | Default value
Simple |
Detailed |
Expanded Detail |
Full Listing with Detailed |
Just The Facts |
DCMS List |
Show NO_TERMS List |
XML |

These experiments are based on the top 25 MTI recommendations plus CheckTags; subheadings are not used.

In the first experiment, the parameters involved in calculating NavWeight were adjusted. These parameters are: I (direct match), A (associated expression), G/P (parent/broader), G/C (child/narrower), G/S (sibling) and O (other relations). To find the best value for each parameter independently of the others, each parameter was set in turn to the weights 0.25, 0.50, 0.75 and 1.00. Table 22 shows the average, median, variance and standard deviation of the F-measure over the 1000 values obtained.

Table 22. First approximation of parameterization of NavWeight

Parameter | Value | Average | Median | Variance | Standard deviation
I (Direct Match) | 0.25 | 0.3483 | 0.3479 | 0.0231 | …
I (Direct Match) | 0.50 | 0.3406 | 0.3333 | 0.0238 | …
I (Direct Match) | 0.75 | 0.3428 | 0.3333 | 0.0238 | …
I (Direct Match) | 1.00 | 0.3456 | 0.3438 | 0.0243 | 0.1559
A (Associated Expression) | 0.25 | 0.3456 | 0.3478 | 0.0242 | …
A (Associated Expression) | 0.50 | 0.3406 | 0.3333 | 0.0237 | …
A (Associated Expression) | 0.75 | 0.3453 | 0.3448 | 0.0239 | …
A (Associated Expression) | 1.00 | 0.3502 | 0.3478 | 0.0235 | 0.1533
G/P (Parent/Broader) | 0.25 | 0.3497 | 0.3479 | 0.0237 | …
G/P (Parent/Broader) | 0.50 | 0.3413 | 0.3333 | 0.0243 | …
G/P (Parent/Broader) | 0.75 | 0.3453 | 0.3448 | 0.024 | …
G/P (Parent/Broader) | 1.00 | 0.3486 | 0.3479 | 0.0242 | 0.1556
G/C (Child/Narrower) | 0.25 | 0.3519 | 0.3529 | 0.0241 | …
G/C (Child/Narrower) | 0.50 | 0.3433 | 0.3428 | 0.0244 | …
G/C (Child/Narrower) | 0.75 | 0.3468 | 0.3448 | 0.0241 | …
G/C (Child/Narrower) | 1.00 | 0.3489 | 0.3478 | 0.0247 | 0.1572
G/S (Sibling) | 0.25 | 0.347 | 0.3478 | 0.0244 | …
G/S (Sibling) | 0.50 | 0.3491 | 0.35 | 0.0241 | …
G/S (Sibling) | 0.75 | 0.3391 | 0.3333 | 0.0252 | …
G/S (Sibling) | 1.00 | 0.3466 | 0.3478 | 0.0243 | 0.1559
O (Other relations) | 0.25 | 0.3397 | 0.3333 | 0.0236 | 0.1536
O (Other relations) | 0.50 | 0.3459 | 0.3439 | 0.0238 | 0.1543
O (Other relations) | 0.75 | 0.3421 | 0.3448 | 0.0241 | …
O (Other relations) | 1.00 | 0.3433 | 0.3448 | 0.0249 | 0.1578

In Table 23, the best results for each parameter are combined and their efficiency as a whole is checked by comparing them with the default values. For parameter O, we test both O=1.00 (Option 1) and O=0.75 (Option 2) because both have the same median value, though the 0.75 value has a lower standard deviation.

Table 23. Summary table combining the best values for I, A, G/P, G/C, G/S and O parameters

Setting | Average | Median | Variance | Standard deviation
Option1 (I=0.25, A=1.00, G/P=0.25, G/C=0.25, G/S=0.50, O=1.00) | 0.3845 | 0.3809 | 0.0218 | 0.1476
Option2 (I=0.25, A=1.00, G/P=0.25, G/C=0.25, G/S=0.50, O=0.75) | 0.3484 | 0.3479 | 0.0242 | 0.1556
Default (I=1.00, A=1.00, G/P=0.90, G/C=0.75, G/S=0.70, O=0.50) | 0.3424 | 0.3448 | 0.0243 | 0.1559

As observed in Table 23, this new parameterization improves on MTI's default settings.

In a second attempt to get better results using the NavWeight parameters, we tune the parameters one after another, each time fixing the preceding parameters at their best values and again trying four values: 0.25, 0.50, 0.75 and 1.00 (see Table 24). As before, in Table 25 the best results for each parameter are combined and compared with the default values.

Table 24. Second approximation of parameterization of NavWeight

Parameter | Value | Average | Median | Variance | Standard deviation
I (Direct Match), using other params equal to 0 | 0.25 | 0.3483 | 0.3479 | 0.0231 | …
I (Direct Match), using other params equal to 0 | 0.50 | 0.3406 | 0.3333 | 0.0238 | …
I (Direct Match), using other params equal to 0 | 0.75 | 0.3428 | 0.3333 | 0.0238 | …
I (Direct Match), using other params equal to 0 | 1.00 | 0.3456 | 0.3438 | 0.0243 | 0.1559
A (Associated Expression), using I=0.25 | 0.25 | 0.3849 | 0.381 | 0.0219 | …
A (Associated Expression), using I=0.25 | 0.50 | 0.3854 | 0.381 | 0.0218 | …
A (Associated Expression), using I=0.25 | 0.75 | 0.3388 | 0.3333 | 0.0228 | …
A (Associated Expression), using I=0.25 | 1.00 | 0.3545 | 0.3529 | 0.0236 | 0.1536
G/P (Parent/Broader), using I=0.25 and A=0.50 | 0.25 | 0.3415 | 0.3333 | 0.0231 | …
G/P (Parent/Broader), using I=0.25 and A=0.50 | 0.50 | 0.352 | 0.3529 | 0.0232 | …
G/P (Parent/Broader), using I=0.25 and A=0.50 | 0.75 | 0.3446 | 0.3428 | 0.0232 | 0.1523
G/P (Parent/Broader), using I=0.25 and A=0.50 | 1.00 | 0.3447 | 0.3334 | 0.0244 | 0.1562
G/C (Child/Narrower), using I=0.25, A=0.50 and G/P=0.50 | 0.25 | 0.3528 | 0.35 | 0.0241 | 0.1552
G/C (Child/Narrower), using I=0.25, A=0.50 and G/P=0.50 | 0.50 | 0.3499 | 0.3479 | 0.0236 | 0.1536
G/C (Child/Narrower), using I=0.25, A=0.50 and G/P=0.50 | 0.75 | 0.341 | 0.3333 | 0.0239 | 0.1546
G/C (Child/Narrower), using I=0.25, A=0.50 and G/P=0.50 | 1.00 | 0.3561 | 0.3529 | 0.0239 | 0.1546
G/S (Sibling), using I=0.25, A=0.50, G/P=0.50 and G/C=1.00 | 0.25 | 0.3518 | 0.3479 | 0.0242 | 0.1556
G/S (Sibling), using I=0.25, A=0.50, G/P=0.50 and G/C=1.00 | 0.50 | 0.3474 | 0.3478 | 0.0241 | 0.1552
G/S (Sibling), using I=0.25, A=0.50, G/P=0.50 and G/C=1.00 | 0.75 | 0.3432 | 0.3333 | 0.0238 | 0.1543
G/S (Sibling), using I=0.25, A=0.50, G/P=0.50 and G/C=1.00 | 1.00 | 0.3544 | 0.3529 | 0.0244 | 0.1562
O (Other Relations), using I=0.25, A=0.50, G/P=0.50, G/C=1.00, G/S=0.25 | 0.25 | 0.355 | 0.3529 | 0.0237 | 0.1539
O (Other Relations), using I=0.25, A=0.50, G/P=0.50, G/C=1.00, G/S=0.25 | 0.50 | 0.3479 | 0.3448 | 0.0238 | 0.1543
O (Other Relations), using I=0.25, A=0.50, G/P=0.50, G/C=1.00, G/S=0.25 | 0.75 | 0.342 | 0.3333 | 0.0236 | 0.1536
O (Other Relations), using I=0.25, A=0.50, G/P=0.50, G/C=1.00, G/S=0.25 | 1.00 | 0.3411 | 0.3333 | 0.0251 | 0.1584

Table 25. Comparing second approximation with default values

Setting | Average | Median | Variance | Standard deviation
Default (I=1.00, A=1.00, G/P=0.90, G/C=0.75, G/S=0.70, O=0.50) | 0.3424 | 0.3448 | 0.0243 | 0.1559
Option (I=0.25, A=0.50, G/P=0.50, G/C=1.00, G/S=0.25, O=0.25) | 0.355 | 0.3529 | 0.0237 | 0.1539

As observed in Table 25, the results obtained in this experiment did not improve on those of the previous experiment (Table 23), but they are better than the results obtained with the default values.

In summary, from these early experiments we can conclude that the best values for the parameters used to calculate NavWeight are those obtained in the first experiment (the first row in Table 23). From these results, and based on the meaning of these parameters, we can also conclude that the semantic value of the words in a document is more important than their syntactic structure: the parameter I (direct match) has a very low weight compared with other parameters such as A (associated expressions) or O (other relations).

On this basis, we perform a new test increasing the values of the parameters G/P (parent or broader), G/C (child or narrower) and G/S (sibling or synonymy) to give them greater weight, since we believe that the semantic relations represented by these parameters should be more important.

Table 26. Increasing G/P, G/C and G/S parameter values

Parameter values | Average | Median | Variance | Standard deviation
Option1 (I=0.25, A=1.00, G/P=0.25, G/C=0.25, G/S=0.50, O=1.00) | 0.3845 | 0.3809 | 0.0218 | 0.1476
Option2 (I=0.25, A=1.00, G/P=0.50, G/C=0.50, G/S=0.75, O=1.00) | … | … | … | …
Option3 (I=0.25, A=1.00, G/P=0.75, G/C=0.75, G/S=1.00, O=1.00) | … | … | … | …

In Table 26 we can see that increasing the weight of the parameters that refer to a semantic relationship between terms succeeded in increasing the F-measure value, giving better results. Therefore, for the NavScore parameters we use the best values (second row in Table 26).

In the third experiment (see Table 27) we adjust the parameters involved in calculating RankScore. These parameters are COT (co-occurrences factor) and REL (related terms factor).

Table 27. Parameters COT and REL

Setting | Average | Median | Variance | Standard deviation
COT=0, REL=0 | 0.3493 | 0.35 | 0.0241 | 0.1552
COT=32767, REL=0 | 0.3382 | 0.3333 | 0.0252 | 0.1587
COT=0, REL=… | 0.3481 | 0.3448 | 0.0241 | 0.1552
COT=10000, REL=0 | 0.3431 | 0.3478 | 0.0244 | 0.1562
COT=0, REL=… | 0.3495 | 0.3515 | 0.0244 | 0.1562
COT=10000, REL=… | 0.3465 | 0.3448 | 0.0243 | 0.1559
COT=100, REL=… | 0.3534 | 0.3529 | 0.0231 | 0.152
COT=10000, REL=100 | 0.3424 | 0.3448 | 0.0243 | 0.1559
COT=10000, REL=32767 | 0.3846 | 0.381 | 0.0218 | 0.1476

In Table 27, we can see that the best results are obtained by applying an intermediate value to the COT parameter and the highest value to the REL parameter. These results corroborate the conclusion obtained above, as the REL factor refers to relations between different terms, while the COT factor refers to words that often appear together in a particular context. Once again this shows the importance of taking into account the semantic relationships of words within a document when extracting the terms that best identify it.

Finally, we combine the best results for the NavWeight and RankScore parameters (see Table 28). Results improve when each set of parameters is changed separately, but not when both (NavWeight and RankScore) are combined.

Table 28. Combination of NavWeight and RankScore parameters

NavWeight parameters | RankScore parameters | Average | Median | Variance | Standard deviation
Default (I=1.00, A=1.00, G/P=0.90, G/C=0.75, G/S=0.70, O=0.50) | Default (COT=10000, REL=100) | 0.3424 | 0.3448 | 0.0243 | 0.1559
Default (I=1.00, A=1.00, G/P=0.90, G/C=0.75, G/S=0.70, O=0.50) | Test (COT=10000, REL=32767) | 0.3846 | 0.381 | 0.0218 | 0.1476
Option1 (I=0.25, A=1.00, G/P=0.50, G/C=0.50, G/S=0.75, O=1.00) | Test (COT=10000, REL=32767) | … | … | … | …
Option1 (I=0.25, A=1.00, G/P=0.50, G/C=0.50, G/S=0.75, O=1.00) | Default (COT=10000, REL=100) | … | … | … | …
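The per-citation scoring and the summary columns reported in Tables 22 to 28 (average, median, variance, standard deviation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: all function names are hypothetical, terms are compared as plain sets, and population variance is assumed because the text does not say which estimator was used. The function f_eq3 implements Eq. 3 exactly as printed; with β = 2 that expression equals the balanced harmonic mean 2PR/(P+R). The function f_beta is the commonly used weighted form (1+β²)PR/(β²P+R), included only for comparison.

```python
from statistics import mean, median, pstdev, pvariance


def precision_recall(recommended, gold):
    """Precision and recall of a recommended term set against the gold indexing."""
    recommended, gold = set(recommended), set(gold)
    hits = len(recommended & gold)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def f_eq3(precision, recall, beta=2.0):
    """Eq. 3 as printed in the text: beta * P * R / (P + R)."""
    total = precision + recall
    return beta * precision * recall / total if total > 0 else 0.0


def f_beta(precision, recall, beta=2.0):
    """Standard weighted F-measure, shown for comparison (not taken from the text)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def summarize(scores):
    """Summary statistics over per-citation scores (population variance assumed)."""
    return {
        "average": mean(scores),
        "median": median(scores),
        "variance": pvariance(scores),
        "std_dev": pstdev(scores),
    }


if __name__ == "__main__":
    # Made-up example: two citations with recommended and gold term sets.
    runs = [
        ({"Humans", "Adult", "Neoplasms", "Mice"}, {"Humans", "Neoplasms", "Female"}),
        ({"Animals", "Rats"}, {"Animals"}),
    ]
    scores = [f_eq3(*precision_recall(rec, gold)) for rec, gold in runs]
    print(summarize(scores))
```

Computing the score per citation first and aggregating afterwards mirrors the procedure described in Section 3, where the average, median and standard deviation are taken over all citations of an experiment.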

4 Conclusions and Future Work

The treatment of semantic relations (synonymous, narrower, broader or related terms or expressions) between terms is essential in information retrieval and therefore in the annotation or indexing of documents. In the biomedical field we can use the Medical Text Indexer (MTI), a tool developed to facilitate the indexing of documents, which provides candidate MESH terms extracted from the text (title and abstract). These candidate terms come from syntactically parsing the sentences of the text, searching other similar documents and using a metathesaurus that provides new related terms. Therefore, syntactic relationships can be found between terms, while semantic relationships can be inferred. The identification of terms and their mapping to concepts is the first stage of semantic analysis. Semantic relations between concepts represent another layer of information, which has the potential to make document search even more detailed and specific [9].

MTI is a flexible and highly customizable tool that allows users to assign different levels of importance to the pathways used to extract terms from a document, using parameters that define the weight of different semantic relationships (synonymy, hyponymy, hyperonymy) compared to direct matches of words or other relationships (co-occurrences, related terms, associated expressions). The associated expressions provide a translation of some complex concepts to expressions in other vocabularies [7]. Synonymy and lexical matching techniques are used to link terms together. The identification of new instances of relations was based on observed co-occurrences of concepts using the MESH tree structure.

The experiments described here have improved MTI performance by tuning some parameters used in the clustering and ranking phase. We have increased the value of the parameters involved in calculating the ranking of terms based on related expressions and on broader, narrower, sibling and other relations, obtaining better F-measure results than with the default values. This reinforces the theory of the importance of semantic relations for indexing a document in the biomedical field, and the relevance of MESH terms coming from associated expressions [7].

As future work, we can analyze these parameters using full text instead of title and abstract only, and extend our studies to other parameters of the tool. Further research could include learning semantic relations using classification techniques, where the context features of MESH co-occurrences will be expanded from verbs to other linguistic markers, including grammatical functions [9].

References

1. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001;
2. Aronson AR. The effect of textual variation on concept-based information retrieval. In: Cimino JJ, ed. AMIA Annual Fall Symposium. Washington, D.C.: Hanley & Belfus, Inc., 1996:
3. Aronson AR, Browne AC, Rindflesch TC. Exploiting a large thesaurus for information retrieval. RIAO 94. Rockefeller University, New York, N.Y.: JOUVE, Paris, 1994:
4. Aronson AR, Gay CW, Humphrey S, Mork J, Rogers W. The NLM Indexing Initiative's Medical Text Indexer.
5. Aronson AR, Kim W, Wilbur WJ. Automatic MeSH Term Assignment and Quality Assessment. Proc AMIA Symp. 2001;
6. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology, September 27,
7. Bodenreider O, Burgun A. Methods for exploring the semantics of the relationships between co-occurring UMLS concepts.
8. Bodenreider O, Chang HF, Hole WT, Nelson SJ. Beyond Synonymy: Exploiting the UMLS Semantics in Mapping Vocabularies. Proc AMIA Symp 1998;
9. Buitelaar P, Vintar S, Volk M. Semantic Relations in Concept-Based Cross-Language Medical Information Retrieval.
10. Feldman R, Shatkay H. Mining the Biomedical Literature in the Genomic Era: An Overview. 2003.
11. McCray AT, Nelson SJ. The representation of meaning in the UMLS. Methods of Information in Medicine 1995; 34(1-2):
12. Medical Text Indexer Processing Flow. March 13, Available from: Accessed May 20,
13. Srinivasan P. Optimal document indexing vocabulary for MEDLINE. Information Processing & Management 1996; 32(5):
14. Summary results for 200 MEDLINE Evaluation Analysis (March, 2007). Available from: Accessed May 20,
15. U.S. National Library of Medicine. National Institutes of Health. Fact Sheet. Available from: Accessed May 20,
16. U.S. National Library of Medicine. National Institutes of Health. Yearly Citation Totals from 2009 MEDLINE. Available from: Accessed May 20,
17. U.S. National Library of Medicine. National Institutes of Health. Unified Medical Language System (UMLS). Available from: Accessed May 20,
18. U.S. National Library of Medicine. NLM Technical Bulletin. Available from: Accessed May 20,
19. Vasuki V, Cohen T. Reflective random indexing for semi-automatic indexing of the biomedical literature. J Biomed Inform. 2010; 43(5):
20. Wilbur WJ. PubMed Related Citations Algorithm. Available at: Accessed May 20, 2011.


More information

Query Reformulation for Clinical Decision Support Search

Query Reformulation for Clinical Decision Support Search Query Reformulation for Clinical Decision Support Search Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, Ophir Frieder Information Retrieval Lab Computer Science Department Georgetown University

More information

Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si

Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si Information Retrieval, Information Extraction, and Text Mining Applications for Biology Slides by Suleyman Cetintas & Luo Si 1 Outline Introduction Overview of Literature Data Sources PubMed, HighWire

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Scientific Literature Retrieval based on Terminological Paraphrases using Predicate Argument Tuple

Scientific Literature Retrieval based on Terminological Paraphrases using Predicate Argument Tuple Scientific Literature Retrieval based on Terminological Paraphrases using Predicate Argument Tuple Sung-Pil Choi 1, Sa-kwang Song 1, Hanmin Jung 1, Michaela Geierhos 2, Sung Hyon Myaeng 3 1 Korea Institute

More information

EBP. Accessing the Biomedical Literature for the Best Evidence

EBP. Accessing the Biomedical Literature for the Best Evidence Accessing the Biomedical Literature for the Best Evidence Structuring the search for information and evidence Basic search resources Starting the search EBP Lab / Practice: Simple searches Using PubMed

More information

PubMed Guide. A. Searching

PubMed Guide. A. Searching TSRI, 400-S helplib@scripps.edu 858-784-8705 PubMed Guide A. Searching 1. Keyword searching: What is really going on when you search for a term like stem cells? can use Boolean (AND, OR, NOT) type in:

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts

Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Balancing Manual and Automatic Indexing for Retrieval of Paper Abstracts Kwangcheol Shin 1, Sang-Yong Han 1, and Alexander Gelbukh 1,2 1 Computer Science and Engineering Department, Chung-Ang University,

More information

Literature Search. What is PubMed? PubMed Database. What Does MEDLINE Cover? How Big is MEDLINE? PubMed Basics. PubMed

Literature Search. What is PubMed? PubMed Database. What Does MEDLINE Cover? How Big is MEDLINE? PubMed Basics. PubMed What is PubMed? Literature Search PubMed Somkiat Asawaphureekorn M.D., M.Sc. (Clinical Epidemiology) A web-based retrieval system developed by NCBI (a part of Entrez retrieval system) Free version of MEDLINE

More information

E B S C O h o s t U s e r G u i d e M E D L I N E MEDLINE. EBSCOhost User Guide MEDLINE. MEDLINE with Full Text. MEDLINE Complete

E B S C O h o s t U s e r G u i d e M E D L I N E MEDLINE. EBSCOhost User Guide MEDLINE. MEDLINE with Full Text. MEDLINE Complete E B S C O h o s t U s e r G u i d e M E D L I N E MEDLINE EBSCOhost User Guide MEDLINE MEDLINE with Full Text MEDLINE Complete Last Updated November 13, 2013 Table of Contents What is MEDLINE?... 3 What

More information

PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search

PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search Bioinformatics (2006), accepted. PubMed Assistant: A Biologist-Friendly Interface for Enhanced PubMed Search Jing Ding Department of Electrical and Computer Engineering, Iowa State University, Ames, IA

More information

COCHRANE LIBRARY. Contents

COCHRANE LIBRARY. Contents COCHRANE LIBRARY Contents Introduction... 2 Learning outcomes... 2 About this workbook... 2 1. Getting Started... 3 a. Finding the Cochrane Library... 3 b. Understanding the databases in the Cochrane Library...

More information

Automated data entry system: performance issues

Automated data entry system: performance issues Automated data entry system: performance issues George R. Thoma, Glenn Ford National Library of Medicine, Bethesda, Maryland 20894 ABSTRACT This paper discusses the performance of a system for extracting

More information

Semantic Annotation for Semantic Social Networks. Using Community Resources

Semantic Annotation for Semantic Social Networks. Using Community Resources Semantic Annotation for Semantic Social Networks Using Community Resources Lawrence Reeve and Hyoil Han College of Information Science and Technology Drexel University, Philadelphia, PA 19108 lhr24@drexel.edu

More information

0.1 Knowledge Organization Systems for Semantic Web

0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1.1 Knowledge Organization Systems Why do we need to organize knowledge? Indexing Retrieval Organization

More information

Alternative Tools for Mining The Biomedical Literature

Alternative Tools for Mining The Biomedical Literature Yale University From the SelectedWorks of Rolando Garcia-Milian May 14, 2014 Alternative Tools for Mining The Biomedical Literature Rolando Garcia-Milian, Yale University Available at: https://works.bepress.com/rolando_garciamilian/1/

More information

NHS Evidence: Healthcare Databases Advanced Search

NHS Evidence: Healthcare Databases Advanced Search NHS Evidence: Healthcare Databases Advanced Search Healthcare databases are available from www.evidence.nhs.uk. Click Journals and Databases at the top left of the screen and then select Healthcare Databases

More information

Automatic Term Indexing in Medical Text Corpora. and its Applications to Consumer Health. Information Systems. Angelos Hliaoutakis

Automatic Term Indexing in Medical Text Corpora. and its Applications to Consumer Health. Information Systems. Angelos Hliaoutakis Automatic Term Indexing in Medical Text Corpora and its Applications to Consumer Health Information Systems Angelos Hliaoutakis December 3, 2009 Contents List of Figures iii Abstract v Acknowledgements

More information

A Semantic Model for Concept Based Clustering

A Semantic Model for Concept Based Clustering A Semantic Model for Concept Based Clustering S.Saranya 1, S.Logeswari 2 PG Scholar, Dept. of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India 1 Associate Professor, Dept. of

More information

Questions? Find citations on the therapy of earache with antibiotics written in English and published since 2000.

Questions? Find citations on the therapy of earache with antibiotics written in English and published since 2000. Questions? Find an article studying on the clinical application of benazepri published in NEJM, and written by Prof. Hou Fanfan who works in Nanfang Hospital. 1 Questions? Find citations on the therapy

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Using a Medical Thesaurus to Predict Query Difficulty

Using a Medical Thesaurus to Predict Query Difficulty Using a Medical Thesaurus to Predict Query Difficulty Florian Boudin, Jian-Yun Nie, Martin Dawes To cite this version: Florian Boudin, Jian-Yun Nie, Martin Dawes. Using a Medical Thesaurus to Predict Query

More information

Literature Searching: hints and tips for developing search strategies and running searches

Literature Searching: hints and tips for developing search strategies and running searches Literature Searching: hints and tips for developing search strategies and running searches Kathy Murray, medical librarian kmurray10@alaska.edu 786.1611 Outline Being thorough Formulating the question

More information

Challenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio

Challenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio Case Study Use Case: Recruiting Segment: Recruiting Products: Rosette Challenge CareerBuilder, the global leader in human capital solutions, operates the largest job board in the U.S. and has an extensive

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

I Know Your Name: Named Entity Recognition and Structural Parsing

I Know Your Name: Named Entity Recognition and Structural Parsing I Know Your Name: Named Entity Recognition and Structural Parsing David Philipson and Nikil Viswanathan {pdavid2, nikil}@stanford.edu CS224N Fall 2011 Introduction In this project, we explore a Maximum

More information

Searching Pubmed Database استخدام قاعدة المعلومات Pubmed

Searching Pubmed Database استخدام قاعدة المعلومات Pubmed Searching Pubmed Database استخدام قاعدة المعلومات Pubmed برنامج مهارات البحث العلمي مركز البحىث بأقسام العلىم والدراسات الطبية للطالبات األحد 1433/11/14 ه الموافق 30 2012 /9/ م د. سيناء عبد المحسن العقيل

More information

How to Apply Basic Principles of Evidence-

How to Apply Basic Principles of Evidence- CSHP 2015 TOOLKIT FROM PAPER TO PR ACTI CE: INCORPORATING EV IDENCE INTO YOUR PAGE 1 PHARMACY PR ACTICE (O BJECTIVE 3.1) How to Apply Basic Principles of Evidence- Based Practice May 2011 How to Do a Basic

More information

SciMiner User s Manual

SciMiner User s Manual SciMiner User s Manual Copyright 2008 Junguk Hur. All rights reserved. Bioinformatics Program University of Michigan Ann Arbor, MI 48109, USA Email: juhur@umich.edu Homepage: http://jdrf.neurology.med.umich.edu/sciminer/

More information

CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT

CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT Clemson University TigerPrints All Dissertations Dissertations 5-2010 CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT William Taylor II

More information

Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy

Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy Conceptual document indexing using a large scale semantic dictionary providing a concept hierarchy Martin Rajman, Pierre Andrews, María del Mar Pérez Almenta, and Florian Seydoux Artificial Intelligence

More information

Journal of Asian Scientific Research FEATURES COMPOSITION FOR PROFICIENT AND REAL TIME RETRIEVAL IN CBIR SYSTEM. Tohid Sedghi

Journal of Asian Scientific Research FEATURES COMPOSITION FOR PROFICIENT AND REAL TIME RETRIEVAL IN CBIR SYSTEM. Tohid Sedghi Journal of Asian Scientific Research, 013, 3(1):68-74 Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 FEATURES COMPOSTON FOR PROFCENT AND REAL TME RETREVAL

More information

Reference Guide. cochranelibrary.com

Reference Guide. cochranelibrary.com Reference Guide cochranelibrary.com Did you know? Ten tips for getting the most out of the Cochrane Library 1. Discover the complete Cochrane Library in Spanish View, search, and discover content in Spanish

More information

NYS Early Learning Trainer Credential. Portfolio Instructions

NYS Early Learning Trainer Credential. Portfolio Instructions TC Portfolio Guidelines 8/17/2011 Contents Trainer Definitions 3 General Instructions.....4 Credential Levels...4 Portfolio Structure...5 Portfolio Content Parts 1 and 2: Portfolio Entries.........5 Portfolio

More information

PROPOSED DOCUMENT. International Medical Device Regulators Forum

PROPOSED DOCUMENT. International Medical Device Regulators Forum PROPOSED DOCUMENT International Medical Device Regulators Forum Title: Assembly and Technical Guide for IMDRF Table of Contents (ToC) Submissions (ToC-based submissions) Authoring Group: Regulated Product

More information

Information Retrieval. Chap 7. Text Operations

Information Retrieval. Chap 7. Text Operations Information Retrieval Chap 7. Text Operations The Retrieval Process user need User Interface 4, 10 Text Text logical view Text Operations logical view 6, 7 user feedback Query Operations query Indexing

More information

warwick.ac.uk/lib-publications

warwick.ac.uk/lib-publications Original citation: Zhao, Lei, Lim Choi Keung, Sarah Niukyun and Arvanitis, Theodoros N. (2016) A BioPortalbased terminology service for health data interoperability. In: Unifying the Applications and Foundations

More information

Semantically Driven Snippet Selection for Supporting Focused Web Searches

Semantically Driven Snippet Selection for Supporting Focused Web Searches Semantically Driven Snippet Selection for Supporting Focused Web Searches IRAKLIS VARLAMIS Harokopio University of Athens Department of Informatics and Telematics, 89, Harokopou Street, 176 71, Athens,

More information

This paper studies methods to enhance cross-language retrieval of domain-specific

This paper studies methods to enhance cross-language retrieval of domain-specific Keith A. Gatlin. Enhancing Cross-Language Retrieval of Comparable Corpora Through Thesaurus-Based Translation and Citation Indexing. A master s paper for the M.S. in I.S. degree. April, 2005. 23 pages.

More information

Introduction to Information Retrieval. Lecture Outline

Introduction to Information Retrieval. Lecture Outline Introduction to Information Retrieval Lecture 1 CS 410/510 Information Retrieval on the Internet Lecture Outline IR systems Overview IR systems vs. DBMS Types, facets of interest User tasks Document representations

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Customer Clustering using RFM analysis

Customer Clustering using RFM analysis Customer Clustering using RFM analysis VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University of Patras

More information

SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT

SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT SYSTEMS FOR NON STRUCTURED INFORMATION MANAGEMENT Prof. Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION SEARCH AND RETRIEVAL Inf. retrieval 1 PRESENTATION SCHEMA GOALS AND

More information

Precise Medication Extraction using Agile Text Mining

Precise Medication Extraction using Agile Text Mining Precise Medication Extraction using Agile Text Mining Chaitanya Shivade *, James Cormack, David Milward * The Ohio State University, Columbus, Ohio, USA Linguamatics Ltd, Cambridge, UK shivade@cse.ohio-state.edu,

More information

Introduction to Ovid. As a Clinical Librarían tool! Masoud Mohammadi Golestan University of Medical Sciences

Introduction to Ovid. As a Clinical Librarían tool! Masoud Mohammadi Golestan University of Medical Sciences Introduction to Ovid As a Clinical Librarían tool! Masoud Mohammadi Golestan University of Medical Sciences Overview Ovid helps researchers, librarians, clinicians, and other healthcare professionals find

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data

BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data María-Esther Vidal 1, Louiqa Raschid 2, Natalia Márquez 1, Jean Carlo Rivera 1, and Edna Ruckhaus 1 1 Universidad

More information

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach P.T.Shijili 1 P.G Student, Department of CSE, Dr.Nallini Institute of Engineering & Technology, Dharapuram, Tamilnadu, India

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

Latest development in image feature representation and extraction

Latest development in image feature representation and extraction International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

Literature Databases

Literature Databases Literature Databases Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Overview 1. Databases 2. Publications in Science 3. PubMed and

More information

Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction

Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction Semi-Supervised Abstraction-Augmented String Kernel for bio-relationship Extraction Pavel P. Kuksa, Rutgers University Yanjun Qi, Bing Bai, Ronan Collobert, NEC Labs Jason Weston, Google Research NY Vladimir

More information

Searching for Literature Using HDAS (Healthcare Databases Advanced Search)

Searching for Literature Using HDAS (Healthcare Databases Advanced Search) Searching for Literature Using HDAS (Healthcare Databases Advanced Search) 1. What is HDAS?... page 2 2. How do I access HDAS?... page 3 3. Questions and concepts (PICO) page 4 4. Selecting a database.

More information

Automatic Text Summarization System Using Extraction Based Technique

Automatic Text Summarization System Using Extraction Based Technique Automatic Text Summarization System Using Extraction Based Technique 1 Priyanka Gonnade, 2 Disha Gupta 1,2 Assistant Professor 1 Department of Computer Science and Engineering, 2 Department of Computer

More information

Lane Medical Library Stanford University Medical Center

Lane Medical Library Stanford University Medical Center Lane Medical Library Stanford University Medical Center http://lane.stanford.edu LaneAskUs@Stanford.edu 650.723.6831 PubMed: A Quick Guide PubMed: (connect from Lane Library s webpage, http://lane.stanford.edu/

More information