How orthogonal are the OBO Foundry ontologies?


JOURNAL OF BIOMEDICAL SEMANTICS
PROCEEDINGS Open Access

Amir Ghazvinian*, Natalya F Noy*, Mark A Musen

From Bio-Ontologies 2010: Semantic Applications in Life Sciences, Boston, MA, USA, July 2010

* Correspondence: amirg@stanford.edu; noy@stanford.edu
Stanford Center for Biomedical Informatics Research, Stanford University, 251 Campus Drive, Stanford, CA, USA

Abstract

Background: Ontologies in biomedicine facilitate information integration, data exchange, search and query of biomedical data, and other critical knowledge-intensive tasks. The OBO Foundry is a collaborative effort to establish a set of principles for ontology development with the eventual goal of creating a set of interoperable reference ontologies in the domain of biomedicine. One of the key requirements to achieve this goal is to ensure that ontology developers reuse term definitions that others have already created rather than create their own definitions, thereby making the ontologies orthogonal.

Methods: We used a simple lexical algorithm to analyze the extent to which the set of OBO Foundry candidate ontologies identified from September 2009 to September 2010 conforms to this vision. Specifically, we analyzed (1) the level of explicit term reuse in this set of ontologies, (2) the level of overlap, where two ontologies define similar terms independently, and (3) how the levels of reuse and overlap changed during the course of this year.

Results: We found that 30% of the ontologies reuse terms from other Foundry candidates and 96% of the candidate ontologies contain terms that overlap with terms from the other ontologies. We found that while term reuse increased among the ontologies between September 2009 and September 2010, the level of overlap among the ontologies remained relatively constant.
Additionally, we analyzed the six ontologies announced as OBO Foundry members on March 5, 2010, and found that the level of overlap was extremely low, but, notably, so was the level of term reuse.

Conclusions: We have created a prototype web application that allows OBO Foundry ontology developers to see which classes from their ontologies overlap with classes from other ontologies in the OBO Foundry ( From our analysis, we conclude that while the OBO Foundry has made significant progress toward orthogonality during the period of this study through increased adoption of explicit term reuse, a large amount of overlap remains among these ontologies. Furthermore, the characteristics of the identified overlap, such as the terms it comprises and its distribution among the ontologies, indicate that achieving orthogonality will be exceptionally difficult, if not impossible.

Ghazvinian et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License ( which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Ontologies in biomedicine facilitate information integration, data exchange, search and query of biomedical data, and other critical knowledge-intensive tasks [1,2]. The OBO Foundry is an initiative to create a set of well-defined reference ontologies that are designed to work with one another to form a single, non-redundant system [3]. The OBO Foundry consortium defines a number of principles for ontology development, and ontology developers who want their ontologies to be members of the OBO Foundry must work to conform to these principles. The principles include, for example, the requirement that the ontology is openly available; that the ontologies can be expressed in a common shared syntax; that all the terms in the ontologies have well-formed definitions; and that the ontology has a plurality of users. Ontology developers who request to have their ontology listed as an OBO Foundry candidate are expected to work with the OBO Foundry custodians to ensure that their ontology conforms to the OBO Foundry principles.

The set of ontologies in the OBO Foundry evolves constantly. New ontologies submitted to the OBO Foundry are added to the list of candidate ontologies, and the OBO Foundry community works together to bring these ontologies as close to satisfying the OBO Foundry principles as possible. After the custodians decide that an ontology conforms sufficiently to the OBO Foundry principles, it may become a bona fide OBO Foundry member.

One of the key and, arguably, more controversial aims of the OBO Foundry effort is to create a set of orthogonal ontologies, which means that each term is defined in only one ontology. Other ontologies that need to use the term refer to its definition in the source ontology. For example, the Cell Type ontology may define a term Cell. Then other ontologies, such as the Foundational Model of Anatomy (FMA) [4], that need to refer to the term Cell will refer to the term Cell in the Cell Type ontology.
Thus, in order to increase orthogonality, ontology authors reuse terms from other ontologies. Each term in an OBO Foundry ontology has a unique identifier (id). The id consists of a prefix, which is the ontology abbreviation, and the id for that term within the ontology. For example, the id for the class Biological process in the Gene Ontology (GO) [13] is GO: Because each term in the OBO Foundry has a unique id (another OBO Foundry principle), the FMA can refer to the id of the term Cell in the Cell Type ontology whenever the FMA authors need the term Cell.

An alternative form of term reuse for ontologies in the OBO format (currently the most common format in the OBO Foundry) is to use the xref property to indicate that the term that the authors are defining is a term from another ontology. Although the authors still create a new term in this case and link it to a term in another ontology, we can still consider such a reference as reuse: the authors have clearly identified the correspondence, and replacing the newly defined class with the one that it references can be a mechanical task.

Our goal in this paper is twofold: (1) to analyze how orthogonal the OBO Foundry ontologies actually are; and (2) to analyze how the level of orthogonality in this set of ontologies has evolved over time, as the OBO Foundry community has worked to bring these ontologies closer to its stated ideal. Thus, we want to test the hypothesis that the ontologies developed by the OBO Foundry community are indeed approaching orthogonality, as the contributors work together to eliminate the overlap between the ontologies and to replace it with the reuse of terms across ontologies. Furthermore,

evaluating the conformance of these ontologies to the criterion of orthogonality can serve as a measure of quality control for the OBO Foundry.

We took three snapshots of the OBO Foundry candidates, each six months apart, and analyzed the extent to which this set of ontologies was approaching the goal of orthogonality. We analyzed the OBO Foundry candidates in September 2009, March 2010, and again in September 2010 (Figure 1). Our analysis examines 53 ontologies that were OBO Foundry candidates on September 1, 2009, the date that we took the first snapshot. Because many of these ontologies were originally developed independently of one another, there is a significant level of overlap among them. The OBO Foundry community states that it works actively to eliminate this overlap. Since the OBO Foundry project evolves continuously, our analysis of the data on any specific date is naturally only a snapshot of its state at that time.

On March 5, 2010, the OBO Foundry reorganized its set of ontologies, inducting the first six candidates to become OBO Foundry members [5]. This announcement asserted that these ontologies conform sufficiently to the OBO Foundry principles. We have included in this paper a separate analysis of these six ontologies.

Until now, there has been no quantitative analysis either of the level of overlap or of the level of term reuse. There is also no comprehensive list of the terms that are represented in more than one ontology and that must be reconciled. In this paper, we use a simple lexical method to find correspondences (both reused terms and overlapping terms) between terms in different ontologies. We have shown previously that this method produces good results for mapping terms in biomedical ontologies [6,7]. We use the results of this mapping to analyze the level of term reuse and term overlap in the OBO Foundry ontologies and to analyze the dynamics of these links over time.
Specifically, this paper makes the following contributions:

- We analyze the dynamics of term reuse among OBO Foundry candidate ontologies.
- We analyze the dynamics of overlap among the OBO Foundry candidates and thus quantify the amount of work that needs to be done to achieve orthogonality among the ontologies.
- We provide a list of current overlapping terms for OBO Foundry candidates, thus facilitating progress toward orthogonality.
- We analyze the level of reuse and overlap for the six ontologies that became the first OBO Foundry members.

Figure 1 Snapshots of the OBO Foundry ontologies for September 2009, March 2010, and September 2010. We analyzed 53 OBO Foundry ontologies at the three dates indicated in the figure. The numbers in bold next to each group of ontologies represent the number of ontologies in the group. Several ontologies are omitted from each snapshot for readability. Beginning in March 2010, the OBO Foundry designated six ontologies as OBO Foundry members, which we analyzed separately in addition to analyzing them as part of the set of OBO Foundry ontologies.

Methods

In each of three snapshots, we analyzed 53 ontologies that were identified as OBO Foundry candidates. These 53 ontologies contained 272,168 terms in September 2009, 311,351 terms in March 2010, and 318,872 terms in September 2010. To perform our analysis, we processed the ontologies from the OBO Foundry repository, created lexical mappings between terms based on the terms' preferred names, and then divided this set of mappings into the cases that constitute term reuse and the cases that constitute overlap of terms.

Collecting and processing the ontologies

BioPortal is an ontology library that, among other features, provides browsing and visualization for more than 200 biomedical ontologies [8,9]. The BioPortal ontology library includes ontologies that individual investigators submit directly to BioPortal, as well as terminologies drawn from both the Unified Medical Language System (UMLS) [10,11] and the WHO Family of International Classifications (WHO-FIC) [12]. The BioPortal repository also includes the ontologies that are candidates or members of the OBO Foundry. BioPortal processes all the ontologies, creating a unified and easily accessible index of all the content. We used this index to perform our analysis of the ontologies that were listed as OBO Foundry candidates or members at the times of our snapshots.

We excluded two ontologies from our analysis, a timed and an abstract version of the Human developmental anatomy ontology, because these ontologies contain multiple classes with the same preferred name. Thus, a term from another ontology with that preferred name may map to multiple classes in one of these ontologies. Furthermore, the large inherent overlap between these two ontologies generated more than 100,000 inter-ontology mappings, with each mapped term from the abstract version overlapping with several terms from the timed version.
We analyzed separately the six ontologies that were announced as the first members of the OBO Foundry on March 5, 2010: Chemical Entities of Biological Interest (CHEBI), Gene Ontology (GO), Phenotypic Quality Ontology (PATO), Protein Ontology (PRO), Xenopus Anatomy Ontology (XAO), and Zebrafish Anatomy Ontology (ZFA). The announcement asserted that these ontologies have satisfied the OBO Foundry principles to an acceptable degree and that the OBO Foundry community therefore recommends them as preferred targets for community convergence.

Creating lexical mappings

We used lexical mappings between ontology terms to identify cases of term reuse and to estimate the level of overlap between the ontologies. We used the following steps to create lexical mappings between terms:

1. generate an index of labels that are used as preferred names of terms;
2. normalize the strings that represent the labels in the index;
3. find pairs of matching labels by comparing the normalized strings;

4. create mappings between terms in the ontologies based on the matching labels that we identified in the previous step.

After generating a database of labels used as preferred names of terms, we normalized all the names by converting them to lower case and removing all delimiters (e.g., spaces, underscores, parentheses). We used a MySQL database to index each label along with the ID for the ontology and the term that the label came from. To find matching terms, we performed an SQL query on the table of normalized labels to identify pairs of strings that matched exactly. To improve precision, we compared only strings with at least three characters. We created a mapping between every pair of ontology terms where the normalized strings representing their preferred names were equal.

Identifying reuse

We used the mappings between ontology terms to identify the level of term reuse across ontologies by employing the following constructs to identify explicit reuse of terms from one ontology by another:

Case 1: The ids of the terms were the same.
Case 2: The ids of the terms appear as if they were intended to be the same.
Case 3: One term contained a reference to the id of the other term as xref.

Recall that the goal of the OBO Foundry is to create a set of orthogonal ontologies, where ontology developers reuse terms from other ontologies rather than define their own. The two recognized ways to reuse terms are for one ontology to refer explicitly to a term id from another ontology (Case 1) and for one ontology to use the xref property in the OBO format to indicate that a term is defined in another ontology (Case 3). We also observed, and included in our count of the reused terms, cases of intended reuse (Case 2): an ontology author uses an id for a term incorrectly, but it seems clear what the intention of the author was. For instance, because some ontologies changed their prefix over time, some reused terms use prefixes from the older versions, which are no longer valid.
Similarly, there are inconsistencies in the reuse of the terms from the Basic Formal Ontology (BFO), perhaps because BFO includes several prefixes in its term ids (span:, snap:, and bfo). Additionally, we identified cases where the ids were intended to refer to a term in another ontology, but were not used correctly. For example, MP: MP_ from Suggested Ontology for Pharmacogenomics (SOPHARM) is not a correct reference to the term MP: from Mammalian phenotype (MP), but the SOPHARM authors apparently intended this id to refer to a term from the MP ontology.

The mappings that we generated automatically directly provided measures for Cases 1 and 2 in the list above. We processed the files in OBO format separately to analyze reuse through xref.

Identifying overlap

Our lexical-mapping method produced pairs of terms with matching preferred names. By removing the terms that constitute reuse, we arrived at the set of terms that constitute overlap: these are terms with matching preferred names that do not refer to each other explicitly.
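The Methods above can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: they used a MySQL index and an SQL query, whereas this sketch uses a Python dictionary. The `Term` class, the toy ontology prefixes (AAA, BBB, ...), the `canonical_id` heuristic for Case 2, and the choice to apply the three-character minimum to the normalized string are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Term:
    ontology: str      # ontology abbreviation (toy values below)
    term_id: str       # id as written in the ontology file
    label: str         # preferred name
    xrefs: tuple = ()  # ids referenced through xref


def normalize(label):
    """Lower-case the label and drop delimiters (assumed: every non-alphanumeric)."""
    return re.sub(r"[\W_]+", "", label.lower())


def canonical_id(term_id):
    """Hypothetical heuristic for Case 2: treat 'aaa_0002' like 'AAA:0002'."""
    return term_id.replace("_", ":").upper()


def lexical_mappings(terms, min_length=3):
    """Yield pairs of terms from different ontologies whose normalized labels are equal."""
    index = defaultdict(list)
    for term in terms:
        key = normalize(term.label)
        if len(key) >= min_length:  # skip very short strings to improve precision
            index[key].append(term)
    for group in index.values():
        for a, b in combinations(group, 2):
            if a.ontology != b.ontology:
                yield a, b


def classify(a, b):
    """Label a mapping as one of the three reuse cases, or as overlap."""
    if a.term_id == b.term_id:
        return "reuse: same id"            # Case 1
    if canonical_id(a.term_id) == canonical_id(b.term_id):
        return "reuse: intended same id"   # Case 2
    if b.term_id in a.xrefs or a.term_id in b.xrefs:
        return "reuse: xref"               # Case 3
    return "overlap"


# Toy data only; none of these ids come from a real ontology.
TOY_TERMS = [
    Term("AAA", "AAA:0001", "Cell"),
    Term("BBB", "AAA:0001", "cell"),
    Term("CCC", "CCC:0042", "CELL"),
    Term("AAA", "AAA:0002", "nucleus"),
    Term("HHH", "aaa_0002", "Nucleus"),
    Term("DDD", "DDD:0007", "heart tube", xrefs=("EEE:0009",)),
    Term("EEE", "EEE:0009", "Heart_Tube"),
    Term("FFF", "FFF:0001", "pH"),
    Term("GGG", "GGG:0001", "ph"),
]

counts = Counter(classify(a, b) for a, b in lexical_mappings(TOY_TERMS))
print(dict(counts))
```

On the toy data, the three "cell" terms yield one same-id reuse and two overlaps, the "nucleus" pair is counted as intended reuse, the "heart tube" pair is reuse through xref, and the two-character "pH" labels are never compared.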

Results

Applying the lexical mapping technique resulted in 20,570 mappings among the OBO Foundry candidate ontologies in the September 2009 snapshot, 22,924 mappings in the March 2010 snapshot, and 27,059 mappings in the September 2010 snapshot. In each case, we analyzed the set of reused terms and overlapping terms separately (Table 1). We then analyzed reused and overlapping terms among the six OBO Foundry members announced on March 5, 2010.

Dynamics of reuse

Among the 53 OBO Foundry candidates that we analyzed, reuse increased from 10,972 cases in September 2009 to 17,067 cases in September 2010. These reuse relationships comprise the three forms identified earlier. Figure 2 shows their distribution for each of our three snapshots. In each snapshot, the majority of reuse results from a term from one ontology directly referencing the id of a term from another ontology or apparently intending to do so. The smallest portion of reuse comes through use of the xref property.

We constructed graphs showing reuse relationships among ontologies for the March and September 2010 snapshots (Figure 3). In each graph, an arrow from a node representing one ontology to a node representing another ontology means that the former reuses some terms from the latter. The figure illustrates both the current state of reuse and the change in reuse among ontologies over the six-month period from March 2010 to September 2010. Currently, 16 of the 53 ontologies (30%) reuse at least one term from another ontology and 19 ontologies (36%) have at least one of their terms reused. The Common Anatomy Reference Ontology (CARO) and the Cell Type ontology (CL) are the two most commonly reused ontologies; they have 8 and 7 ontologies reusing terms from them, respectively. Notably, as of September 2010, CARO had three more ontologies reusing terms from it than it did in March.
The Ontology for Biomedical Investigations (OBI) and the Infectious Disease ontology (IDO) reuse terms from the greatest number of ontologies: OBI reuses terms from 10 different ontologies and IDO reuses terms from 6 ontologies. IDO reuses a total of 50 terms (or 11% of its total terms) and OBI reuses 177 terms, about 7% of its total.

Overall, if we define a reuse relationship to be a relationship between two ontologies where one ontology reuses at least one term from the other, the number of reuse relationships among the 53 OBO Foundry candidate ontologies almost doubled between March and September 2010, increasing from 24 to 45.

In addition to analyzing the patterns of reuse among ontologies, we examined the extent to which each of these ontologies reuses terms from other ontologies. We found that 16 ontologies reuse at least 10% of their terms. Table 2 shows, for each of these ontologies, the number of terms that the ontology reuses, relative to the size of the ontology.

Table 1 Summary of overlap and reuse statistics over time.

                 September 2009   March 2010       September 2010
Total terms      272,168          311,351          318,872
Reused terms     10,972 (4.0%)    13,358 (4.3%)    17,067 (5.4%)
Overlap terms    9,598 (3.5%)     9,566 (3.1%)     9,992 (3.1%)

The table shows the number of terms, and the number of overlapping and reused terms, for the three snapshots: September 2009, March 2010, and September 2010.

Figure 2 The reuse relationships for each of our OBO Foundry candidate snapshots, broken up by type of reuse. The height of each bar indicates the total number of reused terms for a particular snapshot. Each bar also shows what portion of the total reuse for that snapshot consists of each of the three types of reuse that we identify.

Dynamics of overlap

From lexical mapping, we found that the number of overlaps among the OBO Foundry candidates increased slightly from 9,598 in September 2009 to 9,992 overlaps in September 2010. In the most recent snapshot, 51 ontologies (96%) have at least one overlapping term.

Figure 4 shows graphs representing the level of overlap between ontologies for March and September 2010. The left panel of each graph has connections between nodes representing ontologies if at least 30% of the terms in one ontology overlap with terms from the other ontology. The right panel of each figure represents the same graph, but with the edge created if 10% of the terms overlap. The figure shows that the number of ontology pairs where at least 30% of one ontology overlaps with the other decreased significantly between March and September 2010. Additionally, CARO had far fewer ontologies with which it shares a large portion of its terms.

We also identified new overlap introduced by ongoing ontology development. For example, as of September 2010, BFO overlaps with Units of Measurement (UO), Phenotypic Quality (PATO), and Mass Spectrometry (MS), overlaps which did not exist in March 2010. BFO is a small, upper-level ontology designed to be reused for ontology building, particularly among OBO Foundry ontologies; however, UO, PATO, and MS do not explicitly reuse terms from BFO, but instead define new terms that correspond to the terms in BFO.
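The thresholded overlap graphs described in this subsection can be sketched as follows. This is an illustrative reading of the Figure 4 construction, not the authors' code: the function name, the input layout (a dictionary keyed by ordered ontology pairs), and the toy ontology names and counts are all assumptions.

```python
def overlap_edges(sizes, mapped_terms, threshold):
    """Return directed edges (o1, o2) where at least `threshold` of o1's terms map into o2.

    sizes:        {ontology: total number of terms}
    mapped_terms: {(o1, o2): number of distinct terms of o1 that map to some term of o2}
    """
    edges = set()
    for (o1, o2), mapped in mapped_terms.items():
        if mapped / sizes[o1] >= threshold:
            edges.add((o1, o2))
    return edges


# Toy numbers, chosen only to show how the edge direction depends on ontology size.
SIZES = {"SMALL": 40, "MID": 1000, "BIG": 3000}
MAPPED = {
    ("SMALL", "BIG"): 20, ("BIG", "SMALL"): 20,
    ("MID", "BIG"): 150, ("BIG", "MID"): 150,
}

edges_30 = overlap_edges(SIZES, MAPPED, 0.30)
edges_10 = overlap_edges(SIZES, MAPPED, 0.10)
print(sorted(edges_30))
print(sorted(edges_10))
```

Because the percentage is taken relative to the source ontology, the same 20 shared terms produce an edge from the 40-term ontology at the 30% threshold but no edge in the opposite direction, which is why small ontologies can dominate such a graph.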
Finally, we note that out of the 53 ontologies in our analysis, 17 ontologies (32%) had fewer than 4% of their terms overlapping with those of other ontologies and 32 ontologies (59%) had fewer than 10% of their terms overlapping with those of other ontologies. Furthermore, there are 7 ontologies (13%) with 5 or fewer total overlapping terms. By contrast, Table 3 shows the ontologies that have the largest percentage of their terms overlapping with terms from other ontologies in our latest snapshot.

Figure 3 The reuse relationships of OBO Foundry candidate ontologies for March 2010 and September 2010. Each node represents an ontology. Please refer to for the list of full ontology names corresponding to the acronyms that we use as labels in the graph. In each graph, an edge from a node representing an ontology O1 to a node representing an ontology O2 means that ontology O1 reuses at least one term from O2.

Reuse and overlap among the first designated OBO Foundry members

In our analysis of the six OBO Foundry members [5], we identified 179 cases of term reuse and 135 cases of term overlap. Most of the term reuse and overlap involves only two ontologies, Xenopus Anatomy and Development (XAO) and Zebrafish Anatomy and Development (ZFA). These two ontologies represent the anatomy and development of different organisms. Naturally, they share a large number of common terms, although those terms are not necessarily equivalent: they are anatomical parts of different organisms. The 179 terms that XAO reuses from ZFA are references through the xref field: an anatomical term describing Xenopus in XAO references a term with the same name describing zebrafish in ZFA. However, the majority of overlapping terms in this set (109 of 135; 81%) are also between XAO and ZFA. The relationship of overlapping terms between these two ontologies is similar in nature to the relationship of the reused terms between them, identified through xref. Table 4 summarizes the remaining overlap among the six ontologies. Except for XAO and ZFA, no other pair of ontologies has more than 10 overlapping terms between them, but each ontology overlaps with at least one other ontology. The percentage of terms that overlap is significantly lower than the percentage that we observed among the candidates (Table 3).

Table 2 The extent of reuse for each OBO Foundry candidate that reuses at least 10% of its terms from other ontologies.

Ontology   Reuses from                          # of terms reused   Ontology size   % reused
ZFA        CARO, TAO, CL                        2,505               2,593           96%
UO         PATO                                 2,150               2,425           88%
SOPHARM    BFO, MP, SO                          6,624               8,603           77%
MS         PATO, UO                             2,417               3,887           62%
XAO        AAO, CARO, ZFA, CL                                                       %
TAO        CL, CARO                             402                 3,001           13%
IDO        BFO, GO, OGMS, CHEBI, CARO, TRANS                                        %

Column (1) is the ontology id; (2) is the set of ontologies from which it reuses terms; (3) is the number of terms reused; (4) is the number of terms in the ontology; (5) is the percentage of the ontology that is composed of reused terms.

Figure 4 The overlap relationships of OBO Foundry candidate ontologies for March 2010 and September 2010. For each date, the two graphs show links between ontologies at 30% and 10% overlap. Nodes represent OBO Foundry candidate ontologies. An edge from a node representing an ontology O1 to a node representing an ontology O2 means that at least 30% (10%) of the terms in O1 map to some term in O2.

Table 3 The amount of overlap for the OBO Foundry candidate ontologies.

Ontology   # of overlapping terms   Ontology size (# of terms)   % of overlapping terms
CARO                                                             %
XAO                                                              %
MPATH                                                            %
MA         1,032                    2,956                        35%
SAO                                                              %
ZFA        815                      2,593                        31%
TAO        888                      3,001                        30%
BFO                                                              %
AAO                                                              %
APO                                                              %

Column (1) is the ontology id; (2) is the number of terms that overlap with terms from a different ontology; (3) is the ontology size; (4) is the percentage of terms from the ontology that overlap with a term from another ontology.

Discussion

We used a lexical matching algorithm to generate mappings because we have previously shown that this method has high precision and relatively good recall for biomedical ontologies. However, there are a few key limitations to our approach.

First, ontologies that are very small can strongly affect our analysis. For example, CARO and BFO both appear in our table of the ontologies with the largest percentage of overlap. BFO has more than 20% of its terms overlapping with terms from other ontologies, and, as we mentioned earlier, 100% of CARO terms overlap with terms in other ontologies. However, CARO contains only 46 terms and BFO contains only 39 terms, so in reality the large overlap percentages reflect only a handful of overlapping terms from these ontologies.

Second, our analysis errs on the side of precision, with most mappings that we identify representing real overlap. Our analysis indicates that our recall is good, but not perfect. We previously found that, when identifying mappings between anatomy ontologies, recall for a lexical matching algorithm with less stringent requirements was 65%.
Therefore, our more precise method most likely has recall below 65%. We found that lexical matching works relatively well for finding mappings between biomedical ontologies due to rich lexical information in these ontologies [6]; however, lexical methods cannot identify mappings between classes without lexical similarity. Since lexical matching identifies high-precision mappings with potentially limited recall, our numbers represent the lower bound on the number of overlapping terms. We need more advanced methods to identify the remaining overlap.

Finally, our analysis presents snapshots of the state of orthogonality at given points in time. The OBO Foundry is a work in progress, as some of the OBO Foundry candidate ontologies are updated regularly. Therefore, the number of reused terms and the amount of overlap may change frequently.

Table 4 The amount of overlap for the six OBO Foundry members.

Ontology   # of overlapping terms   Ontology size (# of terms)   % of overlapping terms
XAO                                                              %
ZFA        113                      2,475                        5%
PATO       4                        1,                           %
PRO        9                        11,                          %
GO         12                       29,                          %
CHEBI      8                        24,                          %

Column (1) is the ontology id; (2) is the number of terms that overlap with a term from another ontology; (3) is the ontology size; (4) is the percentage of terms from the ontology that overlap with a term from another ontology.

Despite these limitations, our analysis allows ontology developers to form a more complete picture of inter-ontology relationships among OBO Foundry candidates and to understand how these relationships have evolved over time. In the course of one year (September 2009 to September 2010), reuse among the 53 OBO Foundry candidates increased from 10,972 cases to 17,067 cases, with 13,358 cases of reuse at the halfway point in March 2010. Additionally, about 63% of mapped terms reflect reuse of terms among OBO Foundry candidates in September 2010, compared to 53% in September 2009. Although both the absolute number of reuse cases and the percentage of mappings that represent reuse rather than overlap have increased, the percentage of OBO Foundry candidate terms involved in an overlap has remained relatively constant at around 3%. Thus, while reuse has increased in the OBO Foundry over the course of this year, overlap has remained relatively consistent both in terms of the absolute number of overlaps and in terms of the percentage of terms from the OBO Foundry that are involved in overlap.

Our results indicate that the nature of term reuse has evolved over time.
We can see in Figure 2 that, from September 2009 to March 2010, most of the increased reuse resulted from increased use of the xref property to refer to classes, and from March 2010 to September 2010, most of the additional reuse consisted of terms using the same id, although the other two types of reuse increased as well. Examining ontologies with significant reuse can also yield insights into the characteristics of those ontologies. For example, as of September 2010, OBI reuses terms from 10 different ontologies. OBI serves as a cross-disciplinary ontology for describing biological and clinical experimental processes and defines both broad terms that span multiple domains and domain-specific terms relevant to particular areas of study [14]. As such, OBI reuses terms from a number of different ontologies by design. Although the authors of 16 ontologies reuse terms from other ontologies, this reuse exists in a variety of different forms. Indeed, there is currently no standard way for an ontology author to define that she is reusing a term from another ontology. In fact, a large portion of current reuse results from cases in which the id of one term is intended to reference another one explicitly, but does not match exactly. By analyzing the dynamics of the overlap landscape (Figure 4), we can see how the distribution of overlaps among ontologies has changed over time. Although the level of overlap increased between March and September 2010, the number of ontology pairs for which at least 30% of one ontology overlaps with terms in the other ontology decreased. Additionally, the number of ontology pairs with at least 10% overlap decreased over the same period. This trend indicates that ontology development efforts have reduced some significant overlap among the ontologies, but a similar level of total overlap remains.
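The headline counts quoted in the Results and in this Discussion can be cross-checked with a few lines of arithmetic. The sketch below assumes that every lexical mapping is counted either as reuse or as overlap, so that reuse equals mappings minus overlap; the dictionary layout and function name are illustrative, while the input numbers are the ones reported in the text.

```python
# Totals reported in the text for each snapshot.
SNAPSHOTS = {
    "September 2009": {"terms": 272_168, "mappings": 20_570, "overlap": 9_598},
    "March 2010": {"terms": 311_351, "mappings": 22_924, "overlap": 9_566},
    "September 2010": {"terms": 318_872, "mappings": 27_059, "overlap": 9_992},
}


def summarize(snapshot):
    """Derive the reuse count and the percentages from one snapshot's totals."""
    reuse = snapshot["mappings"] - snapshot["overlap"]  # assumption: mapping = reuse or overlap
    return {
        "reuse": reuse,
        "reuse_pct_of_terms": round(100 * reuse / snapshot["terms"], 1),
        "overlap_pct_of_terms": round(100 * snapshot["overlap"] / snapshot["terms"], 1),
        "reuse_pct_of_mappings": round(100 * reuse / snapshot["mappings"]),
    }


SUMMARY = {name: summarize(s) for name, s in SNAPSHOTS.items()}
for name, row in SUMMARY.items():
    print(name, row)
```

Under that assumption the derived reuse counts are 10,972, 13,358, and 17,067, the share of mappings that represent reuse rises from 53% to 63%, and the share of terms involved in overlap stays between 3.1% and 3.5%, in line with the figures discussed above.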

There are only 9 ontologies for which more than 20% of the terms overlap with terms in other ontologies. Additionally, there are only a handful of ontologies that have more than 10% of their terms mapped to any one other ontology. These facts, when coupled with the statistic that almost one-third of the ontologies have fewer than 4% of their terms overlapping with terms from other ontologies, suggest that there are a few ontologies with considerable overlap and many ontologies that each have just a small amount of overlap. Naturally, the larger the number of ontologies that require reconciliation with other ontologies, the more difficult the process: if two ontologies have 100 classes in common, the developers of only these two ontologies need to coordinate their efforts. If the same 100 overlapping terms are spread across 10 ontologies, ten teams of developers will need to be involved in the process of reconciliation, which will make the process more difficult.

Our results indicate that the largest amount of overlap is in the cluster of ontologies representing anatomy: the Foundational Model of Anatomy (FMA), Mouse Adult Gross Anatomy (MA), and ZFA each have more than 800 overlapping terms. At the same time, achieving orthogonality in this set may be rather difficult: the terms that the ontologies share describe different organisms, and therefore, one could argue they are not actually identical. However, in order to create a coherent set, lung in the MA, for example, must be related in some way to Lung in FMA. One possibility would be to create explicit mappings indicating homology and to keep the classes separate. Another possibility would be to create a naming convention to give organism-specific names to anatomical terms. For example, lung could be renamed to mouse lung in MA and human lung in FMA.

Furthermore, every term in the Common Anatomy Reference Ontology (CARO) overlaps with at least one term from another ontology.
CARO was designed as a reference ontology that will link together other anatomy ontologies, so it is not surprising to find significant overlap between CARO and the other anatomy ontologies. However, our results show that many anatomy ontologies do not reuse CARO terms explicitly. Notably, our analysis does not account for the nature of the ontologies that we examine. For example, at the moment we do not distinguish between reference and application ontologies, a distinction that might give us further insight into the nature of the reuse and overlap among these ontologies. However, in order to account for the nature of the ontologies in our analysis, we would need to ensure that two prerequisites are met: First, we would require a clear set of criteria defining reference and application ontologies, and, second, we would need metadata from each ontology that would allow us to categorize it as a reference or application ontology. Our results demonstrate that some overlap remains among the ontologies included as the first OBO Foundry members. Although there are only 135 overlapping terms among them, each of the six ontologies has some overlap and therefore a coordinated effort will be required to make these ontologies truly orthogonal. The lack of any term reuse outside of the two anatomy ontologies in the set of the six OBO Foundry members indicates that this set of ontologies represents low-hanging fruit in achieving orthogonality: they represent disjoint domains, and inclusion in the OBO Foundry did not require one ontology to drop some of its own terms and to reuse terms defined elsewhere. The reconciliation process will be more difficult for ontologies that represent overlapping domains when each defines a similar set of terms. Overall, among the 53 candidate ontologies that we analyzed, only 2 ontologies are orthogonal within the OBO Foundry, having no overlap with other candidates. Additionally, only 30% of ontologies reuse terms, while 96% have terms overlapping with other ontologies. Together, these statistics indicate that the vast majority of the ontologies that currently overlap have not yet adopted any measure of term reuse. Our analysis indicates that several characteristics of the overlap among the ontologies will make reaching orthogonality difficult. First, many ontologies contain at least one term that overlaps with a term in another ontology, which means many developers will need to be involved in reconciling this overlap. Second, in some domains, such as anatomy, terms with closely related meanings may be difficult to reconcile. Finally, new ontology development can readily introduce new overlap among the ontologies. Given these overlap characteristics in combination with the fact that the number of overlaps among OBO Foundry ontologies did not decrease over the course of a year of active development, there is no evidence that the OBO Foundry will reach its stated goal of orthogonality, despite the increased adoption of term reuse among its ontologies.

Conclusions

Our analysis produced a list of 9,992 pairs of overlapping classes that can serve as a guide for the developers of the OBO Foundry candidate ontologies in order to achieve orthogonality.
We have created a prototype web application that allows OBO Foundry ontology developers to see which classes from their ontologies overlap with classes from other ontologies in the OBO Foundry. We believe that there are several different categories for the pairs of overlapping terms identified by this analysis, each requiring a different development approach: (1) The terms represent the same concept; (2) The terms have the same name and represent related but different concepts (e.g., the anatomy example); and (3) The terms have the same name but are different terms, either because our analysis identified an incorrect match or because the terms are polysemous. Our results show that 51 (96%) of the ontologies listed on the OBO Foundry web site would require such attention. From our analysis, we conclude that while the OBO Foundry has made significant progress toward orthogonality during the period of this study through increased adoption of explicit term reuse, a large amount of overlap remains among these ontologies. Notably, the number of cases of term reuse increased by 6,095 terms over this period, from 4.0% of the total terms among the ontologies to 5.3% of the total. However, the total number of overlaps also increased during this period and 96% of the OBO Foundry candidate ontologies still have at least some overlap with the other ontologies. Additionally, some of the remaining overlap, such as that among the anatomy ontologies, requires careful consideration to reconcile. In this paper, we have shown that through analysis of the inter-ontology relationships among the OBO Foundry candidate ontologies, we can understand the current state of orthogonality in the OBO Foundry and how that state has evolved over time.
Given that the overlap among the OBO Foundry candidate ontologies did not decrease from September 2009 to September 2010 and that this overlap remains spread across the vast majority of the ontologies, achieving orthogonality in the OBO Foundry will be an extraordinarily difficult task.
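The reuse and overlap percentages reported above can be cross-checked with a small worked calculation. The sketch assumes that the 16 ontologies whose authors reuse terms and the 51 ontologies with overlap are both counted among the same 53 candidates; the function name is hypothetical. In the worst case every reusing ontology is also an overlapping one, which gives a lower bound on the number of overlapping ontologies that do not reuse terms.

```python
def min_overlapping_without_reuse(n_overlapping: int, n_reusing: int) -> int:
    """Lower bound on ontologies that overlap but do not reuse any terms.

    Worst case: every reusing ontology is also an overlapping ontology.
    """
    return max(0, n_overlapping - n_reusing)


total = 53        # candidate ontologies analyzed
overlapping = 51  # ontologies with at least one overlapping term
reusing = 16      # ontologies whose authors reuse terms

lower_bound = min_overlapping_without_reuse(overlapping, reusing)

print(f"reuse:   {reusing / total:.0%} of candidates")
print(f"overlap: {overlapping / total:.0%} of candidates")
print(f"at least {lower_bound} of {overlapping} overlapping ontologies "
      f"({lower_bound / overlapping:.0%}) do not reuse terms")
```

Under these assumptions the bound is 35 of 51 overlapping ontologies, which is consistent with the statement that most overlapping ontologies have not adopted term reuse.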

In future work, we plan to analyze where the overlap and reuse actually lie within the ontologies. Doing so might allow us to identify sub-domains of the ontologies that have either similar or conflicting representations. Such sub-domains would be important targets for reconciliation as they may reflect different perspectives or granularities with respect to the modeling of similar entities.

Acknowledgments

This work was supported by the National Center for Biomedical Ontology, under roadmap-initiative grant U54 HG from the National Institutes of Health. We are very grateful to Barry Smith, Trish Whetzel, Clement Jonquet, and Cherie Youn for their help with this work. This article has been published as part of Journal of Biomedical Semantics Volume 2 Supplement 2, 2011: Proceedings of the Bio-Ontologies Special Interest Group Meeting. The full contents of the supplement are available online at

Authors' contributions

AG collected the data, wrote the software, and performed the analysis. NN and MM participated in the design of the study and in drafting the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare no competing interests.

Published: 17 May 2011

References

1. Rubin DL, Shah NH, Noy NF: Biomedical ontologies: a functional perspective. Briefings in Bioinformatics 2008, 9:
2. Bodenreider O, Stevens R: Bio-ontologies: current trends and future directions. Briefings in Bioinformatics 2006, 7:
3. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 2007, 25(11):
4. Rosse C, Mejino JL Jr: A Reference Ontology for Bioinformatics: The Foundational Model of Anatomy. J Biomed Inform 2003, 36:
5. Announcement of First Set of OBO Foundry Ontologies [ Announcement_of_First_Set_of_OBO_Foundry Ontologies].
6. Ghazvinian A, Noy NF, Musen MA: Creating Mappings For Ontologies in Biomedicine: Simple Methods Work. AMIA Annu Symp Proc 2009, 2009:
7. Ghazvinian A, Noy NF, Jonquet C, Shah N, Musen M: What Four Million Mappings Can Tell You About Two Hundred Ontologies. In 8th International Semantic Web Conference (ISWC 2009). Volume 5823/2009. Chantilly, VA; 2009:
8. Noy NF, Shah NH, Whetzel PL, Dai B, Dorf M, Griffith N, Jonquet C, Rubin DL, Storey MA, Chute CG, Musen MA: BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research 2009, / nar/gkp
9. BioPortal [
10. Lindberg D, Humphreys B, McCray A: The Unified Medical Language System. Methods of Information in Medicine 1993, 32(4):
11. Bodenreider O: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 2004, 32(suppl 1):D267-D
12. WHO: The WHO Family of International Classifications [
13. Consortium GO: The Gene Ontology (GO) project in Nucleic Acids Research 2006, 34(suppl 1):D322-D
14. Brinkman RR, Courtot M, Derom D, Fostel JM, He Y, Lord P, Malone J, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SAA, Soldatova LN, Stoeckert CJ, Turner JA, Zheng J, OBI consortium: Modeling biomedical experimental processes with OBI. Journal of Biomedical Semantics 2010, 1 Suppl 1(Suppl 1):S7+.

doi: / s2-s2
Cite this article as: Ghazvinian et al.: How orthogonal are the OBO Foundry ontologies? Journal of Biomedical Semantics (Suppl 2):S2.


More information

Disease Information and Semantic Web

Disease Information and Semantic Web Rheinische Friedrich-Wilhelms-Universität Bonn Institute of Computer Science III Disease Information and Semantic Web Master s Thesis Supervisor: Prof. Sören Auer, Heiner OberKampf Turan Gojayev München,

More information

Chapter 2 Overview of the Design Methodology

Chapter 2 Overview of the Design Methodology Chapter 2 Overview of the Design Methodology This chapter presents an overview of the design methodology which is developed in this thesis, by identifying global abstraction levels at which a distributed

More information

Document Retrieval using Predication Similarity

Document Retrieval using Predication Similarity Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research

More information

2. TOPOLOGICAL PATTERN ANALYSIS

2. TOPOLOGICAL PATTERN ANALYSIS Methodology for analyzing and quantifying design style changes and complexity using topological patterns Jason P. Cain a, Ya-Chieh Lai b, Frank Gennari b, Jason Sweis b a Advanced Micro Devices, 7171 Southwest

More information

RiMOM Results for OAEI 2009

RiMOM Results for OAEI 2009 RiMOM Results for OAEI 2009 Xiao Zhang, Qian Zhong, Feng Shi, Juanzi Li and Jie Tang Department of Computer Science and Technology, Tsinghua University, Beijing, China zhangxiao,zhongqian,shifeng,ljz,tangjie@keg.cs.tsinghua.edu.cn

More information

Ontology-based annotation of multiscale imaging data: Utilizing and building the Neuroscience Information Framework. Maryann E.

Ontology-based annotation of multiscale imaging data: Utilizing and building the Neuroscience Information Framework. Maryann E. Ontology-based annotation of multiscale imaging data: Utilizing and building the Neuroscience Information Framework Maryann E. Martone University of California, San Diego What does this mean? 3D Volumes

More information

Development of an Environment for Data Annotation in Bioengineering

Development of an Environment for Data Annotation in Bioengineering Development of an Environment for Data Annotation in Bioengineering Renato Galina Barbosa Azambuja Correia Instituto Superior Técnico, Universidade Técnica de Lisboa, Portugal renato_correia1@hotmail.com

More information

OWL/ZIP: Distributing Large and Modular Ontologies

OWL/ZIP: Distributing Large and Modular Ontologies OWL/ZIP: Distributing Large and Modular Ontologies Nicolas Matentzoglu and Bijan Parsia The University of Manchester Oxford Road, Manchester, M13 9PL, UK {bparsia,matentzn}@cs.manchester.ac.uk Abstract.

More information

Detecting Controversial Articles in Wikipedia

Detecting Controversial Articles in Wikipedia Detecting Controversial Articles in Wikipedia Joy Lind Department of Mathematics University of Sioux Falls Sioux Falls, SD 57105 Darren A. Narayan School of Mathematical Sciences Rochester Institute of

More information

Chapter 6 Architectural Design

Chapter 6 Architectural Design Chapter 6 Architectural Design Chapter 6 Architectural Design Slide 1 Topics covered The WHAT and WHY of architectural design Architectural design decisions Architectural views/perspectives Architectural

More information

Automated Cognitive Walkthrough for the Web (AutoCWW)

Automated Cognitive Walkthrough for the Web (AutoCWW) CHI 2002 Workshop: Automatically Evaluating the Usability of Web Sites Workshop Date: April 21-22, 2002 Automated Cognitive Walkthrough for the Web (AutoCWW) Position Paper by Marilyn Hughes Blackmon Marilyn

More information

CICS insights from IT professionals revealed

CICS insights from IT professionals revealed CICS insights from IT professionals revealed A CICS survey analysis report from: IBM, CICS, and z/os are registered trademarks of International Business Machines Corporation in the United States, other

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Supporting Patient Screening to Identify Suitable Clinical Trials

Supporting Patient Screening to Identify Suitable Clinical Trials Supporting Patient Screening to Identify Suitable Clinical Trials Anca BUCUR a,1, Jasper VAN LEEUWEN a, Njin-Zu CHEN a, Brecht CLAERHOUT b Kristof DE SCHEPPER b, David PEREZ-REY c, Raul ALONSO-CALVO c,

More information

THE ADHERENCE OF OPEN SOURCE JAVA PROGRAMMERS TO STANDARD CODING PRACTICES

THE ADHERENCE OF OPEN SOURCE JAVA PROGRAMMERS TO STANDARD CODING PRACTICES THE ADHERENCE OF OPEN SOURCE JAVA PROGRAMMERS TO STANDARD CODING PRACTICES Mahmoud O. Elish Department of Computer Science George Mason University Fairfax VA 223-44 USA melish@gmu.edu ABSTRACT The use

More information

Semantic Annotation for Semantic Social Networks. Using Community Resources

Semantic Annotation for Semantic Social Networks. Using Community Resources Semantic Annotation for Semantic Social Networks Using Community Resources Lawrence Reeve and Hyoil Han College of Information Science and Technology Drexel University, Philadelphia, PA 19108 lhr24@drexel.edu

More information

These Are the Top Languages for Enterprise Application Development

These Are the Top Languages for Enterprise Application Development These Are the Top Languages for Enterprise Application Development And What That Means for Business August 2018 Enterprises are now free to deploy a polyglot programming language strategy thanks to a decrease

More information

Administrative Guideline. SMPTE Metadata Registers Maintenance and Publication SMPTE AG 18:2017. Table of Contents

Administrative Guideline. SMPTE Metadata Registers Maintenance and Publication SMPTE AG 18:2017. Table of Contents SMPTE AG 18:2017 Administrative Guideline SMPTE Metadata Registers Maintenance and Publication Page 1 of 20 pages Table of Contents 1 Scope 3 2 Conformance Notation 3 3 Normative References 3 4 Definitions

More information

ISO/IEC TR TECHNICAL REPORT. Software and systems engineering Life cycle management Guidelines for process description

ISO/IEC TR TECHNICAL REPORT. Software and systems engineering Life cycle management Guidelines for process description TECHNICAL REPORT ISO/IEC TR 24774 First edition 2007-09-01 Software and systems engineering Life cycle management Guidelines for process description Ingénierie du logiciel et des systèmes Gestion du cycle

More information

Community-based Ontology Development, Annotation and Discussion with MediaWiki extension Ontokiwi and Ontokiwi-based Ontobedia

Community-based Ontology Development, Annotation and Discussion with MediaWiki extension Ontokiwi and Ontokiwi-based Ontobedia Community-based Ontology Development, Annotation and Discussion with MediaWiki extension Ontokiwi and Ontokiwi-based Ontobedia Abstract Edison Ong; Yongqun He, DVM, PhD University of Michigan Medical School,

More information

INSPIRE and SPIRES Log File Analysis

INSPIRE and SPIRES Log File Analysis INSPIRE and SPIRES Log File Analysis Cole Adams Science Undergraduate Laboratory Internship Program Wheaton College SLAC National Accelerator Laboratory August 5, 2011 Prepared in partial fulfillment of

More information

2 Experimental Methodology and Results

2 Experimental Methodology and Results Developing Consensus Ontologies for the Semantic Web Larry M. Stephens, Aurovinda K. Gangam, and Michael N. Huhns Department of Computer Science and Engineering University of South Carolina, Columbia,

More information

A semantic integration methodology

A semantic integration methodology Extreme Markup Languages 2003 Montréal, Québec August 4-8, 2003 A semantic integration methodology Steven R. Newcomb Coolheads Consulting Abstract The heart of the semantic integration problem is how to

More information

Taxonomies and controlled vocabularies best practices for metadata

Taxonomies and controlled vocabularies best practices for metadata Original Article Taxonomies and controlled vocabularies best practices for metadata Heather Hedden is the taxonomy manager at First Wind Energy LLC. Previously, she was a taxonomy consultant with Earley

More information

SOST 201 September 20, Stem-and-leaf display 2. Miscellaneous issues class limits, rounding, and interval width.

SOST 201 September 20, Stem-and-leaf display 2. Miscellaneous issues class limits, rounding, and interval width. 1 Social Studies 201 September 20, 2006 Presenting data and stem-and-leaf display See text, chapter 4, pp. 87-160. Introduction Statistical analysis primarily deals with issues where it is possible to

More information

SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL

SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL Wang Wei, Payam M. Barnaghi School of Computer Science and Information Technology The University of Nottingham Malaysia Campus {Kcy3ww, payam.barnaghi}@nottingham.edu.my

More information

The IDN Variant TLD Program: Updated Program Plan 23 August 2012

The IDN Variant TLD Program: Updated Program Plan 23 August 2012 The IDN Variant TLD Program: Updated Program Plan 23 August 2012 Table of Contents Project Background... 2 The IDN Variant TLD Program... 2 Revised Program Plan, Projects and Timeline:... 3 Communication

More information

Market Report. Scale-out 2.0: Simple, Scalable, Services- Oriented Storage. Scale-out Storage Meets the Enterprise. June 2010.

Market Report. Scale-out 2.0: Simple, Scalable, Services- Oriented Storage. Scale-out Storage Meets the Enterprise. June 2010. Market Report Scale-out 2.0: Simple, Scalable, Services- Oriented Storage Scale-out Storage Meets the Enterprise By Terri McClure June 2010 Market Report: Scale-out 2.0: Simple, Scalable, Services-Oriented

More information