Semantic Web
  Ontology Alignment
  Morteza Amini
  Sharif University of Technology
  Fall 94-95
Outline
  The Problem of Ontologies
  Ontology Heterogeneity
  Ontology Alignment Overall Process
  Similarity Methods
The Problem
  Like the Web, the Semantic Web is distributed and heterogeneous by design.
  Ontologies are used in it to support interoperability and common understanding between different parties.
  Ontologies themselves may be heterogeneous.
  Ontology alignment is needed to find semantic relationships among the entities of different ontologies.
Need for Ontology Merging
  There is significant overlap among existing ontologies, for example:
    Yahoo! and the DMOZ Open Directory
    Product catalogs for similar domains
Terminology (1)
  Mapping: a formal expression that states the semantic relationship between two entities belonging to different ontologies.
    Given two ontologies $O_1$ and $O_2$, mapping one ontology onto the other means that for each entity (concept C, relation R, or instance I) in $O_1$ we try to find a corresponding entity in $O_2$ that has the same intended meaning:
    $map(e_{1i}) = e_{2j}$
  Ontology Alignment: the process of producing a set of correspondences between two or more (in the case of multi-alignment) ontologies. These correspondences are expressed as mappings.
Terminology (2)
  Ontology Transformation: a general term for any process that leads to a new ontology $O'$ from an ontology $O$ by using a transformation function $T$.
  Ontology Translation: an ontology transformation function $t$ that translates an ontology $O$ written in some language $L$ into an ontology $O'$ written in a distinct language $L'$.
  Ontology Merging: the creation of a new ontology from two (possibly overlapping) source ontologies. This concept is closely related to that of integration in the database community.
An Example of Ontology Alignment
  [Figure: Car in Ontology A is matched ("similar to") with Automobile in Ontology B. Labels recovered for Ontology A: Object, Vehicle, Car, Boat, Has Owner, Has Speed, Owner, Speed, Ali, Peugeot 405, 250 km/h. Labels recovered for Ontology B: Thing, Vehicle, Automobile, Has Specification, Speed, Ali's Peugeot, Fast. Legend: concept, property, instance, type, similarity.]
  Similarity of Car and Automobile:
    Label similarity = 0.0
    Super similarity = 1.0
    Instance similarity = 0.6
    Relation similarity = 0.8
    Total similarity = 0.6
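An Example of Ontology Merging
  [Figure: the two source ontologies before merging. Labels of the first: Object, Vehicle, Bus, Car, Sport Car, Family Car. Labels of the second: Thing, Automobile, Sport Car, Luxury Car, Family Car. Instances: Porsche, BMW.]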
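An Example of Ontology Merging
  [Figure: the merged ontology. Object and Thing are merged into one concept, as are Car and Automobile. Concepts: Object/Thing, Vehicle, Bus, Car/Automobile, Sport Car, Luxury Car, Family Car. Instances: BMW, Porsche.]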
Forms of Heterogeneity in Ontologies (1)
  (1) Syntactic: mismatches that depend on the choice of representation language: OWL, RDFS, DAML, N3, DATALOG, PROLOG, ...
  (2) Terminological: all forms of mismatch related to the naming of the entities (e.g. individuals, classes, properties, relations) that occur in an ontology.
    Typical examples:
      different words are used to name the same entity (synonymy);
      the same word is used to name different entities (polysemy);
      words from different languages (English, French, etc.) are used to name entities;
      syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.).
    Mismatches at the terminological level are not as deep as those at the conceptual level. However, most real cases involve the terminological level (e.g. the way different people name the same entities), so this level is at least as crucial as the conceptual one.
Forms of Heterogeneity in Ontologies (2)
  (3) Conceptual: mismatches that have to do with the content of an ontology.
    Metaphysical differences: differences in how the world is broken into pieces.
    Coverage: the ontologies cover different (possibly overlapping) portions of the world.
    Granularity: one ontology provides a more (or less) detailed description of the same entities.
    Perspective: an ontology may provide a viewpoint that differs from the viewpoint adopted in another ontology.
Forms of Heterogeneity in Ontologies (3)
  Metaphysical differences:
Overcoming Heterogeneity
  One common approach to the problems of heterogeneity is to define relations (mappings) across the heterogeneous representations.
  These relations can be used to transform expressions of one ontology into a form compatible with the other.
  This may happen at any level:
    syntactic: through semantics-preserving transducers;
    terminological: through functions mapping lexical information;
    conceptual: through general transformation of the representations.
Structure of Mappings
  Alignment: a process that starts from two representations $O$ and $O'$ and produces a set of mappings between pairs of (simple or complex) entities $\langle e, e' \rangle$ belonging to $O$ and $O'$ respectively.
  In general, a mapping can be described as a quadruple $\langle e, e', n, R \rangle$:
    $e$ and $e'$ are the entities between which a relation is asserted by the mapping;
    $n$ is the degree of trust (confidence) in the mapping;
    $R$ is the relation that the mapping asserts between $e$ and $e'$.
  Example: (Car, Automobile, 0.6, Equivalent)
  In this course we focus on finding equivalence (same-as) relations.
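A minimal sketch of how the mapping quadruple could be held in code. The class name `Mapping`, its field names, and the assumption that the confidence lies in [0, 1] are illustrative choices, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One correspondence <e, e', n, R> between entities of two ontologies."""

    source: str        # e: entity from the first ontology
    target: str        # e': entity from the second ontology
    confidence: float  # n: degree of trust, assumed here to lie in [0, 1]
    relation: str      # R: relation asserted between e and e'

    def __post_init__(self):
        # Reject confidences outside the assumed [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


# The example from the slide: (Car, Automobile, 0.6, Equivalent).
example = Mapping("Car", "Automobile", 0.6, "Equivalent")

# An alignment is then simply a set of such mappings.
alignment = {example}
```

Making the class frozen keeps mappings hashable, so an alignment can be stored as a set without duplicates.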
Finding Mappings Through Similarity
  There are many ways to assess the similarity between two entities.
  The most common is to define a measure of this similarity.
  Properties that can be required of such measures:
Ontology Alignment Process
  Input
  1. Feature Extraction
  2. Entity Pair Selection
  3. Similarity
  4. Aggregation
  5. Interpretation
  Output
  Iterations: the steps may be repeated before the output is produced.
1 & 2. Feature Extraction / Pair Selection
  Extract the entities of the two ontologies together with their properties (features).
  Example features: name, label, subclassof, instances.
  [Figure: example ontology used for pair selection, with labels Object, Vehicle, Owner, Boat, Car, Speed, hasowner, hasspeed, Marc, Porsche KA-123, 250 km/h.]
3. Similarity - Measures
  String similarity: string comparisons, e.g. of labels. For example:
    $sim_{String}(s_1, s_2) = \max\left(0, \frac{\min(|s_1|, |s_2|) - ed(s_1, s_2)}{\min(|s_1|, |s_2|)}\right)$
    where $ed(s_1, s_2)$ is the edit distance between $s_1$ and $s_2$.
  Object similarity: direct object comparisons. Are two objects the same? E.g. for evaluating the similarity of instances.
  Set similarity: set comparisons. Are two sets of objects the same? E.g. for evaluating the similarity of concepts (based on their instances).
    Set similarity requires a precalculated similarity of the objects, obtained with the object similarity method.
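The string similarity above, max(0, (min(|s1|, |s2|) - ed(s1, s2)) / min(|s1|, |s2|)), can be sketched as follows. The function names and the handling of empty strings are assumptions made for this illustration; the edit distance used is the unit-cost (Levenshtein) variant.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance: unit-cost insertion, deletion and replacement."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # replace c1 by c2 (free if equal)
            ))
        previous = current
    return previous[-1]


def sim_string(s1: str, s2: str) -> float:
    """String similarity in [0, 1] based on the edit distance."""
    shortest = min(len(s1), len(s2))
    if shortest == 0:
        # Assumption: the formula is undefined for empty strings, so two
        # empty strings count as identical and anything else as dissimilar.
        return 1.0 if s1 == s2 else 0.0
    return max(0.0, (shortest - edit_distance(s1, s2)) / shortest)
```

For the labels Car and Automobile the edit distance exceeds the length of the shorter label, so `sim_string("Car", "Automobile")` is 0.0, which agrees with the label similarity of 0.0 in the alignment example.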
3. Similarity - Rules
             Feature      Similarity Measure
  Concepts   name         String Similarity
             subclassof   Object Similarity
             instances    Set Similarity
  Relations  instances    Set Similarity
  Instances  name         String Similarity
             instanceof   Object Similarity
4. Aggregation
  How are the individual similarity measures combined?
    Linearly weighted: $sim(e, f) = \sum_k w_k \, sim_k(e, f)$
    Special function
  Aggregation methods are in fact global similarity methods.
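A small sketch of the linearly weighted aggregation, sim(e, f) = sum over k of w_k * sim_k(e, f). The individual values are the ones given for Car and Automobile in the alignment example; the equal weights of 0.25 are an assumption (the slides do not state the weights), chosen because they reproduce the total of 0.6.

```python
def aggregate(similarities: dict, weights: dict) -> float:
    """Return the weighted sum of the individual similarity measures."""
    missing = set(similarities) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[k] * similarities[k] for k in similarities)


# Individual similarities of Car and Automobile from the alignment example.
car_vs_automobile = {"label": 0.0, "super": 1.0, "instance": 0.6, "relation": 0.8}

# Assumed weights: every measure counts equally and the weights sum to 1.
equal_weights = {name: 0.25 for name in car_vs_automobile}

total = aggregate(car_vs_automobile, equal_weights)  # approximately 0.6
```

Keeping the weights summing to 1 means the aggregated value stays in the same [0, 1] range as the individual measures.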
5. Interpretation
  From similarities to mappings.
  A threshold $t$ can be applied to the similarity computed in the previous step to decide on a mapping:
    $map(e) = f$ if $sim(e, f) > t$
  The threshold can be determined from test (training) data sets.
  Manual interpretation based on the collected information is another approach.
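The threshold rule map(e) = f if sim(e, f) > t could be sketched as below. Keeping only the best-scoring candidate f for each e is an added assumption, made so that the result is a function of e; the scores other than the 0.6 for Car and Automobile are hypothetical.

```python
def interpret(similarity: dict, threshold: float) -> dict:
    """Turn {(e, f): score} into {e: f}, keeping pairs with score > threshold.

    Assumption: when several candidates f pass the threshold for the same e,
    only the one with the highest score is kept.
    """
    best = {}
    for (e, f), score in similarity.items():
        if score > threshold and (e not in best or score > best[e][1]):
            best[e] = (f, score)
    return {e: f for e, (f, _) in best.items()}


# Hypothetical aggregated similarities between entities of two ontologies.
scores = {
    ("Car", "Automobile"): 0.6,
    ("Car", "Vehicle"): 0.3,
    ("Boat", "Automobile"): 0.1,
    ("Object", "Thing"): 1.0,
}

mappings = interpret(scores, threshold=0.5)
```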
Similarity Methods
  Local methods: take a local view to compute similarities.
  Global methods: take a global view to compute similarities and to merge the computed local similarities.
Similarity - Local Methods
  Terminological methods
    String-based methods
    Language-based methods
  Structural methods
    Internal structure
    External structure
  Extensional (instance-based) methods
    When the classes share the same instances
    When they do not
Terminological Methods
  The main idea behind these measures is that similar entities usually have similar names and descriptions in different ontologies.
  Terminological methods compare strings.
  They can be applied to:
    names and labels
    comments concerning entities
    URIs
  They take advantage of the structure of the string (as a sequence of letters).
Terminological Methods - Normalization
  A number of normalization procedures help improve the results of the subsequent comparison:
    Case normalization: converting each alphabetic character in the strings to its lower-case counterpart;
    Diacritics suppression: replacing characters that carry diacritic signs with their most frequent replacement (e.g. replacing Montréal with Montreal);
    Blank normalization: normalizing all blank characters (blank, tabulation, carriage return) into a single blank character;
    Link stripping: normalizing some links between words, e.g. replacing apostrophes and underscores with dashes;
    Stopword elimination: eliminating words that appear in a given list (usually words like "to", "a", ...).
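The normalization steps above could be chained as in the following sketch. The stopword list is a small illustrative assumption, and diacritics are removed by Unicode decomposition, which only approximates "most frequent replacement" (characters that do not decompose, such as ø, are left unchanged).

```python
import re
import unicodedata

# Illustrative stopword list (an assumption, not taken from the slides).
STOPWORDS = {"to", "a", "the", "of"}


def normalize(label: str) -> str:
    """Apply the five normalization steps to a label."""
    # Case normalization.
    text = label.lower()
    # Diacritics suppression: decompose characters, drop combining marks.
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Link stripping: apostrophes and underscores become dashes.
    text = re.sub(r"['_]", "-", text)
    # Blank normalization: any run of whitespace becomes a single blank.
    text = re.sub(r"\s+", " ", text).strip()
    # Stopword elimination.
    return " ".join(word for word in text.split(" ") if word not in STOPWORDS)
```

For example, `normalize("Montréal")` gives `"montreal"` and `normalize("Ali's_Peugeot")` gives `"ali-s-peugeot"`.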
Terminological Methods - String Based
  Substring similarity
  Hamming distance
  N-gram distance
  Edit distance
  Jaro similarity
  Token-based distances
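Two of the listed measures are simple enough to sketch directly. Both are shown in one common formulation, which is an assumption: the Hamming measure is extended to strings of unequal length by counting the length difference as mismatches, and the n-gram measure is expressed as a Dice coefficient over sets of character trigrams.

```python
def hamming_dissimilarity(s: str, t: str) -> float:
    """Share of positions at which s and t differ, in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0
    mismatches = sum(a != b for a, b in zip(s, t))
    # Positions beyond the end of the shorter string count as mismatches.
    return (mismatches + abs(len(s) - len(t))) / longest


def ngrams(s: str, n: int = 3) -> set:
    """Set of all substrings of length n (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(s: str, t: str, n: int = 3) -> float:
    """Dice coefficient of the two n-gram sets, in [0, 1]."""
    a, b = ngrams(s, n), ngrams(t, n)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

The Hamming measure is position-sensitive, so one inserted character early in a string shifts everything after it; n-grams and the edit distance are more tolerant of such shifts.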
Terminological Methods - String Based
  In string edit distance, the operations usually considered are the insertion of a character, the replacement of a character by another, and the deletion of a character.
  The Levenshtein distance is an edit distance with all costs set to 1.
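A sketch of an edit distance with configurable operation costs, computed by dynamic programming. The parameter names are illustrative; with the default costs of 1 the function returns the Levenshtein distance.

```python
def edit_distance(s: str, t: str,
                  insert_cost: int = 1,
                  delete_cost: int = 1,
                  replace_cost: int = 1) -> int:
    """Minimum total cost of the operations that turn s into t."""
    rows, cols = len(s) + 1, len(t) + 1
    # d[i][j] is the cost of turning the first i characters of s
    # into the first j characters of t.
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + delete_cost
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = s[i - 1] == t[j - 1]
            d[i][j] = min(
                d[i - 1][j] + delete_cost,
                d[i][j - 1] + insert_cost,
                d[i - 1][j - 1] + (0 if same else replace_cost),
            )
    return d[-1][-1]
```

Raising `replace_cost` above `insert_cost + delete_cost` makes a replacement never cheaper than a deletion followed by an insertion, so `edit_distance("abc", "abd", replace_cost=3)` is 2 rather than 3.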
Terminological Methods - Language Based
  Rely on NLP techniques to find associations between instances of concepts or classes.
  Intrinsic methods: perform terminological matching with the help of morphological and syntactic analysis for term normalization.
    Stemming, e.g. "going" is reduced to "go"
  Extrinsic methods: make use of external resources such as dictionaries and lexicons (e.g. WordNet).
    Resnik semantic similarity
Structural Methods
  The structure of the entities found in an ontology can be compared, instead of comparing their names or identifiers.
  Internal structure: uses criteria such as the range of the entities' properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between entities.
  External structure: the similarity between two entities from two ontologies can be based on the position of the entities within their hierarchies.
Structural Methods - External (1)
  If two entities from two ontologies are similar, their neighbors might also be somehow similar.
  Criteria for deciding that two entities are similar include:
    Their direct super-entities are already similar.
    Their sibling entities are already similar.
    Their direct sub-entities are already similar.
    All (or most) of their descendant entities (entities in the subtree rooted at the entity in question) are already similar.
    All (or most) of their leaf entities are already similar.
    All (or most) of the entities on the paths from the root to the entities in question are already similar.
Structural Methods - External (2)
  Some existing approaches:
    Structural topological dissimilarity on hierarchies
    Upward cotopic distance
Extensional (Instance-Based) Methods
  Compare the extensions of classes, i.e. their sets of instances, rather than their interpretation.
  Conditions in which such techniques can be used:
    When the classes share the same instances
    When they do not
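For the case where the classes share instances, one common choice (an assumption here, since the slide does not name a measure) is the Jaccard similarity of the two instance sets. The instance names below are borrowed from the merging example purely for illustration.

```python
def jaccard_similarity(instances_a, instances_b) -> float:
    """Size of the intersection divided by the size of the union."""
    a, b = set(instances_a), set(instances_b)
    union = a | b
    if not union:
        # Assumption: two classes without any instance give no evidence.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical extensions of the classes Car and Automobile.
cars = {"Porsche", "BMW", "Peugeot 405"}
automobiles = {"Porsche", "BMW"}

similarity = jaccard_similarity(cars, automobiles)  # 2 shared out of 3 in total
```

When the classes do not share instances, this measure is always 0, which is why that case needs a different technique based on comparing the instances themselves.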
Similarity - Global Methods
  Once the local similarities have been calculated, it remains to compute the alignment. This involves more global treatments, including:
    aggregating the results of the base methods in order to compute the similarity between compound entities;
    organizing the combination of the various similarity / alignment algorithms;
    involving the user in the loop;
    finally, extracting the alignment (mappings) from the resulting (dis)similarity.
Compound Similarity
  Some existing approaches:
User Feedback
  Supporting effective interaction between the user and the system components is one concern of ontology alignment.
  User input can take place in many areas of alignment:
    assessing the initial similarity between some terms;
    invoking and composing alignment methods;
    accepting or refusing the similarity or alignment provided by the various methods.
Alignment Extraction
  The ultimate goal of alignment is a satisfactory set of correspondences (mappings) between ontologies.
  Manual extraction: display the entity pairs with their similarity scores and/or ranks, and leave the choice of the appropriate pairs to the user of the alignment tool.
  Automatic extraction, using thresholds:
    Hard threshold: retains all correspondences above a threshold $n$.
    Delta method: uses as threshold the highest similarity value minus a constant $d$ ($max - d$).
    Proportional method: uses as threshold $n$ percent of the highest similarity value.
    Percentage: retains the top $n$ percent of the correspondences.
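The four automatic extraction strategies can be sketched as filters over a dictionary of scored pairs. The function names are illustrative. Two details are assumptions: the cutoffs are treated as inclusive, so that the best pair always survives the delta and proportional methods, and the percentage method rounds the number of retained pairs down.

```python
def _at_least(scores: dict, cutoff: float) -> dict:
    """Keep the pairs whose score reaches the cutoff (inclusive)."""
    return {pair: s for pair, s in scores.items() if s >= cutoff}


def hard_threshold(scores: dict, n: float) -> dict:
    """Retain every correspondence scoring at least n."""
    return _at_least(scores, n)


def delta_threshold(scores: dict, d: float) -> dict:
    """Use the highest score minus the constant d as the threshold."""
    return _at_least(scores, max(scores.values()) - d) if scores else {}


def proportional_threshold(scores: dict, percent: float) -> dict:
    """Use the given percentage of the highest score as the threshold."""
    if not scores:
        return {}
    return _at_least(scores, max(scores.values()) * percent / 100)


def top_percentage(scores: dict, percent: float) -> dict:
    """Retain the best-scoring share of the correspondences."""
    keep = int(len(scores) * percent / 100)  # rounded down
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])
```

The hard threshold is absolute, while the other three adapt to the score distribution of the particular pair of ontologies, which matters when no training data is available for tuning a fixed cutoff.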
Existing Works
  Feature columns of the original table: Lexical, String, Semantic, Instance, Structure, Aggregation. T marks a supported feature; the column to which each mark belongs is not preserved in this copy.

  Method      Year  Organization   Project Leader  Automatic  Features
  OntoMorph   1997  S. California  Chalupsky       Semi       T
  U.S. Army   1999  DARPA                          Semi       T
  Smart       1999  Stanford       Fridman, Noy    Semi       T T
  Chimaera    1999  Stanford       McGuinness      Semi       T T T
  Prompt      2001  Stanford       Noy, Musen      Semi       T T
  InfoSleuth  2001  Amsterdam      Ding            Semi       T T
  A. Prompt   2002  Stanford       Noy, Musen      Semi       T T T
  Glue        2002  Illinois       Doan            Automatic  T T T T
  IF Map      2003  Southampton    Kalfoglou       Automatic  T T
  NOM         2003  Karlsruhe      Ehrig           Automatic  T T T T T
  QOM         2004  Karlsruhe      Ehrig           Automatic  T T T T
  CROSI       2005  Southampton    Kalfoglou       Automatic  T T T
Any questions?
  amini@sharif.edu