Towards Rule Learning Approaches to Instance-based Ontology Matching

Similar documents
RiMOM Results for OAEI 2009

Probabilistic Information Integration and Retrieval in the Semantic Web

WeSeE-Match Results for OEAI 2012

HotMatch Results for OEAI 2012

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

An Approach for Accessing Linked Open Data for Data Mining Purposes

OWL-CM : OWL Combining Matcher based on Belief Functions Theory

FOAM Framework for Ontology Alignment and Mapping Results of the Ontology Alignment Evaluation Initiative

An Approach to Evaluate and Enhance the Retrieval of Web Services Based on Semantic Information

What you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web

Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web

Heuristic Rule-Based Regression via Dynamic Reduction to Classification Frederik Janssen and Johannes Fürnkranz

Combining Government and Linked Open Data in Emergency Management

A Framework for Source Code metrics

A Visual Tool for Supporting Developers in Ontology-based Application Integration

Linked Data Querying through FCA-based Schema Indexing

Motivating Ontology-Driven Information Extraction

On Supporting HCOME-3O Ontology Argumentation Using Semantic Wiki Technology

Falcon-AO: Aligning Ontologies with Falcon

Combining Ontology Mapping Methods Using Bayesian Networks

MapPSO Results for OAEI 2010

An Empirical Investigation of the Trade-Off Between Consistency and Coverage in Rule Learning Heuristics

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data

Theme Identification in RDF Graphs

OntoDNA: Ontology Alignment Results for OAEI 2007

Programming the Semantic Web

International Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15 ISSN

Multi-relational Decision Tree Induction

Category Theory in Ontology Research: Concrete Gain from an Abstract Approach

Testing the Impact of Pattern-Based Ontology Refactoring on Ontology Matching Results

Structure of Association Rule Classifiers: a Review

A Framework for Securing Databases from Intrusion Threats

Development of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Keywords Data alignment, Data annotation, Web database, Search Result Record

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

Enabling Product Comparisons on Unstructured Information Using Ontology Matching

Pattern Mining in Frequent Dynamic Subgraphs

Visualizing semantic table annotations with TableMiner+

Efficient Pairwise Classification

Using AgreementMaker to Align Ontologies for OAEI 2010

1. Inroduction to Data Mininig

A Comparison of Error Metrics for Learning Model Parameters in Bayesian Knowledge Tracing

Lily: Ontology Alignment Results for OAEI 2009

AROMA results for OAEI 2009

Cluster-based Similarity Aggregation for Ontology Matching

Ontology Refinement and Evaluation based on is-a Hierarchy Similarity

TrOWL: Tractable OWL 2 Reasoning Infrastructure

A Robust Number Parser based on Conditional Random Fields

RaDON Repair and Diagnosis in Ontology Networks

X-KIF New Knowledge Modeling Language

RiMOM Results for OAEI 2008

Towards a Global Component Architecture for Learning Objects: An Ontology Based Approach

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data

KNOWLEDGE GRAPHS. Lecture 1: Introduction and Motivation. TU Dresden, 16th Oct Markus Krötzsch Knowledge-Based Systems

Leveraging Terminological Structure for Object Reconciliation

KOSIMap: Ontology alignments results for OAEI 2009

A Semantic Role Repository Linking FrameNet and WordNet

LPHOM results for OAEI 2016

Web Services Annotation and Reasoning

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96

PROJECT PERIODIC REPORT

YAM++ : A multi-strategy based approach for Ontology matching task

POMap results for OAEI 2017

Lecture 1: Introduction and Motivation Markus Kr otzsch Knowledge-Based Systems

An Architecture for Semantic Enterprise Application Integration Standards

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Improving Recognition through Object Sub-categorization

Cate: A System for Analysis and Test of Java Card Applications

Modelling Structures in Data Mining Techniques

Classification Algorithms in Data Mining

Linked Data and RDF. COMP60421 Sean Bechhofer

Extracting knowledge from Ontology using Jena for Semantic Web

Semantic Web Methoden, Werkzeuge und Anwendungen

Ontology Modularization for Knowledge Selection: Experiments and Evaluations

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Context-Aware Analytics in MOM Applications

WebIsALOD: Providing Hypernymy Relations extracted from the Web as Linked Open Data

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

INTRODUCTION. Chapter GENERAL

EQuIKa System: Supporting OWL applications with local closed world assumption

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Transforming Transaction Models into ArchiMate

Taccumulation of the social network data has raised

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

A mining method for tracking changes in temporal association rules from an encoded database

SEMANTIC ENHANCED UDDI USING OWL-S PROFILE ONTOLOGY FOR THE AUTOMATIC DISCOVERY OF WEB SERVICES IN THE DOMAIN OF TELECOMMUNICATION

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Towards Generating Domain-Specific Model Editors with Complex Editing Commands

SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web *

Open Research Online The Open University s repository of research publications and other research outputs

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

YAM++ Results for OAEI 2013

Hierarchical Clustering of Process Schemas

Ontologies and similarity

Development of an Ontology-Based Portal for Digital Archive Services

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

Transcription:

Towards Rule Learning Approaches to Instance-based Ontology Matching Frederik Janssen 1, Faraz Fallahi 2 Jan Noessner 3, and Heiko Paulheim 1 1 Knowledge Engineering Group, TU Darmstadt, Hochschulstrasse 10, 64289 Darmstadt, Germany {janssen,paulheim}@ke.tu-darmstadt.de 2 ontoprise GmbH, An der Raumfabrik 29, 76227 Karlsruhe, Germany fallahi@ontoprise.de 3 KR & KM Research Group University of Mannheim, B6 26, 68159 Mannheim, Germany jan@informatik.uni-mannheim.de Abstract. Ontology matching approaches have mostly worked on the schema level so far. With the advent of Linked Open Data and the availability of a massive amount of instance information, instance-based approaches become possible. This position paper discusses approaches and challenges for using those instances as input for machine learning algorithms, with a focus on rule learning algorithms, as a means for ontology matching. 1 Introduction Today, data integration results produced by various developed automated algorithms and systems are still error-prone. Therefore, the field of data integration is a major challenge of modern information technology and of significance to various research areas. Ontologies can play a key role in resolving semantic heterogeneity by providing a formal description of a domain. Not only since the advent of Linked Open Data, ontology mappings have become an essential ingredient to ensure the interoperability of data sets using different ontologies. While Linked Open Data is much concerned about the linking of instances, mappings on the schema level, e.g. for federating queries across different data sets, are currently under-represented. At the same time, the vast amount of instance data in Linked Open Data allows for developing powerful instance-based approaches to ontology matching to complement the currently predominant schema-level approaches. Utilizing matching algorithms with lexical distance measure and pattern recognition techniques for automatically finding mappings between ontologies do not always yield the expected results. Where lexical distance measure algorithms fail, machine learning algorithms which are based on symbolic representations can serve as a remedy, at least to some extent.

2 Janssen et al. In the last years, a major research focus has been set on schema based solutions. Hence, a variety of state-of-the-art algorithms for alignment have been developed, which work mostly on traversing schemas and their structure, and only a few approaches explicitly use data about instances [1, 2]. In the scope of the MappingAssistant project [4], instance data has been utilized to repair and refine existing ontology alignments. In this paper, we discuss two possible approaches for employing machine learning for instance-based ontology matching. The basic idea of both of them is to use rule learning to solve this problem. The main advantage of rule learning is its symbolic character, i.e., rules can be interpreted and compared to each other. In a first case study, we use association rule learners on the instance level to discover mappings. A second case study shows how separate-and-conquer rule learners can be employed to further refine mappings. Both case studies aim at demonstrating the ideas and exploring possible challenges. 2 Case Study 1 Creating Mappings by Rule Learning In this case study, we focus on cases where a set of instances is contained in different data sets either identified by identical URIs or connected via owl:sameas statements. While this is rather frequent in Linked Open Data, the OAEI datasets typically do not contain shared instances. Therefore, we have created our own dataset, using an excerpt from DBpedia. As DBpedia uses its own ontology as well as YAGO for classification, it provides an instance set using different ontologies, which can be used as a benchmark for instance-based ontology matching. To construct our dataset, we used an existing, publicly available partial 1:1 mapping 4 between the DBpedia and the YAGO ontology as a starting point. The dataset contains 169 simple mappings between DBpedia and YAGO. For that dataset, we retrieved all instances that contain at least one of the types in the mapping. That data set has a total of 231,635 instances. We use association rule mining to discover ontologies, as suggested by [6]. Association rules are used, e.g., to discover sets of items that are likely to be bought together, in order to build recommender algorithms. Likewise, we use them to find out which classes (and other ontology constructs) frequently cooccur, and derive mappings from those co-occurences. The portions of the two ontologies under consideration are essentially different. For example, the YAGO ontology relies mainly on a rich classification and contains very special classes such as yago:industrialrockmusicgroups, while the DBpedia relies on relations to describe such classes, e.g., as a DBpedia-owl: Band with a value of DBpedia:IndustrialRock for DBpedia-owl:genre 5. Thus, the mapping between the two ontologies can only be expressed by a complex one: yago:industrialrockmusicgroups DBpedia-owl:Band DBpedia-owl:genre. {DBpedia:IndustrialRock} 4 http://www.netestate.de/de/loesungen/dbpedia-yago-ontology-matching 5 See http://dbpedia.org/resource/nine_inch_nails for this example.

Towards Rule Learning Approaches to Instance-based Ontology Matching 3 100 90 1 0.9 80 0.8 number of mapping rules 70 60 50 40 30 20 0.7 0.6 0.5 0.4 0.3 0.2 precision precision # of rules 10 0.1 0 0 >=95 >=90 >=85 >=80 Minimum confidence threshold Fig. 1. Matching results for complex mappings In a first experiment, we tried to infer the simple 1:1 mappings. Using the feature generation toolkit FeGeLOD [5], we added a boolean feature for each rdf:type property for all the instances in our data set, ending up with a total of 98,414 features. On this data set, we used an association rule mining algorithm to find rules expressing frequent co-occurring rdf:type statements, i.e., two classes in different ontologies that share many instances. For example, from two symmetrically occurring rules, we conclude a mapping as follows: DBpedia-owl:ProtectedArea yago:park yago:park DBpedia-owl:ProtectedArea DBpedia-owl:ProtectedArea yago:park As discussed in [5], an F-Measure of up to 25% can be achieved with this naive approach. While this is not as much as state-of-the-art approaches, the mappings found are often hard to find with conventional approaches, as the above example shows. In a second experiment, we tried to infer complex mappings between two classes. To that end, we have created a second dataset which comprises not only the type information as in the first case, but also boolean properties indicating whether there is an incoming or outgoing relation of a certain type. The resulting data set has 117,029 features. From that dataset, we were able to discover complex mappings, such as 1DBpedia-owl:name yago:person Since there is, to the best of our knowledge, no gold standard for complex mappings, we have evaluated our approach by counting the number of rules found, and manually determining the rules that are correct. The results are depicted in Fig. 1 for different thresholds. The experiments show that it is possible to employ association rule learning for discovering ontology mappings. Typical challenges include scalability to larger datasets, learning more expressive rules, and evaluation of complex mappings.

4 Janssen et al. @relation car dataset from ontology O 1 @attribute acceleration {low,medium,high} @attribute cargocapacityrating {low,high} @attribute passengerspacerating {low,high} @attribute conveniencerating {low,medium,high} @attribute milespergallon {low,medium,high} @relation cars dataset from ontology O 2 @attribute acceleration {low,medium,high} @attribute cargocapacity {low,high} @attribute passengerspace {low,high} @attribute convenience {low,medium,high} @attribute numberofextras {low,medium,high} @attribute mpg {low,medium,high} @data @data high,low,high,medium,low high,low,high,medium,medium,low high,low,low,high,medium high,low,low,low,high,high low,low,high,high,low low,low,high,high,high,low low,low,low,low,medium low,low,low,high,medium,low medium,high,high,low,low low,high,high,high,low,medium medium,high,low,high,medium medium,high,high,high,high,medium low,high,high,medium,high low,high,high,medium,low,high...... Figure 1: A 4 class classification problem Fig. 2. Excerpt of the input for the separate-and-conquer approach 3 Case Study 2 Extending Mappings by Rule Learning In the second case study, our goal is to use classification rule learning to find non-trivial alignments between ontologies, which cannot be found through conventional schema-based matching techniques. We focus on finding additional property alignments, assuming that some alignments between properties defined in two ontologies O 1 and O 2 are given as an initial mapping, as indicated by the arrows in Fig. 2. Note that in that example, the property numberofextras that is present in O 2 is missing in O 1. The goal is to detect additional property alignments between O 1 and O 2. In the example, the alignment still left to detect is milespergallon mpg. For each unmapped property, we construct a dataset as input for a rule learner. The datasets are created by including all the instances available from the A-boxes of the two ontologies, and using the unmapped property as a class attribute. In Fig. 2, three classification problems would be created: using milespergallon as the target class in the data set for O 1, and using numberofextras and mpg as the target class in the data set for O 2. In our example, we learn rule sets for each non-aligned property q 1 and q 2, using a CN2-like rule learner. While for the illustration of the approach the actual learning algorithm is not that important (given that the output are rules), certainly the performance of the approach is strongly related to the used algorithm (cf. the discussion in Section 4). In the end, the rule learner outputs three rule sets, one for each non-aligned property, e.g., one ruleset R 1 for the property milespergalllon of O 1, and two rulesets (R 2 and R 3 ) for the two properties that reside unmapped in O 2 (numberofextras and mpg). The key idea of the approach is to compare the rules found on the instances of the two ontologies: if two rules comprise attribute-value tests for properties that are previously mapped, the consequence is that the target properties (the prediction

Towards Rule Learning Approaches to Instance-based Ontology Matching 5 of the rules) then can be also mapped, because their characteristics are similar to a certain extent. To illustrate such a mapping, in the following examples for similar and different rules are shown. The best case is that the rules are identical: r 1,1: milespergallon=medium conveniencerating=high acceleration=high. r 2,1: mpg=medium convenience=high acceleration=high. where r i,j stands for the j-th rule of rule set i. Rule r 1,1 tests whether the conveniencerating is high and the acceleration is high. Due to the predefined mappings, it can be concluded that both rules use the same properties with the same values and hence are similar (with confidence 1.0). Note that all numerical values of all properties are discretized in a preprocessing step for simplification issues. Therefore, rules that are exactly similar may not be so rare. Nevertheless, rules that are found on real world data may not be identical. An example for such different rules is given below. r 1,3: milespergallon=high acceleration=medium cargocapacityrating=low. r 3,3: numberofextras=high convenience=high passengerspace=high. In that case, the rules are completely dissimilar. Thus, a similarity measure sim r (r, r ) for a pair of rules r and r is needed. This similarity of individual rules is used then to compute the similarity of the whole rule sets sim R (R, R ). A naive choice to define the similarity of two rule sets could look as follows: sim R (R, R ) = sim r(r 1,i,r 2,j) θ (tp(r1,i)+tp(r2,j)) /2 ( D 1 + D 2 )/2 where D i is the total number of examples in dataset D i, tp(r) yields the number of true positive examples covered by rule r (i.e., the examples that are correctly covered by the rule r), and θ is a threshold that decides whether the rules are equal or not. The similarity of single rules can naively be defined by { sim r (r, r 1 if r matches r exactly ) = (2) 0 otherwise Clearly, in real world situations these definitions are too restrictive. Nevertheless, despite their simplicity, first experiments show that they are sufficient. These preliminary experiments with two ontologies containing a few instances show that the approach works. The two rule sets R 1 (rules for milespergalllon) and R 3 (rules for mpg) were identical. In contrast, the rule sets learned for numberofextras and milespergallon are completely different, since these are different properties. Thus, the overall similarity value for numberofextras and milespergallon is significantly lower than for mpg and milespergallon. Therefore, in the end, the mapping milespergallon mpg was found while number- OfExtras is left unmapped. The case study shows that rule learning can be used to derive additional mappings from those already found. Open challenges include the definition of metrics for rule comparison, the use of suitable rule learning heuristics [3], and the decomposition of the ontology matching problem into a binary classification problem. (1)

6 Janssen et al. 4 Conclusion and Challenges In this paper, we have discussed approaches of learning and refining non-trivial ontology alignments from instances through utilization of inductive rule learning algorithms. We have shown two case studies which examine how finding and refining ontology mappings can be reformulated as problems of association rule mining and separate-and-conquer rule learning, respectively. While the case studies show the feasibility of employing rule learning algorithms for ontology matching, there are quite a few open research issues and challenges. The first case study has shown an approach which is especially suitable for discovering complex mappings. Suitable benchmark datasets for complex mappings do not exist at the moment. Generating those benchmark sets from Linked Open Data is a suitable option, which at the same time produces new scalability challenges. When dealing with instance-based matching, using a reasoner in the loop is also an interesting opportunity for refining mappings. Necessary preprocessing steps, e.g., for dealing with numerical data properties, are also a subject of future research. The second case study has used the similarity of rule sets as a means of assessing validity of ontology mappings. Finding suitable similarity measures of rule sets is a challenge for future work. It may also be a valid direction to use other learning models (e.g. subgroup discovery, or even decision trees) and suitable heuristics, which may allow different and maybe better means for comparison. Although those challenges are non-trivial and require further deep investigations, we are confident that rule learning is suitable means to instance-based ontology matching. Thus, we are confident that the approaches sketched in this paper will lead to the development of high-performance matching algorithms. References 1. M. Ehrig and Y. Sure. Ontology Mapping - An Integrated Approach. In C. Bussler, J. Davis, D. Fensel, and R. Studer, editors, Proceedings of the 1st European Semantic Web Symposium, volume 3053, pages 76 91, Heraklion, Greece, 2004. Springer Verlag. 2. J. Huber, T. Sztyler, J. Nößner, and C. Meilicke. CODI: Combinatorial Optimization for Data Integration: Results for OAEI 2011. In Proceedings of the 6th International Workshop on Ontology Matching, Bonn, Germany, October 2011. 3. F. Janssen and J. Fürnkranz. On the Quest for Optimal Rule Learning Heuristics. Machine Learning, 78(3):343 379, March 2010. DOI 10.1007/s10994-009-5162-2. 4. J. Noessner, F. Fallahi,, E. Kiss, and H. Stuckenschmidt. Interactive data integration with mappingassistant. In Demo Proceedings of the 10th International Semantic Web Conference (ISWC), Bonn, Germany, October 2011. 5. H. Paulheim and J. Fürnkranz. Unsupervised Feature Generation from Linked Open Data. In 2nd International Conference on Web Intelligence, Mining, and Semantics, 2012. to appear. 6. J. Völker and M. Niepert. Statistical Schema Induction. In Proceedings of the 8th Extended Semantic Web Conference: Linked Open Data Track, pages 124 138, Berlin, Heidelberg, 2011. Springer-Verlag.