SeMap: A Generic Schema Matching System


SeMap: A Generic Schema Matching System
by Ting Wang
B.Sc., Zhejiang University, 2004
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of Graduate Studies (Computer Science)
The University Of British Columbia
August, 2006
© Ting Wang 2006

Abstract

The rapidly growing number of autonomous data sources on the web makes the need for effective tools for creating semantic mappings increasingly crucial. Moreover, the goal of allowing applications to have more expressive semantics requires a change in focus. While most previous work focuses on creating mappings in specific data models for data transformation, it fails to capture a richer set of possible relationships between schema elements. For example, current schema matching approaches might discover that TA in one schema equals grad TA in another, even though the relationship can be modeled more accurately by saying that grad TA is a specialization of TA. These richer mapping semantics in turn enable applications that involve richer semantics. In this thesis we concentrate on the following problem: given initial match (correspondence) information produced by current schema matching techniques, how can we construct a complex, semantically richer mapping that can be used across data models? Specifically, we aim at detecting the relationship types Has-a, Is-a, Associates and Equivalent. Technically, we achieve this goal in three main steps: (1) exploiting various types of semantic evidence for possible matches; (2) finding a globally optimal match assignment; (3) identifying the relationship embedded in the selected matches. We implemented our semantic matching approach within a prototype system, SeMap, and tested its accuracy and effectiveness.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction: Motivation; Contribution; Organization
2 Related Work: Relationship Classification; Equivalence Relationships; Set-Theoretic Relationships; Generic Relationships; Schema Matching Techniques; Rule-Based Solutions; Learning-Based Solutions; Ontology Alignment Techniques; Sample Prototypes; Rondo; Cupid; COMA; iMAP
3 Problem Formulation: Representation; Problem Statement; Semantic Resources; Internal Resources; External Resources; Approach Overview; Schema Matcher; Match Selector; Mapping Assembler
4 SeMap System: Schema Matcher; Base Matcher; Similarity Score and Lineage Information; Element-Level Matcher; Structure-Level Matcher; Architecture of Schema Matcher; Match Selector; Representation; Bidirectional search; Modeling user interaction; Mapping Assembler; Combining Map s and Map t; Identifying relationships; Assembling mapping
5 Experimental Analysis: Experimental Setting; Data Set; Expert Mapping; Evaluation Metrics; Experimental Methodology; Experimental Result; Matching Accuracy; Component Contribution; Incorporating User Feedback; Discussion
6 Conclusion & Future Work
Bibliography

List of Tables

5.1 Characteristics of the input schemas
5.2 Characteristics of the expert mappings

List of Figures

1.1 An example of input schemas and output mapping
2.1 A classification of current schema matching techniques. Courtesy of [22]
Representation of model. The left plot shows a graphical representation of a model, comprised of nodes (elements) and edges (relationships). The right table shows the tuple representation of edges
Illustration of four relationship types handled by SeMap
An example of complex mapping handled by SeMap
Illustration of the matching process
The basic system architecture of SeMap. It takes two models and external resources as input, and produces a generic semantic mapping. It consists of three main parts: the schema matcher, the match selector and the mapping assembler
Architecture of schema matcher. It consists of three layers: base matcher, combining layer and structure matcher
Partial match assignments from the perspectives of source and target schemas respectively
Mapping assembling for matches of different types. Each 1-1 equivalence match corresponds to one mapping element, while each element of a complex match is associated with one mapping element
Matching accuracy of SeMap. The three plots show the recall, precision and F-measure of the matching results for the three relationship types Equivalent, Has-a, Is-a and total correct matches respectively
Error analysis of the resulting mappings
5.3 The precision of SeMap after pruning incorrect matches. The bars from left to right show the matching results for the three relationship types Equivalent, Has-a, Is-a respectively
Relative contribution of different types of semantic evidence to the matching results of SeMap. The two plots (from top to bottom) show the F-measure of identified matches (correspondences) and identified relationships respectively
F-measure of correct correspondences versus the amount of user interaction (percentage of expert matches provided over the total number of matches). The curves for four datasets (Real Estate 1/2, Course Info 1/2) are shown

Acknowledgements

I would like to express my gratitude to all those who have offered me help in completing this thesis. In particular, I owe the greatest thanks to my supervisor Rachel Pottinger, who provided me with excellent guidance and support throughout the entire process of this thesis project. I want to thank Dr. Tsiknis for giving me insightful comments on this work, and for being my second reader. I would also like to thank all the members of the database management lab, especially Jian Xu, for their constructive suggestions. Without their help, this work would not have been possible. Finally, I thank all my friends at the University of British Columbia. It has been a wonderful experience to grow up with them.

Chapter 1 Introduction

1.1 Motivation

Spurred by the growth of data sources on the web, information systems are witnessing a paradigm shift from monolithic databases to heterogeneous, interacting data sources. The fundamental problem in sharing data from multiple sources is dealing with the semantic heterogeneity inherent in their autonomous nature, and the key is identifying the semantic correspondences between them. The operation of finding such correspondences is called Match, which takes two schemas as input and produces a semantic mapping specifying the relationships between elements of the two schemas. Such semantic mappings play a crucial role in numerous data sharing applications, including web data integration, schema evolution and migration, and component-based development.

Currently, the creation of semantic mappings, especially complex ones, is still mostly done manually, possibly supported by a graphical user interface. Manually creating semantic mappings is a tedious, error-prone process. The labor required grows linearly with the number of matches to be performed. Hence the rapidly increasing number of web data sources necessitates automatic support for schema matching.

The problem of semi-automatically creating mappings has attracted intensive research in both the database and AI communities [2, 4, 10, 15, 28]. The procedure comprises two phases: schema matching and mapping construction. In schema matching, equivalence correspondences between elements of both schemas are identified. The equivalence correspondences can be one-to-one (1-1) matches, e.g., class corresponds to course, or complex matches containing more than one element in each schema, e.g., TA maps to some combination of grad TA and ugrad TA. Note that the focus of schema matching is to find such potential correspondences, rather than to give a final mapping to the users. The final mapping is produced in mapping construction, which builds on the identified correspondences by adding more specific semantic information to generate a semantically rich mapping.

[Figure 1.1: An example of input schemas and output mapping.]

As a typical example of mapping construction, Clio [32] includes a set of user-interaction techniques to create SQL-style mappings, based on the output of an initial schema match. Such semantic mappings are necessary

to transform data. Clio, however, like most other previous work on mapping construction, is restricted to relational and XML-style schemas; it does not capture the general richness of the possible relationships between elements in a data-model-independent fashion. Thus, although many common relationship types exist across SQL and XML (e.g., specialization), this work cannot be used to create the XML-style mapping. Data sources on the web, however, are of various data models, e.g., XML, HTML, RDF, ontologies, text, etc. Hence exploring how to create richer, general relationships between schema elements, rather than concentrating on the specific data model under consideration, allows us to understand the general space of possibilities. It also allows better reuse of ideas, since one does not have to create a separate algorithm for each ensuing data model.

After a mapping with such general relationships is constructed, the transformations into a specific data model can be made more concretely. For example, it can be easily transformed into specific forms, e.g., SQL views or XSLT transformations, thus removing the need to maintain specific mappings separately. Also, a generic mapping can create a uniform interface between domain knowledge (ontologies) and web interfaces (database schemas), which is helpful for semantic web applications. Furthermore, it can be fed into a model management system [17], which aims to solve meta-data problems in a data-model-neutral fashion, or used for knowledge inference when applied to the ontology domain.

An example of a generic semantic mapping is shown in Figure 1.1, where two schemas S and T represent the concepts of class and course respectively. A generic mapping between S and T is constructed, specifying a rich collection of

semantic relationships between the elements of S and T, e.g., college of T Has-a dept of S, while instructor of S Is-a faculty of T. The relationship types adopted in this thesis follow the relationship classification of [21]. Compared with the equivalence relationships (1-1 or complex) considered in previous literature, this relationship classification is semantically richer and more expressive.

Equipped with such generic mappings, one can envision a number of applications. For example, one problem facing current semantic web applications is the lack of domain-specific knowledge (e.g., ontologies). If domain knowledge in different representations can be mutually converted, the collection of knowledge will be significantly enriched.

1.2 Contribution

In this thesis we explore constructing such generic semantic mappings, based on initial match information that shows correspondences between the elements of both schemas. This initial match information can be produced by current schema matching techniques. Mapping construction takes as input a set of initial matches produced by a set of schema matching algorithms, and generates a semantically richer mapping, such as the one in Figure 1.1, which describes complex relationships between elements of both schemas. Specifically, mapping construction is responsible for searching for a globally optimal match assignment from the pool of possible assignments, resolving the conflicts among the selected matches, and identifying the complex relationships between the schema elements, e.g., the Has-a relationships in Figure 1.1. However, constructing

a generic semantic mapping is fundamentally difficult for several reasons:

- Finding correspondences with generic semantic relationships is substantially harder than finding simple equivalences, since the space of possibilities under consideration is much larger, and more semantic evidence is needed;
- The pool of initial matches is possibly quite large. Even for n:1 equivalence matches the search space is large enough that most matching algorithms consider only 1:1 matches; when relationships other than simple equivalence are considered, it is infeasible to try all possible combinations to find the optimal assignment;
- Various semantic constraints can be imposed, rendering match selection a complicated constrained optimization problem;
- Identifying the relationships implicit in matches is a hard problem, and one that is made more difficult by attempting to make our output data-model independent.

As in schema matching, mapping construction inherently cannot be fully automatic. The importance of user feedback is recognized in schema matching research [4, 31]; however, no systematic modeling of user interaction for mapping construction is available to date. One of the goals of our work is to limit interaction to critical points to help focus user attention and minimize user effort.

To overcome the problems listed above, in this thesis we describe a prototype system, SeMap, that creates a generic semantic mapping. We

choose a graph-based representation similar to that used in model management [17], which is expressive enough to accommodate both schemas of many types and other meta-data, such as ontologies. Specifically, we make the following contributions:

- An architecture for semi-automatically constructing generic semantic mappings based on initial correspondence information;
- A novel probabilistic framework that incorporates match uncertainty and semantic constraints in a uniform way, and formulates match selection as a mathematical optimization problem;
- Effective modeling of user interaction to help focus user attention and minimize user effort, by detecting critical points where feedback is maximally useful;
- An effective solution for extracting the relationships implicit in the initial matches, based on various types of semantic evidence;
- A prototype system embodying the innovations above and a set of experiments to illustrate the correctness and effectiveness of our approach.

1.3 Organization

This thesis is a specification of our schema matching system SeMap. The goal is to present the technical details of implementing the system. Specifically, we intend to make clear the following three aspects:

1. The formulation of the problem, including the exact representation of the input/output of the system, the resources we use and the assumptions we have made;
2. The specification of the system, including the system architecture, the exact input/output and interior structure of each component, and their interaction;
3. The experimental analysis, including the datasets we use, the metrics we use to evaluate our approach, the experimental results and their explanation.

The remainder of the thesis is organized as follows: Chapter 2 presents a survey of related work; Chapter 3 formally defines the problem of mapping construction and gives an overview of the architecture of our system; Chapter 4 describes our mapping construction approach in more detail; Chapter 5 presents the experimental analysis of our approach; and Chapter 6 concludes this thesis and presents future work.

Chapter 2 Related Work

Semi-automatically creating semantic mappings has attracted intensive research in both the database (schema matching) and AI (ontology alignment) communities. The key differences and similarities of schema matching and ontology alignment include:

- Differences. Ontologies are logical systems that obey some formal semantics, i.e., they can be interpreted as a set of logical axioms; database schemas, however, often provide no explicit semantics for their data.
- Similarities. Schemas and ontologies are quite similar in the sense that (1) they both provide a vocabulary of terms that describe a domain of interest and (2) they both constrain the meaning of terms used in the vocabulary [30].

Due to their differences, schema matching is usually performed with techniques that guess the semantics implicit in the schemas, while ontology alignment is designed to exploit the knowledge explicitly encoded in the ontologies. Their similarities, however, make the solutions to these two problems mutually beneficial. In what follows, we discuss the problems of schema matching and ontology alignment as a whole.

In this chapter, we present a survey of related work in three parts: first, in Section 2.1, we classify the current schema matching/ontology alignment techniques based on the relationships they can handle; we then discuss some typical techniques used in these approaches, specifically, schema matching in Section 2.2 and ontology alignment in Section 2.3; finally, we present several example prototype matching systems in Section 2.4.

2.1 Relationship Classification

The relationship types created by matching techniques can be roughly divided into three categories: equivalence relationships, set-theoretic relationships and generic relationships. Specifically, two schema elements having the equivalence relationship means they are semantically equivalent, and the techniques to identify equivalence relationships are described in Section 2.1.1; the set-theoretic relationship classification regards each schema element as a set, and specifies their relationship as one of equivalence, subsumption, intersection, disjointness and incompatibility, which is discussed in Section 2.1.2; the generic relationships refer to non-equivalence relationships, such as the Has-a and Is-a relationships discussed in this thesis. Two typical classifications of generic relationships can be found in ontology modelling [18] and meta-data management [21]. The techniques developed so far to handle generic relationships are presented in Section 2.1.3.
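As a rough illustration of the set-theoretic view, the following Python sketch classifies two schema elements by comparing their instance sets. It assumes both sets are drawn from the same universe of instances; the names `SetRelationship` and `classify_sets` are hypothetical, and incompatibility is left out because the text does not define it in terms of instance overlap.

```python
from enum import Enum


class SetRelationship(Enum):
    EQUIVALENCE = "equivalence"
    SUBSUMPTION = "subsumption"
    INTERSECTION = "intersection"
    DISJOINTNESS = "disjointness"


def classify_sets(a: set, b: set) -> SetRelationship:
    """Classify how two instance sets relate (assumes a shared universe)."""
    if a == b:
        return SetRelationship.EQUIVALENCE
    if a <= b or b <= a:
        # One set is fully contained in the other.
        return SetRelationship.SUBSUMPTION
    if a & b:
        # The sets share some, but not all, instances.
        return SetRelationship.INTERSECTION
    return SetRelationship.DISJOINTNESS
```

For example, `classify_sets({"ann"}, {"ann", "bob"})` yields `SetRelationship.SUBSUMPTION`, since every instance of the first element is also an instance of the second.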

2.1.1 Equivalence Relationships

With the main goal of data transformation in specific data models, most schema matching/ontology alignment algorithms to date aim at discovering equivalence relationships [2, 3, 4, 10, 13, 14, 16, 31]. The equivalence correspondence found can be a 1-to-1 match (e.g., course = class) or a complex match (e.g., name = concat(first-name, last-name)). Creating multi-arity (1-to-n or even n-to-m) matches is significantly harder than creating 1-to-1 matches for several reasons: (1) while the number of candidate matches is bounded for 1-to-1 matches (by the product of the sizes of the two schemas), the number of match candidates to be considered in the complex case is much larger; (2) it is inherently difficult to generate a match to start with in the case of multi-arity matches. That is, for an n-to-m match, it is difficult to determine n and m in order to generate a set of candidate matches. Hence to date most of the work on schema matching has focused on discovering 1-to-1 equivalence correspondences between schema elements [3, 4, 10, 13, 14, 16, 31].

R. Dhamankar et al. [2] proposed iMAP, a prototype for identifying 1-to-n correspondence matches, which reformulates schema matching as a search in an often very large match space. To search effectively, it employs a set of searchers, each discovering specific types of complex matches.

However, while attempting to discover semantically equivalent correspondences, it is possible that the matches identified by these techniques may not be exactly of equivalence relationships; they may instead be the

semantically richer relationships we are endeavoring to find, such as the relationship between TA and { grad TA, ugrad TA } as shown in Figure 1.1.

2.1.2 Set-Theoretic Relationships

The equivalence relationship can be considered a special case of the set-theoretic relationships, which specify the relative containment relationship between two sets. In [26], an effective solution is proposed to identify inter-set relationships by bidirectionally comparing the containment of the data instances and meta-data of different schema elements. The problem with this approach is that the data instances associated with the two schemas must be in the same universe; otherwise the comparison of containment relationships is not meaningful. However, in many applications, especially web data integration, the data sources do not overlap.

2.1.3 Generic Relationships

There has been very little work on finding generic relationships between schema elements. The solution proposed by D. Embley et al. [5] relies heavily on a domain-specific ontology to find the relationships of Merge/Split (e.g., Address consists of Street, City and State), Superset/Subset (e.g., Phone contains both Phone day and Phone evening), and Set-Name as Value (e.g., the attribute Water-front in one schema appears as a value of the attribute House-description in the other schema). The basic idea is to first map the schema elements to a comprehensive domain-specific ontology; the relationships between schema elements can then be determined from those of their counterparts in the ontology. This

approach requires (1) a comprehensive ontology that covers all possible concepts that may appear in schemas in that domain; (2) a domain-specific thesaurus that can map schema elements to their alternative representations in the ontology. Such an ontology and thesaurus are usually hard to obtain in real scenarios. Our work has fairly simple requirements for the needed semantic information, which is available in most schemas, and does not assume any comprehensive ontology. Nevertheless, the existence of such an ontology can improve the quality of the matching results of our system.

F. Giunchiglia et al. [7] proposed the concept of semantic matching, a purely schema-based approach. The basic idea is to first populate each element name with its meanings in some domain-specific dictionaries, and then compute the specialization relationship of schema elements based on the containment relationship of their meanings. Their approach, however, works only for identifying Is-a relationships and for tree-structured schemas.

2.2 Schema Matching Techniques

The research on schema matching/ontology alignment provides a wealth of techniques to semi-automatically find semantic matches. The techniques can be classified by the information they exploit [22], as shown in Figure 2.1: the matches can be found by exploiting one type of semantic evidence (schema-level, data instance-level, etc.), or by combining multiple types of evidence (i.e., hybrid matchers, which integrate multiple matching criteria, and composite matchers, which combine results of independently executed matchers [22]). Matching techniques can also be classified by their methodologies into rule-based and learning-based solutions, which will be discussed in Sections 2.2.1 and 2.2.2 respectively.

[Figure 2.1: A classification of current schema matching techniques. Courtesy of [22].]

2.2.1 Rule-Based Solutions

Rule-based matching techniques [7, 14, 16] constitute a rich collection of schema matching solutions, which have been used in both early and current matching applications. Rule-based techniques discover similar schema elements by exploiting schema-level information using hand-crafted rules. A broad variety of rules have been devised to exploit all possible information, including element names (labels), data types, structures, number of subelements, and integrity constraints. For example, F. Giunchiglia et al. [7] proposed to exploit the semantic meanings of element names to discover similar elements; Cupid [14] employs rules that categorize elements based on names, data types and domains; Similarity Flooding [16] measures pairwise similarity by propagating similarity from some fixed points according to the schema structures.
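The idea of propagating similarity along schema structure can be sketched in a few lines of Python. This is a much-simplified illustration, not the Similarity Flooding algorithm of [16]: the function name `propagate_similarity`, the uniform propagation weights, the fixed number of rounds and the toy schemas are all assumptions made for the example.

```python
from itertools import product


def propagate_similarity(edges_s, edges_t, initial, rounds=10):
    """Spread similarity between node pairs of two schema graphs.

    edges_s, edges_t: lists of (parent, child) edges, one list per schema.
    initial: dict mapping (s_node, t_node) to a starting similarity.
    Returns a dict of normalized similarities for every node pair.
    """
    nodes_s = sorted({n for edge in edges_s for n in edge})
    nodes_t = sorted({n for edge in edges_t for n in edge})
    pairs = list(product(nodes_s, nodes_t))
    sigma0 = {p: float(initial.get(p, 0.0)) for p in pairs}
    sigma = dict(sigma0)
    for _ in range(rounds):
        # Each pair keeps its initial and current score ...
        nxt = {p: sigma0[p] + sigma[p] for p in pairs}
        # ... and exchanges similarity with pairs linked by parallel edges.
        for (ps, cs), (pt, ct) in product(edges_s, edges_t):
            nxt[(cs, ct)] += sigma[(ps, pt)]
            nxt[(ps, pt)] += sigma[(cs, ct)]
        # Normalize so that the best pair has similarity 1.
        top = max(nxt.values(), default=0.0) or 1.0
        sigma = {p: v / top for p, v in nxt.items()}
    return sigma


# Hypothetical toy schemas: only the two TA elements match by name.
S_EDGES = [("class", "professor"), ("class", "TA")]
T_EDGES = [("course", "instructor"), ("course", "TA")]
scores = propagate_similarity(S_EDGES, T_EDGES, {("TA", "TA"): 1.0})
```

Starting from the single name match between the two TA elements, the parents class and course acquire a positive score, and from there so do their other children, such as professor and instructor. In this toy setting professor ties between instructor and TA; a real matcher would break such ties with initial string similarities for all pairs.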

The rule-based techniques have some desirable features: (1) they are usually computationally inexpensive and require no training process, unlike learning-based approaches; (2) they usually require only schema-level information, which is available in most matching scenarios; (3) if some domain knowledge is available, one can specify domain-specific rules, which can work very well in certain types of applications. For example, users can write regular expressions that encode times or phone numbers, or quickly compile a collection of zip codes to help recognize these types of entities. Learning methods, however, can hardly deal with these scenarios. They either cannot learn some complex rules, or require a large amount of training data with the correct representation for the desired result, which is usually hard to obtain.

However, the rule-based techniques have several drawbacks: (1) they cannot effectively exploit data instance-level information, even though the data instances provide valuable information, e.g., precise data format, data distribution, statistical values, etc. It is possible in some cases that the schema-level information is opaque or very difficult to interpret, e.g., element names like A or B1 are too abstract to be interpreted. In contrast, learning methods such as Naive Bayes can easily construct probabilistic rules that find similarity in such scenarios, based on the distribution of data instances [11]; (2) moreover, rule-based techniques cannot exploit previous matching results to improve the current matching process. Hence in a matching application for a specific domain, the rule-based techniques are usually insufficient.
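The kind of domain-specific rule mentioned above can be sketched as follows. The patterns, the `RULES` table, the `guess_type` function and the 0.8 threshold are illustrative assumptions (the phone pattern covers only simple North American formats); they are not taken from any of the systems discussed.

```python
import re

# Hypothetical hand-crafted rules: one regular expression per entity type.
RULES = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "time": re.compile(r"^([01]?\d|2[0-3]):[0-5]\d$"),
}


def guess_type(values, threshold=0.8):
    """Return the first rule name matched by at least `threshold` of the
    non-empty values, or None if no rule fits well enough."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for name, pattern in RULES.items():
        hits = sum(1 for v in cleaned if pattern.match(v))
        if hits / len(cleaned) >= threshold:
            return name
    return None
```

Two columns from different schemas that both receive the label "phone" become match candidates even if their element names are as opaque as A or B1, which is the situation where a rule of this kind is most useful.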

2.2.2 Learning-Based Solutions

Motivated by the drawbacks of rule-based matching methods, a collection of learning-based solutions have been proposed. These methods have considered a variety of learning techniques, and have exploited both schema-level and data instance-level information. For example, Doan et al. proposed the LSD system, which employs the Naive Bayes learning method over data instances, and also exploits the structure information of the XML data format; the iMAP system [2] pays attention to the description of elements, in addition to other schema information.

In developing learning techniques for schema matching, it has been realized that considering only schema-level or data instance-level evidence in the schemas being matched is often insufficient for accurate matching. Hence, several types of external resources have been considered to improve the matching quality. For example, assuming a domain-specific ontology is available, one technique is to first map the schemas/ontologies into that ontology, and then construct the matches based on the relationships inherent in it [5]. For example, it is hard to identify the relationship between direct and free toll by using regular approaches such as string comparison. However, by mapping them to a domain-specific ontology, one can find that they are both specializations of the concept phone, so that it can be concluded that direct is highly similar to free toll.

Some recent works advocate exploiting past matching results to improve current ones [3, 4], with the basic idea of learning from past matches to predict unseen matching scenarios. An alternative solution considers learning

from a corpus of schemas and matches [14]. Such a corpus provides alternative representations of concepts in the domain, i.e., it functions in the same way as an ontology, and thus can be leveraged to discover similarity between schema elements. However, it is not always practical to have such external resources, particularly since these resources must be domain-specific to be effective.

2.3 Ontology Alignment Techniques

Ontology alignment deals with finding corresponding concepts in different ontologies. In this section, we present some typical work on ontology alignment; for a comprehensive survey we refer the reader to [10].

OntoMorph [1] focused on the problem of translating symbolically represented knowledge between different knowledge representations. It uses a description-logic-based approach, and offers syntactic rewriting to support the translation between two different knowledge languages, and semantic rewriting to support inference-based transformation. OntoMorph requires users to provide transformation rules, and thus can be regarded as one type of rule-based technique.

Prompt [20] proposed an ontology alignment mechanism that finds corresponding concepts by refining an initial mapping (pairs of anchors) given by users or by some simple linguistic matching approaches. Specifically, it analyzes the paths in sub-graphs limited by the anchors and determines which concepts frequently appear in similar positions on similar paths. The philosophy followed by Prompt is similar to that of Similarity Flooding [16].

FCA-Merge [28] is an example of an alignment technique that depends on external resources. The resources used in FCA-Merge are domain-specific documents, which cover the concepts in the ontologies. Through natural language analysis techniques, it generates a formal context for each ontology, which tells which documents contain which concepts. Based on these formal contexts, the Is-a relationships between concepts are inferred. However, since the formal context is built upon the generalization/specialization hierarchy of the concepts, this approach cannot be extended to other relationships, such as Has-a. Moreover, the requirement of domain-specific documents is not always feasible.

MAFRA [15] proposed a framework for sharing distributed ontologies via mapping. A multi-strategy process is employed to calculate the similarities between ontology entities, including lexical similarity and property similarity (attributes or relations). Both top-down and bottom-up similarity propagation are employed. This can be considered a counterpart of the hybrid matching techniques in schema matching.

To the best of our knowledge, though an ontology itself can have complex relationships, e.g., Has-a or Is-a, the focus of most previous work on ontology alignment is finding semantically equivalent concepts or one specific type of relationship (e.g., Is-a in FCA-Merge) in different ontologies [10], rather than discovering corresponding concepts with more types of generic relationships; the rich relationships in ontologies are used only as one type of semantic evidence.
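A multi-strategy similarity of the kind described for MAFRA can be illustrated by a weighted combination of a lexical score and a property score. The sketch below is not MAFRA's actual computation: the function names, the use of `difflib.SequenceMatcher` for lexical similarity, Jaccard overlap for property similarity, and the default equal weighting are all assumptions.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def property_similarity(props_a: set, props_b: set) -> float:
    """Jaccard overlap of two property-name sets, in [0, 1]."""
    union = props_a | props_b
    if not union:
        return 0.0
    return len(props_a & props_b) / len(union)


def combined_similarity(name_a, props_a, name_b, props_b, w_lexical=0.5):
    """Weighted mix of lexical and property similarity."""
    return (w_lexical * lexical_similarity(name_a, name_b)
            + (1.0 - w_lexical) * property_similarity(props_a, props_b))
```

With this combination, two concepts with dissimilar names such as course and class can still score well when they share properties, which is the benefit a hybrid strategy is meant to provide.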

2.4 Sample Prototypes

In this section, we consider some recent prototype schema matching systems.

2.4.1 Rondo

Rondo [17] is a complete prototype of a generic model-management system, in which high-level operators are used to manipulate models and mappings between models. As one of its main operators, match is implemented using the Similarity Flooding (SF) algorithm [16]. SF utilizes a hybrid matching approach based on the idea of similarity propagation. It starts from a string-based comparison (e.g., common prefixes and suffixes) of the schema elements' names to get an initial mapping, which is further refined using a fix-point computation. The matching process is well formulated as a mathematical optimization problem in SF.

2.4.2 Cupid

Cupid [14] implements a hybrid matching algorithm that analyzes syntactic information of elements (e.g., string prefixes, suffixes), and structure information of schemas (e.g., tree matching weighted by leaves). Moreover, it exploits external resources, i.e., a pre-compiled thesaurus.

2.4.3 COMA

COMA [3] is a composite schema matching system. It provides a matcher library composed of different matching algorithms. Its framework allows

27 Chapter 2. Related Work the combination of partial results. The matcher library can be extended by adding new matching algorithms. Specifically, it contains 6 elementary matchers, 5 hybrid matchers and one reuse-oriented matcher. Compared with Cupid, this reuse-oriented matcher is a novel algorithm, which tries to leverage previously obtained results for new schemas imap imap [2] is a matching system that considers 1-to-n equivalence matches. The authors regard the problem of matching as a search in a usually infinite match space. The overall goal is achieved in three steps: (1) a set of basic matchers, called searchers are employed to detect similar elements according to different criteria (e.g., linguistic similarity, numerical equivalence, etc). Specifically, for each element in the target schema, a set of similar elements are found in the source schema by the searchers, including 1-to-1 and n-to-1 matches. (2) the match candidates generated in the first step are evaluated by a similarity evaluator module in this step, and the result is a similarity matrix which indicates the similarity between the target element and different match candidates. (3) a match selector module selects the best match candidate as the final result. imap also provides a explanation module which can provide explanation for the generated matches, e.g., the reason the match is selected, and the implicit equivalence relationship, etc. To the best of our knowledge, most previous work on schema matching focus on one-to-one equivalence relationships in finding semantic correspondences between two schemas. Little work is done in identifying multiple types of 19

28 Chapter 2. Related Work complex relationships. In the following chapters, we present SeMap, a prototype schema matching system which is designed to find generic semantic correspondences. 20

Chapter 3 Problem Formulation

As discussed in the related work (Chapter 2), most work on schema matching so far focuses on finding one-to-one equivalence relationships between schema elements. The overall goal of our schema matching system, SeMap, is to identify a generic semantic mapping between two schemas. A generic semantic mapping means that (1) the matches may be non one-to-one, e.g., one element is mapped to multiple elements of the other schema (a.k.a. 1-to-n matches); and (2) the relationship types may be non-equivalence, e.g., Has-a or Is-a, as classified in the Vanilla meta-meta model [21].

An example of a generic semantic mapping is shown in Figure 1.1, where two schemas represent the concept of class / course in different ways. The mapping contains complex correspondences, such as TA of schema S being mapped to undergrad TA and grad TA of schema T. The relationship types involved are also richer than the equivalence relation considered in most schema matching approaches, e.g., the department of schema S is considered a member of the college of schema T.

[Figure 3.1: Representation of model. The left plot (a) shows a graphical representation of a model, comprised of nodes (elements) and edges (relationships). The right table (b) shows the tuple representation of edges, with columns s, p, o and rows (course, Associates, course), (course, Associates, faculty), (course, Has-a, ugrad TA), (course, Has-a, grad TA).]

3.1 Representation

In this thesis we consider how to form a generic semantic mapping. Because we are attempting to solve this problem in a data-model-neutral fashion that could be applied equally well to relational schemas, XML schemas or ontologies, we adopt the terminology of Model Management [17], and say that we take as input two models (in what follows, we use schema and model interchangeably). A model is a complex design artifact, such as a relational schema, an XML schema, an XML DTD, or an ontology.

Technically, a model can be represented as a directed labelled graph (V, E). Specifically, V is the set of nodes, each denoting an element of the schema, e.g., an attribute in a relational database table, a type definition in an XML schema, or a clause of an SQL statement. E is the set of binary, directed, typed edges over V. Formally, each edge is a tuple <s, p, o>, where s is the source node, o is the target node, and p is the type of the edge, denoting the relationship between s and o. (The notation <s, p, o> follows the <subject, predicate, object> notation used in ontologies.) An example of model representation is depicted in Figure 3.1, which illustrates the concept of course.

As indicated in [19, 24], in addition to the Equivalent relationship, the concepts of generalization/specialization and part-of/whole have long been recognized as ubiquitous and essential mechanisms in object-oriented modeling techniques, which have a wide range of applications, such as CAD, manufacturing, software development and computer graphics. In this thesis, we follow the relationship classification in the Vanilla meta-meta model [21], which embeds the concepts of generalization/specialization and part-of/whole. Specifically, the Vanilla meta-meta model has five relationship types, namely Associates, Is-a, Has-a, Contains, and Type-of. In this thesis, we concentrate on the first four: Associates, Is-a, Has-a and Contains, where Is-a represents the concept of generalization/specialization, Contains and Has-a represent the concept of part-of/whole, and Associates represents all other, weaker semantic relationships.

Strictly speaking, although both Has-a and Contains embed the concept of part-of/whole, they differ in semantics. As indicated in [19], part-of relationships can be categorized along two dimensions: (1) the degree of sharing of parts among whole objects and (2) the degree of dependence between some part objects and some whole object(s). Contains and Has-a differ in the second dimension: part objects are highly dependent on whole object(s) in Contains, while this dependence is not so strong in Has-a. This difference leads to the rule that in a Contains relationship, the containee is a part of its container element and cannot exist on its own (delete propagation). Moreover, Contains is a transitive relationship and must be acyclic, while Has-a is weaker than Contains in that it does not propagate deletion and can be cyclic. Since we focus on the high-level part-of/whole relationship, we treat Has-a and Contains as the same in our framework.

In addition, we also consider the equivalence relationship, which is the main focus of previous schema matching approaches. So in total, our framework considers four relationship types: Equivalent, Has-a, Is-a, and Associates. Their formal definitions are specified as follows, and their graphical representation is shown in Figure 3.2:

Equivalent: E(x, y) means that x is semantically equivalent to y. This is a symmetric relationship type, i.e., E(x, y) ⇔ E(y, x).

Has-a: H(x, y) indicates that x has y as a sub-component or member. This is an asymmetric relationship, i.e., H(x, y) does not imply H(y, x).

Is-a: I(x, y) means that x is a specialization of y. This is an asymmetric relationship.

Associates: A(x, y) indicates that x is associated with y. It is the weakest relationship that can be expressed, and it has no constraints or special semantics. This is a symmetric relationship type.

This representation is complex enough to capture many of the semantic relationships that appear in models, and yet is simple enough for a reasonable initial foray into the problem.

[Figure 3.2: Illustration of four relationship types handled by SeMap.]

A mapping, Map_ST, is a formal description of the semantic relationships between two schemas, S and T. A mapping itself is a model consisting of a set of mapping elements E and a set of relationships R on E. The elements of the two schemas are related through the mapping elements. Each mapping element e ∈ E is like any other element in schemas S and T. In addition to being the origin or destination of any kind of relationship found in a model, i.e., R, each e ∈ E can be the origin of one or more mapping relationships M(e, s), where s ∈ S ∪ T, which specify that the origin element e corresponds to the destination element s. The semantics of a mapping relationship is such that for all s1, s2 ∈ S ∪ T s.t. M(e, s1) and M(e, s2), s1 = s2, and s1 corresponds to s2.

Given this rich mapping structure, generic semantic relationships, not just simple correspondences between the elements of S and T, can be expressed as follows: two semantically equivalent elements are represented by one mapping element, while a relationship between two mapping elements indicates the relationship between their corresponding schema elements. For example, in Figure 3.3, the mapping element m1 corresponds to the elements class and course, which represent the same concept; the relationship between m4 and m5 indicates that instructor is-a faculty.
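The representation described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SeMap's implementation: the class names `Rel`, `Model` and `Mapping`, the method names, and the mapping element names `m_instructor` and `m_faculty` are hypothetical. The sketch stores a model as a set of <s, p, o> tuples, treats Equivalent and Associates as symmetric, and derives schema-level relationships from mapping elements in the way the text describes.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import permutations


class Rel(Enum):
    """The four relationship types considered in the text."""
    EQUIVALENT = "Equivalent"
    HAS_A = "Has-a"
    IS_A = "Is-a"
    ASSOCIATES = "Associates"


# Equivalent and Associates are symmetric; Has-a and Is-a are not.
SYMMETRIC = {Rel.EQUIVALENT, Rel.ASSOCIATES}


@dataclass
class Model:
    """A model as a directed labelled graph (V, E) with edges <s, p, o>."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, s, p, o):
        self.nodes.update((s, o))
        self.edges.add((s, p, o))

    def holds(self, s, p, o):
        """True if p(s, o) is stated, or follows from symmetry of p."""
        if (s, p, o) in self.edges:
            return True
        return p in SYMMETRIC and (o, p, s) in self.edges


@dataclass
class Mapping:
    """Mapping elements, relationships among them, and M(e, s) links."""
    model: Model = field(default_factory=Model)  # relationships R on E
    links: dict = field(default_factory=dict)    # e -> linked schema elements

    def link(self, e, schema_element):
        """Record the mapping relationship M(e, schema_element)."""
        self.model.nodes.add(e)
        self.links.setdefault(e, set()).add(schema_element)

    def implied(self):
        """Schema-level relationships expressed by this mapping."""
        result = set()
        # Elements linked to the same mapping element are equivalent.
        for linked in self.links.values():
            for x, y in permutations(sorted(linked), 2):
                result.add((x, Rel.EQUIVALENT, y))
        # A relationship between two mapping elements carries over to
        # the schema elements they correspond to.
        for e1, p, e2 in self.model.edges:
            for x in self.links.get(e1, ()):
                for y in self.links.get(e2, ()):
                    result.add((x, p, y))
        return result


def example_mapping():
    """The two cases mentioned for Figure 3.3 (element names assumed)."""
    m = Mapping()
    m.link("m1", "class")
    m.link("m1", "course")
    m.link("m_instructor", "instructor")
    m.link("m_faculty", "faculty")
    m.model.add_edge("m_instructor", Rel.IS_A, "m_faculty")
    return m
```

Calling `example_mapping().implied()` yields the equivalence between class and course (in both directions) and the Is-a relationship from instructor to faculty. Keeping the mapping as a model in its own right means the same edge representation serves both schemas and mappings.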

[Figure 3.3: An example of complex mapping handled by SeMap.]

3.2 Problem Statement

Given the definitions of model and mapping, we are now ready to formally define the goal of SeMap: given two models, S and T, find the generic semantic relationships required to create the mapping Map_ST between S and T. There may be some optional inputs to the matching process, specifically (1) an initial mapping between S and T, which provides an initial set of correspondences and needs to be refined by the process; and (2) external semantic resources r used by the matching process, e.g., domain-specific thesauri or ontologies. The matching process is illustrated in Figure 3.4.

[Figure 3.4: Illustration of the matching process. Inputs: schema S, schema T, the initial mapping, and resource r; output: the mapping between S and T.]

3.3 Semantic Resources

The semantic resources used by matching techniques can be categorized as internal resources, which are contained in the input schemas or associated data instances, and external resources, which provide semantic information not present in the schemas or data instances.

Internal Resources

The semantic resources of the input schemas include both element-level information, which refers to the information stored at each schema element (e.g., element name, data type, structure), and structure-level information, which refers to the information contained in the relationships between schema elements (e.g., relationship type, constraints). In the following two subsections, we introduce the element-level and structure-level resources considered in our SeMap system, respectively.

Element-Level Information

We consider the following element-level information:

Element name (label). Each element name is of String type. The name (label) provides a first layer of semantic evidence for the possible meaning of the schema element.

Element type. If an element contains data, it is usually associated with a type indicating the storage format of the data. Note that in many representations, the data type of a model element is considered a separate element, which is linked to the element itself by a Type-of relationship. In our system, we consider the data type an attribute of the model element, e.g., String is an attribute of the element professor, rather than a separate element. The element type can provide hints in the sense that similar schema elements usually have the same or compatible data types.

Element description. This is a short description of the semantic meaning of the element, which usually contains more information than the element name alone. For web interfaces, where only schema-level information is available, the element description is especially valuable in determining the exact semantics of the elements. For example, in a flight ticket booking website it is hard to tell the semantics of an element from its name, people, alone. However, with the help of its description, total passengers, one can conclude that people stands for the overall number of tickets bought.

Data instances. As discussed in Chapter 2, data instances can provide valuable information that cannot be found in schemas, e.g., precise data format, data distribution and statistical values. Specifically, the data type of an element may not reflect exactly how its data is stored, which can only be found in data instances. For example, the element phone may be of Integer type. However, looking at its data instances, one may notice that its exact format is xxx-xxx-xxxx, which is not reflected in its data type. Meanwhile, the distribution of data instances is also useful in identifying similar schema elements, especially when the element names are obscure, e.g., A1 and B2 [11].
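The point about data instances can be made concrete with a small sketch. It is not taken from SeMap: the function names and the sample values are hypothetical, and comparing format distributions by their overlap is just one simple choice. Each value is abstracted into a format pattern (digits become x, letters become a), so that the xxx-xxx-xxxx shape of phone values becomes visible, and two elements with obscure names can be compared by how similar their pattern distributions are.

```python
from collections import Counter


def format_pattern(value):
    """Abstract a raw value into its shape: digit -> 'x', letter -> 'a'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("x")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append(ch)  # keep separators such as '-' or ' '
    return "".join(out)


def dominant_pattern(instances):
    """Return the most frequent pattern and the fraction of values it covers."""
    counts = Counter(format_pattern(v) for v in instances)
    pattern, n = counts.most_common(1)[0]
    return pattern, n / len(instances)


def pattern_similarity(instances_a, instances_b):
    """Overlap of two pattern distributions, in [0, 1]."""
    ca = Counter(format_pattern(v) for v in instances_a)
    cb = Counter(format_pattern(v) for v in instances_b)
    return sum(
        min(ca[p] / len(instances_a), cb[p] / len(instances_b))
        for p in set(ca) | set(cb)
    )


# Hypothetical instances for two obscurely named elements, A1 and B2,
# and for a third element holding postal codes.
a1_values = ["604-822-3061", "604-822-9000", "778-555-0101"]
b2_values = ["250-123-4567"]
postal_values = ["V6T 1Z4"]
```

With these sample values, `dominant_pattern(a1_values)` reports the xxx-xxx-xxxx shape for every instance, and `pattern_similarity` separates the phone-like element B2 from the postal-code element even though none of the names carries any meaning.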

Structure-Level Information

In addition to the element-level information discussed above, we also consider structure-level information. In our system SeMap, we mainly consider two types of structure-level evidence:

Relationship type. Each edge between two schema elements has a certain relationship type, which can be leveraged in the matching process. The basic intuition is that if two elements are semantically similar, the elements having the same relationship with them are also highly likely to be semantically related.

Constraints. Each edge can have constraints, including (1) cardinality in relational database tables, e.g., 1-n, 1-1, and (2) key properties of elements, e.g., unique, primary.

External Resources

Previous work on matching techniques has shown that internal semantic evidence is usually insufficient for achieving high-quality matching results; additional external resources should be leveraged to improve the matching quality. In SeMap, we consider two types of external resources:

Thesaurus. This is a dictionary that provides the different representations of the same concept. The element names can first be expanded with their synonyms, so that one has a better chance of finding similar elements. Specifically, SeMap uses WordNet as the thesaurus. WordNet is a comprehensive English lexical reference system, which organizes nouns, verbs, more than 6000 adjectives and 3000 adverbs into synonym sets (synsets). It is considered one of the most powerful tools for computational linguistics, and has been used in several matching applications [7].

Ontology. Ontologies, especially domain-specific ontologies, are powerful tools for discovering similar elements, and even for identifying their implicit relationships. However, they are not always obtainable. The collection of ontologies we employed in SeMap is provided by OntoBuilder [6].

3.4 Approach Overview

In this section, we present an overview of our generic matching system SeMap. As an implementation of the match operator, SeMap takes as input two schemas (models) S and T, and produces their generic semantic mapping Map_ST. In addition, SeMap takes the external semantic resource r as an input. In order to identify the generic semantic relationships between schema elements, SeMap not only has to identify the correspondences of complex relationships, but also has to extract the implicit relationship types. Figure 3.5 shows the basic architecture of this mapping construction system.

SeMap implements this goal mainly in three phases. In the first phase, schema matching, the candidate matches (correspondences) of generic semantic relationships are identified. Note that most previous work focuses on finding correspondences of Equivalent relationships, while in our work we also have


Falcon-AO: Aligning Ontologies with Falcon

Falcon-AO: Aligning Ontologies with Falcon Falcon-AO: Aligning Ontologies with Falcon Ningsheng Jian, Wei Hu, Gong Cheng, Yuzhong Qu Department of Computer Science and Engineering Southeast University Nanjing 210096, P. R. China {nsjian, whu, gcheng,

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

H1 Spring B. Programmers need to learn the SOAP schema so as to offer and use Web services.

H1 Spring B. Programmers need to learn the SOAP schema so as to offer and use Web services. 1. (24 points) Identify all of the following statements that are true about the basics of services. A. If you know that two parties implement SOAP, then you can safely conclude they will interoperate at

More information

Extracting knowledge from Ontology using Jena for Semantic Web

Extracting knowledge from Ontology using Jena for Semantic Web Extracting knowledge from Ontology using Jena for Semantic Web Ayesha Ameen I.T Department Deccan College of Engineering and Technology Hyderabad A.P, India ameenayesha@gmail.com Khaleel Ur Rahman Khan

More information

Intelligent flexible query answering Using Fuzzy Ontologies

Intelligent flexible query answering Using Fuzzy Ontologies International Conference on Control, Engineering & Information Technology (CEIT 14) Proceedings - Copyright IPCO-2014, pp. 262-277 ISSN 2356-5608 Intelligent flexible query answering Using Fuzzy Ontologies

More information

Algorithms. Lecture Notes 5

Algorithms. Lecture Notes 5 Algorithms. Lecture Notes 5 Dynamic Programming for Sequence Comparison The linear structure of the Sequence Comparison problem immediately suggests a dynamic programming approach. Naturally, our sub-instances

More information
