Quality of Matching in large scale scenarios

Sana Sellami: Université de Lyon, CNRS INSA-Lyon, LIRIS, UMR5205, F-69621, France. sana.sellami@insa-lyon.fr
Nabila Benharkat: Université de Lyon, CNRS INSA-Lyon, LIRIS, UMR5205, F-69621, France. nabila.benharkat@insa-lyon.fr
Youssef Amghar: Université de Lyon, CNRS INSA-Lyon, LIRIS, UMR5205, F-69621, France.

ABSTRACT. Matching techniques are becoming a very attractive research topic. With the development and use of a large variety of data (e.g. DB schemas, XML schemas, ontologies) in many domains (e.g. semantic web, e-business), matching techniques are called upon to overcome the challenge of aligning these different data. In this paper, we study the quality of large scale matching systems. We define and propose a Quality of Matching (QoM) that can be used to evaluate large scale matching systems. We survey the techniques, called optimization techniques, used in existing matching approaches to improve this quality. This domain is very active, and large scale matching still needs considerable advances. We therefore show how quality evaluation can be integrated into our scalable matching system PLASMA.

KEYWORDS: Matching, Quality of Matching (QoM), Large Scale, Optimization techniques.

1. Introduction

Recently, in business and science, we have been witnessing an explosive growth of data. Many databases and information sources are available through the web, covering different domains: semantic Web, deep Web, e-business, biology, digital libraries, etc. In such domains, the data generated are heterogeneous and voluminous; for example, schemas with several thousand elements are common in e-business applications. The presence of vast heterogeneous collections of data raises one of the greatest challenges in the data integration field. Matching techniques are solutions that automatically search for correspondences between these data in order to obtain useful information.

Schema matching has attracted considerable interest in both research and practice. Matching is an operation that takes data as input (e.g. XML schemas, ontologies, relational database schemas) and returns the semantic similarity values of their elements. One of the challenges for the matching community is to search efficiently for correspondences between numerous and voluminous schemas. Matching these data at large scale is a laborious process, and the standard approach of matching the complete input schemas often leads to performance problems. Various schema matching systems have been developed to solve the problem semi-automatically. Since schema matching is a semi-automatic task, efficient implementations are required to support interactive user feedback. In this context, scalable matching becomes a problem to be solved.

Our main motivation is therefore to optimize and improve scalable matching algorithms in terms of efficiency and of the quality of the matching solutions produced. We begin by defining the Quality of Matching (QoM). We propose a QoM classification in terms of factors and metrics that can be used to evaluate matching systems and to ensure high-quality results for large scale matching.
Moreover, we present in detail the techniques proposed in the literature, in both pairwise and holistic approaches, to improve the quality of matching and to deal with scalability and performance problems. This analysis of state-of-the-art techniques allows us to draw some conclusions and observations about the quality of existing matching systems. Based on these observations, we describe how quality of matching can be integrated into our scalable schema matching system PLASMA (PLatform for LArge Scale MAtching). The goal of our paper is to show the importance of quality of matching in ensuring an efficient scalable matching system.

The paper is organized as follows. In section 2, we define Quality of Matching (QoM). We then propose a classification in terms of metrics and factors to evaluate the quality, and we analyse the existing techniques in the literature for improving this quality. Section 3 presents the role of quality in our scalable matching system PLASMA. Finally, we conclude and discuss future work.

2. Quality of Matching (QoM)

In the large scale context, we define and propose a Quality of Matching (QoM), which represents the reliability and robustness of large scale matching systems. We consider it important to relate the quality aspect to scalable matching techniques. Quality assessment helps users obtain a solution that best meets their needs. Therefore, QoM means for us an optimization of the large scale matching system. In this section we analyse and define two main aspects of QoM: how to evaluate the quality of matching and how to improve it in large scale matching scenarios.

2.1. Quality of matching evaluation

Evaluations of schema matching systems have been studied in depth in (Do et al., 2002), which discusses various aspects (input, output, match quality measures, effort) that contribute to the match quality obtained as the result of an evaluation. The quality concept has been used in several domains as an important phase of evaluation in current information systems. There are a variety of approaches to studying the quality of data in information integration and data search. However, little work tackles the quality aspect of the matching process at large scale. The authors of (Bernstein et al., 2004) test their system taking into account the scalability and extensibility criteria. In practice, nearly all matchers are evaluated using precision and recall measures. For example, the authors of (Smiljanic et al., 2006) propose evaluating quality in terms of performance. The performance of a schema matching system consists of efficiency (which expresses how much faster one system performs than another) and effectiveness (expressed through precision and recall). In (Duchateau et al., 2007), the authors propose quality measures of matching using a number of scoring functions.

More specifically, the Quality of Matching (QoM) is based on the use of quality measures to evaluate the matching system. First, we need to identify which quality factors are to be evaluated. The selection of the appropriate quality factors implies the selection of metrics and the implementation of evaluation algorithms that measure and estimate such quality factors. In this respect, a metric is a specific instrument that can be used to measure a given quality factor. We distinguish between two aspects (Figure 1): the factors that influence the quality, and the metrics used to evaluate and measure the quality of the matching techniques.
We propose factors that mainly depend on the quality of the context (input data and the characteristics of the domain) and on the features of matching systems and algorithms. On the other hand, we define the metrics (performance, accuracy, scalability, etc.) in terms of the characteristics of the matching process that builds the resulting data from sources.

Quality factors in large scale matching

The factors that influence quality at large scale are essentially related to the context (input data and domain) and to the matching systems or algorithms. We summarize these quality factors below.

Factors related to the context

Input data: Quality of matching depends on the internal quality of the data (their coherence, completeness, freshness, etc.) and on the confidence in the producers of these data. Moreover, we should determine the type, representation and structure of the data used (schemas, ontologies, query interfaces, etc.). These characteristics influence the quality of matching.

Domain: Data reside at different sources and are consequently extracted from different domains. Data managed by different sources are typically heterogeneous, and can be incorrect, incomplete and noisy, that is, of poor quality. Therefore, it is important to determine whether the data sources come from the same domain or from different ones, the characteristics of the domains, etc.

Factors related to the matching systems/algorithms

Techniques: In a context where the information is produced by sophisticated algorithms, measuring quality requires a fine knowledge of the process that computes this information. Moreover, the use of these algorithms and techniques (i.e. the type of matchers implemented (schema vs. instance level, element vs. structural level, language vs. constraint based, etc.), auxiliary information, optimization techniques, etc.) can be very expensive.

Needs in runtime performance: The quality of matching solutions is measured in terms of how long applications take to run to completion when their tasks are allocated to nodes based on the decisions of matching algorithms. This duration is called execution time. Efficient matching algorithms must keep execution times to a minimum.

Complexity: The matching problem is an extreme case in terms of size and complexity. The schema matching problem is a combinatorial problem with exponential complexity. This complexity is due to the large number and size of the data (number of schemas/components) and to the expensive computation of semantic similarity (e.g. using auxiliary resources). Consequently, naive matching algorithms are prohibitively inefficient for large schemas. Complexity is therefore a property that affects the quality of matching algorithms.

Human interaction (Wang et al., 2007): The matching operation cannot be entirely automated; it is still largely conducted by hand, in a labor-intensive and error-prone process. Manual matching has now become a key bottleneck in building large-scale information management systems. Therefore, user or designer input is necessary to generate correct matchings.

Quality metrics in large scale matching

We define the metrics that are used individually in the evaluations of existing large scale matching systems. Our classification (figure 1) can serve as a support for QoM.
We propose the metrics (performance, accuracy, scalability, etc.) in terms of the characteristics of the matching process that builds the resulting data from sources.

Performance: The performance is measured in terms of efficiency and pertinence:
Efficiency: The time the system needs to solve a matching problem.
Pertinence: Evaluates the relevance of the matching results. This metric can be calculated from precision and recall values (Do et al., 2002). In order to compare the quality of the matching, we establish a manual matching as a reference. The results obtained by the automatic matching are then checked separately against three quality measurements: Precision, Recall and Overall.

Accuracy: Also called Overall, this measure was proposed by (Melnik et al., 2002) specifically for the schema matching context. It considers the post-match effort needed. Accuracy depends on both the Recall and Precision measures.

Manual effort (Wang et al., 2007): It is very important to specify the kind of manual effort required during the pre-matching and post-matching processes. This metric is a type of cost metric that estimates the human part of the cost and is typically measured in person-days or person-months (time spent correcting and improving the matching output).

Scalability: The property of a system to keep functioning correctly even when new elements are added. A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system. In our context, an algorithm is said to scale if it is suitably efficient and practical when applied to large situations (e.g. a large input data set, or a large number of participating nodes in the case of a distributed system). For a given matching algorithm implementation on a given machine, let T(A,S) be the execution time of algorithm A performed on S schemas, and T(A,S') be the execution time of algorithm A performed on S' schemas, with S' > S. If the number of schemas increases from S to S' and the efficiency E is conserved, then we can define the scalability metric, or time scale TS, as follows:

$$\text{time-scale}_E(S, S') = \frac{T(A, S)}{T(A, S')}$$

The value of time-scale_E(S,S') is necessarily less than 1; a value of 1 would mean that the execution time of the algorithm is constant regardless of the number of schemas, which no algorithm in the literature currently achieves. A large value means good scalability of the algorithm when applied to a large number of schemas, and a small value means poor scalability.

Adaptability (Bharadwaj et al., 2004): Refers to the degree to which adjustments in the practices, processes, or structures of systems are possible in response to projected or actual changes in their environment.
This criterion can measure the degree of change that a system can support.

Extensibility: Means that the system has been architected so that the design includes all of the hooks and mechanisms for expanding or enhancing the system with new capabilities without having to make major changes to the system infrastructure. Matching systems should therefore be extensible by adding matching techniques, algorithms or customized data structures and operators.
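The time-scale metric defined under Scalability above can be illustrated with a short Python sketch. The function name `time_scale` and the timing values are hypothetical, chosen only for illustration; they are not measurements from PLASMA or from any of the cited systems.

```python
def time_scale(t_s: float, t_s_prime: float) -> float:
    """Return T(A, S) / T(A, S') for one algorithm A run on S and S' schemas (S' > S)."""
    if t_s <= 0 or t_s_prime <= 0:
        raise ValueError("execution times must be positive")
    return t_s / t_s_prime


# Hypothetical execution times in seconds, keyed by the number of schemas matched.
timings = {10: 2.0, 100: 25.0}

ratio = time_scale(timings[10], timings[100])
print(f"time-scale(10, 100) = {ratio:.2f}")
```

Under these assumed timings, the ratio is small because execution time grows faster than the number of schemas; a ratio closer to 1 would indicate better scalability in the sense defined above.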

Figure 1. Quality of matching (QoM): factors and metrics

2.2 Quality of matching improvement in large scale

The goal of this section is to analyse the strategies and techniques used in existing matching approaches (pair-wise and holistic) to improve the quality of matching. To this end, we analyse and underline the importance and usefulness of these techniques and strategies, called optimization techniques.

QoM improvement in Pair-wise matching

Matching has been approached mainly by finding pair-wise attribute correspondences in order to construct an integrated schema for two sources. Several pair-wise matching approaches over schemas and ontologies have been developed.

Schema matching

Being a central process for several research topics such as data integration, data transformation and schema evolution, schema matching (figure 2) has attracted much attention from the research community (Avesani et al, 2005; Bernstein et al, 2004; Do et al, 2007; Lu & Wang, 2005; Smiljanic et al, 2006). We present the main strategies dealing with quality (e.g. scalability) problems. These strategies represent an effective attempt to resolve the large scale matching problem. The techniques used aim at improving the quality of matching:

Fragment based strategy (Rahm et al, 2004): This is a divide-and-conquer approach which decomposes a large matching problem into smaller sub-problems by matching at the level of schema fragments. This approach has been implemented in the COMA++ matching tool (Do et al, 2007). The fragment-based approach represents an effective solution for treating large schemas and improving the performance of matching algorithms.

Extraction of common structures (Lu & Wang, 2005): The main goal of this approach is to extract a disjoint set of the largest approximate common substructures between two trees. This set of common structures represents the most likely matches between substructures in the two schemas. Identifying these structures aims at improving the efficiency of the matching process. However, there is no proof of correctness for this approach.

Clustered schema matching strategy (Avesani et al, 2005): This is a technique for improving the efficiency of schema matching by means of clustering. In this approach, matching is performed between a small schema and a schema repository. The clustering is introduced after the generation of matching elements. Clustering is then used to quickly identify regions in the schema repository which are likely to include good matchings for the smaller schema. The clustering is performed with an adaptation of the k-means clustering algorithm (Xu & Wunsch, 2005). The Bellflower system implements this technique. The improved efficiency, however, comes at the cost of losing some matchings. The loss mostly occurs among the matchings which rank low. Moreover, there is no measure of a cluster's quality that can be used to decide which clusters have better chances of producing good matchings.

Figure 2. Pair-wise schema matching

Ontology matching

Ontology matching (figure 3) is a promising solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of the ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering, data translation, or navigation on the semantic web. Thus, matching ontologies enables the knowledge and data expressed in the matched ontologies to interoperate. The increasing awareness of the benefits of ontologies for information processing has led to the creation of a number of large ontologies about real world domains. The size of these ontologies causes serious problems in managing them.
Many approaches (Hu & Qu, 2006; Hu et al, 2006; Qu et al, 2006; Stuckenschmidt & Klein, 2004; Wang et al, 2006a; Wang et al, 2006b) have been proposed in the literature to study the large ontology matching problem. We describe here existing approaches and techniques that aim to improve the quality of large scale ontology matching:

Partitioning strategy (Hu & Qu, 2006): This strategy has been introduced as a method for partition-based block matching that is appropriate for large class hierarchies. Large class hierarchies are one of the most common kinds of large-scale ontologies. The two large class hierarchies are each partitioned a priori into small blocks, based on both structural affinities and linguistic similarities. The matching process is then performed between blocks by combining the two kinds of relatedness found via predefined anchors and virtual documents between them. The partitioning process is based on the ROCK (Robust Clustering Using Links) algorithm (Xu & Wunsch, 2005). However, this approach is not completely applicable to large ontologies, and it partitions the two large class hierarchies separately without considering the correspondences between them. In addition, it only assumes matchings between classes, so it is not a general solution for ontology matching. To cope with large ontology matching, Hu & Qu (2006) then propose a partitioning-based approach to address the block matching problem. Qu et al (2006) consider both linguistic and structural characteristics of domain entities, based on virtual documents, for the relatedness measure. The ontologies are partitioned by a hierarchical bisection algorithm to provide block mappings.

Modularization strategy: Wang et al (2006) propose this approach to deal with large and complex ontologies. The authors propose a Modularization-based Ontology Matching approach (MOM). This is a divide-and-conquer strategy which decomposes a large matching problem into smaller sub-problems by matching at the level of ontology modules. This approach includes sub-steps for large ontology partitioning, finding similar modules, module matching and result combination. The method uses ε-connections to transform the input ontology into an ε-connection with the largest possible number of connected knowledge bases.

Figure 3. Pair-wise ontology matching

QoM improvement in Holistic matching

Traditional schema matching research has been based on the pair-wise approach. Recently, holistic schema matching has received much attention due to its efficiency in exploring contextual information and its scalability. Holistic matching (figure 4) matches multiple schemas at the same time to find attribute correspondences among all the schemas at once. These schemas are usually extracted from web query interfaces in the deep Web.
The deep Web refers to World Wide Web content that is not part of the surface Web indexed by search engines. The data sources in the deep Web are structured and accessible only via dynamic queries instead of static URL links. Several current approaches to holistic schema matching (He et al, 2006; He et al, 2004; He et al, 2003; He et al, 2005; Madhavan et al, 2005; Pei et al, 2006a; Pei et al, 2006b; Su et al, 2006a; Su et al, 2006b) rely on a large amount of data to discover semantic correspondences between attributes. We describe the most important strategies proposed in the literature and highlight the techniques used to improve the quality of holistic matching.

Statistical strategy: This approach has been introduced in (He et al, 2004; He et al, 2003) with the MGS framework (for hypothesis Modeling, Generation, and Selection) and the DCM (Dual Correlation Mining) framework. The MGS framework is an approach for global evaluation, building upon the hypothesis that a hidden schema model exists which probabilistically generates the observed schemas. This evaluation estimates all possible models, where a model expresses all attribute matchings. However, this approach does not take complex mappings into consideration. The DCM framework has been proposed for local evaluation, based on the observation that co-occurrence patterns across schemas often reveal the complex relationships of attributes. However, these approaches suffer from noisy data. The works suggested in (Chen et al, 2005; He et al, 2006) outperform (He et al, 2004; He et al, 2003) by adding sampling and voting techniques, which are inspired by bagging predictors. Specifically, this approach creates a set of matchers by randomizing the input schema data into many independently down-sampled trials, executing the same matcher on each trial, and then aggregating their ranked results by majority voting.

HSM (Holistic Schema Matching) (Su et al, 2006a) and PSM (Parallel Schema Matching) (Su et al, 2006b) have been proposed to find matching attributes across a set of Web database schemas of the same domain. HSM integrates several steps: a matching score calculation that measures the probability of two attributes being synonymous, and a grouping score calculation that estimates whether two attributes are grouping attributes. PSM forms parallel schemas by comparing two schemas and deleting their common attributes. HSM and PSM are purely based on the occurrence patterns of attributes and require neither domain knowledge nor user interaction.

Clustering based approach: This approach has been presented in (Pei et al, 2006a; Pei et al, 2006b). First, schemas are clustered based on their contextual similarity. Second, attributes of the schemas that are in the same schema cluster are clustered to find attribute correspondences between these schemas. Third, attributes are clustered across different schema clusters, using statistical information gleaned from the existing attribute clusters, to find attribute correspondences between more schemas. The K-means algorithm is used in these three clustering tasks, and a resampling method has been proposed to extract stable attributes from a collection of data.

Figure 4. Holistic schema matching

3. Quality of Matching in PLASMA (Platform for LArge Scale schema MAtching)

The goal of this section is to show how quality of matching can be integrated into our scalable schema matching system PLASMA (PLatform for LArge Scale MAtching) and how we can evaluate this quality. In PLASMA, quality of matching is assessed in each phase in order to evaluate each treatment. The final goal of QoM is to ensure the scalability, adaptability and extensibility of our system.

System Description: The architecture of PLASMA (figure 5) is organized in three phases: Pre-matching, Matching and Post-matching.

Figure 5. Architecture of PLASMA

Pre-Matching: This phase represents a pre-treatment of voluminous schemas. The focus of the Pre-Matching phase is to find the common and similar characteristics between various XML schemas in an automated manner in order to facilitate the matching process. It includes: an XML schema parser to analyse and transform XML schemas into trees, a thesaurus to address the issue of synonyms and abbreviation similarities, a holistic module to find the most similar sub-schemas, and a QoM module. To improve the quality of matching, we propose the use of tree mining algorithms. The goal of this operation is to find all common and similar sub-structures. The result of the pre-matching phase is a set of the most similar sub-structures, ready for matching. The interest of this approach is to reduce the complexity of large scale matching and to improve the performance of matching algorithms. This approach is based on decomposing large schemas into smaller ones; matching is then performed between small schemas. The QoM module then evaluates:

Human interaction: Because a thesaurus is used, the user effort must be evaluated. The metric used to evaluate this effort is the manual effort. An intuitive formula for the effort deviation, called Ed, is calculated as a function of RPD, the Real Person Days, and PPD, the Planned Person Days:

$$E_d = \frac{RPD - PPD}{PPD} \times 100$$

Mining techniques: We determine the performance, execution times and scalability of the tree mining algorithms used.
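The effort deviation Ed described under Human interaction above can be computed as in the following minimal Python sketch. The function name `effort_deviation` and the sample person-day figures are hypothetical and serve only to illustrate the formula; they are not results reported for PLASMA.

```python
def effort_deviation(real_person_days: float, planned_person_days: float) -> float:
    """Effort deviation Ed in percent: (RPD - PPD) / PPD * 100."""
    if planned_person_days <= 0:
        raise ValueError("planned person days (PPD) must be positive")
    return (real_person_days - planned_person_days) / planned_person_days * 100.0


# Hypothetical example: 10 person-days were planned, 12 were actually spent.
ed = effort_deviation(real_person_days=12, planned_person_days=10)
print(f"Ed = {ed:.1f}%")
```

A positive value indicates that the manual effort exceeded the plan, while a negative value indicates that less effort was needed than planned.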

Matching: The resulting common sub-schemas are matched in this phase. We apply a structural matcher (pair-wise module) to these sub-schemas instead of matching all the original input schemas. Matching large schemas is thus reduced to matching much smaller ones. We apply our matching algorithm (Chukmol et al., 2005) to discover structural correspondences between pairs of schema elements. This similarity considers the context of the elements. The quality of matching module evaluates the execution times of the matching algorithms and the quality of the resulting matchings. To calculate the match quality measures, we define the following metrics. Precision and Recall are widely used in the field of information retrieval, and they are also used in the evaluations of matching systems in (Do et al., 2002). Overall was developed specifically in the context of schema matching. It measures the post-matching effort necessary to add the false negatives and remove the false positives. Let B be the number of true correspondences found, C the number of false positives (returned correspondences that are wrong) and A the number of false negatives (true correspondences that were missed). The following formulas are used to calculate these measurements:

$$\text{Precision} = \frac{B}{B + C}$$

calculates the proportion of true correspondences B among those returned (B + C).

$$\text{Recall} = \frac{B}{A + B}$$

calculates the proportion of true correspondences B found among the total of true correspondences (A + B).

$$\text{Overall} = 1 - \frac{A + C}{A + B} = \frac{B - C}{A + B} = \text{Recall} \times \left(2 - \frac{1}{\text{Precision}}\right)$$

represents the effort needed to correct the results of an automatic matching (i.e. adding the false negatives and removing the false positives).

Post-Matching: This module combines the structural and linguistic matchings. We select the highly ranked matchings that represent the most pertinent results. We use different measures to select the best correspondences for an element from a set of possible matchings. The output is the set of element correspondences and the most similar schemas. These results are saved for later reuse.
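The Precision, Recall and Overall measures defined for the Matching phase can be sketched in Python as follows. This is an illustrative sketch only: the function name `match_quality`, the representation of a correspondence as a pair of element names, and the sample element names are all assumptions, not part of PLASMA. Here A counts the true correspondences that were missed, B the true correspondences found, and C the wrong correspondences returned.

```python
from typing import Dict, Set, Tuple

Correspondence = Tuple[str, str]  # (source element, target element), hypothetical encoding


def match_quality(reference: Set[Correspondence],
                  found: Set[Correspondence]) -> Dict[str, float]:
    """Compare an automatic matching with a manual reference matching."""
    b = len(reference & found)   # B: true correspondences found
    a = len(reference - found)   # A: true correspondences missed
    c = len(found - reference)   # C: returned correspondences that are wrong
    precision = b / (b + c) if (b + c) else 0.0
    recall = b / (a + b) if (a + b) else 0.0
    # Overall = 1 - (A + C) / (A + B) = (B - C) / (A + B); it can be negative.
    overall = (b - c) / (a + b) if (a + b) else 0.0
    return {"precision": precision, "recall": recall, "overall": overall}


# Hypothetical manual reference and automatic result.
reference = {("name", "fullName"), ("addr", "address"),
             ("zip", "postcode"), ("tel", "phone")}
found = {("name", "fullName"), ("addr", "address"), ("zip", "city")}

print(match_quality(reference, found))
```

With these sample sets, B = 2, A = 2 and C = 1. Overall becomes negative as soon as more than half of the returned correspondences are wrong, which reflects the case where correcting the automatic result costs more effort than matching by hand.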
To ensure that only good and valuable results are reused, we evaluate the quality of the selected matchings.

4. Conclusion and future works

This paper presented a broad scope of quality of matching characteristics. We have shown the importance of QoM in large scale matching scenarios. Since quality is very important for evaluating matching systems, we have proposed metrics to measure the Quality of Matching (QoM) and defined the factors that influence this quality. We have carried out a state-of-the-art study covering strategies to improve QoM. Based on these existing techniques, we have proposed an approach based on tree mining algorithms to improve QoM, together with an evaluation of quality in every phase of the architecture of PLASMA, our large scale matching system. In the future, we plan to implement all the proposed metrics in our system in order to evaluate the quality of the matching results and to test the scalability of our system.

5. References

Avesani, P., Giunchiglia, F., & Yatskevich, M. (2005). A Large Scale taxonomy mapping Evaluation. In Proceedings of the 4th International Semantic Web Conference (ISWC), Galway, Ireland.
Bernstein, P. A., Melnik, S., Petropoulos, M., & Quix, C. (2004). Industrial-Strength Schema Matching. In ACM SIGMOD Record.
Chukmol, U., Rifaieh, R., & Benharkat, A. (2005). EXSMAL: EDI/XML semi-automatic Schema Matching Algorithm. In the 7th International IEEE Conference on E-Commerce Technology (CEC).
Do, H.H., Melnik, S., & Rahm, E. (2002). Comparison of schema Matching Evaluations. In GI-Workshop Web and Databases. Erfurt, Germany.
Do, H.H., & Rahm, E. (2007). Matching large schemas: Approaches and evaluation. In Journal of Information Systems.
He, B., & Chen-Chuan Chang, K. (2006). Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach. In ACM Transactions on Database Systems (TODS). ACM Press, New York.
Duchateau, F., Bellahsene, Z., & Hunt, E. (2007). XBenchMatch: a benchmark for XML schema matching tools. In VLDB.
He, B., Chen-Chuan Chang, K., & Han, J. (2004). Discovering complex matchings across Web Query Interfaces: A Correlation Mining Approach. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, NY.
He, B., & Chen-Chuan Chang, K. (2003). Statistical Schema Matching across Web Query Interfaces. In Proceedings of the ACM SIGMOD International Conference on Management of Data. San Diego, California.
He, H., Meng, W., Yu, C., & Wu, Z. (2005). WISE-Integrator: A System for Extracting and Integrating Complex Web Search Interfaces of the Deep Web. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). Trondheim, Norway.
Hu, W., & Qu, Y. (2006). Block Matching for Ontologies. In Proceedings of the 5th International Semantic Web Conference (ISWC). Athens, GA, USA.
Hu, W., Zhao, Y., & Qu, Y. (2006). Partition-Based Block Matching of Large Class Hierarchies. In Proceedings of the First Asian Semantic Web Conference (ASWC). Beijing, China.
Lu, J., Wang, S., & Wang, J. (2005). An experiment on the Matching and Reuse of XML Schemas. In Proceedings of the 5th International Conference on Web Engineering (ICWE). Sydney, Australia.
Madhavan, J., Bernstein, P. A., Doan, A., & Halevy, A.Y. (2005). Corpus-based Schema Matching. In Proceedings of the 21st International Conference on Data Engineering (ICDE). Tokyo, Japan.
Melnik, S., Garcia-Molina, H., & Rahm, E. (2002). Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching. Proceedings of the ICDE 02 conference, San Jose, CA, 26 February - 1 March.
Pei, J., Hong, J., & Bell, D.A. (2006a). A Novel Clustering-based Approach to Schema Matching. In Proceedings of the 4th International Conference on Advances in Information Systems (ADVIS). Izmir, Turkey.
Pei, J., Hong, J., & Bell, D.A. (2006b). A Robust Approach to Schema Matching over Web Query Interfaces. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDE Workshops). Atlanta, GA.
Qu, Y., Hu, W., & Cheng, G. (2006). Constructing Virtual Documents for Ontology Matching. In Proceedings of the 15th International Conference on World Wide Web (WWW). ACM Press, Edinburgh, Scotland.
Rahm, E., Do, H.H., & Maßmann, S. (2004). Matching Large XML Schemas. In SIGMOD Record. ACM Press, New York, NY.
Shvaiko, P., & Euzenat, J. (2005). A Survey of Schema-based Matching approaches. Journal on Data Semantics IV 3730.
Smiljanic, M., Keulen, M., & Jonker, W. (2006). Using Element Clustering to Increase the Efficiency of XML Schema Matching. In Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDE Workshops).
Stuckenschmidt, H., & Klein, M. (2004). Structure-based Partitioning of large concept hierarchies. In Proceedings of the 3rd International Semantic Web Conference (ISWC). Hiroshima, Japan.
Su, W., Wang, J., & Lochovsky, F. (2006a). Holistic Schema Matching for Web Query Interface. In Proceedings of the 10th International Conference on Extending Database Technology (EDBT). Munich, Germany.
Su, W., Wang, J., & Lochovsky, F. (2006b). Holistic Query Interface Matching using Parallel Schema Matching. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). Atlanta, GA.
Wang, Z., Wang, Y., Zhang, S., Shen, G., & Du, T. (2006). Effective Large Scale Ontology Mapping. In Proceedings of the First International Conference Knowledge Science, Engineering and Management (KSEM). Guilin, China.
Wang, Z., Wang, Y., Zhang, S., Shen, G., & Du, T. (2006). Matching Large Scale Ontology Effectively. In Proceedings of the First Asian Semantic Web Conference (ASWC). Beijing, China.
Wang, G., Rifaieh, R., Goguen, J., Zavesov, V., Rajasekar, A., & Miller, M. (2007). Towards user centric schema mapping platform. In International Workshop on Semantic Data and Service Integration, Vienna, Austria.
Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. Neural Networks, IEEE Transactions, 16.


More information

Multi-dimensional database design and implementation of dam safety monitoring system

Multi-dimensional database design and implementation of dam safety monitoring system Water Science and Engineering, Sep. 2008, Vol. 1, No. 3, 112-120 ISSN 1674-2370, http://kkb.hhu.edu.cn, e-mail: wse@hhu.edu.cn Multi-dimensional database design and implementation of dam safety monitoring

More information

Advances in Data Management - Web Data Integration A.Poulovassilis

Advances in Data Management - Web Data Integration A.Poulovassilis Advances in Data Management - Web Data Integration A.Poulovassilis 1 1 Integrating Deep Web Data Traditionally, the web has made available vast amounts of information in unstructured form (i.e. text).

More information