Combining Multiple Query Interface Matchers Using Dempster-Shafer Theory of Evidence

Size: px
Start display at page:

Download "Combining Multiple Query Interface Matchers Using Dempster-Shafer Theory of Evidence"

Transcription

1 Combining Multiple Query Interface Matchers Using Dempster-Shafer Theory of Evidence Jun Hong, Zhongtian He and David A. Bell School of Electronics, Electrical Engineering and Computer Science Queen s University Belfast Belfast BT7 1NN, UK {j.hong,zhe01,da.bell}@qub.ac.uk Abstract Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching that typically exploits different features of schemas. Relying on a particular feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using Dempster-Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence on the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require use of weighing parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Our experimental results show that our approach is highly accurate and effective. 1 Introduction Web databases are now pervasive, which can be accessed via their query interfaces (usually HTML query forms) only. Query forms provide a natural way for the user to make queries to the underlying databases without using a particular query language. On receiving form-based queries, these databases return query results encoded in HTML, which are then displayed to the user. Many E-commerce sites are supported by Web databases. In a specific domain (e.g. flight booking, book sales), there are many database-driven Web sites that sell similar products or services. It is a daunting task for the user to visit numerous Web sites individually to search for and compare services or products. Web data integration aims to provide integrated access to a multitude of Web databases, where users need to fill in only a uniform query form, and on receiving a user query the system will automatically make connections to different sites, fill in the local query forms on these sites, submit these (a) abebooks.com (b) Compman.co.uk Figure 1: Two Web query interfaces in the book domain forms, combine the query results, and return the combined results to the user. Matching query interfaces is a crucial step in Web data integration, which finds attribute correspondences between two query interfaces. Figure 1 shows two query interfaces on the Web, where we can find attribute correspondences such as those indicated by different shapes or underlined. The problem is closely related to schema matching that takes two schemas as input and produces a set of attribute correspondences between them [Rahm and Bernstein, 2001]. Schema matching has been extensively studied (e.g. [Rahm and Bernstein, 2001; Do and Rahm, 2002; Doan et al., 2001; He et al., 2004; He and Chang, 2003; Madhavan et al., 2001; Wang et al., 2004; Wu et al., 2004]). These approaches exploit different features of schemas, including structural and

2 linguistic features and data types, etc to match attributes between schemas. Schema matching is inherently uncertain due to lack of complete knowledge about schemas. Relying on a single feature of schemas is not sufficient and the match results of individual matchers are often inaccurate and uncertain. Approaches have been proposed to combine multiple matchers taking into account different features of schemas. A common approach is to apply different weight coefficients to the match results of individual matchers reflecting their different levels of importance, and their weighted values are then added together as the combined match results. However, these weight coefficients are often manually set for a particular domain in a trial and error manner. There are several differences between query interfaces and traditional database schemas. First, database schemas are designed internally for database developers. As a consequence, the attributes of the schemas may be named in a highly inconsistent manner, imposing many difficulties in schema matching. In contrast, query interfaces are designed for normal users and are likely more meaningful and consistent. For example, labels on query forms are usually words or phrases whereas attribute names of database schemas are often abbreviations. Second, database schema matching has mainly focused on schema-level matching while instance-level matching has not been done extensively. This is often due to the unavailability of data instances and the assumption that the same domains of values have been used across different schemas. Whereas in query interfaces, the user is likely to be given ranges of values to choose from and as these values are designed for human use they are also likely to be more meaningful and consistent. As data instances are pervasive, semantic heterogeneity of data instances between query interfaces has to be addressed. The availability of rich linguistic and semantic features in both attributes and data instances of query interfaces motivates us to explore multiple sources of evidence from these linguistic and semantic features. We propose an evidential approach to combining the match results of multiple matchers using Dempster-Shafer theory of evidence. First, this approach views the match results of an individual matcher as a source of evidence that provides some degree of belief on the validity of each candidate attribute correspondence. Second, it combines degrees of belief from multiple sources of evidence to get a combined mass function that represents the overall degree of belief on the validity of each candidate attribute correspondence, taking into account the match results of different matchers. Our combination mechanism does not require use of weighing parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Our experimental results show that our approach is highly accurate and effective. 2 Dempster-Shafer (DS) Theory of Evidence Dempster-Shafer theory of evidence, sometimes called evidential reasoning [Lowrance and Garvey, 1981] or belief function theory, is a mechanism formalized by Shafer [Shafer, 1976] for representing and reasoning with uncertain, imprecise and incomplete information. The theory represents a set of propositional hypotheses by a frame of discernment. Definition 1 Frames of Discernment A frame of discernment (or simply a frame), usually denoted as Θ, contains mutually exclusive and exhaustive propositional hypotheses, one and only one of which is true. Example 1: A patient has been observed having two symptoms: coughing and sniveling and only three types of illness could have caused these symptoms: flu (F), cold (C) and pneumonia (P). We can use frame Θ = {F, C, P } to represent these types of illness. There are three important functions in DS theory: the basic probability assignment (bpa or m), the belief function (bel), and the plausibility function (pl). Definition 2 Basic Probability Assignment (Mass Function) A function, m: 2 Θ [0, 1], is called a basic probability assignment on a frame Θ if it satisfies the following two conditions: m(φ) = 0 (1) m(a) = 1 (2) A Θ where φ is an empty set and A is any subset of Θ. The basic probability assignment (mass function thereafter) is a primitive function. Given a frame, Θ, for each source of evidence, a mass function assigns a mass to every subset of Θ, which represents the degree of belief that one of the hypotheses in the subset is true, given the source of evidence. In Example 1, when the patient has been observed having the symptom coughing, the degree of belief that the patient has flu or cold is 0.6 and the degree of belief that the patient has pneumonia is 0.4. We then have a mass function: m 1 ({C, F }) = 0.6 and m 1 ({P }) = 0.4. Similarly, with the symptom of sniveling, we have another mass function: m 2 ({F }) = 0.7, m 2 ({C}) = 0.2 and m 2 ({P }) = 0.1. From the mass function for a source of evidence, the belief function (bel), and the plausibility function (pl) can be defined, which represent the upper and lower bounds of an interval for every subset A of Θ, which contains the precise probability p(a) that one of the hypotheses in A is true, given the source of evidence, i.e. bel(a) p(a) pl(a). Definition 3 Belief Function (bel) For every subset A of Θ, bel(a) is defined as the sum of the masses assigned to every subset B of A: bel(a) = B A m(b) (3) Definition 4 Plausibility Function (pl) For every subset A of Θ, pl(a) is defined as the sum of the masses assigned to every subset B of Θ that intersects A: pl(a) = m(b) (4) B A The belief function and the plausibility function are related to each other as follows: pl(a) = 1 bel(ā) (5)

3 It follows from Equations 3, 4 and 5 that from one of the three functions m, bel and pl the other two functions can be derived. Given two mass functions m 1 and m 2, DS theory also provides Dempster s combination rule for combining them, which is defined as follows: A B=C m 1(A)m 2 (B) m(c) = 1 A B=φ m (6) 1(A)m 2 (B) In Example 1, we combine two mass functions, m 1 and m 2, to get a combined mass function: m(c) = 0.207, m(f ) = and m(p ) = From this mass function, the corresponding belief and plausibility functions can also be derived. When all masses have been assigned to subsets with a single element only, bel, pl and m are all the same. Since bel(a) p(a) pl(a) for every subset A of Θ, we have a probability distribution, p(c) = 0.207, p(f ) = and p(p ) = Therefore, given the two symptoms the patient has, we can conclude that it is more likely that he is having flu. 3 Combining Multiple Matchers Using DS Theory Based on DS theory, we propose an evidential approach to combining multiple matchers that exploit different features of schemas. Given a source schema and a target schema, for every source attribute, each target attribute is one of its candidate correspondences. An individual matcher provides a different measure on the validity of each candidate correspondence of the source attribute. Applying this measure to all the candidate correspondences provides a source of evidence on the validity of each candidate correspondence. Based on this source of evidence, we can generate a mass function that assigns a mass to every subset of the given frame, reflecting the degree of belief that the valid correspondence of the source attribute belongs to the subset. A set of different matchers provide multiple measures and applying these measures to all the candidate correspondences provides multiple sources of evidence on the validity of each candidate correspondence, based on which we can generate multiple mass functions. These mass functions can then be combined using Dempster s combination rule to calculate the overall degree of belief on the validity of each candidate attribute correspondence, reflecting the match results of different matchers. The combined mass function can then be further used for deciding on the top k attribute correspondences of each source attribute. 3.1 Individual Matchers We use four individual matchers. The first three matchers are based on different linguistic features of attribute names and the last matcher uses the data types of attributes. Semantic Based Similarity We measure semantic similarity between two words based on their relationships on a general linguistic ontology, Word- Net 1. WordNet [Miller, 1995] is a lexical database devel- 1 oped by Princeton University, which provides taxonomic hierarchies of natural language concepts. We define semantic similarity between two different words, w 1 and w 2, as Sim se (w 1, w 2 ) = 1 1+L, where L is the length of the shortest path in WordNet between w 1 and w 2, if both w 1 and w 2 can be found in WordNet. Otherwise, Sim se (w 1, w 2 ) = 0. For two same words, w 1 and w 2, we define semantic similarity between them as Sim se (w 1, w 2 ) = 1. Edit Distance Based Similarity Edit distance (Levenshtein distance) between two strings is measured by the number of edit operations, including insertions, deletions and substitutions, necessary to transform one string into another [Hall and Dowling, 1980]. We define the edit distance based similarity between two words, w 1 and w 2, as follows: Sim ed (w 1, w 2 ) = ed(s 1, s 2 ) where s 1 and s 2 are two strings in w 1 and w 2 respectively and ed(s 1, s 2 ) is the edit distance between s 1 and s 2. Jaro Distance Based Similarity The Jaro distance based string similarity between two strings, Jaro(s 1, s 2 ), is defined based on the number and order of the common characters between them [Cohen et al., 2003]. The Jaro distance based similarity Sim ja (w 1, w 2 ) between two words, w 1 and w 2 is defined as the Jaro distance based string similarity Jaro(s 1, s 2 ) between two strings s 1 and s 2 in w 1 and w 2. Similarity between Attribute Names A particular type of similarity between two attribute names is aggregated from the same type of similarities between individual words in the attribute names. Assume that two attribute names, A 1 and A 2, contain two sets of words, A 1 = {w 1, w 2,..., w m } and A 2 = {w 1, w 2,..., w n}. For each word, w i for i = 1, 2,...m, in A 1, we calculate its similarity with every word in A 2 and find the maximum similarity value v i. We then get a similarity value set for A 1 : Sim 1 = {v 1, v 2,..., v m }. Similarly, we get a similarity value set for A 2 : Sim 2 = {v 1, v 2,..., v n}. We calculate similarity between two attribute names A 1 and A 2 as follows: m i=1 Sim(A 1, A 2 ) = v i + n i=1 v i (8) m + n where m is the number of words in A 1, n is the number of words in A 2. Data Types of Attributes We define that two data types are compatible if one subsumes another (i.e. is-a relationship) in a given hierarchy of data types. An attribute with only an input box is given an any type since the input box can accept data of different types such as string or integer. The other data type of an attribute is recognized on the basis of the set of labels and form elements that form the attribute. (7)

4 3.2 Representing Possible Matches of Source Attributes in DS Assume that we have a source schema, S = {a 1, a 2,..., a m }, where a i, for i = 1, 2,..., m, is a source attribute, and a target schema, T = {b 1, b 2,..., b n }, where b j, for j = 1, 2,..., n, is a target attribute. For each source attribute, a i, we have a set of candidate correspondences in the target schema { a i, b 1, a i, b 2,..., a i, b n }. It is also possible that a i may have no correspondence in the target schema at all. We represent all possible matches of a i in T using a frame of discernment, Θ = { a i, b 1, a i, b 2,..., a i, b n, a i, null }, where a i, b j, for j = 1, 2,..., n, represent all possible candidate correspondences and a i, null represents that there is no correspondence of a i in the target schema. Example 2: As shown in Figure 1, for Author on abebooks.com, a frame of discernment for it is defined as follows: Θ = { Author, Category, Author, Keywords, Author, T itle, Author, Subject, Author, P ublisher, Author, Author, Author, OurRef erence, Author, ISBN, Author, Availability, Author, ReleasedDate, Author, U pperp ricelimit, Author, Binding, Author, null } Generating Subsets of Indistinguishable Attribute Correspondences Given frame Θ = { a i, b 1, a i, b 2,..., a i, b n, a i, null } for a source attribute a i, an individual matcher may not be able to distinguish two attribute correspondences in Θ from each other in terms of level of confidence on the validity of the attribute correspondences. For example, if the data type of the source attribute is compatible with the data types of two target attributes, the data type matcher cannot distinguish the two attribute correspondences from each other. The data type matcher cannot distinguish an attribute correspondence, in which the data type of the source attribute a i is compatible with the one of the target attribute, from a i, null either. A string similarity based matcher cannot distinguish two strings abc and bcd from each other when string bc is matched to each of them. We therefore cluster attribute correspondences in Θ into subsets of indistinguishable attribute correspondences. We do not have any further knowledge on the subset of indistinguishable attribute correspondences to be able to tell which attribute correspondence is more valid than the others. What we have is some degree of belief that one of the attribute correspondences in the subset is valid, according to a particular matcher. In Example 2, based on edit distance based similarity, the frame of discernment for Author on abebooks.com is divided into the following indistinguishable subsets: { Author, Category, Author, T itle, Author, OurRef erence }, { Author, Keywords, Author, Subject, Author, P ublisher, Author, ISBN }, { Author, Author }, { Author, Availability }, { Author, ReleasedDate }, { Author, U pperp ricelimit }, { Author, Binding }, { Author, null } Generating Mass Distributions on Subsets of Indistinguishable Attribute Correspondences Given a similarity based matcher, for each indistinguishable subset of attribute correspondences, we have a similarity value for each correspondence in the set, which represents how well the two attributes in the correspondence match according to the measure used by the matcher. Suppose we have subset A = { a i, b i1, a i, b i2,..., a i, b il }, a mass assigned to A is calculated based on the similarity values for all the attribute correspondences in A as follows: m (A) = 1 Π l j=1(1 Sim(a i, b ij )) (9) where Sim(a i, b ij ) is similarity value for the correspondence a i, b ij in A. Since we do not have a similarity value for a i, null by the matcher, we form a special singleton subset, { a i, null }. The mass assigned to the subset is calculated as follows: m ({ a i, null }) = Π n j=1(1 Sim(a i, b j )) (10) The mass assigned to { a i, null }, therefore, represents the degree of belief that none of the target attributes is the attribute correspondence of source attribute, a i. We scale the mass distribution, m, so that the sum of all masses assigned to every indistinguishable subset equals to 1: m (A) m(a) = B Θ m (B) (11) where A and B are indistinguishable subsets of Θ. Therefore these indistinguishable subsets of Θ are the focal elements of Θ. The mass assigned to any other subset of Θ is 0. In Example 2, based on edit distance based similarity, for Author on abebooks.com, we have a mass function defined as follows: m ed ({ Author, Category, Author, T itle, Author, OurRef erence }) = 0.175, m ed ({ Author, Keywords, Author, Subject, Author, P ublisher, Author, ISBN }) = 0.191, m ed ({ Author, Author }) = 0.414, m ed ({ Author, Availability }) = 0.035, m ed ({ Author, ReleasedDate }) = 0.063, m ed ({ Author, UpperP ricelimit }) = 0.071, m ed ({ Author, Binding }) = 0.052, m ed ({ Author, null }) = For the data type based matcher, we create two subsets of indistinguishable attribute correspondences, A and A. A contains both attribute correspondences in which the data type of the source attribute is compatible with the one of the target attribute and a i, null where a i is the source attribute. A contains attribute correspondences in which the data type of the source attribute is not compatible with the one of the target attribute. We have a mass function defined as: m da (A) = 1 and m da (A ) = 0.

5 3.3 Combining Mass Functions from Multiple Matchers We now have a mass function by each of the individual matchers, which assigns a mass to every indistinguishable subset of Θ. Using Dempster s combination rule, we can take into account different sources of evidence witnessed by different matchers by combining the appropriate mass functions by these matchers. In Example 2, for Author on abebooks.com, we have a combined mass function defined as follows: m({ Author, OurRef erence }) = 0.020, m({ Author, U pperp ricelimit }) = 0.003, m({ Author, P ublisher }) = 0.048, m({ Author, Author }) = 0.929, 3.4 Selecting Top k Attribute Correspondences This section shows how to select the top k attribute correspondences of each source attribute. The mass function produced in the previous section can be used to derive both the bel and pl functions. However, the mass function (or the corresponding belief and/or plausibility functions) cannot be directly used for making decisions. The final mass function can be transformed into a pignistic probability function using the transformation function proposed in [Smets, 2004]. Decisions can then be made based on this probability distribution. When all masses have been assigned to subsets with a single element only, bel, pl and m are all the same. In this case, we choose those attribute correspondences with top k masses as the top k attribute correspondences. In Example 2, for Author on abebooks.com, we have the top 3 attribute correspondences as follows: m({ Author, Author }) = 0.929, m({ Author, P ublisher }) = 0.048, m({ Author, OurRef erence }) = Experimental Results We use a set of 88 query interfaces selected from the ICQ Query Interfaces data set at UIUC, which contains manually extracted schemas of interfaces in 5 different domains: airfares, automobiles, books, jobs, and real estates, which involve 1:1 matching only (as we have focused on 1:1 matching in this paper). In each domain we create a uniform query interface as the source interface, and take each of the query interfaces in the domain as a target interface. We use three performance metrics: precision (P), recall (R), and F-measure (F). Precision is the percentage of correct matches over all the matches by a matcher. Recall is the percentage of correct matches by a matcher over all the matches by domain experts. F-measure is the incorporation of precision and recall as follows: F = 2P R P +R. First, in each domain we use three individual matchers: edit distance, Jaro distance and semantic similarity (the data type matcher cannot be used alone), and compare them with our new approach that combines four individual matchers: edit distance, Jaro distance, semantic similarity and data type. Given a source attribute a i, for each individual matcher, we Table 1: Precisions of individual matchers and our combined matcher Edit distance Jaro distance Semantic similarity Ours Airfares 83.3% 56.8% 86.4% 92.0% Autos 84.4% 48.1% 93.1% 96.3% Books 87.0% 48.8% 92.0% 94.4% Jobs 68.5% 50.0% 71.0% 91.9% Estates 86.8% 52.9% 81.6% 93.8% Average 82.1% 51.3% 84.8% 93.7% treat a i, null, i.e. no match, as a possible answer. We also have every candidate attribute correspondence, a i, b j, as a possible answer. We therefore have a set of possible answers, θ = { a i, b 1, a i, b 2,..., a i, b n, a i, null }. One and only one of these possible answers is true. Therefore, in our experiments, the number of matches by a matcher equals to the number of matches identified by experts, and precision, recall and F-measure turn out to be the same. We use precision only. For every a i, b j, for j = 1, 2,..., n, we have a similarity value by the matcher. Since we do not have a similarity value for a i, null, a value is calculated as follows: s( a i, null ) = Π n j=1(1 Sim(a i, b j )) In our experiments, we do not need a threshold to decide whether there is a match. If similarity values for every candidate attribute correspondence are all very low, a value calculated for a i, null, i.e. no match, will be high enough so that a i, null will be selected as the top answer. As shown in Table 1, our matcher gets much higher precision than the individual matchers. Second, we compare our results with the work in [Wu et al., 2004], which uses a similar data set for their experiments, covering the same five domains as ours. However, they also handle 1:m matching. In their experiments, a 1:m match is counted as m 1:1 matches. They did three experiments. The first is on automatic field matching which uses weight coefficients to combine multiple matchers and the clustering thresholds for two fields to be matched are set to zero for all domains. The second uses thresholds learned by user interactions. The last also uses user interactions for resolving uncertainties in match results. As shown in Figure 2, without using learned thresholds, the results of our approach are better. When the learned thresholds are used, their precision is better than ours, but we have higher recall and F-measure. Finally, when user interactions are also used to resolve uncertainties in match results, their results are better than ours. Our approach is effective and accurate for automatic schema matching across query interfaces without automated learning and user interaction. 5 Related work Cupid [Madhavan et al., 2001] exploits linguistic and structural similarity between elements and uses a weighted formula to combine these two similarities together. However, weights have to be manually generated and are domain dependent. COMA [Do and Rahm, 2002] allows users to tailor match strategies by selecting a combination of match algorithms for

6 Figure 2: Precisions, recalls and F-measures of different matchers a given problem, including Max, Min, Average and Weighted strategies. It also allows users to provide feedback for improving match results. These strategies are effective in some situations while sometimes they cannot combine results effectively, and choosing strategies by users involves human efforts. In [Wu et al., 2004], weight coefficients are also used to combine multiple matchers, which are set to some domainindependent empirical values. However, clustering is used to find attribute correspondences across multiple interfaces, in which thresholds are required for merging clusters. These thresholds need to be either manually set or learned from user interactions and are domain dependent. So this approach also involves human effort. Some approaches [He and Chang, 2003; He et al., 2004] use attribute distribution rather than linguistic or domain information. Superior to other schema matching approaches, these approaches can discover synonyms by analyzing attribute distributions in the given schemas. However, they work well only when a large training data set is available, but this is not always the case. 6 Conclusions We proposed a new approach to combining multiple matchers using DS theory and presented an algorithm for resolving conflicts among the correspondences of different source attributes. Applying different matchers to a set of candidate correspondences provides different sources of evidence, and mass distributions are defined on the basis of the match results from these matchers. We use Dempster s combination rule to combine these mass distributions, and choose the top k correspondences of each source attribute. We implemented a prototype and tested it using a large data set that contains real-world query interfaces in five different domains. The experimental results demonstrate the feasibility and accuracy of our approach. References [Cohen et al., 2003] W.W. Cohen, P. Ravikumar, and S.E. Fienberg. A comparison of string distance metrics for name-matching tasks. pages 73 78, [Do and Rahm, 2002] H.H. Do and E. Rahm. Coma - a system for flexible combination of schema matching approaches. In VLDB 02, pages , [Doan et al., 2001] A. Doan, P. Domingos, and A.Y. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In SIGMOD 01, pages , [Hall and Dowling, 1980] P. Hall and G. Dowling. Approximate string matching. Computing Surveys, pages , [He and Chang, 2003] Bin He and Kevin Chen-Chuan Chang. Statistical schema matching across web query interfaces. In SIGMOD Conference, pages , San Diego, CA, [He et al., 2004] B. He, K.C.C. Chang, and J. Han. Discovering complex matchings across web query interfaces: A correlation mining approach. In KDD 04, pages , [Lowrance and Garvey, 1981] J.D. Lowrance and T.D. Garvey. Evidential reasoning: An developing concept. In ICCS 81, pages 6 9, [Madhavan et al., 2001] J. Madhavan, P.A. Bernstein, and E. Rahm. Generic schema matching with cupid. In VLDB 01, pages 49 58, [Miller, 1995] George A. Miller. Wordnet: A lexical databases for english. Communications of ACM, 38(11):39 41, [Rahm and Bernstein, 2001] E. Rahm and P.A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4): , [Shafer, 1976] G. Shafer. A Mathematical Theory of Evidence [Smets, 2004] Philip Smets. Decision making in the tbm: the necessity of the pignistic transformation. International Journal of Approximate Reasoning, 38: , [Wang et al., 2004] Jiying Wang, Ji-Rong Wen, Fred Lochovsky, and Wei-Ying Ma. Instance-based schema matching for web databases by domain-specific query probing. In Proceedings of the 30th VLDB Conference, pages , Toronto, Canada, [Wu et al., 2004] Wensheng Wu, Clement Yu, AnHai Doan, and Weiyi Meng. An interactive clustering based approach to integrating source query interfaces on the deep web. In SIGMOD Conference, pages , Paris, France, Acknowledgments The authors wish to thank Phil Bernstein for his helpful suggestions and comments on the research presented in the paper.

MINING ASSOCIATION RULES WITH UNCERTAIN ITEM RELATIONSHIPS

MINING ASSOCIATION RULES WITH UNCERTAIN ITEM RELATIONSHIPS MINING ASSOCIATION RULES WITH UNCERTAIN ITEM RELATIONSHIPS Mei-Ling Shyu 1, Choochart Haruechaiyasak 1, Shu-Ching Chen, and Kamal Premaratne 1 1 Department of Electrical and Computer Engineering University

More information

Semi-automatic Generation of Active Ontologies from Web Forms

Semi-automatic Generation of Active Ontologies from Web Forms Semi-automatic Generation of Active Ontologies from Web Forms Martin Blersch, Mathias Landhäußer, and Thomas Mayer (IPD) 1 KIT The Research University in the Helmholtz Association www.kit.edu "Create an

More information

Decision Fusion using Dempster-Schaffer Theory

Decision Fusion using Dempster-Schaffer Theory Decision Fusion using Dempster-Schaffer Theory Prof. D. J. Parish High Speed networks Group Department of Electronic and Electrical Engineering D.J.Parish@lboro.ac.uk Loughborough University Overview Introduction

More information

OWL-CM : OWL Combining Matcher based on Belief Functions Theory

OWL-CM : OWL Combining Matcher based on Belief Functions Theory OWL-CM : OWL Combining Matcher based on Belief Functions Theory Boutheina Ben Yaghlane 1 and Najoua Laamari 2 1 LARODEC, Université de Tunis, IHEC Carthage Présidence 2016 Tunisia boutheina.yaghlane@ihec.rnu.tn

More information

Deep Web Content Mining

Deep Web Content Mining Deep Web Content Mining Shohreh Ajoudanian, and Mohammad Davarpanah Jazi Abstract The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased

More information

DSSim-ontology mapping with uncertainty

DSSim-ontology mapping with uncertainty DSSim-ontology mapping with uncertainty Miklos Nagy, Maria Vargas-Vera, Enrico Motta Knowledge Media Institute (Kmi) The Open University Walton Hall, Milton Keynes, MK7 6AA, United Kingdom mn2336@student.open.ac.uk;{m.vargas-vera,e.motta}@open.ac.uk

More information

Evidential Editing K-Nearest Neighbor Classifier

Evidential Editing K-Nearest Neighbor Classifier Evidential Editing -Nearest Neighbor Classifier Lianmeng Jiao 1,2, Thierry Denœux 1, and Quan Pan 2 1 Sorbonne Universités, Université de Technologie de Compiègne, CNRS, Heudiasyc UMR 7253, Compiègne,

More information

Schema Integration Based on Uncertain Semantic Mappings

Schema Integration Based on Uncertain Semantic Mappings chema Integration Based on Uncertain emantic Mappings Matteo Magnani 1, Nikos Rizopoulos 2, Peter M c.brien 2, and Danilo Montesi 3 1 Department of Computer cience, University of Bologna, Via Mura A.Zamboni

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching

Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching Journal of Computer Science Original Research Paper Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching Basel Alshaikhdeeb and Kamsuriah Ahmad Faculty

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

Dempster s Rule for Evidence Ordered in a Complete Directed Acyclic Graph

Dempster s Rule for Evidence Ordered in a Complete Directed Acyclic Graph Dempster s Rule for Evidence Ordered in a Complete Directed Acyclic Graph Ulla Bergsten and Johan Schubert Division of Applied Mathematics and Scientific Data Processing, Department of Weapon Systems,

More information

Fingerprint Indexing using Minutiae and Pore Features

Fingerprint Indexing using Minutiae and Pore Features Fingerprint Indexing using Minutiae and Pore Features R. Singh 1, M. Vatsa 1, and A. Noore 2 1 IIIT Delhi, India, {rsingh, mayank}iiitd.ac.in 2 West Virginia University, Morgantown, USA, afzel.noore@mail.wvu.edu

More information

Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach

Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach BIN HE and KEVIN CHEN-CHUAN CHANG University of Illinois at Urbana-Champaign To enable information integration,

More information

A Generic Algorithm for Heterogeneous Schema Matching

A Generic Algorithm for Heterogeneous Schema Matching You Li, Dongbo Liu, and Weiming Zhang A Generic Algorithm for Heterogeneous Schema Matching You Li1, Dongbo Liu,3, and Weiming Zhang1 1 Department of Management Science, National University of Defense

More information

E-Agricultural Services and Business

E-Agricultural Services and Business E-Agricultural Services and Business A Conceptual Framework for Developing a Deep Web Service Nattapon Harnsamut, Naiyana Sahavechaphan nattapon.harnsamut@nectec.or.th, naiyana.sahavechaphan@nectec.or.th

More information

Handling Missing Values via Decomposition of the Conditioned Set

Handling Missing Values via Decomposition of the Conditioned Set Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,

More information

APPLICATION OF DATA FUSION THEORY AND SUPPORT VECTOR MACHINE TO X-RAY CASTINGS INSPECTION

APPLICATION OF DATA FUSION THEORY AND SUPPORT VECTOR MACHINE TO X-RAY CASTINGS INSPECTION APPLICATION OF DATA FUSION THEORY AND SUPPORT VECTOR MACHINE TO X-RAY CASTINGS INSPECTION Ahmad Osman 1*, Valérie Kaftandjian 2, Ulf Hassler 1 1 Fraunhofer Development Center X-ray Technologies, A Cooperative

More information

Leveraging Data and Structure in Ontology Integration

Leveraging Data and Structure in Ontology Integration Leveraging Data and Structure in Ontology Integration O. Udrea L. Getoor R.J. Miller Group 15 Enrico Savioli Andrea Reale Andrea Sorbini DEIS University of Bologna Searching Information in Large Spaces

More information

Creating a Mediated Schema Based on Initial Correspondences

Creating a Mediated Schema Based on Initial Correspondences Creating a Mediated Schema Based on Initial Correspondences Rachel A. Pottinger University of Washington Seattle, WA, 98195 rap@cs.washington.edu Philip A. Bernstein Microsoft Research Redmond, WA 98052-6399

More information

PRIOR System: Results for OAEI 2006

PRIOR System: Results for OAEI 2006 PRIOR System: Results for OAEI 2006 Ming Mao, Yefei Peng University of Pittsburgh, Pittsburgh, PA, USA {mingmao,ypeng}@mail.sis.pitt.edu Abstract. This paper summarizes the results of PRIOR system, which

More information

Learning mappings and queries

Learning mappings and queries Learning mappings and queries Marie Jacob University Of Pennsylvania DEIS 2010 1 Schema mappings Denote relationships between schemas Relates source schema S and target schema T Defined in a query language

More information

Post-processing the hybrid method for addressing uncertainty in risk assessments. Technical Note for the Journal of Environmental Engineering

Post-processing the hybrid method for addressing uncertainty in risk assessments. Technical Note for the Journal of Environmental Engineering Post-processing the hybrid method for addressing uncertainty in risk assessments By: Cédric Baudrit 1, Dominique Guyonnet 2, Didier Dubois 1 1 : Math. Spec., Université Paul Sabatier, 31063 Toulouse, France

More information

Outline A Survey of Approaches to Automatic Schema Matching. Outline. What is Schema Matching? An Example. Another Example

Outline A Survey of Approaches to Automatic Schema Matching. Outline. What is Schema Matching? An Example. Another Example A Survey of Approaches to Automatic Schema Matching Mihai Virtosu CS7965 Advanced Database Systems Spring 2006 April 10th, 2006 2 What is Schema Matching? A basic problem found in many database application

More information

A Framework for Domain-Specific Interface Mapper (DSIM)

A Framework for Domain-Specific Interface Mapper (DSIM) 56 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 A Framework for Domain-Specific Interface Mapper (DSIM) Komal Kumar Bhatia1 and A. K. Sharma2, YMCA

More information

Annotating Multiple Web Databases Using Svm

Annotating Multiple Web Databases Using Svm Annotating Multiple Web Databases Using Svm M.Yazhmozhi 1, M. Lavanya 2, Dr. N. Rajkumar 3 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College, Coimbatore, India 1, 3 Head

More information

Matching and Alignment: What is the Cost of User Post-match Effort?

Matching and Alignment: What is the Cost of User Post-match Effort? Matching and Alignment: What is the Cost of User Post-match Effort? (Short paper) Fabien Duchateau 1 and Zohra Bellahsene 2 and Remi Coletta 2 1 Norwegian University of Science and Technology NO-7491 Trondheim,

More information

InsMT / InsMTL Results for OAEI 2014 Instance Matching

InsMT / InsMTL Results for OAEI 2014 Instance Matching InsMT / InsMTL Results for OAEI 2014 Instance Matching Abderrahmane Khiat 1, Moussa Benaissa 1 1 LITIO Lab, University of Oran, BP 1524 El-Mnaouar Oran, Algeria abderrahmane_khiat@yahoo.com moussabenaissa@yahoo.fr

More information

XML Schema Matching Using Structural Information

XML Schema Matching Using Structural Information XML Schema Matching Using Structural Information A.Rajesh Research Scholar Dr.MGR University, Maduravoyil, Chennai S.K.Srivatsa Sr.Professor St.Joseph s Engineering College, Chennai ABSTRACT Schema matching

More information

Combining Neural Networks Based on Dempster-Shafer Theory for Classifying Data with Imperfect Labels

Combining Neural Networks Based on Dempster-Shafer Theory for Classifying Data with Imperfect Labels Combining Neural Networks Based on Dempster-Shafer Theory for Classifying Data with Imperfect Labels Mahdi Tabassian 1,2, Reza Ghaderi 1, and Reza Ebrahimpour 2,3 1 Faculty of Electrical and Computer Engineering,

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

Top-k Keyword Search Over Graphs Based On Backward Search

Top-k Keyword Search Over Graphs Based On Backward Search Top-k Keyword Search Over Graphs Based On Backward Search Jia-Hui Zeng, Jiu-Ming Huang, Shu-Qiang Yang 1College of Computer National University of Defense Technology, Changsha, China 2College of Computer

More information

Open Access Algorithm of Context Inconsistency Elimination Based on Feedback Windowing and Evidence Theory for Smart Home

Open Access Algorithm of Context Inconsistency Elimination Based on Feedback Windowing and Evidence Theory for Smart Home Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 637-648 637 Open Access Algorithm of Context Inconsistency Elimination Based on Feedback

More information

Information Discovery, Extraction and Integration for the Hidden Web

Information Discovery, Extraction and Integration for the Hidden Web Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Value Added Association Rules

Value Added Association Rules Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency

More information

A new parameterless credal method to track-to-track assignment problem

A new parameterless credal method to track-to-track assignment problem A new parameterless credal method to track-to-track assignment problem Samir Hachour, François Delmotte, and David Mercier Univ. Lille Nord de France, UArtois, EA 3926 LGI2A, Béthune, France Abstract.

More information

Quality of Matching in large scale scenarios

Quality of Matching in large scale scenarios Quality of Matching in large scale scenarios Sana Sellami : Université de Lyon, CNRS INSA-Lyon, LIRIS, UMR5205, F-69621, France. sana.sellami@insa-lyon.fr Nabila Benharkat : Université de Lyon, CNRS INSA-Lyon,

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

Enabling Product Comparisons on Unstructured Information Using Ontology Matching

Enabling Product Comparisons on Unstructured Information Using Ontology Matching Enabling Product Comparisons on Unstructured Information Using Ontology Matching Maximilian Walther, Niels Jäckel, Daniel Schuster, and Alexander Schill Technische Universität Dresden, Faculty of Computer

More information

38050 Povo Trento (Italy), Via Sommarive 14 Fausto Giunchiglia, Pavel Shvaiko and Mikalai Yatskevich

38050 Povo Trento (Italy), Via Sommarive 14   Fausto Giunchiglia, Pavel Shvaiko and Mikalai Yatskevich UNIVERSITY OF TRENTO DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY 38050 Povo Trento (Italy), Via Sommarive 14 http://www.dit.unitn.it SEMANTIC MATCHING Fausto Giunchiglia, Pavel Shvaiko and Mikalai

More information

µbe: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration

µbe: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration µbe: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration Ashraf Aboulnaga Kareem El Gebaly University of Waterloo {ashraf, kelgebal}@cs.uwaterloo.ca Abstract The typical

More information

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu Presented by: Dimitri Galmanovich Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu 1 When looking for Unstructured data 2 Millions of such queries every day

More information

IJMIE Volume 2, Issue 3 ISSN:

IJMIE Volume 2, Issue 3 ISSN: Deep web Data Integration Approach Based on Schema and Attributes Extraction of Query Interfaces Mr. Gopalkrushna Patel* Anand Singh Rajawat** Mr. Satyendra Vyas*** Abstract: The deep web is becoming a

More information

DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE

DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE Sinu T S 1, Mr.Joseph George 1,2 Computer Science and Engineering, Adi Shankara Institute of Engineering

More information

EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES

EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES B. GEETHA KUMARI M. Tech (CSE) Email-id: Geetha.bapr07@gmail.com JAGETI PADMAVTHI M. Tech (CSE) Email-id: jageti.padmavathi4@gmail.com ABSTRACT:

More information

Deduplication of Hospital Data using Genetic Programming

Deduplication of Hospital Data using Genetic Programming Deduplication of Hospital Data using Genetic Programming P. Gujar Department of computer engineering Thakur college of engineering and Technology, Kandiwali, Maharashtra, India Priyanka Desai Department

More information

An Uncertain Data Integration System

An Uncertain Data Integration System Author manuscript, published in "Int. Conf. On Ontologies, DataBases, and Applications of Semantics (ODBASE), France" An Uncertain Data Integration System Naser Ayat #1, Hamideh Afsarmanesh #2, Reza Akbarinia

More information

Inference in Hierarchical Multidimensional Space

Inference in Hierarchical Multidimensional Space Proc. International Conference on Data Technologies and Applications (DATA 2012), Rome, Italy, 25-27 July 2012, 70-76 Related papers: http://conceptoriented.org/ Inference in Hierarchical Multidimensional

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach

Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Bin He, Kevin Chen-Chuan Chang, Jiawei Han Computer Science Department University of Illinois at Urbana-Champaign

More information

ISSN (Online) ISSN (Print)

ISSN (Online) ISSN (Print) Accurate Alignment of Search Result Records from Web Data Base 1Soumya Snigdha Mohapatra, 2 M.Kalyan Ram 1,2 Dept. of CSE, Aditya Engineering College, Surampalem, East Godavari, AP, India Abstract: Most

More information

SATELLITE IMAGE COMPRESSION TECHNIQUE BASED ON THE EVIDENCE THEORY

SATELLITE IMAGE COMPRESSION TECHNIQUE BASED ON THE EVIDENCE THEORY SATELLITE IMAGE COMPRESSION TECHNIQUE BASED ON THE EVIDENCE THEORY Khaled Sahnoun, Noureddine Benabadji Department of Physics, University of Sciences and Technology of Oran- Algeria Laboratory of Analysis

More information

Review of Fuzzy Logical Database Models

Review of Fuzzy Logical Database Models IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727Volume 8, Issue 4 (Jan. - Feb. 2013), PP 24-30 Review of Fuzzy Logical Database Models Anupriya 1, Prof. Rahul Rishi 2 1 (Department

More information

By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz

By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos Presented by Yael Kazaz Example: Merging Real-Estate Agencies Two real-estate agencies: S and T, decide to merge Schema T has

More information

Reconciling Agent Ontologies for Web Service Applications

Reconciling Agent Ontologies for Web Service Applications Reconciling Agent Ontologies for Web Service Applications Jingshan Huang, Rosa Laura Zavala Gutiérrez, Benito Mendoza García, and Michael N. Huhns Computer Science and Engineering Department University

More information

A Tagging Approach to Ontology Mapping

A Tagging Approach to Ontology Mapping A Tagging Approach to Ontology Mapping Colm Conroy 1, Declan O'Sullivan 1, Dave Lewis 1 1 Knowledge and Data Engineering Group, Trinity College Dublin {coconroy,declan.osullivan,dave.lewis}@cs.tcd.ie Abstract.

More information

Annotating Search Results from Web Databases Using Clustering-Based Shifting

Annotating Search Results from Web Databases Using Clustering-Based Shifting Annotating Search Results from Web Databases Using Clustering-Based Shifting Saranya.J 1, SelvaKumar.M 2, Vigneshwaran.S 3, Danessh.M.S 4 1, 2, 3 Final year students, B.E-CSE, K.S.Rangasamy College of

More information

Ontology Matching Using an Artificial Neural Network to Learn Weights

Ontology Matching Using an Artificial Neural Network to Learn Weights Ontology Matching Using an Artificial Neural Network to Learn Weights Jingshan Huang 1, Jiangbo Dang 2, José M. Vidal 1, and Michael N. Huhns 1 {huang27@sc.edu, jiangbo.dang@siemens.com, vidal@sc.edu,

More information

Ontology-based Integration and Refinement of Evaluation-Committee Data from Heterogeneous Data Sources

Ontology-based Integration and Refinement of Evaluation-Committee Data from Heterogeneous Data Sources Indian Journal of Science and Technology, Vol 8(23), DOI: 10.17485/ijst/2015/v8i23/79342 September 2015 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Ontology-based Integration and Refinement of Evaluation-Committee

More information

Classification with Diffuse or Incomplete Information

Classification with Diffuse or Incomplete Information Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication

More information

An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches

An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches Suk-Chung Yoon E. K. Park Dept. of Computer Science Dept. of Software Architecture Widener University

More information

Information Fusion Based Fault Location Technology for Distribution Network

Information Fusion Based Fault Location Technology for Distribution Network 826 JOURNAL OF SOFTWARE, VOL. 6, NO. 5, MA 211 Information Fusion Based Fault Location Technology for Distribution Network Qingle Pang 1, 2 1. School of Information and Electronic Engineering, Shandong

More information

Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries

Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Reza Taghizadeh Hemayati 1, Weiyi Meng 1, Clement Yu 2 1 Department of Computer Science, Binghamton university,

More information

Lily: Ontology Alignment Results for OAEI 2009

Lily: Ontology Alignment Results for OAEI 2009 Lily: Ontology Alignment Results for OAEI 2009 Peng Wang 1, Baowen Xu 2,3 1 College of Software Engineering, Southeast University, China 2 State Key Laboratory for Novel Software Technology, Nanjing University,

More information

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS SRIVANI SARIKONDA 1 PG Scholar Department of CSE P.SANDEEP REDDY 2 Associate professor Department of CSE DR.M.V.SIVA PRASAD 3 Principal Abstract:

More information

EXPERT SYSTEM DESIGN

EXPERT SYSTEM DESIGN EXPERT SYSTEM DESIGN TERMINOLOGY A. EXPERT SYSTEM - EMULATES A HUMAN EXPERT 1. SPECIALIZED KNOWLEDGE B. KNOWLEDGE-BASED SYSTEM - A SYSTEM THAT USES EXPERT SYSTEM DESIGN TECHNOLOGY BUT DOES NOT NECESSARILY

More information

Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface

Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface Prof. T.P.Aher(ME), Ms.Rupal R.Boob, Ms.Saburi V.Dhole, Ms.Dipika B.Avhad, Ms.Suvarna S.Burkul 1 Assistant Professor, Computer

More information

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Query-Sensitive Similarity Measure for Content-Based Image Retrieval Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn

More information

Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming

Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming 2009 10th International Conference on Document Analysis and Recognition Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming Pierre Le Bodic LRI UMR 8623 Using Université Paris-Sud

More information

A Survey of Context Modelling and Reasoning Techniques

A Survey of Context Modelling and Reasoning Techniques Formal Information A Survey of Context Modelling and Reasoning Techniques Bettini, Brdiczka, Henricksen, Indulska, Nicklas, Ranganathan, Riboni Pervasive and Mobile Computing 2008 (submitted), 2010 (published)

More information

NAME SIMILARITY MEASURES FOR XML SCHEMA MATCHING

NAME SIMILARITY MEASURES FOR XML SCHEMA MATCHING NAME SIMILARITY MEASURES FOR XML SCHEMA MATCHING Ali El Desoukey Mansoura University, Mansoura, Egypt Amany Sarhan, Alsayed Algergawy Tanta University, Tanta, Egypt Seham Moawed Middle Delta Company for

More information

What you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web

What you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web What you have learned so far Interoperability Introduction to the Semantic Web Tutorial at ISWC 2010 Jérôme Euzenat Data can be expressed in RDF Linked through URIs Modelled with OWL ontologies & Retrieved

More information

Generalized proportional conflict redistribution rule applied to Sonar imagery and Radar targets classification

Generalized proportional conflict redistribution rule applied to Sonar imagery and Radar targets classification Arnaud Martin 1, Christophe Osswald 2 1,2 ENSIETA E 3 I 2 Laboratory, EA 3876, 2, rue Francois Verny, 29806 Brest Cedex 9, France Generalized proportional conflict redistribution rule applied to Sonar

More information

Dempster-Shafer Theory as an Inference Method for Corresponding Geometric Distorted Images

Dempster-Shafer Theory as an Inference Method for Corresponding Geometric Distorted Images Dempster-Shafer Theory as an Inference Method for Corresponding Geometric Distorted Images José Demisio Simões da Silva, 2,Paulo Ouvera Simoni 2, 3 Instituto Nacional de Pesquisas Espaciais - INPE, Laboratório

More information

Web Database Integration

Web Database Integration In Proceedings of the Ph.D Workshop in conjunction with VLDB 06 (VLDB-PhD2006), Seoul, Korea, September 11, 2006 Web Database Integration Wei Liu School of Information Renmin University of China Beijing,

More information

ProFoUnd: Program-analysis based Form Understanding

ProFoUnd: Program-analysis based Form Understanding ProFoUnd: Program-analysis based Form Understanding (joint work with M. Benedikt, T. Furche, A. Savvides) PIERRE SENELLART IC2 Group Seminar, 16 May 2012 The Deep Web Definition (Deep Web, Hidden Web,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

MERGING BUSINESS VOCABULARIES AND RULES

MERGING BUSINESS VOCABULARIES AND RULES MERGING BUSINESS VOCABULARIES AND RULES Edvinas Sinkevicius Departament of Information Systems Centre of Information System Design Technologies, Kaunas University of Lina Nemuraite Departament of Information

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Analytic Hierarchy Process using Belief Function Theory: Belief AHP Approach

Analytic Hierarchy Process using Belief Function Theory: Belief AHP Approach Analytic Hierarchy Process using Belief Function Theory: Belief AHP Approach Elaborated by: Amel ENNACEUR Advisors: Zied ELOUEDI (ISG, Tunisia) Eric Lefevre (univ-artois, France) 2010/2011 Acknowledgements

More information

Outline. Data Integration. Entity Matching/Identification. Duplicate Detection. More Resources. Duplicates Detection in Database Integration

Outline. Data Integration. Entity Matching/Identification. Duplicate Detection. More Resources. Duplicates Detection in Database Integration Outline Duplicates Detection in Database Integration Background HumMer Automatic Data Fusion System Duplicate Detection methods An efficient method using priority queue Approach based on Extended key Approach

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

Fuzzy Sets and Systems. Lecture 1 (Introduction) Bu- Ali Sina University Computer Engineering Dep. Spring 2010

Fuzzy Sets and Systems. Lecture 1 (Introduction) Bu- Ali Sina University Computer Engineering Dep. Spring 2010 Fuzzy Sets and Systems Lecture 1 (Introduction) Bu- Ali Sina University Computer Engineering Dep. Spring 2010 Fuzzy sets and system Introduction and syllabus References Grading Fuzzy sets and system Syllabus

More information

On Generalizing Rough Set Theory

On Generalizing Rough Set Theory On Generalizing Rough Set Theory Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca Abstract. This paper summarizes various formulations

More information

(Big Data Integration) : :

(Big Data Integration) : : (Big Data Integration) : : 3 # $%&'! ()* +$,- 2/30 ()* + # $%&' = 3 : $ 2 : 17 ;' $ # < 2 6 ' $%&',# +'= > 0 - '? @0 A 1 3/30 3?. - B 6 @* @(C : E6 - > ()* (C :(C E6 1' +'= - ''3-6 F :* 2G '> H-! +'-?

More information

Entity Resolution over Graphs

Entity Resolution over Graphs Entity Resolution over Graphs Bingxin Li Supervisor: Dr. Qing Wang Australian National University Semester 1, 2014 Acknowledgements I would take this opportunity to thank my supervisor, Dr. Qing Wang,

More information

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS TRIE BASED METHODS FOR STRING SIMILARTIY JOINS Venkat Charan Varma Buddharaju #10498995 Department of Computer and Information Science University of MIssissippi ENGR-654 INFORMATION SYSTEM PRINCIPLES RESEARCH

More information

Chapter 4 Fuzzy Logic

Chapter 4 Fuzzy Logic 4.1 Introduction Chapter 4 Fuzzy Logic The human brain interprets the sensory information provided by organs. Fuzzy set theory focus on processing the information. Numerical computation can be performed

More information

HELIOS: a General Framework for Ontology-based Knowledge Sharing and Evolution in P2P Systems

HELIOS: a General Framework for Ontology-based Knowledge Sharing and Evolution in P2P Systems HELIOS: a General Framework for Ontology-based Knowledge Sharing and Evolution in P2P Systems S. Castano, A. Ferrara, S. Montanelli, D. Zucchelli Università degli Studi di Milano DICO - Via Comelico, 39,

More information

Constructing Popular Routes from Uncertain Trajectories

Constructing Popular Routes from Uncertain Trajectories Constructing Popular Routes from Uncertain Trajectories Ling-Yin Wei, Yu Zheng, Wen-Chih Peng presented by Slawek Goryczka Scenarios A trajectory is a sequence of data points recording location information

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

Web Data Extraction Using Tree Structure Algorithms A Comparison

Web Data Extraction Using Tree Structure Algorithms A Comparison Web Data Extraction Using Tree Structure Algorithms A Comparison Seema Kolkur, K.Jayamalini Abstract Nowadays, Web pages provide a large amount of structured data, which is required by many advanced applications.

More information

MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI

MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI 1 KAMATCHI.M, 2 SUNDARAM.N 1 M.E, CSE, MahaBarathi Engineering College Chinnasalem-606201, 2 Assistant Professor,

More information

Wireless Network Security : Spring Arjun Athreya March 3, 2011 Survey: Trust Evaluation

Wireless Network Security : Spring Arjun Athreya March 3, 2011 Survey: Trust Evaluation Wireless Network Security 18-639: Spring 2011 Arjun Athreya March 3, 2011 Survey: Trust Evaluation A scenario LOBOS Management Co A CMU grad student new to Pittsburgh is looking for housing options in

More information

Poster Session: An Indexing Structure for Automatic Schema Matching

Poster Session: An Indexing Structure for Automatic Schema Matching Poster Session: An Indexing Structure for Automatic Schema Matching Fabien Duchateau LIRMM - UMR 5506 Université Montpellier 2 34392 Montpellier Cedex 5 - France duchatea@lirmm.fr Mark Roantree Interoperable

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

Combining Ontology Mapping Methods Using Bayesian Networks

Combining Ontology Mapping Methods Using Bayesian Networks Combining Ontology Mapping Methods Using Bayesian Networks Ondřej Šváb, Vojtěch Svátek University of Economics, Prague, Dep. Information and Knowledge Engineering, Winston Churchill Sq. 4, 130 67 Praha

More information