Combining Multiple Query Interface Matchers Using Dempster-Shafer Theory of Evidence

Size: px

Start display at page:

Download "Combining Multiple Query Interface Matchers Using Dempster-Shafer Theory of Evidence"

Phyllis Owen
6 years ago
Views:

1 Combining Multiple Query Interface Matchers Using Dempster-Shafer Theory of Evidence Jun Hong, Zhongtian He and David A. Bell School of Electronics, Electrical Engineering and Computer Science Queen s University Belfast Belfast BT7 1NN, UK {j.hong,zhe01,da.bell}@qub.ac.uk Abstract Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching that typically exploits different features of schemas. Relying on a particular feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using Dempster-Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence on the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require use of weighing parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Our experimental results show that our approach is highly accurate and effective. 1 Introduction Web databases are now pervasive, which can be accessed via their query interfaces (usually HTML query forms) only. Query forms provide a natural way for the user to make queries to the underlying databases without using a particular query language. On receiving form-based queries, these databases return query results encoded in HTML, which are then displayed to the user. Many E-commerce sites are supported by Web databases. In a specific domain (e.g. flight booking, book sales), there are many database-driven Web sites that sell similar products or services. It is a daunting task for the user to visit numerous Web sites individually to search for and compare services or products. Web data integration aims to provide integrated access to a multitude of Web databases, where users need to fill in only a uniform query form, and on receiving a user query the system will automatically make connections to different sites, fill in the local query forms on these sites, submit these (a) abebooks.com (b) Compman.co.uk Figure 1: Two Web query interfaces in the book domain forms, combine the query results, and return the combined results to the user. Matching query interfaces is a crucial step in Web data integration, which finds attribute correspondences between two query interfaces. Figure 1 shows two query interfaces on the Web, where we can find attribute correspondences such as those indicated by different shapes or underlined. The problem is closely related to schema matching that takes two schemas as input and produces a set of attribute correspondences between them [Rahm and Bernstein, 2001]. Schema matching has been extensively studied (e.g. [Rahm and Bernstein, 2001; Do and Rahm, 2002; Doan et al., 2001; He et al., 2004; He and Chang, 2003; Madhavan et al., 2001; Wang et al., 2004; Wu et al., 2004]). These approaches exploit different features of schemas, including structural and

2 linguistic features and data types, etc to match attributes between schemas. Schema matching is inherently uncertain due to lack of complete knowledge about schemas. Relying on a single feature of schemas is not sufficient and the match results of individual matchers are often inaccurate and uncertain. Approaches have been proposed to combine multiple matchers taking into account different features of schemas. A common approach is to apply different weight coefficients to the match results of individual matchers reflecting their different levels of importance, and their weighted values are then added together as the combined match results. However, these weight coefficients are often manually set for a particular domain in a trial and error manner. There are several differences between query interfaces and traditional database schemas. First, database schemas are designed internally for database developers. As a consequence, the attributes of the schemas may be named in a highly inconsistent manner, imposing many difficulties in schema matching. In contrast, query interfaces are designed for normal users and are likely more meaningful and consistent. For example, labels on query forms are usually words or phrases whereas attribute names of database schemas are often abbreviations. Second, database schema matching has mainly focused on schema-level matching while instance-level matching has not been done extensively. This is often due to the unavailability of data instances and the assumption that the same domains of values have been used across different schemas. Whereas in query interfaces, the user is likely to be given ranges of values to choose from and as these values are designed for human use they are also likely to be more meaningful and consistent. As data instances are pervasive, semantic heterogeneity of data instances between query interfaces has to be addressed. The availability of rich linguistic and semantic features in both attributes and data instances of query interfaces motivates us to explore multiple sources of evidence from these linguistic and semantic features. We propose an evidential approach to combining the match results of multiple matchers using Dempster-Shafer theory of evidence. First, this approach views the match results of an individual matcher as a source of evidence that provides some degree of belief on the validity of each candidate attribute correspondence. Second, it combines degrees of belief from multiple sources of evidence to get a combined mass function that represents the overall degree of belief on the validity of each candidate attribute correspondence, taking into account the match results of different matchers. Our combination mechanism does not require use of weighing parameters, hence no setting and tuning of them is needed. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Our experimental results show that our approach is highly accurate and effective. 2 Dempster-Shafer (DS) Theory of Evidence Dempster-Shafer theory of evidence, sometimes called evidential reasoning [Lowrance and Garvey, 1981] or belief function theory, is a mechanism formalized by Shafer [Shafer, 1976] for representing and reasoning with uncertain, imprecise and incomplete information. The theory represents a set of propositional hypotheses by a frame of discernment. Definition 1 Frames of Discernment A frame of discernment (or simply a frame), usually denoted as Θ, contains mutually exclusive and exhaustive propositional hypotheses, one and only one of which is true. Example 1: A patient has been observed having two symptoms: coughing and sniveling and only three types of illness could have caused these symptoms: flu (F), cold (C) and pneumonia (P). We can use frame Θ = {F, C, P } to represent these types of illness. There are three important functions in DS theory: the basic probability assignment (bpa or m), the belief function (bel), and the plausibility function (pl). Definition 2 Basic Probability Assignment (Mass Function) A function, m: 2 Θ [0, 1], is called a basic probability assignment on a frame Θ if it satisfies the following two conditions: m(φ) = 0 (1) m(a) = 1 (2) A Θ where φ is an empty set and A is any subset of Θ. The basic probability assignment (mass function thereafter) is a primitive function. Given a frame, Θ, for each source of evidence, a mass function assigns a mass to every subset of Θ, which represents the degree of belief that one of the hypotheses in the subset is true, given the source of evidence. In Example 1, when the patient has been observed having the symptom coughing, the degree of belief that the patient has flu or cold is 0.6 and the degree of belief that the patient has pneumonia is 0.4. We then have a mass function: m 1 ({C, F }) = 0.6 and m 1 ({P }) = 0.4. Similarly, with the symptom of sniveling, we have another mass function: m 2 ({F }) = 0.7, m 2 ({C}) = 0.2 and m 2 ({P }) = 0.1. From the mass function for a source of evidence, the belief function (bel), and the plausibility function (pl) can be defined, which represent the upper and lower bounds of an interval for every subset A of Θ, which contains the precise probability p(a) that one of the hypotheses in A is true, given the source of evidence, i.e. bel(a) p(a) pl(a). Definition 3 Belief Function (bel) For every subset A of Θ, bel(a) is defined as the sum of the masses assigned to every subset B of A: bel(a) = B A m(b) (3) Definition 4 Plausibility Function (pl) For every subset A of Θ, pl(a) is defined as the sum of the masses assigned to every subset B of Θ that intersects A: pl(a) = m(b) (4) B A The belief function and the plausibility function are related to each other as follows: pl(a) = 1 bel(ā) (5)

3 It follows from Equations 3, 4 and 5 that from one of the three functions m, bel and pl the other two functions can be derived. Given two mass functions m 1 and m 2, DS theory also provides Dempster s combination rule for combining them, which is defined as follows: A B=C m 1(A)m 2 (B) m(c) = 1 A B=φ m (6) 1(A)m 2 (B) In Example 1, we combine two mass functions, m 1 and m 2, to get a combined mass function: m(c) = 0.207, m(f ) = and m(p ) = From this mass function, the corresponding belief and plausibility functions can also be derived. When all masses have been assigned to subsets with a single element only, bel, pl and m are all the same. Since bel(a) p(a) pl(a) for every subset A of Θ, we have a probability distribution, p(c) = 0.207, p(f ) = and p(p ) = Therefore, given the two symptoms the patient has, we can conclude that it is more likely that he is having flu. 3 Combining Multiple Matchers Using DS Theory Based on DS theory, we propose an evidential approach to combining multiple matchers that exploit different features of schemas. Given a source schema and a target schema, for every source attribute, each target attribute is one of its candidate correspondences. An individual matcher provides a different measure on the validity of each candidate correspondence of the source attribute. Applying this measure to all the candidate correspondences provides a source of evidence on the validity of each candidate correspondence. Based on this source of evidence, we can generate a mass function that assigns a mass to every subset of the given frame, reflecting the degree of belief that the valid correspondence of the source attribute belongs to the subset. A set of different matchers provide multiple measures and applying these measures to all the candidate correspondences provides multiple sources of evidence on the validity of each candidate correspondence, based on which we can generate multiple mass functions. These mass functions can then be combined using Dempster s combination rule to calculate the overall degree of belief on the validity of each candidate attribute correspondence, reflecting the match results of different matchers. The combined mass function can then be further used for deciding on the top k attribute correspondences of each source attribute. 3.1 Individual Matchers We use four individual matchers. The first three matchers are based on different linguistic features of attribute names and the last matcher uses the data types of attributes. Semantic Based Similarity We measure semantic similarity between two words based on their relationships on a general linguistic ontology, Word- Net 1. WordNet [Miller, 1995] is a lexical database devel- 1 oped by Princeton University, which provides taxonomic hierarchies of natural language concepts. We define semantic similarity between two different words, w 1 and w 2, as Sim se (w 1, w 2 ) = 1 1+L, where L is the length of the shortest path in WordNet between w 1 and w 2, if both w 1 and w 2 can be found in WordNet. Otherwise, Sim se (w 1, w 2 ) = 0. For two same words, w 1 and w 2, we define semantic similarity between them as Sim se (w 1, w 2 ) = 1. Edit Distance Based Similarity Edit distance (Levenshtein distance) between two strings is measured by the number of edit operations, including insertions, deletions and substitutions, necessary to transform one string into another [Hall and Dowling, 1980]. We define the edit distance based similarity between two words, w 1 and w 2, as follows: Sim ed (w 1, w 2 ) = ed(s 1, s 2 ) where s 1 and s 2 are two strings in w 1 and w 2 respectively and ed(s 1, s 2 ) is the edit distance between s 1 and s 2. Jaro Distance Based Similarity The Jaro distance based string similarity between two strings, Jaro(s 1, s 2 ), is defined based on the number and order of the common characters between them [Cohen et al., 2003]. The Jaro distance based similarity Sim ja (w 1, w 2 ) between two words, w 1 and w 2 is defined as the Jaro distance based string similarity Jaro(s 1, s 2 ) between two strings s 1 and s 2 in w 1 and w 2. Similarity between Attribute Names A particular type of similarity between two attribute names is aggregated from the same type of similarities between individual words in the attribute names. Assume that two attribute names, A 1 and A 2, contain two sets of words, A 1 = {w 1, w 2,..., w m } and A 2 = {w 1, w 2,..., w n}. For each word, w i for i = 1, 2,...m, in A 1, we calculate its similarity with every word in A 2 and find the maximum similarity value v i. We then get a similarity value set for A 1 : Sim 1 = {v 1, v 2,..., v m }. Similarly, we get a similarity value set for A 2 : Sim 2 = {v 1, v 2,..., v n}. We calculate similarity between two attribute names A 1 and A 2 as follows: m i=1 Sim(A 1, A 2 ) = v i + n i=1 v i (8) m + n where m is the number of words in A 1, n is the number of words in A 2. Data Types of Attributes We define that two data types are compatible if one subsumes another (i.e. is-a relationship) in a given hierarchy of data types. An attribute with only an input box is given an any type since the input box can accept data of different types such as string or integer. The other data type of an attribute is recognized on the basis of the set of labels and form elements that form the attribute. (7)

4 3.2 Representing Possible Matches of Source Attributes in DS Assume that we have a source schema, S = {a 1, a 2,..., a m }, where a i, for i = 1, 2,..., m, is a source attribute, and a target schema, T = {b 1, b 2,..., b n }, where b j, for j = 1, 2,..., n, is a target attribute. For each source attribute, a i, we have a set of candidate correspondences in the target schema { a i, b 1, a i, b 2,..., a i, b n }. It is also possible that a i may have no correspondence in the target schema at all. We represent all possible matches of a i in T using a frame of discernment, Θ = { a i, b 1, a i, b 2,..., a i, b n, a i, null }, where a i, b j, for j = 1, 2,..., n, represent all possible candidate correspondences and a i, null represents that there is no correspondence of a i in the target schema. Example 2: As shown in Figure 1, for Author on abebooks.com, a frame of discernment for it is defined as follows: Θ = { Author, Category, Author, Keywords, Author, T itle, Author, Subject, Author, P ublisher, Author, Author, Author, OurRef erence, Author, ISBN, Author, Availability, Author, ReleasedDate, Author, U pperp ricelimit, Author, Binding, Author, null } Generating Subsets of Indistinguishable Attribute Correspondences Given frame Θ = { a i, b 1, a i, b 2,..., a i, b n, a i, null } for a source attribute a i, an individual matcher may not be able to distinguish two attribute correspondences in Θ from each other in terms of level of confidence on the validity of the attribute correspondences. For example, if the data type of the source attribute is compatible with the data types of two target attributes, the data type matcher cannot distinguish the two attribute correspondences from each other. The data type matcher cannot distinguish an attribute correspondence, in which the data type of the source attribute a i is compatible with the one of the target attribute, from a i, null either. A string similarity based matcher cannot distinguish two strings abc and bcd from each other when string bc is matched to each of them. We therefore cluster attribute correspondences in Θ into subsets of indistinguishable attribute correspondences. We do not have any further knowledge on the subset of indistinguishable attribute correspondences to be able to tell which attribute correspondence is more valid than the others. What we have is some degree of belief that one of the attribute correspondences in the subset is valid, according to a particular matcher. In Example 2, based on edit distance based similarity, the frame of discernment for Author on abebooks.com is divided into the following indistinguishable subsets: { Author, Category, Author, T itle, Author, OurRef erence }, { Author, Keywords, Author, Subject, Author, P ublisher, Author, ISBN }, { Author, Author }, { Author, Availability }, { Author, ReleasedDate }, { Author, U pperp ricelimit }, { Author, Binding }, { Author, null } Generating Mass Distributions on Subsets of Indistinguishable Attribute Correspondences Given a similarity based matcher, for each indistinguishable subset of attribute correspondences, we have a similarity value for each correspondence in the set, which represents how well the two attributes in the correspondence match according to the measure used by the matcher. Suppose we have subset A = { a i, b i1, a i, b i2,..., a i, b il }, a mass assigned to A is calculated based on the similarity values for all the attribute correspondences in A as follows: m (A) = 1 Π l j=1(1 Sim(a i, b ij )) (9) where Sim(a i, b ij ) is similarity value for the correspondence a i, b ij in A. Since we do not have a similarity value for a i, null by the matcher, we form a special singleton subset, { a i, null }. The mass assigned to the subset is calculated as follows: m ({ a i, null }) = Π n j=1(1 Sim(a i, b j )) (10) The mass assigned to { a i, null }, therefore, represents the degree of belief that none of the target attributes is the attribute correspondence of source attribute, a i. We scale the mass distribution, m, so that the sum of all masses assigned to every indistinguishable subset equals to 1: m (A) m(a) = B Θ m (B) (11) where A and B are indistinguishable subsets of Θ. Therefore these indistinguishable subsets of Θ are the focal elements of Θ. The mass assigned to any other subset of Θ is 0. In Example 2, based on edit distance based similarity, for Author on abebooks.com, we have a mass function defined as follows: m ed ({ Author, Category, Author, T itle, Author, OurRef erence }) = 0.175, m ed ({ Author, Keywords, Author, Subject, Author, P ublisher, Author, ISBN }) = 0.191, m ed ({ Author, Author }) = 0.414, m ed ({ Author, Availability }) = 0.035, m ed ({ Author, ReleasedDate }) = 0.063, m ed ({ Author, UpperP ricelimit }) = 0.071, m ed ({ Author, Binding }) = 0.052, m ed ({ Author, null }) = For the data type based matcher, we create two subsets of indistinguishable attribute correspondences, A and A. A contains both attribute correspondences in which the data type of the source attribute is compatible with the one of the target attribute and a i, null where a i is the source attribute. A contains attribute correspondences in which the data type of the source attribute is not compatible with the one of the target attribute. We have a mass function defined as: m da (A) = 1 and m da (A ) = 0.

5 3.3 Combining Mass Functions from Multiple Matchers We now have a mass function by each of the individual matchers, which assigns a mass to every indistinguishable subset of Θ. Using Dempster s combination rule, we can take into account different sources of evidence witnessed by different matchers by combining the appropriate mass functions by these matchers. In Example 2, for Author on abebooks.com, we have a combined mass function defined as follows: m({ Author, OurRef erence }) = 0.020, m({ Author, U pperp ricelimit }) = 0.003, m({ Author, P ublisher }) = 0.048, m({ Author, Author }) = 0.929, 3.4 Selecting Top k Attribute Correspondences This section shows how to select the top k attribute correspondences of each source attribute. The mass function produced in the previous section can be used to derive both the bel and pl functions. However, the mass function (or the corresponding belief and/or plausibility functions) cannot be directly used for making decisions. The final mass function can be transformed into a pignistic probability function using the transformation function proposed in [Smets, 2004]. Decisions can then be made based on this probability distribution. When all masses have been assigned to subsets with a single element only, bel, pl and m are all the same. In this case, we choose those attribute correspondences with top k masses as the top k attribute correspondences. In Example 2, for Author on abebooks.com, we have the top 3 attribute correspondences as follows: m({ Author, Author }) = 0.929, m({ Author, P ublisher }) = 0.048, m({ Author, OurRef erence }) = Experimental Results We use a set of 88 query interfaces selected from the ICQ Query Interfaces data set at UIUC, which contains manually extracted schemas of interfaces in 5 different domains: airfares, automobiles, books, jobs, and real estates, which involve 1:1 matching only (as we have focused on 1:1 matching in this paper). In each domain we create a uniform query interface as the source interface, and take each of the query interfaces in the domain as a target interface. We use three performance metrics: precision (P), recall (R), and F-measure (F). Precision is the percentage of correct matches over all the matches by a matcher. Recall is the percentage of correct matches by a matcher over all the matches by domain experts. F-measure is the incorporation of precision and recall as follows: F = 2P R P +R. First, in each domain we use three individual matchers: edit distance, Jaro distance and semantic similarity (the data type matcher cannot be used alone), and compare them with our new approach that combines four individual matchers: edit distance, Jaro distance, semantic similarity and data type. Given a source attribute a i, for each individual matcher, we Table 1: Precisions of individual matchers and our combined matcher Edit distance Jaro distance Semantic similarity Ours Airfares 83.3% 56.8% 86.4% 92.0% Autos 84.4% 48.1% 93.1% 96.3% Books 87.0% 48.8% 92.0% 94.4% Jobs 68.5% 50.0% 71.0% 91.9% Estates 86.8% 52.9% 81.6% 93.8% Average 82.1% 51.3% 84.8% 93.7% treat a i, null, i.e. no match, as a possible answer. We also have every candidate attribute correspondence, a i, b j, as a possible answer. We therefore have a set of possible answers, θ = { a i, b 1, a i, b 2,..., a i, b n, a i, null }. One and only one of these possible answers is true. Therefore, in our experiments, the number of matches by a matcher equals to the number of matches identified by experts, and precision, recall and F-measure turn out to be the same. We use precision only. For every a i, b j, for j = 1, 2,..., n, we have a similarity value by the matcher. Since we do not have a similarity value for a i, null, a value is calculated as follows: s( a i, null ) = Π n j=1(1 Sim(a i, b j )) In our experiments, we do not need a threshold to decide whether there is a match. If similarity values for every candidate attribute correspondence are all very low, a value calculated for a i, null, i.e. no match, will be high enough so that a i, null will be selected as the top answer. As shown in Table 1, our matcher gets much higher precision than the individual matchers. Second, we compare our results with the work in [Wu et al., 2004], which uses a similar data set for their experiments, covering the same five domains as ours. However, they also handle 1:m matching. In their experiments, a 1:m match is counted as m 1:1 matches. They did three experiments. The first is on automatic field matching which uses weight coefficients to combine multiple matchers and the clustering thresholds for two fields to be matched are set to zero for all domains. The second uses thresholds learned by user interactions. The last also uses user interactions for resolving uncertainties in match results. As shown in Figure 2, without using learned thresholds, the results of our approach are better. When the learned thresholds are used, their precision is better than ours, but we have higher recall and F-measure. Finally, when user interactions are also used to resolve uncertainties in match results, their results are better than ours. Our approach is effective and accurate for automatic schema matching across query interfaces without automated learning and user interaction. 5 Related work Cupid [Madhavan et al., 2001] exploits linguistic and structural similarity between elements and uses a weighted formula to combine these two similarities together. However, weights have to be manually generated and are domain dependent. COMA [Do and Rahm, 2002] allows users to tailor match strategies by selecting a combination of match algorithms for

6 Figure 2: Precisions, recalls and F-measures of different matchers a given problem, including Max, Min, Average and Weighted strategies. It also allows users to provide feedback for improving match results. These strategies are effective in some situations while sometimes they cannot combine results effectively, and choosing strategies by users involves human efforts. In [Wu et al., 2004], weight coefficients are also used to combine multiple matchers, which are set to some domainindependent empirical values. However, clustering is used to find attribute correspondences across multiple interfaces, in which thresholds are required for merging clusters. These thresholds need to be either manually set or learned from user interactions and are domain dependent. So this approach also involves human effort. Some approaches [He and Chang, 2003; He et al., 2004] use attribute distribution rather than linguistic or domain information. Superior to other schema matching approaches, these approaches can discover synonyms by analyzing attribute distributions in the given schemas. However, they work well only when a large training data set is available, but this is not always the case. 6 Conclusions We proposed a new approach to combining multiple matchers using DS theory and presented an algorithm for resolving conflicts among the correspondences of different source attributes. Applying different matchers to a set of candidate correspondences provides different sources of evidence, and mass distributions are defined on the basis of the match results from these matchers. We use Dempster s combination rule to combine these mass distributions, and choose the top k correspondences of each source attribute. We implemented a prototype and tested it using a large data set that contains real-world query interfaces in five different domains. The experimental results demonstrate the feasibility and accuracy of our approach. References [Cohen et al., 2003] W.W. Cohen, P. Ravikumar, and S.E. Fienberg. A comparison of string distance metrics for name-matching tasks. pages 73 78, [Do and Rahm, 2002] H.H. Do and E. Rahm. Coma - a system for flexible combination of schema matching approaches. In VLDB 02, pages , [Doan et al., 2001] A. Doan, P. Domingos, and A.Y. Halevy. Reconciling schemas of disparate data sources: A machine-learning approach. In SIGMOD 01, pages , [Hall and Dowling, 1980] P. Hall and G. Dowling. Approximate string matching. Computing Surveys, pages , [He and Chang, 2003] Bin He and Kevin Chen-Chuan Chang. Statistical schema matching across web query interfaces. In SIGMOD Conference, pages , San Diego, CA, [He et al., 2004] B. He, K.C.C. Chang, and J. Han. Discovering complex matchings across web query interfaces: A correlation mining approach. In KDD 04, pages , [Lowrance and Garvey, 1981] J.D. Lowrance and T.D. Garvey. Evidential reasoning: An developing concept. In ICCS 81, pages 6 9, [Madhavan et al., 2001] J. Madhavan, P.A. Bernstein, and E. Rahm. Generic schema matching with cupid. In VLDB 01, pages 49 58, [Miller, 1995] George A. Miller. Wordnet: A lexical databases for english. Communications of ACM, 38(11):39 41, [Rahm and Bernstein, 2001] E. Rahm and P.A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal, 10(4): , [Shafer, 1976] G. Shafer. A Mathematical Theory of Evidence [Smets, 2004] Philip Smets. Decision making in the tbm: the necessity of the pignistic transformation. International Journal of Approximate Reasoning, 38: , [Wang et al., 2004] Jiying Wang, Ji-Rong Wen, Fred Lochovsky, and Wei-Ying Ma. Instance-based schema matching for web databases by domain-specific query probing. In Proceedings of the 30th VLDB Conference, pages , Toronto, Canada, [Wu et al., 2004] Wensheng Wu, Clement Yu, AnHai Doan, and Weiyi Meng. An interactive clustering based approach to integrating source query interfaces on the deep web. In SIGMOD Conference, pages , Paris, France, Acknowledgments The authors wish to thank Phil Bernstein for his helpful suggestions and comments on the research presented in the paper.

MINING ASSOCIATION RULES WITH UNCERTAIN ITEM RELATIONSHIPS

MINING ASSOCIATION RULES WITH UNCERTAIN ITEM RELATIONSHIPS Mei-Ling Shyu 1, Choochart Haruechaiyasak 1, Shu-Ching Chen, and Kamal Premaratne 1 1 Department of Electrical and Computer Engineering University