Dependence among Terms in Vector Space Model
Ilmério Reis Silva, João Nunes de Souza, Karina Silveira Santos
Faculdade de Computação - Universidade Federal de Uberlândia (UFU)
[ilmerio, Nunes]@facom.ufu.br, karinass@pop.com.br

Abstract

The vector space model is a mathematical model that represents terms, documents and queries as vectors and produces a ranking. In this model, the subspace of interest is formed by a set of pairwise orthogonal term vectors, which implies that the terms are mutually independent. However, this is an oversimplification. With this in view, we present an extension to the vector space model that takes into account the correlation among terms. In the proposed model, term vectors, originally orthogonal, are rotated in space so that they geometrically reflect the dependence semantics among terms. This rotation can be driven by any technique that generates information on the relationship among terms of the collection. We propose using association rules in information retrieval to find sets of terms that co-occur in the document collection. The retrieval effectiveness of the proposed model is evaluated, and the results show that it improves average precision relative to the standard vector model for all collections evaluated, with gains of up to 31%.

1. Introduction

In information retrieval (IR), the vector space model is the most popular model [16,17,18]. Its definition of term weights in documents and its partial matching of the query against the documents result in a good ranking strategy. Besides, it is simple and fast [4,7].

Although it is one of the most widely used information retrieval models, the vector space model has disadvantages [6,21]. Documents are represented by keywords extracted from them, and the relationship among those keywords is not considered. This means, for instance, that the context in which the terms occur is not represented. This is an oversimplification of the model.
Many words have multiple meanings, and the terms of a query can literally match the terms of an irrelevant document. Considering this, the main objective of this work is to incorporate information about the correlation among terms in the collection into the vector space model in order to improve its retrieval effectiveness.

The proposed solution alters the representation of term vectors in the vector space model. In the original model, terms are represented by orthogonal vectors, since no correlation among the terms is known a priori. The algorithm proposed in this work is founded on the rotation of those vectors in the space, so that their representations reflect the dependence among the terms. All term vectors that have some correlation with one or more terms are rotated. After all rotations, term vectors are not necessarily orthogonal to each other. In the resulting set of vectors, the proximity between two vectors is related to the degree of dependence between the respective terms: the closer the term vectors, the greater the dependence observed between them.

The rotation of term vectors is based on techniques that yield information on the relationship among terms of the collection. We use the data mining technique of association rules to obtain this information.
The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 describes the foundations of the vector space model. Section 4 presents association rules in the context of information retrieval. The proposed model is described in Section 5. The experimental results are discussed in Section 6. Finally, Section 7 presents conclusions and future work.

2. Related Work

Several approaches for incorporating correlation among terms have been presented in the literature. We describe the works most closely related to this paper.

Query expansion in the vector space model is suggested in several proposals, among them [7,11,12,20]. In [20], Voorhees examined the usefulness of lexical query expansion on the TREC collection and obtained considerable improvements in effectiveness only for short queries. Mandala et al. [11] analyzed the characteristics of different thesaurus types and proposed a method to combine them for query expansion. In [12], Nie and Jin used the logical operator OR to connect expansion terms with the original terms of the query.

In [5], Becker and Kuropka present an IR model for the comparison of documents that represents topics, terms and documents as vectors. The basis of the space is formed by a set of orthogonal topic vectors, in which term vectors are represented. The angle between term vectors and the weight of each term are calculated using information about the collection, such as a list of stems of the collection terms.

A work similar to the one proposed here was carried out by Pôssas et al. in [13,14,15], who suggested an extension to the vector space model that considers the correlation among terms, obtained using association rules. In [15], a new model, named the set-based model, is presented for computing term weights, based on set theory, and for ranking documents.
The theory of association rules is used to compute those weights. The proposal presented by the authors in [14] is similar to [15]; the main difference lies in how association rules are used. Then, in [13], an extension to the set-based model is proposed that uses information about the proximity among the query terms in the documents.

The generalized vector space model (GVSM) is another extension of the vector space model that accounts for the correlation among terms [22,23]. In GVSM, term vectors can be non-orthogonal and are represented in terms of smaller components named minterms. The minterms are vectors with binary weights that indicate all possible patterns of term co-occurrence in documents. The basis for GVSM is formed by a set of $2^t$ minterm vectors, where $t$ is the number of distinct terms in the collection. The term vectors are linear combinations of minterms, reflecting the co-occurrence captured by the minterms.

Our work differs from the works above in the following aspects. In none of the cited works are the term vectors rotated in the space to reflect their correlation, as we do in this work. Moreover, association rules are used here to determine the proximity among term vectors, which differs from the cited models.

3. Vector Space Model

The vector space model was initially proposed by Gerard Salton [16,17]. In this model, all objects relevant to an information retrieval system are represented as vectors: terms, documents and queries.
Each term $k_i$ is represented as a $t$-dimensional vector $\vec{k}_i$, where $t$ is the number of distinct terms in the collection. If $a_r$ is the $r$-th element of the vector $\vec{k}_i$, then

$$\vec{k}_i = (a_1, a_2, \ldots, a_t), \quad \text{where} \quad a_r = \begin{cases} 1 & \text{if } r = i \\ 0 & \text{if } r \neq i \end{cases}$$

that is,

$$\vec{k}_1 = (1, 0, 0, \ldots, 0), \quad \vec{k}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \vec{k}_t = (0, 0, 0, \ldots, 1)$$

The set of all term vectors $K = \{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_t\}$ is linearly independent and forms the canonical basis of the space $\mathbb{R}^t$. The term vectors are pairwise orthogonal and, in consequence, the corresponding terms are considered independent.

Document and query vectors are represented using the set $K$ of term vectors. These vectors are built as linear combinations of the term vectors. The vector $\vec{d}_j$ associated with the document $d_j$ is defined as:

$$\vec{d}_j = \sum_{i=1}^{t} w_{i,j}\, \vec{k}_i \quad \text{or} \quad \vec{d}_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$$

Similarly, the vector for query $q$ is defined as:

$$\vec{q} = \sum_{i=1}^{t} w_{i,q}\, \vec{k}_i \quad \text{or} \quad \vec{q} = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$$

In the equalities above, $w_{i,j}$ and $w_{i,q}$ are the weights of term $i$ in document $j$ and in query $q$, respectively. The most effective definition of term weights for information retrieval is named tf-idf [4]. This strategy considers the number of times an index term occurs in a document and the number of documents of the collection in which the index term occurs.

The vector space model evaluates the degree of similarity of the document $d_j$ in relation to the query $q$ as the correlation between the vectors $\vec{d}_j$ and $\vec{q}$. The smaller the angle between the two vectors, the more relevant the document is taken to be for the query. Usually, that correlation is quantified by the cosine of the angle between those two vectors. That is,

$$sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{|\vec{d}_j| \times |\vec{q}|} = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2}\, \sqrt{\sum_{i=1}^{t} w_{i,q}^2}}$$
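As an illustration of the cosine measure, the following Python sketch ranks a few documents against a query. The function names `cosine_similarity` and `rank` and the toy weights are assumptions made for this example only; the weights are taken as given rather than computed by tf-idf.

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an empty document or query matches nothing
    return dot / (norm_d * norm_q)

def rank(docs, q):
    """Return (name, score) pairs sorted by decreasing similarity to q."""
    scores = [(name, cosine_similarity(vec, q)) for name, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical collection with t = 3 terms; weights are illustrative only.
docs = {
    "d1": [2.0, 1.0, 0.0],
    "d2": [0.0, 0.0, 3.0],
    "d3": [1.0, 0.0, 1.0],
}
query = [1.0, 1.0, 0.0]

ranking = rank(docs, query)
for name, score in ranking:
    print(name, round(score, 4))
```

Because the term vectors are orthogonal here, a document that shares no term with the query (d2 in this toy data) always scores zero, which is the limitation the proposed model addresses.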
The documents closest to the query in the space are considered relevant for the user and returned as the answer set for the query. After computing the similarity degrees, it is possible to produce an ordered list of documents (ranking) with their respective degrees of relevance to the query.

4. Association Rules in Information Retrieval

In data mining, association rules typically serve to represent frequent patterns found in the data [1,2,3,9]. The main function of the rules is to characterize the data by representing regularities. One of the purposes of this work is to apply data mining to IR.

In general, the data mining literature works with items and transactions. However, the algorithms used for discovering association rules can also be adapted to work with terms and documents, identifying co-occurrence among terms. In the IR context, an association rule $X \Rightarrow Y$ relates $X$ and $Y$, which are terms or sets of terms. Consider the following example, which defines an association rule in IR. The information that documents whose theme is tourism also discuss hotels is represented by association rule (1) below:

tourism $\Rightarrow$ hotel [support = 2%, confidence = 80%] (1)

The support and the confidence of a rule are two measures that reflect, respectively, the usefulness and the certainty of the rules found. The support is a percentage relative to the entire collection of documents analyzed. In the example above, the words tourism and hotel appear together in the same document in 2% of the collection. The confidence is a percentage relative to an attribute. A confidence of 80% reveals that 80% of the documents that discuss tourism also discuss hotels. Typically, association rules are considered useful if they meet a support threshold and a confidence threshold [12].

Basic Concepts

Let $J = \{k_1, k_2, \ldots, k_m\}$ be the set of distinct terms in a collection of documents $D$. Each document $d_j$ of the database is a set of terms such that $d_j \subseteq J$.
An association rule is an implication of the form $A \Rightarrow B$, where $A \subset J$, $B \subset J$, and $A \cap B = \emptyset$. The rule $A \Rightarrow B$ holds in a set of documents $D$ with support $s$ if $s$ is the percentage of documents in $D$ that contain $A \cup B$ (in other words, both $A$ and $B$). The rule $A \Rightarrow B$ has confidence $c$ in the set of documents $D$ if $c$ is the percentage of documents in $D$ containing $A$ that also contain $B$. Rules that meet a minimum support (min_sup) and a minimum confidence (min_conf) are termed strong.

A set of terms is referred to as a termset. A termset that contains $k$ terms is a $k$-termset. Association rules are found in large databases in two steps:

1. Find all the sets of terms (termsets) that meet the minimum support. These termsets are named frequent termsets.
2. Generate strong association rules from the frequent termsets: by definition, these rules must meet the minimum support and the minimum confidence.

Apriori is an algorithm for mining frequent termsets for association rules [2,3,12]. Apriori uses an iterative, level-wise search in which $k$-termsets are used to explore $(k+1)$-termsets. First, the set of frequent 1-termsets is found. This set is denoted $L_1$. $L_1$ is used to find $L_2$, the set of frequent 2-termsets, which is used to find $L_3$, and so on, until no more frequent $k$-termsets can be found. The search for each $L_k$ requires a complete scan of the database. To make the generation of frequent termsets more efficient, an important property, called the Apriori property (every subset of a frequent termset must also be frequent), is used to reduce the search space.

Once the frequent termsets of the transactions in the database $D$ have been generated, the strong association rules can be generated. This can be done using the following equation for the confidence, based on termset frequency:

$$confidence(A \Rightarrow B) = P(B \mid A) = \frac{freq(A \cup B)}{freq(A)}$$

where $freq(A \cup B)$ is the number of transactions containing the termset $A \cup B$, and $freq(A)$ is the number of transactions containing $A$. Based on this equation, association rules can be generated as follows:

For each frequent termset $l$, generate all nonempty subsets of $l$.
For each nonempty subset $s$ of $l$, generate the rule $s \Rightarrow (l - s)$ if $\frac{freq(l)}{freq(s)} \geq$ min_conf, where min_conf is the minimum confidence threshold.

In the following section, we show how association rules modify the vector space model.

5. Vector Space Model Modified by Association Rules

The main foundation of the algorithm proposed in this work is the rotation of the term vectors in the space, so that their representations geometrically reflect the adopted semantics of correlation among terms. We use association rules as a tool for generating information about the dependence among the terms. The term vectors are rotated in the space so that they reflect, geometrically, the semantics defined by the association rules. This method is based on the assumption that a pair of words that frequently occur together in the same documents is related to the same subject.
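The support and confidence measures of Section 4, which supply the dependence information used by the model, can be sketched in Python as follows. This is a minimal illustration that enumerates term pairs by brute force; it does not implement the level-wise pruning of Apriori. The function names and the toy collection are assumptions made for the example.

```python
from itertools import combinations

def support(docs, termset):
    """Fraction of documents that contain every term of the termset."""
    termset = set(termset)
    return sum(1 for d in docs if termset <= d) / len(docs)

def confidence(docs, antecedent, consequent):
    """freq(A union B) / freq(A), or 0.0 when A never occurs."""
    a = set(antecedent)
    ab = a | set(consequent)
    freq_a = sum(1 for d in docs if a <= d)
    if freq_a == 0:
        return 0.0
    freq_ab = sum(1 for d in docs if ab <= d)
    return freq_ab / freq_a

def strong_pair_rules(docs, min_sup, min_conf):
    """Strong rules x => y between single terms, found by brute force."""
    terms = sorted(set().union(*docs))
    rules = {}
    for a, b in combinations(terms, 2):
        if support(docs, {a, b}) < min_sup:
            continue  # the 2-termset {a, b} is not frequent
        for x, y in ((a, b), (b, a)):
            c = confidence(docs, {x}, {y})
            if c >= min_conf:
                rules[(x, y)] = c
    return rules

# Hypothetical collection: each document is a set of terms.
docs = [
    {"tourism", "hotel"},
    {"tourism", "hotel", "beach"},
    {"tourism", "hotel"},
    {"tourism", "hotel", "flight"},
    {"tourism", "flight"},
    {"hotel", "finance"},
    {"hotel", "finance"},
    {"hotel"},
]

rules = strong_pair_rules(docs, min_sup=0.3, min_conf=0.7)
print(rules)
```

In this toy data the rule tourism => hotel is strong, while hotel => tourism is not, which shows that confidence is directional. That asymmetry is what later determines which term vector is rotated.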
The association rules are of the form $k_i \Rightarrow k_j$, and $c_{ij}$ is the confidence of the rule, which indicates the degree of dependence of the term $k_i$ on the term $k_j$. That value is used, in this work, to compute the new angle between the term vectors $\vec{k}_i$ and $\vec{k}_j$. The confidence was chosen as the parameter that determines the proximity of the term vectors because it reflects the certainty of the association rule. The term vectors are brought close together according to the association rules created for the respective terms, as follows.

Definition 5.1 (Rotation of basis vectors): Let $\vec{k}_i$ and $\vec{k}_j$ be two term vectors and $c_{ij}$ the confidence of the association rule $k_i \Rightarrow k_j$. The new angle $\theta_{ij}$ between $\vec{k}_i$ and $\vec{k}_j$ is given by

$$\theta_{ij} = 90^\circ \times (1 - c_{ij})$$

where $90^\circ$ is the original angle between the vectors $\vec{k}_i$ and $\vec{k}_j$.

In this case, the rotation occurs only in the vector $\vec{k}_i$; the vector $\vec{k}_j$ is not modified. The reason for this is related to the semantics of the association rule and its confidence. The confidence $c_{ij}$ of the association rule $k_i \Rightarrow k_j$ states that, in $c_{ij}$ percent of the cases in which the term $k_i$ appears, the term $k_j$ also appears. Therefore, the rotation is applied to the vector corresponding to the antecedent of the association rule.

$\theta_{ij}$ is the new angle between the term vectors $\vec{k}_i$ and $\vec{k}_j$ whenever $\theta_{ij} < 90^\circ$. The vector $\vec{k}_i$ approaches the vector $\vec{k}_j$, and the new vector is named $\vec{k}'_i$, where the $r$-th element of $\vec{k}'_i$, named $a_r$, is defined as:

$$a_r = \begin{cases} \sin(\theta_{ij}) & \text{if } r = i \\ \cos(\theta_{ij}) & \text{if } r = j \\ 0 & \text{if } r \neq i \text{ and } r \neq j \end{cases}$$

Therefore, the vector $\vec{k}_i$ is transformed into the vector $\vec{k}'_i = (a_1, a_2, \ldots, a_t)$ by altering positions $i$ and $j$ of the original vector. In position $i$ we have $\sin(\theta_{ij})$ and, in position $j$, we have $\cos(\theta_{ij})$.

If a term $k_p$ has two or more associated terms, the new vector $\vec{k}'_p$ is normalized as follows. Let $k_p \Rightarrow k_n$ and $k_p \Rightarrow k_v$ be two association rules with the same antecedent and respective confidences $c_{pn}$ and $c_{pv}$. The new vector $\vec{k}'_p$ is defined as

$$\vec{k}'_p = \frac{\vec{k}'_{pn} + \vec{k}'_{pv}}{|\vec{k}'_{pn} + \vec{k}'_{pv}|}$$

where $\vec{k}'_{pn}$ is the vector $\vec{k}_p$ modified using $k_p \Rightarrow k_n$ and $c_{pn}$ (Definition 5.1), and $\vec{k}'_{pv}$ is the vector modified using $k_p \Rightarrow k_v$ and $c_{pv}$.

The vector space basis $K$ is formed by the set of term vectors $\{\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_t\}$. After the rotation of the term vectors, the new basis for the vector space, denoted $K'$, is obtained from $K$ by replacing the vectors $\vec{k}_i$ by $\vec{k}'_i$, so $K' = \{\vec{k}'_1, \vec{k}'_2, \ldots, \vec{k}'_t\}$. The set $K'$ still forms a basis of the vector space $\mathbb{R}^t$ because its vectors are linearly independent.

The document and query vectors are represented in the new basis $K'$ as linear combinations of the term vectors $\vec{k}'_i$. They are termed $\vec{d}'_j$ and $\vec{q}'$ and defined as:

$$\vec{d}'_j = \sum_{i=1}^{t} w_{i,j}\, \vec{k}'_i \qquad \vec{q}' = \sum_{i=1}^{t} w_{i,q}\, \vec{k}'_i$$

So the document and query vectors $\vec{d}'_j$ and $\vec{q}'$ now reflect the dependence semantics among the terms that is implicit in the basis $K'$.
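The construction of the rotated basis can be sketched in Python as below. The sketch follows Definition 5.1 and the normalization rule for several rules with the same antecedent; the function names, the use of integer term indices, and the example confidences are assumptions made for illustration.

```python
import math

def rotate_term(i, j, conf, t):
    """Rotated vector for term i under the rule k_i => k_j (Definition 5.1)."""
    theta = math.radians(90.0 * (1.0 - conf))
    v = [0.0] * t
    v[i] = math.sin(theta)  # remaining component along the term's own axis
    v[j] = math.cos(theta)  # component along the axis of the consequent
    return v

def rotated_basis(t, rules):
    """Build K' from rules given as {(i, j): confidence} over term indices."""
    by_antecedent = {}
    for (i, j), conf in rules.items():
        by_antecedent.setdefault(i, []).append(rotate_term(i, j, conf, t))
    basis = []
    for i in range(t):
        if i not in by_antecedent:
            v = [0.0] * t
            v[i] = 1.0  # no rule for this term: keep the canonical vector
        else:
            # Sum the rotated vectors and rescale the result to unit length.
            total = [sum(col) for col in zip(*by_antecedent[i])]
            norm = math.sqrt(sum(x * x for x in total))
            v = [x / norm for x in total]
        basis.append(v)
    return basis

# Hypothetical example: term 0 = tourism, 1 = hotel, 2 = finance,
# with the single rule tourism => hotel at confidence 0.8 (angle of 18 degrees).
basis = rotated_basis(3, {(0, 1): 0.8})
for vector in basis:
    print([round(x, 4) for x in vector])
```

With a confidence of 0.8 the tourism vector ends up 18 degrees away from the hotel vector, while the hotel and finance vectors stay on their original axes.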
The same similarity function is used in the vector space model modified by dependence among the terms. Therefore, we have

$$sim(d_j, q) = \frac{\vec{d}'_j \cdot \vec{q}'}{|\vec{d}_j| \times |\vec{q}|} = \frac{\left(\sum_{i=1}^{t} w_{i,j}\, \vec{k}'_i\right) \cdot \left(\sum_{s=1}^{t} w_{s,q}\, \vec{k}'_s\right)}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2}\, \sqrt{\sum_{s=1}^{t} w_{s,q}^2}} = \frac{\sum_{i,s=1}^{t} w_{i,j}\, \vec{k}'_i \cdot w_{s,q}\, \vec{k}'_s}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2}\, \sqrt{\sum_{s=1}^{t} w_{s,q}^2}}$$

The similarity between the query and the documents changes because the respective vectors are now built from non-orthogonal term vectors. The normalization of the similarity, that is, the factors in the denominator of the formula, uses the original norm of the documents. That strategy was adopted because, if the normalization used the document vectors $\vec{d}'_j$, the norm of all the documents would have to be recalculated, raising the computational cost of calculating the similarity. Besides, that simplification does not change the results significantly.

In the computation of the similarity between the query and the documents, the main consequence of the term vector rotation is automatic query expansion. The query is expanded with terms related to its original terms. Besides, documents that contain both query terms and terms associated with them are ranked above documents that contain only the query terms.

Algorithm

The implementation of the model is divided into two phases. The first is the generation of the information on the dependence among the terms, which means the construction of the vectors $\vec{k}'_i$. This task is carried out entirely in the pre-processing phase. The second phase is the search itself. The search algorithm used in the implementation of the vector space model modified by dependence among the terms, described in Figure 1, is similar to that of the original model. It uses a list of accumulators $A$, where each item $A_j$ of $A$ stores the partial similarity of the document $d_j$ in relation to the query $q$. The function value($\vec{k}'_i$, $i$) returns the value stored in position $i$ of the vector of the term $k_i$. The modifications to the original algorithm needed to reflect the dependence among the terms are in step (2) and in the loop of step (6).

(1) Create and initialize a structure of accumulators (A)
(2) For each query term k_i, add to the query all the terms associated with it.
(3) For each term k_i of the modified query do:
(4)   For each pair [d_j, f_ij] in the inverted list of the term do:
(5)     aux = w_ij * w_iq * (value(k'_i, i))^2
(6)     For each term k_j associated with term k_i do:
(7)       aux = aux + (w_ij * w_iq * value(k'_i, i) * value(k'_i, j))
(8)     End For
(9)     If A_j ∉ A then
(10)      A_j = aux
(11)    Else
(12)      A_j = A_j + aux
(13)    A = A ∪ {A_j}
(14)  End For
(15) End For
(16) Divide each accumulator A_j by the document norm |d_j|.
(17) Sort the list of accumulators A_j and return the retrieved documents d_j.

Figure 1. Search algorithm for the vector space model modified by dependence among the terms.

Step (2) differs from the original algorithm. Once the identifiers of the query terms have been determined, the terms associated with each query term are added to the list of query terms. This step of the algorithm defines the automatic expansion of the query with the terms related to the query terms.
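The similarity that this search computes can also be sketched in dense form, without inverted lists or accumulators. The Python sketch below is not a transcription of Figure 1: it evaluates the full inner product of the document and query vectors expressed in the rotated basis and divides by the original norms, as the model specifies. The function names and the example basis are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combine(weights, basis):
    """Linear combination sum_i weights[i] * basis[i]."""
    t = len(basis[0])
    out = [0.0] * t
    for w, k in zip(weights, basis):
        for r in range(t):
            out[r] += w * k[r]
    return out

def modified_similarity(d_weights, q_weights, basis):
    """Inner product in the rotated basis, normalized by the ORIGINAL norms."""
    d_vec = combine(d_weights, basis)
    q_vec = combine(q_weights, basis)
    norm_d = math.sqrt(dot(d_weights, d_weights))
    norm_q = math.sqrt(dot(q_weights, q_weights))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot(d_vec, q_vec) / (norm_d * norm_q)

# Hypothetical terms: 0 = tourism, 1 = hotel, 2 = finance.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = math.radians(90.0 * (1.0 - 0.8))  # rule tourism => hotel, confidence 0.8
rotated = [[math.sin(theta), math.cos(theta), 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

query = [1.0, 0.0, 0.0]      # asks only for "tourism"
hotel_doc = [0.0, 1.0, 0.0]  # mentions only "hotel"

classic = modified_similarity(hotel_doc, query, identity)
expanded = modified_similarity(hotel_doc, query, rotated)
print(classic, round(expanded, 4))
```

With the canonical basis the function reduces to the classic cosine, so the hotel-only document scores zero for a tourism query; with the rotated basis it receives a positive score, which is the automatic query expansion effect. Because the denominator uses the original norms, the score is a ranking value and can exceed 1 for some inputs.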
Steps (5) to (8) compute the sum $\sum_{i,s=1}^{t} w_{i,j}\, \vec{k}'_i \cdot w_{s,q}\, \vec{k}'_s$ from the equation of the inner product between the vectors $\vec{d}'_j$ and $\vec{q}'$. Step (5) corresponds to the terms of the sum with $i = s$, and the loop of step (6) corresponds to the other cases, when $i \neq s$. These steps are necessary because the term vectors are non-orthogonal.

An analysis of the algorithm shows that the proposed model is an extension of the original vector space model: if no association among the terms exists, the algorithm described is equivalent to the original algorithm.

6. Experiments

To evaluate the effectiveness of the vector space model modified by dependence among the terms, experiments were run with four reference collections: CACM [8], Cystic Fibrosis (CFC) [19], CISI and the Third Text Retrieval Conference (TREC-3) [10]. The collection characteristics are shown in Table 1.

Table 1. Characteristics of the reference collections.
Columns: reference collection; number of distinct terms; number of documents; average number of terms per document; number of queries; average number of terms per query; average relevant documents per query.
CFC ,2 64 4,0 39
CACM , ,7 13
CISI ,6 50 9,4 50
TREC , ,58 106,38

The evaluation of the IR system proposed here concerns the effectiveness of the retrieval, in other words, how precise the answer set returned by the system is for a given query. We used precision-recall curves to compare the effectiveness of the vector space model modified by dependence among terms with that of the classic vector space model. Each curve quantifies the precision as a function of the percentage of relevant documents retrieved (recall).

Some parameters can be adjusted during the generation of the association rules. Min_sup and min_conf are, respectively, the support and confidence thresholds.
We ran experiments and observed that min_sup should be set to a low value (up to 5%) because, in general, the frequency of terms in collections is low. If min_sup is high, association rules involving terms whose frequency in the collection of documents is small are discarded. On the other hand, min_conf should be set to a higher value (above 40%), because this parameter determines how close the vectors are brought together. If min_conf is low, term vectors with very low co-occurrence are brought close together. This harms the effectiveness of the retrieval, because the system expands the query with terms that are not related to the query terms.

As we can see in Figures 2 and 3, the proposed model yields better precision than the vector space model, regardless of the collection and of the recall level. Table 2 summarizes the results: it shows the average precision of the two models on all collections and the gain of the proposed model over the original.
[Figure 2. Recall-Precision for CACM and CISI. Two panels, "Recall x Precision CACM" and "Recall x Precision CISI", each with one curve for VSM and one for MVSM.]

[Figure 3. Recall-Precision for CFC and TREC-3. Two panels, "Recall x Precision CFC" and "Recall x Precision TREC-3", each with one curve for VSM and one for MVSM.]

Table 2. Average precision and gain provided by the vector space model modified by association rules.

Collection   Average Precision (%)      Gain (%)
             Classic      Modified
CACM         30.03        32.08         6.83
CISI         17.64        20.09         13.89
CFC          10.05        13.24         31.74
TREC-3       12.09        14.04         16.13

The results presented for the vector space model modified by association rules are the best ones obtained over the parameter values analyzed. For a maximum min_sup between 4% and 5% and for min_conf varying between 45% and 70%, the results vary minimally from those presented. When the minimum confidence is set above 70%, few rules are generated and, consequently, the results approach those of the classic vector space model. Various combinations of parameter values were tested, and the collections behaved similarly as the parameters changed.

The experiments have shown that the proposed model improves the average precision of the answer set for all collections. Besides, the average precision obtained was not harmed by the increase in recall that occurs when the queries are expanded.

7. Conclusions

In this paper, we have presented an extension to the vector space model that reflects the dependence among the terms of the collection. In the proposed model, the dependence among the terms is represented geometrically in the vector space.

The proposed model is based on the rotation of the term vectors in accordance with the dependence among the terms. This rotation relies on techniques that generate information on the correlation among terms of the collection. In this work, we used association rules. However, other techniques can be used.

The generation of association rules is a well-known data mining technique that finds frequent patterns in large databases. In the context of this paper, it is used to find sets of terms that appear together in the collection of documents. This information is used to modify the term vectors so that they reflect the semantics of co-occurrence defined by the association rules.

The extension to the vector space model presented here accounts for the dependence among terms in a clear, flexible and new way. It is clear because the dependence among the terms is incorporated step by step and the vector space basis reflects the semantics defined by the adopted technique. It is flexible because it allows correlation among the terms of the collection, obtained in any of several ways, to be incorporated. Finally, it is new because the literature contains no extension to the vector space model that modifies the vector space basis as was done in this work.
We have evaluated the effectiveness of the proposed model with four reference collections. Retrieval effectiveness increased in comparison with the classic vector space model for all of the reference collections used.

As future work, the effectiveness of the proposed model will be compared with that of the generalized vector space model. Besides, we will investigate other methods of obtaining correlation among the terms of a collection of documents. These methods will be incorporated geometrically into the model proposed in this paper. We also intend to evaluate the proposed model on larger collections formed by Web documents.

References

1. Adriaans, P., Zantige, D. Data Mining. England: Addison-Wesley.
2. Agrawal, R., Imielinski, T., Swami, A. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference. Washington, DC, USA, May.
3. Agrawal, R., Srikant, R. Fast algorithms for mining association rules. Proceedings of the 20th Int'l Conference on Very Large Databases. Santiago, Chile, September.
4. Baeza-Yates, R., Ribeiro-Neto, B. Modern information retrieval. ACM/Addison-Wesley.
5. Becker, J., Kuropka, D. Topic-based vector space model. Proceedings of the 6th International Conference on Business Information Systems, Colorado Springs, June 2003.
6. Bollmann-Sdorra, P., Raghavan, V. V. On the necessity of term dependence in a query space for weighted retrieval. Journal of the American Society of Information Science, 49(13).
7. Buckley, C., Salton, G., Allan, J., Singhal, A. Automatic query expansion using SMART: TREC 3. In D. K. Harmon, editor, NIST Special Publication: The Third Text Retrieval Conference (TREC 3), 1995.
8. CAM-Collection. ftp://ftp.cs.cornell.edu/pub/smart/cacm.
9. Han, J., Kamber, M. Data mining: Concepts and techniques. San Diego: Academic Press, 2001.
10. Harman, D. Overview of the third Text Retrieval Conference. Proceedings of the third Text Retrieval Conference (TREC-3), Gaithersburg, MD, USA, 1995.
11. Mandala, R., Tokunaga, T., Tanaka, H. M. Combining multiple evidence from different types of thesaurus for query expansion. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, Berkeley, California, United States, August 1999.
12. Nie, J. Y., Jin, F. Integrating logical operators in query expansion in Vector Space Model. Workshop on Mathematical/Formal Methods in Information Retrieval, 25th ACM-SIGIR, Tampere, Finland, August.
13. Pôssas, B., Ziviani, N., Meira-Jr, W. Enhancing the set-based model using proximity information. Proceedings of the 9th International Symposium of String Processing and Information Retrieval, Lisbon, Portugal, September 2002.
14. Pôssas, B., Ziviani, N., Meira-Jr, W., Ribeiro-Neto, B. Modelagem vetorial estendida por regras de associação. XVI Simpósio Brasileiro de Banco de Dados, Rio de Janeiro, Brasil.
15. Pôssas, B., Ziviani, N., Meira-Jr, W., Ribeiro-Neto, B. Set-based model: A new approach for information retrieval. Proceedings of the 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, August.
16. Salton, G. (ed.) The SMART retrieval system: experiments in automatic document processing. Englewood Cliffs, NJ: Prentice Hall.
17. Salton, G., Lesk, M. E. Computer evaluation of indexing and text processing. Journal of the ACM, 15(1):8-36, January.
18. Salton, G., McGill, M. J. Introduction to modern information retrieval. McGraw-Hill, New York.
19. Shaw, W. M., Wood, R. E., Tiboo, H. R. The cystic fibrosis database: Content and research opportunities. Library and Information Science Research, 13.
20. Voorhees, E. M. Query expansion using lexical-semantic relations. Proceedings of the 17th ACM-SIGIR Conference, 1993.
21. Wong, S. K. M., Raghavan, V. V. The vector space model of information retrieval: A reevaluation. Proceedings of the 7th annual international ACM SIGIR conference on Research and development in information retrieval, Cambridge, England.
22. Wong, S. K. M., Ziarko, W., Raghavan, V. V., Wong, P. C. N. On modeling of information retrieval concepts in vector spaces. Proceedings of the ACM Transactions on Database Systems, Volume 12, New York, NY, USA, June 1987.
23. Wong, S. K. M., Ziarko, W., Wong, P. C. N. Generalized vector space model in information retrieval. Proceedings of the 8th ACM-SIGIR Conference on Research and Development in Information Retrieval. New York, USA, 1985.
More informationA Universal Model for XML Information Retrieval
A Universal Model for XML Information Retrieval Maria Izabel M. Azevedo 1, Lucas Pantuza Amorim 2, and Nívio Ziviani 3 1 Department of Computer Science, State University of Montes Claros, Montes Claros,
More informationA RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH
A RECOMMENDER SYSTEM FOR SOCIAL BOOK SEARCH A thesis Submitted to the faculty of the graduate school of the University of Minnesota by Vamshi Krishna Thotempudi In partial fulfillment of the requirements
More informationA mining method for tracking changes in temporal association rules from an encoded database
A mining method for tracking changes in temporal association rules from an encoded database Chelliah Balasubramanian *, Karuppaswamy Duraiswamy ** K.S.Rangasamy College of Technology, Tiruchengode, Tamil
More informationA New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval
Information and Management Sciences Volume 18, Number 4, pp. 299-315, 2007 A New Approach for Automatic Thesaurus Construction and Query Expansion for Document Retrieval Liang-Yu Chen National Taiwan University
More informationDiscovering interesting rules from financial data
Discovering interesting rules from financial data Przemysław Sołdacki Institute of Computer Science Warsaw University of Technology Ul. Andersa 13, 00-159 Warszawa Tel: +48 609129896 email: psoldack@ii.pw.edu.pl
More informationhighest cosine coecient [5] are returned. Notice that a query can hit documents without having common terms because the k indexing dimensions indicate
Searching Information Servers Based on Customized Proles Technical Report USC-CS-96-636 Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationmodern database systems lecture 4 : information retrieval
modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation
More informationABSTRACT. VENKATESH, JAYASHREE. Pairwise Document Similarity using an Incremental Approach to TF-IDF. (Under the direction of Dr. Christopher Healey.
ABSTRACT VENKATESH, JAYASHREE. Pairwise Document Similarity using an Incremental Approach to TF-IDF. (Under the direction of Dr. Christopher Healey.) Advances in information and communication technologies
More informationModern Information Retrieval
Modern Information Retrieval Chapter 5 Relevance Feedback and Query Expansion Introduction A Framework for Feedback Methods Explicit Relevance Feedback Explicit Feedback Through Clicks Implicit Feedback
More informationAN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE
AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3
More informationA Content Vector Model for Text Classification
A Content Vector Model for Text Classification Eric Jiang Abstract As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications.
More informationUsing Query History to Prune Query Results
Using Query History to Prune Query Results Daniel Waegel Ursinus College Department of Computer Science dawaegel@gmail.com April Kontostathis Ursinus College Department of Computer Science akontostathis@ursinus.edu
More informationBoolean Model. Hongning Wang
Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer
More informationA Conflict-Based Confidence Measure for Associative Classification
A Conflict-Based Confidence Measure for Associative Classification Peerapon Vateekul and Mei-Ling Shyu Department of Electrical and Computer Engineering University of Miami Coral Gables, FL 33124, USA
More informationAssociation Rule Mining. Entscheidungsunterstützungssysteme
Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
More informationPSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department
More informationDiscovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree
Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania
More informationMining Frequent Patterns with Counting Inference at Multiple Levels
International Journal of Computer Applications (097 7) Volume 3 No.10, July 010 Mining Frequent Patterns with Counting Inference at Multiple Levels Mittar Vishav Deptt. Of IT M.M.University, Mullana Ruchika
More informationData Mining Part 3. Associations Rules
Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets
More informationDiscovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method
Discovery of Multi Dimensional Quantitative Closed Association Rules by Attributes Range Method Preetham Kumar, Ananthanarayana V S Abstract In this paper we propose a novel algorithm for discovering multi
More informationImproved Frequent Pattern Mining Algorithm with Indexing
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.
More informationCLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper
More informationAssociation Rule Mining from XML Data
144 Conference on Data Mining DMIN'06 Association Rule Mining from XML Data Qin Ding and Gnanasekaran Sundarraj Computer Science Program The Pennsylvania State University at Harrisburg Middletown, PA 17057,
More informationDiscovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *
Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Galina Bogdanova, Tsvetanka Georgieva Abstract: Association rules mining is one kind of data mining techniques
More informationPseudo-Relevance Feedback and Title Re-Ranking for Chinese Information Retrieval
Pseudo-Relevance Feedback and Title Re-Ranking Chinese Inmation Retrieval Robert W.P. Luk Department of Computing The Hong Kong Polytechnic University Email: csrluk@comp.polyu.edu.hk K.F. Wong Dept. Systems
More informationMining High Order Decision Rules
Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high
More informationAn Algorithm for Frequent Pattern Mining Based On Apriori
An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior
More informationX. A Relevance Feedback System Based on Document Transformations. S. R. Friedman, J. A. Maceyak, and S. F. Weiss
X-l X. A Relevance Feedback System Based on Document Transformations S. R. Friedman, J. A. Maceyak, and S. F. Weiss Abstract An information retrieval system using relevance feedback to modify the document
More informationMining Generalized Sequential Patterns using Genetic Programming
Mining Generalized Sequential Patterns using Genetic Programming Sandra de Amo Universidade Federal de Uberlândia Faculdade de Computação Uberlândia MG - Brazil deamo@ufu.br Ary dos Santos Rocha Jr. Universidade
More informationPurna Prasad Mutyala et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011,
Weighted Association Rule Mining Without Pre-assigned Weights PURNA PRASAD MUTYALA, KUMAR VASANTHA Department of CSE, Avanthi Institute of Engg & Tech, Tamaram, Visakhapatnam, A.P., India. Abstract Association
More informationEVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES
EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES Veronica Oliveira de Carvalho Professor of Centro Universitário de Araraquara Araraquara, São Paulo, Brazil Student of São Paulo University
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationModern Information Retrieval
Modern Information Retrieval Chapter 3 Retrieval Evaluation Retrieval Performance Evaluation Reference Collections CFC: The Cystic Fibrosis Collection Retrieval Evaluation, Modern Information Retrieval,
More informationvector space retrieval many slides courtesy James Amherst
vector space retrieval many slides courtesy James Allan@umass Amherst 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the
More informationAn Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008 121 An Efficient Reduced Pattern Count Tree Method for Discovering Most Accurate Set of Frequent itemsets
More informationThe Effect of Word Sampling on Document Clustering
The Effect of Word Sampling on Document Clustering OMAR H. KARAM AHMED M. HAMAD SHERIN M. MOUSSA Department of Information Systems Faculty of Computer and Information Sciences University of Ain Shams,
More informationA Patent Retrieval Method Using a Hierarchy of Clusters at TUT
A Patent Retrieval Method Using a Hierarchy of Clusters at TUT Hironori Doi Yohei Seki Masaki Aono Toyohashi University of Technology 1-1 Hibarigaoka, Tenpaku-cho, Toyohashi-shi, Aichi 441-8580, Japan
More informationAssociation Rule Mining. Introduction 46. Study core 46
Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent
More informationUsing Statistical Properties of Text to Create. Metadata. Computer Science and Electrical Engineering Department
Using Statistical Properties of Text to Create Metadata Grace Crowder crowder@cs.umbc.edu Charles Nicholas nicholas@cs.umbc.edu Computer Science and Electrical Engineering Department University of Maryland
More informationMining of Web Server Logs using Extended Apriori Algorithm
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational
More informationWEIGHTING QUERY TERMS USING WORDNET ONTOLOGY
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk
More informationHandling Missing Values via Decomposition of the Conditioned Set
Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,
More informationDIVERSITY-BASED INTERESTINGNESS MEASURES FOR ASSOCIATION RULE MINING
DIVERSITY-BASED INTERESTINGNESS MEASURES FOR ASSOCIATION RULE MINING Huebner, Richard A. Norwich University rhuebner@norwich.edu ABSTRACT Association rule interestingness measures are used to help select
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval
More informationReactive Ranking for Cooperative Databases
Reactive Ranking for Cooperative Databases Berthier A. Ribeiro-Neto Guilherme T. Assis Computer Science Department Federal University of Minas Gerais Brazil berthiertavares @dcc.ufmg.br Abstract A cooperative
More informationA Novel Texture Classification Procedure by using Association Rules
ITB J. ICT Vol. 2, No. 2, 2008, 03-4 03 A Novel Texture Classification Procedure by using Association Rules L. Jaba Sheela & V.Shanthi 2 Panimalar Engineering College, Chennai. 2 St.Joseph s Engineering
More informationAn Apriori-like algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents
An Apriori-lie algorithm for Extracting Fuzzy Association Rules between Keyphrases in Text Documents Guy Danon Department of Information Systems Engineering Ben-Gurion University of the Negev Beer-Sheva
More informationA Comparative Study of Association Rules Mining Algorithms
A Comparative Study of Association Rules Mining Algorithms Cornelia Győrödi *, Robert Győrödi *, prof. dr. ing. Stefan Holban ** * Department of Computer Science, University of Oradea, Str. Armatei Romane
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationTadeusz Morzy, Maciej Zakrzewicz
From: KDD-98 Proceedings. Copyright 998, AAAI (www.aaai.org). All rights reserved. Group Bitmap Index: A Structure for Association Rules Retrieval Tadeusz Morzy, Maciej Zakrzewicz Institute of Computing
More informationUsing Association Rules for Better Treatment of Missing Values
Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University
More informationA NEW ASSOCIATION RULE MINING BASED ON FREQUENT ITEM SET
A NEW ASSOCIATION RULE MINING BASED ON FREQUENT ITEM SET Ms. Sanober Shaikh 1 Ms. Madhuri Rao 2 and Dr. S. S. Mantha 3 1 Department of Information Technology, TSEC, Bandra (w), Mumbai s.sanober1@gmail.com
More informationInformation Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer
More informationA New Technique to Optimize User s Browsing Session using Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationOptimization using Ant Colony Algorithm
Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department
More informationA novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems
A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics
More informationModern Information Retrieval
Modern Information Retrieval Ricardo Baeza-Yates Berthier Ribeiro-Neto ACM Press NewYork Harlow, England London New York Boston. San Francisco. Toronto. Sydney Singapore Hong Kong Tokyo Seoul Taipei. New
More informationSimilarity search in multimedia databases
Similarity search in multimedia databases Performance evaluation for similarity calculations in multimedia databases JO TRYTI AND JOHAN CARLSSON Bachelor s Thesis at CSC Supervisor: Michael Minock Examiner:
More informationA Search Relevancy Tuning Method Using Expert Results Content Evaluation
A Search Relevancy Tuning Method Using Expert Results Content Evaluation Boris Mark Tylevich Chair of System Integration and Management Moscow Institute of Physics and Technology Moscow, Russia email:boris@tylevich.ru
More informationRetrieval Evaluation
Retrieval Evaluation - Reference Collections Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Modern Information Retrieval, Chapter
More informationAssociating Terms with Text Categories
Associating Terms with Text Categories Osmar R. Zaïane Department of Computing Science University of Alberta Edmonton, AB, Canada zaiane@cs.ualberta.ca Maria-Luiza Antonie Department of Computing Science
More informationPerformance Measures for Multi-Graded Relevance
Performance Measures for Multi-Graded Relevance Christian Scheel, Andreas Lommatzsch, and Sahin Albayrak Technische Universität Berlin, DAI-Labor, Germany {christian.scheel,andreas.lommatzsch,sahin.albayrak}@dai-labor.de
More informationTemporal Weighted Association Rule Mining for Classification
Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider
More informationData Mining: Mining Association Rules. Definitions. .. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..
.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Mining Association Rules Definitions Market Baskets. Consider a set I = {i 1,...,i m }. We call the elements of I, items.
More informationModeling the Real World for Data Mining: Granular Computing Approach
Modeling the Real World for Data Mining: Granular Computing Approach T. Y. Lin Department of Mathematics and Computer Science San Jose State University San Jose California 95192-0103 and Berkeley Initiative
More informationPerformance Based Study of Association Rule Algorithms On Voter DB
Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,
More informationARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES
ARCHITECTURE AND IMPLEMENTATION OF A NEW USER INTERFACE FOR INTERNET SEARCH ENGINES Fidel Cacheda, Alberto Pan, Lucía Ardao, Angel Viña Department of Tecnoloxías da Información e as Comunicacións, Facultad
More informationA recommendation engine by using association rules
Available online at www.sciencedirect.com Procedia - Social and Behavioral Sciences 62 ( 2012 ) 452 456 WCBEM 2012 A recommendation engine by using association rules Ozgur Cakir a 1, Murat Efe Aras b a
More informationConcept-Based Interactive Query Expansion
Concept-Based Interactive Query Expansion Bruno M. Fonseca 12 maciel@dcc.ufmg.br Paulo Golgher 2 golgher@akwan.com.br Bruno Pôssas 12 bavep@akwan.com.br Berthier Ribeiro-Neto 1 2 berthier@dcc.ufmg.br Nivio
More informationOutline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity
Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using
More informationInverted List Caching for Topical Index Shards
Inverted List Caching for Topical Index Shards Zhuyun Dai and Jamie Callan Language Technologies Institute, Carnegie Mellon University {zhuyund, callan}@cs.cmu.edu Abstract. Selective search is a distributed
More informationCHAPTER 3 ASSOCIATON RULE BASED CLUSTERING
41 CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING 3.1 INTRODUCTION This chapter describes the clustering process based on association rule mining. As discussed in the introduction, clustering algorithms have
More informationInformation Retrieval. Information Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Information Retrieval The indexing and retrieval of textual documents. Searching for pages on the World Wide Web is the most recent
More informationRelevance Feedback and Query Reformulation. Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price. Outline
Relevance Feedback and Query Reformulation Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price IR on the Internet, Spring 2010 1 Outline Query reformulation Sources of relevance
More informationDocument Expansion for Text-based Image Retrieval at CLEF 2009
Document Expansion for Text-based Image Retrieval at CLEF 2009 Jinming Min, Peter Wilkins, Johannes Leveling, and Gareth Jones Centre for Next Generation Localisation School of Computing, Dublin City University
More informationUsing Coherence-based Measures to Predict Query Difficulty
Using Coherence-based Measures to Predict Query Difficulty Jiyin He, Martha Larson, and Maarten de Rijke ISLA, University of Amsterdam {jiyinhe,larson,mdr}@science.uva.nl Abstract. We investigate the potential
More informationA Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm
A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of
More informationEncoding Words into String Vectors for Word Categorization
Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,
More informationMining Spatial Gene Expression Data Using Association Rules
Mining Spatial Gene Expression Data Using Association Rules M.Anandhavalli Reader, Department of Computer Science & Engineering Sikkim Manipal Institute of Technology Majitar-737136, India M.K.Ghose Prof&Head,
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationMinoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University
Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University
More informationK-Means Clustering With Initial Centroids Based On Difference Operator
K-Means Clustering With Initial Centroids Based On Difference Operator Satish Chaurasiya 1, Dr.Ratish Agrawal 2 M.Tech Student, School of Information and Technology, R.G.P.V, Bhopal, India Assistant Professor,
More informationFinding the boundaries of attributes domains of quantitative association rules using abstraction- A Dynamic Approach
7th WSEAS International Conference on APPLIED COMPUTER SCIENCE, Venice, Italy, November 21-23, 2007 52 Finding the boundaries of attributes domains of quantitative association rules using abstraction-
More informationKnowledge Engineering in Search Engines
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2012 Knowledge Engineering in Search Engines Yun-Chieh Lin Follow this and additional works at:
More informationModel for Load Balancing on Processors in Parallel Mining of Frequent Itemsets
American Journal of Applied Sciences 2 (5): 926-931, 2005 ISSN 1546-9239 Science Publications, 2005 Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets 1 Ravindra Patel, 2 S.S.
More information