SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web *
Leyun Pan, Liang Zhang, and Fanyuan Ma
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
{pan-ly, zhangliang}@cs.sjtu.edu.cn, fyma@sjtu.edu.cn

Abstract. Semantic web technology is seen as a key to realizing peer-to-peer resource discovery and service combination in ubiquitous communication environments. However, in a peer-to-peer (P2P) environment, individual peers maintain their own view of the domain in terms of the organization of their local information sources, so ontology heterogeneity among peers is becoming an increasingly important issue. In this paper, we propose a multi-strategy learning approach to resolve this problem. We describe the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between ontologies. We use a general statistical classification method to discover category features in data instances, and the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system then combines the predictions of the individual methods using our matching committee rule, called the Best Outstanding Champion. Experiments show that the SIMON system achieves high accuracy on a real-world domain.

1 Introduction

Today's P2P solutions support only limited update, search and retrieval functionality, which makes current P2P systems unsuitable for knowledge sharing purposes. Metadata plays a central role in the effort to provide search techniques that go beyond string matching. Ontology-based metadata facilitates access to domain knowledge. Furthermore, it enables the construction of semantic queries [1].
Existing approaches to ontology-based information access almost always assume a setting in which information providers share an ontology that is used to access the information. In a P2P system, however, we face a situation in which individual peers maintain their own view of the domain in terms of the organization of the local file system and other information sources. Enforcing the use of a global ontology in such an environment would mean giving up the benefits of the P2P approach.

* Research described in this paper is supported by The Science & Technology Committee of Shanghai Municipality Key Project Grant 02DJ14045 and by The Science & Technology Committee of Shanghai Municipality Key Technologies R&D Project Grant 03dz

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp , Springer-Verlag Berlin Heidelberg 2004
We can consider the process of addressing semantic heterogeneity as a process of ontology matching (ontology mapping) [2]. Matching processes typically involve analyzing the data instances associated with ontologies and comparing them to determine the correspondence among concepts. Given two ontologies in the same domain, we can find the most similar concept node in one ontology for each concept node in the other. However, at Internet scale, finding such mappings manually is tedious, error-prone, and clearly not feasible. Manual matching cannot satisfy the need for online ontology exchange between two peers that are not in agreement. Hence, we need approaches that support (semi-)automatic ontology matching.

In this paper, we discuss the use of the data instances associated with an ontology for addressing semantic heterogeneity. We propose the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between a pair of ontologies that are homogeneous and whose elements overlap significantly. Given the source ontology B and the target ontology A, for each concept node in target ontology A, we can find the most similar concept node from source ontology B. SIMON treats ontology A and its data instances as the learning resource. All concept nodes in ontology A are the classification categories, and the data instances of each concept are the labeled learning samples of the classification process. The data instances of the concept nodes in ontology B are the unseen samples. SIMON classifies the instances of each node in ontology B into the categories of ontology A according to the classifiers trained for A. SIMON uses multiple learning strategies, namely multiple classifiers. Each classifier exploits a different type of information, either in the data instances or in the semantic relations among these data instances.
Using an appropriate matching committee method, we can obtain better results than with a single classifier.

This paper is organized as follows. In the next section, we give an overview of the ontology matching system. In Section 3, we discuss multi-strategy classification for ontology matching. Section 4 presents the experimental results obtained with our SIMON system. Section 5 reviews related work. We give the conclusions and future work in Section 6.

2 Overview of the Ontology Matching System

The ontology matching system is trained to compare two ontologies and to find the correspondence among concept nodes. An example of such a task is illustrated in Figure 1 and Figure 2, which show two ontologies of movie databases. When a software agent wants to collect information about movies, it accesses a P2P system of movie data. The movie information on individual peers is marked up using some ontology such as the one in Figure 1 or Figure 2. Here the data is organized into a hierarchical structure that includes movie, person, company, awards and so on. Movies have attributes such as title, language, cast&crew, production company and genre. Some classes are linked to each other through attributes, shown in italics in the figures.

However, because each peer may use a different ontology, it is difficult for an agent that has mastered only one ontology to integrate all of the data completely. For example, an agent may consider that Movie in Allmovie is equivalent to Movie in IMDB. In fact, Movie in IMDB is just an empty ontology node, and MainMovieInfo in IMDB is the node most similar to Movie in Allmovie. Mismatches may also happen between MoviePerson and Person, GenreInstance and Genre, and Awards and Nominations and Awards.

[Fig. 1. Ontology of movie database IMDB. Classes shown: Movie, MainMovieInfo, Awards and Nominations, Company, MoviePerson, Music, GenreInstance, Actor, Director.]

[Fig. 2. Ontology of movie database Allmovie. Classes shown: Movie, Company, Person, Music, Genre, Awards, Player, Director.]

SIMON uses multi-strategy learning methods that include both statistical and first-order learning techniques. Each base learner exploits a certain type of information from the training instances to build matching hypotheses. We use a statistical bag-of-words approach to classify the pure text instances. Furthermore, the relations among concepts can help in learning the classifier. The system combines the prediction results of the individual methods using our matching committee rule, called the Best Outstanding Champion, which is a weighted voting committee. In this way, we can achieve higher matching accuracy than with any single base classifier alone.
3 Multi-strategy Learning for Ontology Matching

3.1 Statistical Text Classification

One of the methods that we use for text classification is naive Bayes, a probabilistic model that ignores word order and naively assumes that the presence of each word in a document is conditionally independent of all other words in the document given its class. Naive Bayes for text classification can be formulated as follows. Given a set of classes C = {c_1, ..., c_n} and a document consisting of k words, {w_1, ..., w_k}, we classify the document as a member of the class c^* that is most probable given the words in the document:

\[ c^* = \arg\max_{c} \Pr(c \mid w_1, \ldots, w_k) \tag{1} \]

Pr(c | w_1, ..., w_k) can be transformed into a computable expression by applying Bayes' rule (Eq. 2); rewriting the expression using the product rule and dropping the denominator, since this term is constant across all classes (Eq. 3); and assuming that words are independent of each other (Eq. 4).

\[ \Pr(c \mid w_1, \ldots, w_k) = \frac{\Pr(c)\,\Pr(w_1, \ldots, w_k \mid c)}{\Pr(w_1, \ldots, w_k)} \tag{2} \]

\[ \propto \Pr(c) \prod_{i=1}^{k} \Pr(w_i \mid c, w_1, \ldots, w_{i-1}) \tag{3} \]

\[ = \Pr(c) \prod_{i=1}^{k} \Pr(w_i \mid c) \tag{4} \]

Pr(c) is estimated as the proportion of training instances that belong to c. The key step in implementing naive Bayes is therefore estimating the word probabilities Pr(w_i | c). We use Witten-Bell smoothing [3], which depends on the relationship between the number of unique words and the total number of word occurrences in the training data for the class: if most of the word occurrences are unique words, the prior is stronger; if words are often repeated, the prior is weaker.

3.2 First-Order Text Classification

As mentioned above, the data instances under an ontology form a richly structured dataset that is best described by a graph, where the nodes are objects and the edges are links or relations between objects. The methods for classifying data instances discussed in the previous section consider only the words in a single node of the graph. They cannot learn models that take into account features such as the pattern of connectivity around a given instance, or the words occurring in the instances of neighboring nodes. For example, we can learn a rule such as "A data instance belongs to movie if it contains the words minute and release and is linked to an instance that contains the word birth." Clearly, rules of this type, which can represent general characteristics of a graph, can be exploited to improve the predictive accuracy of the learned models.

Rules of this kind can be concisely expressed in a first-order representation. We can learn to classify text instances using a learner that is able to induce first-order rules. The learning algorithm that we use in our system is Quinlan's Foil algorithm [4].
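Before turning to Foil in detail, the statistical learner of Section 3.1 can be made concrete with a small sketch. It is an illustration under stated assumptions, not the SIMON implementation: the function names (`train`, `witten_bell`, `classify`) and the toy instances are hypothetical, and the smoothing uses the common unigram form of Witten-Bell, in which a class with N word occurrences and T distinct words reserves probability mass T/(N+T) for words it has not seen and shares it equally among them. Each class is assumed to have at least one training word.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Collect per-class statistics from (category, list_of_words) pairs."""
    class_docs = Counter()              # number of training instances per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for category, words in labeled_docs:
        class_docs[category] += 1
        word_counts[category].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def witten_bell(word, counts, vocab_size):
    """Smoothed Pr(word | class) in the common unigram Witten-Bell form."""
    n = sum(counts.values())  # total word occurrences in the class
    t = len(counts)           # distinct words seen in the class
    z = vocab_size - t        # vocabulary words never seen in the class
    if counts[word] > 0:
        return counts[word] / (n + t)
    # The reserved mass t / (n + t) is shared equally by the z unseen words.
    return t / (max(z, 1) * (n + t))


def classify(words, model):
    """Return the most probable class (Eq. 1) using the log of Eq. 4."""
    class_docs, word_counts, vocab = model
    total = sum(class_docs.values())
    scores = {}
    for c in class_docs:
        score = math.log(class_docs[c] / total)  # log Pr(c)
        for w in words:
            if w in vocab:  # words outside the training vocabulary are skipped
                score += math.log(witten_bell(w, word_counts[c], len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: categories of the target ontology with their words.
model = train([
    ("Movie", ["minute", "release", "plot", "minute"]),
    ("Person", ["birth", "biography", "born"]),
])
label, _ = classify(["minute", "minute", "release"], model)
print(label)
```

Working in log space avoids numeric underflow when instances contain many words; it does not change which class maximizes Eq. 4.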
Foil is a greedy covering algorithm for learning function-free Horn clause definitions of a relation in terms of itself and other relations. Foil induces each Horn clause by beginning with an empty tail and using a hill-climbing search to add literals to the tail until the clause covers only positive instances. When Foil is used as a classification method, the input file for learning a category consists of the following relations:

1. category(instance): This is the target relation that will be learned from the other background relations. Each learned target relation represents a classification rule for a category.
2. has_word(instance): This set of relations, one for each word, indicates which words occur in which instances. The samples belonging to a specific has_word relation are the instances in which the word "word" occurs.
3. linkto(instance, instance): This relation represents the semantic relations between two data instances.
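To illustrate how these relations fit together, the following sketch encodes a handful of background facts and checks the example rule quoted earlier (an instance belongs to movie if it contains the words minute and release and is linked to an instance that contains the word birth). It only evaluates a rule of the kind Foil might induce; it does not implement Foil's search. The instance identifiers, the facts, and the function name `movie` are all hypothetical.

```python
# Background relations as hypothetical toy facts.
# has_word[w] is the set of instances in which the word w occurs.
has_word = {
    "minute": {"m1", "m2"},
    "release": {"m1", "m2", "p2"},
    "birth": {"p1", "p2"},
}

# linkto holds (source, destination) pairs of semantically related instances.
linkto = {("m1", "p1"), ("m2", "c1"), ("p2", "p1")}


def movie(x):
    """movie(X) :- has_minute(X), has_release(X), linkto(X, Y), has_birth(Y)."""
    if x not in has_word["minute"] or x not in has_word["release"]:
        return False
    # The clause is satisfied if some linked instance Y contains "birth".
    return any(src == x and dst in has_word["birth"] for src, dst in linkto)


instances = ["m1", "m2", "p1", "p2", "c1"]
covered = [x for x in instances if movie(x)]
print(covered)
```

In this toy data only `m1` satisfies every literal: `m2` contains both words but links to an instance without the word birth, which shows how the link literal narrows what the word literals alone would cover.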
We apply Foil to learn a separate set of clauses for every concept node in the ontology. When classifying the other ontology's data instances, if an instance does not match any clause of any category, we treat it as an instance of the "other" category.

3.3 Evaluation of Classifiers for Matching and Matching Committees

The method of committees (a.k.a. ensembles) is based on the idea that, given a task that requires expert knowledge to perform, k experts may be better than one if their individual judgments are appropriately combined [7]. For obtaining a matching result, there are two different matching committee methods, depending on whether a classifier committee is used:

microcommittees: The system first uses a classifier committee, which negotiates the category of each unseen data instance. The system then makes the matching decision on the basis of this single classification result.

macrocommittees: The system does not use a classifier committee. Each classifier individually decides the category of each unseen data instance. The system then negotiates the matching on the basis of the multiple classification results.

To optimize the result of the combination, we would generally like to give each member of a committee a weight that reflects its expected relative effectiveness. There are some differences between the evaluation of text classification and that of ontology matching. In text classification, the initial corpus can easily be split into two sets: a training(-and-validation) set and a test set. In the ontology matching process, however, the boundary among the training set, the test set and the unseen data instance set is not obvious. First, the test set is absent in the ontology matching process, in which the instances of the target ontology are regarded as the training set and the instances of the source ontology are regarded as unseen samples.
Second, the unseen data instances are not completely unseen, because the instances of the source ontology all have labels; we just do not know what each label means. Because of the absence of a test set, it is difficult to evaluate the classifiers in microcommittees. Microcommittees can only rely on prior experience and set the classifier weights manually, as was done in [2].

We adopt macrocommittees in our ontology matching system. Note that the instances of the source ontology are only relatively unseen. When these instances are classified, the unit is not a single instance but a whole category, so we can observe how the instances of a category are distributed. Each classifier will find a champion, the category of the target ontology that gains the maximal similarity degree. Among these champions, some may have an obvious lead, while others may be only slightly ahead of the other nodes. Generally, the more outstanding a champion is, the more we believe it. Thus we can adopt the degree of outstandingness of a candidate as the measure of the effectiveness of each classifier. The degree of outstandingness can be observed from the classification results and need not be adjusted and optimized on a validation set.

We propose a matching committee rule called the Best Outstanding Champion, under which the system chooses as the final champion the champion-candidate with the maximal accumulated degree of outstandingness. The method can be regarded as a weighted voting committee. Each classifier casts a vote for the node that it judges most similar, but each vote has a different weight, measured by the degree of the champion's outstandingness. We define the degree of outstandingness as the ratio of the champion's similarity degree to that of the second-ranked node.
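The rule above can be sketched in a few lines. This is a minimal illustration that assumes each classifier reports, for one source category, the fraction of its instances assigned to each target category, and that this fraction serves as the similarity degree. The function names and all numbers are hypothetical and are not taken from the paper's result tables.

```python
from collections import defaultdict


def champion_vote(distribution):
    """Return one classifier's champion and its degree of outstandingness.

    distribution maps each target category to the share of the source
    category's instances assigned to it (at least two categories assumed).
    """
    ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
    (champion, top), (_, second) = ranked[0], ranked[1]
    # Degree of outstandingness: champion's share over the runner-up's share.
    weight = top / second if second > 0 else float("inf")
    return champion, weight


def best_outstanding_champion(distributions):
    """Weighted vote: pick the candidate with the largest accumulated weight."""
    totals = defaultdict(float)
    for distribution in distributions:
        champion, weight = champion_vote(distribution)
        totals[champion] += weight
    return max(totals, key=totals.get), dict(totals)


# Hypothetical shares for one source category under two classifiers.
statistical = {"Director": 0.35, "Actor": 0.30, "MoviePerson": 0.20, "Company": 0.15}
first_order = {"Actor": 0.56, "Director": 0.20, "MoviePerson": 0.14, "Company": 0.10}

winner, totals = best_outstanding_champion([statistical, first_order])
print(winner, totals)
```

In this made-up case the statistical classifier's champion is only narrowly ahead (a ratio of about 1.17), while the first-order classifier's champion leads clearly (a ratio of 2.8), so the committee follows the first-order classifier without any weights having been tuned on a validation set.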
4 Experiments

We take movies as our experimental domain. As experimental objects we choose the first three movie websites listed in the Google Directory category Arts > Movies > Databases: IMDB, AllMovie and Rotten Tomatoes. We manually match the three ontologies to each other in order to measure the matching accuracy, defined as the percentage of the manual mappings that the machine predicted correctly. We found about 150 movies on each website. We then exchanged the keywords and found a further 300 movies. Each ontology therefore holds about 400 movie data instances after removing duplicates. We use a three-fold cross-matching methodology to evaluate our algorithms. We conduct three runs, each consisting of two experiments that map a pair of ontologies to each other. In each experiment, we train the classifiers using the data instances of the target ontology and classify the data instances of the source ontology to find the matching pairs from the source ontology to the target ontology.

Table 1. Result matrices of the statistical classifier and the first-order classifier

Table 1 shows the classification result matrices for some of the categories in the AllMovie-IMDB experiment, for the statistical classifier and the first-order classifier respectively (the numbers in parentheses are the results of the first-order classifier). Each column of the matrix represents one category of the source ontology AllMovie and shows how the instances of this category are classified into the categories of the target ontology IMDB. Boldface indicates the leading candidate in each column.

These matrices illustrate several interesting results. First, note that for most classes the coverage of the champion is high enough for a matching judgment. For example, 63% of the Movie column in the statistical classifier and 56% of the Player column in the first-order classifier are correctly classified.
Second, there are notable exceptions to this trend: Player and Director in the statistical classifier, and Movie and Person in the first-order classifier. The results of the Player column in the statistical classifier would lead to a wrong matching decision, in which Player in AllMovie is matched not to Actor but to Director in IMDB. In other columns, the first and second candidates are so close that we cannot fully trust the matching results derived from these classification results. The low classification coverage of the champion for Player and Director is explained by the characteristics of these categories: both lack feature properties.
For this reason, many of the instances of these two categories are scattered across many other categories. However, our first-order classifier can remedy this shortcoming. By mining the information in neighboring instances (awards and nominations), we can learn rules for the two categories and classify most instances into the proper categories, because a Player often wins best actor awards and vice versa.

The neighboring instances do not always provide correct evidence for classification. The Movie column and the Person column in Table 1 are examples of this situation. Because many data instances of these two categories link to each other, the effectiveness of the learned rules decreases. Fortunately, the statistical classifier gives ideal classification results for these two categories. By using our matching committee rule, we can easily integrate the better classification results of both classifiers. After calculating and comparing the degrees of outstandingness, we place more trust in the matching results for Movie and Person from the statistical classifier and in those for Player and Director from the first-order classifier.

[Fig. 3. Ontology matching accuracy. Bars: statistical learner, first-order learner, matching committee. Groups: AllMovie to IMDB, IMDB to AllMovie, RT to IMDB, IMDB to RT, RT to AllMovie, AllMovie to RT.]

Figure 3 shows the three runs and six groups of experimental results. In each run we match two ontologies to each other, and there is only a small difference between the two experimental results. The three bars in each experiment represent the matching accuracy produced by: (1) the statistical learner alone, (2) the first-order learner alone, and (3) the matching committee using the previous two learners.

5 Related Work

From the perspective of ontology matching using data instances, several works are related to our system.
In [2], some strategies classify the data instances, and another strategy, the Relaxation Labeler, searches for the mapping configuration that best satisfies the given domain constraints and heuristic knowledge. In contrast, automated text classification is the core of our system. We focus on fully mining the data instances for automated classification and ontology matching. By constructing the classification samples according to the feature property set and exploiting the classification features within and among data instances, we can make the fullest use of text classification methods.
Furthermore, as regards the combination of multiple learning strategies, [2] uses microcommittees and sets the classifier weights manually. In our system, we instead adopt the degree of outstandingness as the weight of each classifier, which can be computed from the classification results. Without using any domain or heuristic knowledge, our system can automatically achieve matching accuracy similar to that in [2]. [5] also compares ontologies using similarity measures, but computes the similarity between lexical entries. [6] describes the use of the FOIL algorithm in classification and extraction for constructing knowledge bases from the web.

6 Conclusions

The completely distributed nature and the high degree of autonomy of individual peers in a P2P system bring new challenges for the use of semantic descriptions. We propose a multi-strategy learning approach for resolving ontology heterogeneity in P2P systems. In this paper, we introduce the SIMON system and describe its key techniques. We take movies as our experimental domain and extract the ontologies and the data instances from three different movie database websites. We use a general statistical classification method to discover category features in data instances and the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system combines their outcomes using our matching committee rule, called the Best Outstanding Champion. A series of experimental results shows that our approach achieves higher accuracy on a real-world domain.

References

1. J. Broekstra, M. Ehrig, P. Haase. A Metadata Model for Semantics-Based Peer-to-Peer Systems. Proceedings of SemPGRID '03, 1st Workshop on Semantics in Peer-to-Peer and Grid Computing.
2. A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to Map between Ontologies on the Semantic Web. In Proceedings of the World Wide Web Conference (WWW-2002).
3. I. H. Witten, T. C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in text compression. IEEE Transactions on Information Theory, 37(4), July.
4. J. R. Quinlan, R. M. Cameron-Jones. FOIL: A midterm report. In Proceedings of the European Conference on Machine Learning, pages 3-20, Vienna, Austria.
5. A. Maedche, S. Staab. Comparing Ontologies - Similarity Measures and a Comparison Study. Internal Report No. 408, Institute AIFB, University of Karlsruhe, March.
6. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell. Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, Elsevier.
7. F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, Vol. 34, No. 1, March 2002.
More informationAn Efficient Hash-based Association Rule Mining Approach for Document Clustering
An Efficient Hash-based Association Rule Mining Approach for Document Clustering NOHA NEGM #1, PASSENT ELKAFRAWY #2, ABD-ELBADEEH SALEM * 3 # Faculty of Science, Menoufia University Shebin El-Kom, EGYPT
More informationImproving Suffix Tree Clustering Algorithm for Web Documents
International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationQuery Difficulty Prediction for Contextual Image Retrieval
Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.
More informationOntology Matching with CIDER: Evaluation Report for the OAEI 2008
Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task
More informationCost-sensitive Boosting for Concept Drift
Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems
More informationMotivating Ontology-Driven Information Extraction
Motivating Ontology-Driven Information Extraction Burcu Yildiz 1 and Silvia Miksch 1, 2 1 Institute for Software Engineering and Interactive Systems, Vienna University of Technology, Vienna, Austria {yildiz,silvia}@
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationIMDB Film Prediction with Cross-validation Technique
IMDB Film Prediction with Cross-validation Technique Shivansh Jagga 1, Akhil Ranjan 2, Prof. Siva Shanmugan G 3 1, 2, 3 Department of Computer Science and Technology 1, 2, 3 Vellore Institute Of Technology,
More informationA Roadmap to an Enhanced Graph Based Data mining Approach for Multi-Relational Data mining
A Roadmap to an Enhanced Graph Based Data mining Approach for Multi-Relational Data mining D.Kavinya 1 Student, Department of CSE, K.S.Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India 1
More informationUsing Association Rules for Better Treatment of Missing Values
Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University
More informationOntology Extraction from Heterogeneous Documents
Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg
More informationImproving Recognition through Object Sub-categorization
Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,
More informationDesign of Ontology for The Internet Movie Database (IMDb) Sasikanth Avancha, Srikanth Kallurkar, Tapan Kamdar
Design of Ontology for The Internet Movie Database (IMDb) Sasikanth Avancha, Srikanth Kallurkar, Tapan Kamdar {savanc1,skallu1,kamdar}@cs.umbc.edu Semester Project, CMSC 771, Spring 2001 Table of Contents
More informationFOAM Framework for Ontology Alignment and Mapping Results of the Ontology Alignment Evaluation Initiative
FOAM Framework for Ontology Alignment and Mapping Results of the Ontology Alignment Evaluation Initiative Marc Ehrig Institute AIFB University of Karlsruhe 76128 Karlsruhe, Germany ehrig@aifb.uni-karlsruhe.de
More informationStructure of Association Rule Classifiers: a Review
Structure of Association Rule Classifiers: a Review Koen Vanhoof Benoît Depaire Transportation Research Institute (IMOB), University Hasselt 3590 Diepenbeek, Belgium koen.vanhoof@uhasselt.be benoit.depaire@uhasselt.be
More informationLearning to Construct Knowledge Bases from the World Wide Web
Learning to Construct Knowledge Bases from the World Wide Web Mark Craven a Dan DiPasquo a Dayne Freitag b Andrew McCallum a,b Tom Mitchell a Kamal Nigam a Seán Slattery a a School of Computer Science,
More informationComparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem
Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Stefan Hauger 1, Karen H. L. Tso 2, and Lars Schmidt-Thieme 2 1 Department of Computer Science, University of
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationAutomatic New Topic Identification in Search Engine Transaction Log Using Goal Programming
Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 Automatic New Topic Identification in Search Engine Transaction Log
More informationResource Load Balancing Based on Multi-agent in ServiceBSP Model*
Resource Load Balancing Based on Multi-agent in ServiceBSP Model* Yan Jiang 1, Weiqin Tong 1, and Wentao Zhao 2 1 School of Computer Engineering and Science, Shanghai University 2 Image Processing and
More informationDiscovering Advertisement Links by Using URL Text
017 3rd International Conference on Computational Systems and Communications (ICCSC 017) Discovering Advertisement Links by Using URL Text Jing-Shan Xu1, a, Peng Chang, b,* and Yong-Zheng Zhang, c 1 School
More informationContext Ontology Construction For Cricket Video
Context Ontology Construction For Cricket Video Dr. Sunitha Abburu Professor& Director, Department of Computer Applications Adhiyamaan College of Engineering, Hosur, pin-635109, Tamilnadu, India Abstract
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationApplication of Support Vector Machine Algorithm in Spam Filtering
Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification
More informationContent-based Dimensionality Reduction for Recommender Systems
Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationFast Mode Decision for H.264/AVC Using Mode Prediction
Fast Mode Decision for H.264/AVC Using Mode Prediction Song-Hak Ri and Joern Ostermann Institut fuer Informationsverarbeitung, Appelstr 9A, D-30167 Hannover, Germany ri@tnt.uni-hannover.de ostermann@tnt.uni-hannover.de
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW PAPER ON IMPLEMENTATION OF DOCUMENT ANNOTATION USING CONTENT AND QUERYING
More informationRecommender Systems. Collaborative Filtering & Content-Based Recommending
Recommender Systems Collaborative Filtering & Content-Based Recommending 1 Recommender Systems Systems for recommending items (e.g. books, movies, CD s, web pages, newsgroup messages) to users based on
More informationProcess Mediation in Semantic Web Services
Process Mediation in Semantic Web Services Emilia Cimpian Digital Enterprise Research Institute, Institute for Computer Science, University of Innsbruck, Technikerstrasse 21a, A-6020 Innsbruck, Austria
More informationEfficient Case Based Feature Construction
Efficient Case Based Feature Construction Ingo Mierswa and Michael Wurst Artificial Intelligence Unit,Department of Computer Science, University of Dortmund, Germany {mierswa, wurst}@ls8.cs.uni-dortmund.de
More informationLink Prediction for Social Network
Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue
More informationMulti-relational Decision Tree Induction
Multi-relational Decision Tree Induction Arno J. Knobbe 1,2, Arno Siebes 2, Daniël van der Wallen 1 1 Syllogic B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands, {a.knobbe, d.van.der.wallen}@syllogic.com
More informationSTUDYING OF CLASSIFYING CHINESE SMS MESSAGES
STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2
More informationPRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION
PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION Justin Z. Zhan, LiWu Chang, Stan Matwin Abstract We propose a new scheme for multiple parties to conduct data mining computations without disclosing
More informationKeywords Data alignment, Data annotation, Web database, Search Result Record
Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web
More informationInformation Retrieval
Introduction to Information Retrieval SCCS414: Information Storage and Retrieval Christopher Manning and Prabhakar Raghavan Lecture 10: Text Classification; Vector Space Classification (Rocchio) Relevance
More informationVideo annotation based on adaptive annular spatial partition scheme
Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory
More informationIMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK
IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK 1 Mount Steffi Varish.C, 2 Guru Rama SenthilVel Abstract - Image Mining is a recent trended approach enveloped in
More informationA Framework for Securing Databases from Intrusion Threats
A Framework for Securing Databases from Intrusion Threats R. Prince Jeyaseelan James Department of Computer Applications, Valliammai Engineering College Affiliated to Anna University, Chennai, India Email:
More information2 Experimental Methodology and Results
Developing Consensus Ontologies for the Semantic Web Larry M. Stephens, Aurovinda K. Gangam, and Michael N. Huhns Department of Computer Science and Engineering University of South Carolina, Columbia,
More informationDATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data
DATA ANALYSIS I Types of Attributes Sparse, Incomplete, Inaccurate Data Sources Bramer, M. (2013). Principles of data mining. Springer. [12-21] Witten, I. H., Frank, E. (2011). Data Mining: Practical machine
More informationResPubliQA 2010
SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first
More informationK-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection
K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection Zhenghui Ma School of Computer Science The University of Birmingham Edgbaston, B15 2TT Birmingham, UK Ata Kaban School of Computer
More informationMulti-Stage Rocchio Classification for Large-scale Multilabeled
Multi-Stage Rocchio Classification for Large-scale Multilabeled Text data Dong-Hyun Lee Nangman Computing, 117D Garden five Tools, Munjeong-dong Songpa-gu, Seoul, Korea dhlee347@gmail.com Abstract. Large-scale
More informationSelection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3
Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Department of Computer Science & Engineering, Gitam University, INDIA 1. binducheekati@gmail.com,
More informationMidterm Examination CS540-2: Introduction to Artificial Intelligence
Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search
More informationA Comparative Study of Selected Classification Algorithms of Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.220
More informationWhat is this Song About?: Identification of Keywords in Bollywood Lyrics
What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics
More informationA New Technique to Optimize User s Browsing Session using Data Mining
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
More informationA Bayesian Approach to Hybrid Image Retrieval
A Bayesian Approach to Hybrid Image Retrieval Pradhee Tandon and C. V. Jawahar Center for Visual Information Technology International Institute of Information Technology Hyderabad - 500032, INDIA {pradhee@research.,jawahar@}iiit.ac.in
More informationTowards a hybrid approach to Netflix Challenge
Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the
More informationLearning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data
Learning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data Cornelia Caragea 1, Doina Caragea 2, and Vasant Honavar 1 1 Computer Science Department, Iowa State University 2 Computer
More informationAutomatic Interpretation of Natural Language for a Multimedia E-learning Tool
Automatic Interpretation of Natural Language for a Multimedia E-learning Tool Serge Linckels and Christoph Meinel Department for Theoretical Computer Science and New Applications, University of Trier {linckels,
More informationStudy on Classifiers using Genetic Algorithm and Class based Rules Generation
2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules
More informationEfficient SQL-Querying Method for Data Mining in Large Data Bases
Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a
More informationDocument Retrieval using Predication Similarity
Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research
More informationClassifying Twitter Data in Multiple Classes Based On Sentiment Class Labels
Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),
More informationInteractive Machine Learning (IML) Markup of OCR Generated Text by Exploiting Domain Knowledge: A Biodiversity Case Study
Interactive Machine Learning (IML) Markup of OCR Generated by Exploiting Domain Knowledge: A Biodiversity Case Study Several digitization projects such as Google books are involved in scanning millions
More informationDynamic Ensemble Construction via Heuristic Optimization
Dynamic Ensemble Construction via Heuristic Optimization Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple
More informationContent Based Smart Crawler For Efficiently Harvesting Deep Web Interface
Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface Prof. T.P.Aher(ME), Ms.Rupal R.Boob, Ms.Saburi V.Dhole, Ms.Dipika B.Avhad, Ms.Suvarna S.Burkul 1 Assistant Professor, Computer
More information