SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web *

Size: px
Start display at page:

Download "SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web *"

Transcription

1 SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web * Leyun Pan, Liang Zhang, and Fanyuan Ma Department of Computer Science and Engineering Shanghai Jiao Tong University, Shanghai, China {pan-ly, zhangliang}@cs.sjtu.edu.cn, fyma@sjtu.edu.cn Abstract. The semantic web technology is seen as a key to realizing peer-topeer for resource discovery and service combination in the ubiquitous communication environment. However, in a Peer-to-Peer environment, we must face the situation, where individual peers maintain their own view of the domain in terms of the organization of the local information sources. Ontology heterogeneity among individual peers is becoming ever more important issues. In this paper, we propose a multi-strategy learning approach to resolve the problem. We describe the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between ontologies. We use the general statistic classification method to discover category features in data instances and use the first-order learning algorithm FOIL to exploit the semantic relations among data instances. On the prediction results of individual methods, the system combines their outcomes using our matching committee rule called the Best Outstanding Champion. The experiments show that SIMON system achieves high accuracy on real-world domain. 1 Introduction Today s P2P solutions support only limited update, search and retrieval functionality, which make current P2P systems unsuitable for knowledge sharing purposes. Metadata plays a central role in the effort of providing search techniques that go beyond string matching. Ontology-based metadata facilitates the access to domain knowledge. Furthermore, it enables the construction of semantic queries [1]. Existing approaches of ontology-based information access almost always assume a setting where information providers share an ontology that is used to access the information. However, we rather face the situation, where individual peers maintain their own view of the domain in terms of the organization of the local file system and other information sources. Enforcing the use of a global ontology in such an environment would mean to give up the benefits of the P2P approach mentioned above. * Research described in this paper is supported by The Science & Technology Committee of Shanghai Municipality Key Project Grant 02DJ14045 and by The Science & Technology Committee of Shanghai Municipality Key Technologies R&D Project Grant 03dz M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp , Springer-Verlag Berlin Heidelberg 2004

2 SIMON: A Multi-strategy Classification Approach 745 We can consider the process of addressing the semantic heterogeneity as the process of ontology matching (ontology mapping) [2]. Matching processes typically involve analyzing data instances associated with ontologies and comparing them to determine the correspondence among concepts. Given two ontologies in the same domain, we can find the most similar concept node in one ontology for each concept node in another one. However, at the Internet scale, finding such mappings is tedious, error-prone, and clearly not possible. It cannot satisfy the need of online exchange of ontology to two peers not in agreement. Hence, we must find some approaches to assist in the ontology (semi-) automatically matching process. In the paper, we will discuss the use of data instances associated with the ontology for addressing semantic heterogeneity. We propose the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between the pair of ontologies that are homogenous and their elements have significant overlap. Given the source ontology B and the target ontology A, for each concept node in target ontology A, we can find the most similar concept node from source ontology B. SIMON considers the ontology A and its data instances as the learning resource. All concept nodes in ontology A are the classification categories and relevant data instances of each concept are labeled learning samples in a classification process. The data instances of concept nodes in ontology B are unseen samples. SIMON classifies instances of each node in ontology B into the categories of ontology A according the classifiers for A. SIMON uses multiple learning strategies, namely multiple classifiers. Each of classifier exploits different type of information either in data instances or in the semantic relations among these data instances. Using appropriate matching committee method, we can get better result than simple classifier. This paper is organized as follows. In the next section, we introduce the overview of the ontology matching system. In section 3, we will discuss the multi-strategy classification for ontology matching. Section 4 presents the experiment results with our SIMON system. Section 5 reviews related work. We give the conclusion and the future work in section 6. 2 Overview of the Ontology Matching System The ontology matching system is trained to compare two ontologies and to find the correspondence among concept nodes. An example of such task is illustrated in Figure 1 and Figure 2. There are two ontologies of movie database. When a soft agent wants to collect some information about movies, it accesses a P2P system of movie. The movie information on individual peers will be marked up using some ontology such as Figure.1 or Figure.2. Here the data is organized into a hierarchical structure that includes movie, person, company, awards and so on. Movies have attributes such as title, language, cast&crew, production company and genre and so on. Some classes link to each other by some attributes shown as italic in figure. However, because each of peers may use different ontology, it is difficult to completely integrate all data for an agent that only master one ontology. For example, agent may consider that Movie in Allmovie is equivalent to Movie in IMDB. However, in fact Movie in IMDB is just an empty ontology node and MainMovieInfo in IMDB is the most similar to Movie in Allmovie. The

3 746 L. Pan, L. Zhang, and F. Ma mismatch also may happen between MoviePerson and Person, GenreInstance and Genre, Awards and Nominations and Awards. IMDB homepage: Movie Awards and Nominations result: category: AllMovie homepage: awardname: awardsmovie: MainMovieInfo Company title: name: Language: address: Plot: createdyear: cast&crew: production company: MoviePerson name: biography: countryofbirth: belongsto: filmography: Music title: musicmood: composer: awardswon: GenreInstance genretype: genrekeywords: Recommends: Movie title: Language: cast&crew: production: genre: Company name: address: createdyear: Person name: introduction: country: belongsto: filmography: Music title: musicmood: composer: Genre genretype: genrekeywords: Recommends: Awards result: category: awardname: awardsmovie: awardswon: genre: Actor roleplayed: awards: Director independent: awards: Player roleplayed: awards: Director independent: awards: Fig. 1. Ontology of movie database IMDB Fig. 2. Ontology of movie database Allmovie SIMON uses multi-strategy learning methods including both statistical and firstorder learning techniques. Each base learner exploits well a certain type of information from the training instances to build matching hypotheses. We use a statistical bag-of-words approach to classifying the pure text instances. Furthermore, the relations among concepts can help to learn the classifier. On the prediction results of individual methods, system combines their outcomes using our matching committee rule called the Best Outstanding Champion that is a weighted voting committee. This way, we can achieve higher matching accuracy than with any single base classifier alone. 3 Multi-strategies Learning for Ontology Matching 3.1 Statistical Text Classification One of methods that we use for text classification is naive Bayes, which is a kind of probabilistic models that ignore the words sequence and naively assumes that the presence of each word in a document is conditionally independent of all other words in the document. Naive Bayes for text classification can be formulated as follows. Given a set of classes C = { c1,..., cn} and a document consisting of k words, { w 1,..., wk}, we classify the document as a member of the class, c *, that is most probable, given the words in the document: c * = arg max c Pr( c w1,..., wk) Pr( c w1,..., wk) can be transformed into a computable expression by applying Bayes Rule (Eq. 2); rewriting the expression using the product rule and dropping the denominator, since this term is a constant across all classes, (Eq. 3); and assuming that words are independent of each other (Eq. 4). (1)

4 SIMON: A Multi-strategy Classification Approach 747 Pr( Pr( c ) Pr( w 1,..., w k c ) c w 1,..., w k ) = (2) Pr( w 1,..., w k ) k Pr( c ) Pr( w i c, w 1,... w i 1) (3) i = 1 = k (4) Pr( c ) Pr( w i c ) i = 1 Pr(c ) is estimated as the portion of training instances that belong to c. So a key step in implementing naive Bayes is estimating the word probabilities, Pr( wi c). We use Witten-Bell smoothing [3], which depends on the relationship between the number of unique words and the total number of word occurrences in the training data for the class: if most of the word occurrences are unique words, the prior is stronger; if words are often repeated, the prior is weaker. 3.2 First-Order Text Classification As mentioned above, data instances under ontology are richly structured datasets, where data best described by a graph where the nodes in the graph are objects and the edges in the graph are links or relations between objects. The methods for classifying data instances that we discussed in the previous section consider the words in a single node of the graph. However, the method can t learn models that take into account such features as the pattern of connectivity around a given instance, or the words occurring in instance of neighboring nodes. For example, we can learn a rule such as An data instance belongs to movie if it contains the words minute and release and is linked to an instance that contains the word birth." Clearly, rules of this type, that are able to represent general characteristics of a graph, can be exploited to improve the predictive accuracy of the learned models. This kind of rules can be concisely represented using a first-order representation. We can learn to classify text instance using a learner that is able to induce first-order rules. The learning algorithm that we use in our system is Quinlan's Foil algorithm [4]. Foil is a greedy covering algorithm for learning function-free Horn clauses definitions of a relation in terms of itself and other relations. Foil induces each Horn clause by beginning with an empty tail and using a hill-climbing search to add literals to the tail until the clause covers only positive instances. When Foil algorithm is used as a classification method, the input file for learning a category consists of the following relations: 1. category(instance): This is the target relation that will be learned from other background relations. Each learned target relation represents a classification rule for a category. 2. has_word(instance): This set of relations indicates which words occur in which instances. The sample belonging a specific has-word relation consists a set of instances in which the word word occurs. 3. linkto(instance, instance): This relation represents that the semantic relations between two data instances.

5 748 L. Pan, L. Zhang, and F. Ma We apply Foil to learn a separate set of clauses for every concept node in the ontology. When classifying the other ontology s data instances, if an instance can t match any clause of any category, we treat it as an instance of other category. 3.3 Evaluation of Classifiers for Matching and Matching Committees Method of Committees (a.k.a. ensembles) is based on the idea that, given a task that requires expert knowledge to perform, k experts may be better than one if their individual judgments are appropriately combined [7]. For obtaining matching result, there are two different matching committee methods according to whether utilizing classifier committee: microcommittees: System firstly utilizes classifier committee. Classifier committee will negotiate for the category of each unseen data instance. Then System will make matching decision on the base of single classification result. macrocommittees: System doesn t utilize classifier committee. Each classifier individually decides the category of each unseen data instance. Then System will negotiate for matching on the base of multiple classification results. To optimize the result of combination, generally, we wish we could give each member of committees a weight reflecting the expected relative effectiveness of member. There are some differences between evaluations of text classification and ontology matching. In text classification, the initial corpus can be easily split into two sets: a training(- and-validation) set and test set. However, the boundary among training set, test set and unseen data instance set in ontology matching process is not obvious. Firstly, test set is absent in ontology matching process in which the instances of target ontology are regarded as training set and the instances of source ontology are regarded as unseen samples. Secondly, unseen data instances are not completely unseen, because instances of source ontology all have labels and we just don t know what each label means. Because of the absence of test set, it is difficult to evaluate the classifier in microcommittees. Microcommittees can only believe the prior experience and manually evaluate the classifier weights, as did in [2]. We adopt macrocommittees in our ontology matching system. Notes that the instances of source ontology have the relative unseen feature. When these instances are classified, the unit is not a single but a category. So we can observe the distribution of a category of instances. Each classifier will find a champion that gains the maximal similarity degree in categories of target ontology. In these champions, some may have obvious predominance and the others may keep ahead other nodes just a little. Generally, the more outstanding one champions is, the more we believe it. Thus we can adopt the degree of outstandingness of candidate as the evaluation of effectiveness of each classifier. The degree of outstandingness can be observe from classification results and needn t be adjusted and optimized on a validation set. We propose a matching committee rule called the Best Outstanding Champion, which means that system chooses a final champion with maximal accumulated degree of outstandingness among champion-candidates. The method can be regarded as a weighted voting committee. Each classifier votes a ticket for the most similar node according to its judgment. However, each vote has different weight that can be measured by degree of champion s outstandingness. We define the degree of outstandingness as the ratio of champion to the secondary node.

6 SIMON: A Multi-strategy Classification Approach Experiments We take movie as our experiment domain. We choose the first three movie websites as our experimental objects which rank ahead in google directory Arts > Movies > Databases: IMDB, AllMovie and Rotten Tomatoes. We manually match three ontologies to each other to measure the matching accuracy that can be defined as the percentage of the manual mappings that machine predicted correctly. We found about 150 movies in each website. Then we exchange the keywords and found 300 movies again. So each ontology holds about 400 movies data instances except repetition. We use a three-fold cross-matching methodology to evaluate our algorithms. We conduct three runs in which we performed two experiments that map ontologies to each other. In each experiment, we train classifiers using data instances of target ontology and classify data instances of source ontology to find the matching pairs from source ontology to target ontology. Table 1. Results matrixs of statistic classifier and the First-Order classifier Table 1 shows the classification result matrixes of partial categories in Allmovie- IMDB experiment, respectively for the statistic classifier and the First-Order classifier (The numbers in the parentheses are the results of First-Order classifier). Each column of the matrix represents one category of source ontology Allmovie and shows how the instances of this category are classified to categories of target ontology IMDB. Boldface indicates the leading candidate on each column. These matrixes illustrate several interesting results. First, note that for most classes, the coverage of champion is high enough for matching judgment. For example, 63% of the Movie column in statistic classifier and 56% of the Player column in First- Order classifier are correctly classified. And second, there are notable exceptions to this trend: the Player and Director in statistic classifier; the Movie and the Person in First-Order classifier. There will be a wrong matching decision according to results of Player column in statistic classifier, where Player in AllMovie is not matched to Actor but Director in IMDB. In other columns, the first and the second are so close that we can t absolutely believe the matching results according to these classification results. The low level of classification coverage of champion for the Player and Director is explained by the characteristic of categories: two categories lack of feature properties.

7 750 L. Pan, L. Zhang, and F. Ma For this reason, many of the instances of two categories are classified to many other categories. However, our First-Order classifier can repair the shortcoming. By mining the information of neighboring instances-awards and nominations, we can learn the rules for two categories and classify most instances to the proper categories. Because the Player often wins the best actor awards and vice versa. The neighboring instances don t always provide correct evidence for classification. The Movie column and the Person column in table 6 belong to this situation. Because many data instances between these two categories link to each other, the effectiveness of the learned rules descends. Fortunately, in statistic classifier, the classification results of two categories are ideal. By using our matching committee rule, we can easily integrate the preferable classification results of both classifiers. After calculating and comparing the degree of outstandingness, we more trust the matching results for Movie and Person in statistic classifier and for Player and Director in First-Order classifier statistic learner First-Order Learner Matching committee AllMovie to IMDB IMDB to AllMovie RT to IMDB IMDB to RT RT to AllMovie AllMovie to RT Fig. 3. Ontology matching accuracy Figure.3 shows three runs and six groups of experimental results. We match two ontologies to each other in each run, where there is a little difference between two experimental results. The three bars in each experimental represent the matching accuracy produced by: (1) the statistic learner alone, (2) the First-Order learner alone, and (3) the matching committee using the previous two learners. 5 Related Works From perspective of ontology matching using data instance, some works are related to our system. In [2] some strategies classify the data instances and another strategy Relaxation Labeler searches for the mapping configuration that best satisfies the given domain constraints and heuristic knowledge. However, automated text classification is the core of our system. We focus on the full mining of data instances for automated classification and ontology matching. By constructing the classification samples according to the feature property set and exploiting the classification features in or among data instances, we can furthest utilize the text classification methods.

8 SIMON: A Multi-strategy Classification Approach 751 Furthermore, as regards the combination multiple learning strategies, [2] uses microcommittees and manually evaluate the classifier weights. But in our system, we adopt the degree of outstandingness as the weights of classifiers that can be computed from classification result. Not using any domain and heuristic knowledge, our system can automatically achieve the similar matching accuracy as in [2]. [5] also compare ontologies using similarity measures, whereas they compute the similarity between lexical entries. [6] describes the use of FOIL algorithm in classification and extraction for constructing knowledge bases from the web. 6 Conclusions The completely distributed nature and the high degree of autonomy of individual peers in a P2P system come with new challenges for the use of semantic descriptions. We propose a multi-strategy learning approach for resolving ontology heterogeneity in P2P systems. In the paper, we introduce the SIMON system and describe the key techniques. We take movie as our experiment domain and extract the ontologies and the data instances from three different movie database websites. We use the general statistic classification method to discover category features in data instances and use the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system combines their outcomes using our matching committee rule called the Best Outstanding Champion. A series of experiment results show that our approach can achieves higher accuracy on a real-world domain. References 1. J. Broekstra, M. Ehrig, P. Haase. A Metadata Model for Semantics-Based Peer-to-Peer Systems. Proceedings of SemPGRID 03, 1st Workshop on Semantics in Peer-to-Peer and Grid Computing 2. A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to Map between Ontologies on the Semantic Web. In Proceedings of the World Wide Web Conference (WWW-2002). 3. I. H. Witten, T. C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in text compression. IEEE Transactions on Information Theory, 37(4), July J. R. Quinlan, R. M. Cameron-Jones. FOIL: A midterm report. In Proceedings of the European Conference on Machine Learning, pages 3-20, Vienna, Austria, A. Maedche, S. Staab. Comparing Ontologies- Similarity Measures and a Comparison Study. Internal Report No. 408, Institute AIFB, University of Karlsruhe, March M.Craven, D. DiPasquo, D. Freitag, A. McCalluma, T. Mitchell. Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, Elsevier, F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, Vol. 34, No. 1, March 2002.

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization

A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization A Modular k-nearest Neighbor Classification Method for Massively Parallel Text Categorization Hai Zhao and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University, 1954

More information

Fig 1. Overview of IE-based text mining framework

Fig 1. Overview of IE-based text mining framework DiscoTEX: A framework of Combining IE and KDD for Text Mining Ritesh Kumar Research Scholar, Singhania University, Pacheri Beri, Rajsthan riteshchandel@gmail.com Abstract: Text mining based on the integration

More information

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data FedX: A Federation Layer for Distributed Query Processing on Linked Open Data Andreas Schwarte 1, Peter Haase 1,KatjaHose 2, Ralf Schenkel 2, and Michael Schmidt 1 1 fluid Operations AG, Walldorf, Germany

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Predict the box office of US movies

Predict the box office of US movies Predict the box office of US movies Group members: Hanqing Ma, Jin Sun, Zeyu Zhang 1. Introduction Our task is to predict the box office of the upcoming movies using the properties of the movies, such

More information

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics Helmut Berger and Dieter Merkl 2 Faculty of Information Technology, University of Technology, Sydney, NSW, Australia hberger@it.uts.edu.au

More information

Bibster A Semantics-Based Bibliographic Peer-to-Peer System

Bibster A Semantics-Based Bibliographic Peer-to-Peer System Bibster A Semantics-Based Bibliographic Peer-to-Peer System Peter Haase 1, Björn Schnizler 1, Jeen Broekstra 2, Marc Ehrig 1, Frank van Harmelen 2, Maarten Menken 2, Peter Mika 2, Michal Plechawski 3,

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Towards Rule Learning Approaches to Instance-based Ontology Matching

Towards Rule Learning Approaches to Instance-based Ontology Matching Towards Rule Learning Approaches to Instance-based Ontology Matching Frederik Janssen 1, Faraz Fallahi 2 Jan Noessner 3, and Heiko Paulheim 1 1 Knowledge Engineering Group, TU Darmstadt, Hochschulstrasse

More information

Evaluation Methods for Focused Crawling

Evaluation Methods for Focused Crawling Evaluation Methods for Focused Crawling Andrea Passerini, Paolo Frasconi, and Giovanni Soda DSI, University of Florence, ITALY {passerini,paolo,giovanni}@dsi.ing.unifi.it Abstract. The exponential growth

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts. Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Advanced Preferred

More information

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS

Collaborative Framework for Testing Web Application Vulnerabilities Using STOWS Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS Nilam B. Lonkar 1, Dinesh B. Hanchate 2 Student of Computer Engineering, Pune University VPKBIET, Baramati, India Computer Engineering, Pune University VPKBIET,

More information

Web Data mining-a Research area in Web usage mining

Web Data mining-a Research area in Web usage mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Fault Identification from Web Log Files by Pattern Discovery

Fault Identification from Web Log Files by Pattern Discovery ABSTRACT International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 2 ISSN : 2456-3307 Fault Identification from Web Log Files

More information

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Predictive Analysis: Evaluation and Experimentation. Heejun Kim Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing 70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY 2004 ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing Jianping Fan, Ahmed K. Elmagarmid, Senior Member, IEEE, Xingquan

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Multi-Aspect Tagging for Collaborative Structuring

Multi-Aspect Tagging for Collaborative Structuring Multi-Aspect Tagging for Collaborative Structuring Katharina Morik and Michael Wurst University of Dortmund, Department of Computer Science Baroperstr. 301, 44221 Dortmund, Germany morik@ls8.cs.uni-dortmund

More information

XETA: extensible metadata System

XETA: extensible metadata System XETA: extensible metadata System Abstract: This paper presents an extensible metadata system (XETA System) which makes it possible for the user to organize and extend the structure of metadata. We discuss

More information

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data Xiaorong Yang 1,2, Wensheng Wang 1,2, Qingtian Zeng 3, and Nengfu Xie 1,2 1 Agriculture Information Institute,

More information

Movie Recommender System - Hybrid Filtering Approach

Movie Recommender System - Hybrid Filtering Approach Chapter 7 Movie Recommender System - Hybrid Filtering Approach Recommender System can be built using approaches like: (i) Collaborative Filtering (ii) Content Based Filtering and (iii) Hybrid Filtering.

More information

Efficient Pairwise Classification

Efficient Pairwise Classification Efficient Pairwise Classification Sang-Hyeun Park and Johannes Fürnkranz TU Darmstadt, Knowledge Engineering Group, D-64289 Darmstadt, Germany Abstract. Pairwise classification is a class binarization

More information

Construction of Knowledge Base for Automatic Indexing and Classification Based. on Chinese Library Classification

Construction of Knowledge Base for Automatic Indexing and Classification Based. on Chinese Library Classification Construction of Knowledge Base for Automatic Indexing and Classification Based on Chinese Library Classification Han-qing Hou, Chun-xiang Xue School of Information Science & Technology, Nanjing Agricultural

More information

An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques

An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques Doaa M. Alebiary Department of computer Science, Faculty of computers and informatics Benha University

More information

Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing. Interspeech 2013

Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing. Interspeech 2013 Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing LARRY HECK, DILEK HAKKANI-TÜR, GOKHAN TUR Focus of This Paper SLU and Entity Extraction (Slot Filling) Spoken Language Understanding

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

A Finite State Mobile Agent Computation Model

A Finite State Mobile Agent Computation Model A Finite State Mobile Agent Computation Model Yong Liu, Congfu Xu, Zhaohui Wu, Weidong Chen, and Yunhe Pan College of Computer Science, Zhejiang University Hangzhou 310027, PR China Abstract In this paper,

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz

By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos Presented by Yael Kazaz Example: Merging Real-Estate Agencies Two real-estate agencies: S and T, decide to merge Schema T has

More information

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background

More information

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Comment Extraction from Blog Posts and Its Applications to Opinion Mining Comment Extraction from Blog Posts and Its Applications to Opinion Mining Huan-An Kao, Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University, Taipei, Taiwan

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

An Efficient Hash-based Association Rule Mining Approach for Document Clustering

An Efficient Hash-based Association Rule Mining Approach for Document Clustering An Efficient Hash-based Association Rule Mining Approach for Document Clustering NOHA NEGM #1, PASSENT ELKAFRAWY #2, ABD-ELBADEEH SALEM * 3 # Faculty of Science, Menoufia University Shebin El-Kom, EGYPT

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task

More information

Cost-sensitive Boosting for Concept Drift

Cost-sensitive Boosting for Concept Drift Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems

More information

Motivating Ontology-Driven Information Extraction

Motivating Ontology-Driven Information Extraction Motivating Ontology-Driven Information Extraction Burcu Yildiz 1 and Silvia Miksch 1, 2 1 Institute for Software Engineering and Interactive Systems, Vienna University of Technology, Vienna, Austria {yildiz,silvia}@

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

IMDB Film Prediction with Cross-validation Technique

IMDB Film Prediction with Cross-validation Technique IMDB Film Prediction with Cross-validation Technique Shivansh Jagga 1, Akhil Ranjan 2, Prof. Siva Shanmugan G 3 1, 2, 3 Department of Computer Science and Technology 1, 2, 3 Vellore Institute Of Technology,

More information

A Roadmap to an Enhanced Graph Based Data mining Approach for Multi-Relational Data mining

A Roadmap to an Enhanced Graph Based Data mining Approach for Multi-Relational Data mining A Roadmap to an Enhanced Graph Based Data mining Approach for Multi-Relational Data mining D.Kavinya 1 Student, Department of CSE, K.S.Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India 1

More information

Using Association Rules for Better Treatment of Missing Values

Using Association Rules for Better Treatment of Missing Values Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

Design of Ontology for The Internet Movie Database (IMDb) Sasikanth Avancha, Srikanth Kallurkar, Tapan Kamdar

Design of Ontology for The Internet Movie Database (IMDb) Sasikanth Avancha, Srikanth Kallurkar, Tapan Kamdar Design of Ontology for The Internet Movie Database (IMDb) Sasikanth Avancha, Srikanth Kallurkar, Tapan Kamdar {savanc1,skallu1,kamdar}@cs.umbc.edu Semester Project, CMSC 771, Spring 2001 Table of Contents

More information

FOAM Framework for Ontology Alignment and Mapping Results of the Ontology Alignment Evaluation Initiative

FOAM Framework for Ontology Alignment and Mapping Results of the Ontology Alignment Evaluation Initiative FOAM Framework for Ontology Alignment and Mapping Results of the Ontology Alignment Evaluation Initiative Marc Ehrig Institute AIFB University of Karlsruhe 76128 Karlsruhe, Germany ehrig@aifb.uni-karlsruhe.de

More information

Structure of Association Rule Classifiers: a Review

Structure of Association Rule Classifiers: a Review Structure of Association Rule Classifiers: a Review Koen Vanhoof Benoît Depaire Transportation Research Institute (IMOB), University Hasselt 3590 Diepenbeek, Belgium koen.vanhoof@uhasselt.be benoit.depaire@uhasselt.be

More information

Learning to Construct Knowledge Bases from the World Wide Web

Learning to Construct Knowledge Bases from the World Wide Web Learning to Construct Knowledge Bases from the World Wide Web Mark Craven a Dan DiPasquo a Dayne Freitag b Andrew McCallum a,b Tom Mitchell a Kamal Nigam a Seán Slattery a a School of Computer Science,

More information

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Stefan Hauger 1, Karen H. L. Tso 2, and Lars Schmidt-Thieme 2 1 Department of Computer Science, University of

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming

Automatic New Topic Identification in Search Engine Transaction Log Using Goal Programming Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management Istanbul, Turkey, July 3 6, 2012 Automatic New Topic Identification in Search Engine Transaction Log

More information

Resource Load Balancing Based on Multi-agent in ServiceBSP Model*

Resource Load Balancing Based on Multi-agent in ServiceBSP Model* Resource Load Balancing Based on Multi-agent in ServiceBSP Model* Yan Jiang 1, Weiqin Tong 1, and Wentao Zhao 2 1 School of Computer Engineering and Science, Shanghai University 2 Image Processing and

More information

Discovering Advertisement Links by Using URL Text

Discovering Advertisement Links by Using URL Text 017 3rd International Conference on Computational Systems and Communications (ICCSC 017) Discovering Advertisement Links by Using URL Text Jing-Shan Xu1, a, Peng Chang, b,* and Yong-Zheng Zhang, c 1 School

More information

Context Ontology Construction For Cricket Video

Context Ontology Construction For Cricket Video Context Ontology Construction For Cricket Video Dr. Sunitha Abburu Professor& Director, Department of Computer Applications Adhiyamaan College of Engineering, Hosur, pin-635109, Tamilnadu, India Abstract

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

Application of Support Vector Machine Algorithm in Spam Filtering

Application of Support Vector Machine Algorithm in  Spam Filtering Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification

More information

Content-based Dimensionality Reduction for Recommender Systems

Content-based Dimensionality Reduction for Recommender Systems Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Fast Mode Decision for H.264/AVC Using Mode Prediction

Fast Mode Decision for H.264/AVC Using Mode Prediction Fast Mode Decision for H.264/AVC Using Mode Prediction Song-Hak Ri and Joern Ostermann Institut fuer Informationsverarbeitung, Appelstr 9A, D-30167 Hannover, Germany ri@tnt.uni-hannover.de ostermann@tnt.uni-hannover.de

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW PAPER ON IMPLEMENTATION OF DOCUMENT ANNOTATION USING CONTENT AND QUERYING

More information

Recommender Systems. Collaborative Filtering & Content-Based Recommending

Recommender Systems. Collaborative Filtering & Content-Based Recommending Recommender Systems Collaborative Filtering & Content-Based Recommending 1 Recommender Systems Systems for recommending items (e.g. books, movies, CD s, web pages, newsgroup messages) to users based on

More information

Process Mediation in Semantic Web Services

Process Mediation in Semantic Web Services Process Mediation in Semantic Web Services Emilia Cimpian Digital Enterprise Research Institute, Institute for Computer Science, University of Innsbruck, Technikerstrasse 21a, A-6020 Innsbruck, Austria

More information

Efficient Case Based Feature Construction

Efficient Case Based Feature Construction Efficient Case Based Feature Construction Ingo Mierswa and Michael Wurst Artificial Intelligence Unit,Department of Computer Science, University of Dortmund, Germany {mierswa, wurst}@ls8.cs.uni-dortmund.de

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Multi-relational Decision Tree Induction

Multi-relational Decision Tree Induction Multi-relational Decision Tree Induction Arno J. Knobbe 1,2, Arno Siebes 2, Daniël van der Wallen 1 1 Syllogic B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands, {a.knobbe, d.van.der.wallen}@syllogic.com

More information

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2

More information

PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION

PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION PRIVACY-PRESERVING MULTI-PARTY DECISION TREE INDUCTION Justin Z. Zhan, LiWu Chang, Stan Matwin Abstract We propose a new scheme for multiple parties to conduct data mining computations without disclosing

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval SCCS414: Information Storage and Retrieval Christopher Manning and Prabhakar Raghavan Lecture 10: Text Classification; Vector Space Classification (Rocchio) Relevance

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK

IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK 1 Mount Steffi Varish.C, 2 Guru Rama SenthilVel Abstract - Image Mining is a recent trended approach enveloped in

More information

A Framework for Securing Databases from Intrusion Threats

A Framework for Securing Databases from Intrusion Threats A Framework for Securing Databases from Intrusion Threats R. Prince Jeyaseelan James Department of Computer Applications, Valliammai Engineering College Affiliated to Anna University, Chennai, India Email:

More information

2 Experimental Methodology and Results

2 Experimental Methodology and Results Developing Consensus Ontologies for the Semantic Web Larry M. Stephens, Aurovinda K. Gangam, and Michael N. Huhns Department of Computer Science and Engineering University of South Carolina, Columbia,

More information

DATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data

DATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data DATA ANALYSIS I Types of Attributes Sparse, Incomplete, Inaccurate Data Sources Bramer, M. (2013). Principles of data mining. Springer. [12-21] Witten, I. H., Frank, E. (2011). Data Mining: Practical machine

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection Zhenghui Ma School of Computer Science The University of Birmingham Edgbaston, B15 2TT Birmingham, UK Ata Kaban School of Computer

More information

Multi-Stage Rocchio Classification for Large-scale Multilabeled

Multi-Stage Rocchio Classification for Large-scale Multilabeled Multi-Stage Rocchio Classification for Large-scale Multilabeled Text data Dong-Hyun Lee Nangman Computing, 117D Garden five Tools, Munjeong-dong Songpa-gu, Seoul, Korea dhlee347@gmail.com Abstract. Large-scale

More information

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3

Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Selection of Best Web Site by Applying COPRAS-G method Bindu Madhuri.Ch #1, Anand Chandulal.J #2, Padmaja.M #3 Department of Computer Science & Engineering, Gitam University, INDIA 1. binducheekati@gmail.com,

More information

Midterm Examination CS540-2: Introduction to Artificial Intelligence

Midterm Examination CS540-2: Introduction to Artificial Intelligence Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search

More information

A Comparative Study of Selected Classification Algorithms of Data Mining

A Comparative Study of Selected Classification Algorithms of Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.220

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

A New Technique to Optimize User s Browsing Session using Data Mining

A New Technique to Optimize User s Browsing Session using Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

A Bayesian Approach to Hybrid Image Retrieval

A Bayesian Approach to Hybrid Image Retrieval A Bayesian Approach to Hybrid Image Retrieval Pradhee Tandon and C. V. Jawahar Center for Visual Information Technology International Institute of Information Technology Hyderabad - 500032, INDIA {pradhee@research.,jawahar@}iiit.ac.in

More information

Towards a hybrid approach to Netflix Challenge

Towards a hybrid approach to Netflix Challenge Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the

More information

Learning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data

Learning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data Learning Link-Based Naïve Bayes Classifiers from Ontology-Extended Distributed Data Cornelia Caragea 1, Doina Caragea 2, and Vasant Honavar 1 1 Computer Science Department, Iowa State University 2 Computer

More information

Automatic Interpretation of Natural Language for a Multimedia E-learning Tool

Automatic Interpretation of Natural Language for a Multimedia E-learning Tool Automatic Interpretation of Natural Language for a Multimedia E-learning Tool Serge Linckels and Christoph Meinel Department for Theoretical Computer Science and New Applications, University of Trier {linckels,

More information

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Study on Classifiers using Genetic Algorithm and Class based Rules Generation 2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

More information

Efficient SQL-Querying Method for Data Mining in Large Data Bases

Efficient SQL-Querying Method for Data Mining in Large Data Bases Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a

More information

Document Retrieval using Predication Similarity

Document Retrieval using Predication Similarity Document Retrieval using Predication Similarity Kalpa Gunaratna 1 Kno.e.sis Center, Wright State University, Dayton, OH 45435 USA kalpa@knoesis.org Abstract. Document retrieval has been an important research

More information

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),

More information

Interactive Machine Learning (IML) Markup of OCR Generated Text by Exploiting Domain Knowledge: A Biodiversity Case Study

Interactive Machine Learning (IML) Markup of OCR Generated Text by Exploiting Domain Knowledge: A Biodiversity Case Study Interactive Machine Learning (IML) Markup of OCR Generated by Exploiting Domain Knowledge: A Biodiversity Case Study Several digitization projects such as Google books are involved in scanning millions

More information

Dynamic Ensemble Construction via Heuristic Optimization

Dynamic Ensemble Construction via Heuristic Optimization Dynamic Ensemble Construction via Heuristic Optimization Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple

More information

Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface

Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface Prof. T.P.Aher(ME), Ms.Rupal R.Boob, Ms.Saburi V.Dhole, Ms.Dipika B.Avhad, Ms.Suvarna S.Burkul 1 Assistant Professor, Computer

More information