WordNet and Automated Text Summarization


Rui Pedro Chaves
Computation of Lexical and Grammatical Knowledge Research Group
Centro de Linguística da Universidade de Lisboa
Avenida 5 de Outubro, 85-5º Lisboa PORTUGAL
rui.chaves@clul.ul.pt

Abstract

Proposals for text classification and information retrieval that make use of the WordNet ontology have recently been presented. Generally, this methodology requires statistical induction of synset clusters and entails costly training on specific key domains. The present proposal intends to show that a simple recursive evaluation procedure and WordNet are rich enough to obtain useful results in text categorization and summarization, without training and without the need for tagged corpora.

Introduction

The present work consists of a comparison of two different ways of using WordNet for text classification purposes. We will support the conclusions of the experiments described in "Text classification using WordNet hypernyms" (Scott and Matwin, 1998), on the basis that WordNet may aid machine learning techniques in Information Retrieval. We will, however, propose a different approach, more efficient and more reliable, since it does not require training and obtains its results solely from search procedures and text distribution quantification.

In Section 1, a brief overview is given of Princeton's WordNet and of authors that use this ontology for diverse applications, from word-sense disambiguation to Information Retrieval. In Section 2, the article by Scott and Matwin (1998) is briefly presented and analyzed, with special focus on the complexity of the training method and on general efficiency issues. Finally, Section 3 presents the current proposal and provides an execution example, followed by the conclusion.
1 WordNet in Text Classification

WordNet (Miller, 1990; Miller and Fellbaum, 1991; Fellbaum, 1998) is a lexical inheritance ontology equipped with many different pointers, which aims to represent some aspects of the semantics of the lexicon and the relationships between different lexicalized concepts. Princeton's WordNet has been under construction for over a decade and presently has more than word forms organized in word meanings. More recently, a European consortium has opened the doors to EuroWordNet, a collection of many European wordnets, all integrated into a single system for machine translation purposes. Some wordnets have been developed partially automatically, others fully manually.

In WordNet, words are grouped in synsets. A synset is a set of synonyms (word forms that relate to the same word meaning), and two words are said to be synonyms if their mutual substitution does not alter the truth value of a given sentence in which they occur, in a given context. Thus, the following are synsets:

1) {policeman, police officer}
2) {buy, purchase}

Synsets can be related in many different ways through formal pointers (about 20 in Princeton's WordNet, and over 70 in EuroWordNet), each of which has a symmetrical counterpart pointer:

3) {taxi, cab} has_hyperonym {car, automobile}
   {car, automobile} has_hyponym {taxi, cab}
4) {foot} has_meronym {toe}
   {toe} has_holonym {foot}

In the present paper, a series of experiments described by Scott and Matwin (1998) will be analyzed, in order to exemplify how some authors propose to deal with text classification within a mixed model in which WordNet and machine learning are the main ingredients. This proposal explores the hypothesis that the incorporation of structured linguistic knowledge can aid (and guide) statistical inference in order to classify corpora. Other proposals have the same hybrid spirit in related areas: Eneko and Rigau (1996), Buenaga Rodríguez, Gómez-Hidalgo and Díaz Agudo (1997) and Voorhees (1998) use the WordNet ontology for Information Retrieval; Resnik (1995), Gonzalo, Verdejo, Chugur and Cigarran (1998), and Stairmand and Black (1996) propose new methodologies that index corpora to WordNet with the goal of increasing the reliability of Information Retrieval results. Scott and Matwin (1998), however, use a machine learning algorithm adapted to WordNet (more specifically, to the relations of synonymy and hyperonymy). The aim is to change the text representation from an unordered set of words (bag-of-words) to a hyperonymy density structure.

2 A WordNet-based Machine Learning Approach

Scott and Matwin (1998) adopt a different text representation in order to cope with the text classification problem. An algorithm, Ripper, created by William Cohen (Cohen, 1995), is adapted specifically to the task of dealing with the multi-dimensionality of classifying texts by sets of bag-of-words. Given a training corpus, there are three steps:

I. A tagger identifies every word in the corpus.
II. A query is issued to WordNet for every noun and verb in the corpus, and a global listing of all synsets with their respective hyperonyms is obtained. Infrequent synsets are ignored (i.e. synsets with a frequency of less than 0.06N, where N is the number of documents in the corpus).
III. The density of each synset is calculated (the number of occurrences of a given synset divided by the number of words in the document), thus obtaining a set of numerical values.
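The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Scott and Matwin's implementation: the lookup table `TOY_HYPERONYMS` is hypothetical and stands in for a real WordNet query, the tagging step is skipped, and "frequency" in step II is read here as the corpus-wide occurrence count of a synset.

```python
from collections import Counter

# Hypothetical toy lookup: word -> labels of its synset and of all its hyperonyms.
# A real system would query WordNet for every noun and verb instead.
TOY_HYPERONYMS = {
    "tax": ["tax", "levy", "charge", "liability", "possession"],
    "asset": ["asset", "possession"],
    "war": ["war", "military action", "group action"],
}


def synset_counts(tokens, hyperonyms):
    """Count every synset reached by any token (no disambiguation)."""
    counts = Counter()
    for token in tokens:
        counts.update(hyperonyms.get(token, []))
    return counts


def frequent_synsets(corpus, hyperonyms, min_fraction=0.06):
    """Step II: keep synsets whose corpus-wide count is at least 0.06 * N."""
    total = Counter()
    for tokens in corpus:
        total.update(synset_counts(tokens, hyperonyms))
    threshold = min_fraction * len(corpus)  # N = number of documents
    return {syn for syn, count in total.items() if count >= threshold}


def density_vector(tokens, hyperonyms, kept):
    """Step III: occurrences of a synset divided by the words in the document."""
    if not tokens:
        return {}
    counts = synset_counts(tokens, hyperonyms)
    return {syn: counts[syn] / len(tokens) for syn in kept if counts[syn]}


corpus = [["tax", "on", "every", "asset"], ["the", "war", "began"]]
kept = frequent_synsets(corpus, TOY_HYPERONYMS)
print(density_vector(corpus[0], TOY_HYPERONYMS, kept))
```

In the first toy document, "tax" and "asset" both reach {possession}, so that synset receives a density of 2/4 even though the word "possession" never occurs in the text.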
A value h (height) controls the level of generalization. Given a chain of synsets connected by hyperonymy (with -> as the pointer for the has_hyperonym relation):

X_0 -> X_1 -> X_2 -> X_3 -> ... -> X_k

the variable h (0 ≤ h ≤ k) defines the limit in the chain up to which the synsets are used for search purposes. The need for such a parameter is to ensure that whenever related words from different domains share the same hyperonym, this common hyperonym is not too high in the chain. Otherwise, the common synset could be too abstract to be of any use. The synsets {neuron}, {stone} and {cork} have a maximal common hyperonym: {entity}. This hyperonym is too high in the hierarchy to have interesting consequences for text classification purposes. Thus Scott and Matwin (1998) propose that the ideal value for h, given the task of classifying a specific text, depends on several factors: topic, terminology, style and level of speech. This seems to compromise the generality of classifying corpora in this way. Even if the value of h is allowed to depend on the topic alone, this entails two different training sessions, one for the value of h and another for the topic. This perspective is costly and complex because the hypotheses for h are, potentially, dependent on every synset in WordNet, since virtually every synset is a candidate for the classification of a text.

The authors suggest that one of the possible sources of error in the training process may be the lack of structuring in WordNet in some specific, inherently abstract domains (psychological states, non-physical events, complex interactions between events, etc.). In other words, the WordNet ontology is not a balanced tree. Yet the parameter h is not considered to be a source of uncertainty, specifically when it is fixed ad hoc by the authors. If the value of h is too high the algorithm will suffer from overfitting, and if it is too low it will not generalize in a useful manner.
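The role of h can be illustrated with a minimal Python sketch. The two chains below are hypothetical (their intermediate synsets are invented for illustration and are not taken from WordNet), and the function names are assumptions of this sketch rather than part of Scott and Matwin's system.

```python
def limit_chain(chain, h=None):
    """Return the synsets X_0..X_h of a hyperonym chain; h=None means h = max."""
    if h is None:
        return list(chain)
    if h < 0:
        raise ValueError("h must be non-negative")
    return list(chain[: h + 1])


def shared_within(chain_a, chain_b, h):
    """Synsets that two chains have in common when both are cut at height h."""
    return set(limit_chain(chain_a, h)) & set(limit_chain(chain_b, h))


# Hypothetical chains, leaf first; only the top synset {entity} is shared.
NEURON = ["neuron", "cell", "entity"]
STONE = ["stone", "natural object", "entity"]

print(shared_within(NEURON, STONE, 1))  # nothing shared below the top synset
print(shared_within(NEURON, STONE, 2))  # the abstract synset is now admitted
```

With a small h the two unrelated words share nothing, whereas a larger h lets the very abstract common hyperonym link them, which is the situation the parameter is meant to prevent.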
Thus, the value of h must be fixed by the tutors of the system, who let this value oscillate in a series of experiments with the goal of detecting the best values for a given topic. However, a quite useful observation is made: no disambiguation procedure is required during the computation of the hyperonymy chains. Homographic words are fed into the queries and spawn every possible hyperonym chain. Synsets that share the same hyperonyms (at any given level) will reinforce each other (and the same classification hypothesis), whilst sparse and disjoint chains will, with great probability, point to different and scattered hyperonyms. This property of WordNet will be the basis for the proposal in the present paper.

2.1 Some results from Ripper

This supervised machine-learning algorithm is a rule-induction system, chosen for, amongst other reasons, the legibility of its responses. The authors exemplify the algorithm's behavior with a snapshot of a step in the process of classifying a text, namely for the word "possession" (Scott and Matwin, 1998):

    Rule learned using hypernym frequency:
      possession(synset) ≤ 2.9 -> soc.history
      default -> misc.taxes.moderated

    Rule learned using bag of words, for a document D:
      ( tax D & history D) OR
      ( tax D & s D & any D) OR
      ( tax D & is D & and D & if D & roth D) OR
      ( century D) OR ( great D) OR ( survey D) OR ( war D)
      -> soc.history

Thus it is concluded that a document (for the specific training set) usually belongs to the domain misc.taxes.moderated whenever the density of the synset at hand is over 2.9. Otherwise, the document must belong to the domain soc.history. For the synset {possession} to have a high density, many of its hyponyms must occur in the text (or themselves have hyponyms that occur), such as {ownership}, {asset} or {liability}. For this example, h was fixed as h=max, meaning that no limit was placed on the level of the chain of hyperonyms. However, for overgeneralization reasons, it was determined that h=2 would be the best value for other semantic domains in the corpus. Scott and Matwin (1998) conclude that, of the two methods (bag-of-words and hyperonym density), the WordNet-based one is the more robust and the less dependent on the training set.
They also suggest that other relations available in WordNet could be used, and that the best values for h could be obtained from a best-first search. We believe, however, that h is a heavy drawback of the proposal, demanding two different training procedures, and we will propose a way in which the effect of h can be obtained in an emergent and straightforward fashion.

3 Text Classification via WordNet: a Non-Machine-Learning Approach

Efforts have been made on alternative methods that use lexical chains or lexical cohesion (Morris and Hirst, 1991) to obtain some representation of key sections in the text. Such a strategy does not necessarily entail a training procedure, but rather an informed processing. Relevant work in this area has been enthusiastic, often suggesting ontologies such as WordNet as knowledge databases: Hirst and St-Onge (1995), Stairmand (1996), Barzilay and Elhadad (1997) and others. The philosophy behind lexical chains is somewhat related to the present work, but differs in that lexical chains are usually the result of a search procedure that aims to obtain some semantic representation of a set of keywords. When using WordNet, that requires searching various relations for a given synset and scoring them differently in order to build the most cohesive chains possible. In the present proposal, however, there is no representation of the text, since the only data extracted are a (possibly partial) hyperonymy chain from WordNet and occurrence indexes that indicate how often a word of a given synset has been identified. Our objective is then to show that WordNet itself has the necessary and sufficient lexical semantic information for the task at hand, without training methods or intermediate text representations.

To this end we have built an application written in Prolog that is able to read the Princeton WordNet 1.6 data files. This tool is capable of searching the ontology in a fast and flexible way. It easily generates every possible hyperonym chain for a given word that occurs in a synset. Take as an example the word "gain":

(1) {gain} -> {amount, amount of money, sum, sum of money} -> {asset} -> {possession}
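The chain generation just described can be sketched in Python as follows. This is a minimal illustration, not the author's Prolog tool: the two dictionaries are a toy fragment, and the numeric identifiers are arbitrary and chosen only for illustration.

```python
# Toy fragment of the ontology; identifiers are arbitrary illustration values.
SYNSETS = {
    15: ["gain"],
    11: ["amount", "amount of money", "sum", "sum of money"],
    21: ["asset"],
    10: ["possession"],
}
HAS_HYPERONYM = {15: [11], 11: [21], 21: [10]}  # synset id -> hyperonym ids


def chains_from(synset_id):
    """Return every path of synset ids from synset_id up to a top synset."""
    parents = HAS_HYPERONYM.get(synset_id, [])
    if not parents:
        return [[synset_id]]
    return [[synset_id] + rest for p in parents for rest in chains_from(p)]


def chains_for_word(word):
    """Spawn one chain per synset containing the word and per hyperonym path.

    No disambiguation is attempted: every sense contributes its own chains.
    Each chain is a list of (synset words, occurrence index) pairs, index = 1.
    """
    result = []
    for sid, words in SYNSETS.items():
        if word in words:
            for path in chains_from(sid):
                result.append([(SYNSETS[s], 1) for s in path])
    return result


for chain in chains_for_word("gain"):
    print(chain)
```

A word with several senses, or a synset with several hyperonyms, simply yields several chains; the later merging step is what decides which of them matter.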

For the sake of clarity, the compiled Prolog representation of WordNet is hereby simplified to the following form, where synsets are represented by synset(+IdNumber, +Word):

(2)
    synset(15, 'gain').
    synset(11, 'amount').
    synset(11, 'amount of money').
    synset(11, 'sum').
    synset(11, 'sum of money').
    has_hyperonym(15, 11).
    synset(21, 'asset').
    has_hyperonym(11, 21).
    synset(10, 'possession').
    has_hyperonym(21, 10).

A simple search engine organizes the synsets in hyperonymic relation, represented as a list of pairs of synsets and occurrence indexes (by default, index = 1) of the form [(Synset_1, Index_1), ..., (Synset_n, Index_n)]:

(3)
    [(['income'],1), (['financial gain'],1), (['gain'],1),
     (['amount','amount of money','sum','sum of money'],1),
     (['asset'],1), (['possession'],1)]

We introduce below the most relevant parts of the algorithm in pseudo-Prolog notation (because of space limitations), as the motivation is given. We shall further assume that the WordNet ontology is precompiled and available for queries through the previously mentioned predicates has_hyperonym/2 and synset/2 and some standard ISO Prolog built-in predicates.

Initialization - Generate all possible hyperonym chains for every word in the text. Only nouns will be accounted for at this stage. The following code recognizes the words that occurred in the text (up to three lexical items long [1]) that belong to a synset:

    search(+TextWords, -WNLSyns)
    if (TextWords = []) then (WNLSyns := [])
    else begin
      % try the longest candidate first: three words, then two, then one
      if ((TextWords = [X,Y,Z|RestTLW]) and (synset(_, 'X Y Z')))
        then (S := ['X Y Z'])
      else if ((TextWords = [X,Y|RestTLW]) and (synset(_, 'X Y')))
        then (S := ['X Y'])
      else if ((TextWords = [X|RestTLW]) and (synset(_, 'X')))
        then (S := ['X'])
      else begin
        % no synset contains the current word(s)
        TextWords := [_,Y,Z|RestTLW],
        WNLSyns := RestWNL,
        S := []
      end,
      append(S, RestWNL, WNLSyns),
      search([Y,Z|RestTLW], RestWNL)
    end.

[1] We assume that synsets like {kick the bucket, die, pass away, expire} and {never, not in a pig's eye, when hell freezes over} have four and three words respectively. At this stage we will only consider synsets with words composed of a maximum of three lexical items.

Next, it is necessary to build the hyperonymy chains associated with the synsets found. The following code simply generates a list of synsets [(Synset_1,1), ..., (Synset_n,1)] connected by the predicate has_hyperonym/2 in WordNet. The search is triggered by the leftmost element, the leaf Synset_1 previously detected.

    spawnhchain(+WNLSyns, -HYLSyns)
    if (not(WNLSyns = [])) then begin
      WNLSyns = [Syn|RestWNL],
      synset(Id, Syn),
      % collect every chain that starts at this synset
      findall(L, gethyperonym(Id, L), Ls),
      append(Ls, Temp, HYLSyns),
      spawnhchain(RestWNL, Temp)
    end
    else HYLSyns = [].

The auxiliary gethyperonym/2 is simple enough to be given in full Prolog:

    % recursive case: climb one has_hyperonym step
    gethyperonym(Id, [(Synset,1)|List]) :-
        synset(Id, Synset),
        has_hyperonym(Id, HypId),
        gethyperonym(HypId, List).
    % base case: the hyperonym of Id is a top synset
    gethyperonym(Id, [(Synset,1),(Top,1)]) :-
        synset(Id, Synset),
        has_hyperonym(Id, TopId),
        \+ has_hyperonym(TopId, _),
        synset(TopId, Top).

3.1 Identification Methodology

After obtaining a set of hyperonymy chains for all the synsets that occurred in the text, one needs to organize this data in a better way. The idea is to join the chains that are most related.

Merging - Let X and Y be hyperonymy chains in a set of chains C obtained from a text, such that X and Y share the largest number of synsets among all the members of C. Merge X and Y into Z (updating the occurrence indexes, i.e. adding the indexes). If Z evaluates better than X and Y individually, replace X and Y with Z in C.
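The merging step can be sketched in Python as below. This is a simplified illustration under explicit assumptions: chains are lists of (synset label, occurrence index) pairs, synsets are matched by label, and only the synsets common to both chains are kept in the fused chain. The function names `fuse` and `best_partner` are hypothetical, and the evaluation test that decides whether the fused chain replaces the originals is not shown.

```python
def fuse(chain_x, chain_y):
    """Keep the synsets common to both chains, adding their occurrence indexes.

    Returns (fused_chain, number_of_shared_synsets).
    """
    index_y = dict(chain_y)
    fused = [(syn, idx + index_y[syn]) for syn, idx in chain_x if syn in index_y]
    return fused, len(fused)


def best_partner(chain, others):
    """Pick the chain in `others` sharing the most synsets with `chain`.

    Returns None when no chain shares any synset.
    """
    best, best_shared = None, 0
    for other in others:
        _, shared = fuse(chain, other)
        if shared > best_shared:
            best, best_shared = other, shared
    return best


x = [("change", 1), ("cash", 1), ("currency", 1),
     ("medium of exchange", 1), ("asset", 1), ("possession", 1)]
y = [("capital", 1), ("asset", 1), ("possession", 1)]
z = [("rate", 1), ("charge", 1), ("cost", 1), ("possession", 1)]

partner = best_partner(x, [z, y])
print(partner is y)
print(fuse(x, partner))
```

Here `x` shares two synsets with `y` but only one with `z`, so `y` is chosen, and the shared synsets end up with an occurrence index of 2 each.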

This requires observing two distinct cases. The first arises when a given chain is fused throughout the list of chains, licensed by the predicate best_merging/4, which fails only when none of the available chains has a synset in common with the chain at hand. When either this predicate fails or the list of chains has been completely scanned, the second case comes into effect.

    merge(+HYLSyns, -TotalSyns)
    while (not(HYLSyns = [])) begin
      HYLSyns := [Head|RestH],
      best_merging(Head, RestH, Best, RestB),
      eval(Head, E1), eval(Best, E2),
      fuse(Head, Best, Fused, _),
      eval(Fused, E3),
      % keep the fused chain only if it evaluates at least as well as both originals
      if ((E3 >= E2) and (E3 >= E1)) then
        merge([Fused|RestB], TotalSyns)
    end

The predicate fuse/4 is used within best_merging/4 to aid the search for the best merging. The last argument is a counter that simply stores the number of common synsets. Thus, the best mergings are guided by this integer.

    fuse([], [], [], 0).
    % same synset in both chains: add the indexes and count it
    fuse([(S,I1)|L1], [(S,I2)|L2], [(S,I3)|L3], TSyns) :-
        I3 is I1 + I2,
        fuse(L1, L2, L3, NumbSyns),
        TSyns is NumbSyns + 1.
    % different synsets: neither is copied
    fuse([(S1,_)|L1], [(S2,_)|L2], L3, TSyns) :-
        \+ (S1 = S2),
        fuse(L1, L2, L3, TSyns).

The second case is when either there are no more mergings and one must continue through the list of chains in search of other mergings, or the end of the list of chains has been reached.

    merge(+HYLSyns, -[Fused|TotalSyns])
    if (HYLSyns = []) then TotalSyns := []
    else begin
      HYLSyns := [Fused,Head|RestH],
      merge([Head|RestH], TotalSyns)
    end

Thus the merging procedure is repeated until all chains are disjoint or no further successful mergings exist. Next, chains with no occurrence index larger than 1 are removed from the set of chains C. Note that the merging of two long chains that share a medium-depth synset results, graphically, in an inverted-Y-shaped chain. The two lower branches are not copied, and this chain will probably not have an evaluation as good as one of the originals. For this reason this merging is unsuccessful.
If this were not so, all mergings would neglect disjoint elements and the result would be just maximal hyperonyms, too abstract to be useful for the task at hand.

Now, with meaningful indexes, one may evaluate the resulting chains and decide where the most relevant synsets lie. The evaluation of a hyperonymy chain with n synsets, e.g. [(S_1,i_1), (S_2,i_2), ..., (S_n,i_n)], shall be defined as the sum of the occurrence indexes i_m (1 ≤ m ≤ n) of each synset, each raised to the power of its level of depth m in the chain:

\[ \mathrm{eval}\bigl([(S_1,i_1),\ldots,(S_n,i_n)]\bigr) = \sum_{k=1}^{n} (i_k)^{k} \]

Take for instance the chain [(A,5), (B,4), (C,3), (D,3), (E,1)]. Evaluation would yield [5^1, 4^2, 3^3, 3^4, 1^5] = [5, 16, 27, 81, 1], which in turn yields the total of 130. Thus (D,3) would be the best overall synset (the key-synset) of this list, scoring 81. The best value is searched for in the list by a simple greedy search that stops when the next synset (S_{i+1}, e_{i+1}) is individually evaluated worse than the current synset (S_i, e_i). In this fashion, shallow chains have their leaves poorly evaluated, while long chains yield the best results in the synsets with the most depth and the highest occurrence index. Of course, evaluation functions and heuristics may vary a great deal, but a simple example can show the capabilities of this measure. This procedure should thus point to the synset with both the maximum depth in the hyperonymy chain and the maximum value of the occurrence index. In general, we argue that a candidate topic for the text is the synset with the best relation between depth and occurrence index.

The following text [2] was submitted, as it is, to WordNet 1.6 and every word was subjected to a hyperonymy search. No tagging or normalization of inflection was required.

«In putting together the model, Mundell was particularly interested in the consequences of foreign trade and the movement of capital across national nation borders.
His research showed that rates of exchange between currencies have a significant influence on the efficacy of a country's monetary money policies (the supply of money available and changes in national interest rates) and fiscal policies (taxation and federal budget considerations). According to the Mundell-Fleming model, under a fixed exchange rate, changes to monetary money policies would have little effect on a nation's economy, but fiscal policies would be quite powerful. The reverse is true under a floating exchange rate.»

[2] Scientific American, The Nobel Prizes for 1999, Robert A. Mundell, Economics, The godfather of the Euro, January 2000, pp. 13.

A glimpse of the raw results is transcribed below, ordered by the common maximal hyperonym, {possession}:

    L=[(['change'],1), (['cash'],1), (['currency'],1),
       (['medium of exchange','monetary system'],1),
       (['asset'],1), (['possession'],1)]
    L=[(['capital','working capital'],1), (['asset'],1), (['possession'],1)]
    L=[(['capital'],1), (['material resource'],1), (['asset'],1),
       (['possession'],1)]
    L=[(['retainer','consideration'],1), (['fee'],1), (['fixed charge'],1),
       (['charge'],1), (['cost'],1), (['outgo','expenditure','outlay'],1),
       (['financial loss'],1), (['loss'],1),
       (['transferred property','transferred possession'],1),
       (['possession'],1)]
    L=[(['rate','charge per unit'],1), (['charge'],1), (['cost'],1),
       (['outgo','expenditure','outlay'],1), (['financial loss'],1),
       (['loss'],1), (['transferred property','transferred possession'],1),
       (['possession'],1)]
    L=[(['rate'],1), (['tax','taxation','revenue enhancement'],1),
       (['levy'],1), (['charge'],1),
       (['liability','financial obligation','indebtedness','pecuniary obligation'],1),
       (['possession'],1)]

After running the algorithm, only a small number of hypotheses remained under this maximal hyperonym:

    [(['cash'],2), (['currency'],5),
     (['medium of exchange','monetary system'],7),
     (['asset'],10), (['possession'],18)]
    e: = 1118

    [(['rate of exchange','exchange rate'],2), (['rate','charge per unit'],4),
     (['charge'],5), (['cost'],5), (['outgo','expenditure','outlay'],5),
     (['financial loss'],5), (['loss'],5),
     (['transferred property','transferred possession'],5),
     (['possession'],18)]
    e: =

Actually, the resulting set C in this example has only seven members. From the above results, it becomes straightforward to determine the synsets with the most depth and the highest occurrence index. At this point we limit the selection to a maximum of two synsets per maximal hyperonym. Thus, the result is the following listing:

    Score   Synset
    78125   {charge}
    1024    {thinking, thought, cerebration, intellection, mentation}
    729     {administrative district, administrative division, territorial division}
    625     {currency}
    512     {quality}
    125     {action}
    16      {consequence, effect, outcome, result, issue, upshot}

These are the results sorted out of eighteen domains (maximal hyperonyms, out of the twenty-five existing in Princeton's WordNet), from very different areas yet all related to the topic of the text. In fact, our experiments showed that texts which dealt with different subjects scored high in these areas or in intrinsically related terminology. Thus, several parallel classifications may arise directly, without requiring changes in the search procedure or in the evaluation function. In machine learning approaches, the best results are obtained when two (or more) different training sets are unrelated. The more similar the topics, the harder it is to deal with the (possibly linearly inseparable) search space.

From this perspective, it becomes clear that the parameter h is not explicitly necessary, because the merging procedure ensures that the best hypotheses are considered first, since abstract and maximal synsets yield poor evaluations. The effect of such a parameter, however, emerges in a motivated way.

3.2 Interpretation and Generation

As noted, the number of potential categories for any given text may span any number of WordNet synsets and thus, in this approach, one cannot simply label a text with a single category. The next logical steps are interpreting and generating the key pieces of the initial text.
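For reference, the evaluation function and the greedy key-synset search of Section 3.1 can be sketched in Python. The sketch assumes that a chain is a leaf-first list of (synset label, occurrence index) pairs and that the search climbs from the leaf while the per-synset score does not decrease; the function names are illustrative and not part of the original Prolog application.

```python
def eval_chain(chain):
    """Sum of occurrence indexes, each raised to its 1-based position in the chain."""
    return sum(index ** depth for depth, (_, index) in enumerate(chain, start=1))


def key_synset(chain):
    """Greedy search from the leaf: stop as soon as the next synset scores lower."""
    if not chain:
        raise ValueError("empty chain")
    scores = [(syn, index ** depth)
              for depth, (syn, index) in enumerate(chain, start=1)]
    best = scores[0]
    for candidate in scores[1:]:
        if candidate[1] < best[1]:
            break
        best = candidate
    return best


chain = [("A", 5), ("B", 4), ("C", 3), ("D", 3), ("E", 1)]
print(eval_chain(chain))  # 130
print(key_synset(chain))  # ('D', 81)
```

Because the exponent grows with depth, a frequently reached synset high in a long chain dominates the total, while the sparsely reached top synset (index 1 in the example) contributes almost nothing.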

Again, the simplest possible approach is adopted, where interpretation and generation are part of a single procedure, hardly separable. Thus, a procedure shall be described in which all the words of all individual sections in the text are evaluated for their relationship with the previously found synsets (the ones maximally evaluated by the function eval in the hyperonymy chains that belong to the set C). A paragraph with many words that are hyponyms (direct or indirect) of the synsets found to be the most relevant to the topic (key-synsets) thus has a good chance of making it into the summarized version of the text.

So, given a text and a set of key-synsets K (the best evaluated from each list in set C), every section (a sequence of words bounded by a special symbol, e.g. ; ( ) ? . - marking parenthetical expressions, paragraphs, etc.) is subjected to an evaluation based upon how many of its words are hyponyms of any of the synsets in K. Basically, the score of a sequence of words of length n is the following sum:

\[ \rho(w_1,\ldots,w_n) = \sum_{k=1}^{n} \bigl(\mathit{has\_hyp}(w_k, s) : s \in K\bigr) \cdot \frac{1}{n} \]

The function has_hyp is true (has value 1) iff there exists a hyperonymy relation (direct or not) between the given word and a given synset that is a member of set K. The result of this procedure is a score, attached to a section of the text, stored in a bi-directional list. Sequences with an evaluation below the threshold of 1 are rejected. In this fashion, the results are ordered as they are inserted in the structure, and one may specify how many sections are to be part of the summarized version of the text, ideally in proportion to the number of words in it. As a result (and a quite difficult piece of text to summarize it already is), the output for the example is (annotated with <words, synsets related to K, score>):

    In putting together the model, : <5,0,0.2>
    Mundell was particularly interested in the consequences of foreign trade and the movement of capital across national nation borders. : <19,5,0.8>
    His research showed that rates of exchange between currencies have a significant influence on the efficacy of a country's monetary money policies ( : <19,7,1.8>
    the supply of money available and changes in national interest rates) : <9,4,1.0>
    and fiscal policies ( : <2,1,0.0>
    taxation and federal budget considerations). : <4,2,0.2>
    According to the Mundell-Fleming model, : <5,0,0.2>
    under a fixed exchange rate, : <4,1,0.0>
    changes to monetary money policies would have little effect on a nation's economy, : <13,7,2.7>
    but fiscal policies would be quite powerful. : <6,2,0.1>
    The reverse is true under a floating exchange rate. : <8,2,0.1>

3.3 Experiments

A number of experiments were made in which the system showed interesting results, namely speed, reasonable flexibility and robustness, especially when dealing with multi-category texts. The speed arises from the fact that there is no training and that all the searches are linear. For long texts, however, it is hard to decide whether the output of the system is a good one. We are now starting with a small corpus of hand-summarized texts and hope to be able to get feedback on the general behavior of the system. Precision and recall will have to be weighed by human subjective analysis of the texts fed into the system, and only then will a general formal measure of the performance be available. Such a process is resource- and time-consuming and is still ongoing at the present moment.

Conclusion

A series of proposals for text classification, retrieval, categorization and summarization have recently been presented, making use of the WordNet ontology for statistical induction of synset clusters. What all these proposals have in common is that they require costly training. The quality (or adequacy) of the keywords found relative to the domains depends on their similarity: similar or related topics are harder to capture. Furthermore, without a tagged corpus, homography results in overfitting.
The present proposal points out that WordNet has enough language-descriptive richness to allow interesting results in text categorization and summarization without training and without the need for tagged texts. Homography is solved in a straightforward way, where the hyperonymy distribution reinforces the correct classification for the text. No parsing and no intermediate text representation are required. The emergence of multiple categories is also driven by the hyperonymic structure of the words in the text. Thus, it is argued that this method (taking into account that the measuring heuristics may vary a great deal) is not inferior to the cited approaches and yields efficiency advantages, especially in fuzzy domains.

Future Work

The next step is to evaluate the system's performance against a set of hand-summarized texts and, if necessary, to extend the system with other relations in WordNet, namely verbal and derivational relations (as in the EuroWordNet project: is_derived_from, has_derived and derived). If this method proves reliable enough for further experimentation, we believe that a more sensitive evaluation of how the words in the text are related to the key-synsets is in order, namely upon three different parameters:

1) a measure of depth (distance in hyperonymy steps in the function has_hyp);
2) a measure of homonymy (measuring noise as the number of different synsets that contain the same word found in the text);
3) a measure of semantic reinforcement (based upon the number of words found in the same section that belong to synsets related to the same hyperonymy chain; in other words, taking into account how many different hyperonymy chains are active in the same section).

References

Barzilay Regina and Michael Elhadad (1997) In ACL/EACL-97 summarization workshop, pp. 10-18, Madrid.

Buenaga Rodríguez M., Gómez-Hidalgo J. M. and Díaz Agudo B. (1997) Using WordNet to complement training information in text categorization. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, Tzigov Chark.

Cohen William W. (1995) Fast Effective Rule Induction. In Proceedings of ICML-95, Lake Tahoe, California.

Fellbaum Christiane (1998) A Semantic Network of English Verbs. In Fellbaum C. (ed.) WordNet: An Electronic Lexical Database and Some of its Applications, MIT Press, Cambridge, MA.

Gonzalo Julio, Felisa Verdejo, Irina Chugur and Juan Cigarran (1998) Indexing with WordNet synsets can improve text retrieval. In Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal.

Hirst Graeme and David St-Onge (1995) Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms. In Fellbaum C. (ed.) WordNet: An Electronic Lexical Database and Some of its Applications, MIT Press, Cambridge, MA.

Morris J. and Graeme Hirst (1991) Lexical cohesion computed by thesaural relations as an indicator of the structure of the text. In Computational Linguistics, 17(1).

Miller George A. (1990) WordNet: An On-Line Lexical Database. In Special Issue of International Journal of Lexicography, Vol. 3, No. 4.

Miller George A. (1995) WordNet: A lexical database for English. Communications of the ACM, 38(11).

Miller George A. and Fellbaum C. (1991) Semantic Networks of English. In Cognition, special issue. Reprinted in Levin B. and Pinker (eds.) Lexical and Conceptual Semantics. Blackwell, Cambridge, MA.

Resnik Philip (1995) Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal.

Scott S. and Matwin S. (1998) Text classification using WordNet hypernyms. In Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal.

Stairmand Mark A. (1996) A Computational Analysis of Lexical Cohesion with Applications in Information Retrieval. PhD thesis, Center for Computational Linguistics, UMIST, Manchester.

Stairmand Mark A. and Black William J. (1996) Contextual and conceptual indexing using WordNet-derived lexical chains. In Proceedings of the 18th BCS-IRSG Colloquium on Information Retrieval Research.

Voorhees Ellen M. (1998) Using WordNet for text retrieval. In Fellbaum C. (ed.) WordNet: An Electronic Lexical Database, MIT Press.


More information

Lexical ambiguity in cross-language image retrieval: a preliminary analysis.

Lexical ambiguity in cross-language image retrieval: a preliminary analysis. Lexical ambiguity in cross-language image retrieval: a preliminary analysis. Borja Navarro-Colorado, Marcel Puchol-Blasco, Rafael M. Terol, Sonia Vázquez and Elena Lloret. Natural Language Processing Research

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

arxiv:cmp-lg/ v1 5 Aug 1998

arxiv:cmp-lg/ v1 5 Aug 1998 Indexing with WordNet synsets can improve text retrieval Julio Gonzalo and Felisa Verdejo and Irina Chugur and Juan Cigarrán UNED Ciudad Universitaria, s.n. 28040 Madrid - Spain {julio,felisa,irina,juanci}@ieec.uned.es

More information

A Comprehensive Analysis of using Semantic Information in Text Categorization

A Comprehensive Analysis of using Semantic Information in Text Categorization A Comprehensive Analysis of using Semantic Information in Text Categorization Kerem Çelik Department of Computer Engineering Boğaziçi University Istanbul, Turkey celikerem@gmail.com Tunga Güngör Department

More information

Evaluating wordnets in Cross-Language Information Retrieval: the ITEM search engine

Evaluating wordnets in Cross-Language Information Retrieval: the ITEM search engine Evaluating wordnets in Cross-Language Information Retrieval: the ITEM search engine Felisa Verdejo, Julio Gonzalo, Anselmo Peñas, Fernando López and David Fernández Depto. de Ingeniería Eléctrica, Electrónica

More information

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman

Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Semantic Extensions to Syntactic Analysis of Queries Ben Handy, Rohini Rajaraman Abstract We intend to show that leveraging semantic features can improve precision and recall of query results in information

More information

Measuring conceptual distance using WordNet: the design of a metric for measuring semantic similarity

Measuring conceptual distance using WordNet: the design of a metric for measuring semantic similarity Measuring conceptual distance using WordNet: the design of a metric for measuring semantic similarity Item Type text; Article Authors Lewis, William D. Publisher University of Arizona Linguistics Circle

More information

Ontology Based Search Engine

Ontology Based Search Engine Ontology Based Search Engine K.Suriya Prakash / P.Saravana kumar Lecturer / HOD / Assistant Professor Hindustan Institute of Engineering Technology Polytechnic College, Padappai, Chennai, TamilNadu, India

More information

Enhancing Web Page Skimmability

Enhancing Web Page Skimmability Enhancing Web Page Skimmability Chen-Hsiang Yu MIT CSAIL 32 Vassar St Cambridge, MA 02139 chyu@mit.edu Robert C. Miller MIT CSAIL 32 Vassar St Cambridge, MA 02139 rcm@mit.edu Abstract Information overload

More information

NATURAL LANGUAGE PROCESSING

NATURAL LANGUAGE PROCESSING NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity

More information

Evaluating a Conceptual Indexing Method by Utilizing WordNet

Evaluating a Conceptual Indexing Method by Utilizing WordNet Evaluating a Conceptual Indexing Method by Utilizing WordNet Mustapha Baziz, Mohand Boughanem, Nathalie Aussenac-Gilles IRIT/SIG Campus Univ. Toulouse III 118 Route de Narbonne F-31062 Toulouse Cedex 4

More information

GernEdiT: A Graphical Tool for GermaNet Development

GernEdiT: A Graphical Tool for GermaNet Development GernEdiT: A Graphical Tool for GermaNet Development Verena Henrich University of Tübingen Tübingen, Germany. verena.henrich@unituebingen.de Erhard Hinrichs University of Tübingen Tübingen, Germany. erhard.hinrichs@unituebingen.de

More information

CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS

CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS 82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

A Linguistic Approach for Semantic Web Service Discovery

A Linguistic Approach for Semantic Web Service Discovery A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam

More information

is easing the creation of new ontologies by promoting the reuse of existing ones and automating, as much as possible, the entire ontology

is easing the creation of new ontologies by promoting the reuse of existing ones and automating, as much as possible, the entire ontology Preface The idea of improving software quality through reuse is not new. After all, if software works and is needed, just reuse it. What is new and evolving is the idea of relative validation through testing

More information

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

Motivating Ontology-Driven Information Extraction

Motivating Ontology-Driven Information Extraction Motivating Ontology-Driven Information Extraction Burcu Yildiz 1 and Silvia Miksch 1, 2 1 Institute for Software Engineering and Interactive Systems, Vienna University of Technology, Vienna, Austria {yildiz,silvia}@

More information

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

Guidelines for a flexible and resilient statistical system: the architecture of the new Portuguese BOP/IIP system

Guidelines for a flexible and resilient statistical system: the architecture of the new Portuguese BOP/IIP system Guidelines for a flexible and resilient statistical system: the architecture of the new Portuguese BOP/IIP system Marques, Carla Banco de Portugal1, Statistics Department Av. D. João II, Lote 1.12.02 Lisbon

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Sense-based Information Retrieval System by using Jaccard Coefficient Based WSD Algorithm

Sense-based Information Retrieval System by using Jaccard Coefficient Based WSD Algorithm ISBN 978-93-84468-0-0 Proceedings of 015 International Conference on Future Computational Technologies (ICFCT'015 Singapore, March 9-30, 015, pp. 197-03 Sense-based Information Retrieval System by using

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

Information Extraction Techniques in Terrorism Surveillance

Information Extraction Techniques in Terrorism Surveillance Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism

More information

Question Answering Approach Using a WordNet-based Answer Type Taxonomy

Question Answering Approach Using a WordNet-based Answer Type Taxonomy Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering

More information

Serbian Wordnet for biomedical sciences

Serbian Wordnet for biomedical sciences Serbian Wordnet for biomedical sciences Sanja Antonic University library Svetozar Markovic University of Belgrade, Serbia antonic@unilib.bg.ac.yu Cvetana Krstev Faculty of Philology, University of Belgrade,

More information

LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval

LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval Ernesto William De Luca and Andreas Nürnberger 1 Abstract. The problem of word sense disambiguation in lexical resources

More information

A Semantic Role Repository Linking FrameNet and WordNet

A Semantic Role Repository Linking FrameNet and WordNet A Semantic Role Repository Linking FrameNet and WordNet Volha Bryl, Irina Sergienya, Sara Tonelli, Claudio Giuliano {bryl,sergienya,satonelli,giuliano}@fbk.eu Fondazione Bruno Kessler, Trento, Italy Abstract

More information

Integrating Spanish Linguistic Resources in a Web Site Assistant

Integrating Spanish Linguistic Resources in a Web Site Assistant Integrating Spanish Linguistic Resources in a Web Site Assistant Paloma Martínez*, Ana García-Serrano, Alberto Ruiz-Cristina * Universidad Carlos III de Madrid Avd. Universidad 30, 28911 Leganés, Madrid,

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion

More information

SOME TYPES AND USES OF DATA MODELS

SOME TYPES AND USES OF DATA MODELS 3 SOME TYPES AND USES OF DATA MODELS CHAPTER OUTLINE 3.1 Different Types of Data Models 23 3.1.1 Physical Data Model 24 3.1.2 Logical Data Model 24 3.1.3 Conceptual Data Model 25 3.1.4 Canonical Data Model

More information

The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation

The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation Ken Litkowski ken@clres.com http://www.clres.com http://www.clres.com/dppdemo/index.html Dictionary Parsing Project Purpose: to

More information

Improving Retrieval Experience Exploiting Semantic Representation of Documents

Improving Retrieval Experience Exploiting Semantic Representation of Documents Improving Retrieval Experience Exploiting Semantic Representation of Documents Pierpaolo Basile 1 and Annalina Caputo 1 and Anna Lisa Gentile 1 and Marco de Gemmis 1 and Pasquale Lops 1 and Giovanni Semeraro

More information

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es

More information

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics Helmut Berger and Dieter Merkl 2 Faculty of Information Technology, University of Technology, Sydney, NSW, Australia hberger@it.uts.edu.au

More information

EFFICIENT CLUSTERING WITH FUZZY ANTS

EFFICIENT CLUSTERING WITH FUZZY ANTS EFFICIENT CLUSTERING WITH FUZZY ANTS S. SCHOCKAERT, M. DE COCK, C. CORNELIS AND E. E. KERRE Fuzziness and Uncertainty Modelling Research Unit, Department of Applied Mathematics and Computer Science, Ghent

More information

MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI

MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI 1 KAMATCHI.M, 2 SUNDARAM.N 1 M.E, CSE, MahaBarathi Engineering College Chinnasalem-606201, 2 Assistant Professor,

More information

Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories

Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Ornsiri Thonggoom, Il-Yeol Song, Yuan An The ischool at Drexel Philadelphia, PA USA Outline Long Term Research

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

Ontology Creation and Development Model

Ontology Creation and Development Model Ontology Creation and Development Model Pallavi Grover, Sonal Chawla Research Scholar, Department of Computer Science & Applications, Panjab University, Chandigarh, India Associate. Professor, Department

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

Challenges and Benefits of a Methodology for Scoring Web Content Accessibility Guidelines (WCAG) 2.0 Conformance

Challenges and Benefits of a Methodology for Scoring Web Content Accessibility Guidelines (WCAG) 2.0 Conformance NISTIR 8010 Challenges and Benefits of a Methodology for Scoring Web Content Accessibility Guidelines (WCAG) 2.0 Conformance Frederick Boland Elizabeth Fong http://dx.doi.org/10.6028/nist.ir.8010 NISTIR

More information

Modern Programming Languages. Lecture LISP Programming Language An Introduction

Modern Programming Languages. Lecture LISP Programming Language An Introduction Modern Programming Languages Lecture 18-21 LISP Programming Language An Introduction 72 Functional Programming Paradigm and LISP Functional programming is a style of programming that emphasizes the evaluation

More information

Punjabi WordNet Relations and Categorization of Synsets

Punjabi WordNet Relations and Categorization of Synsets Punjabi WordNet Relations and Categorization of Synsets Rupinderdeep Kaur Computer Science Engineering Department, Thapar University, rupinderdeep@thapar.edu Suman Preet Department of Linguistics and Punjabi

More information

Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories

Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Ornsiri Thonggoom, Il-Yeol Song, and Yuan An The ischool at Drexel University, Philadelphia, PA USA Ot62@drexel.edu,

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Reading group on Ontologies and NLP:

Reading group on Ontologies and NLP: Reading group on Ontologies and NLP: Machine Learning27th infebruary Automated 2014 1 / 25 Te Reading group on Ontologies and NLP: Machine Learning in Automated Text Categorization, by Fabrizio Sebastianini.

More information

ACCOUNTING (ACCT) Kent State University Catalog

ACCOUNTING (ACCT) Kent State University Catalog Kent State University Catalog 2018-2019 1 ACCOUNTING (ACCT) ACCT 23020 INTRODUCTION TO FINANCIAL ACCOUNTING 3 Credit (Equivalent to ACTT 11000) Introduction to the basic concepts and standards underlying

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Enhancing Automatic Wordnet Construction Using Word Embeddings

Enhancing Automatic Wordnet Construction Using Word Embeddings Enhancing Automatic Wordnet Construction Using Word Embeddings Feras Al Tarouti University of Colorado Colorado Springs 1420 Austin Bluffs Pkwy Colorado Springs, CO 80918, USA faltarou@uccs.edu Jugal Kalita

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

MetaData for Database Mining

MetaData for Database Mining MetaData for Database Mining John Cleary, Geoffrey Holmes, Sally Jo Cunningham, and Ian H. Witten Department of Computer Science University of Waikato Hamilton, New Zealand. Abstract: At present, a machine

More information

CS229 Lecture notes. Raphael John Lamarre Townshend

CS229 Lecture notes. Raphael John Lamarre Townshend CS229 Lecture notes Raphael John Lamarre Townshend Decision Trees We now turn our attention to decision trees, a simple yet flexible class of algorithms. We will first consider the non-linear, region-based

More information

COMP90042 LECTURE 3 LEXICAL SEMANTICS COPYRIGHT 2018, THE UNIVERSITY OF MELBOURNE

COMP90042 LECTURE 3 LEXICAL SEMANTICS COPYRIGHT 2018, THE UNIVERSITY OF MELBOURNE COMP90042 LECTURE 3 LEXICAL SEMANTICS SENTIMENT ANALYSIS REVISITED 2 Bag of words, knn classifier. Training data: This is a good movie.! This is a great movie.! This is a terrible film. " This is a wonderful

More information

Schema Quality Improving Tasks in the Schema Integration Process

Schema Quality Improving Tasks in the Schema Integration Process 468 Schema Quality Improving Tasks in the Schema Integration Process Peter Bellström Information Systems Karlstad University Karlstad, Sweden e-mail: peter.bellstrom@kau.se Christian Kop Institute for

More information

Eurown: an EuroWordNet module for Python

Eurown: an EuroWordNet module for Python Eurown: an EuroWordNet module for Python Neeme Kahusk Institute of Computer Science University of Tartu, Liivi 2, 50409 Tartu, Estonia neeme.kahusk@ut.ee Abstract The subject of this demo is a Python module

More information

A cocktail approach to the VideoCLEF 09 linking task

A cocktail approach to the VideoCLEF 09 linking task A cocktail approach to the VideoCLEF 09 linking task Stephan Raaijmakers Corné Versloot Joost de Wit TNO Information and Communication Technology Delft, The Netherlands {stephan.raaijmakers,corne.versloot,

More information

Cluster-based Instance Consolidation For Subsequent Matching

Cluster-based Instance Consolidation For Subsequent Matching Jennifer Sleeman and Tim Finin, Cluster-based Instance Consolidation For Subsequent Matching, First International Workshop on Knowledge Extraction and Consolidation from Social Media, November 2012, Boston.

More information

2 Experimental Methodology and Results

2 Experimental Methodology and Results Developing Consensus Ontologies for the Semantic Web Larry M. Stephens, Aurovinda K. Gangam, and Michael N. Huhns Department of Computer Science and Engineering University of South Carolina, Columbia,

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

A hybrid method to categorize HTML documents

A hybrid method to categorize HTML documents Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper

More information

Semantically Driven Snippet Selection for Supporting Focused Web Searches

Semantically Driven Snippet Selection for Supporting Focused Web Searches Semantically Driven Snippet Selection for Supporting Focused Web Searches IRAKLIS VARLAMIS Harokopio University of Athens Department of Informatics and Telematics, 89, Harokopou Street, 176 71, Athens,

More information

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 188 CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES 6.1 INTRODUCTION Image representation schemes designed for image retrieval systems are categorized into two

More information

Context Sensitive Search Engine

Context Sensitive Search Engine Context Sensitive Search Engine Remzi Düzağaç and Olcay Taner Yıldız Abstract In this paper, we use context information extracted from the documents in the collection to improve the performance of the

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

Enriching Ontology Concepts Based on Texts from WWW and Corpus

Enriching Ontology Concepts Based on Texts from WWW and Corpus Journal of Universal Computer Science, vol. 18, no. 16 (2012), 2234-2251 submitted: 18/2/11, accepted: 26/8/12, appeared: 28/8/12 J.UCS Enriching Ontology Concepts Based on Texts from WWW and Corpus Tarek

More information

Image Classification Using Text Mining and Feature Clustering (Text Document and Image Categorization Using Fuzzy Similarity Based Feature Clustering)

Image Classification Using Text Mining and Feature Clustering (Text Document and Image Categorization Using Fuzzy Similarity Based Feature Clustering) Image Classification Using Text Mining and Clustering (Text Document and Image Categorization Using Fuzzy Similarity Based Clustering) 1 Mr. Dipak R. Pardhi, 2 Mrs. Charushila D. Pati 1 Assistant Professor

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & 6367(Print), ISSN 0976 6375(Online) Volume 3, Issue 1, January- June (2012), TECHNOLOGY (IJCET) IAEME ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Sense Match Making Approach for Semantic Web Service Discovery 1 2 G.Bharath, P.Deivanai 1 2 M.Tech Student, Assistant Professor

Sense Match Making Approach for Semantic Web Service Discovery 1 2 G.Bharath, P.Deivanai 1 2 M.Tech Student, Assistant Professor Sense Match Making Approach for Semantic Web Service Discovery 1 2 G.Bharath, P.Deivanai 1 2 M.Tech Student, Assistant Professor 1,2 Department of Software Engineering, SRM University, Chennai, India 1

More information

Error annotation in adjective noun (AN) combinations
