Lecture 4: Unsupervised Word-sense Disambiguation
1 Lecture 4: Unsupervised Word-sense Disambiguation
Lexical Semantics and Discourse Processing
MPhil in Advanced Computer Science
Simone Teufel, Natural Language and Information Processing (NLIP) Group
Slides after Frank Keller
February 2, 2011
2 Outline

1. Bootstrapping: Heuristics, Seed Set, Classification, Generalization
2. Graph-based WSD

Reading: Yarowsky (1995), Navigli and Lapata (2010).
3 Heuristics

Yarowsky's (1995) algorithm uses two powerful heuristics for WSD:
- One sense per collocation: nearby words provide clues to the sense of the target word, conditional on distance, order, and syntactic relationship.
- One sense per discourse: the sense of a target word is consistent within a given document.
The Yarowsky algorithm is a bootstrapping algorithm, i.e., it requires only a small amount of annotated data.
Figures and tables in this section from Yarowsky (1995).
4 Seed Set

Step 1: Extract all instances of a polysemous or homonymous word.
Step 2: Generate a seed set of labeled examples, either by manually labeling them or by using a reliable heuristic.
Example: target word plant: as seed set, take all instances of plant life (sense A) and manufacturing plant (sense B).
5 Seed Set

[Figure: the sample set after seeding, with regions of instances labeled Life and Manufacturing.]
6 Classification

Step 3a: Train a classifier on the seed set.
Step 3b: Apply the classifier to the entire sample set. Add those examples that are classified reliably (probability above a threshold) to the seed set.
Yarowsky uses a decision list classifier:
- rules of the form: collocation → sense;
- rules are ordered by log-likelihood ratio: log( P(sense A | collocation_i) / P(sense B | collocation_i) );
- classification is based on the first rule that applies.
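As an illustration, the decision-list training and classification sketched above can be written in a few lines of Python (a simplified sketch, not Yarowsky's implementation: the smoothing constant alpha and the sets-of-collocations input format are assumptions):

```python
import math
from collections import defaultdict

def build_decision_list(labeled, alpha=0.1):
    """labeled: iterable of (collocations, sense) pairs, sense in {"A", "B"}.
    Returns rules (score, collocation, sense), most reliable first."""
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for collocations, sense in labeled:
        for c in collocations:
            counts[c][sense] += 1
    rules = []
    for c, n in counts.items():
        # log-likelihood ratio, smoothed to avoid division by zero
        llr = math.log((n["A"] + alpha) / (n["B"] + alpha))
        rules.append((abs(llr), c, "A" if llr > 0 else "B"))
    return sorted(rules, reverse=True)

def classify(collocations, rules):
    """Classification is based on the first rule that applies."""
    for _, c, sense in rules:
        if c in collocations:
            return sense
    return None
```

With a seed set that contains "plant life" labeled A and "manufacturing plant" labeled B, an unseen instance whose context contains "plant life" is classified as sense A by the first matching rule.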
7 Classification

LogL  Collocation                        Sense
8.10  plant life                         A
7.58  manufacturing plant                B
7.39  life (within ±10 words)            A
7.20  manufacturing (within ±10 words)   B
6.27  animal (within ±10 words)          A
4.70  equipment (within ±10 words)       B
4.39  employee (within ±10 words)        B
4.30  assembly plant                     B
4.10  plant closure                      B
3.52  plant species                      A
3.48  automate (within ±10 words)        B
3.45  microscopic plant                  A
...
8 Classification

Step 3c: Use the one-sense-per-discourse constraint to filter the newly classified examples: if several examples in a discourse have already been annotated as sense A, then extend this label to all examples of the word in the discourse. This can form a bridge to new collocations, and correct erroneously labeled examples.
Step 3d: Repeat Steps 3a-c.
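Step 3c could be sketched like this (an illustrative simplification: the slide does not say how conflicting labels within a discourse are resolved, so a majority vote is assumed here):

```python
from collections import Counter

def one_sense_per_discourse(examples):
    """examples: dict id -> (discourse_id, sense or None).
    Extends the majority sense of each discourse to all of its examples,
    labeling unlabeled instances and relabeling conflicting ones."""
    senses_by_doc = {}
    for doc, sense in examples.values():
        if sense is not None:
            senses_by_doc.setdefault(doc, []).append(sense)
    majority = {doc: Counter(s).most_common(1)[0][0]
                for doc, s in senses_by_doc.items()}
    return {ex: (doc, majority.get(doc, sense))
            for ex, (doc, sense) in examples.items()}
```

A discourse with two examples labeled A, one labeled B, and one unlabeled comes out with all four labeled A; a discourse with no labeled examples is left untouched.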
9 Classification

[Figure: the sample set after several iterations; the Life and Manufacturing regions have grown to cover new collocations (Microscopic, Species, Animal; Automate, Equipment, Employee).]
10 Generalization

Step 4: The algorithm converges on a stable residual set (the remaining unlabeled instances):
- most training examples will now exhibit multiple collocations indicative of the same sense;
- the decision list procedure uses only the single most reliable rule, not a combination of rules.
Step 5: The final classifier can now be applied to unseen data.
11 Discussion

Strengths:
- simple algorithm that uses only minimal features (words in the context of the target word);
- minimal effort required to create the seed set;
- does not rely on a dictionary or other external knowledge.
Weaknesses:
- uses a very simple classifier (but it could be replaced with a more state-of-the-art one);
- not fully unsupervised: requires seed data;
- does not make use of the structure of the sense inventory.
Alternative: graph-based algorithms exploit the structure of the sense inventory for WSD.
12 Navigli and Lapata's (2010) algorithm is an example of graph-based WSD. It exploits the fact that sense inventories have internal structure.
Example: synsets (senses) of drink in Wordnet:
(1) a. {drink_v^1, imbibe_v^3}
    b. {drink_v^2, booze_v^1, fuddle_v^2}
    c. {toast_v^2, drink_v^3, pledge_v^2, salute_v^1, wassail_v^2}
    d. {drink in_v^1, drink_v^4}
    e. {drink_v^5, tope_v^1}
Figures and tables in this section from Navigli and Lapata (2010).
13 WN as a graph

We can represent Wordnet as a graph whose nodes are synsets and whose edges are relations between synsets. Note that the edges are not labeled, i.e., the type of relation between the nodes is ignored.
14 Example: graph for the first sense of drink.

[Figure: Wordnet subgraph around drink_v^1, with nodes including helping_n^1, food_n^1, liquid_n^1, drink_n^1, beverage_n^1, milk_n^1, sup_v^1, toast_n^4, consume_v^2, sip_v^1, nip_n^4, consumer_n^1, drinker_n^1, drinking_n^1, consumption_n^1, and potation_n^1.]
15 Disambiguation algorithm:

1. Use the Wordnet graph to construct a graph that incorporates each content word in the sentence to be disambiguated;
2. rank each node in the sentence graph according to its importance, using graph connectivity measures;
3. for each content word, pick the highest-ranked sense as the correct sense of the word.
16 Given a word sequence σ = (w_1, w_2, ..., w_n), the graph G = (V, E) is constructed as follows:

1. Let V_σ := ∪_{i=1}^{n} Senses(w_i) denote all possible word senses in σ. We set V := V_σ and E := ∅.
2. For each node v ∈ V_σ, we perform a depth-first search (DFS) of the Wordnet graph: every time we encounter a node v' ∈ V_σ (v' ≠ v) along a path v → v_1 → ... → v_k → v' of length ≤ L, we add all intermediate nodes and edges on the path from v to v': V := V ∪ {v_1, ..., v_k} and E := E ∪ {{v, v_1}, ..., {v_k, v'}}.
For tractability, we fix the maximum path length L at 6.
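Assuming the Wordnet graph is given as an adjacency dictionary (synset id → set of neighbouring synset ids), the construction above can be sketched as follows (an illustrative sketch; the function names and the exact path-length bookkeeping are simplifying assumptions):

```python
def build_sentence_graph(wordnet, senses_per_word, max_len=6):
    """wordnet: dict node -> set of neighbour nodes (the sense inventory graph).
    senses_per_word: one set of candidate senses per content word.
    Returns (V, E) for the sentence graph; E holds undirected edges."""
    v_sigma = set().union(*senses_per_word)
    V, E = set(v_sigma), set()

    def dfs(path):
        if len(path) > max_len:      # cap the path length for tractability
            return
        for nxt in wordnet.get(path[-1], ()):
            if nxt in path:          # avoid cycles; also rules out v' == v
                continue
            if nxt in v_sigma:       # found a path v -> ... -> v'
                V.update(path[1:])   # add the intermediate nodes ...
                for a, b in zip(path, path[1:] + [nxt]):
                    E.add(frozenset((a, b)))  # ... and all edges on the path
            else:
                dfs(path + [nxt])

    for v in v_sigma:
        dfs([v])
    return V, E
```

On a toy inventory where drink_v^1 and milk_n^1 are connected through beverage_n^1, the intermediate node and both edges end up in the sentence graph, while unconnected candidate senses remain isolated nodes.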
17 Example: graph for drink milk.

[Figure, built up over several animation steps: candidate sense nodes drink_v^1 ... drink_v^5 and milk_n^1 ... milk_n^4, progressively connected through intermediate nodes such as drink_n^1, beverage_n^1, food_n^1, nutriment_n^1, drinker_n^2, and boozing_n^1.]
We get 3 × 2 = 6 interpretations, i.e., subgraphs obtained when considering only one connected sense of drink and of milk.
18 Once we have the graph, we pick the most connected node for each word as the correct sense. There are two types of connectivity measures:
- Local measures: give a connectivity score to an individual node in the graph; use this directly to pick a sense.
- Global measures: assign a connectivity score to the graph as a whole; apply the measure to each interpretation and select the highest-scoring one.
Navigli and Lapata (2010) discuss a large number of graph connectivity measures; we will focus on the most important ones.
19 Degree Centrality

Assume a graph with nodes V and edges E. Then the degree of v ∈ V is the number of edges terminating in it:

    deg(v) = |{ {u, v} ∈ E : u ∈ V }|    (1)

Degree centrality is the degree of a node normalized by the maximum possible degree:

    C_D(v) = deg(v) / (|V| − 1)    (2)

For the previous example, C_D(drink_v^1) = 3/14, C_D(drink_v^2) = C_D(drink_v^5) = 2/14, and C_D(milk_n^1) = C_D(milk_n^2). So we pick drink_v^1, while milk_n is tied.
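Equations (1) and (2) are easy to verify in code; a minimal sketch on a toy graph (not the drink milk graph from the slides):

```python
def degree_centrality(V, E):
    """C_D(v) = deg(v) / (|V| - 1); E is a set of frozenset edges {u, v}."""
    deg = {v: sum(1 for e in E if v in e) for v in V}
    return {v: deg[v] / (len(V) - 1) for v in V}

# Toy graph: a square a-b-c-d-a with one diagonal a-c.
edges = {frozenset(p) for p in [("a", "b"), ("b", "c"),
                                ("c", "d"), ("d", "a"), ("a", "c")]}
scores = degree_centrality({"a", "b", "c", "d"}, edges)
# a and c are adjacent to all 3 other nodes, b and d only to 2.
```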
20 Edge Density

The edge density of a graph is its number of edges relative to a complete graph with |V| nodes, which has C(|V|, 2) = |V|(|V| − 1)/2 edges:

    ED(G) = |E(G)| / C(|V|, 2)    (3)

The first interpretation of drink milk has ED(G) = 6 / C(5, 2) = 6/10 = 0.60, the second one ED(G) = 5 / C(5, 2) = 5/10 = 0.50.
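The two edge-density values above can be reproduced directly from equation (3):

```python
from math import comb

def edge_density(num_nodes, num_edges):
    """ED(G) = |E(G)| / C(|V|, 2): density relative to a complete graph."""
    return num_edges / comb(num_nodes, 2)

# The two interpretations of "drink milk" from the slide:
first = edge_density(5, 6)   # 6 edges out of C(5,2) = 10 possible
second = edge_density(5, 5)  # 5 edges out of 10 possible
```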
21 Results on SemCor

[Table: scores on SemCor, on WordNet and EnWordNet, for all words and for polysemous words only. Rows: Random and ExtLesk baselines; the local measures Degree, PageRank, HITS, KPP, and Betweenness; the global measures Compactness, Graph Entropy, and Edge Density; and the First Sense baseline.]
22 Results on Semeval All-words Data

System                            F
Best Unsupervised (Sussex)        45.8
ExtLesk                           43.1
Degree Unsupervised               52.9
Best Semi-supervised (IRST-DDD)   56.7
Degree Semi-supervised            60.7
First Sense                       62.4
Best Supervised (GAMBL)           65.2
23 Discussion

Strengths:
- exploits the structure of the sense inventory/dictionary;
- conceptually simple; doesn't require any training data, not even a seed set;
- achieves good performance for an unsupervised system.
Weaknesses:
- performance not good enough for real applications (F-score of 53 on Semeval);
- sense inventories take a lot of effort to create (Wordnet has been under development for more than 15 years).
24 Summary

The Yarowsky algorithm uses two key heuristics: one sense per collocation and one sense per discourse. It starts with a small seed set, trains a classifier on it, and then applies the classifier to the whole data set (bootstrapping); reliable examples are kept, and the classifier is re-trained.
Unsupervised graph-based WSD is an alternative that exploits the connectivity of the sense inventory: a graph is constructed that represents the possible interpretations of a sentence, and the nodes with the highest connectivity are picked as the correct senses. A range of connectivity measures exists; simple degree is best.
25 References

Yarowsky, D. (1995). Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the ACL.
Navigli, R. and Lapata, M. (2010). An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(4). IEEE Press.
CPSC340 State-of-the-art Neural Networks Nando de Freitas November, 2012 University of British Columbia Outline of the lecture This lecture provides an overview of two state-of-the-art neural networks:
More informationExploiting Symmetry in Relational Similarity for Ranking Relational Search Results
Exploiting Symmetry in Relational Similarity for Ranking Relational Search Results Tomokazu Goto, Nguyen Tuan Duc, Danushka Bollegala, and Mitsuru Ishizuka The University of Tokyo, Japan {goto,duc}@mi.ci.i.u-tokyo.ac.jp,
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83
More informationA graph-based method to improve WordNet Domains
A graph-based method to improve WordNet Domains Aitor González, German Rigau IXA group UPV/EHU, Donostia, Spain agonzalez278@ikasle.ehu.com german.rigau@ehu.com Mauro Castillo UTEM, Santiago de Chile,
More informationAT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands
AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands Svetlana Stoyanchev, Hyuckchul Jung, John Chen, Srinivas Bangalore AT&T Labs Research 1 AT&T Way Bedminster NJ 07921 {sveta,hjung,jchen,srini}@research.att.com
More informationA hybrid method to categorize HTML documents
Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper
More informationMarkov Chains for Robust Graph-based Commonsense Information Extraction
Markov Chains for Robust Graph-based Commonsense Information Extraction N iket Tandon 1,4 Dheera j Ra jagopal 2,4 Gerard de M elo 3 (1) Max Planck Institute for Informatics, Germany (2) NUS, Singapore
More informationMachine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016
Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised
More informationSoft Word Sense Disambiguation
Soft Word Sense Disambiguation Abstract: Word sense disambiguation is a core problem in many tasks related to language processing. In this paper, we introduce the notion of soft word sense disambiguation
More informationOutline. Morning program Preliminaries Semantic matching Learning to rank Entities
112 Outline Morning program Preliminaries Semantic matching Learning to rank Afternoon program Modeling user behavior Generating responses Recommender systems Industry insights Q&A 113 are polysemic Finding
More informationCHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS
82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the
More informationCombining Qualitative and Quantitative Keyword Extraction Methods with Document Layout Analysis
Combining Qualitative and Quantitative Keyword Extraction Methods with Document Layout Analysis Stefano Ferilli 1, Marenglen Biba 1, Teresa M.A. Basile 1, Floriana Esposito 1 1 Università di Bari, Dipartimento
More informationTwo graph-based algorithms for state-of-the-art WSD
Two graph-based algorithms for state-of-the-art WSD Eneko Agirre, David Martínez, Oier López de Lacalle and Aitor Soroa IXA NLP Group University of the Basque Country Donostia, Basque Contry a.soroa@si.ehu.es
More informationNLP in practice, an example: Semantic Role Labeling
NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:
More informationData-Mining Algorithms with Semantic Knowledge
Data-Mining Algorithms with Semantic Knowledge Ontology-based information extraction Carlos Vicient Monllaó Universitat Rovira i Virgili December, 14th 2010. Poznan A Project funded by the Ministerio de
More informationEffect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching
Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationNoida institute of engineering and technology,greater noida
Impact Of Word Sense Ambiguity For English Language In Web IR Prachi Gupta 1, Dr.AnuragAwasthi 2, RiteshRastogi 3 1,2,3 Department of computer Science and engineering, Noida institute of engineering and
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv IIR 6: Flat Clustering Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University, Brno Center
More informationA Semantic Role Repository Linking FrameNet and WordNet
A Semantic Role Repository Linking FrameNet and WordNet Volha Bryl, Irina Sergienya, Sara Tonelli, Claudio Giuliano {bryl,sergienya,satonelli,giuliano}@fbk.eu Fondazione Bruno Kessler, Trento, Italy Abstract
More informationLimitations of XPath & XQuery in an Environment with Diverse Schemes
Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML-Data Martin Theobald, Ralf Schenkel, and Gerhard Weikum Saarland University Saarbrücken, Germany 23.06.2003
More informationUlas Bagci
CAP5415-Computer Vision Lecture 14-Decision Forests for Computer Vision Ulas Bagci bagci@ucf.edu 1 Readings Slide Credits: Criminisi and Shotton Z. Tu R.Cipolla 2 Common Terminologies Randomized Decision
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More informationLecture 8 May 7, Prabhakar Raghavan
Lecture 8 May 7, 2001 Prabhakar Raghavan Clustering documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics Given the set of docs from the results of
More informationTowards Semantic Data Mining
Towards Semantic Data Mining Haishan Liu Department of Computer and Information Science, University of Oregon, Eugene, OR, 97401, USA ahoyleo@cs.uoregon.edu Abstract. Incorporating domain knowledge is
More information