WordNet-based User Profiles for Semantic Personalization


PIA 2005 Workshop on New Technologies for Personalized Information Access
WordNet-based User Profiles for Semantic Personalization
Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Ignazio Palmisano
LACAM Knowledge Acquisition and Machine Learning Lab, Department of Computer Science, University of Bari
July 24th, 2005, Edinburgh, Scotland, UK

Outline

Today's Information Society
Problems
- Explosion of irrelevant information
- Users overloaded by this information
Consequences
- Searching is time consuming
- Need for intelligent solutions to support users

MyWeb? /2 Personalized Portals & Stores

Personalized Access: Learning User Profiles as a Text Categorization Problem
Preferences: Arts & Photography, Children's books, Computers & Internet
Content-based recommendations by learning from text and users' ratings on items
Example text: book description at Amazon.com

Personalized Access: Research Goal
Intelligent Information Access = 3. by user profiles + 4. Semantic Access by concept identification in documents
User profile: a structured representation of user interests and preferences
- Automated induction of user profiles by means of supervised machine learning techniques
- Taking into account the meaning of the words

Personalized Access: Intelligent Personalized Information Access
[Diagram labels: Training documents (positive examples, negative examples); Documents; Concept identification; Learning; Concept-based profile; Matching; Concept-based document representation]

Personalized Access: Movie Recommending on the Web
Instance (movie) slots: Title, Director, Cast, Summary, Keywords
User ratings: 0-5
Text processing: Tokenization + Stopword elimination + Stemming -> Bag of Words (BOW)
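The preprocessing chain named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the crude suffix-stripping stemmer, and the function names are all assumptions made for the example, and a real system would use a proper stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption: the slides do not name one).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "by", "on"}


def crude_stem(token):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Tokenize, drop stopwords, stem, and count the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)


def index_movie(slots):
    """Build one bag of words per slot (title, summary, ...)."""
    return {slot: bag_of_words(text) for slot, text in slots.items()}


# Hypothetical movie instance with two of the slots listed on the slide.
movie = {
    "title": "The Killing Fields",
    "summary": "A journalist is trapped in Cambodia",
}
bows = index_movie(movie)
print(bows["title"])
```

Keeping a separate bag per slot means that the same word in the title and in the summary is counted independently.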

Word Sense Disambiguation (WSD)
- Polysemous words have many meanings, known as senses
- Only one sense at a time is used in a specific context; deciding which sense is meant is Word Sense Disambiguation
Approaches to WSD
- Knowledge-based: uses Machine Readable Dictionaries
- Corpus-based: uses a sense-tagged corpus

WordNet
- Lexical reference database whose design is inspired by current psycholinguistic theories of human lexical memory
- Work was started in 1985 by a group of psychologists and linguists at Princeton University
- English nouns, verbs, adverbs and adjectives are organized into SYNonym SETs (synsets), each representing one underlying lexical concept
- Relations among synsets can be used to engineer a change of representation in text data by transforming vectors of words into vectors of word meanings
- The synonymy relation can be used to map words with similar meanings together
- Hypernymy (corresponding to the IS-A relation) can be used to generalize noun and verb meanings to a higher level of abstraction
http://www.cogsci.princeton.edu/~wn/

Synset Semantic Similarity
How to use WordNet for WSD?
- Semantic similarity between synsets is inversely proportional to the distance between synsets in the WordNet IS-A hierarchy
- Path length similarity between synsets could be used to assign scores to synsets of a polysemous word
Example (IS-A hierarchy, edges numbered 1-5 along the path): Cat (feline mammal) -> Feline, felid -> Carnivore -> Placental mammal <- Rodent <- Mouse (rodent)
SINSIM(cat, mouse) = -log(5/32) = 0.806
Leacock-Chodorow [Fellbaum 1998]
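The cat/mouse figure can be reproduced with a short sketch. The Leacock-Chodorow measure is -log(path length / (2 x maximum taxonomy depth)). Reading the slide's 5/32 as a path of 5 edges over 2 x 16, with a base-10 logarithm, is an inference from the numbers shown and not something the slide states. The toy hierarchy below contains only the nodes on the slide, and the function names are hypothetical.

```python
import math

# Toy IS-A fragment taken from the slide: child -> hypernym.
HYPERNYM = {
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "placental mammal",
    "mouse": "rodent",
    "rodent": "placental mammal",
}


def ancestors(node):
    """Return the node followed by its chain of hypernyms up to the root."""
    chain = [node]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_length(a, b):
    """Number of IS-A edges on the path joining a and b via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("no common ancestor")


def sinsim(a, b, max_depth=16):
    """Leacock-Chodorow style score; max_depth=16 is an assumed taxonomy depth.

    Identical synsets (path length 0) are not handled in this sketch.
    """
    return -math.log10(path_length(a, b) / (2 * max_depth))


print(path_length("cat", "mouse"))       # 5
print(round(sinsim("cat", "mouse"), 3))  # 0.806
```

Shorter paths give higher scores, so the candidate synset of a polysemous word that lies closest to the synsets of its context words receives the highest score.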

Semantic Indexing
A document d is mapped into a list of WordNet synsets following these steps:
1. Each monosemous word w in a slot of a document d is mapped into the corresponding WordNet synset;
2. For each pair of words <noun,noun> or <adjective,noun>, a search in WordNet is made to verify whether at least one synset exists for the bigram <w1,w2>. If so, the WSD algorithm is applied to the bigram; otherwise it is applied separately to w1 and w2, using all words in the slot as the context C of w;
3. Each polysemous unigram w is disambiguated, using all words in the slot as the context C of w.
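The indexing steps above can be sketched in a single left-to-right pass. Everything concrete here is an assumption for illustration: the toy lexicon and its synset identifiers are invented, the `first_sense` function is only a placeholder for the actual WSD algorithm, and skipping words missing from the lexicon is a simplification.

```python
# Hypothetical toy lexicon: word (or bigram joined by "_") -> candidate synset ids.
LEXICON = {
    "artificial_intelligence": ["syn:ai"],
    "bank": ["syn:bank_river", "syn:bank_money"],
    "money": ["syn:money"],
}


def first_sense(word, candidates, context):
    """Placeholder WSD: ignores the context and picks the first listed sense."""
    return candidates[0]


def index_slot(tagged, wsd=first_sense):
    """Map a slot, given as a list of (word, pos) pairs, to a list of synset ids."""
    context = [word for word, _ in tagged]  # all words in the slot form the context C
    synsets = []
    i = 0
    while i < len(tagged):
        word, pos = tagged[i]
        # Bigram check for <noun,noun> or <adjective,noun> pairs.
        if i + 1 < len(tagged):
            nxt, nxt_pos = tagged[i + 1]
            bigram = f"{word}_{nxt}"
            if pos in ("noun", "adj") and nxt_pos == "noun" and bigram in LEXICON:
                synsets.append(wsd(bigram, LEXICON[bigram], context))
                i += 2
                continue
        candidates = LEXICON.get(word, [])
        if len(candidates) == 1:
            synsets.append(candidates[0])                    # monosemous word
        elif candidates:
            synsets.append(wsd(word, candidates, context))   # polysemous unigram
        i += 1
    return synsets


slot = [("artificial", "adj"), ("intelligence", "noun"), ("bank", "noun"), ("money", "noun")]
print(index_slot(slot))
```

Passing a different `wsd` callable, for example one that scores candidates by similarity to the context synsets, changes only the disambiguation step and leaves the indexing loop untouched.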

Experimental Evaluation
- Dataset: Extended EachMovie, with content from the Internet Movie Database
- 10-fold stratified cross-validation
- Measures: Precision, Recall, F-measure, NDPM
- Movie relevant if rating > 2
- Rocchio with cosine similarity (positive/negative profile): BOW-generated profiles vs. BOS (Bag of Synsets)-generated profiles
- Wilcoxon signed rank test
  - Low number of independent trials
  - Classification for each genre is a trial
  - Significance level p < 0.05
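A simplified sketch of the evaluation setup follows. It builds one centroid for positive and one for negative training examples and classifies a new item by comparing cosine similarities. This is an assumed simplification of a Rocchio-style classifier, without the weighting parameters a full Rocchio formulation would include. The feature names and helper functions are hypothetical.

```python
import math
from collections import Counter


def is_relevant(rating):
    """Relevance threshold from the slide: a movie is relevant if rating > 2."""
    return rating > 2


def centroid(docs):
    """Average the feature counts of a list of bag-of-features documents."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {feature: count / len(docs) for feature, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse feature-weight mappings."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, pos_profile, neg_profile):
    """Predict 'relevant' when the item is closer to the positive profile."""
    return cosine(doc, pos_profile) > cosine(doc, neg_profile)


def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F-measure from parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical training items, split by the rating threshold.
liked = [Counter(space=2, alien=1), Counter(space=1, robot=1)]
disliked = [Counter(romance=2, wedding=1)]
pos_profile, neg_profile = centroid(liked), centroid(disliked)

print(classify(Counter(alien=1, robot=1), pos_profile, neg_profile))
print(precision_recall_f1([True, True, False], [True, False, False]))
```

The same code works whether the features are stemmed words (BOW) or synset identifiers (BOS), which is what makes the two profile types directly comparable.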

The EachMovie Dataset
- Project conducted by Compaq Research Centre (1996-1997)
- Dataset of user-movie ratings
  - About 2.8 million ratings
  - Over 72,000 users, 1,628 items (movies) subdivided into 10 categories
  - Discrete rating between 0 and 5
- Movie content crawled from the Internet Movie Database (IMDb)
- 10 movie categories
- 933 randomly selected users: 100 users for each category, except Category 2 (Animation), for which only 33 users were selected
- Each user rated between 30 and 100 movies

Extended EachMovie (ratings + content)

Id Genre  Genre        #Rated Movies  %POS   %NEG
1         Action       4,474          72.05  27.95
2         Animation    1,103          56.67  43.33
3         Art_Foreign  4,246          76.21  23.79
4         Classic      5,026          91.73   8.27
5         Comedy       4,714          63.46  36.54
6         Drama        4,880          76.24  23.76
7         Family       3,808          63.71  36.29
8         Horror       3,631          59.89  40.11
9         Romance      3,707          72.97  27.03
10        Thriller     3,709          71.94  28.06
Total                  39,298         71.84  28.16

Legend: 60-65% positive, 70-75% positive, 75-100% positive

Bag of Synsets
[Table: example rows comparing the two representations. Bag of Words columns: Id Movie, Word Form, Occurrence (word forms shown: aaron, murder, roll, wheel, zoom). Bag of Synsets columns: Id Movie, Word Form, Id Synset, Occurrence; the word forms Roll, roll and wheel are mapped to the same synset id.]
Bag of Words: #Features = 172,296
Bag of Synsets: #Features = 107,990
38% reduction of features representing movies in the EachMovie dataset
- Mainly on slots containing proper names
- Recognition of bigrams
- Synonyms represented by the same synsets
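The effect of representing synonyms by the same synset can be sketched as below. The synset identifiers and the word-to-synset mapping are hypothetical stand-ins for the output of the WSD step, and keeping unmapped words under their own name is an assumption of this example.

```python
from collections import Counter

# Hypothetical word -> synset assignments, as a WSD step might produce them.
WORD_TO_SYNSET = {
    "roll": "synset:wheel-roll",
    "wheel": "synset:wheel-roll",
    "murder": "synset:murder",
}


def to_bag_of_synsets(bow, word_to_synset):
    """Re-key a bag of words by synset id, summing counts of words that share a synset."""
    bos = Counter()
    for word, count in bow.items():
        key = word_to_synset.get(word, word)  # assumption: unmapped words are kept as-is
        bos[key] += count
    return bos


bow = Counter({"roll": 3, "wheel": 2, "murder": 1})
bos = to_bag_of_synsets(bow, WORD_TO_SYNSET)
print(len(bow), "word features ->", len(bos), "synset features")
print(bos["synset:wheel-roll"])  # 5
```

Merging synonymous word forms into one feature is one of the sources of the feature reduction reported on the slide.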

Semantic Profiles Evaluation

      Precision     Recall        F             NDPM
Id    BOW   BOS     BOW   BOS     BOW   BOS     BOW   BOS
1     0.72  0.75    0.82  0.86    0.75  0.79    0.46  0.44
2     0.65  0.64    0.66  0.66    0.64  0.63    0.34  0.38
3     0.77  0.85    0.79  0.86    0.77  0.84    0.46  0.48
4     0.92  0.94    0.94  0.96    0.93  0.94    0.45  0.43
5     0.66  0.69    0.72  0.75    0.67  0.70    0.44  0.46
6     0.78  0.79    0.84  0.87    0.80  0.81    0.45  0.45
7     0.68  0.74    0.75  0.84    0.69  0.77    0.41  0.40
8     0.64  0.69    0.74  0.82    0.67  0.73    0.42  0.44
9     0.73  0.76    0.79  0.81    0.74  0.77    0.48  0.48
10    0.74  0.75    0.85  0.84    0.77  0.78    0.45  0.44
Mean  0.73  0.76    0.78  0.83    0.74  0.84    0.44  0.44

Results
- Improvement in precision (+3%) and recall (+5%)
- The BOS model outperforms the BOW model specifically on datasets:
  - 3 (+8% of precision, +7% of recall)
  - 7 (+6% of precision, +9% of recall)
  - 8 (+5% of precision, +8% of recall)
- No improvement on dataset 2 (Animation)
  - Low number of rated movies
  - WSD errors (difficulty in disambiguating stories)

Final Remarks
Conclusions
- Extending BOW to BOS improves classification accuracy when WSD is performed on short documents
- Improved results are independent of the distribution of positive and negative examples in the dataset
Future Work
- Integration of user profiles into UUCM [Mehta et al. 2005]
- Ontologies and user profiles
- Domain-specific ontologies