Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London


1 Interpreting Document Collections with Topic Models Nikolaos Aletras University College London

2 Acknowledgements Mark Stevenson, Sheffield; Tim Baldwin, Melbourne; Jey Han Lau, IBM Research

3 Talk Outline Introduction. Topic Modelling. Organising Document Collections with Topic Models. Topic Coherence. Topic Similarity. Topic Labelling.

4 Introduction Large number of document collections available electronically. Unstructured. No classification system. Difficult to find specific information.

8 Topic Models Unsupervised. Data-driven. Capturing the themes discussed within documents. Topics are probability distributions over words. Documents are represented as distributions over topics.

9 Topic Models A topic model, e.g. Latent Dirichlet Allocation (LDA; Blei et al., 2003), produces a Topic-Word Matrix and a Topic-Doc Matrix.

10 Topic-Word Matrix [Figure: matrix relating two topics to the words bank, river, water, money, interest, loan. One topic: water, river, bank. The other topic: bank, money, interest.] Topic representation: Top-N words with highest marginal probability.
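The top-N representation can be sketched in a few lines of Python. This is only an illustration: the function name `top_n_words` and the probabilities are made up, chosen so that the output matches the two example topics on the slide.

```python
def top_n_words(topic_word_probs, n=10):
    """Return the n words with the highest probability in one topic."""
    ranked = sorted(topic_word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Toy rows of a topic-word matrix (illustrative values, not from the talk).
topic_river = {"water": 0.40, "river": 0.30, "bank": 0.20,
               "money": 0.05, "interest": 0.03, "loan": 0.02}
topic_finance = {"bank": 0.35, "money": 0.30, "interest": 0.20,
                 "loan": 0.10, "river": 0.03, "water": 0.02}

print(top_n_words(topic_river, 3))
print(top_n_words(topic_finance, 3))
```

Note that the word "bank" appears in both lists, which is why a topic is read from its whole word list rather than from any single word.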

11 Topic-Document Matrix [Figure: matrix relating the two topics (water, river, bank; bank, money, interest) to six documents, Doc 1 to Doc 6.]

12 Visualising Document Collections using Topic Models Topic Model Visualization Engine (TMVE; Chaney and Blei, 2012)

14 Aim and Challenges Aim: make topics more comprehensible and useful. Challenges: Topic Coherence, Topic Similarity, Topic Labelling.

15 Topic Coherence Topics need to present a coherent thematic subject. Why compute topic coherence? Decide which topics should be shown in topic browsers. Pre-processing step in topic labelling algorithms.

16 Topic Coherence Topics: (1) team, game, james, season, player, nba, play, knicks, coach, league; (2) model, wheel, engine, system, drive, front, vehicle, rear, speed, power; (3) privacy, andrews, elli, alexander, burke, zoo, information, chung, user, regan; (4) taylor, morris, camp, rom, elliott, romania, orange, even, gray, brantley.

19 Topic Coherence Computing topic coherence: Average pairwise similarity between topic words (Newman et al., 2010). Example topic: team, game, james, season, player, nba, play, knicks, coach, league. One of the averaged pairs: Sim(team, league).
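A minimal sketch of the averaging step, assuming some word similarity function `sim` is already available. The function name `topic_coherence` and the toy similarity values are illustrative, not taken from the talk.

```python
from itertools import combinations


def topic_coherence(words, sim):
    """Average sim(a, b) over all unordered pairs of topic words."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


# Toy similarity table (made-up values for illustration only).
toy_scores = {
    frozenset(("team", "game")): 0.8,
    frozenset(("team", "league")): 0.6,
    frozenset(("game", "league")): 0.4,
}


def toy_sim(a, b):
    return toy_scores[frozenset((a, b))]


coherence = topic_coherence(["team", "game", "league"], toy_sim)
print(round(coherence, 3))
```

Any of the similarity measures discussed on the following slides (PMI, cosine in a semantic space) could be passed in as `sim`.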

20 Topic Coherence Computing Sim(wi,wj): Previous work: Pointwise Mutual Information (PMI; Newman et al., 2010). Log ratio of co-document frequency and document frequency in training corpus (Mimno et al., 2011).
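Both measures can be sketched from document frequencies. The code below is an illustration under stated assumptions: co-occurrence is counted at document level over a toy corpus (the slide does not say which counting window the cited work uses), and the add-one smoothing in the second function is an assumption rather than something stated on the slide. The names `doc_freq`, `pmi` and `log_cooccurrence_ratio` are hypothetical.

```python
import math


def doc_freq(docs, *words):
    """Number of documents that contain all the given words."""
    return sum(1 for d in docs if all(w in d for w in words))


def pmi(docs, wi, wj):
    """PMI = log( p(wi, wj) / (p(wi) * p(wj)) ), with document-level counts."""
    n = len(docs)
    p_ij = doc_freq(docs, wi, wj) / n
    if p_ij == 0:
        return float("-inf")  # the words never co-occur
    p_i = doc_freq(docs, wi) / n
    p_j = doc_freq(docs, wj) / n
    return math.log(p_ij / (p_i * p_j))


def log_cooccurrence_ratio(docs, wi, wj, smoothing=1):
    """Log ratio of co-document frequency to document frequency of wj.

    Assumes wj occurs in at least one document; the smoothing constant
    is an assumption of this sketch.
    """
    return math.log((doc_freq(docs, wi, wj) + smoothing) / doc_freq(docs, wj))


# Toy corpus: each document is a set of words (illustrative only).
docs = [
    {"team", "game", "nba"},
    {"team", "game"},
    {"bank", "money"},
    {"team", "coach"},
]

print(pmi(docs, "team", "game"))
print(pmi(docs, "team", "bank"))
print(log_cooccurrence_ratio(docs, "team", "game"))
```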

21 Topic Coherence Compute topic coherence using distributional semantics (Aletras and Stevenson, 2013; IWCS). Topic words: Vectors in a semantic space. Topic word similarity: Cosine of the angle between the vectors.

22 Topic Coherence Example topic: team, game, james, season, player, nba, play, knicks, coach, league. [Figure: word vectors for team, game and league in a semantic space; label: PMI.] Sim(team, game) = Cos(team, game).
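The cosine step can be sketched as follows, assuming each topic word is represented as a sparse vector of weighted context features. The vectors and feature names below are toy values; how the real semantic space is built is not shown here.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy context vectors (illustrative weights, not real corpus statistics).
vec_team = {"play": 2.0, "score": 1.0, "coach": 1.0}
vec_game = {"play": 1.5, "score": 2.0}
vec_bank = {"money": 3.0, "loan": 1.0}

print(cosine(vec_team, vec_game))
print(cosine(vec_team, vec_bank))
```

Words that share no context features, such as `vec_team` and `vec_bank` here, get a similarity of zero.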

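23 Topic Coherence Evaluation and Results 300 topics with human judgements of topic coherence. Corpora: 20NG, NYT, Genomics. [Table: Spearman correlation with human judgements on NYT, 20NG and Genomics for Newman et al. (2010), Mimno et al. (2011) and distributional semantics (Dis. Sem); the values are not preserved in this transcript.]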
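The evaluation compares automatic scores with human judgements through Spearman's rank correlation. Below is a minimal sketch of that statistic using the textbook formula, which assumes there are no tied values and at least two items; the function name and the example numbers are illustrative only.

```python
def spearman(xs, ys):
    """Spearman rank correlation for sequences without ties.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    difference between the ranks of paired items.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical metric scores and human ratings for four topics.
metric_scores = [0.61, 0.12, 0.45, 0.30]
human_ratings = [2.9, 1.1, 2.0, 2.2]
print(spearman(metric_scores, human_ratings))
```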

24 Topic Similarity Group similar topics together. Why? Decide which topics should be shown together in topic browsers.

25 Topic Similarity Previous work: Compare per-topic word probability distributions. KL divergence (Li and McCallum, 2006; Wang et al., 2009; Newman et al., 2009; Kim and Oh, 2011). Jensen-Shannon Divergence (JSD; Kim and Oh, 2011). Cosine (He et al., 2009; Ramage et al., 2009; Kim and Oh, 2011). Log odds ratio (Chaney and Blei, 2012).
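Two of these distribution-based measures can be sketched directly from their standard definitions. The code assumes both topics are given as probability lists over the same vocabulary, in the same order; the example distributions are toy values.

```python
import math


def kl_divergence(p, q):
    """KL(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence based on the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy per-topic word distributions over a 4-word vocabulary.
topic_a = [0.5, 0.3, 0.1, 0.1]
topic_b = [0.1, 0.1, 0.3, 0.5]

print(kl_divergence(topic_a, topic_b))
print(jensen_shannon_divergence(topic_a, topic_b))
```

KL divergence is asymmetric, whereas JSD is symmetric and bounded above by log 2 (with natural logarithms), which makes it easier to use as a distance between topics.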

26 Topic Similarity Computing topic similarity: Average pairwise similarity of the words of two topics (Aletras and Stevenson, 2014; EACL). First topic: team, game, james, season, player, nba, play, knicks, coach, league. Second topic: world, cup, team, soccer, africa, player, south, game, match, goal. One of the averaged pairs: Sim(team, goal).
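A minimal sketch of this cross-topic average, assuming a word similarity function `sim` is supplied. The exact-match similarity used in the example is a deliberately crude stand-in chosen to keep the sketch self-contained; it is not one of the measures evaluated in the talk.

```python
from itertools import product


def topic_similarity(words_a, words_b, sim):
    """Average sim(a, b) over every pair with a from topic A and b from topic B."""
    pairs = list(product(words_a, words_b))
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def exact_match(a, b):
    """Stand-in similarity: 1.0 for identical words, else 0.0."""
    return 1.0 if a == b else 0.0


nba_topic = ["team", "game", "james", "season", "player",
             "nba", "play", "knicks", "coach", "league"]
soccer_topic = ["world", "cup", "team", "soccer", "africa",
                "player", "south", "game", "match", "goal"]

print(topic_similarity(nba_topic, soccer_topic, exact_match))
```

With a graded word similarity in place of `exact_match`, pairs such as (team, goal) would also contribute to the score.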

27 Topic Similarity Computing Sim(wi, wj): Word association: PMI. Distributional semantics. Knowledge-based methods: Explicit Semantic Analysis (ESA; Gabrilovich and Markovitch, 2007).

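28 Topic Similarity Evaluation and Results 800 pairs of 50, 100 and 200 LDA topics with human judgements of topic similarity. NYT articles. [Table: Spearman correlation for Word Overlap, JSD, Cosine, Chaney and Blei (2012), PMI, Dis. Sem, ESA and SVR, alongside inter-annotator agreement (IAA); the values are not preserved in this transcript.]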

29 Topic Labelling Associate textual or image labels with topics. Why? Assist in the interpretation of the lists of words representing the topics.

30 Topic Labelling team, game, james, season, player, nba, play, knicks, coach, league

31 Topic Labelling NBA team, game, james, season, player, nba, play, knicks, coach, league

32 Topic Labelling NBA American Basketball team, game, james, season, player, nba, play, knicks, coach, league

33 Topic Labelling NBA American Basketball Basketball team, game, james, season, player, nba, play, knicks, coach, league

36 Topic Labelling using Text Given a topic and a set of candidate labels, select the most appropriate label. Previous work: Ranking candidate labels by combining word association measures, lexical features and an information retrieval technique into a supervised model (Lau et al., 2011).

37 Topic Labelling using Text Ranking labels using unsupervised graph-based methods (Aletras and Stevenson, 2014; ACL). Steps: (1) Retrieving text information; (2) Creating text graph; (3) Identifying important terms; (4) Scoring candidate labels.

38 Topic Labelling using Text Retrieving text information. [Figure: pipeline from TOPIC to SEARCH ENGINE to RESULTS METADATA.] Topic: team, game, james, season, player, nba, play, knicks, coach, league. Metadata Title Fields: 1. LeBron James Stats; 2. NBA basketball statistics; 3. Cleveland Cavaliers; 4. Fiba Basketball World Cup.

39 Topic Labelling using Text Creating text graph. Edge weight: 1 for the Unweighted Graph (PR); Normalised PMI for the Weighted Graph (PR-NPMI). [Figure: graph over word nodes w1 to w8.]

40 Topic Labelling using Text Creating text graph. Edge weight: 1 for the Unweighted Graph (PR); Normalised PMI for the Weighted Graph (PR-NPMI). Identifying important terms: PageRank. [Figure: graph over word nodes w1 to w8, before and after PageRank.]

41 Topic Labelling using Text Creating text graph. Edge weight: 1 for the Unweighted Graph (PR); Normalised PMI for the Weighted Graph (PR-NPMI). Identifying important terms: PageRank. Scoring Candidate Labels. Candidate Label: L = {w1, w2, ..., wm}. Scoring Function: [equation not preserved in this transcript]. [Figure: graph over word nodes w1 to w8, before and after PageRank.]
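The "identifying important terms" step can be sketched with a small power-iteration PageRank over an undirected weighted word graph. This is an illustration, not the authors' implementation: the damping factor of 0.85, the fixed number of iterations and the toy graph are all assumptions, and the sketch requires every node to have at least one edge. Setting all weights to 1.0 corresponds to the unweighted variant; the weighted variant would use Normalised PMI values instead. The label scoring function is not reproduced because it is missing from the transcript.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    graph maps each node to a dict {neighbour: edge_weight}. The graph is
    expected to be symmetric (undirected), with no isolated nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    total_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour passes on rank in proportion to the edge weight.
            incoming = sum(
                rank[nb] * graph[nb][node] / total_weight[nb]
                for nb in graph[node]
            )
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy graph: w1 is linked to every other word (weights are illustrative).
word_graph = {
    "w1": {"w2": 1.0, "w3": 1.0, "w4": 1.0},
    "w2": {"w1": 1.0},
    "w3": {"w1": 1.0},
    "w4": {"w1": 1.0},
}

scores = pagerank(word_graph)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph the hub word `w1` receives the highest score, which is the sense in which PageRank picks out important terms.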

42 Topic Labelling using Text Evaluation and Results 228 topics and 6K labels with human judgements of relevance (Lau et al., 2011). Corpora: Blogs, Books, News, Pubmed. Top-1 Average Rating (0-3): PR: Blogs 2.05*, Books 1.98*, News 2.04*, Pubmed 1.88**. PR-NPMI: Blogs 2.08*, Books 2.01*, News 2.05*, Pubmed 1.90**. The rows for Lau et al. (2011)-U, Lau et al. (2011)-S and the Upper Bound have no values in this transcript.

43 Topic Labelling using Images Identifying and ranking image labels using unsupervised graph-based methods (Aletras and Stevenson, 2013; NAACL-HLT). Steps: (1) Retrieving images; (2) Extracting textual and visual information; (3) Creating image graph; (4) Identifying important images.

44 Topic Labelling using Images Retrieving text and visual information. [Figure: pipeline from TOPIC to SEARCH ENGINE to Top-20 Images.] Topic: team, game, james, season, player, nba, play, knicks, coach, league. Text Features: Title and Link Metadata Fields. Visual Features: Bags of Visual Words (BOVW).

45 Topic Labelling using Images Creating candidate images graph. Edge weight: Cosine of BOVW.

46 Topic Labelling using Images Creating candidate images graph. Edge weight: Cosine of BOVW. Ranking images: PageRank.

47 Topic Labelling using Images Evaluation and Results 300 topics and 6K image labels with human judgements of relevance. News, Wikipedia articles. Top-1 Average Rating (0-3): Random 1.79; Word Overlap 1.85; Google Image Search 1.89; PR-VIS 1.96; PR-VIS+Per(PMI) 2.00*; Human Performance 2.24.

48 Comparing Topic Representations Which topic representations are suitable within a document browser interface? What is the impact of different topic representations on human search effectiveness for a given query?

49 Comparing Topic Representations Retrieval Task (Aletras et al., 2014; JCDL, Aletras et al., 2015; JASIST). Aim: Identify as many documents relevant to a set of queries as possible within 3 minutes in a document collection organised using topic models.

50 Retrieval Task Document Collection: Reuters Corpus (Rose et al., 2002). 20 subject categories used as queries. 100K documents.

51 Reuters Category (Query): No. Docs
Travel & Tourism: 314
Domestic Politics (USA): 27,236
War - Civil War: 16,615
Biographies, Personalities, People: 2,601
Defence: 4,224
Crime, Law Enforcement: 10,673
Religion: 1,477
Disasters & Accidents: 3,161
International Relations: 19,273
Science & Technology: 1,042
Employment/Labour: 2,796
Government Finance: 17,904
Weather: 1,190
Elections: 5,866
Environment & Natural World: 1,933
Arts, Culture, Entertainment: 1,450
Health: 1,567
European Commission Institutions: 1,046
Sports: 18,913
Welfare, Social Services: 775

52 Retrieval Task Topic Modelling: LDA. Learned 100 topics. 84 topics after filtering out incoherent topics (Aletras and Stevenson, 2013b).

53 Retrieval Task Topic Browsing Systems: Topic Model Visualisation Engine (TMVE; Chaney and Blei, 2012). 3 topic representations: Keywords (Top-10). Textual Phrases: Wikipedia article titles. Images: Wikimedia images.

54 Retrieval Task Task: Two-step procedure. Given a query and a set of topics: 1. Identify all potentially relevant topics, 2. Identify relevant documents from a list of documents associated with the selected topics.

55 Retrieval Task Step 1: Identify relevant topics (Keywords).

56 Step 1: Identify relevant topics (Textual Phrases).

57 Step 1: Identify relevant topics (Images).

58 Retrieval Task Step 2: Identify relevant documents.

59 Retrieval Task Subjects and Procedure: 15 members of research staff and graduate students at the Universities of Sheffield, Melbourne and King's College. Each participant had to sign up to our on-line system. Participants had access to a personalised page where they could read instructions, see how many queries they had completed so far and perform a new query. Participants performed each of the 20 queries. Each query was performed by at least 5 different participants.

60 Results Number of Retrieved Documents (total per topic representation): Keywords 1,086; Text 1,264; Image 1,115.

61 Results Precision (average per topic representation): Keywords 0.59; Text 0.53; Image 0.56.
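Precision here is the usual retrieval measure: the fraction of retrieved documents that are relevant to the query. A minimal sketch follows, with made-up document identifiers; how the study averaged precision over queries and participants is not stated on the slide and is not modelled.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical result of one participant on one query.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d9"}
print(precision(retrieved_docs, relevant_docs))
```

Read together with the previous slide, this shows the trade-off in the results: textual labels led to the most retrieved documents, while keywords gave the highest average precision.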

62 Conclusions Post-processing the output of topic models can help build better exploratory search interfaces: Filtering out junk topics. Grouping together similar topics. Assisting topic interpretation by providing textual or image labels. If you are interested in the topic: Topic Models: Post-Processing and Applications Workshop in CIKM

63 Publications
N. Aletras, T. Baldwin, J. H. Lau and M. Stevenson (2015). Evaluating Topic Representations for Exploring Document Collections. JASIST.
N. Aletras and M. Stevenson (2014). Labelling Topics using Unsupervised Graph-based Methods. In ACL.
N. Aletras and M. Stevenson (2014). Measuring the Similarity between Automatically Generated Topics. In EACL.
N. Aletras, T. Baldwin, J. H. Lau and M. Stevenson (2014). Representing Topics Labels for Exploring Digital Libraries. In JCDL.
N. Aletras and M. Stevenson (2013). Representing Topics Using Images. In NAACL-HLT.
N. Aletras and M. Stevenson (2013). Evaluating Topic Coherence Using Distributional Semantics. In IWCS.

Evaluating Topic Representations for Exploring Document Collections

Evaluating Topic Representations for Exploring Document Collections Evaluating Topic Representations for Exploring Document Collections Nikolaos Aletras (corresponding author) Computer Science University College London nikos.aletras@gmail.com Timothy Baldwin Computing

More information

Computing Similarity between Cultural Heritage Items using Multimodal Features

Computing Similarity between Cultural Heritage Items using Multimodal Features Computing Similarity between Cultural Heritage Items using Multimodal Features Nikolaos Aletras and Mark Stevenson Department of Computer Science, University of Sheffield Could the combination of textual

More information

arxiv: v1 [cs.cl] 29 Mar 2019

arxiv: v1 [cs.cl] 29 Mar 2019 Re-Ranking Words to Improve Interpretability of Automatically Generated Topics Areej Alokaili 1,2, Nikolaos Aletras 1 and Mark Stevenson 1 1 University of Sheffield, United Kingdom 2 King Saud University,

More information

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Jianyong Wang Department of Computer Science and Technology Tsinghua University Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity

More information

Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot

Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot Ruslan Salakhutdinov Word Sense Disambiguation Word sense disambiguation (WSD) is defined as the problem of computationally

More information

Representing Topics Using Images

Representing Topics Using Images Representing Topics Using Images Nikolaos Aletras and Mark Stevenson Department of Computer Science University of Sheffield Regent Court, 211 Portobello Sheffield, S1 4DP, UK {n.aletras, m.stevenson}@dcs.shef.ac.uk

More information

Clustering using Topic Models

Clustering using Topic Models Clustering using Topic Models Compiled by Sujatha Das, Cornelia Caragea Credits for slides: Blei, Allan, Arms, Manning, Rai, Lund, Noble, Page. Clustering Partition unlabeled examples into disjoint subsets

More information

Metadata Topic Harmonization and Semantic Search for Linked-Data-Driven Geoportals -- A Case Study Using ArcGIS Online

Metadata Topic Harmonization and Semantic Search for Linked-Data-Driven Geoportals -- A Case Study Using ArcGIS Online Metadata Topic Harmonization and Semantic Search for Linked-Data-Driven Geoportals -- A Case Study Using ArcGIS Online Yingjie Hu 1, Krzysztof Janowicz 1, Sathya Prasad 2, and Song Gao 1 1 STKO Lab, Department

More information

What is this Song About?: Identification of Keywords in Bollywood Lyrics

What is this Song About?: Identification of Keywords in Bollywood Lyrics What is this Song About?: Identification of Keywords in Bollywood Lyrics by Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi in 19th International Conference on Computational Linguistics

More information

Ranking models in Information Retrieval: A Survey

Ranking models in Information Retrieval: A Survey Ranking models in Information Retrieval: A Survey R.Suganya Devi Research Scholar Department of Computer Science and Engineering College of Engineering, Guindy, Chennai, Tamilnadu, India Dr D Manjula Professor

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

Visual Query Suggestion

Visual Query Suggestion Visual Query Suggestion Zheng-Jun Zha, Linjun Yang, Tao Mei, Meng Wang, Zengfu Wang University of Science and Technology of China Textual Visual Query Suggestion Microsoft Research Asia Motivation Framework

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Outline. Morning program Preliminaries Semantic matching Learning to rank Entities

Outline. Morning program Preliminaries Semantic matching Learning to rank Entities 112 Outline Morning program Preliminaries Semantic matching Learning to rank Afternoon program Modeling user behavior Generating responses Recommender systems Industry insights Q&A 113 are polysemic Finding

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,

More information

Ruslan Salakhutdinov and Geoffrey Hinton. University of Toronto, Machine Learning Group IRGM Workshop July 2007

Ruslan Salakhutdinov and Geoffrey Hinton. University of Toronto, Machine Learning Group IRGM Workshop July 2007 SEMANIC HASHING Ruslan Salakhutdinov and Geoffrey Hinton University of oronto, Machine Learning Group IRGM orkshop July 2007 Existing Methods One of the most popular and widely used in practice algorithms

More information

Automatic Summarization

Automatic Summarization Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization

More information

Evaluating an Associative Browsing Model for Personal Information

Evaluating an Associative Browsing Model for Personal Information Evaluating an Associative Browsing Model for Personal Information Jinyoung Kim, W. Bruce Croft, David A. Smith and Anton Bakalov Department of Computer Science University of Massachusetts Amherst {jykim,croft,dasmith,abakalov}@cs.umass.edu

More information

Evaluation of the Document Categorization in Fixed-point Observatory

Evaluation of the Document Categorization in Fixed-point Observatory Evaluation of the Document Categorization in Fixed-point Observatory Yoshihiro Ueda Mamiko Oka Katsunori Houchi Service Technology Development Department Fuji Xerox Co., Ltd. 3-1 Minatomirai 3-chome, Nishi-ku,

More information

Query Expansion using Wikipedia and DBpedia

Query Expansion using Wikipedia and DBpedia Query Expansion using Wikipedia and DBpedia Nitish Aggarwal and Paul Buitelaar Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway firstname.lastname@deri.org

More information

Entity and Knowledge Base-oriented Information Retrieval

Entity and Knowledge Base-oriented Information Retrieval Entity and Knowledge Base-oriented Information Retrieval Presenter: Liuqing Li liuqing@vt.edu Digital Library Research Laboratory Virginia Polytechnic Institute and State University Blacksburg, VA 24061

More information

Ontological Topic Modeling to Extract Twitter users' Topics of Interest

Ontological Topic Modeling to Extract Twitter users' Topics of Interest Ontological Topic Modeling to Extract Twitter users' Topics of Interest Ounas Asfari, Lilia Hannachi, Fadila Bentayeb and Omar Boussaid Abstract--Twitter, as the most notable services of micro-blogs, has

More information

Exploiting Conversation Structure in Unsupervised Topic Segmentation for s

Exploiting Conversation Structure in Unsupervised Topic Segmentation for  s Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng University of British Columbia Vancouver, Canada EMNLP 2010 1

More information

NATURAL LANGUAGE PROCESSING

NATURAL LANGUAGE PROCESSING NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity

More information

International Journal of Video& Image Processing and Network Security IJVIPNS-IJENS Vol:10 No:02 7

International Journal of Video& Image Processing and Network Security IJVIPNS-IJENS Vol:10 No:02 7 International Journal of Video& Image Processing and Network Security IJVIPNS-IJENS Vol:10 No:02 7 A Hybrid Method for Extracting Key Terms of Text Documents Ahmad Ali Al-Zubi Computer Science Department

More information

Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation

Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation Hongzhao Huang 1 and Larry Heck 2 Computer Science Department,

More information

A Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo

A Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo A Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo Olatz Arbelaitz, Ibai Gurrutxaga, Aizea Lojo, Javier Muguerza, Jesús M. Pérez and Iñigo

More information

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics Media Intelligence Business intelligence (BI) Uses data mining techniques and tools for the transformation of raw data into meaningful

More information

MSRA Columbus at GeoCLEF2007

MSRA Columbus at GeoCLEF2007 MSRA Columbus at GeoCLEF2007 Zhisheng Li 1, Chong Wang 2, Xing Xie 2, Wei-Ying Ma 2 1 Department of Computer Science, University of Sci. & Tech. of China, Hefei, Anhui, 230026, P.R. China zsli@mail.ustc.edu.cn

More information

IBM Research - China

IBM Research - China TIARA: A Visual Exploratory Text Analytic System Furu Wei +, Shixia Liu +, Yangqiu Song +, Shimei Pan #, Michelle X. Zhou*, Weihong Qian +, Lei Shi +, Li Tan + and Qiang Zhang + + IBM Research China, Beijing,

More information

On-Lib: An Application and Analysis of Fuzzy-Fast Query Searching and Clustering on Library Database

On-Lib: An Application and Analysis of Fuzzy-Fast Query Searching and Clustering on Library Database On-Lib: An Application and Analysis of Fuzzy-Fast Query Searching and Clustering on Library Database Ashritha K.P, Sudheer Shetty 4 th Sem M.Tech, Dept. of CS&E, Sahyadri College of Engineering and Management,

More information

User Intent Discovery using Analysis of Browsing History

User Intent Discovery using Analysis of Browsing History User Intent Discovery using Analysis of Browsing History Wael K. Abdallah Information Systems Dept Computers & Information Faculty Mansoura University Mansoura, Egypt Dr. / Aziza S. Asem Information Systems

More information

National Certificate in Civil Defence (Response) (Level 3)

National Certificate in Civil Defence (Response) (Level 3) NZQF NQ Ref 0327 Version 5 Page 1 of 9 National Certificate in Civil Defence (Response) (Level 3) Level 3 Credits 52 This qualification has been reviewed. The last date to meet the requirements is 31 December

More information

Improving Difficult Queries by Leveraging Clusters in Term Graph

Improving Difficult Queries by Leveraging Clusters in Term Graph Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Yunqing Xia 1 and Sen Na 2 1 Tsinghua University 2 Canon Information Technology (Beijing) Co. Ltd. Before we start Who are we? THUIS is

More information

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12 Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency

More information

How Many Topics? Stability Analysis for Topic Models. Derek Greene, Derek O Callaghan, Pádraig Cunningham

How Many Topics? Stability Analysis for Topic Models. Derek Greene, Derek O Callaghan, Pádraig Cunningham How Many Topics? Stability Analysis for Topic Models Derek Greene, Derek O Callaghan, Pádraig Cunningham School of Computer Science & Informatics, University College Dublin {derek.greene,derek.ocallaghan,padraig.cunningham}@ucd.ie

More information

Researching Individuals Revised on 9/2010

Researching Individuals Revised on 9/2010 1904 Franklin St. Suite 900, Oakland, CA 94612. tel:510.835.4692 fax:510.835.3017 Researching Individuals Revised on 9/2010 This is a list of sources for you to use when researching individuals (like police

More information

August 2012 Daejeon, South Korea

August 2012 Daejeon, South Korea Building a Web of Linked Entities (Part I: Overview) Pablo N. Mendes Free University of Berlin August 2012 Daejeon, South Korea Outline Part I A Web of Linked Entities Challenges Progress towards solutions

More information

Erin Crane, E-Resources and Instruction Librarian Germanna Community College

Erin Crane, E-Resources and Instruction Librarian Germanna Community College Erin Crane, E-Resources and Instruction Librarian Germanna Community College Context The Study Results Recommendations What s Next Main campus in Fredericksburg, VA 4119 FTE 3 libraries http://www.vccs.edu/about/where-we-are/college-locator/

More information

Qualitative Data Analysis Software. A workshop for staff & students School of Psychology Makerere University

Qualitative Data Analysis Software. A workshop for staff & students School of Psychology Makerere University Qualitative Data Analysis Software A workshop for staff & students School of Psychology Makerere University (PhD) January 27, 2016 Outline for the workshop CAQDAS NVivo Overview Practice 2 CAQDAS Before

More information

Domain-specific user preference prediction based on multiple user activities

Domain-specific user preference prediction based on multiple user activities 7 December 2016 Domain-specific user preference prediction based on multiple user activities Author: YUNFEI LONG, Qin Lu,Yue Xiao, MingLei Li, Chu-Ren Huang. www.comp.polyu.edu.hk/ Dept. of Computing,

More information

Extractive Text Summarization Techniques

Extractive Text Summarization Techniques Extractive Text Summarization Techniques Tobias Elßner Hauptseminar NLP Tools 06.02.2018 Tobias Elßner Extractive Text Summarization Overview Rough classification (Gupta and Lehal (2010)): Supervised vs.

More information

Text Document Clustering Using DPM with Concept and Feature Analysis

Text Document Clustering Using DPM with Concept and Feature Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

Wikipedia as a Big Data source for Tourism

Wikipedia as a Big Data source for Tourism Wikipedia as a Big Data source for Tourism Global Forum on Tourism Statistics Venice, November 23 rd 2016 Serena Signorelli 1,2, Fernando Reis 2, Silvia Biffignandi 1 1 University of Bergamo 2 EUROSTAT

More information

The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage

The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage The Sheffield and Basque Country Universities Entry to CHiC: using Random Walks and Similarity to access Cultural Heritage Eneko Agirre 1, Paul Clough 2, Samuel Fernando 2, Mark Hall 2, Arantxa Otegi 1,

More information

146 Information Technology

146 Information Technology Date : 12/08/2007 Gallica 2.0 : a second life for the Bibliothèque nationale de France digital library Catherine Lupovici, Noémie Lesquins Meeting: Simultaneous Interpretation: No WORLD LIBRARY AND INFORMATION

More information
