Interpreting Document Collections with Topic Models
Nikolaos Aletras, University College London
2 Acknowledgements: Mark Stevenson (Sheffield), Tim Baldwin (Melbourne), Jey Han Lau (IBM Research)
3 Talk Outline: Introduction; Topic Modelling; Organising Document Collections with Topic Models; Topic Coherence; Topic Similarity; Topic Labelling
4-7 Introduction: Large collections of documents are available electronically, but they are unstructured, have no classification system, and make it difficult to find specific information.
8 Topic Models: Unsupervised and data-driven, topic models capture the themes discussed within documents. Topics are probability distributions over words, and documents are represented as distributions over topics.
9 Topic Models: A topic model, e.g. Latent Dirichlet Allocation (LDA; Blei et al., 2003), produces a topic-word matrix and a topic-document matrix.
10 Topic-Word Matrix: each topic assigns a probability to every word (bank, river, water, money, interest, loan). Example topics: {water, river, bank} and {bank, money, interest}. Topic representation: the top-N words with the highest marginal probability.
11 Topic-Document Matrix: each document (Doc 1 to Doc 6) is associated with the topics {water, river, bank} and {bank, money, interest}.
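The two matrices can be made concrete with a toy example. The sketch below is a minimal illustration, assuming small hand-written probability tables (the numbers and the helper names `top_n_words` and `dominant_topic` are hypothetical, not output of a trained LDA model): a topic is displayed by its top-N words, and a document is read as a mixture over topics.

```python
# Toy topic-word matrix: each row holds p(word | topic) and sums to 1.
# The probabilities are invented for illustration, not learned by LDA.
VOCAB = ["bank", "river", "water", "money", "interest", "loan"]
TOPIC_WORD = [
    [0.25, 0.30, 0.35, 0.04, 0.03, 0.03],  # topic 0: water, river, bank
    [0.35, 0.02, 0.03, 0.30, 0.20, 0.10],  # topic 1: bank, money, interest
]
# Toy topic-document matrix: each row holds p(topic | document).
DOC_TOPIC = {
    "Doc 1": [0.9, 0.1],
    "Doc 2": [0.2, 0.8],
    "Doc 3": [0.5, 0.5],
}


def top_n_words(topic_row, vocab, n=3):
    """The n words with the highest probability in a topic, best first."""
    ranked = sorted(range(len(vocab)), key=lambda i: topic_row[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]


def dominant_topic(doc_row):
    """Index of the topic with the largest share in a document (first on ties)."""
    return max(range(len(doc_row)), key=lambda k: doc_row[k])


for k, row in enumerate(TOPIC_WORD):
    print("topic", k, top_n_words(row, VOCAB))
for doc, row in DOC_TOPIC.items():
    print(doc, "-> topic", dominant_topic(row))
```

Note that "bank" ranks highly in both toy topics, which is how a topic model can separate the river and finance senses of the same word.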
12 Visualising Document Collections using Topic Models: Topic Model Visualization Engine (TMVE; Chaney and Blei, 2012)
14 Aim and Challenges: Make topics more comprehensible and useful through topic coherence, topic similarity and topic labelling.
15 Topic Coherence: Topics need to present a coherent thematic subject. Why compute topic coherence? To decide which topics should be shown in topic browsers, and as a pre-processing step in topic labelling algorithms.
16 Topic Coherence: Example topics: (1) team, game, james, season, player, nba, play, knicks, coach, league; (2) model, wheel, engine, system, drive, front, vehicle, rear, speed, power; (3) privacy, andrews, elli, alexander, burke, zoo, information, chung, user, regan; (4) taylor, morris, camp, rom, elliott, romania, orange, even, gray, brantley. Topics (1) and (2) present a clear theme (basketball and vehicles); topics (3) and (4) do not.
19 Topic Coherence: Topic coherence is computed as the average pairwise similarity between topic words (Newman et al., 2010), e.g. Sim(team, league) for the topic team, game, james, season, player, nba, play, knicks, coach, league.
20 Topic Coherence: Computing Sim(wi, wj) in previous work: Pointwise Mutual Information (PMI; Newman et al., 2010), and the log ratio of co-document frequency and document frequency in the training corpus (Mimno et al., 2011).
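Both word-association scores can be sketched in a few lines. This is a minimal illustration, assuming a reference corpus given as tokenised documents and estimating probabilities from document co-occurrence in a toy corpus (a simplifying assumption; the helper names and corpus are hypothetical). Following Mimno et al. (2011), the second score sums log((D(wm, wl) + 1) / D(wl)) over word pairs, with topic words ordered by decreasing topic probability.

```python
import math
from itertools import combinations


def doc_counts(docs):
    """Document frequency D(w) and co-document frequency D(wi, wj)."""
    df, co_df = {}, {}
    for doc in docs:
        words = sorted(set(doc))
        for w in words:
            df[w] = df.get(w, 0) + 1
        for a, b in combinations(words, 2):  # keys are alphabetically ordered
            co_df[(a, b)] = co_df.get((a, b), 0) + 1
    return df, co_df


def co_count(co_df, a, b):
    """Co-document frequency, independent of argument order."""
    return co_df.get((min(a, b), max(a, b)), 0)


def pmi_coherence(topic_words, docs, eps=1e-12):
    """Mean pairwise PMI of the topic words (every word must occur in docs)."""
    n = len(docs)
    df, co_df = doc_counts(docs)
    pairs = list(combinations(topic_words, 2))
    total = 0.0
    for a, b in pairs:
        p_a, p_b = df[a] / n, df[b] / n  # KeyError if a word is unseen
        p_ab = co_count(co_df, a, b) / n
        total += math.log((p_ab + eps) / (p_a * p_b))  # eps avoids log(0)
    return total / len(pairs)


def umass_coherence(topic_words, docs):
    """Mimno et al. (2011): sum of log((D(wm, wl) + 1) / D(wl)) for l < m.

    topic_words must be ordered by decreasing probability in the topic.
    """
    df, co_df = doc_counts(docs)
    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            wm, wl = topic_words[m], topic_words[l]
            score += math.log((co_count(co_df, wm, wl) + 1) / df[wl])
    return score


docs = [
    ["team", "game", "league"],
    ["team", "game", "season"],
    ["team", "league"],
    ["bank", "money"],
]
print(pmi_coherence(["team", "game", "league"], docs))  # higher
print(pmi_coherence(["team", "bank", "money"], docs))   # much lower
```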
21 Topic Coherence: Topic coherence is computed using distributional semantics (Aletras and Stevenson, 2013; IWCS). Topic words are represented as vectors in a semantic space, and the similarity between two topic words is the cosine of the angle between their vectors.
22 Topic Coherence: For the topic team, game, james, season, player, nba, play, knicks, coach, league, each word is a vector with PMI-weighted features (e.g. team, game, league), and Sim(team, game) = Cos(team, game).
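The distributional variant replaces the PMI between a word pair with the cosine between the words' vectors. A minimal sketch, assuming hypothetical sparse vectors whose dimensions are context words and whose values stand in for PMI weights (the numbers are invented, not taken from the IWCS paper):

```python
import math
from itertools import combinations

# Hypothetical sparse word vectors: keys are context words, values stand in
# for PMI weights. The numbers are invented for illustration.
VECTORS = {
    "team": {"team": 1.0, "game": 0.8, "league": 0.7},
    "game": {"team": 0.8, "game": 1.0, "league": 0.5},
    "league": {"team": 0.7, "game": 0.5, "league": 1.0},
    "zoo": {"animal": 1.2, "park": 0.6},
}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def distributional_coherence(topic_words, vectors):
    """Average pairwise cosine similarity between the topic words' vectors."""
    pairs = list(combinations(topic_words, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


print(distributional_coherence(["team", "game", "league"], VECTORS))
print(distributional_coherence(["team", "zoo", "game"], VECTORS))
```

Unlike raw PMI, two words can score as similar here even if they rarely co-occur, as long as they share contexts.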
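23 Topic Coherence: Evaluation and Results: 300 topics from 20NG, NYT and Genomics, with human judgements of topic coherence. Methods are compared by Spearman correlation with the human judgements on NYT, 20NG and Genomics: Newman et al. (2010), Mimno et al. (2011) and distributional semantics (Dis. Sem).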
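Automatic coherence scores are evaluated by their Spearman rank correlation with the human judgements. A stdlib sketch (in practice `scipy.stats.spearmanr` computes the same statistic), assuming one metric score and one mean human rating per topic; tied values receive the average of their ranks, and the example scores are hypothetical:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-topic metric scores and mean human coherence ratings.
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_ratings = [1.2, 2.0, 2.1, 2.9]
print(spearman(metric_scores, human_ratings))
```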
24 Topic Similarity: Group similar topics together. Why? To decide which topics should be shown together in topic browsers.
25 Topic Similarity: Previous work compares per-topic word probability distributions using KL divergence (Li and McCallum, 2006; Wang et al., 2009; Newman et al., 2009; Kim and Oh, 2011), Jensen-Shannon divergence (JSD; Kim and Oh, 2011), cosine (He et al., 2009; Ramage et al., 2009; Kim and Oh, 2011) or log odds ratio (Chaney and Blei, 2012).
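The distribution-based measures compare two topics' word distributions over a shared vocabulary. A minimal sketch of KL divergence, Jensen-Shannon divergence and cosine, assuming each topic is a list of probabilities in the same vocabulary order (the log odds ratio is omitted because its exact form is not given here; the toy distributions are hypothetical). Divergences are distances, so lower values mean more similar topics:

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Symmetric, bounded by log(2); 0 for identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine(p, q):
    """Cosine of the angle between two probability vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm if norm else 0.0


# Vocabulary order: bank, river, water, money, interest, loan (toy values).
water_topic = [0.25, 0.30, 0.35, 0.04, 0.03, 0.03]
finance_topic = [0.35, 0.02, 0.03, 0.30, 0.20, 0.10]
print(js_divergence(water_topic, finance_topic))
print(cosine(water_topic, finance_topic))
```

JSD is often preferred over KL here because it is symmetric and stays finite when one topic gives a word zero probability.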
26 Topic Similarity: Topic similarity is computed as the average pairwise similarity of the words of two topics (Aletras and Stevenson, 2014; EACL), e.g. Sim(team, goal) between the topics team, game, james, season, player, nba, play, knicks, coach, league and world, cup, team, soccer, africa, player, south, game, match, goal.
27 Topic Similarity: Computing Sim(wi, wj): word association (PMI); distributional semantics; knowledge-based methods; Explicit Semantic Analysis (ESA; Gabrilovich and Markovitch, 2007).
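The average pairwise similarity can be written independently of the word-similarity function, so a PMI, distributional or ESA similarity can be plugged in. A minimal sketch, assuming a hypothetical `toy_word_sim` lookup table in place of a real similarity model:

```python
from itertools import product

# Hypothetical word-pair similarities standing in for PMI, distributional or ESA scores.
TOY_SIM = {
    frozenset(("team", "goal")): 0.4,
    frozenset(("game", "match")): 0.7,
}


def toy_word_sim(a, b):
    """Symmetric toy similarity: 1 for identical words, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def topic_similarity(topic_a, topic_b, word_sim):
    """Mean of word_sim(wi, wj) over all pairs with wi in topic_a and wj in topic_b."""
    pairs = list(product(topic_a, topic_b))
    return sum(word_sim(a, b) for a, b in pairs) / len(pairs)


nba = ["team", "game", "league"]
world_cup = ["team", "goal", "match"]
print(topic_similarity(nba, world_cup, toy_word_sim))
```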
28 Topic Similarity: Evaluation and Results: 800 pairs of topics from LDA models with 50, 100 and 200 topics trained on NYT articles, with human judgements of topic similarity. Methods compared by Spearman correlation: Word Overlap, JSD, Cosine, Chaney and Blei (2012), PMI, Dis. Sem, ESA and SVR, alongside inter-annotator agreement (IAA).
29 Topic Labelling: Associate textual or image labels with topics. Why? To assist in the interpretation of the lists of words representing the topics.
30-35 Topic Labelling: Example labels for the topic team, game, james, season, player, nba, play, knicks, coach, league: NBA, American Basketball, Basketball.
36 Topic Labelling using Text: Given a topic and a set of candidate labels, select the most appropriate label. Previous work ranks candidate labels with a supervised model that combines word association measures, lexical features and an information retrieval technique (Lau et al., 2011).
37 Topic Labelling using Text: Labels are ranked using unsupervised graph-based methods (Aletras and Stevenson, 2014; ACL) in four steps: retrieving text information, creating a text graph, identifying important terms, and scoring candidate labels.
38 Topic Labelling using Text: Retrieving text information: the topic (team, game, james, season, player, nba, play, knicks, coach, league) is submitted to a search engine, and the metadata title fields of the results are collected, e.g. 1. LeBron James Stats, 2. NBA basketball statistics, 3. Cleveland Cavaliers, 4. Fiba Basketball World Cup.
39 Topic Labelling using Text: Creating the text graph: nodes are terms (w1 to w8); the edge weight is 1 in the unweighted graph (PR) or normalised PMI in the weighted graph (PR-NPMI).
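A minimal sketch of building the text graph, assuming the retrieved titles are tokenised by lower-casing and splitting on whitespace, that two words are linked when they co-occur in the same title, and that probabilities are estimated from the titles themselves (all three are simplifying assumptions; the helper names are hypothetical). NPMI(wi, wj) = log(P(wi, wj) / (P(wi) P(wj))) / -log P(wi, wj) lies in [-1, 1]; here non-positive edges are dropped so that PageRank only sees non-negative weights:

```python
import math
from itertools import combinations


def npmi(p_ab, p_a, p_b):
    """Normalised PMI in [-1, 1]; -1 when the words never co-occur."""
    if p_ab == 0:
        return -1.0
    if p_ab == 1:
        return 1.0  # both words occur in every title
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def build_text_graph(titles, weighted=True):
    """Undirected graph {word: {neighbour: weight}} from tokenised titles.

    Words co-occurring in a title are linked with weight 1 (PR) or NPMI
    (PR-NPMI); edges with non-positive weight are dropped (an assumption).
    """
    n = len(titles)
    df, co = {}, {}
    for tokens in titles:
        words = sorted(set(tokens))
        for w in words:
            df[w] = df.get(w, 0) + 1
        for a, b in combinations(words, 2):
            co[(a, b)] = co.get((a, b), 0) + 1
    graph = {w: {} for w in df}
    for (a, b), count in co.items():
        weight = npmi(count / n, df[a] / n, df[b] / n) if weighted else 1.0
        if weight > 0:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


# Search-result titles from the slide, lower-cased and split on whitespace.
TITLES = [t.lower().split() for t in [
    "LeBron James Stats",
    "NBA basketball statistics",
    "Cleveland Cavaliers",
    "Fiba Basketball World Cup",
]]
graph = build_text_graph(TITLES)
print(graph["basketball"])
```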
40 Topic Labelling using Text: Identifying important terms: PageRank is run over the text graph (nodes w1 to w8; edge weight 1 for PR, normalised PMI for PR-NPMI) to rank its terms.
41 Topic Labelling using Text: Scoring candidate labels: each candidate label L = {w1, w2, ..., wm} is scored by a scoring function over the PageRank scores of its words.
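PageRank ranks the graph's terms, and candidate labels are then scored from their words' PageRank scores. A minimal sketch of weighted PageRank (power iteration, damping 0.85, mass of nodes without edges spread uniformly), paired with a scoring function that, as an assumption, sums the PageRank scores of a label's words; the toy graph and helper names are hypothetical:

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration on {node: {neighbour: weight}}."""
    nodes = list(graph)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    for _ in range(iterations):
        # Mass of nodes without outgoing edges is spread uniformly.
        dangling = sum(pr[u] for u in nodes if out_weight[u] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue
            for v, w in graph[u].items():
                new[v] += damping * pr[u] * w / out_weight[u]
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


def score_label(label_words, pr):
    """ASSUMPTION: a label's score is the sum of its words' PageRank scores."""
    return sum(pr.get(w, 0.0) for w in label_words)


# Hypothetical unweighted (PR) graph; NPMI edge weights would give PR-NPMI.
graph = {
    "basketball": {"nba": 1.0, "statistics": 1.0, "fiba": 1.0},
    "nba": {"basketball": 1.0, "statistics": 1.0},
    "statistics": {"basketball": 1.0, "nba": 1.0},
    "fiba": {"basketball": 1.0},
    "zoo": {},
}
pr = pagerank(graph)
for label in (["nba", "basketball"], ["basketball"], ["zoo"]):
    print(label, round(score_label(label, pr), 3))
```

A plain sum favours longer labels; normalising by label length is an alternative design choice if that bias is unwanted.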
42 Topic Labelling using Text: Evaluation and Results: 228 topics and 6K labels with human judgements of relevance (Lau et al., 2011), from blogs, books, news and PubMed. Top-1 average rating (0-3) on Blogs / Books / News / PubMed: PR 2.05* / 1.98* / 2.04* / 1.88**; PR-NPMI 2.08* / 2.01* / 2.05* / 1.90**. Other systems compared: Lau et al. (2011)-U, Lau et al. (2011)-S and the Upper Bound.
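Top-1 average rating takes, for each topic, the human rating of the label the system ranks first, and averages over topics. A minimal sketch under that reading, with hypothetical ratings and scores; the upper bound here, as an assumption, is an oracle that always picks the best-rated label:

```python
def top1_average_rating(system_scores, human_ratings):
    """Mean human rating of each topic's top-ranked label.

    system_scores: {topic: {label: system score}}
    human_ratings: {topic: {label: mean human rating on the 0-3 scale}}
    """
    total = 0.0
    for topic, scores in system_scores.items():
        best_label = max(scores, key=scores.get)
        total += human_ratings[topic][best_label]
    return total / len(system_scores)


def upper_bound(human_ratings):
    """ASSUMPTION: an oracle that always picks the best-rated label."""
    return sum(max(r.values()) for r in human_ratings.values()) / len(human_ratings)


# Hypothetical ratings and system scores for two topics.
human = {
    "t1": {"NBA": 2.8, "Basketball": 2.5, "Sport": 1.5},
    "t2": {"Finance": 2.6, "Bank": 2.0},
}
system = {
    "t1": {"NBA": 0.4, "Basketball": 0.6, "Sport": 0.1},
    "t2": {"Finance": 0.9, "Bank": 0.2},
}
print(top1_average_rating(system, human), upper_bound(human))
```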
43 Topic Labelling using Images: Image labels are identified and ranked using unsupervised graph-based methods (Aletras and Stevenson, 2013; NAACL-HLT) in four steps: retrieving images, extracting textual and visual information, creating an image graph, and identifying important images.
44 Topic Labelling using Images: Retrieving text and visual information: the topic (team, game, james, season, player, nba, play, knicks, coach, league) is submitted to a search engine and the top-20 images are retrieved. Text features: title and link metadata fields. Visual features: bags of visual words (BOVW).
45 Topic Labelling using Images: Creating the candidate image graph, with edge weights given by the cosine similarity of the images' BOVW vectors.
46 Topic Labelling using Images: Images are ranked by running PageRank over the candidate image graph (edge weight: cosine of BOVW).
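A minimal sketch of the image graph, assuming each candidate image is already described by a BOVW histogram (a list of visual-word counts; extracting the visual words is out of scope). Images form a complete graph weighted by cosine similarity and are ranked with weighted PageRank. The optional `prior` argument is a personalisation vector; using a text-based prior is only a hedged reading of the PR-VIS+Per(PMI) variant, not a confirmed detail, and the histograms are hypothetical:

```python
import math


def cosine(u, v):
    """Cosine similarity of two BOVW histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def image_graph(bovw):
    """Complete graph over images; edge weight = cosine of BOVW histograms."""
    ids = list(bovw)
    return {i: {j: cosine(bovw[i], bovw[j]) for j in ids if j != i} for i in ids}


def pagerank(graph, damping=0.85, iterations=100, prior=None):
    """Weighted PageRank; `prior` (node -> weight) personalises the teleport step."""
    nodes = list(graph)
    if prior is None:
        prior = {v: 1.0 for v in nodes}
    z = sum(prior[v] for v in nodes)
    p = {v: prior[v] / z for v in nodes}
    pr = dict(p)
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    for _ in range(iterations):
        dangling = sum(pr[u] for u in nodes if out_weight[u] == 0)
        # Teleport and dangling mass both follow the (personalised) prior.
        new = {v: ((1 - damping) + damping * dangling) * p[v] for v in nodes}
        for u in nodes:
            if out_weight[u] > 0:
                for v, w in graph[u].items():
                    new[v] += damping * pr[u] * w / out_weight[u]
        pr = new
    return pr


# Hypothetical 4-bin BOVW histograms for three retrieved images.
bovw = {"img1": [5, 3, 0, 0], "img2": [4, 4, 1, 0], "img3": [0, 0, 2, 6]}
graph = image_graph(bovw)
ranking = pagerank(graph)
print(sorted(ranking, key=ranking.get, reverse=True))
```

The visually atypical image (img3) ends up last, which matches the intuition that images similar to many other candidates are more likely to depict the topic.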
47 Topic Labelling using Images: Evaluation and Results: 300 topics and 6K image labels with human judgements of relevance, from news and Wikipedia articles. Top-1 average rating (0-3): Random 1.79; Word Overlap 1.85; Google Image Search 1.89; PR-VIS 1.96; PR-VIS+Per(PMI) 2.00*; Human Performance 2.24.
48 Comparing Topic Representations: Which topic representations are suitable within a document browser interface? What is the impact of different topic representations on human search effectiveness for a given query?
49 Comparing Topic Representations: Retrieval Task (Aletras et al., 2014, JCDL; Aletras et al., 2015, JASIST). Aim: within 3 minutes, identify as many documents as possible that are relevant to a set of queries, in a document collection organised using topic models.
50 Retrieval Task: Document collection: Reuters Corpus (Rose et al., 2002), 100K documents. 20 subject categories are used as queries.
51 Reuters categories used as queries (number of documents):
Travel & Tourism: 314
Domestic Politics (USA): 27,236
War - Civil War: 16,615
Biographies, Personalities, People: 2,601
Defence: 4,224
Crime, Law Enforcement: 10,673
Religion: 1,477
Disasters & Accidents: 3,161
International Relations: 19,273
Science & Technology: 1,042
Employment/Labour: 2,796
Government Finance: 17,904
Weather: 1,190
Elections: 5,866
Environment & Natural World: 1,933
Arts, Culture, Entertainment: 1,450
Health: 1,567
European Commission Institutions: 1,046
Sports: 18,913
Welfare, Social Services: 775
52 Retrieval Task: Topic modelling with LDA: 100 topics were learned, and 84 remained after filtering out incoherent topics (Aletras and Stevenson, 2013b).
53 Retrieval Task: Topic browsing system: Topic Model Visualisation Engine (TMVE; Chaney and Blei, 2012). Three topic representations: keywords (top-10 words); textual phrases (Wikipedia article titles); images (Wikimedia images).
54 Retrieval Task: A two-step procedure. Given a query and a set of topics, participants (1) identify all potentially relevant topics, then (2) identify relevant documents from a list of documents associated with the selected topics.
55 Retrieval Task: Step 1, identify relevant topics (keywords).
56 Retrieval Task: Step 1, identify relevant topics (textual phrases).
57 Retrieval Task: Step 1, identify relevant topics (images).
58 Retrieval Task: Step 2, identify relevant documents.
59 Retrieval Task: Subjects and Procedure: 15 members of research staff and graduate students at the Universities of Sheffield and Melbourne and at King's College took part. Each participant signed up to our online system and had access to a personalised page where they could read instructions, see how many queries they had completed so far and perform a new query. Participants performed each of the 20 queries, and each query was performed by at least 5 different participants.
60 Results: Total number of retrieved documents by topic representation: Keywords 1,086; Text 1,264; Image 1,115.
61 Results: Precision by topic representation (average): Keywords 0.59; Text 0.53; Image 0.56.
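A minimal sketch of how such precision figures can be computed, assuming precision is taken per search session (relevant retrieved documents divided by retrieved documents) and then averaged per topic representation; the session records and helper names are hypothetical:

```python
def session_precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (0.0 if none retrieved)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def mean_precision_by_representation(sessions, relevant_by_query):
    """Macro-average of per-session precision for each topic representation.

    sessions: iterable of (representation, query, retrieved document ids)
    """
    per_rep = {}
    for rep, query, retrieved in sessions:
        p = session_precision(retrieved, relevant_by_query[query])
        per_rep.setdefault(rep, []).append(p)
    return {rep: sum(ps) / len(ps) for rep, ps in per_rep.items()}


# Hypothetical relevance judgements and sessions.
relevant = {"Sports": {"d1", "d2", "d3"}}
sessions = [
    ("Keywords", "Sports", ["d1", "d2", "d9"]),
    ("Keywords", "Sports", ["d3"]),
    ("Image", "Sports", ["d8", "d1"]),
]
print(mean_precision_by_representation(sessions, relevant))
```

A micro-average (pooling all retrieved documents before dividing) is the other common choice and can give different numbers when sessions retrieve very different amounts.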
62 Conclusions: Post-processing the output of topic models can help build better exploratory search interfaces by filtering out junk topics, grouping together similar topics, and assisting topic interpretation with textual or image labels. If you are interested in the topic: Topic Models: Post-Processing and Applications Workshop at CIKM.
63 Publications:
N. Aletras, T. Baldwin, J. H. Lau and M. Stevenson (2015). Evaluating Topic Representations for Exploring Document Collections. JASIST.
N. Aletras and M. Stevenson (2014). Labelling Topics using Unsupervised Graph-based Methods. In ACL.
N. Aletras and M. Stevenson (2014). Measuring the Similarity between Automatically Generated Topics. In EACL.
N. Aletras, T. Baldwin, J. H. Lau and M. Stevenson (2014). Representing Topics Labels for Exploring Digital Libraries. In JCDL.
N. Aletras and M. Stevenson (2013). Representing Topics Using Images. In NAACL-HLT.
N. Aletras and M. Stevenson (2013). Evaluating Topic Coherence Using Distributional Semantics. In IWCS.
More informationCPSC 340: Machine Learning and Data Mining. Ranking Fall 2016
CPSC 340: Machine Learning and Data Mining Ranking Fall 2016 Assignment 5: Admin 2 late days to hand in Wednesday, 3 for Friday. Assignment 6: Due Friday, 1 late day to hand in next Monday, etc. Final:
More informationAvailable online at ScienceDirect. Procedia Computer Science 82 (2016 ) 28 34
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 82 (2016 ) 28 34 Symposium on Data Mining Applications, SDMA2016, 30 March 2016, Riyadh, Saudi Arabia Finding similar documents
More informationSIRS Issues Researcher
From the main screen of SIRS, click on the SIRS Issues Researcher link. 1 This tutorial will provide an overview of the following features available through SIRS Issues Researcher: 2. Search Tabs 3. Reference
More informationMCA APPLICATION TITLES LIST
1. Automated tele-network system JAVA 2. Car Rental System JAVA 3. City Information System JAVA 4. College Feedback System JAVA 5. Design and strategies for online voting system JAVA 6. E-Learning JAVA
More informationInformation Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Information Retrieval CS 6900 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Information Retrieval Information Retrieval (IR) is finding material of an unstructured
More informationCIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets
CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,
More informationIn = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most
In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100
More informationInformation Retrieval CSCI
Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1
More informationQuestion Answering Systems
Question Answering Systems An Introduction Potsdam, Germany, 14 July 2011 Saeedeh Momtazi Information Systems Group Outline 2 1 Introduction Outline 2 1 Introduction 2 History Outline 2 1 Introduction
More informationNamed Entity Disambiguation in Digital Libraries
Erasmus Mundus European Master in Language & Communication Technologies (LCT) Named Entity Disambiguation in Digital Libraries Le DieuThu Supervisors: Raffaella Bernardi Massimo Poesio Patrick Blackburn
More informationPapers for comprehensive viva-voce
Papers for comprehensive viva-voce Priya Radhakrishnan Advisor : Dr. Vasudeva Varma Search and Information Extraction Lab, International Institute of Information Technology, Gachibowli, Hyderabad, India
More informationData-Mining Algorithms with Semantic Knowledge
Data-Mining Algorithms with Semantic Knowledge Ontology-based information extraction Carlos Vicient Monllaó Universitat Rovira i Virgili December, 14th 2010. Poznan A Project funded by the Ministerio de
More informationjldadmm: A Java package for the LDA and DMM topic models
jldadmm: A Java package for the LDA and DMM topic models Dat Quoc Nguyen School of Computing and Information Systems The University of Melbourne, Australia dqnguyen@unimelb.edu.au Abstract: In this technical
More informationPrivacy Notice for Jersey Swimming Club
Privacy Notice for Jersey Swimming Club What personal data does the JSC collect, and what is it used for? Who is your data shared with? Where does this data come from? How is your data stored? Who is responsible
More informationIs Brad Pitt Related to Backstreet Boys? Exploring Related Entities
Is Brad Pitt Related to Backstreet Boys? Exploring Related Entities Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, and Gabriela Vulcu Unit for Natural Language Processing Insight-centre, National University
More informationA Personal Information Retrieval System in a Web Environment
Vol.87 (Art, Culture, Game, Graphics, Broadcasting and Digital Contents 2015), pp.42-46 http://dx.doi.org/10.14257/astl.2015.87.10 A Personal Information Retrieval System in a Web Environment YoungDeok
More informationThe University of Amsterdam at the CLEF 2008 Domain Specific Track
The University of Amsterdam at the CLEF 2008 Domain Specific Track Parsimonious Relevance and Concept Models Edgar Meij emeij@science.uva.nl ISLA, University of Amsterdam Maarten de Rijke mdr@science.uva.nl
More informationBasic techniques. Text processing; term weighting; vector space model; inverted index; Web Search
Basic techniques Text processing; term weighting; vector space model; inverted index; Web Search Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing
More informationTriRank: Review-aware Explainable Recommendation by Modeling Aspects
TriRank: Review-aware Explainable Recommendation by Modeling Aspects Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen National University of Singapore Presented by Xiangnan He CIKM 15, Melbourne, Australia
More informationVisoLink: A User-Centric Social Relationship Mining
VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.
More informationCS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks
CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks Archana Sulebele, Usha Prabhu, William Yang (Group 29) Keywords: Link Prediction, Review Networks, Adamic/Adar,
More informationQuery Subtopic Mining Exploiting Word Embedding for Search Result Diversification
Query Subtopic Mining Exploiting Word Embedding for Search Result Diversification Md Zia Ullah, Md Shajalal, Abu Nowshed Chy, and Masaki Aono Department of Computer Science and Engineering, Toyohashi University
More informationChinese On The Go By Live ABC
Chinese On The Go By Live ABC If you are searched for a ebook Chinese On the Go by Live ABC in pdf format, in that case you come on to correct website. We furnish the complete edition of this ebook in
More informationTALP at WePS Daniel Ferrés and Horacio Rodríguez
TALP at WePS-3 2010 Daniel Ferrés and Horacio Rodríguez TALP Research Center, Software Department Universitat Politècnica de Catalunya Jordi Girona 1-3, 08043 Barcelona, Spain {dferres, horacio}@lsi.upc.edu
More informationNatural Language Processing SoSe Question Answering. (based on the slides of Dr. Saeedeh Momtazi)
Natural Language Processing SoSe 2015 Question Answering Dr. Mariana Neves July 6th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline 2 Introduction History QA Architecture Outline 3 Introduction
More informationmyttm General Help for Teams
This document is intended for myttm team users. It provides information specifically for teams when using the myttm Web Service portal. The following topics will be discussed in this document: Sign In
More information