Towards Breaking the Quality Curse. AWebQuerying Web-Querying Approach to Web People Search.
|
|
- Howard Bradley
- 5 years ago
- Views:
Transcription
1 Towards Breaking the Quality Curse. AWebQuerying Web-Querying Approach to Web People Search. Dmitri V. Kalashnikov Rabia Nuray-Turan Sharad Mehrotra Dept of Computer Science University of California, Irvine 1
2 Searching for People on the Web Heterogeneity Approx. 5-10% of internet searches Andrew ACM: Items for look for specific people by their names [Guha&Garg 04] McCallum s Homepage Associate Professor Andrew Kachites McCallum Author: McCallum, R. Andrew Challenges: Heterogeneity of representations: Same entity represented multiple ways. Ambiguity in representations: Same representation of multiple entities. WePS systems address both these challenges. DBLP: Andrew McCallum UMass professor UMass professor Ambiguity Mental Floss Andrew McCallum Musician News: Andrew McCallum Australia CROSS president 2
3 Search Engines v WePS WePS system return clusters of web pages wherein pages of the same entity are clustered together Google results for Andrew McCallum WePS results for Andrew McCallum Umass professor photographer physician 3
4 Clustering Search Engines v WePS Examples of clustering search engines: Clusty Kartoo Search results of Andrew McCallum using Clusty WePS system results of Andrew McCallum Differences: While both return clustered search results, Clustering search engines cluster similar/related pages based on broad topics (e.g., research, photos) while WePS system cluster based on disambiguating amongst various namesakes. 4
5 WePS System Architecture Classified based on where references are disambiguated Client side solution the client resolves references Client Disamb. Approach Query Returned pages Server Web Proxy Solution Query Proxy Proxy resolves references Server side solution The server resolves references Client Client clusters clusters Disamb. Approach Query Web Returned pages Server Web Server Disamb. Approach 5
6 Current WePS Approaches Commercial Systems: Zoominfo Spock Multiple research proposals [Kamha et al. 04; Artiles et. al 07 & 05; Bekkerman et. al. 05; Bollegala et. al. 06;Matsou et. al. 06] etc. Web people search task included in Sem-Eval workshop in teams participated in the task WePS techniques differ from each other in: data analyzed for disambiguation & Clustering approach used Type of data analyzed Baseline exploit information in result pages only Keywords, named entities, hyperlinks, s, [Bekkerman et.al. 05], [Li, et. al. 05][Elmachioglu, et. al. 07] exploit external information Ontologies, Encyclopedia, [Kalashnikov, et. al, 07] Exploit Web content Crawling [Kanani et. al. 07] Search engine Statistics 6
7 Contributions of this paper Exploiting Co-occurrence occurrence statistics from the search engine for WePS. A novel classification approach based on skylines to support reference resolution. Outperforms existing techniques on multiple data sets 7
8 Approach Overview Top-K Preprocessing Post processing Webpages 1.Extract NEs 2. Filter Initial Clustering Preprocessed pages 1. Rank clusters 2. Extract cluster sketches 3. Rank pages in clusters Final clusters Person1 Refining Clusters using Web Queries Person3 Person2 1. Compute similarity 1. Web Query using TF/IDF on NE 2. Feature Extraction 2. Cluster based on Initial clusters 3. Skyline based classifier TF/IDF similarity 4. Cluster refinement 8
9 Preprocessing: Named Entity Extraction Locate query name on web page Extract M neighboring names entities organization people Names Stanford Named Entity Recognizer & GATE used People NEs Lise Getoor Ben Taskar John McCallum Charles Organization UMASS ACM Queue Location Amherst 9
10 Preprocessing: Filtering Named Input: Extracted NEs Entities Location Filter Remove location NEs by looking up a dictionary Ambiguous name Filter Remove one-word person names, such as John Ambiguous Org. Filter Remove common English words Same last name Filter Remove the person names with the same last name. Output: Cleaned NEs 10
11 Initial Clustering Resolved references Threshold h = Intuition: If two web pages significantly overlap in terms of associated NEs, then the two pages refer to the same individual. Approach: Compute similarity between web pages based on TF/IDF over Named Entities Merge all pairs of Web pages whose TF/IDF similarity exceeds a threshold Threshold learnt over the training dataset. 11
12 Cluster Refinement Initial Clustering based on TF/IDF similarity: Accurate references resolved are usually correct. Conservative too many unresolved references Andrew McCallum UMASS 0.01 Andrew McCallum CMU Key Observation: Co-occurrence Andrew statistics can McCallum Job: UMASS Phd: CMU Web may contain additional information that may further resolve unresolved references. Co-occurrenceoccurrence between contexts associated with unresolved references be learnt by querying the search engine 12
13 Using Co-occurrence to Disambiguate Query N Andrew McCallum Search Results D1 Andrew McCallum UMASS D2 Number of web pages containing Andrew McCallum AND UMASS AND CMU Andrew McCallum CMU Context associated with Search results C1 = UMASS C2 = CMU N. C. C CMU Overlap What between if context t 1NEs 2 " AndrewMcCa llum "". UMASS "". CMU " Ci and Cj are very general contexts N " Andrew McCallum " over the Web The co-occurrence counts could be high for all namesakes! Solution -- Normalize N Ci Cj using Dice similarity overlap between contexts associated with two pages of the same entity expected to be higher compared to pages of unrelated entities. Merge Web pages Di and Dj based on N.Ci.Cj / N? 13
14 Dice Similarity Two possible versions of dice: will be large for general context words 14
15 Dice2 Similarity -- Example Let an N O 1 O 2 query be: Q: Andrew McCallum AND Yahoo AND Google Let counts be: Andrew McCallum AND Yahoo AND Google 522 Andrew McCallum AND Yahoo 2470 Andrew McCallum AND Google 8950 Andrew McCallum Yahoo AND Google Dice: Dice 1 = 0.05 Dice 2 = 7.15e-6 Dice 2 captures that Yahoo and Google are too unspecific Organization Entities to be a good context for Andrew McCallum Dice 2 leads to visibly better results in practice! 15
16 Web Queries to Gather Co-Occurrence Statistics Context People NEs Lise Getoor Ben Taskar Organization UMASS ACM People NEs Charles A. Sutton Ben Wellner Organization DBLP KDD Association between context (L.G. OR B.T.) (L.G. OR B.T.) (UMASS OR ACM) AND AND AND (C.A.S. OR B.W.) (DBLP OR KDD) (C.A.S. OR B.W.) (UMASS OR ACM) AND (DBLP OR KDD) Q1 Q2 Q3 Q4 (#97) (#2,570) (#720) (#2,190K) N AND Q1 N AND Q2 N AND Q3 N AND Q4 relevant to original query N (#79) (#49) (#63) (#10,100) Association amongst Context within pages Ex: Q1 = ( Lise Getoor OR Ben Taskar ) AND ( Charles A. Sutton OR Ben Wellner ) 8 web queries 16
17 Creating Features from Web Co-Occurrence Counts 8-Feature Vector 1: N P i P j = 79/152K = : Dice(N, P i P j ) = : N P i O j = 49/152K = : Dice(N, P i O j j) ) = : N O i P j = 63/152K = : D(N, O i P j ) = : N O i O j = 10,100 = : D(N, O i O j ) =
18 Role of a Classifier Initial Clusters Edges with 8D Features Final Clusters Web Co-Occurrence Feature Generation Edge Classification Classifier used to mark edges as merge (+) or do not merge (-) resulting in the final cluster. Since the features are similarity based, the classifier should satisfy the dominance property 18
19 Dominance Property Let f and g be 8D feature vectors f =(f 1, f 2,,f 8 ) g = (g 1,g 2,,g 8 ) Notation (f up dominates g) if f 1 g 1, f 2 g 2,, f 8 g 8 (f down dominates g) if f 1 g 1, f 2 g 2,, f 8 g 8 Dominance Property for classifier Let f and g be the features associated with edges (d i, d j ) and (d m, d n ) respectively. If f g and the classifier decides to merge (d i, d j ), then it must also merge (d m, d n ) 19
20 Skyline-Based Classification A skyline classifier consists of an up-dominating classification skyline All points above or on it are declared merge ( + ) The rest are don t merge ( - ) Automatically takes care of dominance Our goal is to learn a classification skyline that supports a good final clusters of search results. 20
21 Learning Classification Skyline Initial CS Greedy Approach Initialize classification skyline with the point that t dominates the entire space. Pool of points to add to CS Down-dominating skyline on active + edges Maintain current clustering Clustering induced d by current CS Choosing a point from pool to CS that Stage1: Minimizes loss of quality due to adding new - edges Stage2: Maximizes overall quality Iterate until done Output best skyline discovered 21
22 Example: Adding a point to CS Ground Truth Clustering: Initial Clustering: Current CS = {q,d,c} Current Clustering: Point Pool = {f,e,u,h} 22
23 Example: Considering f f k a What if add f to CS? v b e s g l w q u m o i h d t p c r j Stage 1: Adding edge k Fp = Stage 2: Adding edge f Fp = (a) 23
24 Example: Considering e f k a What if add e to CS? v b e s g l w q u m o i h d t p c r j Stage 1: Adding edges k and v Fp = Stage 2: Adding edge e Fp = (a) 24
25 Example: Considering u What if add u to CS? f k a v b e s g l w q u m o i h d t p c r j Stage 1: Adding edge i Fp = Stage 2: Adding edge u Fp = (a) 25
26 Example: Considering h What if add h to CS? f k a v b e s g l w q u m o i h d t p c r j Stage 1: Adding edges i and t Fp = Stage 2: Adding edge h Fp = (a) 26
27 Example: Stage2 f k a After Stage1 only f and u remain v b e q s g l m u i d h t r j w o p c (b) Stage 1 (for f): Fp = Stage 2 (for f): Fp = Stage 1 (for u): Fp = Stage 2 (for u): Fp = Thus adding f to CS! 27
28 Why Skyline-based classifier? Specialized classifier Designed specifically for the problem at hand Is aware of the clustering underneath Fine-tunes itself to a given clustering quality metric Such as F 1, F P, F B Takes into account the dominance in 8D data Explained shortly Takes into account the transitivity of merges Classifying one edge as + might lead to classifying multiple l edges as + For instance when two large clusters are merged 28
29 Post Processing Page Ranking Top-K Use the original ranking from search engine Preprocessing Post processing Webpages Cluster Ranking 1.Extract NEs each cluster we simply select the highest 2Fil 2. ranked Filter page and use that as the order of the cluster. Preprocessed pages Cluster Sketches coalesce all pages in the cluster into a single page removing Initial Clustering the stop words we compute the TF/IDF 1. Compute of the remaining similarity words Select using set of TF/IDF terms on above NE a certain 2. Cluster threshold based (or on top N terms) TF/IDF similarity Initial clusters 1. Rank clusters 2. Extract cluster sketches 3. Rank pages in clusters Final clusters 1. Web Query 2. Feature Extraction 3. Skyline based classifier 4. Cluster refinement Person1 Refining Clusters using Web Queries Person3 Person2 29
30 Experiments Datasets WWW 05 By Bekkerman and McCallum 12 different Person Names all related to each other. Each Person name has different number of namesakes. SIGIR 05 By Artiles et al 9 different Person Names, each with different number of namesakes. WEPS For Semeval Workshop on WEPS task (by Artiles et al) Training dataset ( 49 Different Person Names) Test Dataset (30 Different Person Names) 30
31 Quality Measures F P : Harmonic mean of purity and inverse purity F B : Harmonic mean of B-cubed precision and recall. 31
32 Experiments Baselines Named Entity based Clustering (NE) TF/IDF merging scheme that uses the NEs in the Web Pages Web-based Co-Occurrence Clustering (WebDice) Computes the similarity between Web pages by summing up the individual dice similarities of People-People and People-Organization similarities of Web pages. 32
33 Experiments on WWW & SIGIR Datasets 33
34 Experiments on the WEPS Dataset 34
35 Effect of Initial Clustering 35
36 Conclusion and Future Work Web co-occurrence statistics can significantly improve disambiguation quality in WePS task. Skyline classifier is effective in making merge decisions. However, current approach has limitations: large number of search engine queries to collect statistics will work only as a server side solution given limitations today. Based on the assumption that context of different entities is very different. May not work as well on the difficult case when contexts of different entities significantly overlap. 36
Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.
Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search. Dmitri V. Kalashnikov dvk@ics.uci.edu Rabia Nuray-Turan rnuray@ics.uci.edu Department of Computer Science University of
More informationSEARCHING for entities is a common activity in Internet
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 11, NOVEMBER 2008 1 Web People Search via Connection Analysis Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Member, IEEE,
More informationExploiting Context Analysis for Combining Multiple Entity Resolution Systems
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems Zhaoqi Chen Microsoft Corporation Redmond, USA schen@microsoft.com Dmitri V. Kalashnikov Dept. of Computer Science University
More informationACM Journal of Data and Information Quality (ACM JDIQ), 4(2), Adaptive Connection Strength Models for Relationship-based Entity Resolution
ACM Journal of Data and Information Quality (ACM JDIQ), 4(2), 2013 Adaptive Connection Strength Models for Relationship-based Entity Resolution A RABIA NURAY-TURAN DMITRI V. KALASHNIKOV SHARAD MEHROTRA
More informationAdaptive Graphical Approach to Entity Resolution
Adaptive Graphical Approach to Entity Resolution Zhaoqi Chen Dmitri V. Kalashnikov Sharad Mehrotra Computer Science Department University of California, Irvine ABSTRACT Entity resolution is a very common
More informationName Disambiguation Using Web Connection
Name Disambiguation Using Web nection Yiming Lu 1*, Zaiqing Nie, Taoyuan Cheng *, Ying Gao *, Ji-Rong Wen Microsoft Research Asia, Beijing, China 1 University of California, Irvine, U.S.A. Renmin University
More informationLink Mining & Entity Resolution. Lise Getoor University of Maryland, College Park
Link Mining & Entity Resolution Lise Getoor University of Maryland, College Park Learning in Structured Domains Traditional machine learning and data mining approaches assume: A random sample of homogeneous
More informationIJMIE Volume 2, Issue 8 ISSN:
DISCOVERY OF ALIASES NAME FROM THE WEB N.Thilagavathy* T.Balakumaran** P.Ragu** R.Ranjith kumar** Abstract An individual is typically referred by numerous name aliases on the web. Accurate identification
More informationAssigning Vocation-Related Information to Person Clusters for Web People Search Results
Global Congress on Intelligent Systems Assigning Vocation-Related Information to Person Clusters for Web People Search Results Hiroshi Ueda 1) Harumi Murakami 2) Shoji Tatsumi 1) 1) Graduate School of
More informationWeb People Search using Ontology Based Decision Tree Mrunal Patil 1, Sonam Khomane 2, Varsha Saykar, 3 Kavita Moholkar 4
Web People Search using Ontology Based Decision Tree Mrunal Patil 1, Sonam Khomane 2, Varsha Saykar, 3 Kavita Moholkar 4 * (Department of Computer Engineering, Rajarshi Shahu College of Engineering, Pune-411033,
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationAssigning Location Information to Display Individuals on a Map for Web People Search Results
Assigning Location Information to Display Individuals on a Map for Web People Search Results Harumi Murakami 1, Yuya Takamori 1,, Hiroshi Ueda 2, and Shoji Tatsumi 2 1 Graduate School for Creative Cities,
More informationReview: Identification of cell types from single-cell transcriptom. method
Review: Identification of cell types from single-cell transcriptomes using a novel clustering method University of North Carolina at Charlotte October 12, 2015 Brief overview Identify clusters by merging
More informationDisambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity
Disambiguating Search by Leveraging a Social Context Based on the Stream of User s Activity Tomáš Kramár, Michal Barla and Mária Bieliková Faculty of Informatics and Information Technology Slovak University
More informationTALP at WePS Daniel Ferrés and Horacio Rodríguez
TALP at WePS-3 2010 Daniel Ferrés and Horacio Rodríguez TALP Research Center, Software Department Universitat Politècnica de Catalunya Jordi Girona 1-3, 08043 Barcelona, Spain {dferres, horacio}@lsi.upc.edu
More informationInferring User Search for Feedback Sessions
Inferring User Search for Feedback Sessions Sharayu Kakade 1, Prof. Ranjana Barde 2 PG Student, Department of Computer Science, MIT Academy of Engineering, Pune, MH, India 1 Assistant Professor, Department
More informationEntity Linking. David Soares Batista. November 11, Disciplina de Recuperação de Informação, Instituto Superior Técnico
David Soares Batista Disciplina de Recuperação de Informação, Instituto Superior Técnico November 11, 2011 Motivation Entity-Linking is the process of associating an entity mentioned in a text to an entry,
More informationCollective Entity Resolution in Relational Data
Collective Entity Resolution in Relational Data I. Bhattacharya, L. Getoor University of Maryland Presented by: Srikar Pyda, Brett Walenz CS590.01 - Duke University Parts of this presentation from: http://www.norc.org/pdfs/may%202011%20personal%20validation%20and%20entity%20resolution%20conference/getoorcollectiveentityresolution
More informationJianyong Wang Department of Computer Science and Technology Tsinghua University
Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationQuery-time Entity Resolution
Journal of Artificial Intelligence Research 30 (2007) 621-657 Submitted 03/07; published 12/07 Query-time Entity Resolution Indrajit Bhattacharya IBM India Research Laboratory Vasant Kunj, New Delhi 110
More informationRSDC 09: Tag Recommendation Using Keywords and Association Rules
RSDC 09: Tag Recommendation Using Keywords and Association Rules Jian Wang, Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem, PA 18015 USA
More informationA BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK
A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific
More informationIdentifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries
Identifying and Ranking Possible Semantic and Common Usage Categories of Search Engine Queries Reza Taghizadeh Hemayati 1, Weiyi Meng 1, Clement Yu 2 1 Department of Computer Science, Binghamton university,
More informationEffective Latent Space Graph-based Re-ranking Model with Global Consistency
Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case
More informationAutomatic Identification of User Goals in Web Search [WWW 05]
Automatic Identification of User Goals in Web Search [WWW 05] UichinLee @ UCLA ZhenyuLiu @ UCLA JunghooCho @ UCLA Presenter: Emiran Curtmola@ UC San Diego CSE 291 4/29/2008 Need to improve the quality
More informationDiversification of Query Interpretations and Search Results
Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,
More informationEvaluation Methods for Focused Crawling
Evaluation Methods for Focused Crawling Andrea Passerini, Paolo Frasconi, and Giovanni Soda DSI, University of Florence, ITALY {passerini,paolo,giovanni}@dsi.ing.unifi.it Abstract. The exponential growth
More informationPRIS at TAC2012 KBP Track
PRIS at TAC2012 KBP Track Yan Li, Sijia Chen, Zhihua Zhou, Jie Yin, Hao Luo, Liyin Hong, Weiran Xu, Guang Chen, Jun Guo School of Information and Communication Engineering Beijing University of Posts and
More informationInformation Retrieval. Lecture 9 - Web search basics
Information Retrieval Lecture 9 - Web search basics Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 30 Introduction Up to now: techniques for general
More informationDynamically Building Facets from Their Search Results
Dynamically Building Facets from Their Search Results Anju G. R, Karthik M. Abstract: People are very passionate in searching new things and gaining new knowledge. They usually prefer search engines to
More informationText Categorization (I)
CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization
More informationEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm Rekha Jain 1, Sulochana Nathawat 2, Dr. G.N. Purohit 3 1 Department of Computer Science, Banasthali University, Jaipur, Rajasthan ABSTRACT
More informationA Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2
A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor
More informationUMass at TREC 2006: Enterprise Track
UMass at TREC 2006: Enterprise Track Desislava Petkova and W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science University of Massachusetts, Amherst, MA 01003 Abstract
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationLSI UNED at M-WePNaD: Embeddings for Person Name Disambiguation
LSI UNED at M-WePNaD: Embeddings for Person Name Disambiguation Andres Duque, Lourdes Araujo, and Juan Martinez-Romo Dpto. Lenguajes y Sistemas Informáticos Universidad Nacional de Educación a Distancia
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationBruno Martins. 1 st Semester 2012/2013
Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4
More informationOutline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity
Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using
More informationSCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS
SCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu Department of Computer Science and Engineering, Arizona State University, Tempe, AZ fagelgi@asu.edu,
More informationNUS-I2R: Learning a Combined System for Entity Linking
NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm
More informationTREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback
RMIT @ TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback Ameer Albahem ameer.albahem@rmit.edu.au Lawrence Cavedon lawrence.cavedon@rmit.edu.au Damiano
More informationScholarly Big Data: Leverage for Science
Scholarly Big Data: Leverage for Science C. Lee Giles The Pennsylvania State University University Park, PA, USA giles@ist.psu.edu http://clgiles.ist.psu.edu Funded in part by NSF, Allen Institute for
More informationOutline. Eg. 1: DBLP. Motivation. Eg. 2: ACM DL Portal. Eg. 2: DBLP. Digital Libraries (DL) often have many errors that negatively affect:
Outline Effective and Scalable Solutions for Mixed and Split Citation Problems in Digital Libraries Dongwon Lee, Byung-Won On Penn State University, USA Jaewoo Kang North Carolina State University, USA
More informationEntity Linking at TAC Task Description
Entity Linking at TAC 2013 Task Description Version 1.0 of April 9, 2013 1 Introduction The main goal of the Knowledge Base Population (KBP) track at TAC 2013 is to promote research in and to evaluate
More informationMEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI
MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI 1 KAMATCHI.M, 2 SUNDARAM.N 1 M.E, CSE, MahaBarathi Engineering College Chinnasalem-606201, 2 Assistant Professor,
More informationTitle: Artificial Intelligence: an illustration of one approach.
Name : Salleh Ahshim Student ID: Title: Artificial Intelligence: an illustration of one approach. Introduction This essay will examine how different Web Crawling algorithms and heuristics that are being
More informationSupervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information
Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale Department of Information
More informationNTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task
NTU Approaches to Subtopic Mining and Document Ranking at NTCIR-9 Intent Task Chieh-Jen Wang, Yung-Wei Lin, *Ming-Feng Tsai and Hsin-Hsi Chen Department of Computer Science and Information Engineering,
More informationijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System
ijade Reporter An Intelligent Multi-agent Based Context Aware Reporting System Eddie C.L. Chan and Raymond S.T. Lee The Department of Computing, The Hong Kong Polytechnic University, Hung Hong, Kowloon,
More informationIRCE at the NTCIR-12 IMine-2 Task
IRCE at the NTCIR-12 IMine-2 Task Ximei Song University of Tsukuba songximei@slis.tsukuba.ac.jp Yuka Egusa National Institute for Educational Policy Research yuka@nier.go.jp Masao Takaku University of
More informationRishiraj Saha Roy and Niloy Ganguly IIT Kharagpur India. Monojit Choudhury and Srivatsan Laxman Microsoft Research India India
Rishiraj Saha Roy and Niloy Ganguly IIT Kharagpur India Monojit Choudhury and Srivatsan Laxman Microsoft Research India India ACM SIGIR 2012, Portland August 15, 2012 Dividing a query into individual semantic
More informationCS230: Lecture 3 Various Deep Learning Topics
CS230: Lecture 3 Various Deep Learning Topics Kian Katanforoosh, Andrew Ng Today s outline We will learn how to: - Analyse a problem from a deep learning approach - Choose an architecture - Choose a loss
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationAN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT
AN EFFECTIVE SEARCH ON WEB LOG FROM MOST POPULAR DOWNLOADED CONTENT Brindha.S 1 and Sabarinathan.P 2 1 PG Scholar, Department of Computer Science and Engineering, PABCET, Trichy 2 Assistant Professor,
More informationMeasuring Semantic Similarity between Words Using Page Counts and Snippets
Measuring Semantic Similarity between Words Using Page Counts and Snippets Manasa.Ch Computer Science & Engineering, SR Engineering College Warangal, Andhra Pradesh, India Email: chandupatla.manasa@gmail.com
More informationArtificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to
More informationPERSONALIZED MOBILE SEARCH ENGINE BASED ON MULTIPLE PREFERENCE, USER PROFILE AND ANDROID PLATFORM
PERSONALIZED MOBILE SEARCH ENGINE BASED ON MULTIPLE PREFERENCE, USER PROFILE AND ANDROID PLATFORM Ajit Aher, Rahul Rohokale, Asst. Prof. Nemade S.B. B.E. (computer) student, Govt. college of engg. & research
More informationEfficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes
Efficient Strategies for Improving Partitioning-Based Author Coreference by Incorporating Web Pages as Graph Nodes Pallika Kanani and Andrew McCallum Department of Computer Science, UMass, Amherst 140
More informationLink Analysis and Web Search
Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html
More informationOutlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationEvaluation. Evaluate what? For really large amounts of data... A: Use a validation set.
Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?
More information7. Mining Text and Web Data
7. Mining Text and Web Data Contents of this Chapter 7.1 Introduction 7.2 Data Preprocessing 7.3 Text and Web Clustering 7.4 Text and Web Classification 7.5 References [Han & Kamber 2006, Sections 10.4
More informationSOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES
SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES Introduction to Information Retrieval CS 150 Donald J. Patterson This content based on the paper located here: http://dx.doi.org/10.1007/s10618-008-0118-x
More informationIncorporating Hyperlink Analysis in Web Page Clustering
Incorporating Hyperlink Analysis in Web Page Clustering Michael Chau School of Business The University of Hong Kong Pokfulam, Hong Kong +852 2859-1014 mchau@business.hku.hk Patrick Y. K. Chau School of
More informationCategorizing Search Results Using WordNet and Wikipedia
Categorizing Search Results Using WordNet and Wikipedia Reza Taghizadeh Hemayati 1, Weiyi Meng 1, Clement Yu 2 1 Department of Computer Science, Binghamton University, Binghamton, NY 13902, USA {hemayati,
More informationWord Disambiguation in Web Search
Word Disambiguation in Web Search Rekha Jain Computer Science, Banasthali University, Rajasthan, India Email: rekha_leo2003@rediffmail.com G.N. Purohit Computer Science, Banasthali University, Rajasthan,
More informationFocused crawling: a new approach to topic-specific Web resource discovery. Authors
Focused crawling: a new approach to topic-specific Web resource discovery Authors Soumen Chakrabarti Martin van den Berg Byron Dom Presented By: Mohamed Ali Soliman m2ali@cs.uwaterloo.ca Outline Why Focused
More informationAutomatically Constructing a Directory of Molecular Biology Databases
Automatically Constructing a Directory of Molecular Biology Databases Luciano Barbosa Sumit Tandon Juliana Freire School of Computing University of Utah {lbarbosa, sumitt, juliana}@cs.utah.edu Online Databases
More informationClustering Results. Result List Example. Clustering Results. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to
More informationAccessing XML documents: The INEX initiative. Mounia Lalmas, Thomas Rölleke, Zoltán Szlávik, Tassos Tombros (+ Duisburg-Essen)
Accessing XML documents: The INEX initiative Mounia Lalmas, Thomas Rölleke, Zoltán Szlávik, Tassos Tombros (+ Duisburg-Essen) XML documents Book Chapters Sections World Wide Web This is only only another
More informationApproach Research of Keyword Extraction Based on Web Pages Document
2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Approach Research Keyword Extraction Based on Web Pages Document Yangxin
More informationFeature selection. LING 572 Fei Xia
Feature selection LING 572 Fei Xia 1 Creating attribute-value table x 1 x 2 f 1 f 2 f K y Choose features: Define feature templates Instantiate the feature templates Dimensionality reduction: feature selection
More informationExtracting Rankings for Spatial Keyword Queries from GPS Data
Extracting Rankings for Spatial Keyword Queries from GPS Data Ilkcan Keles Christian S. Jensen Simonas Saltenis Aalborg University Outline Introduction Motivation Problem Definition Proposed Method Overview
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationGRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search
2009 Ninth IEEE International Conference on Data Mining GRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search Lili Jiang, Jianyong Wang,NingAn, Shengyuan Wang,JianZhan,LianLi
More informationLarge Scale 3D Reconstruction by Structure from Motion
Large Scale 3D Reconstruction by Structure from Motion Devin Guillory Ziang Xie CS 331B 7 October 2013 Overview Rome wasn t built in a day Overview of SfM Building Rome in a Day Building Rome on a Cloudless
More informationAutomatic Summarization
Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization
More informationLinking Entities in Chinese Queries to Knowledge Graph
Linking Entities in Chinese Queries to Knowledge Graph Jun Li 1, Jinxian Pan 2, Chen Ye 1, Yong Huang 1, Danlu Wen 1, and Zhichun Wang 1(B) 1 Beijing Normal University, Beijing, China zcwang@bnu.edu.cn
More informationUNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.
UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index
More informationDetermine the Entity Number in Hierarchical Clustering for Web Personal Name Disambiguation
Determine the Entity Number in Hierarchical Clustering for Web Personal Name Disambiguation Jun Gong Department of Information System Beihang University No.37 XueYuan Road HaiDian District, Beijing, China
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More information10/10/13. Traditional database system. Information Retrieval. Information Retrieval. Information retrieval system? Information Retrieval Issues
COS 597A: Principles of Database and Information Systems Information Retrieval Traditional database system Large integrated collection of data Uniform access/modifcation mechanisms Model of data organization
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Clustering Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu November 7, 2017 Learnt Clustering Methods Vector Data Set Data Sequence Data Text
More informationMultimedia Event Detection for Large Scale Video. Benjamin Elizalde
Multimedia Event Detection for Large Scale Video Benjamin Elizalde Outline Motivation TrecVID task Related work Our approach (System, TF/IDF) Results & Processing time Conclusion & Future work Agenda 2
More informationQuery Difficulty Prediction for Contextual Image Retrieval
Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.
More informationCluster Analysis (b) Lijun Zhang
Cluster Analysis (b) Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Grid-Based and Density-Based Algorithms Graph-Based Algorithms Non-negative Matrix Factorization Cluster Validation Summary
More informationIntroduction to Web Clustering
Introduction to Web Clustering D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 June 26, 2009 Outline Introduction to Web Clustering Some Web Clustering engines The KeySRC approach Some
More informationCSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies
CSE 3 Comics Updates Shortcut(s)/Tip(s) of the Day Web Proxy Server PrimoPDF How Computers Work Ch 30 Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology
More informationIMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM
IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT
More informationAn Adaptive Agent for Web Exploration Based on Concept Hierarchies
An Adaptive Agent for Web Exploration Based on Concept Hierarchies Scott Parent, Bamshad Mobasher, Steve Lytinen School of Computer Science, Telecommunication and Information Systems DePaul University
More informationHierarchical Location and Topic Based Query Expansion
Hierarchical Location and Topic Based Query Expansion Shu Huang 1 Qiankun Zhao 2 Prasenjit Mitra 1 C. Lee Giles 1 Information Sciences and Technology 1 AOL Research Lab 2 Pennsylvania State University
More informationImprovement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation
Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336
More informationInternational Journal of Advance Engineering and Research Development. Survey of Web Usage Mining Techniques for Web-based Recommendations
Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 02, February -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 Survey
More informationI. INTRODUCTION. Fig Taxonomy of approaches to build specialized search engines, as shown in [80].
Focus: Accustom To Crawl Web-Based Forums M.Nikhil 1, Mrs. A.Phani Sheetal 2 1 Student, Department of Computer Science, GITAM University, Hyderabad. 2 Assistant Professor, Department of Computer Science,
More informationDATA MINING - 1DL105, 1DL111
1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database
More informationLET:Towards More Precise Clustering of Search Results
LET:Towards More Precise Clustering of Search Results Yi Zhang, Lidong Bing,Yexin Wang, Yan Zhang State Key Laboratory on Machine Perception Peking University,100871 Beijing, China {zhangyi, bingld,wangyx,zhy}@cis.pku.edu.cn
More informationLINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION
LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION Evgeny Kharitonov *, ***, Anton Slesarev *, ***, Ilya Muchnik **, ***, Fedor Romanenko ***, Dmitry Belyaev ***, Dmitry Kotlyarov *** * Moscow Institute
More information