Modeling Information Seeking Behavior in Social Media Eugene Agichtein

1 Modeling Information Seeking Behavior in Social Media Eugene Agichtein Intelligent Information Access Lab (IRLab) Emory University

2 Intelligent Information Access Lab (IRLab) Modeling information seeking behavior; Searching the Web and social media; Text and data mining for medical informatics and public health. Ablimit Aji, Qi Guo, Julia Kiseleva, Dmitry Lagun, Qiaoling Liu, Yu Wang. In collaboration with: Beth Buffalo (Neurology), Charlie Clarke (Waterloo), Ernie Garcia (Radiology), Phil Wolff (Psychology), Hongyuan Zha (GaTech)

3 Our Approach to Intelligent Information Access Search logs: queries, clicks. Data-driven model discovery (machine learning/data mining). Intelligent search; Information sharing; Health informatics; Cognitive diagnostics

4 Intelligent search: Contextualized Intent Inference

5 Intelligent search: Web-scale Text Mining Extract entities, relationships, events from text. Estimate accuracy of web content. DiseaseOutbreaks, The New York Times. Some Applications: Incorporating extracted information into (web) search; Finding implicit connections between events, entities; Visualizing and exploring large text collections. [DL 00, ICDE 2003 best student paper, SIGMOD 2006 best paper, ] 18 November 2009, Eugene Agichtein, Emory University, IR Lab

6 Health Informatics: Information Extraction for Decision Support, with E. V. Garcia (Radiology) and A. Ram (Georgia Tech). Rule Discovery from Medical Literature (MERLIN project): Identify articles containing useful clinical knowledge; Extract new expert system rules for the Emory Cardiac Toolbox: IF LV_stress_perfusion_is_abnormal THEN Diseased_coronary_is(LAD). Personalized diagnosis and care (PRETEX project): Extract clinical variables from text in patient records; Personalize expert system rules for a given patient or population. New: unexpected findings

7 Talk Outline IR Lab research overview Mining interactions in social media Content quality Information seeker satisfaction Question intent If time: inferring web search intent

8 Finding Information Online

9 From Searching to Finding

10 Social (Information) Sharing

12 (Text) Social Media Today Published: 4Gb/day Social Media: 10Gb/day Yahoo Answers: 120M users, 40M questions, 1B answers Yes, we could read your blog. Or, you could tell us about your day

13 Finding Information Online (Revisited) Claim: next generation of search will provide support for real-time, mediated info exchange First step: web-scale collaborative question answering (CQA) sites Realistic information exchange 100M+ community Many immediate challenges

16 (Some) Related Work Adamic et al., WWW 2007, WWW 2008: Expertise sharing, network structure. Elsas et al., SIGIR 2008: Blog search. Glance et al.: Blog Pulse, popularity, information sharing. Harper et al., CHI 2008, 2009: Answer quality across multiple CQA sites. Kraut et al.: community participation. Kumar et al., WWW 2004, KDD 2008, : Information diffusion in blogspace, network evolution. Third Workshop on Searching Social Media (SSM 2010) at WSDM: edu/ssm2010/

17 Finding High Quality Content in SM E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, Finding High Quality Content in Social Media, in WSDM 2008 Well-written Interesting Relevant (answer) Factually correct Popular? Provocative? Useful? As judged by professional editors

19 How do Question and Answer Quality relate?

24 Community

25 Link Analysis for Authority Estimation [Figure: graph linking User 1 through User 6 via Question 1-3 and Answer 1-6] Authority (answerer): $A(j) = \sum_{i=0}^{M} H(i)$ Hub (asker): $H(i) = \sum_{j=0}^{K} A(j)$
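The hub/authority updates on this slide can be sketched as a short iteration. This is a minimal illustration, not the implementation used in the talk: the `(asker, answerer)` edge format, the function name `hits`, the fixed iteration count, and the L2 normalization step are all assumptions made for the example.

```python
from collections import defaultdict

def hits(edges, iterations=20):
    """Hub/authority scores on an asker -> answerer graph.

    edges: list of (asker, answerer) pairs, one per answer (assumed format).
    """
    askers = {a for a, _ in edges}
    answerers = {b for _, b in edges}
    hub = {u: 1.0 for u in askers}
    auth = {u: 1.0 for u in answerers}
    for _ in range(iterations):
        # A(j) = sum of H(i) over the askers i whose questions j answered
        new_auth = defaultdict(float)
        for asker, answerer in edges:
            new_auth[answerer] += hub[asker]
        norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        auth = {u: new_auth[u] / norm for u in answerers}
        # H(i) = sum of A(j) over the answerers j who replied to asker i
        new_hub = defaultdict(float)
        for asker, answerer in edges:
            new_hub[asker] += auth[answerer]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {u: new_hub[u] / norm for u in askers}
    return hub, auth

# Hypothetical toy graph: u4 answers two askers, u5 answers one.
hub, auth = hits([("u1", "u4"), ("u2", "u4"), ("u3", "u5")])
```

In the toy graph, the answerer who replies to more askers ends up with the higher authority score, and the askers that user answered end up with higher hub scores.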

26 Qualitative Observations: HITS effective vs. HITS ineffective

27 Random forest classifier

28 Result 1: Identifying High Quality Questions

29 Top Features for Question Classification: Asker popularity ("stars"); Punctuation density; Topical category; Page views; KL divergence from reference corpus LM
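Two of the listed question features are easy to illustrate in code. The sketch below is only an assumption about how such features might be computed: the slide does not specify the tokenizer, the language model order, or the smoothing, so a whitespace-tokenized unigram model with add-alpha smoothing is used here, and both function names are hypothetical.

```python
import math
import string
from collections import Counter

def punctuation_density(text):
    """Fraction of characters that are ASCII punctuation."""
    if not text:
        return 0.0
    return sum(ch in string.punctuation for ch in text) / len(text)

def kl_from_reference(text, reference_counts, alpha=1.0):
    """KL(doc || reference) between unigram distributions.

    reference_counts: Counter of token counts from a reference corpus.
    The reference model is add-alpha smoothed (an assumption) so that
    words unseen in the reference still get non-zero probability.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    doc = Counter(tokens)
    vocab = set(doc) | set(reference_counts)
    ref_total = sum(reference_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for word, count in doc.items():
        p = count / len(tokens)
        q = (reference_counts.get(word, 0) + alpha) / ref_total
        kl += p * math.log(p / q)
    return kl

reference = Counter("the cat sat on the mat".split())
in_domain = kl_from_reference("the cat sat on the mat", reference)
off_topic = kl_from_reference("zzz qqq", reference)
```

Text whose word distribution is far from the reference corpus gets a larger divergence, which is the kind of signal such a feature is presumably meant to capture.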

30 Identifying High Quality Answers

31 Top Features for Answer Classification: Answer length; Community ratings; Answerer reputation; Word overlap; Kincaid readability score
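As a rough illustration of two of these answer features, the sketch below computes a Flesch-Kincaid grade level and a question/answer word overlap. Several choices are assumptions rather than details from the slides: the "Kincaid readability score" is taken to be the standard Flesch-Kincaid grade formula, syllables are counted with a crude vowel-run heuristic, and word overlap is measured as Jaccard similarity between question and answer vocabularies.

```python
import re

def count_syllables(word):
    # Heuristic (assumption): one syllable per run of vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*W/S + 11.8*Syl/W - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

def word_overlap(question, answer):
    """Jaccard overlap between question and answer vocabularies (assumed definition)."""
    q = set(re.findall(r"[a-z']+", question.lower()))
    a = set(re.findall(r"[a-z']+", answer.lower()))
    if not q or not a:
        return 0.0
    return len(q & a) / len(q | a)

simple = kincaid_grade("The cat sat. The dog ran.")
dense = kincaid_grade("Comprehensive cardiovascular rehabilitation "
                      "necessitates multidisciplinary collaboration.")
```

Short sentences with short words score a lower grade level than a single sentence packed with long words.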

32 User and Content Quality: Coupled Mutual Reinforcement

33 Can Improve Performance OR Reduce Training

34 Finding Information Online (Revisited) Next generation of search: human-machine-human CQA: a case study in complex IR Content quality Asker satisfaction Understanding the interactions

35 Dimensions of Quality Well-written Interesting Relevant (answer) Factually correct Popular? Timely? Provocative? Useful? As judged by the asker (or community)

36 Are Editor Labels Meaningful for CGC? Information seeking process: want to find useful information about topic with incomplete knowledge N. Belkin: Anomalous states of knowledge Want to model directly if user found satisfactory information Specific (amenable) case: CQA

37 Yahoo! Answers: The Good News Active community of millions of users in many countries and languages Effective for subjective information needs Great forum for socialization/chat Can be invaluable for hard-to-find information not available on the web

39 Yahoo! Answers: The Bad News May have to wait a long time to get a satisfactory answer [Chart: time to close a question (hours) by category: FIFA World Cup, Optical, Poetry, Football (American), Soccer, Medicine, Winter Sports, Special Education, General Health Care, Outdoor Recreation] May never obtain a satisfying answer

40 Predicting Asker Satisfaction Y. Liu, J. Bian, and E. Agichtein, in SIGIR 2008 (Yandong Liu, Jiang Bian) Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community. Satisfied: the asker has closed the question AND selected the best answer AND rated the best answer >= 3 stars (# not important). Else, Unsatisfied
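The satisfaction definition on this slide is a conjunction of three conditions and can be written down directly. The record type and field names below are hypothetical, chosen only to make the rule concrete; they are not the schema of the Yahoo! Answers data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuestionThread:
    # Hypothetical fields standing in for the crawled question metadata.
    closed_by_asker: bool
    best_answer_selected: bool
    best_answer_rating: Optional[int]  # stars given by the asker, if any

def is_satisfied(thread: QuestionThread) -> bool:
    """Satisfied = closed AND best answer selected AND rating >= 3 stars."""
    return (
        thread.closed_by_asker
        and thread.best_answer_selected
        and thread.best_answer_rating is not None
        and thread.best_answer_rating >= 3
    )

examples = [
    QuestionThread(True, True, 5),      # satisfied
    QuestionThread(True, True, 2),      # rated too low
    QuestionThread(True, False, None),  # no best answer chosen
]
labels = [is_satisfied(t) for t in examples]
```

Anything that fails one of the three conditions falls into the Unsatisfied class, matching the "Else, Unsatisfied" rule.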

41 ASP: Asker Satisfaction Prediction [Diagram: Question, Answer, Asker History, Answerer History, Category, and Text features, together with Wikipedia and News, feed a Classifier that outputs "asker is satisfied" or "asker is not satisfied"]

42 Experimental Setup: Data Crawled from Yahoo! Answers in early 2008 Questions Answers Askers Categories % Satisfied 216,170 1,963, , % Anonymized dataset available at: 1/2009: Yahoo! Webscope : Comprehensive Answers dataset: ~5M questions & answers

43 Satisfaction by Topic [Table: Topic, Questions, Answers, A per Q, Satisfied, Asker rating, Time to close by asker. Rows: 2006 FIFA World Cup (time to close in minutes), Mental Health (days), Mathematics (minutes), Diet & Fitness (days)]

44 Satisfaction Prediction: Human Judges Truth: asker's rating. A random sample of 130 questions. Researchers: Agreement: 0.82, F1: 2*P*R/(P+R). Amazon Mechanical Turk: Five workers per question. Agreement: 0.9, F1: 0.61. Best when at least 4 out of 5 raters agree
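For reference, the F1 measure used to compare judges against the asker's own rating is the harmonic mean of precision (P) and recall (R) on the "satisfied" class. The sketch below computes it from two label lists; the function name and the label strings are illustrative assumptions.

```python
def precision_recall_f1(gold, predicted, positive="satisfied"):
    """Precision, recall, and F1 = 2*P*R/(P+R) for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical labels: asker ratings (gold) vs. one judge's guesses.
gold = ["satisfied", "satisfied", "unsatisfied", "unsatisfied"]
guess = ["satisfied", "unsatisfied", "satisfied", "unsatisfied"]
p, r, f1 = precision_recall_f1(gold, guess)
```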

45 Performance: ASP vs. Humans (F1, Satisfied) [Table columns: Classifier, With Text, Without Text, Selected Features. Rows: ASP_SVM, ASP_C, ASP_RandomForest, ASP_Boosting, ASP_NB; Best Human Perf: 0.61; Baseline (random): 0.66] Human F1 is lower than the random baseline! ASP is significantly more effective than humans

46 Top Features by Information Gain Q: Asker's previous rating; Q: Average past rating by asker; UH: Member since (interval); UH: Average # answers for past Q; UH: Previous Q resolved for the asker; CA: Average asker rating for category; UH: Total number of answers received
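Information gain ranks a feature by how much knowing its value reduces uncertainty about the class label. The sketch below shows the computation for a discrete feature; the feature values and labels are made-up toy data, and continuous features such as ratings would first need to be discretized (an assumption, since the slides do not say how this was done).

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g)
                      for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: did the asker rate previous answers highly?
labels = ["sat", "sat", "unsat", "unsat"]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]
gain_a = information_gain(informative, labels)
gain_b = information_gain(uninformative, labels)
```

A feature that splits the labels perfectly recovers the full label entropy, while one that is independent of the label has zero gain.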

47 Offline vs. Online Prediction Offline prediction (AFTER answers arrive): All features (question, answer, asker & category), F1: 0.77. Online prediction (BEFORE question posted): NO answer features; Only asker history and question features (stars, #comments, sum of votes ) F1:

48 Personalized Prediction of Satisfaction Y. Liu and E. Agichtein, You've Got Answers: Personalized Models for Predicting Success in Community Question Answering, ACL 2008 Same information != same usefulness for different searchers! Personalization vs. Groupization?

49 Example Personalized Models

50 Outline Next generation of search: Algorithmically mediated information exchange CQA: a case study in complex IR Content quality Asker satisfaction Understanding the interactions

51 Social Media Language Analysis Social Media != WSJ Text Subjectivity, Sentiment, Temporal Sensitivity

52 Subjectivity in CQA B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, EMNLP 2008 How can we exploit structure of CQA for categorization of social media content? Case Study: Text Subjectivity Subjective: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting? Objective: What is the difference between chemotherapy and radiation treatments?

53 Objective vs. Subjective Content in CQA [Figure: pie charts of objective vs. subjective question shares for Education, Arts, Science, Health, and Sports; percentage labels in the transcription: 30%/70%, 36%/64%, 48%/52%, 34%/66%, 21%/79%, 36%/64%]

55 Questions and Answers: Two Views Example: Q: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting? A: My mom has one as she is diabetic so its important for her to monitor it she finds it useful. Answer orientation usually matches question Idea: Co-Training (Blum & Mitchell, COLT 1998)

56 CoCQA: A Co-Training Framework over Questions and Answers [Diagram: Labeled Data (Q, A) trains classifiers C_Q and C_A; the classifiers classify Unlabeled Data (Q, A); Validation (holdout training data) determines when to Stop]
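The co-training loop in this diagram can be sketched with two small classifiers, one per view (question text and answer text). This is a simplified illustration and not the CoCQA implementation: the tiny naive Bayes model, the confidence threshold, and the fixed number of rounds are all assumptions, and the sketch stops after a fixed number of rounds instead of using holdout validation as on the slide.

```python
import math
from collections import Counter

class TinyNB:
    """Multinomial naive Bayes over whitespace tokens, add-one smoothing."""
    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.counts[y].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, text):
        scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + len(self.vocab) + 1
            s = math.log(self.prior[c] / sum(self.prior.values()))
            for w in text.lower().split():
                s += math.log((self.counts[c][w] + 1) / denom)
            scores[c] = s
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

def fit_views(labeled):
    """Train one classifier per view: C_Q on questions, C_A on answers."""
    ys = [y for _, _, y in labeled]
    c_q = TinyNB().fit([q for q, _, _ in labeled], ys)
    c_a = TinyNB().fit([a for _, a, _ in labeled], ys)
    return c_q, c_a

def co_train(labeled, unlabeled, rounds=5, per_round=2, threshold=0.8):
    """labeled: (question, answer, label) triples; unlabeled: (question, answer) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        c_q, c_a = fit_views(labeled)
        scored = []
        for idx, (q, a) in enumerate(pool):
            for clf, view in ((c_q, q), (c_a, a)):
                label, conf = max(clf.predict_proba(view).items(),
                                  key=lambda kv: kv[1])
                scored.append((conf, idx, label))
        scored.sort(reverse=True)
        chosen = {}
        for conf, idx, label in scored:
            if conf < threshold or len(chosen) >= per_round:
                break
            chosen.setdefault(idx, label)  # keep the most confident vote
        if not chosen:
            break
        # Confident predictions from either view become shared training data.
        labeled += [(pool[i][0], pool[i][1], y) for i, y in chosen.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in chosen]
    c_q, c_a = fit_views(labeled)
    return c_q, c_a, labeled

seed = [("what is the capital of france", "paris is the capital", "objective"),
        ("do you think cats are cute", "i think they are adorable", "subjective")]
extra = [("what is the boiling point of water", "it is 100 degrees"),
         ("do you think dogs are fun", "i think so yes")]
c_q, c_a, grown = co_train(seed, extra)
```

Each round, whichever view is more confident about an unlabeled pair supplies its label, so the question classifier can benefit from evidence that was only visible in the answer text and vice versa.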

57 Slide 56 EA1 Include one more box on lower right corner: after "stop" lights up, show box "apply final classifier on test data" Eugene Agichtein, 10/26/2008

58 Example result: CoCQA Outperforms State-of-the-Art Partially Supervised ML [Table columns: Method; Features: Question (macro avg F1), Question + Best Answer (macro avg F1). Rows: Supervised; GE (-0.7%) (+3.2%); CoCQA (+1.9%) (+7.2%)] Implications: Can reduce amount of required manual labels; Can improve accuracy with more unlabeled data

59 Another Example: Question Urgency [Liu et al., SIGIR 2009 poster] Problem: a growing volume of questions competing for visibility; Urgent questions pushed out; Delayed responses useless

60 Outline Next generation of search: Algorithmically mediated information exchange CQA: a case study in complex IR Content quality Asker satisfaction Understanding interactions

61 Current Work (in Progress) Partially supervised models of expertise (Bian et al., WWW 2009); Sentiment, temporal sensitivity analysis; Influence of text on interactions; Towards real-time hybrid social/web search

62 Intelligent search Goal: Hybrid Web/Social Search

63 Takeaways Robust machine learning over interaction data: system improvements, insights into behavior. Contextualized models for NLP and text mining: system improvements, insights into interactions. Mining social media: potential for transformative impact for IR, sociology, psychology, medical informatics, public health,

64 More information, datasets, papers, slides: References Modeling search intent [SIGIR 06, 07, ECIR 09, WI 09] Estimating content quality [WSDM 2008] Estimating contributor authority [CIKM 2007] Searching CQA archives [WWW 2008, WWW 2009] Inferring asker intent [EMNLP 2008, SIGIR 09 poster] Predicting satisfaction [SIGIR 2008, ACL 2008, TKDE 09] Coping with spam in CQA [AIRWeb 2008]

65 Thank you! Diane Kelly and UNC for hosting my visit Supported by:

Machine Learning Applications to Modeling Web Searcher Behavior Eugene Agichtein

Machine Learning Applications to Modeling Web Searcher Behavior Eugene Agichtein Machine Learning Applications to Modeling Web Searcher Behavior Eugene Agichtein Intelligent Information Access Lab (IRLab) Emory University Talk Outline Overview of the Emory IR Lab Intent-centric Web

More information

Survey on Community Question Answering Systems

Survey on Community Question Answering Systems World Journal of Technology, Engineering and Research, Volume 3, Issue 1 (2018) 114-119 Contents available at WJTER World Journal of Technology, Engineering and Research Journal Homepage: www.wjter.com

More information

The Web: Concepts and Technology. 1 CS 584: Information Retrieval. Math & Computer Science Department, Emory University

The Web: Concepts and Technology. 1 CS 584: Information Retrieval. Math & Computer Science Department, Emory University The Web: Concepts and Technology January 15: Course Overview 1 CS 584: Information Retrieval. Math & Computer Science Department, Emory University Today s Plan Who am I? What is this course about? Logistics

More information

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings. Online Review Spam Detection by New Linguistic Features Amir Karam, University of Maryland Baltimore County Bin Zhou, University of Maryland Baltimore County Karami, A., Zhou, B. (2015). Online Review

More information

Eugene Agichtein, Curriculum Vitae October Eugene Agichtein

Eugene Agichtein, Curriculum Vitae October Eugene Agichtein Eugene Agichtein Mathematics and Computer Science Department Emory University eugene@mathcs.emory.edu 400 Dowman Drive, Suite W401 Web: http://www.mathcs.emory.edu/~eugene/ Atlanta, GA 30322 Telephone:

More information

Mining Trusted Information in Medical Science: An Information Network Approach

Mining Trusted Information in Medical Science: An Information Network Approach Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou

More information

Ranking with Query-Dependent Loss for Web Search

Ranking with Query-Dependent Loss for Web Search Ranking with Query-Dependent Loss for Web Search Jiang Bian 1, Tie-Yan Liu 2, Tao Qin 2, Hongyuan Zha 1 Georgia Institute of Technology 1 Microsoft Research Asia 2 Outline Motivation Incorporating Query

More information

Lecture 5: Search Interfaces + New Directions

Lecture 5: Search Interfaces + New Directions Modeling User Behavior and dinteractions ti Lecture 5: Search Interfaces + New Directions Eugene Agichtein Emory University Eugene Agichtein, Emory University, RuSSIR 2009 (Petrozavodsk, Russia) 1 Lecture

More information

Anomaly Detection. You Chen

Anomaly Detection. You Chen Anomaly Detection You Chen 1 Two questions: (1) What is Anomaly Detection? (2) What are Anomalies? Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior

More information

Telling Experts from Spammers Expertise Ranking in Folksonomies

Telling Experts from Spammers Expertise Ranking in Folksonomies 32 nd Annual ACM SIGIR 09 Boston, USA, Jul 19-23 2009 Telling Experts from Spammers Expertise Ranking in Folksonomies Michael G. Noll (Albert) Ching-Man Au Yeung Christoph Meinel Nicholas Gibbins Nigel

More information

PUTTING CONTEXT INTO SEARCH AND SEARCH INTO CONTEXT. Susan Dumais, Microsoft Research

PUTTING CONTEXT INTO SEARCH AND SEARCH INTO CONTEXT. Susan Dumais, Microsoft Research PUTTING CONTEXT INTO SEARCH AND SEARCH INTO CONTEXT Susan Dumais, Microsoft Research Overview Importance of context in IR Potential for personalization framework Examples Personal navigation Client-side

More information

Towards Predicting Web Searcher Gaze Position from Mouse Movements

Towards Predicting Web Searcher Gaze Position from Mouse Movements Towards Predicting Web Searcher Gaze Position from Mouse Movements Qi Guo Emory University 400 Dowman Dr., W401 Atlanta, GA 30322 USA qguo3@emory.edu Eugene Agichtein Emory University 400 Dowman Dr., W401

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

CriES 2010

CriES 2010 CriES Workshop @CLEF 2010 Cross-lingual Expert Search - Bridging CLIR and Social Media Institut AIFB Forschungsgruppe Wissensmanagement (Prof. Rudi Studer) Organizing Committee: Philipp Sorg Antje Schultz

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

TriRank: Review-aware Explainable Recommendation by Modeling Aspects TriRank: Review-aware Explainable Recommendation by Modeling Aspects Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen National University of Singapore Presented by Xiangnan He CIKM 15, Melbourne, Australia

More information

Addressing the Challenges of Underspecification in Web Search. Michael Welch

Addressing the Challenges of Underspecification in Web Search. Michael Welch Addressing the Challenges of Underspecification in Web Search Michael Welch mjwelch@cs.ucla.edu Why study Web search?!! Search engines have enormous reach!! Nearly 1 billion queries globally each day!!

More information

KDD 10 Tutorial: Recommender Problems for Web Applications. Deepak Agarwal and Bee-Chung Chen Yahoo! Research

KDD 10 Tutorial: Recommender Problems for Web Applications. Deepak Agarwal and Bee-Chung Chen Yahoo! Research KDD 10 Tutorial: Recommender Problems for Web Applications Deepak Agarwal and Bee-Chung Chen Yahoo! Research Agenda Focus: Recommender problems for dynamic, time-sensitive applications Content Optimization

More information

Recommender Systems. Collaborative Filtering & Content-Based Recommending

Recommender Systems. Collaborative Filtering & Content-Based Recommending Recommender Systems Collaborative Filtering & Content-Based Recommending 1 Recommender Systems Systems for recommending items (e.g. books, movies, CD s, web pages, newsgroup messages) to users based on

More information

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours

More information

Classification. I don t like spam. Spam, Spam, Spam. Information Retrieval

Classification. I don t like spam. Spam, Spam, Spam. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Classification applications in IR Classification! Classification is the task of automatically applying labels to items! Useful for many search-related tasks I

More information

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking Yi Yang * and Ming-Wei Chang # * Georgia Institute of Technology, Atlanta # Microsoft Research, Redmond Traditional

More information

Interaction Model to Predict Subjective Specificity of Search Results

Interaction Model to Predict Subjective Specificity of Search Results Interaction Model to Predict Subjective Specificity of Search Results Kumaripaba Athukorala, Antti Oulasvirta, Dorota Glowacka, Jilles Vreeken, Giulio Jacucci Helsinki Institute for Information Technology

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Annotation and Evaluation

Annotation and Evaluation Annotation and Evaluation Digging into Data: Jordan Boyd-Graber University of Maryland April 15, 2013 Digging into Data: Jordan Boyd-Graber (UMD) Annotation and Evaluation April 15, 2013 1 / 21 Exam Solutions

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org J.

More information

Questioning Yahoo! Answers

Questioning Yahoo! Answers Questioning Yahoo! Answers Zoltán Gyöngyi zoltan@cs.stanford.edu Outline Yahoo! Answers model Statistics Basics Diversity Authority Problems Interaction model Others Question Answering on the Web April

More information

Understanding the use of Temporal Expressions on Persian Web Search

Understanding the use of Temporal Expressions on Persian Web Search Understanding the use of Temporal Expressions on Persian Web Search Behrooz Mansouri Mohammad Zahedi Ricardo Campos Mojgan Farhoodi Alireza Yari Ricardo Campos TempWeb 2018 @ WWW Lyon, France, Apr 23,

More information

Learning Temporal-Dependent Ranking Models

Learning Temporal-Dependent Ranking Models Learning Temporal-Dependent Ranking Models Miguel Costa, Francisco Couto, Mário Silva LaSIGE @ Faculty of Sciences, University of Lisbon IST/INESC-ID, University of Lisbon 37th Annual ACM SIGIR Conference,

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

Positive and Negative Links

Positive and Negative Links Positive and Negative Links Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz May 4, 2015 Elisabeth Lex (KTI, TU Graz) Networks May 4, 2015 1 / 66 Outline 1 Repetition 2 Motivation 3 Structural Balance

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

WEB SPAM IDENTIFICATION THROUGH LANGUAGE MODEL ANALYSIS

WEB SPAM IDENTIFICATION THROUGH LANGUAGE MODEL ANALYSIS WEB SPAM IDENTIFICATION THROUGH LANGUAGE MODEL ANALYSIS Juan Martinez-Romo and Lourdes Araujo Natural Language Processing and Information Retrieval Group at UNED * nlp.uned.es Fifth International Workshop

More information

This study is brought to you courtesy of.

This study is brought to you courtesy of. This study is brought to you courtesy of www.google.com/think/insights Health Consumer Study The Role of Digital in Patients Healthcare Actions & Decisions Google/OTX U.S., December 2009 Background Demonstrate

More information

How to organize the Web?

How to organize the Web? How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second try: Web Search Information Retrieval attempts to find relevant docs in a small and trusted set Newspaper

More information

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor

More information

Introduction to Information Retrieval. Hongning Wang

Introduction to Information Retrieval. Hongning Wang Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an

More information

In the Mood to Click? Towards Inferring Receptiveness to Search Advertising

In the Mood to Click? Towards Inferring Receptiveness to Search Advertising In the Mood to Click? Towards Inferring Receptiveness to Search Advertising Qi Guo Eugene Agichtein Mathematics & Computer Science Department Emory University Atlanta, USA {qguo3,eugene}@mathcs.emory.edu

More information

Math Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng

Math Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng Math Information Retrieval: User Requirements and Prototype Implementation Jin Zhao, Min Yen Kan and Yin Leng Theng Why Math Information Retrieval? Examples: Looking for formulas Collect teaching resources

More information

Topic Classification in Social Media using Metadata from Hyperlinked Objects

Topic Classification in Social Media using Metadata from Hyperlinked Objects Topic Classification in Social Media using Metadata from Hyperlinked Objects Sheila Kinsella 1, Alexandre Passant 1, and John G. Breslin 1,2 1 Digital Enterprise Research Institute, National University

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Cognos: Crowdsourcing Search for Topic Experts in Microblogs

Cognos: Crowdsourcing Search for Topic Experts in Microblogs Cognos: Crowdsourcing Search for Topic Experts in Microblogs Saptarshi Ghosh, Naveen Sharma, Fabricio Benevenuto, Niloy Ganguly, Krishna Gummadi IIT Kharagpur, India; UFOP, Brazil; MPI-SWS, Germany Topic

More information

Human Computer Interaction in Health Informatics: From Laboratory Usability Testing to Televaluation of Web-based Information Systems

Human Computer Interaction in Health Informatics: From Laboratory Usability Testing to Televaluation of Web-based Information Systems Human Computer Interaction in Health Informatics: From Laboratory Usability Testing to Televaluation of Web-based Information Systems André W. Kushniruk, Ph.D. Arts Information Technology Program York

More information

Chapter 6 Evaluation Metrics and Evaluation

Chapter 6 Evaluation Metrics and Evaluation Chapter 6 Evaluation Metrics and Evaluation The area of evaluation of information retrieval and natural language processing systems is complex. It will only be touched on in this chapter. First the scientific

More information

Domain Adaptation Using Domain Similarity- and Domain Complexity-based Instance Selection for Cross-domain Sentiment Analysis

Domain Adaptation Using Domain Similarity- and Domain Complexity-based Instance Selection for Cross-domain Sentiment Analysis Domain Adaptation Using Domain Similarity- and Domain Complexity-based Instance Selection for Cross-domain Sentiment Analysis Robert Remus rremus@informatik.uni-leipzig.de Natural Language Processing Group

More information

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology Third Edition by Lawrence Snyder Searching in All the Right Places The Obvious and Familiar To find tax

More information

Data Mining Concepts & Tasks

Data Mining Concepts & Tasks Data Mining Concepts & Tasks Duen Horng (Polo) Chau Georgia Tech CSE6242 / CX4242 Sept 9, 2014 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos Last Time

More information

CS490W: Web Information Search & Management. CS-490W Web Information Search and Management. Luo Si. Department of Computer Science Purdue University

CS490W: Web Information Search & Management. CS-490W Web Information Search and Management. Luo Si. Department of Computer Science Purdue University CS490W: Web Information Search & Management CS-490W Web Information Search and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces between

More information

Time-aware Approaches to Information Retrieval

Time-aware Approaches to Information Retrieval Time-aware Approaches to Information Retrieval Nattiya Kanhabua Department of Computer and Information Science Norwegian University of Science and Technology 24 February 2012 Motivation Searching documents

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

Taccumulation of the social network data has raised

Taccumulation of the social network data has raised International Journal of Advanced Research in Social Sciences, Environmental Studies & Technology Hard Print: 2536-6505 Online: 2536-6513 September, 2016 Vol. 2, No. 1 Review Social Network Analysis and

More information

Efficiently Mining Positive Correlation Rules

Efficiently Mining Positive Correlation Rules Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 39S-44S Efficiently Mining Positive Correlation Rules Zhongmei Zhou Department of Computer Science & Engineering,

More information

CS-490WIR Web Information Retrieval and Management. Luo Si

CS-490WIR Web Information Retrieval and Management. Luo Si CS490W: Web Information Retrieval & Management CS-490WIR Web Information Retrieval and Management Luo Si Department of Computer Science Purdue University Overview Web: Growth of the Web The world produces

More information

A Framework to Crawl Web Forums Based on Time

A Framework to Crawl Web Forums Based on Time A Framework to Crawl Web Forums Based on Time Dr. M.V. Siva Prasad principal@anurag.ac.in Ch. Suresh Kumar chsuresh.cse@anurag.ac.in B. Ramesh rameshcse532@gmail.com ABSTRACT: An Internet or web forum,

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

Latent Aspect Rating Analysis. Hongning Wang

Latent Aspect Rating Analysis. Hongning Wang Latent Aspect Rating Analysis Hongning Wang CS@UVa Online opinions cover all kinds of topics Topics: People Events Products Services, Sources: Blogs Microblogs Forums Reviews, 45M reviews 53M blogs 1307M

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

Personalized Models of Search Satisfaction. Ahmed Hassan and Ryen White

Personalized Models of Search Satisfaction. Ahmed Hassan and Ryen White Personalized Models of Search Satisfaction Ahmed Hassan and Ryen White Online Satisfaction Measurement Satisfying users is the main objective of any search system Measuring user satisfaction is essential

Extracting Visual Snippets for Query Suggestion in Collaborative Web Search

Extracting Visual Snippets for Query Suggestion in Collaborative Web Search Extracting Visual Snippets for Query Suggestion in Collaborative Web Search Hannarin Kruajirayu, Teerapong Leelanupab Knowledge Management and Knowledge Engineering Laboratory Faculty of Information Technology

Building and Annotating Corpora of Collaborative Authoring in Wikipedia

Building and Annotating Corpora of Collaborative Authoring in Wikipedia Building and Annotating Corpora of Collaborative Authoring in Wikipedia Johannes Daxenberger, Oliver Ferschke and Iryna Gurevych Workshop: Building Corpora of Computer-Mediated Communication: Issues, Challenges,

University of Glasgow at CLEF 2013: Experiments in eHealth Task 3 with Terrier

University of Glasgow at CLEF 2013: Experiments in eHealth Task 3 with Terrier University of Glasgow at CLEF 2013: Experiments in eHealth Task 3 with Terrier Nut Limsopatham 1, Craig Macdonald 2, and Iadh Ounis 2 School of Computing Science University of Glasgow G12 8QQ, Glasgow,

Query Modifications Patterns During Web Searching

Query Modifications Patterns During Web Searching Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek Recommender Systems: Practical Aspects, Case Studies Radek Pelánek 2017 This Lecture practical aspects : attacks, context, shared accounts,... case studies, illustrations of application illustration of

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies CSE 3 Comics Updates Shortcut(s)/Tip(s) of the Day Web Proxy Server PrimoPDF How Computers Work Ch 30 Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology

Query Suggestions. Debapriyo Majumdar Information Retrieval Spring 2015 Indian Statistical Institute Kolkata

Query Suggestions. Debapriyo Majumdar Information Retrieval Spring 2015 Indian Statistical Institute Kolkata Query Suggestions Debapriyo Majumdar Information Retrieval Spring 2015 Indian Statistical Institute Kolkata Search engines User needs some information search engine tries to bridge this gap Assumption: the

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016 Advanced Topics in Information Retrieval Learning to Rank Vinay Setty vsetty@mpi-inf.mpg.de Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ATIR July 14, 2016 Before we start oral exams July 28, the full

Northeastern University in TREC 2009 Million Query Track

Northeastern University in TREC 2009 Million Query Track Northeastern University in TREC 2009 Million Query Track Evangelos Kanoulas, Keshi Dai, Virgil Pavlu, Stefan Savev, Javed Aslam Information Studies Department, University of Sheffield, Sheffield, UK College

Accessing Web Archives

Accessing Web Archives Accessing Web Archives Web Science Course 2017 Helge Holzmann 05/16/2017 Helge Holzmann (holzmann@l3s.de) Not today's topic http://blog.archive.org/2016/09/19/the-internet-archive-turns-20/ 05/16/2017

WebSci and Learning to Rank for IR

WebSci and Learning to Rank for IR WebSci and Learning to Rank for IR Ernesto Diaz-Aviles L3S Research Center. Hannover, Germany diaz@l3s.de Ernesto Diaz-Aviles www.l3s.de 1/16 Motivation: Information Explosion Ernesto Diaz-Aviles

Modern Retrieval Evaluations. Hongning Wang

Modern Retrieval Evaluations. Hongning Wang Modern Retrieval Evaluations Hongning Wang CS@UVa What we have known about IR evaluations Three key elements for IR evaluation A document collection A test suite of information needs A set of relevance

Analysis of Large Graphs: TrustRank and WebSpam

Analysis of Large Graphs: TrustRank and WebSpam Note to other teachers and users of these slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

LaHC at CLEF 2015 SBS Lab

LaHC at CLEF 2015 SBS Lab LaHC at CLEF 2015 SBS Lab Nawal Ould-Amer, Mathias Géry To cite this version: Nawal Ould-Amer, Mathias Géry. LaHC at CLEF 2015 SBS Lab. Conference and Labs of the Evaluation Forum, Sep 2015, Toulouse,

Internet Search. (COSC 488) Nazli Goharian Nazli Goharian, 2005, Outline

Internet Search. (COSC 488) Nazli Goharian Nazli Goharian, 2005, Outline Internet Search (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Nazli Goharian, 2005, 2012 1 Outline Web: Indexing & Efficiency Partitioned Indexing Index Tiering & other early termination techniques

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof. Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts.

Keywords APSE: Advanced Preferred Search Engine, Google Android Platform, Search Engine, Click-through data, Location and Content Concepts. Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Advanced Preferred

Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.

Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search. Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search. Dmitri V. Kalashnikov Rabia Nuray-Turan Sharad Mehrotra Dept of Computer Science University of California, Irvine

DATA MINING II - 1DL460. Spring 2014

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

The Web: Concepts and Technology. January 15: Course Overview

The Web: Concepts and Technology. January 15: Course Overview The Web: Concepts and Technology January 15: Course Overview 1 Today's Plan Who am I? What is this course about? Logistics Who are you? 2 Meet Your Instructor Instructor: Eugene Agichtein Web: http://www.mathcs.emory.edu/~eugene

Finding Nutrition Information on the Web: Coverage vs. Authority

Finding Nutrition Information on the Web: Coverage vs. Authority Finding Nutrition Information on the Web: Coverage vs. Authority Susan G. Doran Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208. Sue_doran@yahoo.com Samuel

Information Retrieval

Information Retrieval Information Retrieval Learning to Rank Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing Data

UNDERSTANDING AND IMPROVING WEB SEARCH USING LARGE-SCALE BEHAVIORAL LOGS. Susan Dumais, Microsoft Research

UNDERSTANDING AND IMPROVING WEB SEARCH USING LARGE-SCALE BEHAVIORAL LOGS. Susan Dumais, Microsoft Research UNDERSTANDING AND IMPROVING WEB SEARCH USING LARGE-SCALE BEHAVIORAL LOGS Susan Dumais, Microsoft Research Overview The big data revolution examples from Web search Large-scale behavioral logs Observations:

Department of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB

Department of Computer Science & Engineering The Graduate School, Chung-Ang University. CAU Artificial Intelligence LAB Department of Computer Science & Engineering The Graduate School, Chung-Ang University CAU Artificial Intelligence LAB 1 / 17 Text data is exploding on internet because of the appearance of SNS, such as

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org [Figure: example graph with node scores A 3.3, B 38.4, C 34.3, D 3.9, E 8.1, F 3.9, and 1.6 for each remaining node] [Equation: 0.8 M + 0.2 [1/N], where M has rows (1/2, 1/2, 0), (1/2, 0, 0), (0, 1/2, 1)]

How to do an On-Page SEO Analysis Table of Contents

How to do an On-Page SEO Analysis Table of Contents How to do an On-Page SEO Analysis Table of Contents Step 1: Keyword Research/Identification Step 2: Quality of Content Step 3: Title Tags Step 4: H1 Headings Step 5: Meta Descriptions Step 6: Site Performance

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London Interpreting Document Collections with Topic Models Nikolaos Aletras University College London Acknowledgements Mark Stevenson, Sheffield Tim Baldwin, Melbourne Jey Han Lau, IBM Research Talk Outline Introduction

Personalized Information Retrieval. Elena Holobiuc Iulia Pasov Alexandru Agape Octavian Sima Bogdan Cap-Bun

Personalized Information Retrieval. Elena Holobiuc Iulia Pasov Alexandru Agape Octavian Sima Bogdan Cap-Bun Personalized Information Retrieval Elena Holobiuc Iulia Pasov Alexandru Agape Octavian Sima Bogdan Cap-Bun Content Overview Enhancing Personalized Web Search Intent and interest in personalized search

Searching for Information

Searching for Information Searching for Information INFO/CSE100, Spring 2006 Fluency in Information Technology http://www.cs.washington.edu/100 Apr-10-06 searching @ university of washington 1 Readings and References Reading Fluency

Usability Testing. November 14, 2016

Usability Testing. November 14, 2016 Usability Testing November 14, 2016 Announcements Wednesday: HCI in industry VW: December 1 (no matter what) 2 Questions? 3 Today Usability testing Data collection and analysis 4 Usability test A usability

Inferring User Search for Feedback Sessions

Inferring User Search for Feedback Sessions Inferring User Search for Feedback Sessions Sharayu Kakade 1, Prof. Ranjana Barde 2 PG Student, Department of Computer Science, MIT Academy of Engineering, Pune, MH, India 1 Assistant Professor, Department

Federated Search. Jaime Arguello INLS 509: Information Retrieval November 21, Thursday, November 17, 16

Federated Search. Jaime Arguello INLS 509: Information Retrieval November 21, Thursday, November 17, 16 Federated Search Jaime Arguello INLS 509: Information Retrieval jarguell@email.unc.edu November 21, 2016 Up to this point... Classic information retrieval search from a single centralized index all queries

Unsupervised Rank Aggregation with Distance-Based Models

Unsupervised Rank Aggregation with Distance-Based Models Unsupervised Rank Aggregation with Distance-Based Models Alexandre Klementiev, Dan Roth, and Kevin Small University of Illinois at Urbana-Champaign Motivation Consider a panel of judges Each (independently)

Query Reformulation for Clinical Decision Support Search

Query Reformulation for Clinical Decision Support Search Query Reformulation for Clinical Decision Support Search Luca Soldaini, Arman Cohan, Andrew Yates, Nazli Goharian, Ophir Frieder Information Retrieval Lab Computer Science Department Georgetown University

Student Guide to Neehr Perfect Go!

Student Guide to Neehr Perfect Go! Student Guide to Neehr Perfect Go! I. Introduction II. Quick Facts III. Creating your Account IV. Applying Your Subscription V. Logging in to Neehr Perfect VI. Activities

Matt Quinn.

Matt Quinn. Matt Quinn matt.quinn@nist.gov Roles of AHRQ and NIST What's at Stake Current State of Usability in Certified EHRs Projects to Support Improved Usability Moving Forward June 7 NIST Workshop Questions NIST's

Detecting Good Abandonment in Mobile Search

Detecting Good Abandonment in Mobile Search Detecting Good Abandonment in Mobile Search Kyle Williams, Julia Kiseleva, Aidan C. Crook, Imed Zitouni, Ahmed Hassan Awadallah, Madian Khabsa The Pennsylvania State University, University Park,

3 Data, Data Mining. Chengkai Li

3 Data, Data Mining. Chengkai Li CSE4334/5334 Data Mining 3 Data, Data Mining Chengkai Li Department of Computer Science and Engineering University of Texas at Arlington Fall 2018 (Slides partly courtesy of Pang-Ning Tan, Michael Steinbach

Where the Social Web Meets the Semantic Web. Tom Gruber RealTravel.com tomgruber.org

Where the Social Web Meets the Semantic Web. Tom Gruber RealTravel.com tomgruber.org Where the Social Web Meets the Semantic Web Tom Gruber RealTravel.com tomgruber.org Doug Engelbart, 1968 "The grand challenge is to boost the collective IQ of organizations and of society." Tim Berners-Lee,
