ResearchNet and NewsNet: Two Nets That May Capture Many Lovely Birds
1 ResearchNet and NewsNet: Two Nets That May Capture Many Lovely Birds
Jiawei Han
Data Mining Research Group, Computer Science, University of Illinois at Urbana-Champaign
Acknowledgements: NSF, ARL, NIH, DHS, Microsoft, Yahoo!, LinkedIn, HP Lab
March 13, 2015
2 Outline
- Why ResearchNet and NewsNet?
- What Are the Major Challenges?
- What Have We Done?
- Construction of ResearchNet and NewsNet
- Mining ResearchNet and NewsNet
- Conclusions
3 Where There Is Information, There Are Networks!
- Social Networking Websites
- Biological Network: Protein Interaction
- Research Collaboration Network
- Product Recommendation Network via s
4 Evolution: Data Mining → Link/Network Mining
- Han, Kamber and Pei, Data Mining, 3rd ed.
- Yu, Han and Faloutsos (eds.), Link Mining, 2010
- Sun and Han, Mining Heterogeneous Information Networks, 2012
5 The Real World: Heterogeneous Networks
- Multiple object types and/or multiple link types
- DBLP Bibliographic Network: Venue, Paper, Author
- The IMDB Movie Network: Actor, Movie, Director, Studio
- The Facebook Network
- Homogeneous networks are information-losing projections of heterogeneous networks!
- Directly mining information-richer heterogeneous networks
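The information-loss claim can be illustrated with a minimal sketch. The toy data and the helper name `coauthor_projection` are hypothetical and not part of the talk: two different paper-author networks collapse to the same homogeneous co-author network once the paper nodes are projected away.

```python
from itertools import combinations


def coauthor_projection(paper_authors):
    """Project a paper-author network onto a homogeneous co-author network.

    paper_authors maps a paper id to the set of its authors; the result is
    the set of unordered author pairs that share at least one paper.
    """
    edges = set()
    for authors in paper_authors.values():
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return edges


# Hypothetical toy data: one three-author paper ...
net_one = {"p1": {"Ann", "Bob", "Cat"}}
# ... versus three different two-author papers.
net_two = {"p1": {"Ann", "Bob"}, "p2": {"Bob", "Cat"}, "p3": {"Ann", "Cat"}}

# Both heterogeneous networks project to the same triangle of co-authors,
# so the projection cannot tell them apart.
same_projection = coauthor_projection(net_one) == coauthor_projection(net_two)
```

The number of papers and who wrote which paper together are only recoverable from the typed network, which is the motivation for mining it directly.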
6 What Are ResearchNet and NewsNet?
ResearchNet: A generic research network construction, exploration and mining system built for any research domain
- Research domains: CS, biomedical, civil engineering, ...
- Beyond bibliographic data (DBLP, PubMed, ArXiv, ...): integration of Webpages, Wikipedia, DBPedia, Freebase, ...
- Need to construct ResearchNet by data integration and mining
- Need to develop rich and powerful search, exploration, and mining functions
NewsNet: A news network constructed by data integration and mining to facilitate search, mining and exploration
- Similar construction, exploration and mining functionalities
- Integrate KBs + tweets and other kinds of social media
7 Why ResearchNet and NewsNet?
- Data, data everywhere! They are abundant, real datasets!
  - R-Net: DBLP, Webpages, Wikipedia, DBPedia, Freebase, ...
  - N-Net: News by different agencies, tweets, blogs, KBs, ...
- Testers, testers everywhere!
  - We are domain experts! We are also news experts! We are eager users as well
  - Everyone else can understand it, test it, judge it, and use it!
- It solves real problems: lots of challenging research issues on construction, search and mining, from any angle!
- A new data-to-network-to-knowledge (D2N2K) paradigm
- Great collaboration scenarios: never-ending exciting stories!
9 Construction of Quality Heterogeneous Networks
- Entity identification, extraction, clustering, and typing
  - R-Net: DBLP titles, abstracts, contents, Webpages, Wikipedia, DBPedia, Freebase, ...
  - N-Net: News, tweets, blogs, KBs, ...
- Term and concept hierarchy discovery, concept clustering
- Role discovery and hierarchy generation
- Information extraction for special dimensions: time, location, organization, person, event, ...
- Data and information integration
- Truth/quality validation
- Construction of typed, heterogeneous information networks
- Incremental update and network maintenance
10 Exploring and Mining Heterogeneous Networks
- Similarity search in heterogeneous information networks
- Querying any components/types in networks
- Mining multi-typed heterogeneous networks
  - Clustering, classification, ranking, prediction
  - Information diffusion, community evolution
  - Anomaly
- Can we really understand what it is talking about?
  - Multi-dimensional data summary
- Does OLAP make sense?
  - R-Net OLAP: On research themes
  - N-Net OLAP: News, tweets, blogs, KBs, ...
12 What Has Been Done on Network Mining?
- Clustering, classification and ranking in heterogeneous networks: RankClus [EDBT09], NetClus [KDD09], GNetMine [PKDD10], RankClass [KDD11], PathSelClus [KDD12], ...
- Similarity search in heterogeneous networks: PathSim [VLDB12]
- Prediction and recommendation in heterogeneous networks: PathPredict [ASONAM11], citation prediction [KDD14], personalized recommendation [WSDM14]
- Evolution and info diffusion in heterogeneous networks [TKDE14] [CIKM14]
13 What Can Be Mined from Heterogeneous Networks?
- DBLP: A computer science bibliographic database
- A sample publication record in DBLP (>1.8 M papers, >0.7 M authors, >10 K venues)
Knowledge hidden in DBLP network | Mining function
How are CS research areas structured? | Clustering
Who are the leading researchers on Web search? | Ranking
What are the most essential terms, venues, authors in AI? | Classification + Ranking
Who are the peer researchers of Jure Leskovec? | Similarity Search
Whom will Christos Faloutsos collaborate with? | Relationship Prediction
Which types of relationships are most influential for an author to decide her topics? | Relation Strength Learning
How has the field of Data Mining emerged and evolved? | Network Evolution
Which authors are rather different from their peers in IR? | Outlier/anomaly detection
14 RankClus, NetClus and RankClass
- Initialization: randomly partition
- Repeat
  - Ranking: rank objects in each sub-network induced from each cluster
  - Generating new measure space: estimate mixture model coefficients for each target object
  - Adjusting clusters
- Until stable
[Figure: venues (SIGMOD, VLDB, EDBT, KDD, ICDM, SDM, AAAI, ICML) and authors (Tom, Mary, Alice, Bob, Cindy, Tracy, Jack, Mike, Lucy, Jim) cycling through sub-network ranking and clustering]
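The rank-then-cluster loop can be sketched in a heavily simplified form. This is not the published RankClus algorithm: a link-count ranking stands in for authority ranking, and a hard assignment under per-cluster rank distributions stands in for the mixture-model (EM) step. The function name, the toy venue-author matrix and the deterministic initial partition (used instead of a random one for reproducibility) are all assumptions made for illustration.

```python
import numpy as np


def rank_and_cluster(W, init_labels, k, n_iter=20, smooth=1e-3):
    """Alternate ranking and clustering on a bipartite link matrix.

    W[i, j] = number of links between target object i (e.g. a venue)
    and attribute object j (e.g. an author).
    """
    labels = np.asarray(init_labels, dtype=int).copy()
    ranks = np.full((k, W.shape[1]), 1.0 / W.shape[1])
    for _ in range(n_iter):
        # Ranking step: rank attribute objects inside each cluster's
        # sub-network by their (smoothed) share of the cluster's links.
        for c in range(k):
            mass = W[labels == c].sum(axis=0) + smooth
            ranks[c] = mass / mass.sum()
        # Cluster prior from current cluster sizes (smoothed).
        prior = np.bincount(labels, minlength=k) + smooth
        prior = prior / prior.sum()
        # Clustering step: score every target object under every cluster's
        # rank distribution and move it to the best-scoring cluster.
        log_score = W @ np.log(ranks).T + np.log(prior)
        new_labels = log_score.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # stable partition
        labels = new_labels
    return labels, ranks


# Hypothetical link counts: rows are SIGMOD, VLDB, EDBT, KDD, ICDM, SDM;
# columns are four authors (two database, two data mining).
W = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [3, 2, 0, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
    [0, 0, 3, 3],
], dtype=float)

labels, ranks = rank_and_cluster(W, init_labels=[0, 1, 0, 1, 0, 1], k=2)
```

Even from a deliberately poor starting partition, the per-cluster rankings pull the database venues and the data mining venues apart on this toy input.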
15 Interesting Results from Other Domains
- RankCompete: Organize your photo album automatically! [WWW 11]
- Ranking influential treatments for diseases from MEDLINE data: rank treatments for AIDS from MEDLINE [ADC 13]
16 Experiments with Very Small Training Set
- DBLP: 4-field data set (DB, DM, AI, IR) forming a heterogeneous information network
- Rank objects within each class (with extremely limited label information)
- Obtain high classification accuracy and excellent rankings within each class
Top-5 ranked conferences:
Database | Data Mining | AI | IR
VLDB | KDD | IJCAI | SIGIR
SIGMOD | SDM | AAAI | ECIR
ICDE | ICDM | ICML | CIKM
PODS | PKDD | CVPR | WWW
EDBT | PAKDD | ECML | WSDM
Top-5 ranked terms:
Database | Data Mining | AI | IR
data | mining | learning | retrieval
database | data | knowledge | information
query | clustering | reasoning | web
system | classification | logic | search
xml | frequent | cognition | text
17 Some Similarity Measures Are Better Than Others
Meta-Path: Author-Paper-Venue-Paper-Author
- Anhai Doan: CS, Wisconsin; Database area; PhD: 2002
- Jignesh Patel: CS, Wisconsin; Database area; PhD: 1998
- Amol Deshpande: CS, Maryland; Database area; PhD: 2004
- Jun Yang: CS, Duke; Database area; PhD:
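A minimal sketch of a meta-path based similarity along Author-Paper-Venue-Paper-Author, using the PathSim definition from the cited PathSim paper: s(x, y) = 2 |paths x to y| / (|paths x to x| + |paths y to y|). The toy matrices and the function name `pathsim` are assumptions for illustration only.

```python
import numpy as np


def pathsim(M):
    """PathSim scores from a commuting matrix M of a symmetric meta-path.

    M[i, j] is the number of meta-path instances between objects i and j.
    """
    M = np.asarray(M, dtype=float)
    diag = np.diag(M)
    denom = diag[:, None] + diag[None, :]
    # Leave the score at 0 where both objects have no path instances at all.
    return np.divide(2.0 * M, denom, out=np.zeros_like(M), where=denom > 0)


# Hypothetical toy data: 3 authors, 4 papers, 2 venues.
AP = np.array([[1, 1, 0, 0],   # author 0 wrote papers 0 and 1
               [0, 0, 1, 0],   # author 1 wrote paper 2
               [0, 0, 0, 1]])  # author 2 wrote paper 3
PV = np.array([[1, 0],         # papers 0, 1, 2 appeared in venue 0
               [1, 0],
               [1, 0],
               [0, 1]])        # paper 3 appeared in venue 1

AV = AP @ PV      # author-venue publication counts
M = AV @ AV.T     # commuting matrix for Author-Paper-Venue-Paper-Author
S = pathsim(M)
```

Normalizing by the self-path counts favors peers with similar publication volume in the same venues, rather than simply the most prolific authors.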
18 Meta-Path Based Co-authorship Prediction in DBLP
- Co-authorship prediction problem: whether two authors are going to collaborate for the first time
- Co-authorship encoded in meta-path: Author-Paper-Author
- Topological features encoded in meta-paths
[Table: Meta-Path | Semantic Meaning, for meta-paths between authors under length 4]
19 The Power of PathPredict
- Explain the prediction power of each meta-path: Wald test for logistic regression
- Higher prediction accuracy than using the projected homogeneous network: 11% higher in prediction accuracy
- Co-author prediction for Jian Pei: only 42 among 4809 candidates are true first-time co-authors! (Features collected in [1996, 2002]; test period in [2003, 2009])
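As a rough sketch of how meta-path counts can feed a logistic model for co-authorship prediction: path counts for two meta-paths are read off products of adjacency matrices and passed through a logistic function. The toy network, the choice of two meta-paths, and especially the weights and bias are hypothetical; in the approach described above the weights would be learned from training pairs.

```python
import numpy as np

# Hypothetical toy network: 4 authors, 4 papers, 2 venues.
AP = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])
PV = np.array([[1, 0],
               [1, 0],
               [0, 1],
               [0, 1]])


def meta_path_features(AP, PV):
    """Path counts per author pair for two meta-paths.

    features[i, j, 0]: Author-Paper-Author-Paper-Author (shared co-authors)
    features[i, j, 1]: Author-Paper-Venue-Paper-Author (shared venues)
    """
    APA = AP @ AP.T
    APAPA = APA @ APA
    AV = AP @ PV
    APVPA = AV @ AV.T
    return np.stack([APAPA, APVPA], axis=-1)


def collaboration_probability(features, weights, bias):
    """Logistic model over meta-path counts."""
    z = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))


features = meta_path_features(AP, PV)
weights = np.array([0.5, 0.3])   # assumed values, not fitted
prob = collaboration_probability(features, weights, bias=-1.0)
```

With fitted weights, a per-feature significance test such as the Wald test mentioned above indicates which meta-paths carry predictive power.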
20 What Has Been Done on Network Construction?
- Role discovery: Advisor-Advisee [KDD10], ...
- Truth validation: TruthFinder [KDD07, TKDE08], LTM [VLDB12]
- Web structure discovery: Growing parallel paths [WWW11]
- Integration of phrase mining and topic modeling: KERT [SDM 14], CATHY [KDD 13], CATHYHIN [ICDM 13], ToPMine [VLDB 15]
- Entity recognition and typing in massive text corpora: ClusType [KDD 15 submission]
21 Role Discovery: Mining Advisor-Advisee Relationships in DBLP Network [C. Wang et al. KDD 10]
- Propagation of simple, commonly accepted constraints in a Time-Constrained Probabilistic Factor Graph (TPFG)
  - Advisor has more publications and longer history than advisee at the time of advising
  - Once an advisee becomes advisor, s/he will not become advisee again
- Input: Temporal collaboration network
- Output: Relationship analysis; visualized chronological hierarchies
[Figure: example collaboration network among Ada, Bob, Ying, Jerry and Smith (1999 to 2004); candidate advisor links carry (probability, [start, end]) labels such as (0.9, [/, 1998]) and (0.8, [1999, 2000])]
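The first constraint can be sketched as a candidate filter over a temporal collaboration network. This is only a preprocessing-style illustration under assumed data; it does not implement the factor graph inference, and it does not enforce the second constraint. The function name and the toy publication list are hypothetical.

```python
from collections import defaultdict


def candidate_advisors(papers):
    """Return {advisee: [(advisor, first_collaboration_year), ...]}.

    papers is a list of (year, [authors]). A co-author is kept as a
    candidate advisor only if, at the first collaboration, they had a
    longer publication history and more publications than the advisee.
    """
    years_by_author = defaultdict(list)
    first_collab = {}
    for year, authors in papers:
        for a in authors:
            years_by_author[a].append(year)
        for a in authors:
            for b in authors:
                if a < b:
                    key = (a, b)
                    first_collab[key] = min(year, first_collab.get(key, year))

    def pubs_up_to(author, year):
        return sum(1 for y in years_by_author[author] if y <= year)

    candidates = defaultdict(list)
    for (a, b), t in sorted(first_collab.items()):
        for advisee, advisor in ((a, b), (b, a)):
            longer_history = min(years_by_author[advisor]) < min(years_by_author[advisee])
            more_pubs = pubs_up_to(advisor, t) > pubs_up_to(advisee, t)
            if longer_history and more_pubs:
                candidates[advisee].append((advisor, t))
    return dict(candidates)


# Hypothetical publication records (names borrowed from the slide's figure).
papers = [
    (1995, ["Ada"]),
    (1997, ["Ada"]),
    (1999, ["Ada", "Bob"]),
    (2000, ["Ada", "Bob"]),
    (2001, ["Bob", "Ying"]),
    (2002, ["Bob", "Ying"]),
]
candidates = candidate_advisors(papers)
```

A probabilistic model would then score and rank the surviving candidates over time intervals, as the (probability, [start, end]) labels in the figure suggest.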
22 Role Discovery: Performance & Case Study
- DBLP data: 654,628 authors, 1,076,946 publications, years provided
- Labeled data: Math Genealogy Project; AI Genealogy Project; Homepage
Accuracy (RULE: heuristics; SVM: supervised learning; IndMAX and TPFG: empirical parameter, then optimized parameter):
Datasets | RULE | SVM | IndMAX (empirical) | IndMAX (optimized) | TPFG (empirical) | TPFG (optimized)
TEST1 | 69.9% | 73.4% | 75.2% | 78.9% | 80.2% | 84.4%
TEST2 | 69.8% | 74.6% | 74.6% | 79.0% | 81.5% | 84.3%
TEST3 | 80.6% | 86.7% | 83.1% | 90.9% | 88.8% | 91.3%
Case study (columns: Advisee, Top Ranked Advisor, Time, Note):
- David M. Blei: 1. Michael I. Jordan (PhD advisor, 2004 grad); 2. John D. Lafferty (Postdoc, 2006)
- Hong Cheng: 1. Qiang Yang (MS advisor); Jiawei Han (PhD advisor, 2008)
- Sergey Brin: 1. Rajeev Motwani (Unofficial advisor)
23 Enhancing the Quality of Heterogeneous Info. Networks
- Info. networks could be untrustworthy, error-prone, missing, ...
- TruthFinder [KDD 07]: Inference on trustworthiness by mutual enhancement of info provider and statement trustworthiness
- Latent Truth Model (LTM) [VLDB12]: Modeling two-sided quality to support multiple true values per entity for truth-finding
[Figure: web sites (w1 to w4) provide facts (f1 to f4) about objects (o1, o2)]
- Generating implicit negative claims: positive claim vs. negative claim, correct claim vs. incorrect claim
- Example (Harry Potter): IMDB has high precision and high recall; Netflix has high precision and low recall; BadSource has low precision and low recall
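The mutual-enhancement idea can be sketched as a small fixed-point iteration. This is a simplified illustration, not the full TruthFinder method: it omits the dampening factor and the interaction between conflicting facts about the same object. The source and fact names, the initial trust of 0.8 and the function name are assumptions.

```python
def truthfinder_sketch(claims, n_iter=10, init_trust=0.8):
    """Iterate between fact confidence and source trustworthiness.

    claims maps a source to the set of facts it provides.
    - A fact is wrong only if every source providing it is wrong:
      conf(f) = 1 - prod(1 - trust(w)) over sources w providing f.
    - A source's trust is the average confidence of the facts it provides.
    """
    trust = {source: init_trust for source in claims}
    facts = set().union(*claims.values())
    conf = {}
    for _ in range(n_iter):
        for fact in facts:
            p_all_wrong = 1.0
            for source, provided in claims.items():
                if fact in provided:
                    p_all_wrong *= 1.0 - trust[source]
            conf[fact] = 1.0 - p_all_wrong
        trust = {
            source: sum(conf[f] for f in provided) / len(provided)
            for source, provided in claims.items()
        }
    return trust, conf


# Hypothetical claims: f1 is backed by three sources, f2 and f3 by one each.
claims = {
    "w1": {"f1"},
    "w2": {"f1"},
    "w3": {"f1", "f2"},
    "w4": {"f3"},
}
trust, conf = truthfinder_sketch(claims)
```

In this toy run, f2 ends up more credible than f3 although each has a single provider, because f2's provider also asserts the well-supported f1.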
24 Truth Discovery: Effectiveness of Latent Truth Model [B. Zhao et al. 2012]
- Experimental datasets: large and real
  - Book Authors from abebooks.com (1263 books, 879 sources, claims, 2420 book-author, 100 labeled)
  - Movie Directors from Bing (15073 movies, 12 sources, claims, movie-director, 100 labeled)
- Effectiveness of Latent Truth Model
- Model source quality in other data integration tasks, e.g. entity resolution
- Trustworthiness in multi-genre networks (text-rich networks, social networks, etc.)
25 Growing Parallel Paths [T. Weninger, WWW 11]
[Figure: example of parallel DOM paths (HTML > DIV > ... > UL > LI > A) on Page A linking to Pages B through F, whose own link paths lead to items X, Y, Z, W, U and V]
26 CATHYHIN: Topic Hierarchy Construction by Integration of Heterogeneous Info. Networks [C. Wang ICDM 13]
- Using the DBLP heterogeneous info. network to enhance topical hierarchy generation
- Hierarchies generated not only on topical phrases but also on authors & venues
CATHYHIN output for DBLP data:
- database system, query processing, concurrency control; Divesh Srivastava, Surajit Chaudhuri, Jeffrey F. Naughton; ICDE, SIGMOD, VLDB
- information retrieval, retrieval, question answering; W. Bruce Croft, James Allan, Maarten de Rijke; SIGIR, ECIR, CIKM
- text categorization, text classification, document clustering, multi-document summarization
- relevance feedback, query expansion, collaborative filtering, information filtering
27 KERT: Topic Modeling + Phrase Mining [M. Danilevsky, et al. SDM 14]
- Run bag-of-words model inference, and assign a topic label to each token
- Extract candidate keyphrases within each topic: frequent pattern mining
- Rank the keyphrases in each topic
  - Popularity: information retrieval vs. cross-language information retrieval
  - Discriminativeness: only frequent in documents about topic t
  - Concordance: active learning vs. learning classification
  - Completeness: vector machine vs. support vector machine
- KERT [Danilevsky et al. 14] example output: learning, support vector machines, reinforcement learning, feature selection, conditional random fields, classification, decision trees, ...
28 Phrase Mining: Frequent Pattern Mining + Statistical Analysis [El-Kishky et al., VLDB 15]
- Quality phrases
- Significance score [Church et al. 91]: α(A, B) = (|AB| − E[|AB|]) / √|AB|
- Example segmentations:
  - [Markov blanket] [feature selection] for [support vector machines]
  - [knowledge discovery] using [least squares] [support vector machine] [classifiers]
  - [support vector] for [machine learning]
Phrase | Raw freq. | True freq.
[support vector machine] | |
[vector machine] | 95 | 0
[support vector] | |
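A small sketch of a significance score of this kind, following the form given in the ToPMine paper: observed co-occurrence count minus the count expected if the two phrases were independent, divided by the square root of the observed count. The function name, the parameter names and the counts in the usage lines are assumptions for illustration.

```python
import math


def significance(freq_ab, freq_a, freq_b, total):
    """Significance of merging phrases A and B into the phrase AB.

    freq_ab: observed count of A immediately followed by B
    freq_a, freq_b: counts of A and of B on their own
    total: number of positions in the corpus (normalizer for the
           independence assumption)
    """
    if freq_ab == 0:
        return float("-inf")  # never observed together: never merge
    expected = freq_a * freq_b / total
    return (freq_ab - expected) / math.sqrt(freq_ab)


# Hypothetical counts: a strongly collocated pair vs. a chance pairing.
strong = significance(freq_ab=90, freq_a=100, freq_b=100, total=10_000)
chance = significance(freq_ab=1, freq_a=100, freq_b=100, total=10_000)
```

A bottom-up phrase construction would repeatedly merge the adjacent pair with the highest score until no pair exceeds a chosen threshold.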
29 ToPMine: Experiments on Yelp Reviews
30 Mining Quality Phrases from Massive Text Corpora [J. Liu et al. SIGMOD 15]
- Integrate the segmentation with the phrase quality assessment
- Only frequent phrases with reasonable quality are considered
- Phrase quality guides the segmentation, and the segmentation rectifies the phrase quality estimation
31 EventCube, ResearchInsight and NewsNetExplorer [Tao, et al., SIGMOD 13, KDD 13, SIGMOD 14 demos]
Several prototype systems constructed in our research:
- EventCube [Funded by NASA, Han and Zhai groups]
- ResearchInsight (prototype of ResearchNet)
- NewsNetExplorer (prototype of NewsNet)
32 Adding New Dimensions: GeoTopic Discovery [Z. Yin et al., WWW 11]
- LGTA: GeoTopic discovery with geo-tagged photos and associated text
[Figure: geo-tagged photos with landscape topics (coast vs. desert vs. mountain); panels compare LDM, TDM, GeoFolk and LGTA]
34 Network Construction: Still Many Challenges
- Extraction of phrases, entities, types, relationships (using KB)
- Domain-based and KB-guided concept hierarchy construction
- Hierarchy enhancement or knowledge enrichment: given a rough or existing hierarchy, derive a deeper and more complete one
- Mining roles (e.g., advisor) and creating infobox properties
- Network construction by integrating Web, KB, social networks (Facebook, LinkedIn) and media (tweets)
- Automatically finding the most related info on the web
- Truth consolidation and multi-source data integration
- Network cube: build multi-dim. and multi-level info. networks
- Build sophisticated networks by deepening into contents
- Never-ending network construction: enrichment by data mining
35 A Two-Stage Framework for Concept Hierarchy Construction
36 Distant Supervision: Enabling Structured Analysis of Unstructured Text Corpus [X. Ren et al. KDD 15 sub]
- Task: identifying token spans as entity mentions in documents and labeling their types
- Steps
  - Detect entity mentions from text
  - Map candidate mentions to KB entities of target types
  - Use confidently mapped {mention, type} to infer types of remaining candidate mentions
- Target types: FOOD, LOCATION, JOB_TITLE, EVENT, ORGANIZATION, ...
- Plain text: The best BBQ I've tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. ... The owner is very nice.
- Text with typed entities: The best BBQ:FOOD I've tasted in Phoenix:LOC! I had the [pulled pork sandwich]:FOOD with coleslaw:FOOD and [baked beans]:FOOD for lunch. The owner:JOB_TITLE is very nice.
37 ClusType: Example Output and Relation Phrase Clusters
- Extracts more mentions and predicts types with higher accuracy
- Not only synonymous relation phrases, but also sparse and frequent relation phrases can be clustered together
- Boosts sparse relation phrases with type information of frequent relation phrases
38 Construction and Enrichment of ResearchNet: Author and Venue Profiling
- Use author, paper, and venue info. to find relevant web pages from multiple sources; conduct data integration + entity linking
- Extract attribute values of interest for authors/venues from the resolved web pages
- Resolve conflicts and remove redundancy through truth finding
39 Paper Profiling: Modeling Research Paper Contents in Heterog. Networks
- Paper category identification: survey vs. research vs. demonstration papers
- Identifying the themes of a paper
- Identifying the important terms of a paper: distinguish synonyms, assisted terms, major terms
- Identifying the major methods or algorithms addressed in a research paper
- Identifying major applications addressed in a research paper
- Linking terms with other papers to form networks?
- Automated paper summary? Automated paper review? Missing or recommended related work? Too similar to some existing work?
41 Exploring and Mining ResearchNet
- Similarity search/queries on ResearchNet
  - Similar papers, similar venues, similar authors, similar terms, similar entities (algorithms, genes, diseases, treatments)
  - Similar relationships (author-author, author-term, ...)
  - Similar evolution, diffusion, dynamics, ...
- Recommendations and prediction: recommend coauthors, citations, experts, papers, venues, ...
- Clustering and ranking
- Classification, active learning, transfer learning, ...
- Network folding
42 Network Folding-Based ResearchNet Summarization
- Network folding: brings close the node/link pairs that were separated by multiple hops
- Helps answer queries
  - Q1: Show the venue-venue citation network for papers in venue A citing those published in venue B in [ ]
  - Q2: Show the affiliation collaboration network for the papers using RBF kernel tricks
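One way to read a query like Q1 is as a matrix product that folds the two-hop path venue-paper-cites-paper-venue into direct venue-venue links. The sketch below is an interpretation under assumed toy data, not the system's implementation; the function name and matrices are hypothetical, and the time filter of Q1 is left out.

```python
import numpy as np


def fold_venue_citations(cites, paper_venue):
    """Fold paper-level citations into a venue-venue citation network.

    cites[i, j] = 1 if paper i cites paper j.
    paper_venue[i, v] = 1 if paper i was published in venue v.
    Result[a, b] = number of citations from papers in venue a to
    papers in venue b.
    """
    return paper_venue.T @ cites @ paper_venue


# Hypothetical toy data: papers 0 and 1 appear in venue A (index 0),
# papers 2 and 3 in venue B (index 1).
paper_venue = np.array([[1, 0],
                        [1, 0],
                        [0, 1],
                        [0, 1]])
cites = np.zeros((4, 4), dtype=int)
cites[0, 2] = 1   # paper 0 cites paper 2
cites[1, 2] = 1   # paper 1 cites paper 2
cites[1, 3] = 1   # paper 1 cites paper 3
cites[3, 0] = 1   # paper 3 cites paper 0

venue_citations = fold_venue_citations(cites, paper_venue)
```

Restricting `cites` to papers from a given period before folding would give the time-scoped variant of the query.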
43 Effective OLAP Exploration
- TEXplorer (CIKM 11): Integrating keyword-based ranking and OLAP exploration
- OLAP query processing: cross-network queries, drilling, ...
[Figure: exploration example on Healthcare Reform]
45 Conclusions
- Big data are likely interconnected and implicitly structured
  - Unstructured data are likely convertible to structured networks!
- Mining big data → construction and mining of heterogeneous networks → actionable knowledge (the D2N2K paradigm)
  - Surprisingly rich knowledge can be mined from structured heterogeneous info. networks
  - Quality, structured heterogeneous info. networks could be automatically constructed from massive data
- ResearchNet and NewsNet
  - Real data, real cases, real testers, real users, and real tools
  - They are really exciting projects and will have real impact!
CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey
More informationIntroduction to Information Retrieval. Hongning Wang
Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an
More informationIntegrating Meta-Path Selection with User-Preference for Top-k Relevant Search in Heterogeneous Information Networks
Integrating Meta-Path Selection with User-Preference for Top-k Relevant Search in Heterogeneous Information Networks Shaoli Bu bsl89723@gmail.com Zhaohui Peng pzh@sdu.edu.cn Abstract Relevance search in
More informationAnalysis of Large Graphs: TrustRank and WebSpam
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationDeep Web Crawling and Mining for Building Advanced Search Application
Deep Web Crawling and Mining for Building Advanced Search Application Zhigang Hua, Dan Hou, Yu Liu, Xin Sun, Yanbing Yu {hua, houdan, yuliu, xinsun, yyu}@cc.gatech.edu College of computing, Georgia Tech
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web
More informationMaster Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala
Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationDATA MINING RESEARCH: RETROSPECT AND PROSPECT
DATA MINING RESEARCH: RETROSPECT AND PROSPECT Prof(Dr).V.SARAVANAN & Mr. ABDUL KHADAR JILANI Department of Computer Science College of Computer and Information Sciences Majmaah University Kingdom of Saudi
More informationc 2014 by Xiao Yu. All rights reserved.
c 2014 by Xiao Yu. All rights reserved. ENTITY RECOMMENDATION AND SEARCH IN HETEROGENEOUS INFORMATION NETWORKS BY XIAO YU DISSERTATION Submitted in partial fulfillment of the requirements for the degree
More informationA Survey on Postive and Unlabelled Learning
A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled
More informationAn Efficient Methodology for Image Rich Information Retrieval
An Efficient Methodology for Image Rich Information Retrieval 56 Ashwini Jaid, 2 Komal Savant, 3 Sonali Varma, 4 Pushpa Jat, 5 Prof. Sushama Shinde,2,3,4 Computer Department, Siddhant College of Engineering,
More informationRankCompete: Simultaneous Ranking and Clustering of Information Networks
RankCompete: Simultaneous Ranking and Clustering of Information Networks Liangliang Cao a, Xin Jin b, Zhijun Yin b, Andrey Del Pozo a, Jiebo Luo c, Jiawei Han b, Thomas S. Huang a a Beckman Institute and
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationMining Latent Entity Structures
MORGAN& CLAYPOOL PUBLISHERS Mining Latent Entity Structures Chi Wang Jiawei Han SyntheSiS LectureS on Data Mining and KnowLeDge DiScovery Jiawei Han, Lise Getoor, Wei Wang, Johannes Gehrke, Robert Grossman,
More informationCS 412 Intro. to Data Mining
CS 412 Intro. to Data Mining Chapter 1. Introduction Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2017 1 August 28, 2017 Data Mining: Concepts and Techniques 2 August 28, 2017 Data
More informationCIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets
CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,
More informationChapter 28. Outline. Definitions of Data Mining. Data Mining Concepts
Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms
More informationEXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES
EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES B. GEETHA KUMARI M. Tech (CSE) Email-id: Geetha.bapr07@gmail.com JAGETI PADMAVTHI M. Tech (CSE) Email-id: jageti.padmavathi4@gmail.com ABSTRACT:
More informationCombining Text Embedding and Knowledge Graph Embedding Techniques for Academic Search Engines
Combining Text Embedding and Knowledge Graph Embedding Techniques for Academic Search Engines SemDeep-4, Oct. 2018 Gengchen Mai Krzysztof Janowicz Bo Yan STKO Lab, University of California, Santa Barbara
More informationHolistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs
Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs Authors: Andreas Wagner, Veli Bicer, Thanh Tran, and Rudi Studer Presenter: Freddy Lecue IBM Research Ireland 2014 International
More informationJure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research
Jure Leskovec, Cornell/Stanford University Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research Network: an interaction graph: Nodes represent entities Edges represent interaction
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationEmpowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia
Empowering People with Knowledge the Next Frontier for Web Search Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Important Trends for Web Search Organizing all information Addressing user
More informationNUS-I2R: Learning a Combined System for Entity Linking
NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #10: Link Analysis-2 Seoul National University 1 In This Lecture Pagerank: Google formulation Make the solution to converge Computing Pagerank for very large graphs
More informationAutomatically Building Research Reading Lists
Automatically Building Research Reading Lists Michael D. Ekstrand 1 Praveen Kanaan 1 James A. Stemper 2 John T. Butler 2 Joseph A. Konstan 1 John T. Riedl 1 ekstrand@cs.umn.edu 1 GroupLens Research Department
More informationFinding Topic-centric Identified Experts based on Full Text Analysis
Finding Topic-centric Identified Experts based on Full Text Analysis Hanmin Jung, Mikyoung Lee, In-Su Kang, Seung-Woo Lee, Won-Kyung Sung Information Service Research Lab., KISTI, Korea jhm@kisti.re.kr
More informationData Mining Concepts & Tasks
Data Mining Concepts & Tasks Duen Horng (Polo) Chau Georgia Tech CSE6242 / CX4242 Sept 9, 2014 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos Last Time
More informationReview of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.
Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and
More informationData Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei
Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional
More informationLink prediction in multiplex bibliographical networks
Int. J. Complex Systems in Science vol. 3(1) (2013), pp. 77 82 Link prediction in multiplex bibliographical networks Manisha Pujari 1, and Rushed Kanawati 1 1 Laboratoire d Informatique de Paris Nord (LIPN),
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationPersonalized Recommendations using Knowledge Graphs. Rose Catherine Kanjirathinkal & Prof. William Cohen Carnegie Mellon University
+ Personalized Recommendations using Knowledge Graphs Rose Catherine Kanjirathinkal & Prof. William Cohen Carnegie Mellon University + The Problem 2 n Generate content-based recommendations on sparse real
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationDatabase and Knowledge-Base Systems: Data Mining. Martin Ester
Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro
More informationTutorial on Mining Heterogeneous Information Networks
Tutorial on Mining Heterogeneous Information Networks Rokia Missaoui LARIM Université du Québec en Outaouais, Canada http://w3.uqo.ca/missaoui 1 Acknowledgement I am grateful to Professor Jiawei Han who
More informationLink Prediction across Networks by Biased Cross-Network Sampling
Link Prediction across Networks by Biased Cross-Network Sampling Guo-Jun Qi 1, Charu C. Aggarwal 2, Thomas Huang 1 1 Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign
More informationOverview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer
Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What
More informationObject Distinction: Distinguishing Objects with Identical Names
Object Distinction: Distinguishing Objects with Identical Names Xiaoxin Yin Univ. of Illinois xyin1@uiuc.edu Jiawei Han Univ. of Illinois hanj@cs.uiuc.edu Philip S. Yu IBM T. J. Watson Research Center
More informationText Mining. Representation of Text Documents
Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,
More informationQuotient Cube: How to Summarize the Semantics of a Data Cube
Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign)
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More information1. Inroduction to Data Mininig
1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the
More informationComputer-based Tracking Protocols: Improving Communication between Databases
Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability
More information