Construction and Mining of Heterogeneous Information Networks: Will It Be a Key to Web-Aged Information Management and Mining
|
|
- Dana McKenzie
- 5 years ago
- Views:
Transcription
1 Construction and Mining of Heterogeneous Information Networks: Will It Be a Key to Web-Aged Information Management and Mining Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many students, especially Yizhou Sun, Ming Ji, Chi Wang, Ahmed El-Kishky, Bo Zhao Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), ARO, Microsoft, IBM, Yahoo!, LinkedIn, Google & Boeing July 3,
2 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 2
3 Where There Is Information, There Are Networks! Social Networking Websites Biological Network: Protein Interaction Research Collaboration Network Product Recommendation Network via s 3
4 The Real World: Heterogeneous Networks Multiple object types and/or multiple link types Movie Studio Venue Paper Author DBLP Bibliographic Network Actor Movie Director The IMDB Movie Network The Facebook Network Homogeneous networks are information loss projection of heterogeneous networks! Directly mining information-richer heterogeneous networks 4
5 Structured Heterogeneous Network Modeling Leads to the New Power of Data Mining! DBLP: A Computer Science bibliographic database A sample publication record in DBLP (>2 M papers, >0.7 M authors, >10 K venues), Knowledge hidden in DBLP Network How are CS research areas structured? Who are the leading researchers on Web search? What are the most essential terms, venues, authors in AI? Who are the peer researchers of Jure Leskovec? Whom will Christos Faloutsos collaborate with? Which types of relationships are most influential for an author to decide her topics? How was the field of Data Mining emerged or evolving? Which authors are rather different from his/her peers in IR? Mining Functions Clustering Ranking Classification + Ranking Similarity Search Relationship Prediction Relation Strength Learning Network Evolution Outlier/anomaly detection Power of het. network modeling: Treat Author, Venue, Term & Paper all first-class citizens! 5
6 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 6
7 Ranking-Based Clustering & Classification Basic functions for networks: Clustering, classification & ranking Why not integrate ranking with clustering & classification? High-ranked objects should be more important (thus more weighted) in a cluster/class than low-ranked ones Ranking will make more sense within a particular cluster/class Einstein in physics vs. Turing in Coputer Science Integrate ranking with clustering/classification in the same process Ranking, as the feature, is conditional (i.e., relative) to a specific cluster/class Ranking and clustering/classification mutually enhance each other Ranking-based clustering: RankClus [EDBT 09], NetClus [KDD 09], PathSelClus [KDD 12] Ranking-based classification: GNetMine (for classification) [ECMLPKDD 10], RankClass [KDD 11] 7
8 RankClus: Integrating Clustering with Ranking Initialization Randomly partition Repeat Ranking Sub-Network Ranking objects in each sub-network induced from each cluster Generating new measure space Estimate mixture model coefficients for each target object Adjusting cluster Until stable Clustering Ranking An E-M framework for iterative enhancement Difference on ranking rules: 1. Simple ranking rule: Ranking high if more papers 2. Highly ranked if publishing more in highly rankedvenues 8
9 Step-by-Step Running of RankClus Clustering and ranking two fields: DB/DM & HW/CA Initially, ranking distributions are mixed together Improved a little Two clusters of objects mixed together, but preserve similarity somehow Two clusters are almost well separated Improved significantly Well separated Stable 9
10 NetClus: Ranking-Based Clustering with Star Network Schema Beyond bi-typed network: capture more semantics with multiple types Split a network into multi-subnets, each a net-cluster [KDD 09] DBLP network: using terms, venues & authors to jointly distinguish a sub-field, e.g., database V P A Database Venue Author T Publish Write V A Research Paper P Hardware Contain Term Computer Science NetClus V T P A Theory T 10
11 NetClus: Database System Cluster Term Venue Author database databases system data query systems queries management object relational processing based distributed xml oriented design web information model efficient VLDB SIGMOD Conf ICDE PODS EDBT One-level deeper: Authors in XML, Xquery cluster Surajit Chaudhuri Michael Stonebraker Michael J. Carey C. Mohan David J. DeWitt Hector Garcia-Molina H. V. Jagadish David B. Lomet Raghu Ramakrishnan Philip A. Bernstein Joseph M. Hellerstein Jeffrey F. Naughton Yannis E. Ioannidis Jennifer Widom Per-Ake Larson Rakesh Agrawal Dan Suciu Michael J. Franklin Umeshwar Dayal Abraham Silberschatz
12 Rank-Based Clustering: Works in Multiple Domains RankCompete: Organize your photo album automatically! Use multi-typed image features to build up heterog. networks Explore multiple types in a star schema network MedRank: Rank treatments for AIDS from Medline 12
13 Classification: Knowledge Propagation Across Heterogeneous Typed Networks Labeled knowledge propagates through multi-typed objects across heterogeneous networks [ECMLPKDD'10, KDD 11] 13
14 RankClass: Ranking-Based Classification Class: a group of multi-typed nodes sharing the same topic + within-class ranking node class The RankClass Framework 14
15 Methodology of RankClass Classification in heterogeneous networks Knowledge propagation: Class label knowledge propagated across multi-typed objects through a heterogeneous network GNetMine [Ji et al., PKDD 10]: Objects are treated equally RankClass [Ji et al., KDD 11]: Ranking-based classification Highly ranked objects will play more role in classification An object can only be ranked high in some focused classes Class membership and ranking are stat. distributions Let ranking and classification mutually enhance each other! Output: Classification results + ranking list of objects within each class 15
16 Experiments on DBLP Class: Four research areas (communities) Database, data mining, AI, information retrieval Four types of objects Paper (14376), Conf. (20), Author (14475), Term (8920) Three types of relations Paper-conf., paper-author, paper-term Algorithms for comparison Learning with Local and Global Consistency (LLGC) [Zhou et al. NIPS 2003] also the homogeneous version of our method Weighted-vote Relational Neighbor classifier (wvrn) [Macskassy et al. JMLR 2007] Network-only Link-based Classification (nlb) [Lu et al. ICML 2003, Macskassy et al. JMLR 2007] 16
17 Comparing Classification Accuracy on the DBLP Data 17
18 Object Ranking in Each Class: Experiment DBLP: 4-fields data set (DB, DM, AI, IR) forming a heterog. info. network Rank objects within each class (with extremely limited label information) Obtain high classification accuracy and excellent ranking within each class Top-5 ranked conferences Top-5 ranked terms Database Data Mining AI IR VLDB KDD IJCAI SIGIR SIGMOD SDM AAAI ECIR ICDE ICDM ICML CIKM PODS PKDD CVPR WWW EDBT PAKDD ECML WSDM data mining learning retrieval database data knowledge information query clustering reasoning web system classification logic search xml frequent cognition text 18
19 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 19
20 Similarity Search: Find Similar Objects in Networks Who are most similar to Christos Faloutsos? Meta-Path: Meta-level description of a path between two objects Different meta-paths Schema of the carry rather different DBLP Network semantics Meta-Path: Author-Paper-Author (APA) Different meta-paths lead to very different results! Meta-Path: Author-Paper-Venue-Paper-Author (APVPA) Christos s students or close collaborators Similar reputation at similar venues 20
21 PathSim: A Het. Network Similarity Measure Popular object similarity measures in networks Random walk (RW) (or Personalized PageRank): Favors highly visible objects (i.e., objects with large degrees) Pairwise random walk (PRW) (or SimRank): Favors pure objects (i.e., objects with highly skewed distribution in their in-links or out-links) PathSim [VLDB 11] Note: P-PageRank and SimRank do not distinguish object type nor relationship type Favor peers : objects with strong connectivity and similar visibility under the given meta-path x y 21
22 Which Similarity Measure Is Better? Anhai Doan CS, Wisconsin Database area PhD: 2002 Jignesh Patel CS, Wisconsin Database area PhD: 1998 Meta-Path: Author-Paper-Venue-Paper-Author (APVPA) Amol Deshpande CS, Maryland Database area PhD: 2004 Jun Yang CS, Duke Database area PhD:
23 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 23
24 PathPredict: Meta-Path Based Relationship Prediction Meta path-guided prediction of links and relationships vs. Philosophy: Meta path relationships among similar typed links share similar semantics and are comparable and inferable venue publish publish -1 mention topic -1 write paper -1 mention write contain/contain -1 cite/cite -1 author Bibliographic network: Co-author prediction (A P A) Use topological features encoded by meta paths, e.g., citation relations between authors (A P P A) 24
25 Meta-Path Based Co-authorship Prediction Co-authorship prediction problem Whether two authors are going to collaborate for the first time Co-authorship encoded in meta-path Author-Paper-Author Topological features encoded in meta-paths Meta-Path Semantic Meaning Meta-paths between authors under length 4 25
26 The Power of PathPredict: Experiments Explain the prediction power of each meta-path Wald Test for logistic regression Higher prediction accuracy than using projected homogeneous network 11% higher in prediction accuracy Co-author prediction for Jian Pei: Only 42 among 4809 candidates are true first-time co-authors! (Feature collected in [1996, 2002]; Test period in [2003,2009]) 26
27 Enhancing the Power of Recommender Systems by Heterog. Info. Network Analysis Collaborative filtering methods suffer from the data sparsity issue Avatar Zoe Saldana Aliens Adventure Titanic James Cameron Leonardo Dicaprio Revolutionary Road Kate Winslet A small set of users & items have a large number of ratings Romance # of ratings Most users and items have a small number of ratings # of users or items Personalized recommendation with heterog. Networks [WSDM 14] Heterogeneous relationships complement each other Users and items with limited feedback can be connected to the network by different types of paths Connect new users or items in the information network Different users may require different models: Relationship heterogeneity makes personalized recommendation models easier to define 27
28 Performance Study: Personalized Recommendation in Heterogeneous Networks Datasets: Methods to be compared: Popularity: Recommend the most popular items to users Co-click: Conditional probabilities between items NMF: Non-negative matrix factorization on user feedback Hybrid-SVM: Use Rank-SVM with plain features (utilize both user feedback and information network) Winner: HeteRec personalized recommendation (HeteRec-p) 28
29 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 29
30 Automated Construction of Heterogeneous Info. Networks Rich semantics can be mined from structured heterog. networks Unfortunately, much of the real world data is unstructured News, Wikipedia, blogs, tweets, multimedia data, Too expensive to be structured by human: Automated & scalable How to turn unstructured data to structured heterog. Networks? From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Methodology: Incrementally & progressively explore links and structures to construct quality heterogeneous networks 30
31 From Unstructured Text to Structured Hierarchies Entity text Input collection Our recent effort: 1. Hierarchical topic discovery with entities 2. Phrase mining 3. Rank phrases & entities per topic Output hierarchy w/ phrases & entities Kert [Danilevsky et al, SDM 14]: LDA + Keyphrase extraction and ranking by topic (4 criteria: coverage, purity, phraseness, completeness) CATHTY [Wang, et al., KDD 13]: Recursive generation of topic hierarchy based on term co-occurrence network CATHYHIN [Wang, et al., ICDM 13]: Use hetero. network to help topic hierarchy generation TopMine [El-Kishky et al., 2014]: Exploring frequent contiguous pattern mining o/1/1 o/1 o o/1/2 o/2 o/2/1 Hierarchy so generated can be used for network mining, exploration, etc. 31
32 Automatic Discovery of Clusters and Term Hierarchies from Collection of Documents KERT: Keyphrase Extraction and Ranking by Topic [SDM 14] 1. Use background LDA (blda) to cluster unigrams into T topics + one background topic 2. Extract candidate keyphrases for each foreground topic according to the word topic assignment should be order-free and non-consecutive E.g., mining frequent patterns vs. frequent pattern mining E.g., mining top-k frequent closed patterns 3. Rank the keyphrases in each foreground topic Coverage: information retrieval vs. cross-language information retrieval Purity: only frequent in documents about topic t Phraseness: active learning vs. learning classification Completeness: vector machine vs. support vector machine 32
33 Experiment: Machine Learning: Top Keyphrases Dataset: DBLP paper titles KERT variations: ignore each criterion in turn Baselines: two variations of an approach from a recent work 33
34 CATHYHIN: Topic Hierarchy Construction by Integration of Heterogeneous Info. Networks Using DBLP heterog. Info. network to enhance topical hierarchy generation Hierarchies generated not only on topical phrases but also on authors & venues 34
35 TopMine: Phrase Mining Then Top Modeling Frequent Contiguous Phrase Mining Frequency: A low frequency phrase is likely unimportant --- a generalization of the most likely unigrams in LDA Collocation: The co-occurrence of tokens in such frequency that is significantly higher than expected due to chance. Completeness: Do not double count sub-phrases (e.g. support vector is not a phrase when inside support vector machine) General frequent pattern mining may place words far apart in the same phrase The pattern for a phrase: each phrase is a contiguous sequence of ordered words. Markov Blanket Feature Selection for Support Vector Machines 35
36 Good Phrases Should Be Stat. Significant Under the assumption that the text data was generated by a sequence of Bernoulli trials, we can derive a significance measure for potential mergings. Good Phrases! 36
37 Bag-of-Phrase Construction: An Example 37
38 The ToPMine Framework 1. Preprocessing: Perform stemming to reduce vocabulary size and increase collisions of phrases, split on phrase-invariant punctuation 2. Frequent Contiguous Pattern Mining: Collects aggregate counts and candidate phrases 3. Phrase Construction: Agglomerative construction of significant phrases (addresses frequency, collocation, and completeness) 4. Use the phrase construction of bag of phrases : This can be considered an enriched input for later algorithms (not exclusive to topic modeling) 5. Bag-of-phrases Topic Modeling 38
39 Experiments on AP News 39
40 Experiments on Yelp Reviews 40
41 Scalability: Runtime Results 41
42 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 42
43 Role Discovery: Mining Advisor-Advisee Relationships in DBLP Network Propagation of simple, commonly accepted constraints in Time- Constrained Probabilistic Factor Graph (TPFG) Advisor has more publications and longer history than advisee at the time of advising Once an advisee becomes advisor, s/he will not become advisee again Input: Temporal collaboration network 1999 Output: Relationship analysis (0.9, [/, 1998]) Visualized chorological hierarchies Ada 2000 Bob Ada (0.4, [/, 1998]) (0.5, [/, 2000]) 2000 (0.8, [1999,2000]) Ying Smith Jerry Bob (0.7, [2000, 2001]) (0.65, [2002, 2004]) Ying (0.49, [/, 1999]) Jerry (0.2, [2001, 2003]) 2004 Smith 43
44 Role Discovery: Performance & Case Study DBLP data: 654, 628 authors, 1076,946 publications, years provided Labeled data: MathGealogy Project; AI Gealogy Project; Homepage Datasets RULE SVM IndMAX TPFG TEST1 69.9% 73.4% 75.2% 78.9% 80.2% 84.4% TEST2 69.8% 74.6% 74.6% 79.0% 81.5% 84.3% TEST3 80.6% 86.7% 83.1% 90.9% 88.8% 91.3% Case study heuristics Supervised learning Empirical parameter Advisee Top Ranked Advisor Time Note optimized parameter David M. Blei 1. Michael I. Jordan PhD advisor, 2004 grad 2. John D. Lafferty Postdoc, 2006 Hong Cheng 1. Qiang Yang MS advisor, Jiawei Han PhD advisor, 2008 Sergey Brin 1. Rajeev Motawani Unofficial advisor 44
45 Hierarchical Relationship Discovery From partially ordered objects to hierarchy (tree) Based on NLP or other techniques to extract partially ordered objects Using constraints to discover relationships Singleton Potential Type Cognitive description Potential definition Homophile Polarity Support pattern Forbidden pattern Parent and child are similar Parent is superior to child Patterns frequently occurring with child-parent pairs Patterns rarely occurring with child-parent pairs Pairwise Potential Function: Cases Type Cognitive description Potential definition Attribute Use inherited attributes augment from parents or children Label Similar nodes share similar propagate parents (or children) Patterns altering in childparent & parent-child pairs Reciprocity Constraints Restrict certain patterns Discovery of the Kenny Family Tree 45
46 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 46
47 Truth Analysis: Enhancing the Quality of Heterogeneous Info. Networks Info. networks could be untrustworthy, error-prone, missing, TruthFinder [KDD 07]: Inference on trustworthiness by constructing heterogeneous info. networks Web sites (Book Sellers) Facts (Book Authors) w 1 f 1 Objects (Books) o 1 w 2 f 2 w 3 f 3 o 2 w 4 f 4 True facts and trustable websites mutually enhance each other and will become apparent after some iterations 47
48 Truth Discovery: Multiple Truth Value and Handling False Negatives Voting may not always work well: Some sources tend to miss true values (False Negatives), while some others tend to produce false claims (False Positives) Why Latent Truth Model (LTM)? Modeling two-sided quality to support multiple true values per entity for truth finding [VLDB 12] Generating Implicit Negative Claims: Positive Claim High Precision, High Recall IMDB Negative Claim Correct Claim Incorrect Claim High Precision, Low Recall Netflix Low Precision, Low Recall BadSource Harry Potter 48
49 Truth Discovery: Effectiveness of Latent Truth Model Experimental datasets: Large and real Book Authors from abebooks.com (1263 books, 879 sources, claims, 2420 book-author, 100 labeled) Movie Directors from Bing (15073 movies, 12 sources, claims, movie-director, 100 labeled) Effectiveness of Latent Truth Model: Model source quality in other data integration tasks, e.g. entity resolution. Trustworthiness in multi-genre networks (text-rich networks, social networks, etc.) 49
50 Outline Why Heterogeneous Information Networks? On the Power of Mining Structured Heterogeneous Info. Networks Clustering & Classification: A Ranking-Based Approach Similarity Search & Meta Path in Heterogeneous Info. Net Prediction & Recommendation: A Meta Path-Based Approach From Unstructured Data to Structured Networks From Unstructured Text to Structured Hierarchies Type Extraction and Role Discovery: Enrich Network Structures Truth Analysis: Enhance Quality of Structured Network Looking Forward to the Future 50
51 On-Going and Future Work From small, structured data to real, big, unstructured data From the DBLP network to networks constructed from news, PubMed, tweets, Construction of quality, semi-structured heterogeneous networks from massive, real data sets Integration information network with data/text cube and OLAP technologies Data/text cubes have been shown its power in multidimensional OLAP analysis, e.g., EventCube Integration of heterogeneous network mining OLAP mining on multi-dimensional social and information networks Network exploration of mission- or query-based hidden networks 51
52 From Data Mining to Mining Info. Networks Han, Kamber and Pei, Data Mining, 3 rd ed Yu, Han and Faloutsos (eds.), Link Mining, 2010 Sun and Han, Mining Heterogeneous Information Networks,
53 Conclusions Heterogeneous information networks are ubiquitous Most datasets can be organized or transformed into structured multi-typed heterogeneous info. networks Examples: DBLP, IMDB, Flickr, Google News, Wikipedia, Surprisingly rich knowledge can be mined from structured heterogeneous info. networks Clustering, ranking, classification, path prediction, Knowledge is power, but knowledge is hidden in massive, but relatively structured nodes and links! Key issue: Construction of trusted, semi-structured heterogeneous networks from unstructured data From data to knowledge: Much more to be explored but heterogeneous network mining has shown high promise! 53
54 References M. Ji, J. Han, and M. Danilevsky, "Ranking-Based Classification of Heterogeneous Information Networks", KDD'11. Y. Sun and J. Han, Mining Heterogeneous Information Networks: Principles and Methodologies, Morgan & Claypool Publishers, 2012 Y. Sun, J. Han, et al., "RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis", EDBT 09 Y. Sun, Y. Yu, and J. Han, "Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema", KDD 09 Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks, VLDB'11 Y. Sun, R. Barber, M. Gupta, C. Aggarwal and J. Han, "Co-Author Relationship Prediction in Heterogeneous Bibliographic Networks", ASONAM'11 Y. Sun, J. Han, C. C. Aggarwal, N. Chawla, When Will It Happen? Relationship Prediction in Heterogeneous Information Networks, WSDM'12 F. Tao, et al., EventCube: Multi-Dimensional Search and Mining of Structured and Text Data, (system demo) KDD 13 C. Wang, J. Han, et al., Mining Advisor-Advisee Relationships from Research Publication Networks", KDD'10 C. Wang, M. Danilevsky, et al., A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy, KDD 13 C. Wang, Marina Danilevsky, et al., "Constructing Topical Hierarchies in Heterogeneous Information Networks", ICDM'13 54
Mining Trusted Information in Medical Science: An Information Network Approach
Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou
More informationResearchNet and NewsNet: Two Nets That May Capture Many Lovely Birds
ResearchNet and NewsNet: Two Nets That May Capture Many Lovely Birds Jiawei Han Data Mining Research Group, Computer Science University of Illinois at Urbana-Champaign Acknowledgements: NSF, ARL, NIH,
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering
More informationPersonalized Entity Recommendation: A Heterogeneous Information Network Approach
Personalized Entity Recommendation: A Heterogeneous Information Network Approach Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, Jiawei Han University of
More informationJiawei Han University of Illinois at Urbana Champaign
1 Web Structure Mining and Information Network Analysis: An Integrated Approach Jiawei Han University of Illinois at Urbana Champaign Collaborated with many students in my group, especially Tim Weninger,
More informationData Mining: Dynamic Past and Promising Future
SDM@10 Anniversary Panel: Data Mining: A Decade of Progress and Future Outlook Data Mining: Dynamic Past and Promising Future Jiawei Han Department of Computer Science University of Illinois at Urbana
More informationHetPathMine: A Novel Transductive Classification Algorithm on Heterogeneous Information Networks
HetPathMine: A Novel Transductive Classification Algorithm on Heterogeneous Information Networks Chen Luo 1,RenchuGuan 1, Zhe Wang 1,, and Chenghua Lin 2 1 College of Computer Science and Technology, Jilin
More informationUser Guided Entity Similarity Search Using Meta-Path Selection in Heterogeneous Information Networks
User Guided Entity Similarity Search Using Meta-Path Selection in Heterogeneous Information Networks Xiao Yu, Yizhou Sun, Brandon Norick, Tiancheng Mao, Jiawei Han Computer Science Department University
More informationGraph Classification in Heterogeneous
Title: Graph Classification in Heterogeneous Networks Name: Xiangnan Kong 1, Philip S. Yu 1 Affil./Addr.: Department of Computer Science University of Illinois at Chicago Chicago, IL, USA E-mail: {xkong4,
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 1
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 1 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationIVIS: SEARCH AND VISUALIZATION ON HETEROGENEOUS INFORMATION NETWORKS YINTAO YU THESIS
c 2010 Yintao Yu IVIS: SEARCH AND VISUALIZATION ON HETEROGENEOUS INFORMATION NETWORKS BY YINTAO YU THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationChapter 1 Introduction
Chapter 1 Introduction Abstract In this chapter, we introduce some basic concepts and definitions in heterogeneous information network and compare the heterogeneous information network with other related
More informationEffective Latent Space Graph-based Re-ranking Model with Global Consistency
Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case
More informationc 2012 by Yizhou Sun. All rights reserved.
c 2012 by Yizhou Sun. All rights reserved. MINING HETEROGENEOUS INFORMATION NETWORKS BY YIZHOU SUN DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
More informationCS425: Algorithms for Web Scale Data
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org J.
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,
More informationWE know that most real systems usually consist of a
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 29, NO. 1, JANUARY 2017 17 A Survey of Heterogeneous Information Network Analysis Chuan Shi, Member, IEEE, Yitong Li, Jiawei Zhang, Yizhou Sun,
More informationAspEm: Embedding Learning by Aspects in Heterogeneous Information Networks
AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han University of Illinois at Urbana-Champaign (UIUC) Facebook Inc. U.S. Army Research
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationA Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2
A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More informationCitation Prediction in Heterogeneous Bibliographic Networks
Citation Prediction in Heterogeneous Bibliographic Networks Xiao Yu Quanquan Gu Mianwei Zhou Jiawei Han University of Illinois at Urbana-Champaign {xiaoyu1, qgu3, zhou18, hanj}@illinois.edu Abstract To
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationIntegrating Meta-Path Selection with User-Preference for Top-k Relevant Search in Heterogeneous Information Networks
Integrating Meta-Path Selection with User-Preference for Top-k Relevant Search in Heterogeneous Information Networks Shaoli Bu bsl89723@gmail.com Zhaohui Peng pzh@sdu.edu.cn Abstract Relevance search in
More informationTriRank: Review-aware Explainable Recommendation by Modeling Aspects
TriRank: Review-aware Explainable Recommendation by Modeling Aspects Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen National University of Singapore Presented by Xiangnan He CIKM 15, Melbourne, Australia
More informationAnalysis of Large Graphs: TrustRank and WebSpam
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationMining Heterogeneous Information Networks
ACM SIGKDD Conference Tutorial, Washington, D.C., July 25, 2010 Mining Heterogeneous Information Networks Jiawei Han Yizhou Sun Xifeng Yan Philip S. Yu University of Illinois at Urbana Champaign University
More informationEffective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar
Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours
More informationLink Mining & Entity Resolution. Lise Getoor University of Maryland, College Park
Link Mining & Entity Resolution Lise Getoor University of Maryland, College Park Learning in Structured Domains Traditional machine learning and data mining approaches assume: A random sample of homogeneous
More informationMining Knowledge from Databases: An Information Network Analysis Approach
ACM SIGMOD Conference Tutorial, Indianapolis, Indiana, June 8, 2010 Mining Knowledge from Databases: An Information Network Analysis Approach Jiawei Han Yizhou Sun Xifeng Yan Philip S. Yu University of
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #10: Link Analysis-2 Seoul National University 1 In This Lecture Pagerank: Google formulation Make the solution to converge Computing Pagerank for very large graphs
More informationHolistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs
Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs Authors: Andreas Wagner, Veli Bicer, Thanh Tran, and Rudi Studer Presenter: Freddy Lecue IBM Research Ireland 2014 International
More informationOverview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010
INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,
More informationJianyong Wang Department of Computer Science and Technology Tsinghua University
Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity
More informationMining Diversity on Networks
Mining Diversity on Networks Lu Liu 1, Feida Zhu 3, Chen Chen 2, Xifeng Yan 4, Jiawei Han 2, Philip Yu 5, and Shiqiang Yang 1 1 Tsinghua University, 2 University of Illinois at Urbana-Champaign 3 Singapore
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/6/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 High dim. data Graph data Infinite data Machine
More informationChapter 2 Survey of Current Developments
Chapter 2 Survey of Current Developments Abstract Heterogeneous information network (HIN) provides a new paradigm to manage networked data. Meanwhile, it also introduces new challenges for many data mining
More informationA Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University
A Machine Learning Approach for Information Retrieval Applications Luo Si Department of Computer Science Purdue University Why Information Retrieval: Information Overload: Since the introduction of digital
More informationComputer-based Tracking Protocols: Improving Communication between Databases
Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability
More informationGraph Cube: On Warehousing and OLAP Multidimensional Networks
Graph Cube: On Warehousing and OLAP Multidimensional Networks Peixiang Zhao, Xiaolei Li, Dong Xin, Jiawei Han Department of Computer Science, UIUC Groupon Inc. Google Cooperation pzhao4@illinois.edu, hanj@cs.illinois.edu
More informationCSE5243 INTRO. TO DATA MINING
CSE5243 INTRO. TO DATA MINING Chapter 1. Introduction Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han CSE 5243. Course Page & Schedule Class Homepage:
More informationDatabase and Knowledge-Base Systems: Data Mining. Martin Ester
Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro
More informationSlides based on those in:
Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]
More informationCOMP 465: Data Mining Still More on Clustering
3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following
More informationOLAP on Information Networks: a new Framework for Dealing with Bibliographic Data
OLAP on Information Networks: a new Framework for Dealing with Bibliographic Data Wararat Jakawat, C. Favre, Sabine Loudcher To cite this version: Wararat Jakawat, C. Favre, Sabine Loudcher. OLAP on Information
More informationTutorial on Mining Heterogeneous Information Networks
Tutorial on Mining Heterogeneous Information Networks Rokia Missaoui LARIM Université du Québec en Outaouais, Canada http://w3.uqo.ca/missaoui 1 Acknowledgement I am grateful to Professor Jiawei Han who
More informationHolistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges
Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Abhishek Santra 1 and Sanjukta Bhowmick 2 1 Information Technology Laboratory, CSE Department, University of
More informationA. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks
A. Papadopoulos, G. Pallis, M. D. Dikaiakos Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks IEEE/WIC/ACM International Conference on Web Intelligence Nov.
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web
More informationQuotient Cube: How to Summarize the Semantics of a Data Cube
Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign)
More informationQuery Independent Scholarly Article Ranking
Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data
More informationMining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams
/9/7 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them
More informationDATA MINING RESEARCH: RETROSPECT AND PROSPECT
DATA MINING RESEARCH: RETROSPECT AND PROSPECT Prof(Dr).V.SARAVANAN & Mr. ABDUL KHADAR JILANI Department of Computer Science College of Computer and Information Sciences Majmaah University Kingdom of Saudi
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationObject Distinction: Distinguishing Objects with Identical Names
Object Distinction: Distinguishing Objects with Identical Names Xiaoxin Yin Univ. of Illinois xyin1@uiuc.edu Jiawei Han Univ. of Illinois hanj@cs.uiuc.edu Philip S. Yu IBM T. J. Watson Research Center
More informationOverview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer
Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What
More informationRankCompete: Simultaneous Ranking and Clustering of Information Networks
RankCompete: Simultaneous Ranking and Clustering of Information Networks Liangliang Cao a, Xin Jin b, Zhijun Yin b, Andrey Del Pozo a, Jiebo Luo c, Jiawei Han b, Thomas S. Huang a a Beckman Institute and
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important
More informationHeterogeneous Graph-Based Intent Learning with Queries, Web Pages and Wikipedia Concepts
Heterogeneous Graph-Based Intent Learning with Queries, Web Pages and Wikipedia Concepts Xiang Ren, Yujing Wang, Xiao Yu, Jun Yan, Zheng Chen, Jiawei Han University of Illinois, at Urbana Champaign MicrosoD
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationBig Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012
Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema
More informationGraphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech
CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationBig Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety
Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety Abhishek
More informationTo Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set
To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,
More informationA BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK
A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK Qing Guo 1, 2 1 Nanyang Technological University, Singapore 2 SAP Innovation Center Network,Singapore ABSTRACT Literature review is part of scientific
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationSearching complex graphs
Searching complex graphs complex graph data Big volume: huge number of nodes/links Big variety: complex, heterogeneous schema Big velocity: e.g., frequently updated Noisy, ambiguous attributes and values
More information[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AN EFFICIENT APPROACH FOR TEXT MINING USING SIDE INFORMATION Kiran V. Gaidhane*, Prof. L. H. Patil, Prof. C. U. Chouhan DOI: 10.5281/zenodo.58632
More informationMaster Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala
Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue
More informationThanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman
Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman http://www.mmds.org Overview of Recommender Systems Content-based Systems Collaborative Filtering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive
More information3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today
3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu
More informationMining Query-Based Subnetwork Outliers in Heterogeneous Information Networks
Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks Honglei Zhuang, Jing Zhang 2, George Brova, Jie Tang 2, Hasan Cam 3, Xifeng Yan 4, Jiawei Han University of Illinois at Urbana-Champaign
More informationDynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering
Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 3
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights
More informationHierarchical Document Clustering
Hierarchical Document Clustering Benjamin C. M. Fung, Ke Wang, and Martin Ester, Simon Fraser University, Canada INTRODUCTION Document clustering is an automatic grouping of text documents into clusters
More informationarxiv: v1 [cs.mm] 12 Jan 2016
Learning Subclass Representations for Visually-varied Image Classification Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic Multimedia Information Retrieval Lab, Delft University of Technology
More informationInformation Retrieval
Introduction Information Retrieval Information retrieval is a field concerned with the structure, analysis, organization, storage, searching and retrieval of information Gerard Salton, 1968 J. Pei: Information
More informationBring Semantic Web to Social Communities
Bring Semantic Web to Social Communities Jie Tang Dept. of Computer Science, Tsinghua University, China jietang@tsinghua.edu.cn April 19, 2010 Abstract Recently, more and more researchers have recognized
More informationTupleRank: Ranking Relational Databases using Random Walks on Extended K-partite Graphs
TupleRank: Ranking Relational Databases using Random Walks on Extended K-partite Graphs Jiyang Chen, Osmar R. Zaïane, Randy Goebel, Philip S. Yu 2 Department of Computing Science, University of Alberta
More informationEmpowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia
Empowering People with Knowledge the Next Frontier for Web Search Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Important Trends for Web Search Organizing all information Addressing user
More informationPersonalized Recommendations using Knowledge Graphs. Rose Catherine Kanjirathinkal & Prof. William Cohen Carnegie Mellon University
+ Personalized Recommendations using Knowledge Graphs Rose Catherine Kanjirathinkal & Prof. William Cohen Carnegie Mellon University + The Problem 2 n Generate content-based recommendations on sparse real
More informationGraphGAN: Graph Representation Learning with Generative Adversarial Nets
The 32 nd AAAI Conference on Artificial Intelligence (AAAI 2018) New Orleans, Louisiana, USA GraphGAN: Graph Representation Learning with Generative Adversarial Nets Hongwei Wang 1,2, Jia Wang 3, Jialin
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationMining Latent Entity Structures
MORGAN& CLAYPOOL PUBLISHERS Mining Latent Entity Structures Chi Wang Jiawei Han SyntheSiS LectureS on Data Mining and KnowLeDge DiScovery Jiawei Han, Lise Getoor, Wei Wang, Johannes Gehrke, Robert Grossman,
More informationCIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets
CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationAn Overview of Data Warehousing and OLAP Technology
An Overview of Data Warehousing and OLAP Technology What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation lecture 2 1 What is Data Warehouse?
More informationClustering Results. Result List Example. Clustering Results. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to
More informationPlagiarism Detection Using FP-Growth Algorithm
Northeastern University NLP Project Report Plagiarism Detection Using FP-Growth Algorithm Varun Nandu (nandu.v@husky.neu.edu) Suraj Nair (nair.sur@husky.neu.edu) Supervised by Dr. Lu Wang December 10,
More informationTribhuvan University Institute of Science and Technology MODEL QUESTION
MODEL QUESTION 1. Suppose that a data warehouse for Big University consists of four dimensions: student, course, semester, and instructor, and two measures count and avg-grade. When at the lowest conceptual
More informationSupervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information
Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale Department of Information
More informationLeveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing. Interspeech 2013
Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing LARRY HECK, DILEK HAKKANI-TÜR, GOKHAN TUR Focus of This Paper SLU and Entity Extraction (Slot Filling) Spoken Language Understanding
More information