Academic Recommendation using Citation Analysis with theadvisor

1 Academic Recommendation using Citation Analysis with theadvisor
  Joint work with Onur Küçüktunç, Kamer Kaya, Ümit V. Çatalyürek
  Department of Biomedical Informatics, The Ohio State University
  CSTA 2013
  :: 1 / 37

2 Table of Contents
  1 Introduction
      Why?
      Overview
  2 Citation Analysis for Document Recommendation
      Previous Approaches
      Direction Aware Recommendation
  3 A High Performance Computing Problem
      A specialization of SpMV
      Ordering and Partitioning
  4 Result Diversification
  5 Other Features
  6 Final Thoughts
      Conclusion
      Future Works
  :: 2 / 37

6 Once upon a time: a survey paper
  The Jimmy John's scheduling problem
  Keywords: scheduling, partitioning, mapping, pipeline, workflow, data flow, task graph, linear chain, sequences, (tree), (serial parallel)
  But also...
    Scheduling problems in parallel query optimization
    Bringing skeletons out of the closet: A pragmatic manifesto for skeletal parallel programming
  After 6 months, unknown papers were still being uncovered.
  Develop software to make the search easier!
  Introduction::Why? 3 / 37

7 Design Goals
  Personalized: The user should be able to make a query that describes precisely what she is looking for.
  Conceptual: The system should be free of linguistic problems. Ambiguity and synonymy should be taken into account.
  Exploratory: Different perspectives should be available. The system should enhance the user's search.
  Easy to use: The user should not need to know anything about data mining or algorithms.
  Introduction::Why? 4 / 37

15 The Academic Web Service Ecosystem
  DBLP: List of CS papers with clean references and disambiguated names.
  Citeseer, {Ref,Ack,Collab}Seer: Automatically crawled papers in CS. Give PDFs. Contain citation information and full text. Compute similarity.
  CiteUlike: Social paper tagging application. Find papers from researchers with similar interests.
  ArnetMiner: Academic network analysis.
  Mendeley: Application for managing references. Database of references.
  Google Scholar: Keyword-based search engine (with citation information).
  Microsoft Academic Search: Keyword-based search engine and academic network analysis.
  IEEE, ACM, Elsevier, JSTOR,...: Publishers or digital libraries with complete text and references. Some suggestions.
  Introduction::Why? 5 / 37

16 A Use Case Introduction::Overview 6 / 37

17 System Overview
  Architecture
    A web server as the front end.
    A cluster as the back end. New instances are dynamically created as the load varies.
  Functional
    [Diagram components: input files (.bib, .ris, .xml); Paper Mapper; paper IDs; parameters {k, d, κ}; Recommendation Engine (π); Venue Rec.; Reviewer Rec.; Diversification Engine; outputs: venues, reviewers, papers; Visualization; Relevance Feedback]
  Introduction::Overview 7 / 37

19 Using the Citation Graph
  Hypothesis: If two papers are related or treat the same subject, then they will be close to each other in the citation graph (and conversely).
  Benefits
    No linguistics => no synonymy, no ambiguity
    Automatically crowd-sourced by researchers
  Drawbacks
    Difficult to gather the data (but thanks, Citeseer)
    Relies on researchers already having made similar connections
  Citation Analysis:: 9 / 37

21 Local Approaches
  [Figure: a paper v with its reference edges and its citation edges; other labeled papers: x, u, t]
  Bibliographic coupling [Kessler63]: papers having similar references are related
  Cocitation [Small73]: papers which are cited by the same papers are related
  CCIDF [Lawrence99]: cocitations weighted with inverse frequencies
  Problem: These approaches consider only level-2 papers, based on level-1 information.
  Citation Analysis::Previous Approaches 10 / 37
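
The two unweighted local measures can be sketched in a few lines. This is a minimal illustration, not theadvisor's implementation: the function name `local_scores`, the dictionary-of-sets graph layout, and the toy graph are all assumptions made for the example.

```python
from collections import defaultdict

def local_scores(references, query):
    """Score papers against `query` with two local measures.

    references: dict mapping each paper to the set of papers it cites.
    Returns (coupling, cocitation), two dicts of raw counts.
    """
    # Invert the reference lists: for each paper, who cites it.
    citations = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            citations[ref].add(paper)

    # Bibliographic coupling: number of references shared with the query.
    coupling = defaultdict(int)
    for ref in references.get(query, ()):
        for other in citations[ref]:
            if other != query:
                coupling[other] += 1

    # Cocitation: number of papers citing both the query and the other paper.
    cocitation = defaultdict(int)
    for citer in citations.get(query, ()):
        for other in references[citer]:
            if other != query:
                cocitation[other] += 1
    return dict(coupling), dict(cocitation)

# Hypothetical toy graph: u and v both cite x; t cites both v and u.
toy = {"u": {"x"}, "v": {"x"}, "t": {"v", "u"}, "x": set()}
coupling, cocitation = local_scores(toy, "v")
```

Both loops only ever look two hops away from the query, which is the level-2 limitation named on the slide.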

23 Global Approaches
  Graph distance-based
    Katz: number of paths between two papers [Strohman07]
  Random walk with restarts (RWR) based
    ArticleRank [Li09] (PageRank [Brin98] extension)
    PaperRank [Gori06] (Personalized PageRank [Haveliwala02] extension)
  RWR treats the citations and references in the same way.
  This is not exploratory!
  Citation Analysis::Previous Approaches 11 / 37
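
As a rough sketch of the graph distance-based idea, the Katz measure can be approximated by counting walks of bounded length, each weighted by beta raised to its length. The assumptions here are labeled: citation links are treated as undirected, the sum is truncated at `max_len`, walks (which may revisit papers) are counted rather than simple paths, and the function name and toy graph are hypothetical.

```python
def katz_scores(neighbors, source, beta=0.5, max_len=3):
    """Truncated Katz scores from `source`.

    neighbors: dict mapping each paper to a list of adjacent papers
               (assumption: links are treated as undirected).
    Returns a dict: paper -> sum over l of beta**l * (#walks of length l).
    """
    walks = {source: 1}  # number of walks of the current length ending at each paper
    scores = {}
    for length in range(1, max_len + 1):
        extended = {}
        for node, count in walks.items():
            for nb in neighbors[node]:
                extended[nb] = extended.get(nb, 0) + count
        walks = extended
        for node, count in walks.items():
            scores[node] = scores.get(node, 0.0) + (beta ** length) * count
    return scores

# Hypothetical toy graph: a path a - b - c.
toy_path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_katz = katz_scores(toy_path, "a", beta=0.5, max_len=2)
```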

27 PageRank
  Let $G = (V, E)$ be the citation graph.
  PageRank [Brin98]
    $\pi_i(u) = d \frac{1}{|V|} + (1 - d) \sum_{v \in N(u)} \frac{\pi_{i-1}(v)}{\delta(v)}$
    where $d \in (0 : 1)$ is the damping factor.
    It converges to a stable distribution.
    But it is not personalized.
  Personalized PageRank [Jeh03]
    $\pi_i(u) = d \, p(u) + (1 - d) \sum_{v \in N(u)} \frac{\pi_{i-1}(v)}{\delta(v)}$
    with $\sum_{u} p(u) = 1$.
    But it is not exploratory.
  [Figure source: wikipedia]
  Citation Analysis::Direction Awareness 12 / 37
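
The personalized update can be sketched as a plain fixed-point iteration. The sketch assumes an undirected view of the graph (N(u) is the neighbor list and δ(v) the degree of v), a symmetric adjacency with no isolated papers, and a fixed iteration count instead of a convergence test. It follows the slide's convention that d multiplies the restart term. The function name and toy graph are illustrative only.

```python
def personalized_pagerank(neighbors, p, d=0.15, iterations=50):
    """Iterate pi(u) = d*p(u) + (1-d) * sum over v in N(u) of pi(v)/delta(v).

    neighbors: dict paper -> list of adjacent papers (symmetric, none empty).
    p: dict paper -> restart probability; missing papers count as 0.
    """
    nodes = list(neighbors)
    pi = {u: p.get(u, 0.0) for u in nodes}
    for _ in range(iterations):
        pi = {
            u: d * p.get(u, 0.0)
            + (1 - d) * sum(pi[v] / len(neighbors[v]) for v in neighbors[u])
            for u in nodes
        }
    return pi

# Hypothetical toy graph: a path a - b - c, restarting at paper a only.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_pi = personalized_pagerank(toy_graph, {"a": 1.0})
```

Setting p(u) = 1/|V| for every paper gives the non-personalized PageRank update from the same slide.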

28 Direction Awareness
  Time exploration: What if we want to search for papers by year? Recent papers? Traditional papers?
  Let $M$ be a set of known relevant papers.
  Direction Aware Random Walk with Restart
    $\pi_i(u) = d \, p(u) + (1 - d) \left( \kappa \sum_{v \in N^+(u)} \frac{\pi_{i-1}(v)}{\delta^-(v)} + (1 - \kappa) \sum_{v \in N^-(u)} \frac{\pi_{i-1}(v)}{\delta^+(v)} \right)$
    $d \in (0 : 1)$ is the damping factor.
    $\kappa \in (0 : 1)$.
    $p(u) = \frac{1}{|M|}$ if $u \in M$, $p(u) = 0$ otherwise.
  [Figure: example graph; legend: reference edges ($\delta^+(v)$), back-reference (citation) edges ($\delta^-(v)$), restart edges]
  Citation Analysis::Direction Awareness 13 / 37
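
A direct, unoptimized transcription of the DaRWR update may help make the role of κ concrete. The sketch assumes N⁺(u) is the set of papers u cites, N⁻(u) the set of papers citing u, δ⁺(v) the number of references of v, and δ⁻(v) the number of citations of v. The function name, the fixed iteration count, and the three-paper toy chain are assumptions for illustration.

```python
def darwr(references, query, d=0.15, kappa=0.5, iterations=50):
    """Direction-aware random walk with restart (illustrative sketch).

    references: dict paper -> set of cited papers (every paper is a key).
    query: the set M of known relevant papers.
    """
    citations = {u: set() for u in references}
    for u, refs in references.items():
        for v in refs:
            citations[v].add(u)
    p = {u: (1.0 / len(query) if u in query else 0.0) for u in references}
    pi = dict(p)
    for _ in range(iterations):
        new_pi = {}
        for u in references:
            # v in N+(u): u cites v, so delta-(v) = len(citations[v]) >= 1.
            from_cited = sum(pi[v] / len(citations[v]) for v in references[u])
            # v in N-(u): v cites u, so delta+(v) = len(references[v]) >= 1.
            from_citing = sum(pi[v] / len(references[v]) for v in citations[u])
            new_pi[u] = d * p[u] + (1 - d) * (
                kappa * from_cited + (1 - kappa) * from_citing
            )
        pi = new_pi
    return pi

# Hypothetical chain: "new" cites "mid", which cites "old"; the query is {"mid"}.
chain = {"new": {"mid"}, "mid": {"old"}, "old": set()}
recent_bias = darwr(chain, {"mid"}, kappa=0.9)
older_bias = darwr(chain, {"mid"}, kappa=0.1)
```

On this toy chain, a κ close to 1 ranks the citing (more recent) paper above the cited one, and a κ close to 0 does the opposite.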

29 Exploring in Depth
  [Plot: average shortest distance, over d and κ]
  Citation Analysis::Direction Awareness 14 / 37

30 Exploring in Time
  [Plot: average publication year, over d and κ]
  Citation Analysis::Direction Awareness 15 / 37

31 Hidden Reference Discovery
  The recovery test: Let's hide some references from a paper and see if an algorithm can find them.
  Results of the experiments with mean average precision (MAP@50) and 95% confidence intervals.
  [Table: columns "hide random", "hide recent", "hide earlier", each with mean and interval; rows DaRWR, P.R., Katz β, Cocit, Cocoup, CCIDF]
  Citation Analysis::Direction Awareness 16 / 37
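
For readers unfamiliar with the metric, here is one common way to compute average precision at a cutoff k and its mean over several test papers. Whether the experiments use exactly this normalization (dividing by min(|hidden|, k)) is an assumption; the function names and toy data are illustrative.

```python
def average_precision_at_k(ranked, hidden, k=50):
    """Average precision of the first k results against the hidden references."""
    hits = 0
    total = 0.0
    for rank, paper in enumerate(ranked[:k], start=1):
        if paper in hidden:
            hits += 1
            total += hits / rank  # precision at this rank
    denom = min(len(hidden), k)
    return total / denom if denom else 0.0

def mean_average_precision(runs, k=50):
    """runs: list of (ranked_list, hidden_set) pairs, one per test paper."""
    return sum(average_precision_at_k(r, h, k) for r, h in runs) / len(runs)

# Hypothetical test: two hidden references recovered at ranks 1 and 3.
toy_ap = average_precision_at_k(["a", "x", "b"], {"a", "b"})
toy_map = mean_average_precision([(["a", "x", "b"], {"a", "b"}), (["x"], {"y"})])
```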

34 A Sparse Matrix-Vector Multiplication (SpMV)
  Rewriting DaRWR
    $\pi_i(u) = d \, p(u) + (1 - d) \left( \kappa \sum_{v \in N^+(u)} \frac{\pi_{i-1}(v)}{\delta^-(v)} + (1 - \kappa) \sum_{v \in N^-(u)} \frac{\pi_{i-1}(v)}{\delta^+(v)} \right)$
    $\pi_i(u) = d \, p(u) + \sum_{v \in N^+(u)} \frac{(1 - d)\kappa}{\delta^-(v)} \pi_{i-1}(v) + \sum_{v \in N^-(u)} \frac{(1 - d)(1 - \kappa)}{\delta^+(v)} \pi_{i-1}(v)$
    $\pi_i = d \, p + A \pi_{i-1}$
  CRS Full
    Traverse A column by column.
    Skip columns where $\pi_{i-1}(v) = 0$.
    Per edge: 2 non-zeros (2 indices, 2 values)
  A HPC computing problem::spmv 18 / 37
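
The CRS-Full variant can be sketched with three flat arrays and a column-wise traversal. This is a toy illustration under assumptions, not the authors' kernel: the array names `col_ptr`, `row_idx`, `values`, the function names, and the dictionary input format are invented for the example, and N⁺(u) is taken to be the papers u cites.

```python
def build_crs_full(references, order, d, kappa):
    """Store A explicitly, one compressed column per paper.

    references: dict paper -> list of cited papers; order: list fixing indices.
    Each citation edge yields two non-zeros, each with an index and a value.
    """
    index = {paper: i for i, paper in enumerate(order)}
    cited_by = {paper: [] for paper in order}
    for u in order:
        for v in references[u]:
            cited_by[v].append(u)
    col_ptr, row_idx, values = [0], [], []
    for v in order:
        for u in cited_by[v]:      # u cites v: weight (1-d)*kappa / delta-(v)
            row_idx.append(index[u])
            values.append((1 - d) * kappa / len(cited_by[v]))
        for u in references[v]:    # v cites u: weight (1-d)*(1-kappa) / delta+(v)
            row_idx.append(index[u])
            values.append((1 - d) * (1 - kappa) / len(references[v]))
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx, values

def darwr_step_full(col_ptr, row_idx, values, x, p, d):
    """One iteration y = d*p + A*x, skipping columns where x is zero."""
    y = [d * pv for pv in p]
    for v, xv in enumerate(x):
        if xv == 0.0:
            continue
        for k in range(col_ptr[v], col_ptr[v + 1]):
            y[row_idx[k]] += values[k] * xv
    return y

# Hypothetical chain: "new" cites "mid", which cites "old"; restart at "mid".
order = ["new", "mid", "old"]
refs = {"new": ["mid"], "mid": ["old"], "old": []}
col_ptr, row_idx, values = build_crs_full(refs, order, d=0.15, kappa=0.9)
y_full = darwr_step_full(col_ptr, row_idx, values, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], 0.15)
```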

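36 A Sparse Matrix-Vector Multiplication (SpMV)
  Rewriting DaRWR
    $\pi_i(u) = d \, p(u) + \sum_{v \in N^+(u)} \frac{(1 - d)\kappa}{\delta^-(v)} \pi_{i-1}(v) + \sum_{v \in N^-(u)} \frac{(1 - d)(1 - \kappa)}{\delta^+(v)} \pi_{i-1}(v)$
    $\pi_i = d \, p + B^- \left( \frac{(1 - d)\kappa}{\delta^-} \pi_{i-1} \right) + B^+ \left( \frac{(1 - d)(1 - \kappa)}{\delta^+} \pi_{i-1} \right)$
  CRS Half
    Pre-compute $\frac{(1 - d)\kappa}{\delta^-} \pi_{i-1}$ and $\frac{(1 - d)(1 - \kappa)}{\delta^+} \pi_{i-1}$ (element-wise).
    $B^-$ and $B^+$ are 0/1 matrices and symmetric to each other.
    Traverse the matrix twice ($B^-$ and $B^+$).
    Skip columns where $\pi_{i-1}(v) = 0$.
    Per edge: 1 non-zero (1 index, 0 values)
  A HPC computing problem::spmv 19 / 37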

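37 A Sparse Matrix-Vector Multiplication (SpMV)
  Rewriting DaRWR
    $\pi_i(u) = d \, p(u) + \sum_{v \in N^+(u)} \frac{(1 - d)\kappa}{\delta^-(v)} \pi_{i-1}(v) + \sum_{v \in N^-(u)} \frac{(1 - d)(1 - \kappa)}{\delta^+(v)} \pi_{i-1}(v)$
    $\pi_i = d \, p + B^- \left( \frac{(1 - d)\kappa}{\delta^-} \pi_{i-1} \right) + B^+ \left( \frac{(1 - d)(1 - \kappa)}{\delta^+} \pi_{i-1} \right)$
  COO Half
    Pre-compute $\frac{(1 - d)\kappa}{\delta^-} \pi_{i-1}$ and $\frac{(1 - d)(1 - \kappa)}{\delta^+} \pi_{i-1}$ (element-wise).
    $B^-$ and $B^+$ are 0/1 matrices and symmetric to each other.
    Traverse the matrix once ($B^-$ and $B^+$ together).
    Arbitrary order. Don't skip anything.
    Per edge: 1 non-zero (2 indices, 0 values)
  A HPC computing problem::spmv 19 / 37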
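
The COO-Half idea can be sketched as a single pass over an unordered edge list after the two scaled vectors are pre-computed. As before, this is an illustration under assumptions: an edge (u, v) means paper u cites paper v, papers are numbered 0 to n-1, and the function name is hypothetical.

```python
def darwr_step_coo_half(edges, n, x, p, d, kappa):
    """One DaRWR iteration over an unordered list of citation edges.

    edges: list of (u, v) pairs meaning "u cites v"; no values are stored.
    x: previous iterate pi_{i-1}; p: restart vector.
    """
    out_deg = [0] * n  # delta+(v): number of references of v
    in_deg = [0] * n   # delta-(v): number of citations of v
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    # Pre-computed, element-wise scaled copies of x.
    a = [(1 - d) * kappa * x[v] / in_deg[v] if in_deg[v] else 0.0 for v in range(n)]
    b = [(1 - d) * (1 - kappa) * x[v] / out_deg[v] if out_deg[v] else 0.0 for v in range(n)]
    y = [d * pv for pv in p]
    for u, v in edges:   # each stored edge serves both B- and B+
        y[u] += a[v]     # v is in N+(u)
        y[v] += b[u]     # u is in N-(v)
    return y

# Hypothetical chain: paper 0 cites paper 1, which cites paper 2; restart at paper 1.
y_half = darwr_step_coo_half([(0, 1), (1, 2)], 3, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], 0.15, 0.9)
```

Because the loop visits edges in any order and never skips, it does more updates in early iterations than the column-skipping variants, but each edge costs only two stored indices.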

38 Number of updates
  [Plot: # updates (0 to 12M) per iteration for CRS-Full, CRS-Half, and COO-Half]
  A HPC computing problem::spmv 20 / 37

39 Runtimes
  [Plot: execution time (s) versus #partitions for CRS-Full, CRS-Half, COO-Half, and Hybrid, each shown unordered and with the RCM, AMD, and SB orderings]
  A HPC computing problem::spmv 21 / 37

43 Ordering
  Locality: SpMV is sensitive to non-zero locality.
  Reverse Cuthill-McKee [Cuthill, McKee, 69]: Order with respect to a Breadth First Search ordering. (Do 10 times, pick best.)
  Approximate Minimum Degree [Amestoy et al., 96]: Greedily add the vertex whose degree is minimum.
  Slashburn [Kang, Faloutsos, 11]: Order by connected components. Remove the highest degree vertex. Repeat.
  A HPC computing problem::ordering 22 / 37
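
To illustrate why a BFS-based ordering improves locality, here is a textbook-style sketch of reverse Cuthill-McKee together with a bandwidth measure. It makes a single pass from the lowest-degree vertex and is not the "do 10 times, pick best" procedure from the slide. The function names and the scrambled toy path are assumptions.

```python
from collections import deque

def reverse_cuthill_mckee(neighbors):
    """Return a vertex order: BFS visiting low-degree neighbors first, reversed.

    neighbors: dict vertex -> list of adjacent vertices (symmetric).
    """
    degree = {u: len(neighbors[u]) for u in neighbors}
    visited, order = set(), []
    for start in sorted(neighbors, key=lambda u: (degree[u], u)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(neighbors[u], key=lambda w: (degree[w], w)):
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
    order.reverse()
    return order

def bandwidth(neighbors, order):
    """Largest index distance between two adjacent vertices under `order`."""
    pos = {u: i for i, u in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in neighbors for v in neighbors[u]), default=0)

# Hypothetical path graph 0 - 3 - 1 - 4 - 2 with scrambled labels.
path = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
rcm_order = reverse_cuthill_mckee(path)
```

A smaller bandwidth means the non-zeros of the reordered matrix sit closer to the diagonal, so consecutive updates touch nearby entries of the vector.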

44 Partitioning A HPC computing problem::ordering 23 / 37

45 Runtimes
  [Plot: execution time (s) versus #partitions for CRS-Full, CRS-Half, COO-Half, and Hybrid, each shown unordered and with the RCM, AMD, and SB orderings]
  A HPC computing problem::ordering 24 / 37

49 Principle
  The goal of diversity is to avoid clustered answers.
  [Figure labels: "Relevant" and "Relevant Diverse"]
  Result Diversification:: 26 / 37

50 A Modeling Problem
  Here is a distribution of known algorithms.
  [Scatter plot: dens versus rel, with "better" marking the preferred direction; legend entries include BC1, BC2, CDivRank, DRAGON, GRA, GSP, IL1, IL2, LM, PDiv, PR, top-9, top-7, top-5, top-2]
  Result Diversification:: 27 / 37

52 A Modeling Problem
  Would such an algorithm be of interest?
  That algorithm is random!
  [Scatter plot: dens versus rel, with "better" marking the preferred direction; legend entries include BC1, BC2, CDivRank, DRAGON, GRA, GSP, IL1, IL2, LM, PDiv, PR, top-9, top-7, top-5, top-2]
  Result Diversification:: 27 / 37

53 What to do? See later talk! Result Diversification:: 28 / 37

54 Results
  [Figure: two panels showing references and recommendations (top-100), grouped by topic: Graph mining, Compression, Generic SpMV, Multicore, Partitioning, GPU, Eigensolvers]
  Result Diversification:: 29 / 37

57 Relevance Feedback
  Papers can be tagged as relevant or irrelevant.
  Positive feedback (+RF): Relevant results are added to Q
  Negative feedback (-RF): Irrelevant results are removed from the graph
  How long does it take to find the first level-3 paper?
  [Plot: Tau versus Ratio for No RF, Pos RF, Neg RF, and Pos+Neg RF]
  More exploration!
  Other Features:: 31 / 37
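
The two feedback rules amount to a small update of the query set and of the graph before the next recommendation round. The sketch below assumes the graph is a dictionary of reference sets and that an irrelevant paper is also dropped from the query set. The function name and the toy data are hypothetical.

```python
def apply_feedback(references, query, relevant=(), irrelevant=()):
    """Return (new_references, new_query) after one round of feedback.

    Positive feedback adds the relevant papers to the query set Q.
    Negative feedback removes the irrelevant papers from the graph.
    """
    removed = set(irrelevant)
    new_query = (set(query) | set(relevant)) - removed
    new_references = {
        u: {v for v in refs if v not in removed}
        for u, refs in references.items()
        if u not in removed
    }
    return new_references, new_query

# Hypothetical round: the user marks b as relevant and c as irrelevant.
graph = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
new_graph, new_q = apply_feedback(graph, {"a"}, relevant={"b"}, irrelevant={"c"})
```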

59 Visualization More exploration! Other Features:: 32 / 37

62 Application Programming Interface
  Web service
    theadvisor can be accessed programmatically.
    Emit HTTP requests and obtain JSON-encoded replies.
  Potential Applications
    Interfacing with article editors (e.g., TeXShop)
    Recommendation in bibliography managers (e.g., Mendeley)
    Suggesting reviewers to program committees (e.g., EasyChair)
    Suggesting sessions of interest at conferences (e.g., iconference)
  Easier to use!
  Other Features:: 33 / 37

65 Design Goals - Are they matched?
  Personalized: The query is expressed in a very precise way.
  Conceptual: Using the citation graph avoids all linguistic problems. However, it may not be enough to find all relevant papers.
  Exploratory: Direction Awareness (to choose time), Diversification (to see more topics), Visualization (for manual crawling)
  Easy to use: Efficient (recommendation in less than 2 seconds), web-based.
  Is it good enough? Tell us!
  Final Thoughts::Conclusion 35 / 37

66 Future Works
  Clustering: Let's assume for an instant that we have accurate, disambiguated tags for every document. We could restrict the analysis to some fields and improve diversification.
  Betweenness Centrality: DaRWR provides recommendations around the query set. What about recommending what lies between its papers?
  Contextual information: Distinguishing types of papers and citations: Survey, Method, Application...
  Final Thoughts::Future Works 36 / 37

67 Thank you
  More information
    Contact: esaule@bmi.osu.edu
  Research at the HPC lab is supported by [sponsor logos]
  Final Thoughts::Future Works 37 / 37


More information

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths Analysis of Biological Networks 1. Clustering 2. Random Walks 3. Finding paths Problem 1: Graph Clustering Finding dense subgraphs Applications Identification of novel pathways, complexes, other modules?

More information

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu G(B)enchmark GraphBench: Towards a Universal Graph Benchmark Khaled Ammar M. Tamer Özsu Bioinformatics Software Engineering Social Network Gene Co-expression Protein Structure Program Flow Big Graphs o

More information

Roadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases

Roadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases Roadmap Random Walks in Ranking Query in Vagelis Hristidis Roadmap Ranking Web Pages Rank according to Relevance of page to query Quality of page Roadmap PageRank Stanford project Lawrence Page, Sergey

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Link analysis Instructor: Rada Mihalcea (Note: This slide set was adapted from an IR course taught by Prof. Chris Manning at Stanford U.) The Web as a Directed Graph

More information

Lecture 17 November 7

Lecture 17 November 7 CS 559: Algorithmic Aspects of Computer Networks Fall 2007 Lecture 17 November 7 Lecturer: John Byers BOSTON UNIVERSITY Scribe: Flavio Esposito In this lecture, the last part of the PageRank paper has

More information

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012 Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Scholarly Big Data: Leverage for Science

Scholarly Big Data: Leverage for Science Scholarly Big Data: Leverage for Science C. Lee Giles The Pennsylvania State University University Park, PA, USA giles@ist.psu.edu http://clgiles.ist.psu.edu Funded in part by NSF, Allen Institute for

More information

CPSC 340: Machine Learning and Data Mining. Ranking Fall 2016

CPSC 340: Machine Learning and Data Mining. Ranking Fall 2016 CPSC 340: Machine Learning and Data Mining Ranking Fall 2016 Assignment 5: Admin 2 late days to hand in Wednesday, 3 for Friday. Assignment 6: Due Friday, 1 late day to hand in next Monday, etc. Final:

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

A brief history of Google

A brief history of Google the math behind Sat 25 March 2006 A brief history of Google 1995-7 The Stanford days (aka Backrub(!?)) 1998 Yahoo! wouldn't buy (but they might invest...) 1999 Finally out of beta! Sergey Brin Larry Page

More information

Google Scale Data Management

Google Scale Data Management Google Scale Data Management The slides are based on the slides made by Prof. K. Selcuk Candan, which is partially based on slides by Qing Li Google (..a course on that??) 2 1 Google (..a course on that??)

More information

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center 3/17/2015 2014 IBM Corporation Outline IBM OpenPower Platform Accelerating

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Learning to Rank Networked Entities

Learning to Rank Networked Entities Learning to Rank Networked Entities Alekh Agarwal Soumen Chakrabarti Sunny Aggarwal Presented by Dong Wang 11/29/2006 We've all heard that a million monkeys banging on a million typewriters will eventually

More information

CS6200 Information Retreival. The WebGraph. July 13, 2015

CS6200 Information Retreival. The WebGraph. July 13, 2015 CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

Reading Time: A Method for Improving the Ranking Scores of Web Pages

Reading Time: A Method for Improving the Ranking Scores of Web Pages Reading Time: A Method for Improving the Ranking Scores of Web Pages Shweta Agarwal Asst. Prof., CS&IT Deptt. MIT, Moradabad, U.P. India Bharat Bhushan Agarwal Asst. Prof., CS&IT Deptt. IFTM, Moradabad,

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

Word Disambiguation in Web Search

Word Disambiguation in Web Search Word Disambiguation in Web Search Rekha Jain Computer Science, Banasthali University, Rajasthan, India Email: rekha_leo2003@rediffmail.com G.N. Purohit Computer Science, Banasthali University, Rajasthan,

More information

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures

Anatomy of a search engine. Design criteria of a search engine Architecture Data structures Anatomy of a search engine Design criteria of a search engine Architecture Data structures Step-1: Crawling the web Google has a fast distributed crawling system Each crawler keeps roughly 300 connection

More information

Introduction. Internet and Social Networking Research Tools for Academic Writing. Introduction. Introduction. Writing well is challenging enough

Introduction. Internet and Social Networking Research Tools for Academic Writing. Introduction. Introduction. Writing well is challenging enough Introduction Internet and Social Networking Research Tools for Academic Writing Copyright 2013 Todd A. Whittaker (todd.whittaker@franklin.edu) Writing well is challenging enough grammar, usage, style elegance

More information

.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Link Analysis in Graphs: PageRank Link Analysis Graphs Recall definitions from Discrete math and graph theory. Graph. A graph

More information

Organize. Collaborate. Discover. All About Mendeley

Organize. Collaborate. Discover.  All About Mendeley Organize. Collaborate. Discover. www.mendeley.com All About Mendeley 1 What is Mendeley? Free Academic Software Cross-Platform (Win/Mac/Linux/Mobile) All Major Browsers Desktop Web Mobile How does Mendeley

More information

Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis

Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis W I S S E N n T E C H N I K n L E I D E N S C H A F T Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis Elisabeth Lex KTI, TU Graz WS 2015/16 u www.tugraz.at

More information

COMP Page Rank

COMP Page Rank COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper

More information

An Improved Computation of the PageRank Algorithm 1

An Improved Computation of the PageRank Algorithm 1 An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.

More information

Tag-based Social Interest Discovery

Tag-based Social Interest Discovery Tag-based Social Interest Discovery Xin Li / Lei Guo / Yihong (Eric) Zhao Yahoo!Inc 2008 Presented by: Tuan Anh Le (aletuan@vub.ac.be) 1 Outline Introduction Data set collection & Pre-processing Architecture

More information

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi Erik Saule 1, Kamer Kaya 1 and Ümit V. Çatalyürek 1,2 esaule@uncc.edu, {kamer,umit}@bmi.osu.edu 1 Department of Biomedical

More information

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER CS6007-INFORMATION RETRIEVAL Regulation 2013 Academic Year 2018

More information

Link Analysis. Hongning Wang

Link Analysis. Hongning Wang Link Analysis Hongning Wang CS@UVa Structured v.s. unstructured data Our claim before IR v.s. DB = unstructured data v.s. structured data As a result, we have assumed Document = a sequence of words Query

More information

CSE 373 NOVEMBER 20 TH TOPOLOGICAL SORT

CSE 373 NOVEMBER 20 TH TOPOLOGICAL SORT CSE 373 NOVEMBER 20 TH TOPOLOGICAL SORT PROJECT 3 500 Internal Error problems Hopefully all resolved (or close to) P3P1 grades are up (but muted) Leave canvas comment Emails tomorrow End of quarter GRAPHS

More information

Guide to RefWorks 2.0

Guide to RefWorks 2.0 Guide to RefWorks 2.0 RefWorks is a powerful online research management, writing and collaboration tool designed to help users at all levels easily gather, organize, store and share all types of information

More information

Optimizing Search Engines using Click-through Data

Optimizing Search Engines using Click-through Data Optimizing Search Engines using Click-through Data By Sameep - 100050003 Rahee - 100050028 Anil - 100050082 1 Overview Web Search Engines : Creating a good information retrieval system Previous Approaches

More information

Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs

Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs Anas Abu-Doleh 1,2, Erik Saule 1, Kamer Kaya 1 and Ümit V. Çatalyürek 1,2 1 Department of Biomedical Informatics 2 Department of Electrical

More information

Bring Semantic Web to Social Communities

Bring Semantic Web to Social Communities Bring Semantic Web to Social Communities Jie Tang Dept. of Computer Science, Tsinghua University, China jietang@tsinghua.edu.cn April 19, 2010 Abstract Recently, more and more researchers have recognized

More information

Figure 1: A directed graph.

Figure 1: A directed graph. 1 Graphs A graph is a data structure that expresses relationships between objects. The objects are called nodes and the relationships are called edges. For example, social networks can be represented as

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,

More information

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Graph Algorithms. Revised based on the slides by Ruoming Kent State Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/

More information

Competitive Intelligence and Web Mining:

Competitive Intelligence and Web Mining: Competitive Intelligence and Web Mining: Domain Specific Web Spiders American University in Cairo (AUC) CSCE 590: Seminar1 Report Dr. Ahmed Rafea 2 P age Khalid Magdy Salama 3 P age Table of Contents Introduction

More information

Machine Learning in WAN Research

Machine Learning in WAN Research Machine Learning in WAN Research Mariam Kiran mkiran@es.net Energy Sciences Network (ESnet) Lawrence Berkeley National Lab Oct 2017 Presented at Internet2 TechEx 2017 Outline ML in general ML in network

More information

More Ways to Solve & Graph Quadratics The Square Root Property If x 2 = a and a R, then x = ± a

More Ways to Solve & Graph Quadratics The Square Root Property If x 2 = a and a R, then x = ± a More Ways to Solve & Graph Quadratics The Square Root Property If x 2 = a and a R, then x = ± a Example: Solve using the square root property. a) x 2 144 = 0 b) x 2 + 144 = 0 c) (x + 1) 2 = 12 Completing

More information

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule 1 How big is the Web How big is the Web? In the past, this question

More information

NATURAL LANGUAGE PROCESSING

NATURAL LANGUAGE PROCESSING NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity

More information

Contextual Search using Cognitive Discovery Capabilities

Contextual Search using Cognitive Discovery Capabilities Contextual Search using Cognitive Discovery Capabilities In this exercise, you will work with a sample application that uses the Watson Discovery service API s for cognitive search use cases. Discovery

More information

Seminar Recent Trends in Database Research

Seminar Recent Trends in Database Research Seminar Recent Trends in Database Research Summer Term 2013 Lehrgebiet Informationssysteme Weiping Qu qu@cs.uni-kl.de AG Datenbanken und Informationssysteme AG Heterogene Informationssysteme Goals a) Familiarize

More information

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material. Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material. 1 Contents Introduction Network properties Social network analysis Co-citation

More information

Promoting Ranking Diversity for Biomedical Information Retrieval based on LDA

Promoting Ranking Diversity for Biomedical Information Retrieval based on LDA Promoting Ranking Diversity for Biomedical Information Retrieval based on LDA Yan Chen, Xiaoshi Yin, Zhoujun Li, Xiaohua Hu and Jimmy Huang State Key Laboratory of Software Development Environment, Beihang

More information

Automated Path Ascend Forum Crawling

Automated Path Ascend Forum Crawling Automated Path Ascend Forum Crawling Ms. Joycy Joy, PG Scholar Department of CSE, Saveetha Engineering College,Thandalam, Chennai-602105 Ms. Manju. A, Assistant Professor, Department of CSE, Saveetha Engineering

More information

Search Engine Architecture. Hongning Wang

Search Engine Architecture. Hongning Wang Search Engine Architecture Hongning Wang CS@UVa CS@UVa CS4501: Information Retrieval 2 Document Analyzer Classical search engine architecture The Anatomy of a Large-Scale Hypertextual Web Search Engine

More information

Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha

Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha Physics Institute,

More information

Link Structure Analysis

Link Structure Analysis Link Structure Analysis Kira Radinsky All of the following slides are courtesy of Ronny Lempel (Yahoo!) Link Analysis In the Lecture HITS: topic-specific algorithm Assigns each page two scores a hub score

More information

Web Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India

Web Crawling. Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India Web Crawling Jitali Patel 1, Hardik Jethva 2 Dept. of Computer Science and Engineering, Nirma University, Ahmedabad, Gujarat, India - 382 481. Abstract- A web crawler is a relatively simple automated program

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

Example-Based Skeleton Extraction. Scott Schaefer Can Yuksel

Example-Based Skeleton Extraction. Scott Schaefer Can Yuksel Example-Based Skeleton Extraction Scott Schaefer Can Yuksel Example-Based Deformation Examples Previous Work Mesh-based Inverse Kinematics [Sumner et al. 2005], [Der et al. 2006] Example-based deformation

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation

More information