Homework 1 Yang Zhang

Part 1: Using test-sm.nt as the dataset for this part:

(1) Run python dd.py to get the degree distribution:

    Degree: 3, Frequency: 2
    Degree: 4, Frequency: 3
    Degree: 2, Frequency: 3

(2) Run python pr-d.py to get the PageRank of each vertex:

    Vertex: node:5, PageRank:
    Vertex: node:8, PageRank:
    Vertex: node:6, PageRank:
    Vertex: node:4, PageRank:
    Vertex: node:7, PageRank:
    Vertex: node:3, PageRank:
    Vertex: node:2, PageRank:
    Vertex: node:1, PageRank:

(3) Run python tr.py to get the number of triangles:

    Number of Triangles: 4

(4) Eccentricity of each vertex:

    The eccentricity of <node:1> is 2.
    The eccentricity of <node:2> is 3.
    The eccentricity of <node:3> is 3.
    The eccentricity of <node:4> is 3.
    The eccentricity of <node:5> is 2.
    The eccentricity of <node:6> is 2.
    The eccentricity of <node:7> is 3.
    The eccentricity of <node:8> is 3.

Part 4: The parallel implementation is much faster than the serial one. Each step in the serial implementation takes about 0.6 to 1.6 seconds, which means it would take thousands or even millions of seconds to reach convergence.

Explanation:

(1) Every SPARQL query issued from Python incurs a communication delay. The serial implementation issues queries far more often than the parallel one, so a lot of its time is wasted on SPARQL-Python communication.

(2) The time the SPARQL engine needs to handle a dataset is not linearly proportional to the size of the data: processing K items in one query takes much less than K times as long as processing a single item.
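Explanation (1) can be illustrated with a rough cost model. This is only a sketch: the function name, the fixed per-query overhead, and the per-step cost are hypothetical values picked for the example, not measurements from this homework, and the model deliberately ignores the sub-linear effect described in explanation (2).

```python
def estimated_seconds(total_steps, steps_per_query, overhead_s, per_step_s):
    """Rough wall-clock estimate when every query pays a fixed round-trip cost.

    All parameters are hypothetical; none of them is a measured value.
    """
    # Number of Python-to-SPARQL round trips (ceiling division).
    queries = -(-total_steps // steps_per_query)
    return queries * overhead_s + total_steps * per_step_s


# The same amount of walking, issued one step per query versus 1,000 per query.
serial = estimated_seconds(100_000, 1, overhead_s=0.5, per_step_s=0.001)
batched = estimated_seconds(100_000, 1_000, overhead_s=0.5, per_step_s=0.001)
print(f"serial: {serial:.0f} s, batched: {batched:.0f} s")
```

Under these assumed numbers the round-trip overhead dominates the serial run, which matches the qualitative argument above.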

For (2)(3)(5): A Random-Walk Way of Graph Path Ranking Using SPARQL

1. General Idea:

To implement the random walk algorithm on a directed graph, each path starts from a random vertex and repeatedly picks a random connected node to extend the path until it reaches the expected length. To make this parallel in SPARQL, K paths proceed at the same time. Running this process in a While loop keeps updating the count of visits to each vertex, and hence the percentage of visits to each vertex.

We assume, as in the PageRank concept, that the more important a vertex is, the more visits it gets. The percentage therefore represents the importance of the vertex within the graph. After a certain number of iterations, the percentages converge to equilibrium values. We define the convergence rate as the maximum change in the visit percentages of the vertices between two adjacent iterations. The algorithm is complete once the convergence rate drops below a threshold.

A path is scored by adding up the percentages of the nodes it includes. To avoid mis-scoring paths that contain cycles, a node is counted only once even if it was reached multiple times. A path with a higher score is more likely to connect the important nodes.

2. Development:

2.1 Framework:

Three working graphs are used in addition to the default graph that stores the data.

    Graph name: workinggraph0
    Function:   Holds the statistics (count, percentage, convergence rate).
    Format:     ?nodeId <temp:count> ?count
                ?nodeId <temp:percentage> ?percentage
                ?nodeId <temp:difference> ?difference

    Graph name: workinggraph1
    Function:   Saves all previous visits, both in completed paths and in
                the already finished steps of the ongoing paths.
    Format:     ?pathId <step:N> ?nodeId
                (N varies from 1 to LENGTH_OF_PATH)

    Graph name: workinggraph2
    Function:   A buffer graph that temporarily saves the newly generated
                next steps of the ongoing paths. Data in workinggraph2 is
                moved to workinggraph1 before the next step is executed.
    Format:     ?pathId <step:M> ?nodeId
                (M is the ongoing step)
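The general idea can be sketched in plain Python before any SPARQL is involved. The sketch below is an in-memory stand-in, not the homework's implementation: the adjacency dictionary, the function names, the toy graph, and the fixed number of iterations are all assumptions made for illustration.

```python
import random
from collections import Counter


def walk_batch(edges, k, length, rng):
    """Run k random walks of at most `length` nodes on a directed graph.

    `edges` maps each vertex to a list of its successors (an assumed
    in-memory stand-in for the RDF data queried through SPARQL).
    """
    vertices = sorted(set(edges) | {v for succ in edges.values() for v in succ})
    paths = [[rng.choice(vertices)] for _ in range(k)]
    for _ in range(length - 1):
        for path in paths:
            successors = edges.get(path[-1], [])
            if successors:  # a walk that hits a dead end simply stops early
                path.append(rng.choice(successors))
    return paths


def visit_shares(counts):
    """Turn per-vertex visit counts into fractions of all visits."""
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


# Toy graph (hypothetical): every other vertex links to "a".
edges = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}
rng = random.Random(0)
counts = Counter()
previous = {}
for _ in range(200):
    for path in walk_batch(edges, k=50, length=5, rng=rng):
        counts.update(path)
    shares = visit_shares(counts)
    # Convergence rate: largest change in any vertex's share since the last pass.
    rate = max(abs(shares[v] - previous.get(v, 0.0)) for v in shares)
    previous = shares
print(max(shares, key=shares.get), round(rate, 6))
```

Each pass of the outer loop corresponds to one iteration of K parallel paths; the shares play the role of the percentages and `rate` the role of the convergence rate.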

2.2 Generating random starting vertices:

    SELECT DISTINCT ?startNode WHERE {
      { ?startNode ?p ?o } UNION { ?o ?p ?startNode }
      BIND(RAND() AS ?sortKey)
    } ORDER BY ?sortKey
    LIMIT 1   # for serial processing; LIMIT 2 for parallel processing

2.3 Routing:

Randomly selecting a connection to extend a path is similar to generating random starting vertices, except that an inner projection needs to be generated for each path:

    SELECT ?pathId (SAMPLE(?_nextNode) AS ?nextNode) {
      SELECT ?pathId ?currentNode ?_nextNode {
        GRAPH <...> { ?pathId <step:n> ?currentNode }
        ?currentNode ?p ?_nextNode
        BIND(RAND() AS ?orderKey)
      } ORDER BY ?orderKey
    } GROUP BY ?pathId

2.4 Updating the statistics:

After each iteration, the newly generated visits are moved from workinggraph2 to workinggraph1. The query then counts both the total number of all previous visits and the number of visits to each vertex. Dividing the latter by the former gives the percentage of visits to each vertex. To update the convergence rate, the difference between the new and the previous percentage of each node is calculated; the maximum of these differences is the convergence rate of the current iteration.

2.5 Checking for convergence:

A buffer list of size 10 is defined in Python to store the convergence rates of the ten most recent iterations. When a new iteration finishes, the first element of the buffer list is popped and the new convergence rate is appended to the end. The random walk is considered convergent, and the iterations stop, only when all 10 convergence rates are smaller than the threshold. This prevents premature termination caused by two coincidentally similar iterations. The convergence threshold is set to 1 / (number of vertices × 10).

2.6 Path Ranking:

Retrieve the nodes on a path and add up their percentages to obtain the score of the path.
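As a cross-check for the scoring rule, the same computation can be sketched in plain Python. The function name and the sample percentages and paths are hypothetical; only the rule itself (sum the percentages of the distinct nodes on a path, then sort by score) comes from the text.

```python
def score_paths(paths, shares):
    """Score each path as the sum of the visit shares of its distinct nodes."""
    scored = []
    for path_id, nodes in paths.items():
        distinct = list(dict.fromkeys(nodes))  # drop repeats, keep first-visit order
        score = sum(shares.get(node, 0.0) for node in distinct)
        scored.append((path_id, score, "->".join(distinct)))
    # Highest score first, mirroring a descending sort on the score.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical inputs for illustration only.
shares = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
paths = {"p1": ["a", "b", "a", "c"], "p2": ["d", "c", "d"]}
ranking = score_paths(paths, shares)
for path_id, score, nodes in ranking:
    print(path_id, score, nodes)
```

Path p1 revisits node a, but a contributes to the score only once, which is the point of counting each node a single time on paths that contain cycles.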

In SPARQL, the path-ranking query is:

    SELECT ?pathId (SUM(?per) AS ?score)
           (GROUP_CONCAT(?nodeId; SEPARATOR="->") AS ?nodes)
    WHERE {
      SELECT DISTINCT ?pathId ?nodeId ?per WHERE {
        GRAPH <...> { ... }
        GRAPH <...> { ?nodeId <temp:percentage> ?per . }
      }
    }
    GROUP BY ?pathId
    ORDER BY DESC(?score)

3. Implementation

3.1 Serial

    While True:
        getstartnode(1);
        For step = 1 to LENGTH_OF_PATH
            getnextnode();
        Endfor
        If isconvergent():
            Break;  //if convergent, jump out of while loop
        Endif
    PathRanking();

3.2 K-Parallel

The parallel version is very similar to the serial one, except that it gets K start nodes and extends K paths at the same time. The value K (or 1 for the serial case) is passed as a parameter to the SPARQL queries.

    While True:
        getstartnode(k);
        For step = 1 to LENGTH_OF_PATH
            getnextnodeforkpaths();
        Endfor
        If isconvergent():
            Break;  //if convergent, jump out of while loop
        Endif
    PathRanking();
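The loop and its ten-iteration convergence buffer can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class and function names, the `max_iterations` safety cap, the halving sequence of fake convergence rates, and the threshold of 1 / (number of vertices × 10) are all illustrative choices, and `iterate` stands in for one full pass of start-node selection and path extension.

```python
from collections import deque


class ConvergenceWindow:
    """Keep the most recent convergence rates and test them against a threshold."""

    def __init__(self, threshold, size=10):
        self.threshold = threshold
        self.rates = deque(maxlen=size)  # the oldest rate is dropped automatically

    def push(self, rate):
        self.rates.append(rate)

    def is_convergent(self):
        # Require a full window so a couple of lucky iterations cannot stop the loop.
        full = len(self.rates) == self.rates.maxlen
        return full and all(r < self.threshold for r in self.rates)


def run_until_convergent(iterate, threshold, max_iterations=10_000):
    """Call `iterate()` (one pass of walks, returning its convergence rate)
    until the window reports convergence; return the number of passes used."""
    window = ConvergenceWindow(threshold)
    for iteration in range(1, max_iterations + 1):
        window.push(iterate())
        if window.is_convergent():
            return iteration
    return None  # did not converge within the iteration budget


# Hypothetical rates standing in for real iterations: each pass halves the rate.
state = {"rate": 1.0}


def fake_iterate():
    state["rate"] /= 2
    return state["rate"]


num_vertices = 8  # assumed toy size
passes = run_until_convergent(fake_iterate, threshold=1 / (num_vertices * 10))
print(passes)
```

Using `deque(maxlen=10)` gives the pop-oldest, push-newest behaviour of the buffer list without explicit index handling; path ranking would run once after the loop returns.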

4. Case Study:

We tested the implementation on a VMware Ubuntu virtual machine with 2 GB of memory assigned. The test graph has 10,000 vertices and 104,250 edges.

When testing the serial implementation, we found that each step takes 0.6 to 1.7 seconds, which means it takes 6,000 to 17,000 seconds to generate 10,000 random paths, and that is still far from reaching convergence. The serial approach is far too time-consuming.

We then ran the algorithm in parallel with 100, 500, 1000, and 5000 threads respectively. The following figure shows the relationship between the convergence rate and the execution time: the 500- and 1000-Parallel runs reach convergence faster than the 100- and 5000-Parallel runs. This suggests that a suitable number of parallel paths is around 5% to 10% of the number of vertices in the graph. Too many or too few parallel paths degrade performance.

[Figure: convergence rate versus time (s) for the 100-, 500-, 1000-, and 5000-Parallel runs, with the convergence threshold marked.]

Link Analysis in the Cloud

Link Analysis in the Cloud Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

Graph Algorithms. Revised based on the slides by Ruoming Kent State

Graph Algorithms. Revised based on the slides by Ruoming Kent State Graph Algorithms Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/

More information

University of Maryland. Tuesday, March 2, 2010

University of Maryland. Tuesday, March 2, 2010 Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

1. The following graph is not Eulerian. Make it into an Eulerian graph by adding as few edges as possible.

1. The following graph is not Eulerian. Make it into an Eulerian graph by adding as few edges as possible. 1. The following graph is not Eulerian. Make it into an Eulerian graph by adding as few edges as possible. A graph is Eulerian if it has an Eulerian circuit, which occurs if the graph is connected and

More information

PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance

PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance Atsuto Kubo, Hiroyuki Nakayama, Hironori Washizaki, Yoshiaki Fukazawa Waseda University Department of Computer Science

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Information Networks: PageRank

Information Networks: PageRank Information Networks: PageRank Web Science (VU) (706.716) Elisabeth Lex ISDS, TU Graz June 18, 2018 Elisabeth Lex (ISDS, TU Graz) Links June 18, 2018 1 / 38 Repetition Information Networks Shape of the

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford

More information

COMP5331: Knowledge Discovery and Data Mining

COMP5331: Knowledge Discovery and Data Mining COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank

More information

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011 Data-Intensive Information Processing Applications! Session #5 Graph Algorithms Jordan Boyd-Graber University of Maryland Thursday, March 3, 2011 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

CS 6604: Data Mining Large Networks and Time-Series

CS 6604: Data Mining Large Networks and Time-Series CS 6604: Data Mining Large Networks and Time-Series Soumya Vundekode Lecture #12: Centrality Metrics Prof. B Aditya Prakash Agenda Link Analysis and Web Search Searching the Web: The Problem of Ranking

More information

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information

More information

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO Lecture 7: Information Retrieval II. Aidan Hogan

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO Lecture 7: Information Retrieval II. Aidan Hogan CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2017 Lecture 7: Information Retrieval II Aidan Hogan aidhog@gmail.com How does Google know about the Web? Inverted Index: Example 1 Fruitvale Station is a 2013

More information

Social Networks 2015 Lecture 10: The structure of the web and link analysis

Social Networks 2015 Lecture 10: The structure of the web and link analysis 04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information

More information

COMP Page Rank

COMP Page Rank COMP 4601 Page Rank 1 Motivation Remember, we were interested in giving back the most relevant documents to a user. Importance is measured by reference as well as content. Think of this like academic paper

More information

VISUAL RERANKING USING MULTIPLE SEARCH ENGINES

VISUAL RERANKING USING MULTIPLE SEARCH ENGINES VISUAL RERANKING USING MULTIPLE SEARCH ENGINES By Dennis Lim Thye Loon A REPORT SUBMITTED TO Universiti Tunku Abdul Rahman in partial fulfillment of the requirements for the degree of Faculty of Information

More information

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page Agenda Math 104 1 Google PageRank algorithm 2 Developing a formula for ranking web pages 3 Interpretation 4 Computing the score of each page Google: background Mid nineties: many search engines often times

More information

Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis

Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis Due by 11:59:59pm on Tuesday, March 16, 2010 This assignment is based on a similar assignment developed at the University of Washington. Running

More information

Unsupervised Learning. Pantelis P. Analytis. Introduction. Finding structure in graphs. Clustering analysis. Dimensionality reduction.

Unsupervised Learning. Pantelis P. Analytis. Introduction. Finding structure in graphs. Clustering analysis. Dimensionality reduction. March 19, 2018 1 / 40 1 2 3 4 2 / 40 What s unsupervised learning? Most of the data available on the internet do not have labels. How can we make sense of it? 3 / 40 4 / 40 5 / 40 Organizing the web First

More information

Graph Data Processing with MapReduce

Graph Data Processing with MapReduce Distributed data processing on the Cloud Lecture 5 Graph Data Processing with MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, 2015 (licensed under Creation Commons Attribution

More information

Popularity of Twitter Accounts: PageRank on a Social Network

Popularity of Twitter Accounts: PageRank on a Social Network Popularity of Twitter Accounts: PageRank on a Social Network A.D-A December 8, 2017 1 Problem Statement Twitter is a social networking service, where users can create and interact with 140 character messages,

More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

2013/2/12 EVOLVING GRAPH. Bahman Bahmani(Stanford) Ravi Kumar(Google) Mohammad Mahdian(Google) Eli Upfal(Brown) Yanzhao Yang

2013/2/12 EVOLVING GRAPH. Bahman Bahmani(Stanford) Ravi Kumar(Google) Mohammad Mahdian(Google) Eli Upfal(Brown) Yanzhao Yang 1 PAGERANK ON AN EVOLVING GRAPH Bahman Bahmani(Stanford) Ravi Kumar(Google) Mohammad Mahdian(Google) Eli Upfal(Brown) Present by Yanzhao Yang 1 Evolving Graph(Web Graph) 2 The directed links between web

More information

A project report submitted to Indiana University

A project report submitted to Indiana University Sequential Page Rank Algorithm Indiana University, Bloomington Fall-2012 A project report submitted to Indiana University By Shubhada Karavinkoppa and Jayesh Kawli Under supervision of Prof. Judy Qiu 1

More information

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul 1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given

More information

Personalized Web Search

Personalized Web Search Personalized Web Search Dhanraj Mavilodan (dhanrajm@stanford.edu), Kapil Jaisinghani (kjaising@stanford.edu), Radhika Bansal (radhika3@stanford.edu) Abstract: With the increase in the diversity of contents

More information

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33 Chapter 10 Fundamental Network Algorithms M. E. J. Newman May 6, 2015 M. E. J. Newman Chapter 10 May 6, 2015 1 / 33 Table of Contents 1 Algorithms for Degrees and Degree Distributions Degree-Degree Correlation

More information

Data structures are often needed to provide organization for large sets of data.

Data structures are often needed to provide organization for large sets of data. Motivation Data structures are often needed to provide organization for large sets of data. Skip Lists However, traditional approaches offer a tradeoff between insertion/deletion and search performance:

More information

Pagerank Scoring. Imagine a browser doing a random walk on web pages:

Pagerank Scoring. Imagine a browser doing a random walk on web pages: Ranking Sec. 21.2 Pagerank Scoring Imagine a browser doing a random walk on web pages: Start at a random page At each step, go out of the current page along one of the links on that page, equiprobably

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 5: Analyzing Graphs (2/2) February 2, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.) ' Sta306b May 11, 2012 $ PageRank: 1 Web search before Google (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.) & % Sta306b May 11, 2012 PageRank: 2 Web search

More information

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

Introduction To Graphs and Networks. Fall 2013 Carola Wenk Introduction To Graphs and Networks Fall 2013 Carola Wenk What is a Network? We have thought of a computer as a single entity, but they can also be connected to one another. Internet What are the advantages

More information

ORGANIZING THE DATA IN A FREQUENCY TABLE

ORGANIZING THE DATA IN A FREQUENCY TABLE ORGANIZING THE DATA IN A FREQUENCY TABLE Suppose the scores obtained by 5 students on a standardized test are as follows: 68, 55, 61, 55, 43, 59, 55, 58, 77, 6, 56, 53, 58, 7, 57, 62, 5, 69, 44, 63, 48,79,

More information

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of

More information

Subtracting with Multi-Digit Numbers Adaptable for 2 nd, 3 rd, 4 th, and 5 th grades*

Subtracting with Multi-Digit Numbers Adaptable for 2 nd, 3 rd, 4 th, and 5 th grades* Subtracting with Multi-Digit Numbers Adaptable for 2 nd, 3 rd, 4 th, and 5 th grades* *Please note that this lesson will be most effective after students have been taught a conceptual foundation in subtraction

More information

Temporal Graphs KRISHNAN PANAMALAI MURALI

Temporal Graphs KRISHNAN PANAMALAI MURALI Temporal Graphs KRISHNAN PANAMALAI MURALI METRICFORENSICS: A Multi-Level Approach for Mining Volatile Graphs Authors: Henderson, Eliassi-Rad, Faloutsos, Akoglu, Li, Maruhashi, Prakash and Tong. Published:

More information

Web Structure Mining using Link Analysis Algorithms

Web Structure Mining using Link Analysis Algorithms Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.

More information

COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS

COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS Annales Univ. Sci. Budapest., Sect. Comp. 43 (2014) 57 68 COMMUNITY SHELL S EFFECT ON THE DISINTEGRATION OF SOCIAL NETWORKS Imre Szücs (Budapest, Hungary) Attila Kiss (Budapest, Hungary) Dedicated to András

More information

EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING

EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING Chapter 3 EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING 3.1 INTRODUCTION Generally web pages are retrieved with the help of search engines which deploy crawlers for downloading purpose. Given a query,

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

More information

Great Ideas of Computer Science: Random Walks and Searching the Web

Great Ideas of Computer Science: Random Walks and Searching the Web CS/MA 109 Fall 2016 Wayne Snyder Department Boston University Great Ideas of : Random Walks and Searching the Web Internet Search Engines Earlier search engines were fairly useless, because they could

More information

Computer and Programming: Lab 1

Computer and Programming: Lab 1 01204111 Computer and Programming: Lab 1 Name ID Section Goals To get familiar with Wing IDE and learn common mistakes with programming in Python To practice using Python interactively through Python Shell

More information

CSI 445/660 Part 10 (Link Analysis and Web Search)

CSI 445/660 Part 10 (Link Analysis and Web Search) CSI 445/660 Part 10 (Link Analysis and Web Search) Ref: Chapter 14 of [EK] text. 10 1 / 27 Searching the Web Ranking Web Pages Suppose you type UAlbany to Google. The web page for UAlbany is among the

More information

Analysis of Algorithms

Analysis of Algorithms Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and

More information

The PageRank Citation Ranking

The PageRank Citation Ranking October 17, 2012 Main Idea - Page Rank web page is important if it points to by other important web pages. *Note the recursive definition IR - course web page, Brian home page, Emily home page, Steven

More information

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012

Python & Web Mining. Lecture Old Dominion University. Department of Computer Science CS 495 Fall 2012 Python & Web Mining Lecture 6 10-10-12 Old Dominion University Department of Computer Science CS 495 Fall 2012 Hany SalahEldeen Khalil hany@cs.odu.edu Scenario So what did Professor X do when he wanted

More information

This document describes how I implement the Newton method using Python and Fortran on the test function f(x) = (x 1) log 10 (x).

This document describes how I implement the Newton method using Python and Fortran on the test function f(x) = (x 1) log 10 (x). AMS 209 Foundations of Scientific Computing Homework 6 November 23, 2015 Cheng-Han Yu This document describes how I implement the Newton method using Python and Fortran on the test function f(x) = (x 1)

More information

COMP 4601 Hubs and Authorities

COMP 4601 Hubs and Authorities COMP 4601 Hubs and Authorities 1 Motivation PageRank gives a way to compute the value of a page given its position and connectivity w.r.t. the rest of the Web. Is it the only algorithm: No! It s just one

More information

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Simulated Annealing. Slides based on lecture by Van Larhoven

Simulated Annealing. Slides based on lecture by Van Larhoven Simulated Annealing Slides based on lecture by Van Larhoven Iterative Improvement 1 General method to solve combinatorial optimization problems Principle: Start with initial configuration Repeatedly search

More information

Sampling Large Graphs for Anticipatory Analysis

Sampling Large Graphs for Anticipatory Analysis Sampling Large Graphs for Anticipatory Analysis Lauren Edwards*, Luke Johnson, Maja Milosavljevic, Vijay Gadepally, Benjamin A. Miller IEEE High Performance Extreme Computing Conference September 16, 2015

More information

Design Guide- Mobility

Design Guide- Mobility Proxim Wireless. All rights reserved. 1 Purpose This document serves as a reference guide for the mobility network designers to plan and design a mobility network that suits their requirement. Mobility

More information

Query Answering Using Inverted Indexes

Query Answering Using Inverted Indexes Query Answering Using Inverted Indexes Inverted Indexes Query Brutus AND Calpurnia J. Pei: Information Retrieval and Web Search -- Query Answering Using Inverted Indexes 2 Document-at-a-time Evaluation

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing

More information

Geographic Routing Without Location Information. AP, Sylvia, Ion, Scott and Christos

Geographic Routing Without Location Information. AP, Sylvia, Ion, Scott and Christos Geographic Routing Without Location Information AP, Sylvia, Ion, Scott and Christos Routing in Wireless Networks Distance vector DSDV On-demand DSR, TORA, AODV Discovers and caches routes on demand Geographic

More information

Structure of Social Networks

Structure of Social Networks Structure of Social Networks Outline Structure of social networks Applications of structural analysis Social *networks* Twitter Facebook Linked-in IMs Email Real life Address books... Who Twitter #numbers

More information

Graphs (Part II) Shannon Quinn

Graphs (Part II) Shannon Quinn Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation

More information

L22-23: Graph Algorithms

L22-23: Graph Algorithms Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत Department of Computational and Data Sciences DS 0--0,0 L-: Graph Algorithms Yogesh Simmhan simmhan@cds.iisc.ac.in Slides

More information

CS101 Lecture 30: How Search Works and searching algorithms.

CS101 Lecture 30: How Search Works and searching algorithms. CS101 Lecture 30: How Search Works and searching algorithms. John Magee 5 August 2013 Web Traffic - pct of Page Views Source: alexa.com, 4/2/2012 1 What You ll Learn Today Google: What is a search engine?

More information

Link Structure Analysis

Link Structure Analysis Link Structure Analysis Kira Radinsky All of the following slides are courtesy of Ronny Lempel (Yahoo!) Link Analysis In the Lecture HITS: topic-specific algorithm Assigns each page two scores a hub score

More information

CSE494 Information Retrieval Project C Report

CSE494 Information Retrieval Project C Report CSE494 Information Retrieval Project C Report By: Jianchun Fan Introduction In project C we implement several different clustering methods on the query results given by pagerank algorithms. The clustering

More information

CS6200 Information Retreival. The WebGraph. July 13, 2015

CS6200 Information Retreival. The WebGraph. July 13, 2015 CS6200 Information Retreival The WebGraph The WebGraph July 13, 2015 1 Web Graph: pages and links The WebGraph describes the directed links between pages of the World Wide Web. A directed edge connects

More information

The Kinect Sensor. Luís Carriço FCUL 2014/15
