Scale Free Network Growth By Ranking. Santo Fortunato, Alessandro Flammini, and Filippo Menczer

Similar documents
Overlay (and P2P) Networks

arxiv:cs/ v1 [cs.ir] 26 Apr 2002

Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)

Empirical analysis of online social networks in the age of Web 2.0

Complex Networks. Structure and Dynamics


Structural Mining of. Thesis Defense Mark Meiss

Mining Web Data. Lijun Zhang

Exploring Econometric Model Selection Using Sensitivity Analysis

FILTERING OF URLS USING WEBCRAWLER

Phase Transitions in Random Graphs- Outbreak of Epidemics to Network Robustness and fragility

Compact Encoding of the Web Graph Exploiting Various Power Laws

The topology of the Web as a complex, scale-free network is now

The Value of Metadata

arxiv:cs/ v2 [cs.cy] 23 Aug 2006

2SAT Andreas Klappenecker

Graph Exploration: Taking the User into the Loop

5. Key ingredients for programming a random walk

Two-dimensional Totalistic Code 52

TELCOM2125: Network Science and Analysis

Primo Product Working Group Q&A Session. IGeLU 2011 Haifa

Selecting the Right Model Studio - Excel 2007 Version

A geometric model for on-line social networks

How to speed up a database which has gotten slow

Accelerated Life Testing Module Accelerated Life Testing - Overview

Parallel Computing Concepts. CSInParallel Project

LOESS curve fitted to a population sampled from a sine wave with uniform noise added. The LOESS curve approximates the original sine wave.

Detecting Behavior Propagation in BGP Trace Data Brian J. Premore Michael Liljenstam David Nicol

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

Distributed Pagerank for P2P Systems

Graph Structure Over Time

Linear Regression. Problem: There are many observations with the same x-value but different y-values... Can t predict one y-value from x. d j.

Mathematical Analysis of Google PageRank

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Link prediction with sequences on an online social network

Shuffling with a Croupier: Nat Aware Peer Sampling

Mining Web Data. Lijun Zhang

Week 7: The normal distribution and sample means

Chapter 2 Basic Structure of High-Dimensional Spaces

COMP5331: Knowledge Discovery and Data Mining

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

The missing links in the BGP-based AS connectivity maps

Week 3 Video 4. Automated Feature Generation Automated Feature Selection

CSE494 Information Retrieval Project C Report

Evolutionary Linkage Creation between Information Sources in P2P Networks

EXPLORING PATTERNS THAT INVOLVE SPACE AND MEASUREMENT: SOME EXAMPLE OF NON LINEAR PATTERNS OR GROWTH PATTERNS

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

Estimating Local Decision-Making Behavior in Complex Evolutionary Systems

Home Page. Title Page. Page 1 of 14. Go Back. Full Screen. Close. Quit

On Local Estimations of PageRank: A Mean Field Approach

CSE 546 Machine Learning, Autumn 2013 Homework 2

Youtube Graph Network Model and Analysis Yonghyun Ro, Han Lee, Dennis Won

Characterizing Gnutella Network Properties for Peer-to-Peer Network Simulation

COSC 311: ALGORITHMS HW1: SORTING

Keyword is the term used for the words that people type into search engines to find you:

Link Analysis. CSE 454 Advanced Internet Systems University of Washington. 1/26/12 16:36 1 Copyright D.S.Weld

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Motion estimation for video compression

Structure of Social Networks

How Do Real Networks Look? Networked Life NETS 112 Fall 2014 Prof. Michael Kearns

Properties of Biological Networks

Computational Intelligence Meets the NetFlix Prize

Algorithm efficiency can be measured in terms of: Time Space Other resources such as processors, network packets, etc.

5/13/2009. Introduction. Introduction. Introduction. Introduction. Introduction

Available Optimization Methods

Robotics. Lecture 5: Monte Carlo Localisation. See course website for up to date information.

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

Discovering. Algebra. An Investigative Approach. Condensed Lessons for Make-up Work

Overview. Monte Carlo Methods. Statistics & Bayesian Inference Lecture 3. Situation At End Of Last Week

Non Stationary Variograms Based on Continuously Varying Weights

Modeling Traffic of Information Packets on Graphs with Complex Topology

Building an Internet-Scale Publish/Subscribe System

An Attempt to Identify Weakest and Strongest Queries

Part 11: Collaborative Filtering. Francesco Ricci

An Evolving Network Model With Local-World Structure

Feature Selection. Department Biosysteme Karsten Borgwardt Data Mining Course Basel Fall Semester / 262

Social and Technological Network Data Analytics. Lecture 5: Structure of the Web, Search and Power Laws. Prof Cecilia Mascolo

ECLT 5810 Clustering

Implementation of the Gnutella Protocol

Phd. studies at Mälardalen University

Chapter 3. Graphs CLRS Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Machine Learning Reliability Techniques for Composite Materials in Structural Applications.

TELCOM2125: Network Science and Analysis

Characterizing Gnutella Network Properties for Peer-to-Peer Network Simulation

Constructing Overlay Networks through Gossip

Overlapping Ring Monitoring Algorithm in TIPC

Exam IST 441 Spring 2013

Degree Distribution, and Scale-free networks. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California

CS446: Machine Learning Fall Problem Set 4. Handed Out: October 17, 2013 Due: October 31 th, w T x i w

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

Knowledge Discovery and Data Mining

EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING

Statistical Matching using Fractional Imputation

Part 2 Introductory guides to the FMSP stock assessment software

Exam IST 441 Spring 2011

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

UNCLASSIFIED. R-1 Program Element (Number/Name) PE D8Z / Software Engineering Institute (SEI) Applied Research. Prior Years FY 2013 FY 2014

CITS4009 Introduction to Data Science

Web Structure, Age and Page Quality. Computer Science Department, University of Chile. Blanco Encalada 2120, Santiago, Chile.

Computer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis

Transcription:

Scale Free Network Growth By Ranking Santo Fortunato, Alessandro Flammini, and Filippo Menczer

Motivation Network growth is usually explained through mechanisms that rely on node prestige measures, such as degree or fitness. In many real networks, those who create and connect nodes do not know the prestige values of existing nodes but only their ranking by prestige. They proposed a criterion of network growth that relies on the ranking of the nodes according to any prestige measure, be it topological or not. The resulting network has a scale-free degree distribution when the probability to link a target node is any power-law function of its rank, even when one has only partial information of node ranks. Their criterion may explain the frequency and robustness of scale-free degree distributions in real networks, such as the Web graph.

The Model First, a prestige ranking criterion is selected. At the (t + 1)th iteration, the new node t + 1 is created and new links are set from it to m pre-existing nodes. The previous t nodes are ranked according to prestige, and the linking probability p(t + 1 -> j) that node t + 1 be connected to node j depends only on the rank R j of j: p(t +1 -> j) = t k=1 R j α R k α Where α > 0 is a real-valued parameter. The linking probability decreases with increasing rank. The choice of prestige measure is arbitrary.

Ranking by Age If the nodes are sorted by age, from the oldest to the newest, the label of each node coincides with its rank, i.e. R t = t t In this case, the linking probability is the same as that of the static network model We can calculate the number of links that the Rth node will attract since its creation. Suppose that the evolution of the network stops when N nodes are created. N The expected total number k R of links that the Rth node has attracted at the end is k R N = N mr α t t=r+1 j α j=1

Ranking by Age k R N = AR α N k Approximating with integrals gives that where A is a function of N N Knowing k R it is possible to find how many nodes have the same expected number of links k The probability p(k,n) that a node of the network has degree k is p(k, N) k (1+1/α ) This shows that the degree distribution of the network follows a power law with exponent γ =1+1/α for any value of α

Ranking by Degree The number of incoming links of a node represents how many times the node has been selected by its peers. For undirected networks, we can equivalently use the degree. Nodes with (in-)degree zero, which if present are a problem for the extension of other growth models to directed networks, do not raise an issue here because they have ranks expressed by positive numbers, like all other nodes. To see what kind of networks emerge with this new prestige measure, we cannot apply the same derivation as for ranking by age because the degree-based ranking of a node can change over time. For a growing network there is a strong correlation between the age of a node and its degree, as older nodes have more chances to receive links. Also the ranking of nodes according to degree is quite stable. Therefore, we expect the same result as for the ranking by age.

Ranking by Degree To verify this expectation, they performed Monte Carlo simulations of the network growth process with the new degree-based ranking. The inset shows the degree distributions of four networks, corresponding to various values of the exponent. In the logarithmic scale of the plot the tails appear as straight lines, as one would expect for scale-free distributions. To verify that the relation between α and the exponent γ of the degree distribution is the one predicted by the model, in the main plot various pairs (α,γ) were compared with the hyperbola and the agreement was clear.

Advantages Compared to mechanisms proposed in the past to explain the emergence of scale-free networks, the rank-based model presents three main advantages. First, it assumes less information is available to nodes (or node creators); it seems more realistic in many real cases to imagine that the relative importance of items is easier to access than their absolute importance. Second, the link attractiveness of nodes is by no means restricted to topology; it can depend on exogenous attributes of the nodes, which makes our model suitable for applications in many different contexts. Third, the criterion is more robust in that: (i) it naturally extends to directed networks; (ii) it leads to long-tailed degree distributions for a broad class of linking probability functions namely, power laws of rank with any exponent α>0, (iii) the scale-free degree distribution generated by the model is not affected by limiting the information available to subsets of nodes.

Applications The rank-based model is directly applicable to the Web as a special case, if one considers the role of search engines in the discovery of pages. When a user submits a query, the search engine ranks the results by various criteria including a topological prestige measure, PageRank, closely correlated with in-degree. Users do not know the PageRank of the search hits but observe their ranking and, thus, are more likely to discover and link pages that are ranked near the top. Assuming that users tend to link pages that they discover by searching, and that they are aware only of the pages returned by search engines in response to their queries, their model predicts a scale-free distribution of in-degree with exponent γ=2.1 This is in perfect agreement with established Web measurements