My Best Current Friend in a Social Network

Similar documents
Lecture #3: PageRank Algorithm The Mathematics of Google Search

Link Analysis and Web Search

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION

DSCI 575: Advanced Machine Learning. PageRank Winter 2018

Mathematical Analysis of Google PageRank

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

CPSC 532L Project Development and Axiomatization of a Ranking System

Fast Iterative Solvers for Markov Chains, with Application to Google's PageRank. Hans De Sterck

Web Structure Mining using Link Analysis Algorithms

Link Analysis from Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer and other material.

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

THE KNOWLEDGE MANAGEMENT STRATEGY IN ORGANIZATIONS. Summer semester, 2016/2017

Page rank computation HPC course project a.y Compute efficient and scalable Pagerank

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Graph Algorithms. Revised based on the slides by Ruoming Kent State

A New Technique for Ranking Web Pages and Adwords

Algorithms, Games, and Networks February 21, Lecture 12

Application of PageRank Algorithm on Sorting Problem Su weijun1, a

PageRank and related algorithms

Extracting Information from Complex Networks

Link Analysis in the Cloud

Supervised Random Walks

An Improved Computation of the PageRank Algorithm 1

Graph Data Processing with MapReduce

PAGE RANK ON MAP- REDUCE PARADIGM

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

Trust in the Internet of Things From Personal Experience to Global Reputation. 1 Nguyen Truong PhD student, Liverpool John Moores University

University of Maryland. Tuesday, March 2, 2010

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

2.3 Algorithms Using Map-Reduce

Large-Scale Networks. PageRank. Dr Vincent Gramoli Lecturer School of Information Technologies

Web search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)

PageRank Algorithm Abstract: Keywords: I. Introduction II. Text Ranking Vs. Page Ranking

Social Network Analysis

Complex-Network Modelling and Inference

Week 5 Video 5. Relationship Mining Network Analysis

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

The PageRank Citation Ranking

The PageRank Computation in Google, Randomized Algorithms and Consensus of Multi-Agent Systems

.. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

Web consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page

Mining Web Data. Lijun Zhang

A GEOGRAPHICAL LOCATION INFLUENCED PAGE RANKING TECHNIQUE FOR INFORMATION RETRIEVAL IN SEARCH ENGINE

International Journal of Advance Engineering and Research Development. A Review Paper On Various Web Page Ranking Algorithms In Web Mining

Agenda. Math Google PageRank algorithm. 2 Developing a formula for ranking web pages. 3 Interpretation. 4 Computing the score of each page

Mining Web Data. Lijun Zhang

Page Rank Link Farm Detection

SGN (4 cr) Chapter 11

WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK AN OVERVIEW

Information Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group

COMP Page Rank

Social Networks 2015 Lecture 10: The structure of the web and link analysis

How Google Finds Your Needle in the Web's

CS6200 Information Retreival. The WebGraph. July 13, 2015

Graph Theory. Network Science: Graph theory. Graph theory Terminology and notation. Graph theory Graph visualization

Lecture 11: Graph algorithms! Claudia Hauff (Web Information Systems)!

Some rankings based on PageRank applied to the Valencia Metro

Seismic regionalization based on an artificial neural network

CS 6604: Data Mining Large Networks and Time-Series

Motivation. Motivation

Chapter Introduction

PageRank. CS16: Introduction to Data Structures & Algorithms Spring 2018

Efficient Mining Algorithms for Large-scale Graphs

ScienceDirect A NOVEL APPROACH FOR ANALYZING THE SOCIAL NETWORK

Link Analysis. Link Analysis

Finding Top UI/UX Design Talent on Adobe Behance

A ROUTING MECHANISM BASED ON SOCIAL NETWORKS AND BETWEENNESS CENTRALITY IN DELAY-TOLERANT NETWORKS

Rank Measures for Ordering

A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE

TELCOM2125: Network Science and Analysis

Robustness of Centrality Measures for Small-World Networks Containing Systematic Error

An Enhanced Page Ranking Algorithm Based on Weights and Third level Ranking of the Webpages

Chapter 4: Implicit Error Detection

6. NEURAL NETWORK BASED PATH PLANNING ALGORITHM 6.1 INTRODUCTION

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Big Data Analytics CSCI 4030

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

Information Retrieval. Lecture 11 - Link analysis

Using Google s PageRank Algorithm to Identify Important Attributes of Genes

CS224W Final Report Emergence of Global Status Hierarchy in Social Networks

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

World Wide Web has specific challenges and opportunities

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

The application of Randomized HITS algorithm in the fund trading network

A SHARKOVSKY THEOREM FOR VERTEX MAPS ON TREES

On Finding Power Method in Spreading Activation Search

Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-"&"3 -"(' ( +-" " " % '.+ % ' -0(+$,

Mining Social Network Graphs

Using Non-Linear Dynamical Systems for Web Searching and Ranking

INTRODUCTION TO DATA SCIENCE. Link Analysis (MMDS5)

Part 1: Link Analysis & Page Rank

CENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Clustering: Classic Methods and Modern Views

Supplementary Information

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

Transcription:

Procedia Computer Science Volume 51, 2015, Pages 2903 2907 ICCS 2015 International Conference On Computational Science My Best Current Friend in a Social Network Francisco Moreno 1, Santiago Hernández 2, and Edison Ospina 1 1 Department of Computer and Decision Sciences 2 Department of Mathematics Universidad Nacional de Colombia, Medellín Campus {fjmoreno, sahernandezto, ecospinaa}@unal.edu.co Abstract Due to its popularity, social networks (SNs) have been subject to different analyses. A research field in this area is the identification of several types of users and groups. To make the identification process easier, a SN is usually represented through a graph. Usual tools to analyze a graph are the centrality measures, which identify the most important vertices. One of these measures is the PageRank (a measure originally designed to classify web pages). Informally, in the context of a SN, the PageRank of a user i represents the probability that another user of the SN is seeing the page of i after a considerable time of navigation in the SN. In this paper, we define a new type of user in a SN: the best current friend. Informally, the idea is to identify, among the friends of a user i, who is the friend k that would generate the highest decrease in the PageRank of i if k stops being his/her friend. This may be useful to identify the users/customers whose friendship/relationship should be a priority to keep. 1 Introduction Based on the relationships established by the members of a community, e.g., the users of a social network (SN), different types of users and user groups can be identified. For instance, leader users (Pedroche 2010b; Pedroche 2012), the best potential friends of a user (Moreno et al. 2013), the users that exhibit a distrust behavior (Ortega et al. 2012), and the efficient information spreaders (Kitsak et al. 2010), among others. With regard to groups, in (Pedroche 2010a) user groups that compete for visibility in the community are identified and in (Masuda et al. 2013) user groups of depressive and suicidal communities are analyzed. To facilitate the identification and analysis of these types of users and user groups, the community of users and their relationships are usually represented through some mechanism. For example, a SN is usually represented through a graph. Usual tools to analyze a graph are the centrality measures (Masuda et al. 2013), which identify the most important vertices. These measures include the degree Selection and peer-review under responsibility of the Scientific Programme Committee of ICCS 2015 c The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2015.05.472 2903

centrality which measures the number of links of a node, the closeness centrality determined by the length of the shortest paths from one node to the rest of the nodes of the network, the betweeness centrality that is based on the total number of shortest paths that exist among all the pairs of nodes that pass through a node, and the PageRank. The PageRank is a measure originally designed to classify web pages (Page et al. 1999). Informally, the PageRank of a web page p represents the probability that a web surfer is visiting p after a considerable time of navigation in the web. In this short paper, we define a new type of user in a SN based on the PageRank: the best current friend (BCF). Informally, our goal is to identify among the friends of a user i, who is the friend k that would generate the highest decrease in the PageRank of i if k stops being his/her friend. This may be useful to identify the users whose relationship (business, friendship, and family) should be a priority to keep. These users are a key element for the executives and for a company to get future customers. For instance, it is important for the sales executives of a company to detect this type of users to keep and strengthen their relationships, e.g., offering them extra benefits and customized services. This paper is organized as follows. In Section 2, we present the basic elements of the PageRank method. In Section 3, we formally introduce the concept of the BCF based on the PageRank. In Section 4, we conclude the paper and outline future work. 2 Basic Definitions The users and their relationships in a SN may be represented by a graph. For example, consider a SN with n = 5 users represented with a directed graph GSN = (N, E), where N represents the set of nodes {1, 2, 3, 4, 5} and E the set of edges {(1, 2), (1, 3), (2, 1), (2, 3), (2, 4), (3, 1), (3, 4), (4, 2), (4, 5), (5, 2)}, see Figure 1. An edge (i, j) indicates that i is friend of j (i points to j). Note that this representation supports both unidirectional and bidirectional relationships, e.g., in SNs such as Twitter a user w follows a user z but z not necessarily follows w, i.e., they may exhibit a unidirectional Figure 1: SN with 5 nodes, represented with a directed graph. relationship. Our goal is to classify the nodes (users) of a SN applying the PageRank method ( Page et al. 1999; Pedroche 2010b; Pedroche 2012). Note that in a SN it is reasonable to assume that each user points to at least to one friend (outlink). This is a mandatory condition to apply the PageRank method, i.e., there must not be dangling nodes (Pedroche 2010b). 2904

To apply the PageRank method, first we build a connectivity matrix H = (h ij) R n n, 1 i, j n, that represents the links of each node. If there exists a link from node i to node j, i j, then h ij = 1, otherwise h ij = 0; if i = j then h ii = 0. From H matrix we build the row stochastic matrix P = (p ij) R n n, 1 i, j n. A matrix is row stochastic if the sum of the elements of each of its rows is 1. P is calculated by dividing each element h ij by the sum of the elements of row i of H. Note that because we assume that there do not exist dangling nodes then this sum (in each row) cannot be zero. The PageRank method requires that the P matrix, in addition to be row stochastic, must be primitive. A non-negative square matrix is primitive (Varga 2009) if the number of distinct eigenvalues of the matrix whose absolute value is equal to the spectral radius ρ(p) is 1, where ρ(p) is the maximum value (in absolute value) of its eigenvalues. To ensure this property (and still preserving the row stochastic property), we apply the following transformation (Page et al. 1999; Pedroche 2007). G = P + (1 )ev T Where G is known as Google matrix. is a damping factor, 0 < < 1, and represents the probability with which the surfer of the network moves among the links of the H matrix, and (1- ) represents the probability of the surfer to randomly navigate to a link which is not among the links of H. Note that if = 1, then G = P, i.e., we would be working with the initial P matrix. Usually, is set to 0.85, a value that was established by Brin and Page, the creators of the PageRank method (Page et al. 1999; Pedroche 2007). In (Becchetti & Castillo 2006; Boldi 2005) the effect of several values of is analized. On the other hand, e R n 1 is the vector of all ones and v T e = 1. v is called personalization or teletransportation vector and may be used to affect (to benefit or to harm) the ranking of the nodes of the network (Pedroche 2007): v = (v i) R n 1 : v i > 0, 1 i n. Usually, v = (1/n) and is known as the basic personalization vector. However, if we want to affect the ranking of a specific node i, v may be defined as follows: Let 0 < ε < 1 then v i = (v ij) R n 1 : v ij = ε / (n - 1) for i j, v ii = 1 - ε,. Thus, when ε is close to zero, the ranking of node i tends to increase, but if ε is close to one, its ranking tends to decrease. A value commonly used in the literature for ε is 0.3. We denote PPR (Personalized PageRank) as the PageRank of a node using some pre-scribed personalization vector v j and we denote PR j the PageRank vector computed using v j. From G matrix we can compute the PageRank vector π. To compute vectorπ we consider the following system of equations π T = π T G, where π T = [q 1 q 2 q 3 q 4 q 5]. In addition, to ensure that π is a probability vector, we also consider the equation: q 1 + q 2 + q 3 + q 4 + q 5 = 1. For the running example, we solved the system using MATLAB; results are showed in Table 1. Results show that node 2 has the highest PageRank whereas node 5 has the lowest one. Node PageRank 1 0.1972 2 0.2944 3 0.1972 4 0.1972 5 0.1138 Table 1: PageRank vector π. Highest Lowest 2905

3 The BCF We introduce the concept of the BCF of a node in a SN. The BCF of a node i is the node k of the SN, k i, H[k, i] = 1, such that if k stops being friend of i (k stops pointing to i), k is the node that generates the highest decrease in the PageRank of i. That is, let GSN = (N, E) be the initial graph that represents the SN. Let i(gsn) denote the i component of the PPR for some personalization vector v. Given i N, let: Q(i ) = { j N: i j, (j, i) E }, i.e., the set of nodes that point to i. Let E (j, i) = E - {(j, i)}, with some j Q(i ), i.e., the initial set of edges E minus the edge from j to i, and let GSN (j, i) = (N, E (j, i)). Then we say that k Q(i) is the BCF of i if the following condition holds: i(gsn (k, i)) = min( i(gsn (j, i))), j Q(i). Example. Consider the SN of Figure 1. Currently the PageRank of node 2 (the node with the highest PageRank) is 0.2944. In Table 2, we show the change in the PageRank of this node depending on the node that has been disconnected. In this example, node 1 is the BCF of node 2 because if it is disconnected, it will be the node that decreases the most the PageRank in the node 2. Node to be disconnected PageRank Node 2 1 0.2149 BCF 2 N/A 3 N/A 4 0.2654 5 N/A (cannot be disconnected) Table 2: Changes in the PageRank of node 2 depending on the node that is disconnected. 4 Conclusions and Future Works In this paper, a new type of user in a SN, the BCF, was formally defined. Based on the PageRank, it was determined which is the friend of a user i whose friendship is more important to keep because in case it gets lost, this would generate the highest decrease in the PageRank of i. The identification of the BCF could be decisive for a user when keeping the visibility and influence in the SN. As future work, we plan to implement the corresponding algorithm to identify the BCF and to conduct a series of experiments with real social networks, such as Facebook and Twitter. As future work, we also consider as relevant aspects the following: to define the BCF in terms of other centrality measures, to compare the results among them, and to determine correlations if there is any. For instance, if a node k is the BCF of a node i when considering a centrality measure c then, how close is k to be the BCF of i when another centrality measure is considered? Another work is the development of a visual tool that allows the analyst to identify, in a friendly way, the BCF of each node, and that also permits the interactive manipulation of the SN (addition and removal of nodes/relationships), and that shows the way the BCF of each node is affected given these changes. This could lead to the understanding of how the relationships of other users of the SN affect a node i and to its corresponding BCF. At the same time, this could lead to the identification of the best external friendship with regard to a node i, i.e., among all the couples of friends in a SN (couples that 2906

do not include i), which is the couple that generate the highest decrease in the PageRank of i if this couple fell out. References Becchetti, L. & Castillo, C., 2006. The distribution of PageRank follows a power-law only for particular values of the damping factor. Proceedings of the 15th international conference on World Wide Web, pp.941 942. Kitsak, M. et al., 2010. Identification of influential spreaders in complex networks. Nature Physics, 6, pp.888 893. Masuda, N., Kurahashi, I. & Onari, H., 2013. Suicide ideation of individuals in online social networks. PLoS One, 8(4), pp.1 8. Moreno, F., Valencia, A. & González, A., 2013. My best potential friend in a social network. In Social Networking: Recent Trends, Emerging Issues and Future Outlook. pp. 113 124. Ortega, F.J. et al., 2012. Propagation of trust and distrust for the detection of trolls in a social network. Computer Networks, 56(12), pp.2884 2895. Page, L. et al., 1999. The PageRank Citation Ranking: Bringing Order to the Web. Available at: http://ilpubs.stanford.edu:8090/422. Pedroche, F., 2012. A model to classify users of social networks based on PageRank. International Journal of Bifurcation and Chaos, 22(7), pp.1 14. Pedroche, F., 2010a. Competitivity Groups on Social Network Sites. Mathematical and Computer Modelling, 52(7-8), pp.1052 1057. Pedroche, F., 2007. Métodos de cálculo del vector PageRank. Boletín de la Sociedad Española de Matemática Aplicada -SeMA, (39), pp.7 30. Pedroche, F., 2010b. Ranking nodes in social network sites using biased PageRank. In 2 o Encuentro de Álgebra Lineal Análisis Matricial y Aplicaciones. ALAMA-2010. Valencia. Varga, R.S., 2009. Matrix Iterative Analysis Second., Springer Series in Computational Mathematics Volume 27. 2907