Link prediction in multiplex bibliographical networks


Int. J. Complex Systems in Science, vol. 3(1) (2013), pp. 77-82

Manisha Pujari¹ and Rushed Kanawati¹
¹ Laboratoire d'Informatique de Paris Nord (LIPN), Université Sorbonne Paris Cité, CNRS UMR 7030, Villetaneuse, France

Abstract. In this work we present a new approach for co-authorship link prediction that leverages the information contained in general bibliographical multiplex networks. A multiplex network, also called a multi-slice or multi-relational network, is a graph defined over a set of nodes linked by different types of relations. The multiplex network we study here is defined as follows: nodes represent authors, and links can be of one of the following types: co-authorship links, co-venue attending links and co-citing links. Other types of links can also be considered, such as bibliographical coupling or research theme sharing. We show a new approach for exploring the multiplex relations to predict future collaborations (co-authorship links) among authors. It is a supervised machine learning approach in which we attempt to learn a model of link formation from a set of topological attributes describing both positive and negative examples. While such an approach has been successfully applied in the context of simple networks, several options exist for extending it to multiplex networks. One option is to compute topological attributes in each layer of the multiplex. Another is to compute directly new multiplex-based attributes quantifying the multiplex nature of dyads (potential links). We report our first results from experiments conducted on real datasets extracted from the bibliographical database DBLP, enriched with citation information.

Keywords: Link prediction, multiplex networks, supervised machine learning
Corresponding author: manisha.pujari@lipn.univ-paris13.fr
Received: December 5th, 2013. Published: December 31st, 2013.

1. Introduction

Link prediction plays an important role in the analysis of complex networks, with a wide range of applications such as identifying missing links in biological networks, uncovering hidden or new criminal links, recommendation systems, and predicting future collaborations or purchases in e-commerce. It can be defined as the process of identifying missing or new links in a network by studying the history of the network. A popular category of link prediction approaches is that of dyadic topological approaches, which consider only the graph structure and compute a score for each unconnected node pair. In their seminal work, Liben-Nowell and Kleinberg [8] showed that simple topological features characterising pairs of unlinked nodes can be used to predict the formation of new links. They propose to sort the list of unconnected node pairs according to the values of a topological measure; the top k node pairs are then returned as the output of the prediction task. The underlying assumption is that the topological measure ranks the most probable new links at the top. Many later works combine different topological metrics to enhance prediction performance, converting the link prediction task into a binary classification problem solved with machine learning algorithms [9, 7]. All of these works, however, address link prediction only in simple networks with homogeneous links. To our knowledge, little has been done to exploit multiplex information for link prediction. A few recent works propose methods for predicting links in heterogeneous networks, i.e. networks that have different types of nodes as well as edges [1]. A few others extend simple structural features such as degree and path to the context of multiplex networks [4, 6], but none has attempted to use them for link prediction.

We propose a new approach for exploring the multiplex relations to predict future collaborations (co-authorship links) among authors. It is a supervised machine learning approach in which we attempt to learn a model of link formation from a set of topological attributes describing both positive and negative examples. While such an approach has been successfully applied in the context of simple networks, several options exist for extending it to multiplex networks. One option is to compute topological attributes in each layer of the multiplex. Another is to compute directly new multiplex-based attributes quantifying the multiplex nature of dyads (potential links). Both approaches are discussed in the next section.
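The unsupervised ranking scheme described in the introduction (score every unconnected pair, sort, return the top k) can be sketched in a few lines. This is only an illustration: the toy graph, the function names and the choice of common neighbours as the score are assumptions made here, not part of the original work.

```python
from itertools import combinations


def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def top_k_predictions(adj, k, score=common_neighbors):
    """Rank all unconnected node pairs by `score` and return the k best."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    # Sort by descending score; ties are broken by node names so the
    # result is deterministic.
    ranked = sorted(candidates, key=lambda pair: (-score(adj, *pair), pair))
    return ranked[:k]


# Hypothetical toy co-authorship graph, stored as symmetric adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(top_k_predictions(adj, 2))
```

Any other pairwise topological measure with the same `(adj, u, v)` signature could be passed as `score`.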

2. Link prediction approach

Our approach starts by computing simple topological scores for unconnected node pairs in a graph. We then extend these attributes to include information from the graphs of the other dimensions. This can be done in three ways: first, we compute the simple topological measures in every dimension; second, we take the average of the scores over the dimensions; third, we propose an entropy-based version of each topological measure, which gives importance to the presence of a non-zero score for the node pair in each dimension. Finally, all these attributes can be combined in various ways to form different vectors of attribute values characterizing each example, i.e. each unconnected node pair.

Formally, consider a multiplex graph $G = \langle V, E_1, \ldots, E_m \rangle$, which is in fact a set of graphs $\langle G_1, G_2, \ldots, G_m \rangle$, and a topological attribute $X$. For any two unconnected nodes $u$ and $v$ in graph $G_i$ (where we want to make a prediction), $X(u, v)$ computed on $G_i$ is a direct attribute, and the same measure computed on each of the other dimension graphs gives the indirect attributes.

The second category computes an average of the attribute over all the dimensions:

$$X_{average}(u, v) = \frac{1}{m} \sum_{\alpha=1}^{m} X(u, v)^{[\alpha]}$$

for $u, v \in V$ and $(u, v) \notin E_i$, where $m$ is the number of types of relations in the graph (dimensions or layers).

In the third category we propose a new attribute called product of node degree entropy (PNE), which is based on degree entropy, a multiplex property proposed by F. Battiston et al. [4]. If the degree of node $u$ in layer $\alpha$ is $k(u)^{[\alpha]}$, the degree entropy is given by

$$E(u) = -\sum_{\alpha=1}^{m} \frac{k(u)^{[\alpha]}}{k_{total}} \log\left(\frac{k(u)^{[\alpha]}}{k_{total}}\right), \qquad k_{total} = \sum_{\alpha=1}^{m} k(u)^{[\alpha]},$$

and we define the product of node degree entropy as

$$PNE(u, v) = E(u) \cdot E(v).$$

We also extend the same concept to define the entropy of a simple topological attribute $X$:

$$X_{ent}(u, v) = -\sum_{\alpha=1}^{m} \frac{X(u, v)^{[\alpha]}}{X_{total}} \log\left(\frac{X(u, v)^{[\alpha]}}{X_{total}}\right), \qquad X_{total} = \sum_{\alpha=1}^{m} X(u, v)^{[\alpha]}.$$

The entropy-based attributes are better suited to capturing the distribution of the attribute value over all dimensions: a higher value indicates a more uniform distribution of the attribute value across the multiplex layers. We refer to the average and entropy-based attributes as multiplex attributes.
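To make the multiplex attributes concrete, here is a minimal sketch of the average, the entropy-based attribute and PNE, computed from per-layer values. It assumes the usual Shannon form of the entropy (with a leading minus sign), skips zero terms (0 log 0 taken as 0) and returns 0 when every layer contributes 0; these conventions and all function names are assumptions of this sketch, not taken from the paper.

```python
import math


def average_attribute(scores):
    """X_average: mean of the per-layer scores X(u, v)^[alpha]."""
    return sum(scores) / len(scores)


def entropy(values):
    """Shannon entropy of non-negative per-layer values normalised by their sum.

    Zero entries are skipped and an all-zero input yields 0.0
    (both are conventions assumed here).
    """
    total = sum(values)
    if total == 0:
        return 0.0
    # -p * log(p) is written as p * log(1 / p) to avoid returning -0.0.
    return sum((x / total) * math.log(total / x) for x in values if x > 0)


def pne(degrees_u, degrees_v):
    """Product of node degree entropy: E(u) * E(v)."""
    return entropy(degrees_u) * entropy(degrees_v)


# Hypothetical common-neighbour scores of one node pair in three layers.
spread_out = [2, 2, 2]      # same score in every layer
concentrated = [6, 0, 0]    # same total, all in one layer

print(average_attribute(spread_out), average_attribute(concentrated))
print(entropy(spread_out), entropy(concentrated))
print(pne([1, 1, 1], [3, 0, 0]))
```

The two score vectors have the same average but different entropies, which is the distinction the entropy-based attribute is meant to capture: the uniformly spread scores reach the maximum value log(m), while the concentrated ones give 0.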

3. Experiments

We evaluated our approach using data obtained from the DBLP¹ database, from which we created three datasets, each corresponding to a different period of time. Table 1 summarizes the information about the graphs of each dataset. Each graph covers four years used for learning (training), and the following two years are used to label the examples generated from the learning graphs. Examples are unconnected node pairs, labelled as positive or negative depending on whether or not they become connected during the labelling period. Table 2 shows the number of examples obtained for each dataset.

Years       Properties   Co-Author   Co-Venue   Co-Citation
1970-1973   Nodes        91          91         91
            Edges        116         1256       171
1972-1975   Nodes        221         221        221
            Edges        319         5098       706
1974-1977   Nodes        323         323        323
            Edges        451         9831       993

Table 1: Graphs

Train/Test years   Labeling years   # Positive   # Negative
1970-1973          1974-1975        16           1810
1972-1975          1976-1977        49           12141
1974-1977          1978-1979        93           26223

Table 2: Examples from co-authorship graph

We selected the following topological attributes: number of common neighbors (CN), Jaccard coefficient (JC), preferential attachment (PA) [5], Adamic-Adar coefficient (AA) [3], resource allocation (RA) [2] and shortest path length (SPL). We applied a decision tree algorithm to one dataset to generate a model and then tested it on another dataset, using the data mining tool Orange². We combine the attributes into five different sets: Set direct (attributes computed only in the co-authorship graph); Set direct+indirect (attributes computed in the co-authorship, co-venue and co-citation graphs); Set direct+multiplex (attributes computed in the co-authorship graph, together with the average attributes obtained from the three dimension graphs and the entropy-based attributes); Set all (attributes computed in the co-authorship, co-venue and co-citation graphs, together with the average and entropy-based attributes); and Set multiplex (average and entropy-based attributes only).

¹ http://www.dblp.org
² http://orange.biolab.si

Table 3 shows the results in terms of F1-measure and area under the ROC curve (AUC). Relative to Set direct, the F1-measure improves for some of the sets that use multiplex attributes: Set direct+multiplex and Set multiplex in the first experiment, and Set all and Set multiplex in the second. AUC is higher than for Set direct for every set that includes indirect or multiplex attributes, with the exception of Set multiplex in the first experiment (0.5181 against 0.5263).

                       Learning: 1970-1973     Learning: 1972-1975
                       Test: 1972-1975         Test: 1974-1977
Attributes             F-measure   AUC         F-measure   AUC
Set direct             0.0357      0.5263      0.0168      0.4955
Set direct+indirect    0.0256      0.5372      0.0150      0.5132
Set direct+multiplex   0.0592      0.5374      0.0122      0.5108
Set all                0.0153      0.5361      0.0171      0.5555
Set multiplex          0.0374      0.5181      0.0185      0.5485

Table 3: Results of decision tree algorithm

4. Conclusion

This paper presents our new approach to link prediction in multiplex networks. We propose some new and extended topological features, which also incorporate multiplex relation information, for characterizing unlinked node pairs in the link prediction task. They can be applied to predict links in any of the layers of the network. We tested our method on the prediction of co-authorship links in datasets obtained from the DBLP database. The preliminary results show that adding multiplex information can improve prediction performance, which motivates us to continue this research in order to better confirm the concept.

References

[1] Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal and Jiawei Han, Advances on Social Network Analysis and Mining (ASONAM), (2011).
[2] Tao Zhou, Linyuan Lü and Yi-Cheng Zhang, Predicting Missing Links via Local Information, (2009).

[3] Lada Adamic, Orkut Buyukkokten and Eytan Adar, A social network caught in web (First Monday), 8(6), (2003).
[4] Federico Battiston, Vincenzo Nicosia and Vito Latora, Metrics for the analysis of multiplex networks, (2013).
[5] Zan Huang, Xin Li and Hsinchun Chen, Link prediction approach to collaborative filtering (JDLC), (2005).
[6] M. Berlingerio, M. Coscia, F. Giannotti, A. Monreale and D. Pedreschi, Foundations of Multidimensional Network Analysis (ASONAM), 485-489 (2009).
[7] Nesserine Benchettara, Rushed Kanawati and Céline Rouveirol, A supervised machine learning link prediction approach for academic collaboration recommendation (RecSys 10), 253 (2010).
[8] David Liben-Nowell and Jon M. Kleinberg, The link prediction problem for social networks (CIKM), 556-559 (2003).
[9] Mohammad Al Hasan, Vineet Chaoji, Saeed Salem and Mohammed Zaki, Link prediction using supervised learning (SIAM Workshop on Link Analysis, Counterterrorism and Security with SIAM Data Mining Conference), (2006).
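As a supplementary illustration of Section 3: the six topological attributes are named there but not defined, and the labelling of examples is only described in words. The sketch below uses the commonly cited definitions of CN, JC, PA, AA, RA and SPL, which is an assumption, since the paper may use variants. The toy graph, the function names and the `build_examples` helper are hypothetical, and the decision tree step performed with Orange is not reproduced.

```python
import math
from collections import deque
from itertools import combinations


def cn(adj, u, v):
    """Common neighbours."""
    return len(adj[u] & adj[v])


def jc(adj, u, v):
    """Jaccard coefficient."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def pa(adj, u, v):
    """Preferential attachment."""
    return len(adj[u]) * len(adj[v])


def aa(adj, u, v):
    """Adamic-Adar: in an undirected simple graph a common neighbour of
    two distinct nodes has degree >= 2, so the logarithm is positive."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def ra(adj, u, v):
    """Resource allocation."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])


def spl(adj, u, v):
    """Shortest path length by breadth-first search (inf if unreachable)."""
    if u == v:
        return 0
    seen, queue = {u}, deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj[node]:
            if nb == v:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return math.inf


ATTRIBUTES = (cn, jc, pa, aa, ra, spl)


def build_examples(adj, future_edges):
    """One (pair, feature vector, label) per unconnected pair of the learning
    graph; the label is 1 if the pair is linked in the labelling period."""
    future = {frozenset(edge) for edge in future_edges}
    examples = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue
        features = [attribute(adj, u, v) for attribute in ATTRIBUTES]
        examples.append(((u, v), features, int(frozenset((u, v)) in future)))
    return examples


# Hypothetical learning graph and labelling-period edges.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
for pair, features, label in build_examples(adj, [("d", "a")]):
    print(pair, features, label)
```

Running `build_examples` on each layer of the multiplex would give the direct and indirect attributes; the resulting per-layer values are what the average and entropy-based attributes of Section 2 take as input.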