An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Pedro Ribeiro (DCC/FCUP & CRACS/INESC-TEC)

Part 1 Motivation and emergence of Network Science

Complexity
"I think the next century will be the century of complexity." Stephen Hawking (Jan. 2000)

The Real World is Complex
- World population: 7 billion
- Human brain neurons: 100 billion
- Internet devices: 8 billion

Complex Systems, Complex Networks
Example: flights map

Complex Networks are Ubiquitous
- Social: Facebook, co-authorship
- Biological: brain, metabolism (proteins)
Nodes + Edges

Complex Networks are Ubiquitous
- Spatial: power, roads
- Software: module dependency
- Text: semantic

Network Science
Behind many complex systems there is a network that defines the interactions between the components.
In order to understand the systems... we need to understand the networks!

Network Science
Network Science has been emerging in this century as a new discipline, with origins in graph theory and social network research.
Image: Adapted from (Barabasi, 2015)

Why now? Two main contributing factors:
1) The emergence of network maps
- Movie actor network: 1998
- World Wide Web: 1999
- Citation network: 1998
- Metabolic network: 2000
- PPI network: 2001
Size matters!
- 436 nodes, 2003 (email exchange, Adamic-Adar, SocNets)
- 43,553 nodes, 2006 (email exchange, Kossinets-Watts, Science)
- 4.4 million nodes, 2005 (friendships, Liben-Nowell, PNAS)
- 800 million nodes, 2011 (Facebook, Backstrom et al.)

Why now? Two main contributing factors:
2) Universality of network characteristics
The architecture and topology of networks from different domains exhibit more similarities than one would expect. E.g. power laws.
Images: Adapted from (Newman, 2005) and (Leskovec, 2015)

Impact of Network Science Economic Impact

Impact of Network Science Network Biology/Network Medicine

Impact of Network Science Fighting Terrorism and Military

Impact of Network Science
Scientific Impact
- 1998: The Watts-Strogatz paper is the most cited Nature publication from 1998; highlighted by ISI as one of the ten most cited papers in physics in the decade after its publication.
- 1999: The Barabasi and Albert paper is the most cited Science paper in 1999; highlighted by ISI as one of the ten most cited papers in physics in the decade after its publication.
- 2001: Pastor-Satorras and Vespignani is one of the two most cited papers among the papers published in 2001 by Physical Review Letters.
- 2002: Girvan-Newman is the most cited paper in the 2002 Proceedings of the National Academy of Sciences.
Reviews
- The first review of network science, by Albert and Barabasi (2001), is the most cited paper published in Reviews of Modern Physics, the highest impact factor physics journal, published since 1929.
- The SIAM review of Newman on network science is the most cited paper of any SIAM journal.
- Network Biology, by Barabasi and Oltvai (2004), is the second most cited paper in the history of Nature Reviews Genetics, the top review journal in genetics.

Impact of Network Science Books


Impact of Network Science
Books (General Audience)
And even an award-winning documentary!

Impact of Network Science Example Real Application: Epidemics

Network Science Topics
Some possible tasks:
- General Patterns. Ex: scale-free, small-world
- Community Detection. What groups of nodes are related?
- Node Classification. Importance and function of a certain node?
- Network Comparison. What is the type of the network?
- Information Propagation. Epidemics? Robustness?
- Link Prediction. Future connections? Errors in graph constructions?

Part 2 A brief introduction to Graph Theory and network vocabulary

Graph Terminology
- Objects: nodes, vertices (N)
- Interactions: links, edges (E)
- System: network, graph G(N,E)

Graph Terminology
- Undirected: co-authorship networks, actor networks, Facebook friendships
- Directed: WWW hyperlinks, phone calls, roads network

Graph Terminology
Edge Attributes. Examples:
- Weight (call duration, road distance, ...)
- Ranking (best friend, second best friend, ...)
- Type (friend, relative, co-worker, ...) [colored edges]
We can have a set of multiple attributes.
Node Attributes. Examples:
- Type (nationality, sex, age, ...) [colored nodes]
We can have a set of multiple attributes.

Node Properties
From immediate connections:
- Outdegree: how many directed edges originate at a node (example: Outdegree=3)
- Indegree: how many directed edges are incident on a node (example: Indegree=2)
- Degree (in or out): number of outgoing and incoming edges (example: Degree=5)

Node Properties
Degree related metrics:
- Degree sequence: an ordered list of the (in, out) degree of each node
  In-degree sequence: [4, 2, 1, 1, 0]
  Out-degree sequence: [3, 2, 2, 1, 0]
  Degree sequence: [4, 3, 3, 3, 3]
- Degree distribution: a frequency count of the occurrences of each degree [usually plotted normalized as a probability]
[Plots: In-degree Distribution, Out-degree Distribution, Degree Distribution]
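The degree metrics above can be sketched in a few lines of Python. This is only an illustration: the edge list is hypothetical (it is not the graph drawn on the slide), and the helper names `sequence` and `distribution` are made up for this example.

```python
from collections import Counter

# Hypothetical directed edge list (source, target); NOT the graph from the slide.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
nodes = sorted({n for edge in edges for n in edge})

# Out-degree: edges that originate at a node. In-degree: edges incident on it.
out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)

def sequence(deg):
    """Degree sequence: the degrees of all nodes, sorted in decreasing order."""
    return sorted((deg[n] for n in nodes), reverse=True)

in_seq = sequence(in_deg)
out_seq = sequence(out_deg)
total_seq = sorted((in_deg[n] + out_deg[n] for n in nodes), reverse=True)

def distribution(seq):
    """Degree distribution: fraction of nodes having each degree value."""
    counts = Counter(seq)
    return {k: counts[k] / len(seq) for k in sorted(counts)}

print("in-degree sequence: ", in_seq)
print("out-degree sequence:", out_seq)
print("degree sequence:    ", total_seq)
print("in-degree distribution:", distribution(in_seq))
```

Note that the in-degree and out-degree sequences always have the same sum (the number of edges), which is a quick sanity check on any directed edge list.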

Sparsity of Networks
Real networks are usually very sparse!

Network                Dir/Undir   Nodes     Edges        Avg. Degree
Internet               Undirected  192,244   609,066      6.33
WWW                    Directed    325,729   1,479,134    4.60
Power Grid             Undirected  4,941     6,594        2.67
Mobile Phone Calls     Directed    36,595    91,826       2.51
Email                  Directed    57,194    103,731      1.81
Science Collaboration  Undirected  23,133    93,439       8.08
Actor Network          Undirected  702,388   29,397,908   83.71
Citation Network       Directed    449,673   4,689,479    10.43
E. Coli Metabolism     Directed    1,039     5,082        5.58
Protein Interactions   Undirected  2,018     2,930        2.90

Table: Adapted from (Barabasi, 2015)
A graph where every pair of nodes is connected is called a complete graph (or a clique).
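To make "sparse" concrete, here is a small sketch that recomputes the average degree for two rows of the table and compares the number of edges with the maximum possible (the complete graph). The formulas used are the usual ones (2E/N for undirected graphs, E/N for the average in- or out-degree of directed graphs); that the table was built this way is an assumption, although it matches the Power Grid and Email rows. The function names are made up for this example.

```python
def average_degree(n_nodes, n_edges, directed):
    """Average degree <k>.

    Undirected: every edge adds 1 to the degree of two nodes, so <k> = 2E/N.
    Directed: average in-degree = average out-degree = E/N.
    """
    return n_edges / n_nodes if directed else 2 * n_edges / n_nodes

def density(n_nodes, n_edges, directed):
    """Fraction of all possible edges that are actually present."""
    max_edges = n_nodes * (n_nodes - 1)   # ordered pairs (directed)
    if not directed:
        max_edges //= 2                   # unordered pairs (undirected)
    return n_edges / max_edges

# Node and edge counts taken from the table above.
power_grid_k = average_degree(4941, 6594, directed=False)
email_k = average_degree(57194, 103731, directed=True)
power_grid_density = density(4941, 6594, directed=False)

print(f"Power Grid <k> = {power_grid_k:.2f}")
print(f"Email <k>      = {email_k:.2f}")
print(f"Power Grid density = {power_grid_density:.6f}")
```

A complete graph has density 1; the power grid uses well under 0.1% of its possible edges.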

Power Law in the Degree Sequence

Connectivity Not everything is connected

Connectivity
A strongly connected component is a maximal subset of nodes where each pair of nodes is reachable through a directed path.
3 strongly connected components:
- {1, 2, 5}
- {3, 4, 8}
- {6, 7}
In a weakly connected component we can use the links in any direction.
1 weakly connected component:
- {1, 2, 3, 4, 5, 6, 7, 8}
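A sketch of how both kinds of components could be computed. The edge list is hypothetical: the slide's drawing is not available here, so the edges were chosen only so that the result reproduces the components listed above. Strong components use Kosaraju's two-pass algorithm (one possible choice among several); weak components simply ignore edge direction.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: DFS finish order, then DFS on the reversed graph."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, neighbours = stack[-1]
            nxt = next((w for w in neighbours if w not in visited), None)
            if nxt is None:
                order.append(node)      # all successors explored
                stack.pop()
            else:
                visited.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    # Pass 2: explore the reversed graph in decreasing finish order.
    assigned, components = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        component, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            component.add(node)
            for w in pred[node]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        components.append(component)
    return components

def weakly_connected_components(nodes, edges):
    """Components when every link may be used in any direction."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for w in nbrs[node]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components

# Hypothetical edges: three directed cycles joined by one-way links.
nodes = [1, 2, 3, 4, 5, 6, 7, 8]
edges = [(1, 2), (2, 5), (5, 1), (2, 3), (3, 4), (4, 8), (8, 3),
         (8, 6), (6, 7), (7, 6)]

strong = sorted(sorted(c) for c in strongly_connected_components(nodes, edges))
weak = sorted(sorted(c) for c in weakly_connected_components(nodes, edges))
print("strong:", strong)
print("weak:  ", weak)
```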

Connectivity If the largest component has a large fraction of the nodes we call it the giant component

Bipartite
A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every edge connects a node in U to one in V.
Example:
- Actor Network: U = Actors, V = Movies
Image: Adapted from Leskovec, 2015
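One way to test the definition is to try to split the nodes into the two sets with a breadth-first two-coloring: if some edge ends up with both endpoints on the same side, the graph is not bipartite. The sketch below assumes a plain undirected edge list; the actor and movie names are invented for illustration, and `two_color` is a made-up helper name.

```python
from collections import deque

def two_color(edges):
    """Return (U, V) if the undirected graph is bipartite, otherwise None."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    side = {}                      # node -> 0 (set U) or 1 (set V)
    for start in nbrs:             # loop handles disconnected graphs
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for w in nbrs[node]:
                if w not in side:
                    side[w] = 1 - side[node]   # neighbours go to the other set
                    queue.append(w)
                elif side[w] == side[node]:
                    return None                # an edge inside one set
    U = {n for n, s in side.items() if s == 0}
    V = {n for n, s in side.items() if s == 1}
    return U, V

# Hypothetical actor-movie data: every edge links an actor to a movie.
actor_movie = [("Ana", "Movie1"), ("Bruno", "Movie1"), ("Bruno", "Movie2"),
               ("Carla", "Movie2"), ("Dinis", "Movie3")]
triangle = [("x", "y"), ("y", "z"), ("z", "x")]

print(two_color(actor_movie))   # actors on one side, movies on the other
print(two_color(triangle))      # an odd cycle cannot be split this way
```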

Bipartite

Bipartite Human Disease Network

Paths
A path between two nodes is a sequence of adjacent nodes and their respective connecting edges.
The distance between two nodes (in an unweighted network) is the number of edges in the shortest path between them.
Example:
- Distance from A to D is 3
- Distance from A to E is 4
- Distance from E to F is 2
Diameter: maximum distance between any pair of nodes.
Example: for the graph above, the diameter is 4
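In an unweighted network, distances can be computed with a breadth-first search (BFS) from each node. The graph below is hypothetical: the slide's figure is not available here, so it was chosen only to reproduce the distances quoted in the example (a chain A-B-C-D-E with F attached to D). The sketch assumes the graph is connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for w in adj[node]:
            if w not in dist:          # first visit = shortest distance
                dist[w] = dist[node] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Maximum distance between any pair of nodes (connected graph assumed)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical undirected graph as an adjacency list.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E", "F"],
    "E": ["D"],
    "F": ["D"],
}

from_a = bfs_distances(adj, "A")
print("A to D:", from_a["D"])
print("A to E:", from_a["E"])
print("E to F:", bfs_distances(adj, "E")["F"])
print("diameter:", diameter(adj))
```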

Node Centrality
Centrality (how important is a node?)
- Betweenness: percentage of all shortest paths the node is part of
- Closeness: based on the average distance to all other nodes (a smaller average distance means a more central node)
- Eigenvector: how important a node is depends on its neighbours
- PageRank: importance is related to in-links
Image: Mateo, 2015
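Two of these measures are easy to sketch by hand. Closeness is computed here as the inverse of the average distance, (n - 1) / (sum of distances), which is one common convention. PageRank is computed by simple power iteration; the damping factor 0.85 and the fixed number of iterations are assumptions for the example, and both graphs are hypothetical.

```python
from collections import deque

def closeness(adj, node):
    """Inverse of the average shortest-path distance to all other nodes.

    Assumes a connected, unweighted, undirected graph with at least two nodes.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for w in adj[current]:
            if w not in dist:
                dist[w] = dist[current] + 1
                queue.append(w)
    return (len(adj) - 1) / sum(dist.values())

def pagerank(out_links, damping=0.85, iterations=100):
    """PageRank by power iteration; every link target must be a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical undirected path A - B - C: the middle node is the closest to all.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print({v: round(closeness(path, v), 3) for v in path})

# Hypothetical directed web: "c" receives the most in-links.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get), {v: round(r, 3) for v, r in ranks.items()})
```

In the second example "a" also scores high although it has a single in-link, because that link comes from the most important node: importance depends on who links to you, not only on how many do.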

Clustering
Clustering Coefficient (to what extent do the nodes cluster?)
Node: $C_i = \frac{\text{number of connections between the neighbours of } i}{\text{maximum possible number of such connections}}$
Global:
i) Average of $C_i$ over all nodes (Watts and Strogatz)
ii) $\frac{3 \times \text{number of triangles (cliques of size 3)}}{\text{number of connected triplets of vertices}}$
Real-world networks typically have high clustering coefficients.
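A small sketch of the node-level and both global clustering measures. The graph is hypothetical (a triangle with one extra node hanging off it). Two conventions are assumed here: nodes with fewer than two neighbours get a coefficient of 0, and the triangle-based measure counts each triangle three times (once per centre node), so that a complete graph scores 1.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0                                  # convention assumed here
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Global measure (i): mean of the node coefficients."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def transitivity(adj):
    """Global measure (ii): closed triplets over all connected triplets."""
    # Each triangle is found once from each of its three nodes.
    closed = sum(1 for n in adj
                 for u, v in combinations(adj[n], 2) if v in adj[u])
    triplets = sum(len(adj[n]) * (len(adj[n]) - 1) // 2 for n in adj)
    return closed / triplets if triplets else 0.0

# Hypothetical undirected graph: triangle a-b-c plus a pendant node d on c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

for v in sorted(adj):
    print(v, round(local_clustering(adj, v), 3))
print("average:", round(average_clustering(adj), 3))
print("transitivity:", round(transitivity(adj), 3))
```

The two global values differ (about 0.583 versus 0.6 here) because the average gives every node equal weight, while the triplet ratio gives more weight to high-degree nodes.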

Community Structure
Communities: groups of nodes that are densely connected between themselves.
Several variations and algorithms:
- Girvan-Newman
- Modularity
- Hierarchical clustering
- ...
Image: Newman, 2012
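As an illustration of the modularity idea, the sketch below scores a given partition of an undirected, unweighted graph using the standard Newman formulation: for each community, the fraction of edges that fall inside it minus the fraction expected if edges were placed at random with the same degrees. It only evaluates a partition and does not search for the best one; the graph and the partitions are hypothetical.

```python
def modularity(edges, community):
    """Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ].

    m   : total number of edges
    L_c : number of edges with both endpoints in community c
    d_c : sum of the degrees of the nodes in community c
    """
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical graph: two triangles joined by a single edge (3 -- 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

two_groups = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}
one_group = {n: "all" for n in range(1, 7)}

q_split = modularity(edges, two_groups)
q_all = modularity(edges, one_group)
print("two triangles as communities:", round(q_split, 4))
print("everything in one community: ", round(q_all, 4))
```

Splitting the graph along the single bridging edge scores higher than lumping all nodes together, which matches the intuition of "densely connected between themselves".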

Part 3 Network Visualization and Exploration

Why Visualization?
"The greatest value of a picture is when it forces us to notice what we never expected to see."

Exploratory Data Analysis
Visualization alone is not enough: it is part of a larger process to extract insight.
Data process chain: non-linear, trial and error!
Images: Ben Fry, 2004

Exploring a Network
1) See the network: draw using a certain layout, ...
2) Interact in real time: group, filter, compute metrics, ...
3) Build a visual language: size of nodes, thickness of edges, colors, ...

Exploring Graphs
Today we are going to use Gephi: an open-source network analysis and visualization platform (written in Java).

Why Gephi?
- Because it has a large community
- Because it has a history and will continue to have one: started in 1998, maintained by a consortium (long-term vision)
- Because it is extensible with plugins (Gephi marketplace)
- Because I am familiar with it! :)
There are other options. The main concepts and ideas we will show can be used in any other visualization tool.

Datasets for Today
Co-Authorships in Network Science
- http://www-personal.umich.edu/~mejn/netdata/netscience.zip
- Compiled by Mark Newman in May 2006
- Available in GML (Graph Modeling Language)
- 1,589 scientists, 2,742 collaborations
Flights Data
- http://openflights.org/data.html
- Compiled by the Open Flights website
- 3,440 airports, 67,663 routes from 531 airlines

What to do?
- Load graph: opening a network vs importing data
- Draw using a layout: Force Directed, Geographical, Circular, ... (polishing the results)
- Compute metrics: centralities, degrees, distances, communities
- Filter: main operators, selecting, ranges, combining
- Ranking: color or size of the nodes and edges according to a metric
- Partition: coloring according to a partition

What to do? DEMO!