Centrality Book. cohesion.

Similar documents
Alessandro Del Ponte, Weijia Ran PAD 637 Week 3 Summary January 31, Wasserman and Faust, Chapter 3: Notation for Social Network Data

The Network Analysis Five-Number Summary

Unit 2: Graphs and Matrices. ICPSR University of Michigan, Ann Arbor Summer 2015 Instructor: Ann McCranie

V4 Matrix algorithms and graph partitioning

Economics of Information Networks

THE KNOWLEDGE MANAGEMENT STRATEGY IN ORGANIZATIONS. Summer semester, 2016/2017

V2: Measures and Metrics (II)

Chapter 4 Graphs and Matrices. PAD637 Week 3 Presentation Prepared by Weijia Ran & Alessandro Del Ponte

2.2. Data Format Save as UCINET files in Netdraw Import from VNA format Node Level Analysis Degree...

The Basics of Network Structure

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Chapter 10. Fundamental Network Algorithms. M. E. J. Newman. May 6, M. E. J. Newman Chapter 10 May 6, / 33


Introduction to network metrics

CS2 Algorithms and Data Structures Note 9

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14

(Refer Slide Time 6:48)

Web Science and Web Technology Affiliation Networks

1 Algorithms, Inputs, Outputs

Fundamentals of Operations Research. Prof. G. Srinivasan. Department of Management Studies. Indian Institute of Technology, Madras. Lecture No.

A Complement-Derived Centrality Index for Disconnected Graphs 1

MSA220 - Statistical Learning for Big Data

An Information Theory Approach to Identify Sets of Key Players

Characterizing Graphs (3) Characterizing Graphs (1) Characterizing Graphs (2) Characterizing Graphs (4)

2. CONNECTIVITY Connectivity

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Social Network Analysis

Mathematical Foundations

A New Measure of Linkage Between Two Sub-networks 1

Equations and Functions, Variables and Expressions

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

6.001 Notes: Section 6.1

Module 2: NETWORKS AND DECISION MATHEMATICS

Supplementary material to Epidemic spreading on complex networks with community structures

The Basics of Graphical Models

1 More configuration model

Robustness of Centrality Measures for Small-World Networks Containing Systematic Error

Further Mathematics 2016 Module 2: NETWORKS AND DECISION MATHEMATICS Chapter 9 Undirected Graphs and Networks

ELEC-270 Solutions to Assignment 5

Lecture Note: Computation problems in social. network analysis

Logic: The Big Picture. Axiomatizing Arithmetic. Tautologies and Valid Arguments. Graphs and Trees

Link Analysis in the Cloud

CPSC 320 Sample Solution, Playing with Graphs!

Centrality. Peter Hoff. 567 Statistical analysis of social networks. Statistics, University of Washington 1/36

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

CSCI5070 Advanced Topics in Social Computing

NP-Complete Problems

Rockefeller College University at Albany

Week 5 Video 5. Relationship Mining Network Analysis

TELCOM2125: Network Science and Analysis

Undirected Network Summary

PITSCO Math Individualized Prescriptive Lessons (IPLs)

A Formalization of Transition P Systems

Statistical Methods for Network Analysis: Exponential Random Graph Models

(Refer Slide Time: 05:25)

MIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns

Computational problems. Lecture 2: Combinatorial search and optimisation problems. Computational problems. Examples. Example

Tie strength, social capital, betweenness and homophily. Rik Sarkar

Simple Graph. General Graph

MT5821 Advanced Combinatorics

1 Homophily and assortative mixing

Mathematics of Networks II

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Virtualization. Q&A with an industry leader. Virtualization is rapidly becoming a fact of life for agency executives,

Pagerank Scoring. Imagine a browser doing a random walk on web pages:

Lecture 11: Randomized Least-squares Approximation in Practice. 11 Randomized Least-squares Approximation in Practice

We have already seen the transportation problem and the assignment problem. Let us take the transportation problem, first.

Network Basics. CMSC 498J: Social Media Computing. Department of Computer Science University of Maryland Spring Hadi Amiri

Graphs Introduction and Depth first algorithm

Using! to Teach Graph Theory

Midterm Review NETS 112

MITOCW watch?v=kz7jjltq9r4

MAT 090 Brian Killough s Instructor Notes Strayer University

Web Structure Mining Community Detection and Evaluation

Mining Social Network Graphs

Graphical Models. David M. Blei Columbia University. September 17, 2014

adjacent angles Two angles in a plane which share a common vertex and a common side, but do not overlap. Angles 1 and 2 are adjacent angles.

For Module 2 SKILLS CHECKLIST. Fraction Notation. George Hartas, MS. Educational Assistant for Mathematics Remediation MAT 025 Instructor

SOFTWARE ENGINEERING Prof.N.L.Sarda Computer Science & Engineering IIT Bombay. Lecture #10 Process Modelling DFD, Function Decomp (Part 2)

Lecture 1: Overview

CSE 417 Network Flows (pt 3) Modeling with Min Cuts

How can we lay cable at minimum cost to make every telephone reachable from every other? What is the fastest route between two given cities?

Graph Data Processing with MapReduce

GraphBLAS Mathematics - Provisional Release 1.0 -

Multi-Criteria Decision Making 1-AHP

CONNECTIVITY AND NETWORKS

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

Lecture 5: The Halting Problem. Michael Beeson

Lecture 23 Representing Graphs

Graph Algorithms. Revised based on the slides by Ruoming Kent State

CSE 100: GRAPH ALGORITHMS

Graph Theory. ICT Theory Excerpt from various sources by Robert Pergl

16 - Networks and PageRank

Lecture 26: Graphs: Traversal (Part 1)

(Refer Slide Time: 02.06)

University of Maryland. Tuesday, March 2, 2010

Transcription:

Cohesion The graph-theoretic terms discussed in the previous chapter have very specific and concrete meanings which are highly shared across the field of graph theory and other fields like social network analysis that use graph theory. For those terms, the conceptual idea and its measurement are one and the same thing. For example we defined degree as a certain quantity that is calculated in a certain way. In contrast, in this section we consider the more abstract social science concept of cohesion, which can be measured in a variety of different ways, none of which can be said to be the way. In general, measures of cohesion are built upon the more fundamental concepts described in the previous section. The idea of cohesion is connectedness or knittedness. Most people think about cohesion as a group or network level property. Some groups are more cohesive than others. There is a Spanish word enrededado that expresses it nicely. It means mixed up together, like a big clump of electrical wires. It is particularly appropriate because the word is based on the word for network, which is red. A measure of group cohesion, then, is a single value that characterizes the stuck-togetherness of the group. But we can also talk about dyadic or relational cohesion. This refers to the attraction or closeness of pairs of nodes, some of whom will be very cohesive while others are quite distant or uninvolved. Actually, the idea can be applied to any level of analysis. For example, at the node level, we can refer to the extent that a node is connected with all other nodes in the network as the node s cohesion with the rest of the group. In fact, it is key thesis of this book that one of the main types of centrality is nothing more than nodelevel cohesion. Because group-level cohesion is more familiar and easier to understand, it would make sense to discuss that first. However, since many of the group-level measures are built on dyadic level measures, it is more logical and convenient to discuss the dyadic level measures first. Relational Cohesion The simplest, most fundamental measure of relational cohesion is adjacency itself. If you and I have a tie (let s say a trust tie) then we are more cohesive than if we didn t have a tie. Of course, we have to be careful to think about what kind of relation is being measured. But even a conflict or dislike tie is a measure of dyadic cohesion it is just that it is an inverse measure. If the data consist of valued ties (e.g., strengths or frequencies), so much the better, because then we have degrees of cohesion instead of simple presence or absence. It s useful to note that some nodes that are not adjacent may still be indirectly related. All nodes that belong to the same component are far more cohesive than a pair of nodes that are on separate components. If a virus is spreading in one component, it will eventually reach every node in the component but it cannot jump to another component. This

suggests another measure of dyadic cohesion, namely reachability. Two nodes are reachable if there exists a path no matter how long from one to the other. We typically represent this as a matrix R in which r ij = 1 if i can reach j by some path, and r ij = 0 if no such path exists (i.e., the nodes are on separate components). Of course, if we are using the existence of a path from one node to another as a measure of cohesion, it is only a small stretch to consider counting the number of links in the shortest path between two nodes as an inverse measure of dyadic cohesion. The geodesic distance matrix is in fact extremely similar to the adjacency matrix of a graph: where there are 1s in the adjacency matrix, there are 1s in the distance matrix. But where there are 0s in the adjacency matrix, there are a range of values in the distance matrix, providing a more nuanced account of lack of adjacency. One problem with geodesic distance is that the distance between nodes in separate components is technically undefined (or, popularly, infinite). A solution is to use the reciprocal of geodesic distance (1/d ij ) with the convention that if the distance is undefined, then the reciprocal is zero. This also has the advantage of making it so that larger values indicate more cohesion. An entirely different approach is based on the notion of a tie being buttressed by the ties that the two nodes have in common with third parties. The basic idea is that if a tie is embedded in a locally dense region of the network, it will be harder for the tie to break apart, and for one party to treat to the other badly (because anything they do will be observed by others). Thus, a tie between nodes A and B is said to be embedded to the extent there exist third parties C such that both A and B are adjacent to C (Feld, 19xx). Such ties have also been referred to as Simmelian ties (Krackhardt, 19xx). Group-Level Cohesion The group-level counterpart of the simplest type of dyadic cohesion is the simple sum of each of the dyadic cohesions. This gives the total amount of cohesion in the network. If the data consist of frequencies of interactions between people, then the sum gives the total amount of interaction going on in the system. If the adjacency matrix consists of 1s and 0s, indicating the presence or absence of ties, then the sum gives the total number of ties in the group. We can then compare that total with the total for other groups of similar size, to get a sense of the relative amount of cohesion in each. Alternatively, we could normalize this total by dividing by the maximum possible, facilitating comparisons across graphs of different sizes. In an undirected graph without reflexive loops, the maximum is given by n*(n-1)/2, where n is the number of nodes in the graph. Dividing by this number gives the proportion of all dyads that are actually tied, a measure known as graph density. Equation 1 gives the full formula, where T is the number of ties in the network and n is the number of nodes.

2T Density = Equation 1 Another way to think of density is as the average of all values in the adjacency matrix of the graph (not counting the diagonal values, which represent self-loops). Taking this perspective makes the generalization of density to valued ties quite natural, as we simply take the average tie strength. A closely related measure of cohesion is the average degree of the network. If we compute the degree (number of ties) for each node, and then average these degrees, we obtain the average degree of the network. Equation 2 gives the formula and shows how average degree ( d ) is related to density. The average degree is easier to interpret than density because it is literally the average number of ties that each node has. It is also the average of the row sums of the adjacency matrix. 2T d = = Density * ( Equation 2 n One problem with both density and average degree is that they don t take into account the broader structure of the network. For example, a network that is divided into two very dense components (see Figure xx) will have high density and average degree scores, but will only be cohesive within local regions. Globally, the network will be very noncohesive because none of the nodes in one component can reach any of the nodes in the other. This suggests a number of measures of cohesion (or non-cohesion) based on the number and size of components in the network. The first, based only on the number of components, is typically normalized as shown in Equation 3, where c is the number of components and n is the number of nodes in the graph. This normalized measure called the component ratio achieves its maximum value of 1.0 when every node is an isolate, and its minimum value of 0 when there is just one component. Obviously, this is an inverse measure of cohesion as larger values indicate less cohesion. c 1 CR = Equation 3 n 1 A more sensitive measure along these same lines is called fragmentation (Borgatti, 19xx) or connectedness (Krackhardt, XXXX). Fragmentation is defined as the proportion of pairs of nodes that cannot reach each other by any path. In other words, the proportion of pairs of nodes that are not located in the same component. The formula for fragmentation is given in Equation 4, where r ij is 1 if nodes i and j are in the same component and 0 otherwise. rij i, j F = 1 = 1 Connectedness Equation 4

Another approach to operationalizing cohesion is based on the lengths of paths connecting pairs of nodes. Perhaps the most obvious measure of this type is the average geodesic distance, also known as the characteristic path length. This is literally the average of all the values in the distance matrix, not including the main diagonal. It can be seen as a measure of how long things typically take to flow from one randomly chosen node to another, at least for something that flows along shortest paths. Clearly, if things flowing through the network can reach nodes quickly, the network is in this sense cohesive. A variation on this form of cohesion is the diameter of the graph, which is simply the largest value in the distance matrix (i.e., the length of largest shortest path).it can be seen as the maximum amount of time that it would take to diffuse something to every single person in a network when traveling exclusively via shortest paths. All variations on the mean, the mode, the maximum and so on can be seen as measures of network cohesion when applied to the geodesic distance matrix. A difficulty with the average geodesic distance and its variants is that it cannot be applied to disconnected graphs, i.e., ones with multiple components, since some distances are not defined. One strategy is to replace the missing values with an arbitrary large value. 1 Another strategy is to take reciprocals of the valid distances and assign zeroes to the missing cells. This is called breadth, and is defined according to Equation 5, where d ij is the geodesic distance from i to j and 1/d ij is defined to be zero when d ij is undefined. Like fragmentation, breadth is an inverse measure of cohesion. 1 i, j dij Equation 4 B = 1 A completely different approach to cohesion is robustness. How difficult is it to disconnect the network by removing nodes or lines? If you need to remove quite a few nodes or lines to increase the number of components in the graph, then the network is highly robust and in this sense cohesive. Equivalently, if you have to remove many lines or nodes, then there must be many fully independent paths between the nodes, again suggesting cohesion. This suggests that the graph-theoretic concepts of cutpoint and vertex cutset, as well as bridge and edge cutset, might be useful. Indeed, both the vertex connectivity and the edge connectivity of a graph can be seen as measures of cohesion. The greater the value, the more independent paths there are between all pairs of nodes, and the more cohesive the network. 1 Valente suggests subtracting the valid distances from an arbitrary constant and assigning 0 for the cells corresponding to missing distances. The correlation of this with the original strategy of simply assigning a constant to missing values is a perfect -1.0.

- discuss transitivity as local cohesion? (Or as normalized aggregate of embeddedn/simmelian ties) - Discuss centrality and cp structures? Dyad Level Group Level Adjacency Density Strength of tie ATS (Average Tie Strength) (D-Bar?) Geodesic Distance CPL (Characteristic Path Length) Breadth Reachability Connectedness; Fragmentation; Component Ratio Simmelian/Embedded ties Transitivity As we shall see in later chapters, if can regard some network level cohesion measures as aggregations of dyadic cohesion across all pairs of nodes to obtain a summary for the entire network, then shall also be able to view centrality as an aggregation of dyadic cohesion up to the node level. In short, sum measures just sum the cohesion of the ties associated with a given node.