A subquadratic triad census algorithm for large sparse networks with small maximum degree

Similar documents
A subquadratic triad census algorithm for large sparse networks with small maximum degree

arxiv:cs/ v1 [cs.ds] 25 Oct 2003

AN ALGORITHM FOR CORES DECOMPOSITION OF NETWORKS

Alessandro Del Ponte, Weijia Ran PAD 637 Week 3 Summary January 31, Wasserman and Faust, Chapter 3: Notation for Social Network Data

Math.Subj.Class.(2000): 05 A 18, 05 C 70, 05 C 85, 05 C 90, 68 R 10, 68 W 40, 92 H 30, 93 A 15.

Math.3336: Discrete Mathematics. Chapter 10 Graph Theory

The Pennsylvania State University The Graduate School College of Engineering FAST PARALLEL TRIAD CENSUS AND TRIANGLE LISTING ON

The b-chromatic number of cubic graphs

Introduction to Graph Theory

On median graphs and median grid graphs

World Inside a Computer is Binary

Pajek. Chase Christopherson, Heriberto Diaz, Jiyuan Guo

Math 776 Graph Theory Lecture Note 1 Basic concepts

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department

Introduction III. Graphs. Motivations I. Introduction IV

Combinatorics Summary Sheet for Exam 1 Material 2019

Logic and Discrete Mathematics. Section 2.5 Equivalence relations and partitions

REGULAR GRAPHS OF GIVEN GIRTH. Contents

Notes on Blockmodeling

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

HW Graph Theory SOLUTIONS (hbovik) - Q

First Steps to Network Visualization with Pajek

MC 302 GRAPH THEORY 10/1/13 Solutions to HW #2 50 points + 6 XC points

Discrete Applied Mathematics. A revision and extension of results on 4-regular, 4-connected, claw-free graphs

CHAPTER 3 FUZZY RELATION and COMPOSITION

Computing Regular Equivalence: Practical and Theoretical Issues

Transitivity and Triads

Graph and Digraph Glossary

Native mesh ordering with Scotch 4.0

On subgraphs of Cartesian product graphs

Chapter 4 Graphs and Matrices. PAD637 Week 3 Presentation Prepared by Weijia Ran & Alessandro Del Ponte

Slides for Faculty Oxford University Press All rights reserved.

Efficient Counting of Network Motifs

Math 778S Spectral Graph Theory Handout #2: Basic graph theory

Network analysis of dictionaries

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY

Lecture 22 Tuesday, April 10

Algorithms: Graphs. Amotz Bar-Noy. Spring 2012 CUNY. Amotz Bar-Noy (CUNY) Graphs Spring / 95

Graphs, graph algorithms (for image segmentation),... in progress

Sequences from Centered Hexagons of Integers

1. a graph G = (V (G), E(G)) consists of a set V (G) of vertices, and a set E(G) of edges (edges are pairs of elements of V (G))

Ray shooting from convex ranges

Graphs (MTAT , 6 EAP) Lectures: Mon 14-16, hall 404 Exercises: Wed 14-16, hall 402

Mathematics and Statistics, Part A: Graph Theory Problem Sheet 1, lectures 1-4

Course on Social Network Analysis Weights

Estimating normal vectors and curvatures by centroid weights

Characterizing Graphs (3) Characterizing Graphs (1) Characterizing Graphs (2) Characterizing Graphs (4)

Definition: A graph G = (V, E) is called a tree if G is connected and acyclic. The following theorem captures many important facts about trees.

Algorithms. Graphs. Algorithms

Elements of Graph Theory

5 Graphs

The crossing number of K 1,4,n

Computational problems. Lecture 2: Combinatorial search and optimisation problems. Computational problems. Examples. Example

Structural Balance and Transitivity. Social Network Analysis, Chapter 6 Wasserman and Faust

Graphs. Introduction To Graphs: Exercises. Definitions:

CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10

Discrete mathematics , Fall Instructor: prof. János Pach

Worcester County Mathematics League

Ma/CS 6b Class 26: Art Galleries and Politicians

Graphes: Manipulations de base et parcours

MSWLogo and dynamic link libraries. Matjaž Zaveršnik, Vladimir Batagelj

Edge disjoint monochromatic triangles in 2-colored graphs

International Mathematics TOURNAMENT OF THE TOWNS. Junior A-Level Solutions Fall 2013

CSCI5070 Advanced Topics in Social Computing

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

A RADIO COLORING OF A HYPERCUBE

1. Find f(1), f(2), f(3), and f(4) if f(n) is defined recursively by f(0) = 1 and for n = 0, 1, 2,

Generalized blockmodeling of sparse networks

University of Toronto Department of Mathematics

Extremal Graph Theory: Turán s Theorem

Ma/CS 6b Class 13: Counting Spanning Trees

CHAPTER 3 FUZZY RELATION and COMPOSITION

Domination, Independence and Other Numbers Associated With the Intersection Graph of a Set of Half-planes

Signed domination numbers of a graph and its complement

Definition. Given a (v,k,λ)- BIBD, (X,B), a set of disjoint blocks of B which partition X is called a parallel class.

Discrete Mathematics

Vertex-Colouring Edge-Weightings

An Introduction to Graph Theory

ON THE STRONGLY REGULAR GRAPH OF PARAMETERS

On vertex-degree restricted subgraphs in polyhedral graphs

Efficient Minimization of New Quadric Metric for Simplifying Meshes with Appearance Attributes

Number System. Introduction. Natural Numbers (N) Whole Numbers (W) Integers (Z) Prime Numbers (P) Face Value. Place Value

Graph Algorithms. Chromatic Polynomials. Graph Algorithms

Chapter 4. Relations & Graphs. 4.1 Relations. Exercises For each of the relations specified below:

Course Introduction / Review of Fundamentals of Graph Theory

Binary Relations McGraw-Hill Education

CS 311 Discrete Math for Computer Science Dr. William C. Bulko. Graphs

MC302 GRAPH THEORY SOLUTIONS TO HOMEWORK #1 9/19/13 68 points + 6 extra credit points

An interpolating 4-point C 2 ternary stationary subdivision scheme

CS Data Structures and Algorithm Analysis

Cardinality of Sets. Washington University Math Circle 10/30/2016

Chordal graphs and the characteristic polynomial

Graphs. The ultimate data structure. graphs 1

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS

DS UNIT 4. Matoshri College of Engineering and Research Center Nasik Department of Computer Engineering Discrete Structutre UNIT - IV

ECS 20 Lecture 17b = Discussion D8 Fall Nov 2013 Phil Rogaway

Problem Set 3. MATH 778C, Spring 2009, Austin Mohr (with John Boozer) April 15, 2009

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam

2009 HMMT Team Round. Writing proofs. Misha Lavrov. ARML Practice 3/2/2014

Definition of Graphs and Trees. Representation of Trees.

Transcription:

Social Networks 23 (2001) 237 243 A subquadratic triad census algorithm for large sparse networks with small maximum degree Vladimir Batagelj, Andrej Mrvar Department of Mathematics, University of Ljubljana, FMF, Jadranska 19, 1000 Ljubljana, Slovenia Abstract In the paper a subquadratic (O(m), m is the number of arcs) triad census algorithm for large and sparse networks with small maximum degree is presented. The algorithm is implemented in the program Pajek. 2001 Elsevier Science B.V. All rights reserved. MSC: 91D30; 93A15; 68R10; 05C85; 68W40; 05C90 Keywords: Large networks; Triad census; Algorithm 1. Introduction Moody (1998) proposed in a quadratic (O(n 2 ), n is the number of vertices) algorithm for determining the triad census of a network. For (very) large and sparse networks (tens or hundreds of thousands of vertices) a subquadratic algorithm is needed (Batagelj and Mrvar, 1998, http://vlado.fmf.uni-lj.si/pub/networks/pajek/). In this paper we present such an O(m), m is the number of arcs, triad census algorithm. Let G = (V, R) be a directed network (relational graph); V is the set of vertices and R V V is the set of arcs. We assume that G has no loops the relation R is irreflexive. We denote by R the inverse relation, xr y yrx, and by ˆR = R R the symmetrized relation. ˆR(v) is the set of all neighbors of vertex v, ˆR(v) ={u : v ˆRu}, and ˆd(v) its cardinality ˆd(v) = ˆR(v). Note that ˆd(v) 2m. We also define the maximum degree in network G ˆ = max ˆd(v) Most large networks are sparse the number of lines (edges or arcs) m is subquadratic: m = O(k(n) n), where k(n) is small with respect to n, k(n) n (for example k(n) = Corresponding author. E-mail address: vladimir.batagelj@uni-lj.si (V. Batagelj). 0378-8733/01/$ see front matter 2001 Elsevier Science B.V. All rights reserved. PII: S0378-8733(01)00035-1

238 V. Batagelj, A. Mrvar / Social Networks 23 (2001) 237 243 constant, or k(n) = log n,ork(n) = n). For some interesting ideas about the structure of large networks see (Barabasi et al., 1999, http://www.nd.edu/ networks). Graph G determines a function { 1 urv Link(u, v) = 0 otherwise that we shall use in our algorithm. 2. The algorithm 2.1. Basic idea All possible triads (Wasserman and Faust, 1994, p. 244) can be partitioned into three basic types (see Fig. 1): the null triad 003; dyadic triads 012 and 102; and Fig. 1. Types of triads.

V. Batagelj, A. Mrvar / Social Networks 23 (2001) 237 243 239 connected triads: 111D, 201, 210, 300, 021D, 111U, 120D, 021U, 030T, 120U, 021C, 030C and 120C. In a large and sparse network most triads are null triads Since the total number of triads is T = ( n ) 3 and the above types partition the set of all triads, the idea of the algorithm is as follows: count all dyadic T 2 and all connected T 3 triads with their subtypes; compute the number of null triads T 1 = T T 2 T 3. In the algorithm we have to assure that every non-null triad is counted exactly once. A set of three vertices {v, u, w} can be in general selected in 6 different ways (v,u,w),(v,w,u), (u, v, w), (u, w, v), (w,v,u),(w,u,v). We solve the problem by introducing the canonical selection that contributes to the triadic count; the other, noncanonical selections need not to be considered in the counting process. In the following, to simplify the presentation, we shall assume that the vertices are represented by integers V ={1, 2, 3,...,n}. 2.2. Counting dyadic triads Every connected dyad forms a dyadic triad with every vertex both members of the dyad are not adjacent to. Each pair of vertices (v, u), v<uconnected by an arc contributes n ˆR(u) ˆR(v) \{u, v} 2 triads of type 3-102, if u and v are connected in both directions; and of type 2-012 otherwise. The condition v<udetermines the canonical selection for dyadic triads. 2.3. Counting connected triads A selection (v,u,w)of connected triad is canonical iff v<u<w. The triads isomorphism problem can be efficiently solved by assigning to each triad a code Tricode(v,u,w) an integer number between 0 to 63, defined by function Tricode(v, u, w : vertex) : integer; begin Tricode := Link(v, u) + 2 (Link(u, v) + 2 (Link(v, w)+ 2 (Link(w, v) + 2 (Link(u, w) + 2 Link(w, u))))); end; The function Tricode essentially treats the out-diagonal entries of triad adjacency matrix as a binary number (see Fig. 2). Each triad code corresponds to a unique triad type as described by Table 1.

240 V. Batagelj, A. Mrvar / Social Networks 23 (2001) 237 243 Fig. 2. Triad code. 2.4. The triad census algorithm INPUT: relational graph G = (V, R) represented by lists of neighbors OUTPUT: table Census with frequencies of triadic types 1 for i := 1 to 16 do Census[i] := 0; 2 for each v V do 2.1 for each u ˆR(v) do if v<uthen begin 2.1.1 S := ˆR(u) ˆR(v) \{u, v}; 2.1.2 if vru urv then TriType := 3 else TriType := 2; 2.1.3 Census[TriType] := Census[TriType] + n S 2; 2.1.4 for each w S do if u < w (v < w w < u v ˆRw) then begin 2.1.4.1 TriType := TriTypes[Tricode(v,u,w)]; 2.1.4.2 Census[TriType] := Census[TriType] + 1; end end; Table 1 Triad types Code Type Code Type Code Type Code Type 0 1 16 2 32 2 48 3 1 2 17 6 33 5 49 7 2 2 18 4 34 6 50 8 3 3 19 8 35 7 51 11 4 2 20 5 36 6 52 7 5 4 21 9 37 9 53 12 6 6 22 9 38 10 54 14 7 8 23 13 39 14 55 15 8 2 24 6 40 4 56 8 9 6 25 10 41 9 57 14 10 5 26 9 42 9 58 13 11 7 27 14 43 12 59 15 12 3 28 7 44 8 60 11 13 8 29 14 45 13 61 15 14 7 30 12 46 14 62 15 15 11 31 15 47 15 63 16

V. Batagelj, A. Mrvar / Social Networks 23 (2001) 237 243 241 3 sum := 0; for i := 2 to 16 do sum := sum + Census[i]; Census[1] := (1/6)n(n 1)(n 2) sum; For a connected triad we can always assume that v is the smallest of its vertices. So we have in 2.1.4 to determine the canonical selection from the remaining two selections (v,u,w) and (v,w,u).ifv<w<uand v ˆRw then the selection (v,w,u) was already counted before. Therefore we have to consider it as canonical only if v ˆRw. In an implementation of the algorithm, we must take care about the range overflow in the case of T and T 1. For example, already T(10 000) = ( 10 000 ) 3 1.67 10 11. It is easy to see that T 2 < nm and T 3 < ˆ m. Therefore, using in Delphi for counters the type Int64 with 19 20 decimal digits, there should be no problem also with sparse networks on some millions of vertices. 2.5. Complexity of the algorithm Assuming that the sets ˆR(v) are represented by ordered lists, the count of dyadic triads (2.1.1 2.1.3) for each arc (v, u) can be done in time O( ˆ ). Therefore, the total complexity for counting dyadic triads is T(T 2 ) = O(m ˆ ). The body (2.1.4.1 2.1.4.2) of the connected triads counting loop requires constant time. Therefore the complexity of counting connected triads T(T 3 ) is of the same order as the number of all connected triads. Since every connected triad contains a vertex that is an origin of an angle we have T 3 ( ) ˆd(v) = 1 ˆd(v)( ˆd(v) 1) 1 [2] 2 2 ( ˆ 1) ˆd(v) ( ˆ 1)m Counting the null triads (3) can be done in constant time. Therefore the total complexity of the algorithm is O( ˆ m) and thus, for networks with small ˆ n, since 2m n ˆ, of order O(n). Note that in a large sparse network some vertices can still have large degrees. For such networks ˆ = O(n) and the proposed algorithm is quadratic. 3. Example Internet connections As an example of application of the proposed algorithm we applied it to the routing data on the Internet network. This network was produced from web scanning data (May 1999) available from http://www.cs.bell-labs.com/who/ches/map/index.html. It can be obtained as a Pajek s NET file from http://vlado.fmf.uni-lj.si/pub/networks/ data/web/web.zip. It has 124,651 vertices, 207,214 arcs, ˆ = 153 and average degree is 3.3. Fig. 3 shows 6-core (arcs are counted in both directions) of the largest strongly connected component of the network. Using Pajek implementation of the proposed algorithm on 300 MHz PC we obtained in 10 s the triad census presented in Table 2.

242 V. Batagelj, A. Mrvar / Social Networks 23 (2001) 237 243 Fig. 3. 6-Core of the routing data network. There are only 69 complete triads (type 16-300). Using Pajek s pattern searching we identified all of them and extracted the induced subnetwork from the network. The subnetwork has 18 components, 11 of them are triangles. The other, nontriangular, components are presented in Fig. 4. Table 2 Triad census of the routing data network Triad Count 1-003 322,769,974,374,083 2-012 23,955,959,979 3-102 175,605,448 4-021D 882,596 5-021U 109,179 6-021C 444,490 7-111D 4917 8-111U 15,508 9-030T 17,107 10-030C 111 11-201 1002 12-120D 899 13-120U 1120 14-120C 96 15-210 121 16-300 69

V. Batagelj, A. Mrvar / Social Networks 23 (2001) 237 243 243 Fig. 4. Nontriangular complete triadic components of the routing data network. Acknowledgements The algorithm was presented at 17th Leoben Ljubljana seminar on Discrete mathematics, Leoben, 27 29 April 2000. This work was supported by the Ministry of Science and Technology of Slovenia, Project J1-8532. References Barabasi, A-L., Reka, A., Hawoong, J., 1999. Mean-field theory for scale-free random networks. Physica A 272, 173 187. Batagelj, V., Mrvar, A., 1998. Pajek a program for large network analysis. Connections 21 (2), 47 57. Wasserman, S., Faust, K., 1994. Social Network Analysis. Cambridge University Press, Cambridge. Moody, J., 1998. Matrix methods for calculating the triad census. Social Networks 20, 291 299.