THE DEPENDENCY NETWORK IN FREE OPERATING SYSTEM

Similar documents
Basics of Network Analysis

Statistical Analysis of the Metropolitan Seoul Subway System: Network Structure and Passenger Flows arxiv: v1 [physics.soc-ph] 12 May 2008

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Network Theory: Social, Mythological and Fictional Networks. Math 485, Spring 2018 (Midterm Report) Christina West, Taylor Martins, Yihe Hao

TELCOM2125: Network Science and Analysis

Empirical analysis of online social networks in the age of Web 2.0

Introduction to network metrics

Immunization for complex network based on the effective degree of vertex

SNA 8: network resilience. Lada Adamic

Higher order clustering coecients in Barabasi Albert networks

The network of scientific collaborations within the European framework programme

Earthquake Networks, Complex

The missing links in the BGP-based AS connectivity maps

BMC Bioinformatics. Open Access. Abstract

arxiv:cs/ v4 [cs.ni] 13 Sep 2004

Universal Behavior of Load Distribution in Scale-free Networks

Second-Order Assortative Mixing in Social Networks

Modelling weighted networks using connection count

Response Network Emerging from Simple Perturbation

Universal Properties of Mythological Networks Midterm report: Math 485

Social Data Management Communities

Optimal weighting scheme for suppressing cascades and traffic congestion in complex networks

Summary: What We Have Learned So Far

1 Homophily and assortative mixing

Modeling Traffic of Information Packets on Graphs with Complex Topology

CAIM: Cerca i Anàlisi d Informació Massiva

Deterministic Hierarchical Networks

Weighted Evolving Networks with Self-organized Communities

Structure of biological networks. Presentation by Atanas Kamburov

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 1 Sep 2002

Tuning clustering in random networks with arbitrary degree distributions

Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities

Topic mash II: assortativity, resilience, link prediction CS224W

Cascading failures in complex networks with community structure

caution in interpreting graph-theoretic diagnostics

2007 by authors and 2007 World Scientific Publishing Company

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Research on Community Structure in Bus Transport Networks

Topology of the Erasmus student mobility network

Community detection. Leonid E. Zhukov

Supplementary material to Epidemic spreading on complex networks with community structures

Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)

Section 7.13: Homophily (or Assortativity) By: Ralucca Gera, NPS

Networks in economics and finance. Lecture 1 - Measuring networks

Graph-theoretic Properties of Networks

First-improvement vs. Best-improvement Local Optima Networks of NK Landscapes

CS-E5740. Complex Networks. Network analysis: key measures and characteristics

Properties of Biological Networks

Social Network Analysis With igraph & R. Ofrit Lesser December 11 th, 2014

Reply Networks on Bulletin Board System

Wednesday, March 8, Complex Networks. Presenter: Jirakhom Ruttanavakul. CS 790R, University of Nevada, Reno

arxiv:cond-mat/ v5 [cond-mat.dis-nn] 16 Aug 2006

Attack Vulnerability of Network with Duplication-Divergence Mechanism

An introduction to the physics of complex networks

Topologies and Centralities of Replied Networks on Bulletin Board Systems

Search in weighted complex networks

Traffic dynamics based on an efficient routing strategy on scale free networks

Stable Statistics of the Blogograph

Fractal small-world dichotomy in real-world networks

The Topology and Dynamics of Complex Man- Made Systems

Complex Networks: Ubiquity, Importance and Implications. Alessandro Vespignani

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

arxiv:cond-mat/ v2 [cond-mat.stat-mech] 27 Feb 2004

Necessary backbone of superhighways for transport on geographical complex networks

Small World Properties Generated by a New Algorithm Under Same Degree of All Nodes

Why Do Computer Viruses Survive In The Internet?

An Evolving Network Model With Local-World Structure

Self-organization of collaboration networks

Phase Transitions in Random Graphs- Outbreak of Epidemics to Network Robustness and fragility

Girls Talk Math Summer Camp

Using Stable Communities for Maximizing Modularity

beyond social networks

COMP6237 Data Mining and Networks. Markus Brede. Lecture slides available here:

1 Degree Distributions

Integrating local static and dynamic information for routing traffic

A Generating Function Approach to Analyze Random Graphs

Introduction to Complex Networks Analysis

arxiv: v2 [cs.si] 19 Jan 2015

Evolving Models for Meso-Scale Structures

Community structures evaluation in complex networks: A descriptive approach

Characteristics of Preferentially Attached Network Grown from. Small World

Mixing navigation on networks. Tao Zhou

γ : constant Goett 2 P(k) = k γ k : degree

V2: Measures and Metrics (II)

Star Decompositions of the Complete Split Graph

Cluster Analysis. Angela Montanari and Laura Anderlucci

Complex-Network Modelling and Inference

Complex networks: A mixture of power-law and Weibull distributions

Disclaimer. Lect 2: empirical analyses of graphs

Complex networks: Dynamics and security

Large-scale topological and dynamical properties of the Internet

Complex Networks. Structure and Dynamics

Critical Phenomena in Complex Networks

Statistical analysis of the airport network of Pakistan

Ranking of nodes of networks taking into account the power function of its weight of connections

Physics Letters A. Evolution of a large online social network. Haibo Hu,XiaofanWang. Contents lists available at ScienceDirect

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian

arxiv:cond-mat/ v3 [cond-mat.dis-nn] 30 Jun 2005

An Improved k-shell Decomposition for Complex Networks Based on Potential Edge Weights

Transcription:

Vol. 5 (2012) Acta Physica Polonica B Proceedings Supplement No 1 THE DEPENDENCY NETWORK IN FREE OPERATING SYSTEM Tomasz M. Gradowski a, Maciej J. Mrowinski a Robert A. Kosinski a,b a Warsaw University of Technology, Faculty of Physics Koszykowa 75, 00-662 Warszawa, Poland b Central Institute for Labour Protection National Research Institute (CIOP PIB), Czerniakowska 16, 00-701 Warszawa, Poland (Received December 12, 2011) Phenomena observed in many complex systems can be attributed to their network structure. In this paper we present an analysis of the dependency network of a large software project Debian Operating System (GNU/Linux distribution) and show its properties, like power-law degree distribution, modularity and hierarchical organization. DOI:10.5506/APhysPolBSupp.5.47 PACS numbers: 89.20. a, 89.75.Fb, 64.60.ah 1. Introduction Free operating systems are becoming more and more popular nowadays. Many computer users choose these systems as an alternative for commercial systems because of their low price (actually no price), flexibility and reliability. In this case free means not only free of charge, but first of all libre. Free software is written and maintained by communities of open source developers and based on free licenses like GPL (General Public License). It means that it can be used, studied and modified without restriction, and it can be copied and redistributed in modified or unmodified form either without restriction or with minimal restrictions. Free software powers not only home computers but also large servers and data centers as well as mobile devices. In this paper we focus on the Debian Operating System, which is one of the most popular and influential GNU/Linux distributions [1]. Presented at the Summer Solstice 2011 International Conference on Discrete Models of Complex Systems, Turku, Finland, June 6 10, 2011. Corresponding e-mail address: tomgrad@if.pw.edu.pl (47)

48 T.M. Gradowski, M.J. Mrowinski, R.A. Kosinski Such a project is a collection of a large number of software packages. Each package can contain a compiled program, library, documentation, source files or other data, such as images and sound files. The distribution defines relations between these packages dependencies. Each package has a list of dependencies other packages required for its installation. A small portion of this system the dependency subgraph of one particular package is presented in Fig. 1. The whole distribution can be described as a large complex network of dependencies between packages. A proper structure of the dependency graph implies stability of the whole operating system and its susceptibility to random damages or attacks. mc libcomerr2 libglib2.0-0 libgpm2 libslang2 e2fslib libpcre3 libselinux1 zlib1g libc6 libc-bin libgcc1 Fig. 1. Dependency graph for one of the packages from the Debian distribution (mc Midnight Commander a very popular file manager). Each node represents one package from the distribution. Relations are represented by arrows pointing from one package to all its dependencies. Top package mc is the root of this subgraph and the core package libc6 is present with 9 incoming links. In this paper we study the dependency network of the Debian 6.0 Operating System. We conduct research on the scaling of the degree distribution. We analyze the nodes correlations in order to describe the assortativity of the network. By studying the relation between the degree and the average degree of nearest neighbors, we find the disassortative mixing, which is typical for technical networks, as well as biological networks like food-webs. A paper on the same matter was published recently by de Sousa et al. [2]. The authors presented interesting results of the research devoted to the stability and modularity of the Debian Operating System. In our paper we study this issue further and deliver more results and comments concerning modularity and node correlations.

The Dependency Network in Free Operating System 49 2. The network In this section we present some basic properties of the dependency network. In our research we used the up-to-date stable release of Debian 6.0 with only the main repository active. Network consists of 28234 nodes and 116783 edges. The fully connected component (giant cluster) contains 25775 vertices and 116644 edges. The edges represent the dependencies between packages, therefore there are no cycles and no parallel links in the network. The degree distribution for the giant cluster is depicted in Fig. 2, separately for incoming and outcoming links. The maximum indegree is k in = 12364. Package with such a large number of incoming links is labeled libc6, it is the most important library, since the most used language in the GNU/Linux ecosystem is C. The distribution follows the power law k γ in with the γ exponent of 2. 10 0 P(k out ) 10-1 10-2 10-3 10-4 10-5 10-6 1 10 100 k out Fig. 2. Scaling properties of the dependency network. Degree distributions for incoming (left) and outcoming links (right). Dashed line shows the k γ slope with γ = 2 for indegrees, γ = 2 and γ = 5 for outdegrees. Distribution of outdegrees is more interesting because of the double scaling. For small degrees, under 20, we observe scaling with exponent γ = 2, the same as for ingoing links. But above that value the situation drastically changes and the probability obeys kout 5 scaling. There is rather simple explanation of this phenomenon. Outdegree is a number of dependencies of a packages. The more dependencies a package has, the harder it is to maintain such a package. The developers and distribution maintainers try to keep the number of dependencies as low as possible. Sometimes it is better to split one packages into two or more packages with less dependencies to simplify the maintenance. Results presented in this section partially agree with those by de Sousa et al. [2], however in our calculations we used logarithmic bin sizes for histograms to obtain degree distributions, instead of constant size bins, which is a more proper method for scale free systems. With constant binning it is impossible to describe the double scaling of outdegree phenomenon.

50 T.M. Gradowski, M.J. Mrowinski, R.A. Kosinski 3. Hierarchical organization Clustering coefficient of a node indicates how close its nearest neighbors are to a complete graph (clique). For a directed graph it is defined as follows [3] (simplified version of the original definition due to unweighted network) 1 C i = a ij a ik a jk, (1) k i (k i 1) where a ij equals 1 if there is a directed edge from i-th to j-th node and 0 otherwise. The clustering coefficient is based on triplets (triangles) a ij a ik a jk and because the edges represent software dependencies its maximum value can be 0.5. If there is a link from node i to j and from i to k, there can be only one directed link between j and k. Otherwise there would be a cycle, which is forbidden in network of dependencies. The global clustering coefficient averaged over 15819 nodes with k out 2 is C = 0.306. The network of dependencies is not only scale-free, but also modular and hierarchical. Modularity is a tendency of groups of nodes to cluster with each other. Modularity is a common property of large software projects as well. Qt and GTK+ libraries, KDE and GNOME desktop environments, Xorg graphical system, Open Office suite, Tex Live distribution, just to name the most popular, all these projects consist of many programs or libraries which have similar purpose and share their dependencies. For instance, Open Office is a collection of programs like Writer, Calc, Impress, Draw which are tools for writing, spreadsheets, presentations and drawing. On the other hand modules often depend on each other, i.e. GNOME environment is build with the GTK+ library, which works on top of the Xorg system. It was shown previously [4] that networks with hierarchical organization exhibit power-law dependency between the average clustering coefficient and degree. This property is common for networks without strong geographical constrains (i.e. where cost of the link is insignificant) and was proven for few real networks: actors collaboration networks, semantic web, WWW, Internet on the autonomous system level and metabolism networks. We found similar property in Debian network of dependencies (Fig. 3). Although the C(k out ) relation follows the scaling law k 2 as opposed to k 1 in cited results, it proves the existence of different degrees of modularity in the network. This result is opposite to random networks, where clustering coefficient is k-independent. In hierarchical networks the higher level of hierarchy means lower clustering coefficient, and this kind of organization is observed in Debian dependency network for packages with more than 20 dependencies. But this relation is not observed for packages with lower k out due to the existence of a large number of hierarchy-independent modules. j,k

The Dependency Network in Free Operating System 51 1 <C(k out )> 0.1 0.01 1 10 100 k out Fig. 3. The scaling of clustering coefficient C with k out. The dashed line has slope 2. The C(k) scaling indicates hierarchical organization in the network. 4. Assortativity Assortativity is a tendency for vertices in networks to be connected to vertices with certain properties [5]. In particular, in networks with assortative mixing high-degree nodes tend to connect to other high-degree nodes (common for social networks). Such a behavior is opposite to disassortative mixing where high-degree nodes tend to connect only to low-degree nodes (different sorts of technical networks). Following this definition and focusing on the average connectivity of nearest neighbors [6] we can determine how important the dependencies are. In order to do so, we present the k in DEPS (k out ) relation, where k in DEPS is a number of ingoing links averaged over all the dependencies of a node with outgoing degree k out (Fig. 4). Decreasing average reveals the disassortative Fig. 4. Average indegree of dependencies of each package with its outdegree. Each point represents one package. Packages depending on the libc6 library are colored darker than the rest of packages.

52 T.M. Gradowski, M.J. Mrowinski, R.A. Kosinski mixing but does not deliver the full information about correlations in this network. A large number of nodes with high k in DEPS is separated from the rest of nodes. This is a consequence of the existence of a super hub in the network (libc6 library which is a dependency for almost half of the packages) and scale-free nature (most nodes have a low degree). 5. Conclusions Operating system understood as system kernel, basic tools and user applications, just like in popular GNU/Linux distributions, is a complex system with a structure of a scale-free network. This fact gives us not only the opportunity to describe computer software in the language of complex networks but also opens a lot of possibilities to study such systems with methods known from complex networks field. One of the crucial issues is the stability of operating system and its resistance for random damages as well as planned attacks. One of the common problems causing instabilities in GNU/Linux distributions, including Debian, are mistakes in the dependency graph. These mistakes, in computer jargon called bugs, often mean missing dependencies, non-existing dependencies, wrong version of dependencies or dependency loops (cycles). All these bugs make it impossible to install damaged package. Though Debian is known for its high quality and stability, we found a small number of cycles in the dependencies. Of course, determining the stability of the distribution relying on e.g. the connectivity analysis is a massive simplification. It would be interesting to investigate the effect of higher order dependencies (that is dependencies of dependencies) on the stability. REFERENCES [1] http://www.debian.org [2] O.F. de Sousa, M.A. de Menezes, T.J.P. Penna, JCIS 1, 127 (2009). [3] A. Barrat, M. Barthelemy, R. Pastor-Satorras, A. Vespignani, PNAS 101, 3747 (2004). [4] E. Ravasz, A.-L. Barabasi, Phys. Rev. E67, 026112 (2003). [5] M.E.J. Newman, Phys. Rev. Lett. 89, 208701 (2002). [6] R. Pastor-Satorras, A. Vazquez, A. Vespignani, Phys. Rev. Lett. 87, 258701 (2001).