Why Do Computer Viruses Survive In The Internet?

Margarita Ifti^a and Paul Neumann^b

^a Department of Physics, Faculty of Natural Sciences, University of Tirana, Bul. Zog I, Tirana, Albania
^b Department of Computer Sciences, University Kristal, Highway Tirana-Durres, km. 5, Tirana, Albania

Abstract. We use the three-species cyclic competition model (Rock-Paper-Scissors), described by the reactions $A + B \to 2B$, $B + C \to 2C$, $C + A \to 2A$, to emulate a computer network with e-mail viruses. Different network topologies bring about different dynamics of the epidemics. When the parameters of the network are varied, it is observed that very high clustering coefficients are necessary for a pandemic to occur. The differences between the networks of computer users, e-mail networks, and social networks, as well as their role in determining the nature of epidemics, are also discussed.

Keywords: computer networks, computer viruses, topology, clustering.
PACS: 87.10.Rt, 89.20.Hh, 89.75.Hc

INTRODUCTION

The Internet is a complex network of computers and routers, connected by wires or wireless links; the World Wide Web is a network of web pages connected by (virtual) hyperlinks; social networks are composed of human beings (nodes) and social connections (links). All these are examples of complex networks, whose structure (topology) has interested scientists, particularly physicists, in recent years. Three concepts are central to the modern understanding of complex networks:

Small worlds: Despite the large size of the network, there is always a relatively short path between any two nodes. The best known example is the ``six degrees of separation'' idea, due to Stanley Milgram [1], who stated that there is a path of acquaintances with a typical length of six between any pair of people in the United States.

Clustering: In social networks, particularly, we observe the formation of ``cliques'', representing circles of friends or acquaintances in which everybody knows everybody. This attribute is characterized by the clustering coefficient, defined for a node $i$ with $k_i > 1$ edges as $C_i = 2E_i / [k_i (k_i - 1)]$, where $E_i$ is the number of links between the $k_i$ nodes connected to the $i$-th node [2]. The average of the $C_i$ over the network gives the clustering coefficient of the network.

Degree distribution: The number of edges (links) of a particular node is known as the degree of that node. The degree distribution $P(k)$ of real networks (World Wide Web [3], Internet [4], and metabolic networks [5]) has a power-law tail: $P(k) \sim k^{-\gamma}$. Networks with this degree distribution (independent of $N$) are called scale-free [6].

The topology of the Internet is studied at the router level and at the interdomain level. Faloutsos et al. [4] report power-law distributions with exponents around 2.2 at the interdomain level, and 2.48 at the router level. The reported clustering coefficients are 0.18 and 0.3 [7]. A similar topology has been observed in the movie actor collaboration [6, 8, 9, 10], science collaboration [11, 12, 13, 14], citation [14], and human sexual contact [15] networks (as some examples of social/human contact networks).

In previous works we have considered a non-spatial version of three-species competition models and studied their long-term behaviour [16, 17]. They exhibit the extinction of one of the species when there are no migrations, or the survival of all three when there are enough migrations into and out of the system. The same holds for systems with a larger number of species [18, 19], as well as for generalized Volterra systems [20]. Many studies of the cyclic competition system, or cyclic Lotka-Volterra model, put the ensemble on a lattice [21, 22, 23]. However, humans are mobile, and in such situations lattice models do not correctly represent the real spatial structure of the interaction and migration networks [24]. In this paper we present a model that relates to the spread and survival of e-mail viruses in the Internet.
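As an illustration of the clustering coefficient defined in the Introduction, the following minimal Python sketch evaluates $C_i = 2E_i / [k_i (k_i - 1)]$ on a small graph stored as a dictionary of neighbour sets. The function names and the example graph are hypothetical, and assigning $C_i = 0$ to nodes with $k_i < 2$ is an assumption (the definition in the text only covers $k_i > 1$).

```python
from itertools import combinations


def local_clustering(adj, i):
    """Return C_i = 2 E_i / [k_i (k_i - 1)] for node i.

    adj maps each node to the set of its neighbours (undirected graph).
    Assumption: nodes with fewer than two neighbours get C_i = 0.
    """
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # E_i: number of links that exist among the k neighbours of node i
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def network_clustering(adj):
    """Average of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 0
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

For this example graph, node 0 has three neighbours with one link among them, so $C_0 = 1/3$, while nodes 1 and 2 each have $C_i = 1$; with the convention above, the network average is 7/12.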
CP1203, 7th International Conference of the Balkan Physical Union, edited by A. Angelopoulos and T. Fildisis, 2009 American Institute of Physics 978-0-7354-0740-4/09/$25.00

THE ABC CYCLIC MODEL ON A SCALE-FREE NETWORK

The ABC model we have previously studied in its non-spatial version is relevant to epidemiological studies, since it resembles the case in which immunity to a particular infectious disease is only temporary, known in epidemiology as the SIRS (Susceptible-Infected-Recovered-Susceptible) model [25]. (The similarity does not hold at every step of the SIRS model.) This is particularly relevant when one is interested in the spread of computer viruses in the Internet. We would like to be able to predict, just as for ordinary infectious diseases, whether they will flare up and die out, or remain endemic. Several questions arise, the answers to which are determined by the topology of the system. First, let us discuss the relevance of the SIRS model to the spread of computer viruses.

Computer and E-mail Virus Features

The very name of computer viruses comes from their obvious resemblance to natural viruses: their ability to self-replicate, their high speed of spreading, their ability to select the target system (i.e. each virus only targets certain systems, or groups of systems), their ability to infect non-infected systems, our difficulties in fighting them, etc. More recently, one can also add a steadily increasing rate of mutation and the appearance of new generations of viruses. The global distribution of viruses is determined to a large degree by the mass production and popularity of personal computers, which are particularly vulnerable to infection because of their standardised architecture and software.

Many viruses are able to mutate by randomly modifying their own code, so that it is difficult to establish fingerprinting characteristics for a given virus. This makes a reappearing virus unrecognisable to the antivirus software, causing reinfection. This circumstance justifies the use of a SIRS model for this class of viruses.
The spread of computing networks caused concerns that it would create an environment in which viruses would thrive. Although this is true, the contrary is also true: computing networks create an environment in which antiviruses can spread very quickly. Once one user of the network possesses the means to combat a virus, they can share it within minimal time with all other interested users in the network. For all the reasons described above, the spread of some sorts of computer infections (but not all of them) can be modelled epidemiologically by a SIRS model. Our ABC cyclic model, described by the reactions $A + B \to 2B$, $B + C \to 2C$, $C + A \to 2A$ (all three at equal rates), is a good first realisation of the epidemiological SIRS model.

Models of Networks with Different Topologies

Very good reviews of the field have been written by R. Albert and A.-L. Barabási [24], S.N. Dorogovtsev and J.F.F. Mendes [26], M.E.J. Newman [27], etc. We give only very short descriptions of the relevant models.

A network is represented by a graph, i.e. a set of nodes and edges (links) between them. The theory of random graphs was founded by Erdős and Rényi [28, 29, 30]. A random graph is defined as $N$ nodes connected by $n$ edges, randomly chosen among all possible edges [28]. Erdős and Rényi [29] showed that there is a sudden change in the cluster structure as the average degree per node $\langle k \rangle$ goes through 1: from a loose collection of small clusters, a large cluster is formed, which for $\langle k \rangle_c = 1$ contains approximately $N^{2/3}$ nodes, and for larger $\langle k \rangle$ contains a number of nodes proportional to the network size $N$. For large $N$ the degree distribution of a random network is Poisson with mean $\langle k \rangle$. The diameter of these graphs, defined as the maximal distance between any pair of nodes, is almost constant if the connection probability $p$ is not too small. The random networks described above have clustering coefficients that are much smaller than those observed in real-world networks.
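The random-graph definition above ($N$ nodes, $n$ edges chosen at random among all possible edges) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and enumerating all node pairs is practical only for modest $N$. The helper for the largest cluster lets one look at the change in cluster structure around $\langle k \rangle = 2n/N = 1$.

```python
import random
from itertools import combinations


def random_graph(n_nodes, n_edges, seed=None):
    """Random graph: n_edges distinct edges drawn uniformly from all node pairs."""
    rng = random.Random(seed)
    possible = list(combinations(range(n_nodes), 2))  # all N(N-1)/2 pairs
    adj = {i: set() for i in range(n_nodes)}
    for u, v in rng.sample(possible, n_edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def largest_cluster_size(adj):
    """Size of the largest connected cluster, found by depth-first search."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


# Usage: compare the largest cluster below and above <k> = 1 (N = 200)
sparse = random_graph(200, 50, seed=1)    # <k> = 0.5
dense = random_graph(200, 300, seed=1)    # <k> = 3.0
```

Calling `largest_cluster_size` on `sparse` and `dense` gives a quick feel for how the large cluster emerges as the average degree grows.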
Definition of the scale-free model. The origin of the scale-free behaviour in networks was first addressed by Barabási and Albert [3]. They attributed the scale-free nature of the networks to two mechanisms present in many networks. The Albert-Barabási algorithm for the scale-free model is the following:

(i) Growth: Start with a small number ($m_0$) of nodes, then at every step add a new node with $m$ ($\le m_0$) edges, which link the newly added node to $m$ nodes already in the network.

(ii) Preferential attachment: The probability that the new node will be connected to node $i$ depends on the degree $k_i$ of node $i$: $P(k_i) = k_i / \sum_j k_j$.

This algorithm produces a scale-free network with an exponent equal to 3 for the power-law degree distribution. The average path length of the scale-free model depends logarithmically on the number of nodes $N$ in the network. This model is a minimal one that captures the mechanisms responsible for the power-law degree distribution.
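The growth and preferential-attachment steps above can be sketched as follows. This is a minimal illustration rather than the implementation used for the simulations in this paper: the function name is hypothetical, it assumes $m_0 = m$ seed nodes, and it assumes the first added node links to all of them. Keeping a list in which every node appears once per edge end makes a uniform draw from that list equivalent to choosing node $i$ with probability $k_i / \sum_j k_j$.

```python
import random


def barabasi_albert(n_nodes, m, seed=None):
    """Grow a network by preferential attachment.

    Assumptions: m0 = m seed nodes, and the first new node links to all seeds.
    Returns a dict mapping each node to the set of its neighbours.
    """
    if not 1 <= m < n_nodes:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}
    # Every node appears in edge_ends once per edge it has, so a uniform
    # choice from this list picks node i with probability k_i / sum_j k_j.
    edge_ends = []
    targets = list(range(m))
    for new in range(m, n_nodes):
        # Growth: the new node arrives with m edges to existing nodes
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        edge_ends.extend(targets)
        edge_ends.extend([new] * m)
        # Preferential attachment: draw m distinct targets for the next node
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(edge_ends))
        targets = list(chosen)
    return adj
```

Each added node contributes exactly $m$ edges, so a network grown to $N$ nodes has $m (N - m)$ edges under these assumptions.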

The configuration model (CM) is already the standard for simulating networks with a given $\gamma$. For each node, its degree $k$ is drawn from the power-law probability distribution function. Then links are randomly placed between pairs of nodes, until the nodes are saturated [31, 32]. This model creates correlations in the connectivity distribution between the system nodes for $\gamma < 3$. The uncorrelated configuration model (UCM) uses the same algorithm for the construction of the network, except that the upper cutoff is fixed beforehand at $k_{\max} = N^{1/2}$. The connectivity correlations then disappear [33].

The spread of ideas, innovations, and computer viruses is influenced immensely by the network structure. The dynamics is heavily influenced by the fact that these networks have a large clustering coefficient. Many studies of dynamics on networks have been and are being done [34, 35, 36], and this area of study is developing rapidly.

A Model for Social and E-mail Networks

When modelling the fate of viral infections, we have to keep in mind that the spreading takes place in a social network in the case of natural viruses, and in an e-mail network in the case of e-mail viruses. Acquaintance networks have been studied extensively [1, 9]. Scale-free properties have been observed in some of them [12, 13, 37]. A model for the emergence of small-world properties from local interactions has been introduced by Davidsen, Ebel, and Bornholdt [38, 39]. The model is based on ``transitive linking'', i.e. ``my friend's friend is my friend'', and a finite lifetime of the nodes. The dynamics is defined as follows:

(i) A randomly chosen node picks at random two nodes it is already connected to, and a link is established between those two nodes. If the picked node has degree less than two, then a link is established between it and another, randomly picked, node.

(ii) With probability $p$, a randomly chosen node is removed from the network and replaced by a node with one randomly chosen link. The number of nodes remains constant.
Since the nodes have a finite lifetime, the network reaches a stationary state. The death probability $p$ is bound to be very small, since the time between meetings with new acquaintances is much shorter than that between major events, such as somebody leaving the network (dying). The model described above exhibits large clustering, which depends only on the parameter $p$, so that the clustering coefficient can be tuned by varying the death rate (this is the main reason we chose this model). When $p$ is small enough, the network shows a scale-free degree distribution; for larger $p$ the degree distribution is exponential. This degree distribution is also observed in real-world social networks.

SIMULATIONS OF ABC MODEL IN NETWORKS WITH DIFFERENT TOPOLOGIES

It seems reasonable to simulate the behaviour of the ABC model in a network generated according to the transitive linking and finite lifetime assumptions [38, 39], since this model correctly describes social networks. Social networks are a good candidate, since they are the medium through which most viruses spread. E-mail networks appear as a ``diluted'' version of a social network (e.g. not all of one's friends use e-mail). Hence, the above model does not completely describe them [40], but it seems to be a good toy model. In each case, random networks, as well as scale-free networks (generated with the Albert-Barabási, CM, and UCM algorithms), were used for comparison.

In our approach to the simulation, we aim at making the simulation fit the theoretical description, which means that we have to simulate continuous-time Markov processes. In these processes the event times are exponentially distributed, and the characteristic time depends on the current configuration of the system. First, we initialize the simulation: the nodes are assigned an initial state at random, i.e. A, B, or C with probability 1/3 each, and then they are assigned a ``wake up'' time drawn from an exponential distribution.
The node with the smallest time is activated: if its state is, say, B, it converts to B those nodes in state A among the ones it is connected to (emulating infection at contact), and analogously for the other states. The process is then iterated.

No Pandemics in the Internet!

We started our simulations with the Albert-Barabási model and a modified version of it, after Capocci et al. [41], as well as the CM and UCM models. In all cases the numbers of A, B, and C individuals drift with time, but none of them becomes zero. This situation is observed for network sizes up to 50 000 and $\langle k \rangle$ up to 200. A typical time series is shown in Fig. 1. This situation applies well to network viruses: indeed, we have seen viruses reappear several times in real life. But does this mean that we cannot have pandemics in small-world and scale-free networks?
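The event-driven simulation scheme described in the previous section (random initial states, exponentially distributed ``wake up'' times, conversion of the neighbours in the prey state) can be sketched as follows. This is a minimal sketch, not the code used for the results reported here: the function name is hypothetical, every node is assumed to wake up at the same constant rate, and an activated node is assumed to draw a fresh exponential waiting time. States are coded as 0 = A, 1 = B, 2 = C, so that a node in state $s$ converts neighbours in state $(s - 1) \bmod 3$, in line with $A + B \to 2B$, $B + C \to 2C$, $C + A \to 2A$.

```python
import heapq
import random


def simulate_abc(adj, t_max, rate=1.0, seed=None):
    """Event-driven ABC dynamics on a network.

    adj[i] is the collection of neighbours of node i (nodes are 0..n-1).
    Returns a list of (time, (n_A, n_B, n_C)) records.
    Assumption: all nodes wake up with the same constant rate.
    """
    rng = random.Random(seed)
    n = len(adj)
    # Initial states: A, B or C with probability 1/3 each
    state = [rng.randrange(3) for _ in range(n)]
    counts = [state.count(s) for s in range(3)]
    # Each node gets an exponentially distributed wake-up time
    events = [(rng.expovariate(rate), i) for i in range(n)]
    heapq.heapify(events)
    history = [(0.0, tuple(counts))]
    while events:
        t, i = heapq.heappop(events)       # node with the smallest time
        if t > t_max:
            break
        prey = (state[i] - 1) % 3          # e.g. B (1) converts A (0)
        for nb in adj[i]:
            if state[nb] == prey:
                state[nb] = state[i]
                counts[prey] -= 1
                counts[state[i]] += 1
        history.append((t, tuple(counts)))
        # Assumption: the node then draws a new waiting time
        heapq.heappush(events, (t + rng.expovariate(rate), i))
    return history


# Usage on a hypothetical ring of 30 nodes
ring = [{(i - 1) % 30, (i + 1) % 30} for i in range(30)]
series = simulate_abc(ring, t_max=5.0, seed=3)
```

Fixation corresponds to two of the three counts reaching zero in the returned history, while the ``diversity'' regime corresponds to all three counts staying positive.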

FIGURE 1. A typical time series for the ABC model in a scale-free network of size N=900.

The model for the social networks, introduced by Davidsen et al. [38, 39], seems to be a good testing ground for this, since its average degree, as well as its clustering coefficient, can be tuned by varying the death parameter. As we increase the clustering coefficient, the system undergoes a transition from the ``diversity'' regime to the ``fixation'' one, much like the one observed for the non-spatial system [17]. The variation of the critical clustering coefficient with the network size is given in Fig. 2. It was possible to obtain an approximate value of the critical clustering coefficient for network sizes up to 50 000. The critical value of the clustering coefficient stabilizes around 0.65.

FIGURE 2. Critical clustering coefficient vs. network size for the social networks, modelled using the Davidsen et al. algorithm.

The computer-user and e-mail networks appear as ``diluted'' versions of the social networks, so one would expect their clustering coefficients to be lower, perhaps close to those obtained with the Albert-Barabási model, for which the ABC model does not reach fixation. A study of e-mail networks was done by Ebel et al. [40], based on the log files of the e-mail server at the University of Kiel, recording the sender and receiver of every message from or to a student account over a period of 112 days. The network that emerged from the log files has 59 912 nodes (including 5 165 student accounts) and an average degree $\langle k \rangle = 2.88$. The large cluster comprises 56 969 nodes and has an average degree of 2.96. The power-law exponent of the degree distribution is 1.81. The network shows ``small world'' properties: its clustering coefficient is large, C=0.156. However, the clustering coefficients of real-life networks never reach the values required for pandemics, as shown by our simulations.

As far as clustering goes, the social networks are at one end of the spectrum of networks considered, while the random networks are at the other. Runs performed on random networks were quite surprising: one would require enormous connectivities ($p$) to reach fixation on them! A plot of critical connectivity vs. network size for the random networks is shown in Fig. 3. The critical connectivity reaches values close to 0.4, which is enormous.

FIGURE 3. Critical connectivity vs. network size for random networks.

CONCLUSIONS

Modern computer and e-mail viruses, with the ability of some of them to mutate and the absence of a reliable antivirus for others, call for SIRS epidemiological models rather than simply SIR. Simulations of the spread and evolution of such systems in different scale-free topologies show that all three types (A, B, and C) are present in the network at all times.
Simulations on social network models show a transition to the fixation state for clustering coefficients close to 0.65. This result would relate to the observed peak and disappearance of epidemics in human communities. Runs performed on random networks show that the critical connectivities approach 0.4, which is an enormous connectivity. Apparently, computer viral epidemics do not exhibit the same characteristics as human epidemics, because the clustering coefficients of computer-user and e-mail networks, as measured, do not get high enough to approach the critical values obtained in our simulations. It seems that a pandemic of a virus in the Internet is an unlikely event, unlike pandemics in human social networks. The Internet is threatened more by the side effect of extreme congestion from the uncontrolled multiplication of viruses.

REFERENCES

1. S. Milgram, Psych. Today 2, 60-67 (1967).
2. D.J. Watts and S.H. Strogatz, Nature 393, 409-410 (1998).
3. R. Albert, H. Jeong and A.-L. Barabási, Nature 401, 130-131 (1999).
4. M. Faloutsos, P. Faloutsos and C. Faloutsos, Proc. ACM SIGCOMM, Comput. Commun. Rev. 29, 251-263 (1999).
5. H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai and A.-L. Barabási, Nature 407, 651-654 (2000).
6. A.-L. Barabási and R. Albert, Science 286, 509-512 (1999).
7. R. Pastor-Satorras, A. Vázquez and A. Vespignani, Phys. Rev. Lett. 87, 258701 1-5 (2001).
8. M.E.J. Newman, S.H. Strogatz and D.J. Watts, Phys. Rev. E 64, 026118 1-19 (2001).
9. L.A.N. Amaral, A. Scala, M. Barthélémy and H.E. Stanley, Proc. Nat. Acad. Sci. USA 97, 11149-52 (2000).
10. R. Albert and A.-L. Barabási, Phys. Rev. Lett. 85, 5234-37 (2000).
11. M.E.J. Newman, Proc. Nat. Acad. Sci. USA 98, 404-409 (2001).
12. M.E.J. Newman, Phys. Rev. E 64, 016131 1-7 (2001).
13. A.-L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert and T. Vicsek, Physica A 311, 590-614 (2002).
14. S. Redner, Eur. Phys. J. B 4, 131-134 (1998).
15. F. Liljeros, C.R. Edling, L.A.N. Amaral, H.E. Stanley and Y. Aberg, Nature 411, 907-908 (2001).
16. M. Ifti and B. Bergersen, Eur. Phys. J. E 10(3), 241-248 (2003).
17. M. Ifti and B. Bergersen, Eur. Phys. J. B 37, 101-107 (2004).
18. Y. Togashi and K. Kaneko, Phys. Rev. Lett. 86(11), 2459-64 (2001).
19. M. Ifti and B. Bergersen, Proc. Ann. Meeting Alb-Sci. Inst., Tirana 2008 (accepted for publication).
20. M. Ifti and B. Bergersen, Proc. Int. Conf. Biol. Environ. Sci., Tirana 2008, pp. 151-156.
21. L. Frachebourg, P.L. Krapivsky and E. Ben-Naim, Phys. Rev. E 54, 6186-6200 (1996).
22. G. Szabó, M.A. Santos and J.F.F. Mendes, Phys. Rev. E 60, 3776-3781 (1999).
23. G. Szabó and C. Hauert, Phys. Rev. E 66, 062903 1-4 (2002).
24. R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47-97 (2002).
25. H.W. Hethcote, SIAM Review 42, 599-653 (2000).
26. S.N. Dorogovtsev and J.F.F. Mendes, Adv. Phys. 51(4), 1079-1187 (2002).
27. M.E.J. Newman, SIAM Review 45(2), 167-256 (2003).
28. P. Erdős and A. Rényi, Publ. Math. Debrecen 6, 290-297 (1959).
29. P. Erdős and A. Rényi, Publ. Math. Inst. Hung. Acad. Sci. 5, 17-61 (1960).
30. P. Erdős and A. Rényi, Bull. Inst. Int. Stat. 38 no. 4, 343-347 (1961).
31. R. Cohen, K. Erez, D. ben Avraham and S. Havlin, Phys. Rev. Lett. 85, 4626-4629 (2000).
32. S.N. Dorogovtsev, J.F.F. Mendes and A.N. Samukhin, Phys. Rev. E 63, 062101 1-4 (2001).
33. M. Catanzaro, M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 71, 027103 1-4 (2005).
34. R. Pastor-Satorras and A. Vespignani, Phys. Rev. Lett. 86, 3200-3203 (2001).
35. R. Pastor-Satorras and A. Vespignani, Phys. Rev. E 63, 060117 1-9 (2001).
36. R. Pastor-Satorras and A. Vespignani, Phys. Rev. E 65, 036104 1-9 (2002).
37. S.H. Strogatz, Nature 410, 268-269 (2001).
38. J. Davidsen, H. Ebel and S. Bornholdt, Phys. Rev. Lett. 88, 128701 1-4 (2002).
39. H. Ebel, J. Davidsen and S. Bornholdt, Complexity 8(2), 24-27 (2003).
40. H. Ebel, L.-I. Mielsch and S. Bornholdt, Phys. Rev. E 66, 035103(R) 1-4 (2002).
41. A. Capocci, G. Caldarelli and P. De Los Rios, arxiv:cond-mat/0206336.