Expected Capacity Guaranteed Routing based on Dynamic Link Failure Prediction

Shu Sekigawa, Satoru Okamoto, and Naoaki Yamanaka
Department of Information and Computer Science, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
Email: shu.sekigawa@yamanaka.ics.keio.ac.jp

Eiji Oki
Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Email: oki@i.kyoto-u.ac.jp

Abstract

In a high-speed backbone network, the failure of network links may cause large data losses, so it is necessary to reserve spare network resources for faster recovery. The conventional protection methods that reserve backup paths do not consider the failure probability of each network link and allocate the same amount of network resources for the backup paths regardless of the failure rate of the links. This leads to excessive or insufficient capacity allocation. This paper proposes a routing method that guarantees the expected value of the allocated capacity based on dynamically changing link failure rates. We formulate a Mixed Integer Linear Programming (MILP) model for the proposed method. We conduct simulations to investigate the effect of the expected capacity guaranteed routing over the conventional routing methods in terms of bandwidth blocking probability, transmitted capacity achieved ratio, and the ratio of total transmitted capacity to total requested capacity. The results show that our proposed method achieves a higher transmitted capacity achieved ratio than conventional routing methods. We also find that the transmitted capacity achieved ratio of the proposed method stays at about 90% under a high failure rate.

I. INTRODUCTION

Total Internet traffic has been rapidly increasing over the past two decades, and annual global IP traffic is expected to reach 3.3 ZB by 2021 [1]. To meet this demand, high-speed backbone networks whose transmission capacity exceeds 100 Gbps have been introduced [2].
However, a large amount of data may be lost even when a single network link fails, and this can seriously affect various services. To prevent such a situation, the path protection method is widely used for recovering from link failures. The path protection method prepares a backup path for each working path before configuring the network [3]. When a failure occurs, the connection is switched to the backup path to prevent the interruption of communication. Since routing and route reservation do not have to be performed again when a failure occurs, path protection shortens the time before communication resumes [4].

With the spread of high-speed networks, many devices are connected to the Internet, and Internet of Things (IoT) systems are attracting a lot of attention. Failure prediction for network systems, based on analyzing the data collected from IoT systems and network equipment, is being studied [5], [6]. The conventional protection methods do not take account of the failure probability of each link, and a backup path is prepared uniformly for all working paths. The same link capacity is reserved for backup paths no matter how low the failure probability of the link is. If the amount of network resources secured for backup paths is decided based on the link failure probability, network resources can be used more efficiently.

A routing method for data centers that is aware of link failure probability is presented in [7]. The method selects the optimal route with minimum failure probability; however, it does not ensure the performance of the allocated route. To achieve path allocation with performance assurance according to the failure probability, this paper proposes a routing method that guarantees the expected value of the allocated capacity in an environment where link failure rates change dynamically. The proposed method is called the expected capacity guaranteed routing method (ECGR).
The basic idea of ECGR was introduced in [8], where only the static failure rate scenario was considered. We formulate a Mixed Integer Linear Programming (MILP) model for ECGR. We conduct simulations to study the advantage of the expected capacity guaranteed routing over the conventional protection method in terms of bandwidth blocking probability, transmitted capacity achieved ratio, and the ratio of total transmitted capacity to total requested capacity. We evaluate two scenarios with different link lifetimes to confirm the effect of handling dynamic link failure rates.

The rest of the paper is organized as follows. Section II presents the detailed operation of the expected capacity guaranteed routing. Section III explains the MILP model for the expected capacity guaranteed routing. Section IV describes variable capacity allocation, a method to reduce the total allocated resources under a high failure rate. In Section V, we show the performance evaluation. Finally, we conclude the paper in Section VI.

II. EXPECTED CAPACITY GUARANTEED ROUTING

The basic idea of ECGR is to calculate the expected value of usable capacity on the allocated paths based on the link failure probability and to select multiple paths whose total expected allocated capacity is at least the requested capacity. The expected value of capacity is calculated as the product of the allocated capacity on the path and the path available probability, which can be obtained from the link failure rate. The path available probability is the probability that the path does not experience a failure during communication. To handle dynamic changes in the failure rate, it is necessary to calculate the probability that the path is available during the connection holding time.

We first derive the link reliability from the link failure rate. The link failure rate, $\lambda_{ij}(t)$, is defined as the probability that link $(i, j)$ fails in the next unit of time given that no failure has occurred before time $t$. Here, time $t$ is equivalent to the age of the link. The link reliability, $R_{ij}(t)$, is defined as the probability that the link experiences no failure during the time interval from 0 to $t$. The link failure rate and the link reliability are related by (1).

$$R_{ij}(t) = \exp\left\{ -\int_0^t \lambda_{ij}(x)\,dx \right\} \qquad (1)$$

Next, we calculate the link available probability from the link reliability. Assuming that the communication starts at $T$ and ends at $T + t$, the link available probability $A_{ij}(T, T + t)$ is given by (2).

$$A_{ij}(T, T + t) = P(\text{no failure before } T + t \mid \text{no failure before } T) = \frac{R_{ij}(T + t)}{R_{ij}(T)} \qquad (2)$$

Since a path can be considered as a serial system composed of links, the path available probability is the product of the available probabilities of all links on the path. Thus, the failure rate functions of the links give the probability that the path is available during communication.

We assume a connection request has a capacity requirement, an expected capacity quality, and a connection holding time. A request is represented by $r = (p, q, B_{req}, t)$, where $p$ and $q$ are the source and destination nodes, $B_{req}$ is the capacity requirement, and $t$ is the connection holding time.
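The chain from failure rate to expected capacity in (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the integral in (1) is approximated by the trapezoidal rule, and the Weibull-type failure rate with shape 5 and scale 10 is an assumed example borrowed from the simulation settings of Section V.

```python
import math


def weibull_hazard(age, shape=5.0, scale=10.0):
    """Illustrative failure rate (shape / scale**shape) * age**(shape - 1)."""
    return shape / scale**shape * age ** (shape - 1)


def reliability(hazard, t, steps=10_000):
    """R(t) = exp(-integral_0^t hazard(x) dx), via the trapezoidal rule."""
    if t <= 0:
        return 1.0
    width = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    area += sum(hazard(i * width) for i in range(1, steps))
    return math.exp(-area * width)


def link_availability(hazard, age, holding):
    """A(T, T + t) = R(T + t) / R(T) for a link of age T and holding time t."""
    return reliability(hazard, age + holding) / reliability(hazard, age)


def path_availability(link_ages, holding, hazard=weibull_hazard):
    """Product of link availabilities: the path is a serial system of links."""
    result = 1.0
    for age in link_ages:
        result *= link_availability(hazard, age, holding)
    return result


def expected_capacity(allocated, link_ages, holding, hazard=weibull_hazard):
    """Allocated capacity times the path available probability."""
    return allocated * path_availability(link_ages, holding, hazard)


if __name__ == "__main__":
    ages = [0.0, 3.0, 6.0]  # hypothetical ages of the links on one path
    print(path_availability(ages, holding=4.0))
    print(expected_capacity(12.0, ages, holding=4.0))
```

Passing the failure rate as a function keeps the sketch independent of any particular distribution; older links yield a lower availability for the same holding time when the failure rate increases with age.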
When a connection request $r = (p, q, B_{req}, t)$ is given, ECGR selects $K$ paths between $p$ and $q$ such that the total expected capacity on the paths is at least $B_{req}$. To increase the number of connections that can be accommodated in the network, ECGR selects the route with the lowest cost, where the cost is defined as the product of the link distance and the traffic capacity flowing on the link.

Figure 1 shows example solutions of the minimum cost flow problem and ECGR. Figure 1(a) shows a topology with connection request $r = (1, 4, 12, 1)$. The numbers next to each link indicate, from the left, the length, capacity, and link failure rate. Figure 1(b) shows the solution of the minimum cost flow problem (MCFP), which finds the paths that can send the requested amount of flow with the minimum cost. Figure 1(c) shows the solution by ECGR. Arrows in Figures 1(b) and 1(c) represent the links used by the paths selected by each method. The numbers next to the arrows show the capacity allocated on each path. The solution by ECGR allocates more capacity than that by MCFP on long links with lower failure probability, such as links (1, 3) and (2, 4). ECGR provides additional capacity allocation to achieve the requested expected capacity value, while the solution by MCFP reserves only the capacity requested. The MILP model for ECGR is presented in Section III.

Fig. 1. Example solutions of the minimum cost flow problem and the expected capacity guaranteed routing.

III. MILP MODEL FOR THE EXPECTED CAPACITY GUARANTEED ROUTING

We present the MILP model for the expected capacity guaranteed routing. First, we define each constant and variable used in the MILP model.

Given parameters:
$E$: Set of links.
$V$: Set of nodes.
$K$: Set of path numbers $k$.
$(i, j)$: Link between nodes $i$ and $j$, $(i, j) \in E$.
$d_{ij}$: Length of $(i, j)$.
$A_{ij}$: Available probability of $(i, j)$.
$c_{ij}$: Total capacity of $(i, j)$.
$B_{req}$: Requested capacity.
$T_{ij}$: Age of $(i, j)$.
$t$: Connection holding time.
$p$: Source node, $p \in V$.
$q$: Destination node, $q \in V$.

Variables:
$x^k_{ij}$: Boolean variable that equals 1 if the $k$th path uses $(i, j)$ and 0 otherwise.
$b_k$: Traffic capacity that can flow through the $k$th path.

Next, the objective and constraints of the MILP model are described. The objective minimizes the sum of the products of the traffic capacity and the length of each link used for each path.

Objective:

$$\text{minimize} \sum_{k \in K} \sum_{(i,j) \in E} x^k_{ij}\, d_{ij}\, b_k$$

Constraints:

Flow conservation constraint

$$\sum_{j:(i,j) \in E} x^k_{ij} - \sum_{j:(j,i) \in E} x^k_{ji} = \begin{cases} 0, & \forall k \in K,\ i \in V \setminus \{p, q\} \\ 1, & \forall k \in K,\ i = p \end{cases} \qquad (3)$$

Equation (3) ensures that the number of input links equals the number of output links at every node except the source and destination nodes, and that the number of output links of the source node is one more than its number of input links.

Link capacity constraint

$$0 \le \sum_{k \in K} x^k_{ij}\, b_k \le \min(B_{req}, c_{ij}), \quad \forall (i, j) \in E \qquad (4)$$

Equation (4) ensures that the maximum traffic capacity that can flow through $(i, j)$ is $c_{ij}$. To prevent uneven allocation of capacity among the paths, the capacity carried on a single link is also limited to $B_{req}$.

Expected capacity requirement constraint

$$\sum_{k \in K} \left\{ b_k \prod_{(i,j) \in E} A_{ij}^{x^k_{ij}} \right\} \ge B_{req} \qquad (5)$$

Equation (5) ensures that the total expected value of capacity is equal to or larger than the requested expected capacity $B_{req}$. The available probability of the $k$th path is expressed as $A_k = \prod_{(i,j) \in E} A_{ij}^{x^k_{ij}}$. $A_k$ is the product of the available probabilities of all links used by the $k$th path, because if one of the links used by the path fails, the path cannot be used for communication anymore.

It is not easy to find a solution directly from the above optimization problem, since the product $x^k_{ij} b_k$ is nonlinear. We first transform it into a linear form by using the linear transformation idea presented in [9]. New variables $y^k_{ij}$ satisfying the following constraints are introduced.

$$y^k_{ij} \ge b_k + U(x^k_{ij} - 1), \quad \forall k \in K,\ (i, j) \in E \qquad (6)$$

$$y^k_{ij} \le U x^k_{ij}, \quad \forall k \in K,\ (i, j) \in E \qquad (7)$$

$$y^k_{ij} \ge 0, \quad \forall k \in K,\ (i, j) \in E \qquad (8)$$

In (6)-(8), $U$ is a sufficiently large number satisfying $U \ge \max c_{ij}$. By (6), if $x^k_{ij} = 0$, then $y^k_{ij} \ge b_k - U$, and if $x^k_{ij} = 1$, then $y^k_{ij} \ge b_k$. Equation (7) forces $y^k_{ij} \le 0$ if $x^k_{ij} = 0$, and $y^k_{ij} \le U$ if $x^k_{ij} = 1$. Finally, (8) ensures that $y^k_{ij}$ is a nonnegative variable. Overall, if $x^k_{ij} = 0$, then $y^k_{ij} = 0$, and if $x^k_{ij} = 1$, then $b_k \le y^k_{ij} \le U$. The product $x^k_{ij} b_k$ can be replaced with $y^k_{ij}$, because $y^k_{ij}$ is minimized by the objective function and therefore equals $b_k$ in the optimal solution whenever $x^k_{ij} = 1$.
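The effect of constraints (6)-(8) can be checked with a few lines of code. The sketch below is illustrative only: `feasible_y_range` and `linearized_product` are hypothetical helper names, and the minimization of $y^k_{ij}$ by the objective is modeled simply by taking the lower end of the feasible interval.

```python
def feasible_y_range(x, b, big_u):
    """Bounds on y implied by (6)-(8) for binary x, capacity b, constant U."""
    lower = max(b + big_u * (x - 1), 0.0)  # (6) combined with (8)
    upper = big_u * x                      # (7)
    return lower, upper


def linearized_product(x, b, big_u):
    """Value of y chosen by a minimizing objective; equals x * b if b <= U."""
    lower, upper = feasible_y_range(x, b, big_u)
    if lower > upper:
        raise ValueError("infeasible: b exceeds the big constant U")
    return lower


if __name__ == "__main__":
    big_u = 100.0  # assumed to be at least the largest link capacity
    for x in (0, 1):
        for b in (0.0, 2.5, 16.0, 100.0):
            print(x, b, linearized_product(x, b, big_u))
```

The check also shows why $U$ must be at least as large as any capacity that $b_k$ can take: a larger $b_k$ with $x^k_{ij} = 1$ would make (6) and (7) contradictory.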
Next, we use $F_{ij} = 1 - A_{ij}$ to linearize (5). Here, we use Bernoulli's inequality for an approximate calculation of the path available probability. $F_{ij}$ is the probability of failure of $(i, j)$ from time $T_{ij}$ to time $T_{ij} + t$. Equations (9)-(11) show the rewriting of (5) with $F_{ij}$.

$$A_k = \prod_{(i,j) \in E} (1 - F_{ij})^{x^k_{ij}} \qquad (9)$$

$$A_k = (1 - F_{11})^{x^k_{11}} (1 - F_{12})^{x^k_{12}} (1 - F_{13})^{x^k_{13}} \cdots \qquad (10)$$

$$A_k = 1 - \sum_{(i,j) \in E} F_{ij}\, x^k_{ij} + (\text{higher order terms}) \qquad (11)$$

Assuming $F_{ij} \ll 1$, we can ignore the second and higher order terms in $F_{ij}$, so (11) can be rewritten as (12).

$$A_k \approx 1 - \sum_{(i,j) \in E} F_{ij}\, x^k_{ij} = 1 - \sum_{(i,j) \in E} (1 - A_{ij})\, x^k_{ij} \qquad (12)$$

The available probability calculated by (12) is an approximation that is smaller than the exact value. From the above, (5) can be linearized and represented as (13).

$$\sum_{k \in K} b_k - \sum_{k \in K} \sum_{(i,j) \in E} (1 - A_{ij})\, b_k\, x^k_{ij} \ge B_{req} \qquad (13)$$

IV. VARIABLE CAPACITY ALLOCATION

The higher the link failure rate is, the more capacity ECGR tries to allocate beyond the requested capacity in order to meet the expected capacity requirement. This increases the blocking probability of connection requests. We introduce variable capacity allocation (VCA) to deal with this problem. VCA allows the allocated capacity and the connection holding time to be changed while keeping the total transmitted capacity during the connection. In general, allocating higher capacity for a connection costs more. However, allocating larger capacity and reducing the connection holding time in ECGR may reduce the total allocated resources. According to (2), the available probability increases when the connection holding time is reduced. VCA shortens the connection holding time to reduce the spare capacity needed for guaranteeing the expected capacity in an environment with a high link failure rate.

Figure 2 shows an example of VCA. Given a requested capacity of 15 MB/s, a holding time of 4 s, and a path available probability of 0.6, ECGR allocates 25 MB/s on the path. This capacity is 1.7 times the requested capacity. When VCA is applied, the connection holding time is halved and the requested capacity is doubled to maintain the total transmitted capacity. As a result, ECGR allocates 34 MB/s, which is only 1.2 times the requested capacity.

Fig. 2. Example of variable capacity allocation.

V. SIMULATION AND DISCUSSION

We evaluate the performance of ECGR. We measure the bandwidth blocking probability (BBP), the transmitted capacity achieved ratio (TCAR), and the ratio of total transmitted capacity to total requested capacity. BBP is defined as the total requested capacity of rejected connections divided by the total requested capacity of all connections. A lower BBP means that more connections can be accepted into the network. TCAR is defined as the number of connections whose total transmitted capacity along the allocated paths is no less than the product of the requested capacity and the connection holding time, divided by the number of accepted connections. TCAR expresses the proportion of accepted connections that are successfully transmitted. To confirm the effect of guaranteeing the expected capacity, we show the ratio of total transmitted capacity to total requested capacity. Unlike TCAR, this ratio takes into account the transmitted capacity of the connections that fail to meet the capacity requirement.

We compare ECGR with two conventional routing methods. One is routing by MCFP, which represents the conventional method without backup paths. The other is link-disjoint two-path routing, which represents the conventional protection method.

Fig. 3. NSFNET topology used in the simulation.

Fig. 4. Bandwidth blocking probability performance of expected capacity guaranteed routing, minimum cost flow routing, and conventional protection method.

We use NSFNET with 14 nodes and 21 links as the topology, as shown in Figure 3. The number in each node indicates the node number, and the number beside each link shows the length of the link. The arrival of connections follows a Poisson distribution, and the connection holding time is geometrically distributed with a mean of 4 seconds.
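The three metrics defined in this section can be sketched as follows. The `Connection` record and the function names are hypothetical and not taken from the authors' simulator. The sketch assumes that a connection counts as achieved when its transmitted volume is at least the requested capacity multiplied by the holding time, and that the ratio of transmitted to requested capacity is taken over accepted connections only.

```python
from typing import NamedTuple, Sequence


class Connection(NamedTuple):
    requested: float    # requested capacity B_req
    holding: float      # connection holding time
    accepted: bool      # False if the request was blocked
    transmitted: float  # total volume delivered over the allocated paths


def bandwidth_blocking_probability(conns: Sequence[Connection]) -> float:
    """Requested capacity of rejected connections over that of all connections."""
    total = sum(c.requested for c in conns)
    rejected = sum(c.requested for c in conns if not c.accepted)
    return rejected / total


def transmitted_capacity_achieved_ratio(conns: Sequence[Connection]) -> float:
    """Share of accepted connections that delivered the requested volume."""
    accepted = [c for c in conns if c.accepted]
    achieved = sum(
        1 for c in accepted if c.transmitted >= c.requested * c.holding
    )
    return achieved / len(accepted)


def transmitted_to_requested_ratio(conns: Sequence[Connection]) -> float:
    """Total delivered volume over total requested volume (accepted only)."""
    accepted = [c for c in conns if c.accepted]
    transmitted = sum(c.transmitted for c in accepted)
    requested = sum(c.requested * c.holding for c in accepted)
    return transmitted / requested
```

All three functions assume a non-empty input with at least one accepted connection; a production version would need to handle the empty cases explicitly.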
We conduct simulations for different numbers of demands. In each simulation, 5,000 demands are generated randomly, and the source and destination nodes for each demand are randomly selected. The requested capacity $B_{req}$ is randomly chosen from 2, 4, 8, and 16. The maximum number of paths is set to 3. The initial capacity of each link is set to 100, and the available link capacity decreases as requests are accepted into the network. The failure rate of each link changes dynamically according to the Weibull distribution [10], which is commonly used for modeling system failure rates. The failure rate function $\lambda(t)$ of the Weibull distribution is given by

$$\lambda(t) = \frac{m}{\eta^m}\, t^{m-1}. \qquad (14)$$

The shape parameter $m$ is set to 5 to simulate wear-out failures. The scale parameter $\eta$, which indicates the average lifetime, is set to 10 and 20. The recovery time of links is geometrically distributed with a mean of 5 seconds and, after recovery, the age of the link is reset to 0.

A. Comparison in bandwidth blocking probability

Figure 4 shows the BBP performance of the expected capacity guaranteed routing, minimum cost flow routing, and the conventional protection method for different numbers of arriving connections and the scale parameter $\eta = 10, 20$. In both the $\eta = 10$ and $\eta = 20$ cases, the BBP of ECGR without VCA is the highest and that of the minimum cost flow routing is the lowest. This is because, in minimum cost flow routing, a connection is accepted as long as the available capacity is equal to or larger than the requested capacity, whereas, in ECGR, blocking occurs when there is not sufficient capacity to satisfy the requested expected capacity. Comparing the BBPs for $\eta = 10$ and $\eta = 20$, the difference in the BBP of ECGR is larger than that of MCFP and the protection method. This means that the BBP of ECGR increases as the average failure rate becomes high. By applying VCA, the BBP of ECGR is reduced by 27% on average at $\eta = 10$ and by 21% at $\eta = 20$.

B. Comparison in transmitted capacity achieved ratio

Figure 5 shows the TCAR of the expected capacity guaranteed routing, minimum cost flow routing, and the conventional protection method for different numbers of arriving connections and the scale parameter $\eta = 10, 20$. ECGR achieves a higher TCAR than MCFP and the protection method, and keeps about 90% of TCAR regardless of the average failure rate, with or without VCA. In contrast, the TCARs of both MCFP and the protection method fall sharply when the average failure rate increases. The results indicate that, by ensuring the expected value of the allocated capacity, most of the accepted connections are transmitted successfully without being affected by link failures.

Fig. 5. Transmitted capacity achieved ratio performance of expected capacity guaranteed routing, minimum cost flow routing, and conventional protection method.

C. Comparison in ratio of total transmitted capacity to total requested capacity

Figure 6 shows the ratio of total transmitted capacity to total requested capacity of the expected capacity guaranteed routing, minimum cost flow routing, and the conventional protection method for different numbers of arriving connections and the scale parameter $\eta = 10$. The ratio of total transmitted capacity to total requested capacity of ECGR is higher than 1. This means that ECGR allocates sufficient capacity to guarantee the expected capacity and that the route selected by ECGR can transmit the requested capacity on average. The ratio of total transmitted capacity to total requested capacity of the conventional methods is below 1. This means that those methods cannot secure sufficient capacity to meet the requirements. We observe a 20% difference between the ratios of total transmitted capacity to total requested capacity of ECGR with and without VCA. This difference is considered to be caused by the error of the approximate calculation in (9)-(11).

Fig. 6. Ratio of total transmitted capacity to total requested capacity of expected capacity guaranteed routing, minimum cost flow routing, and conventional protection method.

VI. CONCLUSION

The conventional path protection method does not consider the failure probability of network links. This leads to excessive or insufficient capacity allocation. Such inappropriate capacity allocation reduces the number of connections that can be accepted into the network and lowers the performance quality. This paper proposed a routing method that guarantees the expected value of the allocated capacity based on dynamically changing link failure rates. We developed the MILP model for ECGR and introduced the VCA method to improve the bandwidth blocking probability. The simulation results showed that ECGR achieves a higher TCAR than conventional methods and keeps about 90% of TCAR regardless of how high the average failure rate is. The results also showed that the BBP of ECGR becomes high under a high failure rate. We confirmed that the VCA method reduces the BBP of ECGR by approximately 20-30% and that the BBP of ECGR with VCA at $\eta = 20$ is lower than that of the protection method when the average number of arriving connections is more than 25. We measured the ratio of total transmitted capacity to total requested capacity and confirmed that ECGR allocates sufficient capacity to transmit the requested capacity.

ACKNOWLEDGMENT

This work is partly supported by the "R&D of innovative optical network technologies for supporting new social infrastructure" project funded by the Ministry of Internal Affairs and Communications, Japan.

REFERENCES

[1] Cisco, "The zettabyte era: Trends and analysis," (Date last accessed 9-Aug-2018). [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/serviceprovider/visual-networking-index-vni/vni-hyperconnectivitywp.html# Toc484556818
[2] K. A. Tse, "AT&T's photonic backbone design options," in 2010 Conference on Optical Fiber Communication (OFC/NFOEC), collocated National Fiber Optic Engineers Conference, March 2010, pp. 1-3.
[3] S. Ramamurthy, L. Sahasrabuddhe, and B. Mukherjee, "Survivable WDM mesh networks," Journal of Lightwave Technology, vol. 21, no. 4, pp. 870-883, April 2003.
[4] S. S. Lumetta, M. Medard, and Y.-C. Tseng, "Capacity versus robustness: a tradeoff for link restoration in mesh networks," Journal of Lightwave Technology, vol. 18, no. 12, pp. 1765-1775, December 2000.
[5] J. Zhong, W. Guo, and Z. Wang, "Study on network failure prediction based on alarm logs," in 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC), March 2016, pp. 1-7.
[6] K. Ishibashi, T. Hayashi, and K. Shiomoto, "Advanced network design and operation by machine learning and data analysis," (written in Japanese) NTT Technical Journal, vol. 27, no. 12, pp. 29-33, December 2015.
[7] Y. Zhang, Y. Shi, C. Li, J. Xiao, B. Wu, H. Wen, and X. Jiang, "Cloud service routing in WDM networks with awareness on delay and link failure probability," in 2014 13th International Conference on Optical Communications and Networks (ICOCN), Nov 2014, pp. 1-4.
[8] S. Sekigawa, E. Oki, T. Sato, S. Okamoto, and N. Yamanaka, "Expected capacity guaranteed routing method based on failure probability of links," in 2017 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN), June 2017, pp. 1-2.
[9] M. Johnston, H. W. Lee, and E. Modiano, "A robust optimization approach to backup network design with random failures," IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1216-1228, August 2015.
[10] G. S. Mudholkar and D. K. Srivastava, "Exponentiated Weibull family for analyzing bathtub failure-rate data," IEEE Transactions on Reliability, vol. 42, no. 2, pp. 299-302, June 1993.