SoS Dependability Assessment: Modelling and Measurement


DSoS
IST Dependable Systems of Systems
SoS Dependability Assessment: Modelling and Measurement
Report Version: Deliverable CSDA3
Report Preparation Date: October 2002
Classification: Public Circulation
Contract Start Date: 1 April 2000
Duration: 36m
Project Co-ordinator: Newcastle University
Partners: DERA, Malvern, UK; INRIA, France; CNRS-LAAS, France; TU Wien, Austria; Universität Ulm, Germany; LRI Paris-Sud, France
Project funded by the European Community under the Information Society Technology Programme

LAAS-CNRS Report No.

Table of Contents
1. Introduction
2. SoS Dependability Modelling: The Travel Agency Example
   2.1 The Travel Agency (TA) Presentation
      Function and User Levels
      Service and Function Levels
      Resource Level
   2.2 TA Availability Modelling
      Service Level Availability
         External services
         Internal services
      Function Level Availability
      User Level Availability
   2.3 Evaluation Results
   Summary
Measurement-based Evaluation
   Target system architecture
   Data collection and processing approach
      Event logging in Unix
      Event logging in Windows NT and 2K
      Data collection strategy
      Data processing
   Application to Unix Systems
      Identification of reboots
      Distribution of reboots per machine
      Machine uptimes and downtimes evaluation
      Availability evaluation
      Dependencies among machines
         Dependencies related to reboot events
         Dependencies related to SNR events
   Application to Windows NT and 2K Systems
      Identification of reboots
      Distribution of reboots per machine
      Reboot causes analysis
      Uptime and downtime evaluation
      Availability evaluation
   Summary
Conclusion
References


SoS Dependability Assessment: Modelling and Measurement
Mohamed Kaâniche, Karama Kanoun, Magnos Martinello, Cristina Simache
LAAS-CNRS (Toulouse, France)

1. Introduction
This report summarizes the work carried out within the VA Workpackage on SoS dependability modelling and assessment. Two complementary approaches are considered to support SoS dependability evaluation: (i) analytical modelling and (ii) measurement-based assessment. Modelling is useful to guide the development of the target SoS during the design phase, by providing quantitative measures characterizing its dependability at the successive design stages. Various design alternatives can be analysed and assessed in order to choose the solution that best satisfies the requirements. Our hierarchical modelling approach, proposed in deliverable DMS1 [Kaâniche et al. 2001], has been defined to allow easy construction and refinement of SoS dependability models during the early design stages. Measurement experiments are needed to provide estimates for the parameters used in the models, and to demonstrate the validity of the modelling assumptions and evaluations.
This report is structured into two main parts. Part 1 concerns SoS dependability modelling, and Part 2 addresses the estimation of dependability-related measures and parameters characterizing component systems, based on data collected during operation.
Part 1 illustrates the main concepts of our hierarchical modelling framework proposed in deliverable DMS1 [Kaâniche et al. 2001] for the dependability evaluation of systems of systems, using as an example the travel agency (TA) case study described in deliverable DMS3 [Periorellis & Dobson 2001].
In particular, the objectives are: 1) to show how to apply our framework, based on the decomposition of the target SoS into four levels (user, function, service and resource levels), and 2) to present typical dependability analysis and evaluation results obtained from modelling, to help SoS providers make objective design decisions. Several sensitivity analysis results are presented to illustrate the impact of various assumptions, concerning e.g. the users' operational profile, the TA architecture and the fault coverage, on the user-perceived availability. The availability measure considered takes into account the combined impact of performance-related failures and traditional software and hardware failures.
The application of this framework requires the estimation of several model parameters using data collected from the field. This issue is covered in Part 2. Ideally, data would be collected from an operational TA SoS whose architecture is known in detail. However, such an SoS is not available in the context of DSoS. To illustrate the type of measurement-based studies that can be carried out, the LAAS computing network is used as an example. In particular, the results presented in Part 2 are based on event logs collected during a three-year observation period from 373 SunOS/Solaris Unix machines, 76 Windows NT and 89 Windows 2K systems interconnected through the LAAS network. Identifying useful trends in large event logs is a time-consuming task that requires thorough manual analysis. In our study, we have focused on the identification of machine reboots and on the evaluation of statistical measures characterizing: a) the reboot distribution and occurrence rate per machine, b) the distribution of the uptimes and downtimes associated with these reboots, and the corresponding availability, c) the classification of reboot causes, and d) the analysis of error dependencies among machines.

2. SoS Dependability Modelling: The Travel Agency Example
The aim of this part is i) to illustrate the main concepts of the SoS hierarchical dependability modelling framework proposed in deliverable DMS1, and ii) to show its applicability, using as an example the travel agency case study presented in deliverable DMS3. Dependability evaluation is performed in two main steps:
1) Hierarchical description of the SoS and its interactions with the users, from the functional and structural points of view; this step consists of identifying and structuring the main information needed to support SoS dependability modelling.
2) Hierarchical construction and solution of the SoS dependability model, based on the information provided at step 1).
The information needed to describe the SoS behaviour from the user perspective is structured into four levels. The first level (user level) describes how the users interact with the SoS, and the three remaining levels (function, service and resource levels) detail how user requests are implemented by the SoS. More specifically, the proposed levels are defined as follows:
The user level describes the user operational profile in terms of the types of SoS functions invoked and the probability of activation of each of them.
The function level describes the set of functions available at the SoS provider site.
The service level describes the main services needed to implement each function and the interactions among them. Two categories of services are distinguished: those provided by the SoS provider (internal services) and those provided by external suppliers (external services).
The resource level describes the architecture on which the services identified at the service level are implemented. At this level, the architecture and the fault tolerance and maintenance strategies implemented at the SoS provider site are detailed.
However, each service provided by an external supplier is represented by a single resource that is considered as a black box. This is illustrated in Figure 2.1, where the dependability measure considered is availability. This figure shows that the SoS availability modelling and evaluation step is directly related to the SoS hierarchical description. It has been defined in such a way that the outputs of a given level are used in the immediately higher level to compute the availability measures associated with that level (denoted by A(x), where x is a user, a function, a service or a resource). Accordingly, at the service level, the availability of each service is derived from the availability of the resources involved in the accomplishment of this service. Similarly, at the function level, the availability of each function is obtained from the availability of the services implementing it. Finally, at the user level, the availability measures are obtained from the availability measures of the functions invoked by the users. Various techniques can be used to model each level of the hierarchy: fault trees, reliability block diagrams, Markov chains, stochastic Petri nets, etc. The choice of technique for each level mainly depends on the kinds of dependencies between the elements of the considered level and on the quantitative measures to be evaluated. In Section 2.2, we mainly use block diagrams and Markov chains to evaluate the availability of the travel agency. It is noteworthy that, although the dependability measure considered in Figure 2.1 is availability, other quantitative measures can be evaluated following the same approach, e.g., reliability and performability-related measures. This is illustrated in particular by the travel agency example, where availability measures taking into account performance-related failures are considered (see Section 2.2.1).

[Figure 2-1. SoS hierarchical availability modelling. Left, SoS description: user level (operational profiles of User 1, ..., User N, from Start through functions F1, ..., Fn to Exit), function level (F1, ..., Fn), service level (internal services Si1, ..., Sim of the SoS provider; external services Se1, ..., Sep of external suppliers) and resource level (internal resources Ri1, ..., Rim; external resources Re1, ..., Rep). Right, SoS availability modelling: A(Ri) and A(Re) feed the service level, A(Si) and A(Se) feed the function level, and A(F1), ..., A(Fn) feed the user level to give A(user 1), ..., A(user N).]

The above presentation shows that we need to structure the information about the target system to characterize each level of the hierarchical model (user, function, service and resource levels). The rest of this part is organised as follows. Section 2.1 presents the travel agency according to the above hierarchical description. Section 2.2 concentrates on modelling the availability of the travel agency. Section 2.3 gives some examples of dependability evaluation results.
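As a minimal sketch of the bottom-up flow between the two lowest levels, the Python fragment below assumes that each resource is a two-state repairable component described by hypothetical MTTF/MTTR values, and that a service is available only when all of its resources are available (independent failures, no redundancy). The `Resource` class, the numbers and the series assumption are illustrative and are not taken from the report.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-state (up/down) repairable resource."""
    name: str
    mttf_h: float  # mean time to failure, in hours (illustrative value)
    mttr_h: float  # mean time to repair, in hours (illustrative value)

    def availability(self) -> float:
        # Steady-state availability: long-run fraction of time the resource is up.
        return self.mttf_h / (self.mttf_h + self.mttr_h)


def service_availability(resources: list[Resource]) -> float:
    """Series composition: the service is up only if all its resources are up.

    Assumes statistically independent resources and no redundancy.
    """
    return prod(r.availability() for r in resources)


# Resource level -> service level (hypothetical web service on one host + LAN).
web_host = Resource("web host", mttf_h=2000.0, mttr_h=4.0)
lan = Resource("LAN", mttf_h=10000.0, mttr_h=2.0)
a_web_service = service_availability([web_host, lan])
print(f"A(web host)    = {web_host.availability():.5f}")
print(f"A(web service) = {a_web_service:.5f}")
```

The service-level values obtained this way would then feed the function level, and so on up to the user level. When resources are dependent (e.g., shared repair facilities), a Markov chain replaces the simple product.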

2.1 The Travel Agency (TA) Presentation
The TA is designed to allow users to plan and book trips over the web. To this end, the TA interacts through dedicated linking interfaces (LIFs) with several flight reservation, hotel booking and car rental component systems. The TA described in CS1 [Periorellis 2001, Periorellis & Dobson 2001] is composed of two basic components: the travel agent front end, client side, denoted as TAFE-CS, and the travel agent front end, server side, denoted as TAFE-SS (Figure 2.2). The TAFE-CS handles the user's inputs, performs the necessary checks and forwards the data to the TAFE-SS by calling the Abstract Service Interface. The TAFE-SS is the main component of the TA SoS. It is designed to respond to a number of calls from the TAFE-CS concerning, for instance, availability checking, booking, payment and cancellation of each item of a trip. The TAFE-SS handles all transactions to and from the booking systems, composes items into full trips, converts incoming data into a common data structure supported throughout the SoS and, finally, handles all exceptions.

[Figure 2-2. The TA high-level structuring: users send trip details to the TAFE-CS, which calls the TAFE-SS through the Abstract Service Interface; the TAFE-SS reaches the flight reservation, hotel booking and car rental component systems through the Flight, Hotel and Car LIFs.]

Starting from this very high-level description, we further detail the system description according to the various aspects required for the hierarchical description. We first focus on the function and user levels together, then on the service and function levels, before addressing the resource level.

Function and User Levels
To fulfil its three main purposes (flight, hotel and car reservation), the TA supplies the users with various functions. The successive execution of these functions allows the users to obtain the actions or information sought from the TA.
We have identified seven such possible functions, defined as follows:
Start: this state identifies the start of a customer visit to the TA web site.
Home: this state is reached when a customer accesses the TA home page.
Browse: in this state, the customer navigates through the links available at the TA web site to view any of the pages of the site. These links include, for example, the weekly promotions, help pages, frequent queries, etc.
Search: here, the TA checks the availability of trip offers corresponding to the criteria specified by the customer. A user request can be composed of a flight, a hotel and a car reservation. Based on the information provided by the user, the TA converts the user request into transactions to several hotel, flight and car reservation component systems and returns the results of the search to the user.
Book: the customer chooses the trip that suits their request and confirms the reservation.
Pay: this state is reached when the customer is ready to pay the reservation fees for the trips booked on the TA site.
Exit: end of the customer visit to the TA site.

Operational profile
To characterize the behaviour of the users accessing the TA web site, we consider the operational profile example presented in Figure 2.3, where the nodes represent the various functions identified above. The transitions among the nodes and the associated probabilities p_ij describe how the users interact with the TA web site. A given class of users is defined by a specific set of probabilities p_ij. These probabilities are usually obtained by collecting data on the web site (see e.g., [Menascé et al. 2000]).

[Figure 2-3. User operational profile graph: nodes Start (1), Home (2), Browse (3), Search (4), Book (5), Pay (6) and Exit (7), with transitions labelled p_12, p_13, p_23, p_24, p_27, p_32, p_33, p_34, p_37, p_44, p_45, p_47, p_54, p_56, p_57 and p_67.]

User execution scenarios
Let us first assume that the various probabilities p_ij are specified. Each path from node Start to node Exit of the operational profile denotes a user execution scenario (or, for short, user scenario) when visiting the TA web site. The probability of activation of each path denotes the relative frequency of the corresponding user scenario compared with the other scenarios of the same class.
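The probability of one concrete path is the product of the transition probabilities along it; a scenario containing a cycle gathers infinitely many paths, and summing them yields geometric-series factors such as 1/(1 - p_33). The sketch below shows this for the Start-Browse-Exit scenario, where Browse may loop on itself. The p_ij values are hypothetical (not the class A or B profiles), and nodes are numbered as implied by the p_ij subscripts (1 Start, 2 Home, 3 Browse, 7 Exit).

```python
from math import prod

# Hypothetical transition probabilities, (i, j) -> p_ij.
p = {(1, 2): 0.5, (1, 3): 0.5, (2, 7): 0.3, (3, 3): 0.3, (3, 7): 0.2}


def path_probability(path: list[int]) -> float:
    """Probability of one concrete Start-to-Exit path: product of its p_ij."""
    return prod(p[(i, j)] for i, j in zip(path, path[1:]))


def browse_exit_truncated(max_loops: int) -> float:
    """Sum over the paths Start-Browse-(Browse)^k-Exit for k = 0..max_loops."""
    return sum(path_probability([1] + [3] * (k + 1) + [7]) for k in range(max_loops + 1))


# Geometric series over the Browse self-loop: p13 * p37 * sum_k p33^k.
closed_form = p[(1, 3)] * p[(3, 7)] / (1 - p[(3, 3)])

print(path_probability([1, 2, 7]))  # Start-Home-Exit
print(browse_exit_truncated(50), closed_form)
```

The truncated sum approaches the closed form quickly because each extra loop multiplies the path probability by p_33.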

Table 2.1 lists all the user scenarios derived from the example of Figure 2.3, together with their probabilities of activation as obtained from processing the user profile graph. The parameters p_ij are the probabilities associated with the transitions of the user operational profile. The notations {Home-Browse}* and {Search-Book}* mean that these functions may be activated several times in the corresponding scenarios, due to the presence of cycles in the graph¹.

Table 2-1. TA user execution scenarios and associated probabilities (π_i)
1: Start-Home-Exit
   $\pi_1 = p_{12}\,p_{27}$
2: Start-Browse-Exit
   $\pi_2 = \dfrac{p_{13}\,p_{37}}{1-p_{33}}$
3: Start-{Home-Browse}*-Exit
   $\pi_3 = \dfrac{p_{12}p_{23}p_{37} + p_{13}p_{32}p_{27} + p_{12}p_{23}p_{32}p_{27} + \frac{p_{13}p_{32}p_{23}p_{37}}{1-p_{33}}}{1-p_{33}-p_{32}p_{23}}$
4: Start-Home-Search-Exit
   $\pi_4 = \dfrac{p_{12}\,p_{24}\,p_{47}}{1-p_{44}}$
5: Start-Browse-Search-Exit
   $\pi_5 = \dfrac{p_{13}\,p_{34}\,p_{47}}{(1-p_{33})(1-p_{44})}$
6: Start-{Home-Browse}*-Search-Exit
   $\pi_6 = \dfrac{p_{12}p_{23}p_{34} + p_{13}p_{32}p_{24} + p_{12}p_{23}p_{32}p_{24} + \frac{p_{13}p_{32}p_{23}p_{34}}{1-p_{33}}}{1-p_{33}-p_{32}p_{23}} \cdot \dfrac{p_{47}}{1-p_{44}}$
7: Start-Home-{Search-Book}*-Exit
   $\pi_7 = p_{12}\,p_{24} \cdot \dfrac{p_{45}p_{57} + \frac{p_{45}p_{54}p_{47}}{1-p_{44}}}{1-p_{44}-p_{45}p_{54}}$
8: Start-Browse-{Search-Book}*-Exit
   $\pi_8 = \dfrac{p_{13}\,p_{34}}{1-p_{33}} \cdot \dfrac{p_{45}p_{57} + \frac{p_{45}p_{54}p_{47}}{1-p_{44}}}{1-p_{44}-p_{45}p_{54}}$
9: Start-{Home-Browse}*-{Search-Book}*-Exit
   $\pi_9 = \dfrac{p_{12}p_{23}p_{34} + p_{13}p_{32}p_{24} + p_{12}p_{23}p_{32}p_{24} + \frac{p_{13}p_{32}p_{23}p_{34}}{1-p_{33}}}{1-p_{33}-p_{32}p_{23}} \cdot \dfrac{p_{45}p_{57} + \frac{p_{45}p_{54}p_{47}}{1-p_{44}}}{1-p_{44}-p_{45}p_{54}}$
10: Start-Home-{Search-Book}*-Pay-Exit
   $\pi_{10} = \dfrac{p_{12}\,p_{24}\,p_{45}\,p_{56}\,p_{67}}{1-p_{44}-p_{45}p_{54}}$
11: Start-Browse-{Search-Book}*-Pay-Exit
   $\pi_{11} = \dfrac{p_{13}\,p_{34}\,p_{45}\,p_{56}\,p_{67}}{(1-p_{33})(1-p_{44}-p_{45}p_{54})}$
12: Start-{Home-Browse}*-{Search-Book}*-Pay-Exit
   $\pi_{12} = \dfrac{p_{12}p_{23}p_{34} + p_{13}p_{32}p_{24} + p_{12}p_{23}p_{32}p_{24} + \frac{p_{13}p_{32}p_{23}p_{34}}{1-p_{33}}}{1-p_{33}-p_{32}p_{23}} \cdot \dfrac{p_{45}\,p_{56}\,p_{67}}{1-p_{44}-p_{45}p_{54}}$

¹ Traditional techniques for computing path probabilities are presented, e.g., in [Kemeny & Snell 1959] J. G. Kemeny and J. L. Snell, Finite Markov Chains, Princeton, NJ: Van Nostrand, 1959, and [Howard 1971] R. A. Howard, Dynamic Probabilistic Systems, Volume I: Markov Models, 576 p., John Wiley & Sons, Inc., New York, 1971.
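A cheap consistency check on the Table 2-1 expressions is to evaluate them for a row-stochastic profile: since every visit ends at Exit, the twelve scenario probabilities must sum to one. The sketch below does this in Python for hypothetical p_ij values (chosen for illustration only, not the class A or B profiles), with nodes numbered as implied by the p_ij subscripts.

```python
from math import isclose

# Hypothetical profile, (i, j) -> p_ij: 1 Start, 2 Home, 3 Browse,
# 4 Search, 5 Book, 6 Pay, 7 Exit. Each row sums to 1.
P = {
    (1, 2): 0.5, (1, 3): 0.5,
    (2, 3): 0.3, (2, 4): 0.4, (2, 7): 0.3,
    (3, 2): 0.1, (3, 3): 0.3, (3, 4): 0.4, (3, 7): 0.2,
    (4, 4): 0.2, (4, 5): 0.3, (4, 7): 0.5,
    (5, 4): 0.2, (5, 6): 0.4, (5, 7): 0.4,
    (6, 7): 1.0,
}


def p(i: int, j: int) -> float:
    return P.get((i, j), 0.0)


def scenario_probabilities() -> list[float]:
    """pi_1..pi_12 from the closed forms of Table 2-1."""
    b_loop = 1 - p(3, 3)                    # Browse self-loop
    s_loop = 1 - p(4, 4)                    # Search self-loop
    d_hb = 1 - p(3, 3) - p(3, 2) * p(2, 3)  # Home/Browse cycle
    d_sb = 1 - p(4, 4) - p(4, 5) * p(5, 4)  # Search/Book cycle

    def mixed(out_home: float, out_browse: float) -> float:
        # Start-{Home-Browse}* with both visited, leaving Home with
        # probability out_home or Browse with probability out_browse.
        return (p(1, 2) * p(2, 3) * out_browse + p(1, 3) * p(3, 2) * out_home
                + p(1, 2) * p(2, 3) * p(3, 2) * out_home
                + p(1, 3) * p(3, 2) * p(2, 3) * out_browse / b_loop) / d_hb

    # Reaching Search via Home only, Browse only, or both.
    reach_search = [p(1, 2) * p(2, 4), p(1, 3) * p(3, 4) / b_loop,
                    mixed(p(2, 4), p(3, 4))]
    # After Search: exit directly, Book without Pay, or Pay.
    after = [
        p(4, 7) / s_loop,
        (p(4, 5) * p(5, 7) + p(4, 5) * p(5, 4) * p(4, 7) / s_loop) / d_sb,
        p(4, 5) * p(5, 6) * p(6, 7) / d_sb,
    ]
    sc1 = [p(1, 2) * p(2, 7), p(1, 3) * p(3, 7) / b_loop, mixed(p(2, 7), p(3, 7))]
    # Scenarios 4..12: for each outcome, Home / Browse / mixed prefixes.
    return sc1 + [r * a for a in after for r in reach_search]


pi = scenario_probabilities()
print([round(x, 4) for x in pi], sum(pi))
```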

Sensitivity analyses based on the equations presented in Table 2.1 allow us to understand how the parameters p_ij affect the probabilities associated with each path. Such analyses are useful to identify the most significant scenarios to be considered when evaluating the SoS availability as perceived by the users. Indeed, the higher the probability of activation of a given scenario, the higher its impact on the availability perceived at the user level. This availability measure is affected by the availability of the functions, services and resources involved in the corresponding user scenario.
The scenarios listed in Table 2.1 can be grouped into four categories, denoted as SC1, SC2, SC3 and SC4, according to the activated functions:
SC1 gathers all scenarios that lead to the execution of functions Home or Browse without invoking the other functions (i.e., scenarios 1-3).
SC2 gathers all scenarios that include the invocation of the Search function without going through the Book or Pay functions (i.e., scenarios 4-6). These scenarios may require several interactions between the TA and the flight, hotel and car reservation component systems; however, they do not end up with a booking or a payment.
SC3 gathers all scenarios that include the invocation of the Book function but not of the Pay function (i.e., scenarios 7-9). These scenarios involve several interactions between the TA and the booking systems.
SC4 gathers all scenarios that reach the Pay function (i.e., scenarios 10-12). These scenarios end up with a payment.
Let us denote by π(SC1), π(SC2), π(SC3) and π(SC4) the activation probabilities of SC1, SC2, SC3 and SC4. These probabilities can be obtained from Table 2.1 by summing the probabilities associated with the corresponding scenarios.

Example of two user classes
For our example, we define two customer profiles (denoted as user class A and user class B), with different values for the transition probabilities p_ij. In particular, the class A profile is characterised by a high proportion of users who are mainly seeking information without an intention to buy, whereas the class B profile is characterised by a higher proportion of users actually seeking to book a trip. Tables 2.2 and 2.3 give the transition probability matrices associated with the class A and class B profiles, respectively. The associated scenario probabilities are given in Table 2.4 (as percentages).
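Because Book is only reachable through Search and Pay only through Book, the category probabilities can also be obtained without enumerating scenarios: π(SC4) is the probability of ever reaching Pay, π(SC3) that of reaching Book minus π(SC4), and so on. The Python sketch below computes these hitting probabilities of the absorbing chain by fixed-point iteration. The profile values are hypothetical (the class A and B matrices of Tables 2.2 and 2.3 are not reproduced here), and nodes are numbered as implied by the p_ij subscripts.

```python
# Hypothetical profile (NOT the class A/B values of Tables 2.2 and 2.3).
# Node numbers: 1 Start, 2 Home, 3 Browse, 4 Search, 5 Book, 6 Pay, 7 Exit.
PROFILE = {
    (1, 2): 0.5, (1, 3): 0.5,
    (2, 3): 0.3, (2, 4): 0.4, (2, 7): 0.3,
    (3, 2): 0.1, (3, 3): 0.3, (3, 4): 0.4, (3, 7): 0.2,
    (4, 4): 0.2, (4, 5): 0.3, (4, 7): 0.5,
    (5, 4): 0.2, (5, 6): 0.4, (5, 7): 0.4,
    (6, 7): 1.0,
}


def reach_probability(profile, start, target, n_iter=2000):
    """Probability of visiting `target` at least once before Exit, from `start`.

    Iterates h(s) = sum_j p_sj * h(j) with h(target) = 1 (Exit has no
    outgoing transitions, so h(Exit) = 0), starting from h = 0; this
    converges to the hitting probabilities of the absorbing chain.
    """
    states = {i for i, _ in profile} | {j for _, j in profile}
    h = {s: 0.0 for s in states}
    for _ in range(n_iter):
        h = {
            s: 1.0 if s == target
            else sum(prob * h[j] for (i, j), prob in profile.items() if i == s)
            for s in states
        }
    return h[start]


def category_probabilities(profile):
    """(pi(SC1), pi(SC2), pi(SC3), pi(SC4)) for a given user profile."""
    to_search = reach_probability(profile, 1, 4)
    to_book = reach_probability(profile, 1, 5)  # Book only reachable via Search
    to_pay = reach_probability(profile, 1, 6)   # Pay only reachable via Book
    return (1 - to_search, to_search - to_book, to_book - to_pay, to_pay)


print([round(x, 4) for x in category_probabilities(PROFILE)])
```

For real data, the same function would simply be fed the measured p_ij of each user class.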

Table 2-2. User class A profile (transition probabilities p_ij; rows and columns: Start, Home, Browse, Search, Book, Pay, Exit)

Table 2-3. User class B profile (transition probabilities p_ij; rows and columns: Start, Home, Browse, Search, Book, Pay, Exit)

Table 2-4. User scenario probabilities (in %) for user classes A and B
User scenario | π_i, Class A | π_i, Class B
1: Start-Home-Exit
2: Start-Browse-Exit
3: Start-{Home-Browse}*-Exit
4: Start-Home-Search-Exit
5: Start-Browse-Search-Exit
6: Start-{Home-Browse}*-Search-Exit
7: Start-Home-{Search-Book}*-Exit
8: Start-Browse-{Search-Book}*-Exit
9: Start-{Home-Browse}*-{Search-Book}*-Exit
10: Start-Home-{Search-Book}*-Pay-Exit
11: Start-Browse-{Search-Book}*-Pay-Exit
12: Start-{Home-Browse}*-{Search-Book}*-Pay-Exit

Table 2.5 gives the probabilities π(SC1), π(SC2), π(SC3) and π(SC4) associated with the scenario categories SC1 to SC4, i.e., scenarios involving functions up to Home/Browse, Search, Book and Pay, respectively. It can be seen that user class B exhibits a higher probability of activation for categories SC2, SC3 and SC4 than user class A. In particular, about 80% (79.2%) of class B user transactions involve the external reservation systems in addition to the TA SoS, whereas this percentage is only about 50% (52.1%) for the class A profile.
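The percentages quoted here follow directly from the Table 2-5 values: every scenario of SC2, SC3 or SC4 invokes Search, so the share of transactions involving the external reservation systems is π(SC2) + π(SC3) + π(SC4). A short Python check using only the published percentages (the dictionary layout and function name are illustrative):

```python
# Percentages from Table 2-5: (pi(SC1), pi(SC2), pi(SC3), pi(SC4)).
TABLE_2_5 = {
    "A": (47.9, 38.2, 6.4, 7.5),
    "B": (20.8, 44.0, 14.9, 20.3),
}


def summarize(row: tuple[float, float, float, float]) -> dict[str, float]:
    sc1, sc2, sc3, sc4 = row
    return {
        "total": sc1 + sc2 + sc3 + sc4,  # should be 100%
        "external": sc2 + sc3 + sc4,     # scenarios calling Search
        "payment": sc4,                  # scenarios reaching Pay
    }


summary = {c: summarize(row) for c, row in TABLE_2_5.items()}
for c, s in summary.items():
    print(f"class {c}: " + ", ".join(f"{k}={v:.1f}%" for k, v in s.items()))
print(f"payment ratio B/A = {summary['B']['payment'] / summary['A']['payment']:.2f}")
```

The payment ratio of roughly 2.7 is what the text below describes as "almost 3 times lower" for class A.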

Moreover, the percentage of transactions that end up with the payment of a trip is around 20% for user class B, while it is almost 3 times lower for user class A.

Table 2-5. π(SC1), π(SC2), π(SC3) and π(SC4) for user classes A and B
           π(SC1)   π(SC2)   π(SC3)   π(SC4)
Class A    47.9%    38.2%     6.4%     7.5%
Class B    20.8%    44.0%    14.9%    20.3%

These two user classes will be used in Section 2.3 to evaluate the user-perceived availability.

Service and Function Levels
The service level identifies the set of servers involved in the execution of each function and describes their interactions. This analysis requires a deep understanding of the business logic and of the technical solutions implemented by the TA SoS provider. For the sake of illustration, Table 2.6 gives a simplified example of mapping between the functions provided at the TA SoS site, the internal servers directly controlled by the TA SoS provider, and the external servers operated and controlled by external suppliers.

Table 2-6. Mapping between functions and services
           Internal services              External services
           Web   Application  Database    Flight  Hotel  Car  Payment
Home        X
Browse      X        X           X
Search      X        X           X          X       X     X
Book        X        X           X          X       X     X
Pay         X        X           X                                X

The external suppliers correspond to the flight reservation component systems (AF, KLM, BA, …), hotel reservation component systems (Sofitel, Holiday Inn, …) and car rental component systems (Hertz, Avis, Europcar, …) that provide information on the corresponding items of a trip. We also assume that the SoS provider uses the services of an external payment component system for handling card-based transactions. The internal services are supported by three types of servers:
1) Web servers that receive user requests and send back the requested data.
2) Application servers that implement the main operations needed to process user requests.

3) Database servers handling data-related operations (storing and retrieving information about flight reservation, hotel booking and car rental companies, as well as information on customer orders)².

² If we refer to the TA high-level design presented in Figure 2.2, the web servers will typically host the travel agency front-end client side (TAFE-CS) component and the application servers will host the front-end server side (TAFE-SS) components (including the Abstract Service Interface, and the Flight, Hotel and Car LIFs).

The execution of the Home function involves only the web server, whereas the other functions involve several servers. In this case, it is necessary to analyse, for each function, the interactions among the servers involved. As at the user level, we have to identify for each function all possible function execution scenarios (also referred to as function scenarios). This is achieved through the interaction diagram dedicated to each function. Examples of interaction diagrams for the Browse, Search, Book and Pay functions are given hereafter.

Browse
Figure 2.4 describes the interactions among the servers involved in the accomplishment of the Browse function. The Begin and End nodes identify the beginning and the end of each function execution. Each path from the Begin node to an End node identifies one possible function scenario. The probability of activation of each scenario can be evaluated by taking into account the probabilities q_ij associated with the transitions involved in the corresponding scenario. Note that the probability of activation of non-labelled transitions is one.

[Figure 2-4. Interaction diagram of the Browse function: Begin (1) → WS (2); from WS, q_2,3 to End (3) or q_2,4 to AS (4); from AS, q_4,5 to WS (5) → End (6), or q_4,7 to DS (7) → AS (8) → WS (9) → End (10). WS: web server, AS: application server, DS: database server.]

We can identify three scenarios, described as follows:
1-2-3: The user sends a request to the web server (node 2). The requested data is available in the local cache and is returned to the user (node 3). This marks the end of the interaction.
1-2-4-5-6: The web server accepts the request from the user and sends it to the application server (node 4), because the requested data is not available in the local cache. The application server processes the user request and returns a dynamically generated page to the web server (node 5), which forwards it to the user (node 6). The database is not involved in this case.
1-2-4-7-8-9-10: The application server requires some specific information stored on the TA database server (node 7). After the database server has answered, the application server processes the user request (node 8) and sends the results to the web server (node 9), which generates an HTML page incorporating the corresponding outputs (node 10).

Search
The interaction diagram describing the execution of the Search function is decomposed into 9 stages (Figure 2.5). The input data provided in the search request issued by the user (node 1) are first processed by the web server WS (node 2). WS performs the necessary checks, and then breaks down the user request into three individual requests corresponding to each aspect of the trip. If the data is correct and in the right format, it is forwarded to the application server AS (node 4); otherwise, an exception is sent to the user (node 3). AS uses the request information to formulate a query and asks the database server (node 5) for the list of component booking systems to be contacted. Based on the answer received, AS sends a query (node 6) to the selected systems (identified by the Flight, Hotel and Car nodes in our example). The AND operator means that the request is submitted to all three types of booking systems (nodes 7.a, 7.b, 7.c). The answers returned to AS are formatted by AS (node 8) and sent to WS (node 9), which forwards them to the user (node 10). The number of Flight, Hotel and Car reservation systems contacted is not indicated in this figure. We assume that the TA SoS always interacts with the same booking systems, and that a transaction is successful when, for each type of service (Flight, Hotel and Car reservation), at least one system responds to the request submitted by AS.
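The two diagrams above can be turned into availability expressions. For Browse, each scenario is weighted by its activation probability (q_2,3, q_2,4 q_4,5 and q_2,4 q_4,7) and multiplied by the availability of the servers it uses, anticipating the function-level relation given as Eq. (1) in Section 2.2. For Search, the success rule stated above gives an AND over the trip items and an OR ("at least one responds") within each item. The sketch below assumes independent servers and that a scenario succeeds only if every server it uses is up; function names and numbers are hypothetical, and the Search part covers only the external Flight/Hotel/Car systems.

```python
from math import prod


def function_availability(scenarios: list[tuple[float, list[float]]]) -> float:
    """Sum over function scenarios of phi_j times the availability of the
    servers used by scenario j (independent servers, all required)."""
    return sum(phi * prod(avails) for phi, avails in scenarios)


def browse_availability(q23, q24, q45, q47, a_ws, a_as, a_ds) -> float:
    # Scenario probabilities read from the Browse diagram
    # (assumes q23 + q24 = 1 and q45 + q47 = 1).
    return function_availability([
        (q23, [a_ws]),                    # 1-2-3: page served from cache
        (q24 * q45, [a_ws, a_as]),        # 1-2-4-5-6: AS builds the page
        (q24 * q47, [a_ws, a_as, a_ds]),  # 1-2-4-7-8-9-10: AS also queries DS
    ])


def at_least_one(avails: list[float]) -> float:
    """Availability of a group that succeeds if at least one member responds."""
    return 1 - prod(1 - a for a in avails)


def search_external_availability(flights, hotels, cars) -> float:
    # AND over trip items, OR within each item; independent systems assumed.
    return at_least_one(flights) * at_least_one(hotels) * at_least_one(cars)


print(browse_availability(0.6, 0.4, 0.5, 0.5, a_ws=0.999, a_as=0.998, a_ds=0.997))
print(search_external_availability([0.95, 0.95], [0.9, 0.9, 0.9], [0.97]))
```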
[Figure 2-5. Interaction diagram of the Search function: Begin (1) → WS (2); q_2,3 to End (3, exception) or q_2,4 to AS (4) → DS (5) → AS (6) → AND of Flight (7.a), Hotel (7.b) and Car (7.c) → AS (8) → WS (9) → End (10).]

Book
An example of an interaction diagram for the Book function is given in Figure 2.6. In this example, the trip booking order received from the user through the web server is processed by the application server. Using the parameters embedded in the booking order associated with the selected trip, the application server interacts with the corresponding flight, hotel and car booking systems to book the selected trip. The booking references returned to the application server are then stored in the database, before a confirmation is sent to the user through the web server.

[Figure 2-6. Interaction diagram of the Book function: Begin → WS → AS → AND of Flight (4.a), Hotel (4.b) and Car (4.c) → AS → DS → AS → WS → End.]

Pay
The interaction diagram for the Pay function is presented in Figure 2.7. When a payment call is received through the web server, the booking data is first checked by the application server; a call is then sent to the payment server, for authentication and verification purposes and to carry out the payment. Finally, the application server updates the client order information in the database, before sending a confirmation to the user.

[Figure 2-7. Interaction diagram of the Pay function: Begin → WS → AS → PS → AS → DS → WS → End. WS: web server, AS: application server, DS: database server, PS: payment server.]

Resource Level
The various services are mapped onto the resources involved in their accomplishment. Therefore, we need to take into account the actual hardware and software organisation of the SoS. For external services, as the architecture on which these services are implemented is not known, we associate with each external service a single resource that is considered as a black box. For internal services, whose architecture is known, the organisation of the internal resources can be detailed. Different architectural solutions are possible for implementing the internal services. In particular, several alternatives corresponding to different organisations of the servers on the hardware support (e.g., dedicated hosts for each server vs. multiple servers on the same host) or to different fault tolerance strategies (non-redundant vs. replicated servers) might be analysed and compared from the availability point of view. Replicated servers can be located at one site or be geographically distributed across distinct sites.
Fault tolerance can also be applied to provide redundant access to the Internet or redundant communication links between internal resources. Additionally, the architectural solutions might be compared with regard to the maintenance strategy adopted by the SoS provider (e.g., immediate vs. deferred maintenance, dedicated vs. shared repair resources). For illustration purposes, we consider the two architectures presented in Figures 2.8 and 2.9. The basic architecture (Figure 2.8) consists of allocating a dedicated host to each server and interconnecting these hosts through a LAN. The LAN is viewed as a single resource providing communication between the servers. Concerning external services, we assume that the flight, hotel and car reservation systems are composed of N_F, N_H and N_C components, respectively. The basic architecture suffers from several weak points due to its lack of redundancy and scalability. The architecture described in Figure 2.9 applies redundancy in several places to reduce some of these weaknesses. The TA SoS provider site architecture is based on a server farm configuration with load balancing. This redundant architecture is based on N_W web servers, two application servers and two database servers with two mirrored disks. The servers are connected through a LAN (that can be replicated). In practice, several LANs are generally used to interconnect these servers; nevertheless, we assume that all of them are represented as a single LAN. Also, to simplify the modelling, the load balancers are not explicitly described in this architecture.
[Figure 2-8. Basic architecture: at the SoS provider site, a web server, an application server and a database server with its disk, interconnected by a LAN and connected to the Internet; external payment server, flight reservation component systems #1 ... #N_F, hotel reservation component systems #1 ... #N_H and car reservation component systems #1 ... #N_C.]

[Figure 2-9. Redundant architecture: at the SoS provider site, web servers 1 ... N_W, application servers 1 and 2, and database servers 1 and 2 with mirrored disks D1 and D2, interconnected by a LAN and connected to the Internet; external payment server and flight, hotel and car reservation component systems as in Figure 2-8.]
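To give a first feel for the difference between the two architectures, the sketch below treats the internal part of each one as a reliability block diagram with independent components. This is a deliberate simplification under stated assumptions, not the Markov models used in this report: failover is assumed perfect (full coverage), repair is not shared, the web tier needs k_w of the N_W web servers, and each duplicated element (application servers, database servers, mirrored disks) needs one of two. All availability values are hypothetical.

```python
from math import comb, prod


def k_out_of_n(k: int, n: int, a: float) -> float:
    """Availability of n identical independent units when at least k must be up."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def basic_architecture(a_ws, a_as, a_ds, a_disk, a_lan) -> float:
    # Basic architecture: one host per server, all in series with the single LAN.
    return prod([a_ws, a_as, a_ds, a_disk, a_lan])


def redundant_architecture(a_ws, a_as, a_ds, a_disk, a_lan, n_w, k_w=1) -> float:
    # Redundant architecture under simplifying assumptions: k_w of n_w web
    # servers, 1 of 2 application servers, 1 of 2 database servers, 1 of 2
    # mirrored disks, perfect failover, single LAN, independent failures.
    return prod([
        k_out_of_n(k_w, n_w, a_ws),
        k_out_of_n(1, 2, a_as),
        k_out_of_n(1, 2, a_ds),
        k_out_of_n(1, 2, a_disk),
        a_lan,
    ])


params = dict(a_ws=0.995, a_as=0.995, a_ds=0.998, a_disk=0.999, a_lan=0.9999)
print(f"basic:     {basic_architecture(**params):.6f}")
print(f"redundant: {redundant_architecture(**params, n_w=3, k_w=2):.6f}")
```

Choosing k_w > 1 crudely reflects a performance requirement (enough web servers to carry the load). Imperfect fault coverage and shared repair resources, which the product form ignores, are what call for Markov models.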

In the next section, we model the availability of both the basic and redundant architectures.

2.2 TA Availability Modelling

The availability modelling and evaluation of the TA SoS are carried out according to the hierarchical description of the SoS, in four steps that progressively evaluate the resource, service, function and user levels (see Figure 2.1). The outputs of a given level are used in the level immediately above to compute the availability measures associated with that level. An overview of the modelling steps is given below, before the TA case study is modelled specifically.

Resource models: The resource models describe the behaviour of the SoS provider resources resulting from component interactions, failures and repairs. Depending on the nature of the system and on the dependencies among components, one or several models are built. The outputs of these models are the availabilities of the various resources.

Service level model: We distinguish between internal and external services. External services are delivered by providers about whom little information is known, and they are assumed to be independent. Their availabilities, denoted {A(Se_j)}, are expected to be obtained from specific experiments or measurements, such as those presented in [Long et al. 1995, Kalyanakrishnam et al. 1999a, Machiraju et al. 2000]. Internal services are supplied by the resources of the TA SoS provider. Their availabilities, denoted {A(Si_j)}, are evaluated from the availabilities of the resources on which these services are implemented. In practice, because of the mapping between services and resources and of the dependencies between resources, the service model and the resource model of a given service are often built as a single model, for efficiency reasons.
In the TA case, services and resources are described by the same models.

Function level model: The availability modelling is based on the availability of the services involved in the accomplishment of each function, on the matrix giving the mapping between functions and services, and on the scenario probabilities derived from the interaction diagram of each function. The outputs of this level are the availabilities of the various functions {A(F_i)}, evaluated as follows:

$$A(F_i) = \sum_{j=1}^{M} \varphi_j \, A(\sigma_j(F_i)) \qquad (1)$$

where:
M is the number of execution scenarios of function F_i in the interaction diagram;
φ_j is the activation probability of function execution scenario j;
σ_j(F_i) is the set of services involved in function execution scenario j;
A(σ_j(F_i)) is the availability of the services involved in function execution scenario j.

User level model: The availability of the target SoS as perceived by a given user class is based on the execution scenarios followed by the user when visiting the SoS provider site(s) (derived from the user operational profile) and on the availability of the functions invoked in each scenario. The outputs of this level are the availabilities seen by the various user classes {A(user_k)}. Similarly to the function level, A(user_k) is obtained as follows:

$$A(\mathrm{user}_k) = \sum_{i=1}^{N} \pi_i \, A(L_i) \qquad (2)$$

where:
N is the number of user scenarios in the Markov chain describing the user operational profile;
π_i is the activation probability of user scenario i;
L_i is the set of functions involved in user scenario i;
A(L_i) is the availability of the functions involved in user scenario i.

In the following, we illustrate this availability evaluation approach on the TA example, starting with the availability measures at the service level, based on the modelling of the two architectures presented above (Figures 2.8 and 2.9).

2.2.1 Service Level Availability

At this step, we evaluate the availability of the external and internal services.

External services

We assume that the external resources are identical for both architectures. They correspond to Flight reservation, Hotel reservation, Car reservation and Payment. To evaluate the availability of these services, each external component system is described by a single resource modelled as a black box. Each of these systems is assumed to fail independently of all the others. The availabilities of the various component systems are defined as follows:
A_Fi: availability of flight reservation component system i, i = 1, 2, ..., N_F;
A_Hi: availability of hotel reservation component system i, i = 1, 2, ..., N_H;
A_Ci: availability of car reservation component system i, i = 1, 2, ..., N_C;
A_PS: availability of the payment component system;
A_net: availability of the TA connectivity to the Internet.
Using the failure independence assumption, and considering that a service is provided as long as at least one of its redundant component systems is available, the availability of the external services can be derived directly, as given in Table 2-7.
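As a minimal sketch (Python is our choice for illustration, not part of the original study), the at-least-one-available rule can be evaluated numerically under the same failure-independence assumption. The function name `parallel_availability` and the example availability values are hypothetical.

```python
from math import prod


def parallel_availability(component_availabilities):
    """Availability of a service delivered as long as at least one of its
    independent redundant component systems is up: 1 - prod(1 - A_i)."""
    return 1.0 - prod(1.0 - a for a in component_availabilities)


# Hypothetical example: N_F = 3 flight reservation systems with A_Fi = 0.9 each,
# and a single payment component system, for which the rule reduces to A_PS.
a_flight = parallel_availability([0.9, 0.9, 0.9])
a_payment = parallel_availability([0.95])
print(f"A(Flight) = {a_flight:.6f}, A(Payment server) = {a_payment:.6f}")
```

With identical systems the expression reduces to 1 - (1 - A)^N, so each additional reservation system multiplies the unavailability by (1 - A).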

Table 2-7. External service availability

$$A(\text{Flight}) = 1 - \prod_{i=1}^{N_F} (1 - A_{Fi})$$
$$A(\text{Hotel}) = 1 - \prod_{i=1}^{N_H} (1 - A_{Hi})$$
$$A(\text{Car}) = 1 - \prod_{i=1}^{N_C} (1 - A_{Ci})$$
$$A(\text{Payment server}) = A_{PS}$$

If the TA connectivity to the Internet is unavailable, none of the services is provided. As a consequence, the availability of the TA connectivity to the Internet is accounted for by multiplying the user availability expression by A_net, as will be seen later.

Internal services

Internal services are the web, application and database services. Their availability measures are evaluated for the two architectures of Figures 2.8 and 2.9 (basic and redundant architectures, respectively). For both architectures, communication between servers is achieved by a local area network (LAN). The LAN is assumed to be a single point of failure, i.e., when the LAN is unavailable, all internal services are unavailable. As a consequence, the LAN availability, denoted A_LAN, appears as a factor in all the equations giving the function availabilities (see Section 2.2.2). A_LAN can be evaluated using the model discussed in deliverable DMS1 [Kaâniche et al. 2001].

As the primary objective of this deliverable is to show the applicability of the approach to the TA SoS, we make simplistic assumptions for the application and database services. More realistic assumptions are made for the web service, to illustrate the kind of more complex calculations that can be performed. Similar approaches can be followed to evaluate the availability of the application and database services.

Application and database service availability

Let us denote by C_AS and C_DS the computer hosts associated with the application and database servers, respectively. Their availabilities are denoted by A(C_AS) and A(C_DS). The disk availability is denoted by A(Disk).
To simplify the presentation, we assume that each component (i.e., computer hosts and disks) fails independently of the others. The application and database service availabilities are given in Table 2-8.
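The two columns of Table 2-8 reduce to series and duplex-parallel combinations of independent components. The sketch below is illustrative only; the helper names `duplex` and `app_db_availability` are hypothetical, and the inputs 0.9 are the values later listed in Table 2-12.

```python
def duplex(a):
    """Availability of two identical, independent units in parallel."""
    return 1.0 - (1.0 - a) ** 2


def app_db_availability(a_cas, a_cds, a_disk, redundant=False):
    """Return (A(Application service), A(Database service)) per Table 2-8.

    Basic: one application host; one database host in series with one disk.
    Redundant: duplex application hosts; duplex database hosts in series
    with a pair of mirrored disks. Components fail independently.
    """
    if redundant:
        return duplex(a_cas), duplex(a_cds) * duplex(a_disk)
    return a_cas, a_cds * a_disk


print(app_db_availability(0.9, 0.9, 0.9))                  # basic
print(app_db_availability(0.9, 0.9, 0.9, redundant=True))  # redundant
```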

Table 2-8. Application and database service availability

Basic architecture:
$$A(\text{Application service}) = A(C_{AS}), \qquad A(\text{Database service}) = A(C_{DS})\,A(\text{Disk})$$

Redundant architecture:
$$A(\text{Application service}) = 1 - (1 - A(C_{AS}))^2, \qquad A(\text{Database service}) = \left[1 - (1 - A(C_{DS}))^2\right]\left[1 - (1 - A(\text{Disk}))^2\right]$$

In the following, we focus on the evaluation of the web service availability for the basic and redundant architectures.

Web service availability

To evaluate the availability of the web service, we distinguish two sources of failures:
1) hardware and software failures that affect the computer host and lead to the failure of the web server;
2) performance-related failures, due to the fact that the web server generally has a limited capacity: when the input buffer is full, incoming requests are not serviced.

The web service is assumed to be available when neither of these types of failures occurs. The impact of both types of failures on the web service availability can be accounted for by adopting a composite performance and availability (generally called performability) evaluation approach. The main idea was initially proposed by Meyer [Meyer 1980, Meyer 1982] and has since been used extensively in performability modelling. It consists of combining the results obtained from two models: a pure performance model and a pure availability model. The performance model takes into account the request arrival and service processes and evaluates performance-related measures conditioned on the state of the system, as determined from the availability model. The availability model is used to evaluate the steady-state probabilities of the system states that result from the occurrence of failures and recoveries. This approach is based on the assumption that, between successive failure-recovery events, the system reaches a quasi steady state with respect to the performance-related events.
This assumption is valid when the failure/recovery rates are much lower than the request arrival/service rates, which is typically true in our context.

Basic architecture

The basic architecture is composed of a single computer host, C_WS. Let us denote by p_K the probability that the web server input buffer (of size K) is full when a request is received. p_K is derived from the performance model and depends on the assumptions made about the request arrival and service processes. Assuming that request arrivals follow a Poisson process with rate α and that request service times are exponentially distributed with rate ν, the web server behaviour governed by the arrival and service processes can be modelled by an M/M/1/K queue.

The probability that an arriving request is lost because the buffer is full is well known (see, e.g., [Allen 1978]) and is given by:

$$p_K = \frac{(1-\rho)\,\rho^{K}}{1-\rho^{K+1}} \qquad (3)$$

with:

$$\rho = \frac{\alpha}{\nu} \qquad (4)$$

The availability model is composed of two states: up and down. The steady-state probability of the up state corresponds to the steady-state availability of the host, denoted A(C_WS). The availability of the web service can then be expressed as follows:

$$A(\text{Web service}) = A(C_{WS})\,(1 - p_K) \qquad (5)$$

This definition of availability thus incorporates the inherent dependence between performance and dependability in a single equation.

Redundant architecture

The redundant architecture is composed of N_W identical web servers. We assume that all component failures are independent and that the web service is provided as long as at least one of the redundant web servers is available. The performance model used to evaluate p_K(i), the probability that web requests are lost because the input buffer is full, is assumed to be an M/M/i/K queue, where i is the number of available servers and K is the buffer size. For a system state with i operational servers, p_K(i) is given by (see, e.g., [Allen 1978]):

$$p_K(i) = \frac{\rho^{K}}{i!\, i^{K-i}} \left[ \sum_{j=0}^{i-1} \frac{\rho^{j}}{j!} + \sum_{j=i}^{K} \frac{\rho^{j}}{i!\, i^{j-i}} \right]^{-1} \qquad (6)$$

where

$$\rho = \frac{\alpha}{\nu} \qquad (7)$$

With respect to the availability model, the aim is to model the behaviour of the redundant architecture resulting from failures and repairs, in order to evaluate the steady-state probability of each system state i (where i is the number of operational servers, as above). In the following, two assumptions are made with regard to web server failures and recovery: we first assume perfect coverage following the failure of a web server, then we consider the case where coverage is imperfect.
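The performance side of the performability approach can be sketched directly from equations (3) to (6). This is a minimal illustration, assuming Python; the function names are hypothetical, and the ρ = 1 branch uses the standard limit of equation (3), which is needed for the α = ν = 100/sec case considered in Section 2.3. A useful consistency check is that equation (6) with i = 1 reduces to equation (3).

```python
from math import factorial


def loss_mm1k(rho, K):
    """Equation (3): probability that an arriving request finds the
    M/M/1/K buffer full, with rho = alpha / nu (equation (4))."""
    if rho == 1.0:
        return 1.0 / (K + 1)  # limit of equation (3) when rho -> 1
    return (1.0 - rho) * rho**K / (1.0 - rho ** (K + 1))


def loss_mmik(rho, i, K):
    """Equation (6): loss probability of an M/M/i/K queue with i
    operational servers and buffer size K (requires 1 <= i <= K)."""
    if not 1 <= i <= K:
        raise ValueError("equation (6) needs 1 <= i <= K")
    norm = sum(rho**j / factorial(j) for j in range(i))
    norm += sum(rho**j / (factorial(i) * i ** (j - i)) for j in range(i, K + 1))
    return rho**K / (factorial(i) * i ** (K - i)) / norm


def web_availability_basic(a_cws, alpha, nu, K):
    """Equation (5): A(Web service) = A(C_WS) * (1 - p_K)."""
    return a_cws * (1.0 - loss_mm1k(alpha / nu, K))


# alpha, nu and K as in Section 2.3; A(C_WS) = 0.999 is a hypothetical value.
print(web_availability_basic(0.999, alpha=100.0, nu=100.0, K=10))
print(loss_mmik(1.0, 1, 10), loss_mmik(1.0, 4, 10))
```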
Perfect coverage: The model presented in Figure 2.10 is based on the assumption that each web server runs on a dedicated computer host. Web server failures occur with rate λ and the repair rate is µ. The model also assumes shared repair facilities. Upon the failure of a web server, the server is automatically disconnected and the system is reconfigured (with probability 1) with the web servers that are still operational.

Figure 2.10: Markov model of the N_W web servers (perfect coverage) [states N_W, N_W - 1, ..., 1, 0; failure transitions from state i to i - 1 with rate iλ; repair transitions with rate µ]

Let us denote by Π_i the steady-state occupation probability of state i, i = 0, 1, ..., N_W. In state i, i ≠ 0, i web servers are available to process the input requests (Π_0 corresponds to the web server unavailability). The Π_i are given by:

$$\Pi_i = \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} \Pi_0, \qquad i = 1, \ldots, N_W \qquad (8)$$

$$\Pi_0 = \left[ \sum_{i=0}^{N_W} \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} \right]^{-1} \qquad (9)$$

The availability of the web service is as follows:

$$A(\text{Web service}) = 1 - \left[ \sum_{i=1}^{N_W} \Pi_i\, p_K(i) + \Pi_0 \right] \qquad (10)$$

where p_K(i), the probability that an arriving request in state i is lost because the buffer is full, is given by equation (6). This definition of availability incorporates the inherent dependence between performance and dependability in a single equation. The expression between brackets corresponds to the probability that a web request is not serviced, either i) because the buffer is full or ii) because the web servers are unavailable.

Imperfect coverage: The model of Figure 2.10 is based on a perfect failure coverage and reconfiguration assumption. This assumption is revisited in the model presented in Figure 2.11, where two output transitions are considered from each state i:
1) after a covered failure (transition with rate icλ), the system is automatically reconfigured into an operational state with (i - 1) web servers;
2) upon the occurrence of an uncovered failure (transition with rate i(1-c)λ), the system moves to a down state y_i, where a manual reconfiguration action is required before moving to operational state (i - 1). The corresponding reconfiguration times are assumed to be exponentially distributed with mean 1/β.

Figure 2.11: Markov model of the N_W web servers (imperfect coverage) [operational states N_W, ..., 1 and down state 0; covered failures from state i to i - 1 with rate icλ (rate λ from state 1 to state 0); uncovered failures from state i to down state y_i with rate i(1-c)λ; reconfiguration from y_i to i - 1 with rate β; repair transitions with rate µ]

Solving the model of Figure 2.11 for the steady-state probabilities leads to:

$$\Pi_i = \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} \Pi_0, \qquad i = 1, \ldots, N_W \qquad (11)$$

$$\Pi_{y_i} = \frac{\mu(1-c)}{\beta\,(i-1)!}\left(\frac{\mu}{\lambda}\right)^{i-1} \Pi_0, \qquad i = 2, \ldots, N_W \qquad (12)$$

$$\Pi_0 = \left[ \sum_{i=0}^{N_W} \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} + \frac{\mu(1-c)}{\beta} \sum_{i=0}^{N_W-2} \frac{1}{(N_W-i-1)!}\left(\frac{\mu}{\lambda}\right)^{N_W-i-1} \right]^{-1} \qquad (13)$$

Given that the states y_i are down states, the availability of the web service can be computed as follows:

$$A(\text{Web service}) = 1 - \left[ \sum_{i=1}^{N_W} \Pi_i\, p_K(i) + \sum_{i=2}^{N_W} \Pi_{y_i} + \Pi_0 \right] \qquad (14)$$

where p_K(i) is also given by equation (6).

Summary of web service availability

Table 2-9 recalls the equations of the web service availability for the basic and redundant architectures, assuming perfect and imperfect coverage.

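Table 2-9. Web service availability

Basic architecture:
$$A(\text{Web service}) = A(C_{WS})\,(1 - p_K), \qquad p_K = \frac{(1-\rho)\,\rho^{K}}{1-\rho^{K+1}}, \qquad \rho = \frac{\alpha}{\nu}$$

Redundant architecture (perfect coverage):
$$A(\text{Web service}) = 1 - \left[ \sum_{i=1}^{N_W} \Pi_i\, p_K(i) + \Pi_0 \right]$$
$$p_K(i) = \frac{\rho^{K}}{i!\, i^{K-i}} \left[ \sum_{j=0}^{i-1} \frac{\rho^{j}}{j!} + \sum_{j=i}^{K} \frac{\rho^{j}}{i!\, i^{j-i}} \right]^{-1}, \qquad \rho = \frac{\alpha}{\nu}$$
$$\Pi_i = \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} \Pi_0, \qquad \Pi_0 = \left[ \sum_{i=0}^{N_W} \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} \right]^{-1}$$

Redundant architecture (imperfect coverage), with p_K(i) and ρ as for perfect coverage:
$$A(\text{Web service}) = 1 - \left[ \sum_{i=1}^{N_W} \Pi_i\, p_K(i) + \sum_{i=2}^{N_W} \Pi_{y_i} + \Pi_0 \right]$$
$$\Pi_i = \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} \Pi_0, \qquad \Pi_{y_i} = \frac{\mu(1-c)}{\beta\,(i-1)!}\left(\frac{\mu}{\lambda}\right)^{i-1} \Pi_0$$
$$\Pi_0 = \left[ \sum_{i=0}^{N_W} \frac{1}{i!}\left(\frac{\mu}{\lambda}\right)^{i} + \frac{\mu(1-c)}{\beta} \sum_{i=0}^{N_W-2} \frac{1}{(N_W-i-1)!}\left(\frac{\mu}{\lambda}\right)^{N_W-i-1} \right]^{-1}$$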
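The equations of Table 2-9 for the redundant architecture can be evaluated with a short numerical sketch. It assumes Python and our reading of Figures 2.10 and 2.11 (shared repair with rate µ in operational states, no repair while in a y_i state); the function `web_availability_redundant` is hypothetical. Setting c = 1 recovers the perfect-coverage model, and N_W = 1 recovers equation (5) with A(C_WS) = µ/(λ + µ).

```python
from math import factorial


def loss_mmik(rho, i, K):
    """Equation (6): loss probability of an M/M/i/K queue (1 <= i <= K)."""
    norm = sum(rho**j / factorial(j) for j in range(i))
    norm += sum(rho**j / (factorial(i) * i ** (j - i)) for j in range(i, K + 1))
    return rho**K / (factorial(i) * i ** (K - i)) / norm


def web_availability_redundant(nw, lam, mu, alpha, nu, K, c=1.0, beta=None):
    """Steady-state web service availability of N_W web servers.

    c = 1: perfect-coverage model, equations (8)-(10).
    c < 1: imperfect-coverage model, equations (11)-(14); beta is required.
    """
    if c < 1.0 and beta is None:
        raise ValueError("beta is required when coverage is imperfect")
    r = mu / lam
    # Unnormalized Pi_i / Pi_0 for operational states i = 0..N_W (eqs (8), (11))
    up = [r**i / factorial(i) for i in range(nw + 1)]
    # Unnormalized Pi_{y_i} / Pi_0 for down states y_i, i = 2..N_W (eq. (12))
    down = []
    if c < 1.0:
        down = [(1.0 - c) * mu / (beta * factorial(i - 1)) * r ** (i - 1)
                for i in range(2, nw + 1)]
    pi0 = 1.0 / (sum(up) + sum(down))  # equations (9) and (13)
    rho = alpha / nu
    lost = sum(up[i] * loss_mmik(rho, i, K) for i in range(1, nw + 1))
    # Equations (10) and (14): 1 - [sum Pi_i p_K(i) + sum Pi_{y_i} + Pi_0]
    return 1.0 - pi0 * (lost + sum(down) + 1.0)


# Parameter values from Section 2.3 (lambda = 1e-3/h, mu = 1/h, beta = 12/h,
# alpha = nu = 100/s, K = 10, c = 0.98).
for nw in range(1, 7):
    ua = 1.0 - web_availability_redundant(nw, 1e-3, 1.0, 100.0, 100.0, 10,
                                          c=0.98, beta=12.0)
    print(nw, f"{ua:.3e}")
```

Only the ratios Π_i/Π_0 are accumulated before normalizing, which keeps the computation simple for the parameter ranges used here.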

2.2.2 Function Level Availability

The availability of each function identified at the function level is evaluated from the availabilities of the services involved in its accomplishment and, when several function execution scenarios are possible, from the activation probability of each scenario. Table 2-10 gives the availability of the Home, Browse, Search, Book and Pay functions. A(WS), A(AS) and A(DS) denote A(Web service), A(Application service) and A(Database service), respectively, given in Tables 2-8 and 2-9. A(PS) denotes A(Payment server) = A_PS, and A(Flight), A(Hotel) and A(Car) are given in Table 2-7. The parameters q_ij involved in the availability of the Browse function are associated with the three execution scenarios of this function presented earlier in this report.

All the function equations include the product A_net A_LAN: if the TA connectivity to the Internet or the internal communication among the servers is not available, none of the TA functions can be invoked by the users. Also, the Book function has the same availability equation as the Search function, because we have assumed that the former uses a subset of the resources used by the latter. Indeed, in our example, the Book function can be achieved only if the Search function has succeeded, which led us to assume that if the Search function succeeds, the Book function automatically succeeds. Of course, other situations can be modelled.

Table 2-10. Function level availabilities
A(Home) = A_net A_LAN A(WS)
A(Browse) = A_net A_LAN [q_23 A(WS) + q_24 q_45 A(WS) A(AS) + q_24 q_47 A(WS) A(AS) A(DS)]
A(Search) = A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
A(Book) = A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
A(Pay) = A_net A_LAN A(WS) A(AS) A(DS) A(PS)

2.2.3 User Level Availability

For a given user operational profile, the availability perceived by the users can be obtained by evaluating, for each user execution scenario derived from the operational profile, the probability that all the functions invoked in the corresponding scenario are available. When several functions are invoked in a given path, a careful analysis of the dependencies that might exist among the functions due to shared services or resources is needed at this stage to derive the availability of the path from the availabilities of the corresponding functions.

Table 2-11 gives the availabilities associated with the user scenarios presented in Table 2.3. The first column identifies the scenario and the functions invoked. The second column specifies the availability of the user scenario based on the availability of the functions and the analysis of their dependencies. The third column gives the availability of the user scenario in terms of the availabilities of the corresponding services and resources. The results in column 2 take into account the dependencies that exist among the functions involved in each scenario. As discussed in Section 2.2.2, such dependencies mainly result from resource sharing among the functions (this is the case for the Search and Book functions).

Table 2-11. User scenarios and associated availabilities
Scenario | Availability w.r.t. associated functions | Availability
1: Start-Home-Exit | A(Home) | A_net A_LAN A(WS)
2: Start-Browse-Exit | A(Browse) | A_net A_LAN [q_23 A(WS) + q_24 q_45 A(WS) A(AS) + q_24 q_47 A(WS) A(AS) A(DS)]
3: Start-{Home; Browse}*-Exit | A(Browse) | A_net A_LAN [q_23 A(WS) + q_24 q_45 A(WS) A(AS) + q_24 q_47 A(WS) A(AS) A(DS)]
4: Start-Home-Search-Exit | A(Search) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
5: Start-Browse-Search-Exit | A(Search) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
6: Start-{Home; Browse}*-Search-Exit | A(Search) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
7: Start-Home-{Search-Book}*-Exit | A(Search) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
8: Start-Browse-{Search-Book}*-Exit | A(Search) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
9: Start-{Home; Browse}*-{Search-Book}*-Exit | A(Search) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car)
10: Start-Home-{Search-Book}*-Pay-Exit | A(Search; Pay) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car) A(PS)
11: Start-Browse-{Search-Book}*-Pay-Exit | A(Search; Pay) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car) A(PS)
12: Start-{Home; Browse}*-{Search-Book}*-Pay-Exit | A(Search; Pay) | A_net A_LAN A(WS) A(AS) A(DS) A(Flight) A(Hotel) A(Car) A(PS)
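The mapping from service availabilities to the scenario availabilities of Table 2-11 can be sketched as follows. This is an illustrative Python sketch with hypothetical dictionary keys and function names. The joint "Search; Pay" entry encodes the shared-resource analysis (the common factors A_net, A_LAN, A(WS), A(AS) and A(DS) are counted once), rather than the product A(Search) A(Pay), which would count them twice. The Browse entry is equation (1) applied to the three Browse execution scenarios.

```python
def function_availabilities(s, q):
    """Function-level availabilities of Table 2-10.

    s: service availabilities, keys 'net', 'lan', 'ws', 'as', 'ds',
       'flight', 'hotel', 'car', 'ps'
    q: Browse scenario probabilities, keys 'q23', 'q24', 'q45', 'q47'
    """
    base = s["net"] * s["lan"]
    search = base * s["ws"] * s["as"] * s["ds"] * s["flight"] * s["hotel"] * s["car"]
    return {
        "Home": base * s["ws"],
        # Equation (1) for Browse: weighted sum over its three execution scenarios
        "Browse": base * (q["q23"] * s["ws"]
                          + q["q24"] * q["q45"] * s["ws"] * s["as"]
                          + q["q24"] * q["q47"] * s["ws"] * s["as"] * s["ds"]),
        "Search": search,
        "Book": search,  # Book uses a subset of the resources used by Search
        "Pay": base * s["ws"] * s["as"] * s["ds"] * s["ps"],
        # Search and Pay share A_net, A_LAN, WS, AS and DS: count them once
        "Search; Pay": search * s["ps"],
    }


# Column 2 of Table 2-11: function expression used by each user scenario
SCENARIO_FUNCTIONS = {1: "Home", 2: "Browse", 3: "Browse",
                      **{k: "Search" for k in range(4, 10)},
                      **{k: "Search; Pay" for k in range(10, 13)}}


def scenario_availabilities(s, q):
    """Availability of each user scenario 1..12 (column 3 of Table 2-11)."""
    f = function_availabilities(s, q)
    return {k: f[name] for k, name in SCENARIO_FUNCTIONS.items()}
```

For example, with every service availability set to a hypothetical 0.9 and the q_ij of Table 2-12, `scenario_availabilities(s, q)[2]` evaluates the Browse row.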

Taking into account the activation probabilities π_i of all user scenarios i (whose equations are given in Table 2.3 for the TA example, with numerical values given in Table 2.4 for user classes A and B) and the contribution of all scenarios i as given in Table 2-11, the user availability is given by equation (15):

$$A(\text{user}) = A_{net} A_{LAN} A(WS) \Big[ \pi_1 + (\pi_2+\pi_3)\big\{q_{23} + A(AS)\,(q_{24} q_{45} + q_{24} q_{47} A(DS))\big\} + A(AS) A(DS) A(\text{Flight}) A(\text{Hotel}) A(\text{Car}) \big\{(\pi_4+\pi_5+\pi_6+\pi_7+\pi_8+\pi_9) + (\pi_{10}+\pi_{11}+\pi_{12})\,A(PS)\big\} \Big] \qquad (15)$$

Taking into account the grouping of user scenarios into the four categories SC1, SC2, SC3 and SC4 defined in Section 2.1.1, equation (15) can be written as follows:

$$A(\text{user}) = A(SC1) + A(SC2) + A(SC3) + A(SC4) \qquad (16)$$

where

$$A(SC1) = A_{net} A_{LAN} A(WS) \big[\pi_1 + (\pi_2+\pi_3)\{q_{23} + A(AS)\,(q_{24} q_{45} + q_{24} q_{47} A(DS))\}\big]$$
$$A(SC2) = A_{net} A_{LAN} A(WS) A(AS) A(DS) A(\text{Flight}) A(\text{Hotel}) A(\text{Car})\,(\pi_4+\pi_5+\pi_6)$$
$$A(SC3) = A_{net} A_{LAN} A(WS) A(AS) A(DS) A(\text{Flight}) A(\text{Hotel}) A(\text{Car})\,(\pi_7+\pi_8+\pi_9)$$
$$A(SC4) = A_{net} A_{LAN} A(WS) A(AS) A(DS) A(\text{Flight}) A(\text{Hotel}) A(\text{Car})\,A(PS)\,(\pi_{10}+\pi_{11}+\pi_{12})$$

For a given user class, equation (16) enables the analysis of the relative contribution to availability of the scenarios that end with a payment, compared with all the scenarios that might be invoked by the users. Equations (15) and (16) are used in Section 2.3 to evaluate the availability of the two user classes A and B defined earlier (Tables 2.2, 2.3 and 2.4).

2.3 Evaluation Results

In the previous section, we defined two user classes for the travel agency. Both use the same set of functions, but activate them differently. We also defined two possible TA architectures: a basic architecture (in which each service is implemented on a single computer host) and a redundant architecture (composed of N_W redundant web servers, a duplex application server and a duplex database server using mirrored disks).
We have established the models of the various services, the function models and the user model, and derived their availabilities. For the redundant architecture, when building the web service model, we made two assumptions with respect to web server recovery (perfect and imperfect coverage). Equation (15) shows that the availabilities of the LAN, the Internet connection and the web service are the most influential ones (their impact is of the first order, while the impact of the others is at least of the second order). This is because all user scenarios use these three services.

In the rest of this section, we first show the impact of the number of web servers and of their failure rates on the web service availability, for different request arrival rates. Then, based on the equations derived in the previous section, we evaluate the user availability as perceived by user classes A and B.

Web service availability results

Figures 2.12 and 2.13 give the web service unavailability for perfect and imperfect fault coverage, with the number of web servers N_W varying from 1 to 10. When only one web server is used (N_W = 1), the results correspond to the basic architecture. The parameters used to obtain these curves are indicated on the figures. Sensitivity analyses are carried out for different web server failure rates (10^-2, 10^-3 and 10^-4 per hour) and request arrival rates (50, 100 and 150 requests per second). Each web server is assumed to have a processing rate ν of 100 requests per second and a repair rate µ of 1 per hour. The reconfiguration rate of the web server architecture (β) is 12 per hour (i.e., a mean reconfiguration time 1/β = 5 min) and the buffer size K is assumed to be 10.

Figure 2.12: Web service unavailability (perfect coverage) [y-axis: web service unavailability 1 - A(WS), logarithmic scale; x-axis: number of web servers N_W; curves for α = 50, 100 and 150/sec combined with λ = 10^-2, 10^-3 and 10^-4/hour; µ = 1/hour, ν = 100/sec, β = 12/hour, K = 10, c = 1]

Figure 2.13: Web service unavailability (imperfect coverage) [y-axis: web service unavailability 1 - A(WS), logarithmic scale; x-axis: number of web servers N_W; curves for α = 50 and 100/sec combined with λ = 10^-2, 10^-3 and 10^-4/hour; µ = 1/hour, ν = 100/sec, β = 12/hour, K = 10, c = 0.98]

Both figures show that increasing the number of web servers N_W from 1 to 2, 3 or 4 (depending on the failure and request arrival rates) reduces the web service unavailability. However, when the coverage is imperfect, the trend is reversed for N_W values higher than 4 (Figure 2.13). This is because, with imperfect coverage, increasing the number of servers also increases the probability of the system being in the states y_i of Figure 2.11, where the web service is unavailable and a manual reconfiguration action is required. In fact, the probability of a request being rejected because the buffer is full plays a significant role only up to a certain value of N_W. When the number of servers is higher than this threshold, the total service rate and the buffer capacity are sufficient to handle the flow of arrivals without rejecting requests. In this case, the unavailability of the web service mainly results from hardware and software failures leading the web server architecture to a down state. The model with perfect coverage is more sensitive to the variation of N_W than the imperfect coverage model: its unavailability decreases exponentially as N_W increases, and the trend is not reversed for values higher than 4. Also, the web server failure rate has a significant impact on availability only when the system load (α/ν) is lower than 1. Design decisions can be made based on the results presented in these figures.
In particular, we can determine the number of servers needed to achieve a given availability requirement, or evaluate the maximum availability that can be obtained when the number of servers is set to a given value. For instance, considering the model with imperfect coverage and a failure rate of 10^-3 per hour, achieving an unavailability lower than 5 min/year (unavailability < 10^-5) requires at least 2 servers if the request arrival rate is 50 per second and 4 if it is 100 per second. We obtain the same result with a failure rate of 10^-4 per hour; however, such a requirement cannot be satisfied with a failure rate of 10^-2 per hour. Similar sensitivity analyses can be carried out to study the level of availability that can be achieved when the number of web servers is set to a given value. For instance, if we decide to use three servers to support the web service, the unavailability is lower than 1 hour per year. This holds when the failure rate varies from 10^-2 to 10^-4 per hour and the system load (α/ν) is less than 1.

User level availability results

We use equations (15) and (16) presented above to evaluate the user availability as perceived by user classes A and B. Numerical values must be assigned to the parameters involved in these equations; these parameters and their numerical values are given in Table 2-12. The probabilities characterizing the operational profiles of user classes A and B have been presented in Tables 2.2, 2.3 and 2.4.

Table 2-12. Model parameters
A_net = A_LAN = A(C_AS) = A(C_DS) = A(Disk) = 0.9
A_PS = A_Fi = A_Hi = A_Ci = 0.9
q_23 = 0.2, q_24 = 0.8, q_45 = 0.4, q_47 = 0.6

Referring to equation (15), to analyse the impact of the operational profile on the user perceived availability, we consider the normalized availability

$$\frac{A(\text{user})}{A_{net}\, A_{LAN}\, A(WS)}$$

This quantity does not depend on the availability of the LAN, the Internet connection or the web service. Table 2-13 presents the normalized availability for user classes A and B for different numbers of flight, car and hotel reservation systems (N_F, N_H, N_C) interacting with the travel agency SoS. For the sake of simplicity, N_F, N_H and N_C are assumed to be equal. Also, according to Table 2-12, all the reservation systems are assumed to have the same availability (A_Fi = A_Hi = A_Ci = 0.9).
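The normalized availability and its decomposition by equation (16) can be sketched as follows. This is a hypothetical Python illustration: the profile `pi_example` is a placeholder, not the class A or class B probabilities of Table 2.4, and A(AS) = 0.9, A(DS) = 0.81 are the basic-architecture values obtained from Tables 2-8 and 2-12. It therefore does not reproduce Table 2-13; it only shows how the terms combine.

```python
def sc_contributions(pi, a_as, a_ds, a_ps, a_res, n, q):
    """Normalized contributions A(SCk) / (A_net A_LAN A(WS)), k = 1..4, of
    equation (16); their sum is the normalized user availability of (15).

    pi:    activation probabilities pi_1..pi_12 (list of 12 values)
    a_res: availability of each reservation system (A_Fi = A_Hi = A_Ci)
    n:     number of systems of each kind (N_F = N_H = N_C)
    """
    if len(pi) != 12:
        raise ValueError("expected 12 scenario probabilities")
    a_reservation = 1.0 - (1.0 - a_res) ** n       # Table 2-7, identical systems
    trip = a_as * a_ds * a_reservation ** 3        # A(AS)A(DS)A(Flight)A(Hotel)A(Car)
    browse = q["q23"] + a_as * (q["q24"] * q["q45"] + q["q24"] * q["q47"] * a_ds)
    return (pi[0] + (pi[1] + pi[2]) * browse,      # SC1
            trip * sum(pi[3:6]),                   # SC2
            trip * sum(pi[6:9]),                   # SC3
            trip * a_ps * sum(pi[9:12]))           # SC4


HOURS_PER_YEAR = 8760
q = {"q23": 0.2, "q24": 0.8, "q45": 0.4, "q47": 0.6}   # Table 2-12
pi_example = [1.0 / 12] * 12   # hypothetical profile, NOT class A or B
for n in (1, 2, 4, 10):
    a_norm = sum(sc_contributions(pi_example, 0.9, 0.81, 0.9, 0.9, n, q))
    print(n, round(a_norm, 4), round((1 - a_norm) * HOURS_PER_YEAR, 1), "h/year")
```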
The results in Table 2-13 show that, for a given user class, the normalized availability increases significantly when the number of reservation systems increases from 1 to 4, and then stabilizes. The rate of availability variation is directly related to the availability assigned to each reservation system. Comparing the results obtained for class A and class B users shows that different operational profiles can lead to significant differences in the availability perceived by the users. For instance, for N_F = N_H = N_C = 10, the normalized user perceived unavailability is about 57 hours per year for class A users and 74 hours per year for class B users. This unavailability takes into account all the scenarios that might be invoked by the users.

Table 2-13. Normalized user perceived availabilities for user classes A and B as a function of the number of flight, car and hotel reservation systems [columns: N_F = N_H = N_C; A(Class A users)/(A_net A_LAN A(WS)); A(Class B users)/(A_net A_LAN A(WS))]

The user perceived availability can be analysed from another perspective by considering equation (16), which allows the evaluation of the relative contribution of each category of user scenarios (SC1, SC2, SC3 and SC4, as defined in Section 2.1.1) to the observed availability. This is illustrated in Figures 2.14 and 2.15 for class A and class B users, respectively, assuming that the web service is implemented on four servers with imperfect coverage. UA(A users) (respectively UA(B users)) denotes the unavailability perceived by class A (respectively class B) users, and UA(SCi), i = 1, ..., 4, denotes the contribution of scenarios SCi to the user perceived unavailability. The unavailability caused by scenarios SC4, which end with a trip payment, is higher for class B users than for class A users (43 hours of downtime per year for class B users compared with 16 hours for class A users, considering the values reached once the curves have stabilized). Therefore, the impact in terms of loss of revenue for the TA provider is higher for class B users. Indeed, if the user transaction rate is 100 per second, the total number of lost transactions ending with a payment is 5.7 million for class A users and 15.5 million for class B users. Assuming that the average revenue generated by each transaction is 100 euros, the loss of revenue amounts to 570 million euros and 1.55 billion euros, respectively. This result clearly shows the importance of a faithful estimation of the user operational profile to obtain realistic predictions of the impact of failures from the economic and business viewpoints.
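The revenue figures follow from simple arithmetic on the SC4 downtime. The sketch below (Python, with hypothetical names) uses the rounded downtimes of 16 and 43 hours per year, a transaction rate of 100 per second and 100 euros per transaction; the values quoted in the text (5.7 and 15.5 million transactions, 570 million and 1.55 billion euros) are consistent with it up to the rounding of the downtime values.

```python
SECONDS_PER_HOUR = 3600
REVENUE_PER_TRANSACTION_EUR = 100


def lost_payment_transactions(downtime_hours_per_year, transactions_per_second):
    """Payment transactions lost per year during SC4 downtime."""
    return downtime_hours_per_year * SECONDS_PER_HOUR * transactions_per_second


for user_class, hours in (("class A", 16), ("class B", 43)):
    lost = lost_payment_transactions(hours, 100)
    print(f"{user_class}: {lost:,} lost transactions, "
          f"{lost * REVENUE_PER_TRANSACTION_EUR:,} EUR")
# class A: 5,760,000 lost transactions, 576,000,000 EUR
# class B: 15,480,000 lost transactions, 1,548,000,000 EUR
```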

Figure 2.14. Class A users unavailability with the unavailability of the associated scenarios SC1, SC2, SC3 and SC4 (curves: UA(A users), UA(SC1) to UA(SC4); x-axis: N_F = N_H = N_C; y-axis: unavailability, log scale; parameters: µ = 1/hour, ν = 100/sec, β = 12/hour, K = 10, c = 0.98, α = 100/sec, λ = 1e-4/hour, N_W = 4)
Figure 2.15. Class B users unavailability with the unavailability of the associated scenarios SC1, SC2, SC3 and SC4 (curves: UA(B users), UA(SC1) to UA(SC4); x-axis: N_F = N_H = N_C; y-axis: unavailability, log scale; parameters: µ = 1/hour, ν = 100/sec, β = 12/hour, K = 10, c = 0.98, α = 100/sec, λ = 1e-4/hour, N_W = 4)

2.4. Summary
In this part of the report, we have illustrated the main concepts of the hierarchical modelling framework proposed in deliverable DMS1 for the dependability evaluation of systems of systems, using the travel agency case study described in deliverable DMS3. Our objectives were: 1) to show how to apply our framework by decomposing the target SoS into four levels: user, function, service and resource levels, and 2) to present typical dependability analysis and evaluation results that can be obtained from the modelling to help SoS providers make objective design decisions. For the sake of illustration, we have deliberately considered simplified (yet realistic) assumptions. We have shown that the proposed hierarchical framework provides a systematic and pragmatic modelling approach, which is necessary to evaluate the dependability characteristics of the target SoS at different levels of abstraction. The proposed framework is general enough to handle more complex assumptions and models. Its application requires the estimation of several parameters involved in the models; the second part of this report addresses this issue.

3. Measurement-based Evaluation
The application of the SoS dependability modelling framework requires the estimation of several parameters that characterise the failure and recovery behaviour of the component systems and resources included in the model(s). Such parameters can be estimated based on measurement [Arlat et al. 2000]. Measurement involves three main steps: (1) data collection, (2) data validation and (3) data processing.
Data collection consists of defining which data to collect and how to collect it. The analysis and assessment of computer systems based on data collected during operation provide valuable information on actual error and failure behaviour. In most commercial systems, in particular Unix, Windows NT and Windows 2K based systems, error and failure data can be obtained from the event logging mechanisms offered by the operating system. Event logs include a large amount of information about the occurrence of various types of events: some of these events result from the normal activity of the target systems, whereas others are recorded when errors and failures affect local or distributed resources, or upon the occurrence of system reboots and shutdowns.
Usually, the collected data contains a large amount of redundant and irrelevant information, as well as incorrect or incomplete information. Such problems have been observed in several studies, e.g. those reported in [Kaâniche et al. 1990, Levendel 1990, Buckley & Siewiorek 1995, Thakur & Iyer 1996]. Therefore, data validation is needed to check the collected data for correctness, consistency and completeness. This consists, in particular, of filtering out invalid or irrelevant data and coalescing redundant or equivalent data. Once this step is achieved, the basic dependability characteristics of the measured system can be identified through data processing.
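Coalescing redundant or equivalent data is often done with a time-window ("tupling") heuristic: consecutive events from the same machine separated by no more than a chosen window are grouped and treated as a single report. The Python sketch below illustrates that idea only; the report does not specify a coalescing rule at this point, so the 60-second window and the `Event` record are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    machine: str
    timestamp: datetime
    message: str


def coalesce(events: List[Event], window: timedelta = timedelta(seconds=60)) -> List[List[Event]]:
    """Group events per machine: an event joins its machine's current group
    if it follows that group's last event by at most `window` (assumed value)."""
    groups: List[List[Event]] = []
    current: Dict[str, int] = {}  # machine -> index of its open group in `groups`
    for ev in sorted(events, key=lambda e: (e.timestamp, e.machine)):
        idx = current.get(ev.machine)
        if idx is not None and ev.timestamp - groups[idx][-1].timestamp <= window:
            groups[idx].append(ev)
        else:
            groups.append([ev])
            current[ev.machine] = len(groups) - 1
    return groups


if __name__ == "__main__":
    t0 = datetime(2000, 1, 31, 8, 0, 0)
    msg = "unix: server butch not responding still trying"
    log = [Event("napoli", t0, msg),
           Event("napoli", t0 + timedelta(seconds=30), msg),
           Event("napoli", t0 + timedelta(minutes=10), msg)]
    print([len(g) for g in coalesce(log)])  # [2, 1]
```

The window size is the critical design choice: too short and a single problem is counted several times, too long and independent problems are merged.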
Data processing consists of performing statistical analyses on the validated data to identify and analyse trends and to evaluate quantitative measures that characterise dependability. Various statistics can be derived from the data to study the distribution of errors and failures among system components and their severity, evaluate the time-to-failure or time-to-recovery distributions, analyse the impact of the workload on the system behaviour, etc.
Measurement-based dependability analysis of computer systems, using event logs or data collected from the field, has given rise to a wide variety of research. A detailed survey of the state of the art was presented in deliverable BC2 [Arlat et al. 2000]. Today's computing environments are mainly based on interconnected Unix, Windows NT and Windows 2K systems. However, to the best of our knowledge, only a few studies have addressed the dependability analysis of Unix or Windows NT systems based on event logs [Thakur & Iyer 1996, Kalyanakrishnam et al. 1999b, Xu et al. 1999], and none of them covered Windows 2K systems. The work reported in [Thakur & Iyer 1996] is based on event logs collected from 69 SunOS workstations monitored over a period of 32 weeks. In [Kalyanakrishnam et al. 1999b], several analyses are carried out using event logs collected over a six-month period from 70 Windows NT mail servers. Similar analyses are presented in [Xu et al. 1999] based on event logs collected over a four-month period from 503 Windows NT servers running in a production environment. The systems analysed in these three studies come from distinct environments, and the data collection periods are rather short (less than 8 months). Clearly, additional measurement-based analyses are needed to understand the dependability characteristics of networked distributed systems and to give better insights into the problems that one might face when processing and analysing event logs.
In this part of the report, we summarize the results obtained from the analysis of event logs collected from 373 Unix SunOS/Solaris machines, 76 Windows NT and 89 Windows 2K systems, interconnected through the LAAS computing network. The data collection period was about 33 months for Unix systems (from November 1999 until July 2002), 44 months for Windows NT (from January 1999 until July 2002) and 23 months for Windows 2K (from September 2000 until July 2002). The identification of useful trends from large event logs is a time-consuming task that requires thorough manual analyses. In our study, we have focused on the identification of machine reboots and the evaluation of statistical measures characterizing: a) the distribution of reboots (per machine, over time), b) the distribution of uptimes and downtimes associated with these reboots, and c) the availability of machines, including workstations and servers. These analyses have been carried out for both Unix and Windows systems. We also present some results concerning the classification of Windows NT and 2K reboot causes and the analysis of error dependencies among Unix machines. Preliminary analyses of subsets of the data presented in this report can be found in [Simache & Kaâniche 2001b, Simache et al. 2002].
This part of the report is organized as follows. Section 3.1 describes the target system architecture.
Section 3.2 outlines the data collection strategy and the main analyses carried out on the collected data. The results are presented in Section 3.3 for Unix systems and in Section 3.4 for Windows NT and 2K systems. The main conclusions are summarized in Section 3.5.
3.1 Target system architecture
The LAAS computing network is composed of a large set of heterogeneous workstations and servers interconnected through an Ethernet-based local area network. These systems are organized into subnets according to their physical location and the research group they belong to. The subnets are connected through dedicated communication switches to a central switch, which provides connectivity to the servers shared by the whole network (SMTP, NIS+, Backup, HTTP, FTP, etc.) as well as to the Internet. Some of these services are replicated on several machines (e.g. the NIS+ server), and some machines host more than one service. In addition, some research groups have a set of servers dedicated to their users (NFS, POP, Application, Printing, etc.), while other servers are shared by several research groups. Most of the network and group servers are implemented on SunOS and Solaris machines (23), and a few shared servers run on Windows NT and 2K machines (10). The clients are a heterogeneous mix of Unix workstations, PCs and Macintoshes running many types of operating systems (SunOS, Solaris, Linux, Windows and MacOS) in a large variety of versions. It is noteworthy that some machines host more than one operating system (e.g. Linux and MacOS).

In our study, we focused on Unix, Windows NT and Windows 2K machines.
3.2 Data collection and processing approach
Our analysis is based on the operational data logged by the Unix, Windows NT and 2K systems connected to the LAAS network. Each type of operating system has its own event logging mechanism. Event logging is a facility used by computer systems to record the occurrence of significant events: error reports, system alerts and diagnostic messages. Unix-based systems offer event logging capabilities by means of the syslogd daemon, and Windows-based systems via the Event Logging facility. In this section, we present some details concerning these facilities, the strategy used to collect the operational data and the approach used to process it.
Event logging in Unix
The Unix operating system offers event logging capabilities by means of the syslogd daemon. This background process records events generated by different local sources: the kernel, system components (disk, network interfaces, memory), daemons and application processes that are configured to communicate with syslogd. Events of various types and severity levels are generally recorded. Some of them result from the normal activity of the system, whereas others provide information about hardware, software and configuration errors, as well as system events such as reboots and shutdowns. The configuration file /etc/syslog.conf specifies the destination of each event received by syslogd, depending on its severity level and its origin. The destination can be one or several log files, the administration console or the operator. The events relevant to our study are generally stored in the log file /var/adm/messages.
Each event stored in this log file is formatted as follows:
- Date and time of the event
- Machine on which the event is logged
- Description of the message
Example: Dec 15 16:39:29 napoli unix: server butch not responding still trying
The Unix operating system can automatically control the size of the log files. This is done by executing the script /usr/lib/newsyslog on a weekly basis via the cron mechanism. This script ensures that only the current log file /var/adm/messages and those recorded during the last four weeks (named messages.0, messages.1, messages.2 and messages.3) remain in the system. Therefore, data is lost if not archived within five weeks.
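A line of /var/adm/messages can be split into the three fields listed above with a regular expression. The Python sketch below does this for the example line. The regular expressions, the `SyslogEvent` record and the further split of the description into a reporting source (e.g. `unix` or `ntpdate[228]`) and a message are assumptions for illustration, not part of syslogd itself. Note that the timestamp carries no year.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "Mon DD HH:MM:SS host description"; single-digit days are padded with a space.
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<description>.*)$"
)
# Optional "source:" or "source[pid]:" prefix of the description (assumed convention).
SOURCE_RE = re.compile(r"^(?P<source>[\w.-]+(?:\[\d+\])?):\s*(?P<message>.*)$")


@dataclass
class SyslogEvent:
    month: str
    day: int
    time: str
    host: str
    source: str
    message: str


def parse_line(line: str) -> Optional[SyslogEvent]:
    """Return the parsed event, or None for lines that do not match the format."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None  # malformed or truncated line: left to the validation step
    description = m.group("description")
    s = SOURCE_RE.match(description)
    source, message = (s.group("source"), s.group("message")) if s else ("", description)
    return SyslogEvent(m.group("month"), int(m.group("day")), m.group("time"),
                       m.group("host"), source, message)


if __name__ == "__main__":
    ev = parse_line("Dec 15 16:39:29 napoli unix: server butch not responding still trying")
    print(ev.host, ev.source, ev.message, sep=" | ")
```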

Event logging in Windows NT and 2K
For Windows NT and 2K, event logging is implemented as a system service that runs in the background and waits for processes running on the local (or a remote) system to send it reports of events [Murray 1998]. Each event report is stored in a specific event log file on disk. There are three event log files:
- The security log contains events generated by the system security and auditing processes.
- The system log contains events generated by system components, including drivers and services. It is used primarily to store diagnostic messages that system administrators use to troubleshoot abnormal conditions or to find problems unnoticed by the users: for example, a driver has failed to load, the operation of a device has failed, an I/O error has occurred, etc.
- The application event log stores all event reports not involving security auditing and system component event reporting. It is most commonly used to report internal errors that occur during the execution of an application, such as failing to allocate memory, being unable to access an object, or aborting the transfer of a file.
The only native facility giving the user access to event logs is Event Viewer. This application displays the event records sorted in chronological order. It is also used to back up or clear the event logs, or to change the parameters of the event logging policy. The data displayed by Event Viewer is formatted according to the following fields:
- Event type: denotes the severity level of the event; five event types are defined: error, warning, information, success audit and failure audit.
- Date and time: indicates the date and time when the report was logged.
- Source: the registered name of the event source that reported the event.
- Category: source-specific event classification.
- Event: source-specific event identification (also called Event ID).
- User: name of the user account that generated the event.
- Computer: name of the computer that reported the event.
In addition, Event Viewer can display a description of the event, its cause, and where it occurred. However, such a description is not always available.
Data collection strategy
We have set up a data collection strategy to automatically collect the data stored in the /var/adm/messages.0 log file of each SunOS and Solaris machine and the Application and System event logs of each Windows NT and 2K machine connected to the network. This strategy takes into account the dynamic evolution of the network configuration resulting, for instance, from system administration and maintenance activities (connection of new machines, upgrade of OS versions, modification of the configuration of shared services and resources, modification of machine names and configuration, temporary disconnection of machines from the network, etc.).

The data collection strategy is decomposed into two main steps:
1) Identification of the list of machines to be included in the data collection process.
2) Collection of data from these machines to a dedicated machine used for data processing.
The identification of the Unix and Windows machines from which data will be collected is based on the analysis of the hosts.org_dir master table maintained by the NIS+ server. All IP devices connected to the network, including Unix and Windows machines, are declared in this table. However, this table generally contains redundant information corresponding, for instance, to machines declared under different IP addresses or with different names. The script that we have developed automatically detects and eliminates such redundant information to avoid collecting multiple copies of the same log files from the corresponding machines. The script also removes from the list the machines that are not relevant to our study: for example, those used to support offline maintenance activities or in specific experimental testbeds, laptops, and Windows systems that have Linux as a second operating system.
In the second step of the data collection strategy, the log files are remotely copied from the selected machines to a dedicated machine and collated into a single, chronologically sorted file for each machine. Only the new events logged since the last collection are selected and included in the final file containing the data of the corresponding machine. The format of the collected data is also verified at this step, and an additional field specifying the year is added to the date of each message from a Unix machine (by default, syslogd does not record the year, whereas the Event Logging facility of Windows systems does). This simplifies the analysis of data collected over several years.
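The year field and the selection of new events can be sketched as follows. This is a minimal Python illustration, not the authors' Shell and Perl scripts: it assumes that the lines of a log are in logging order and span less than one year, so that a decrease of the month (e.g. Dec to Jan) marks a year change, and it treats as new every event strictly later than the last collected one (events logged in the same second as that boundary would need extra care). The function names are hypothetical.

```python
from datetime import datetime
from typing import List

MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def add_year(lines: List[str], first_year: int) -> List[str]:
    """Prefix each 'Mon DD HH:MM:SS host ...' line with its inferred year."""
    year, previous_month = first_year, None
    stamped = []
    for line in lines:
        month = MONTHS[line[:3]]
        if previous_month is not None and month < previous_month:
            year += 1  # month went backwards: the log crossed a year boundary
        previous_month = month
        stamped.append(f"{year} {line}")
    return stamped


def timestamp(stamped_line: str) -> datetime:
    """Parse the 'YYYY Mon DD HH:MM:SS' prefix of a year-stamped line."""
    year, month, day, clock = stamped_line.split(None, 4)[:4]
    hour, minute, second = (int(x) for x in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def new_events(stamped_lines: List[str], last_collected: datetime) -> List[str]:
    """Keep only events after the last collection, sorted chronologically."""
    fresh = [line for line in stamped_lines if timestamp(line) > last_collected]
    return sorted(fresh, key=timestamp)  # stable sort: ties keep their logging order


if __name__ == "__main__":
    raw = ["Dec 31 23:59:58 napoli unix: message A",
           "Jan  1 00:00:05 napoli unix: message B"]
    stamped = add_year(raw, 1999)
    print(new_events(stamped, datetime(1999, 12, 31, 23, 59, 59)))
```

The year is inferred before sorting on purpose: sorting first would require a year to compare timestamps across the December to January boundary.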
For Windows systems, the data collection is carried out manually once a month using the Event Viewer backup function. For Unix systems, it is carried out using Shell and Perl scripts. These scripts are executed via cron on a weekly basis, in accordance with the mechanism provided by the operating system to control the size of the log files. However, manual verification is sometimes needed when problems affecting some target machines occur during the execution of these scripts; e.g., the targets may not be alive, or they are alive but the scripts hang because of some local problem. Without manual verification, some data might be lost, or the same data might be copied more than once (see [Simache & Kaâniche 2001a] for more detail).
Using this strategy, we have collected on a regular basis the event logs stored on 373 Unix, 76 Windows NT and 89 Windows 2K systems connected to the LAAS network. The data collection period was (October 1999, July 2002) for Unix systems, (January 1999, July 2002) for Windows NT systems and (September 2000, July 2002) for Windows 2K systems. However, due to the frequent addition and removal of machines from the network, the data collection period was not uniform for all the monitored machines. This is illustrated in Figures 3-1 and 3-2, which plot the distribution of the data collection period (in hours) for Unix and Windows machines, respectively. These figures show a large variability among the machines with respect to the data collection period. In particular, the data collection period for a few Unix and Windows 2K machines that were recently connected to the network was short (less than 2000 hours). Clearly, these machines should be excluded from the analysis to avoid biased results due to the short time during which they have been monitored. For the rest of the analysis, we decided to ignore all machines for which the data collection period was shorter than about three months (2000 hours). Accordingly, 23 Unix and 8 Windows 2K machines satisfying this criterion were excluded from the analyses presented in the following sections.
Figure 3-1. Distribution of data collection period for Unix machines (x-axis: % machines; y-axis: data collection period in hours)
Figure 3-2. Distribution of data collection period for NT and 2K machines (series: NT, 2K; x-axis: % machines; y-axis: data collection period in hours)
Data processing
The data processing phase consists of: 1) extracting from the log files the information relevant to the dependability analysis of the target system, and 2) evaluating statistical measures to identify significant trends. The log files contain a large amount of information that is not always easy to categorize. The identification of events corresponding to errors and the definition of error classification criteria require a thorough manual analysis of event logs. The classification is system dependent because, even for the same operating system, the system hardware and software components, architecture and activity may strongly influence the classification criteria. In this report, we focus on the identification of machine reboots and the evaluation of statistical measures characterizing: a) the distribution of reboots (per machine, over time), b) the distribution of uptimes and downtimes associated with these reboots, and c) the availability of machines, including workstations and servers. These analyses have been carried out for both Unix and Windows systems. We also present some results concerning the classification of Windows NT and 2K reboot causes and the analysis of error dependencies among Unix machines. A summary of these results is presented in the following sections, considering first Unix systems (Section 3.3) and then Windows NT and 2K systems (Section 3.4).
3.3 Application to Unix Systems
In this section, we present the algorithm that we developed to identify machine reboots from the Unix event log files, and the results obtained from the analysis of the identified reboots. The event log files have been collected from 350 Unix machines during the period (October 1999, July 2002).
Identification of reboots
Three methods can be distinguished to identify when Unix machines are rebooted:
1) Use of the last reboot command
2) Analysis of /var/adm/wtmp log files
3) Analysis of /var/adm/messages log files
With the first and second methods, only the start timestamp of machine reboots can be identified. However, in our study, we are interested in identifying the start and end timestamps of machine reboots as well as the service interruption duration associated with these reboots.
Moreover, the causes of reboots can be investigated by analysing the messages logged by the system before the machine is rebooted. Therefore, we have developed an algorithm to identify machine reboots based on the analysis of the /var/adm/messages log files collected from the target systems. A manual analysis of the collected data revealed that not all reboots could be easily identified from the corresponding log files. Indeed, whereas some reboots are explicitly identified by a reboot or a shutdown event, many others can be detected only by identifying the sequence of initialisation events generated by the system when it is restarted. Generally, an initialisation sequence is composed of about 70 messages, starting with unix: SunOS Release or unix: Copyright messages (note that these messages may appear several times in the sequence), and ending with clock synchronization messages generated by the ntpdate and xntpd or ntpd daemons. An example of such a sequence is presented in Figure 3-3.
2000 Jan 31 08:16:03 ripolin unix: Copyright , Sun Microsystems, Inc
2000 Jan 31 08:16:03 ripolin unix: SunOS Release Version Generic_ [UNIX System V Release 4.0]
2000 Jan 31 08:16:03 ripolin unix: root nexus = SUNW,SPARCstation
2000 Jan 31 08:16:03 ripolin unix: Ethernet address = 8:0:20:82:23:f
2000 Jan 31 08:16:03 ripolin unix: avail mem =
2000 Jan 31 08:16:04 ripolin unix: SunOS Release Version Generic_ [UNIX System V Release 4.0]
2000 Jan 31 08:16:04 ripolin unix: Copyright , Sun Microsystems, Inc
2000 Jan 31 08:16:13 ripolin unix: vol0 is /pseudo/vol@
2000 Jan 31 08:16:13 ripolin unix: pseudo-device: vol
2000 Jan 31 08:16:18 ripolin ntpdate[228]: step time server offset sec
2000 Jan 31 08:16:23 ripolin xntpd[231]: xntpd Tue Jul 6 18:01:08 MET DST 1999 (1)
2000 Jan 31 08:16:24 ripolin xntpd[231]: sched_setscheduler(): Operation not applicable
2000 Jan 31 08:16:25 ripolin xntpd[231]: tickadj = 5, tick = 10000, tvu_maxslew = 495, est. hz = 100
Figure 3-3. Initialisation sequence
However, we have identified several scenarios that do not fit the initialisation sequence presented in Figure 3-3. Such scenarios occur, for example: a) when multiple reboots are needed before the machine can restore its normal functioning state, or b) when the time synchronization messages do not appear in the corresponding sequence, or their timestamp precedes the timestamp of the messages identifying the start of the sequence. Typically, the latter case corresponds to synchronization events with a negative offset value. An example of such a scenario is presented in Figure 3-4.
It can be seen that the timestamp of the ntpdate message (Jan 16 18:22:56) precedes the timestamp of the unix: SunOS Release message because of the negative offset value (-15.89).
2000 Jan 16 18:23:02 demeter unix: SunOS Release 5.7 Version Generic 64-bit [UNIX System V Release 4.0]
2000 Jan 16 18:23:02 demeter unix: Copyright , Sun Microsystems, Inc
2000 Jan 16 18:23:02 demeter unix: mem = K (0x )
2000 Jan 16 18:23:05 demeter unix: vol0 is /pseudo/vol@
2000 Jan 16 18:22:56 demeter ntpdate[269]: step time server offset sec
2000 Jan 16 18:23:01 demeter xntpd[273]: xntpd Tue Jul 6 18:01:08 MET DST 1999 (1)
2000 Jan 16 18:23:02 demeter xntpd[273]: kvm_open failed
Figure 3-4. ntpdate message with negative offset at the end of a reboot (original sequence, i.e., before sorting the data chronologically)
To identify reboots from the log files, we have developed an algorithm, implemented in Perl, based on the sequential parsing and matching of each message in the collected log files against specific patterns or sequences of patterns characterizing the occurrence of reboots. These patterns correspond to explicit reboot messages or to sequences of events generated during the initialisation of the system, as explained above. The algorithm is detailed in [Simache & Kaâniche 2001a]. For each machine and each reboot detected in its log file, the algorithm gives the timestamps of the start and of the end of the reboot, and the last event logged before the reboot with its timestamp.
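The pattern-matching idea can be illustrated with a much simplified Python sketch. It is not the Perl algorithm of [Simache & Kaâniche 2001a]: it only recognises the basic initialisation sequence of Figure 3-3, assuming that a reboot starts at the first unix: SunOS Release or unix: Copyright message and ends at the first message of the xntpd or ntpd daemon. Explicit reboot and shutdown messages, multiple successive reboots and the negative-offset case of Figure 3-4 are deliberately left out. Events are assumed to be (timestamp, description) pairs of one machine, sorted chronologically.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

LogEvent = Tuple[datetime, str]  # (timestamp, "source: message") of one machine

START_PREFIXES = ("unix: SunOS Release", "unix: Copyright")
END_PREFIXES = ("xntpd[", "ntpd[")  # assumed end of the initialisation sequence


@dataclass
class Reboot:
    start: datetime
    end: Optional[datetime]                # None if no end marker was found
    last_event_before: Optional[LogEvent]  # candidate clue to the reboot cause

    def duration_seconds(self) -> Optional[float]:
        return None if self.end is None else (self.end - self.start).total_seconds()


def find_reboots(events: List[LogEvent]) -> List[Reboot]:
    """Detect reboots in the chronologically sorted events of one machine."""
    reboots: List[Reboot] = []
    open_reboot: Optional[Reboot] = None
    previous: Optional[LogEvent] = None
    for ts, text in events:
        if text.startswith(START_PREFIXES):
            # Start markers may repeat inside one sequence: open a reboot only once.
            if open_reboot is None:
                open_reboot = Reboot(start=ts, end=None, last_event_before=previous)
                reboots.append(open_reboot)
        elif open_reboot is not None and text.startswith(END_PREFIXES):
            open_reboot.end = ts
            open_reboot = None
        previous = (ts, text)
    return reboots


if __name__ == "__main__":
    def at(clock: str) -> datetime:
        h, m, s = (int(x) for x in clock.split(":"))
        return datetime(2000, 1, 31, h, m, s)

    log = [(at("07:42:10"), "unix: server butch not responding still trying"),  # hypothetical
           (at("08:16:03"), "unix: Copyright , Sun Microsystems, Inc"),
           (at("08:16:04"), "unix: SunOS Release Version Generic_ [UNIX System V Release 4.0]"),
           (at("08:16:18"), "ntpdate[228]: step time server offset sec"),
           (at("08:16:23"), "xntpd[231]: xntpd Tue Jul 6 18:01:08 MET DST 1999 (1)")]
    for r in find_reboots(log):
        print(r.start, r.end, r.duration_seconds(), r.last_event_before[1])
```

Keeping the last event before each reboot is what makes the later analysis of reboot causes possible; the first event of the example is hypothetical and does not appear in Figure 3-3.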

The reboot identification algorithm allowed us to detect 8842 reboots from the log files collected from 350 Unix machines during 33 months (November 1999 until July 2002). During the observation period, several versions of SunOS and Solaris were running on these machines, including versions 1.2, 4.1.3, 4.1.4, 5.4, 5.5, 5.5.1, 5.6, 5.7 and 5.8. Among the monitored machines, 23 (referred to as "main servers") hosted critical services shared by the whole network or by a large subset of users. In the following, we present various analyses of the 8842 reboots corresponding to the 350 Unix machines.
Distribution of reboots per machine
The number of reboots observed during the data collection period constitutes a large sample of data on which significant statistical analyses can be performed. However, these reboots are not uniformly distributed among the machines, as shown by the number-of-reboots-per-machine statistics presented in Table 3-1. In particular, 85.7% of the Unix machines had more than 10 reboots. Further investigation showed that 50% of the reboots were caused by 25% of the machines. Such variability is explained by differences in the length of the data collection period (see Figure 3-1), the configuration of these machines, the types of software running on them and the user workload.
Table 3-1. Distribution of the number of reboots per machine
0 ≤ #reb < … | … ≤ #reb < … | … ≤ #reb < … | … ≤ #reb < … | #reb ≥ …
14.29% | 28.57% | 31.43% | 12.86% | 12.86%
The impact of the user's workload can be highlighted by considering the distribution of reboots according to the hour of the day when they occurred. As illustrated in Figure 3-5, the majority of reboots occurred during normal working hours (8 AM to 6 PM). The peak between 9 and 10 AM includes all the reboots generally done in the morning by the system administrator to solve problems that occurred during the night.
Figure 3-5. Number of reboots per hour of the day (Unix; x-axis: hour of day; y-axis: number of reboots)
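Statistics such as the share of reboots caused by the most frequently rebooted machines, or the distribution of reboots over the hours of the day, are straightforward to compute once reboots have been identified. The Python sketch below assumes a per-machine reboot count and a list of reboot start timestamps as inputs; the function names are hypothetical, and the values in the usage lines are synthetic, not data from this study.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def reboots_per_hour(reboot_starts: List[datetime]) -> List[int]:
    """Number of reboots starting in each hour of the day, index 0..23."""
    counts = Counter(ts.hour for ts in reboot_starts)
    return [counts[hour] for hour in range(24)]


def working_hours_share(reboot_starts: List[datetime], first: int = 8, last: int = 18) -> float:
    """Fraction of reboots starting between `first`:00 and `last`:00 (8 AM to 6 PM)."""
    if not reboot_starts:
        return 0.0
    inside = sum(1 for ts in reboot_starts if first <= ts.hour < last)
    return inside / len(reboot_starts)


def top_machines_share(reboots_by_machine: Dict[str, int], machine_fraction: float) -> float:
    """Fraction of all reboots due to the given fraction of most-rebooted machines."""
    counts = sorted(reboots_by_machine.values(), reverse=True)
    total = sum(counts)
    k = round(machine_fraction * len(counts))
    return sum(counts[:k]) / total if total else 0.0


if __name__ == "__main__":
    # Synthetic inputs, for illustration only.
    starts = [datetime(2001, 3, 5, h, 0) for h in (9, 9, 10, 14, 22)]
    print(reboots_per_hour(starts)[9], working_hours_share(starts))
    print(top_machines_share({"m1": 6, "m2": 2, "m3": 1, "m4": 1}, 0.25))
```

Applied to the identified reboots, `top_machines_share(counts, 0.25)` corresponds to the "50% of the reboots were caused by 25% of the machines" statement above.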


More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

Modeling and Simulation of Quality of Service for Composite Web Services

Modeling and Simulation of Quality of Service for Composite Web Services Modeling and Simulation of Quality of Service for Composite Web Services Gregory A. Silver Angela Maduko Rabia Jafri John A. Miller Amit P. Sheth Department of Computer Science University of Georgia Athens,

More information

Executing Evaluations over Semantic Technologies using the SEALS Platform

Executing Evaluations over Semantic Technologies using the SEALS Platform Executing Evaluations over Semantic Technologies using the SEALS Platform Miguel Esteban-Gutiérrez, Raúl García-Castro, Asunción Gómez-Pérez Ontology Engineering Group, Departamento de Inteligencia Artificial.

More information

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Ewa Kusmierek Yingfei Dong David Du Poznan Supercomputing and Dept. of Electrical Engineering Dept. of Computer Science Networking Center

More information

n = 2 n = 2 n = 1 n = 1 λ 12 µ λ λ /2 λ /2 λ22 λ 22 λ 22 λ n = 0 n = 0 λ 11 λ /2 0,2,0,0 1,1,1, ,0,2,0 1,0,1,0 0,2,0,0 12 1,1,0,0

n = 2 n = 2 n = 1 n = 1 λ 12 µ λ λ /2 λ /2 λ22 λ 22 λ 22 λ n = 0 n = 0 λ 11 λ /2 0,2,0,0 1,1,1, ,0,2,0 1,0,1,0 0,2,0,0 12 1,1,0,0 A Comparison of Allocation Policies in Wavelength Routing Networks Yuhong Zhu a, George N. Rouskas b, Harry G. Perros b a Lucent Technologies, Acton, MA b Department of Computer Science, North Carolina

More information

Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems

Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems Quantitative Dependability Analysis with Stochastic Activity Networks: the Möbius Tool April 2016 Andrea Domenici

More information

OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS. John R. Clymer

OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS. John R. Clymer Proceedings of the 2000 Winter Simulation Conference J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, eds. OPTIMIZING PRODUCTION WORK FLOW USING OPEMCSS John R. Clymer Applied Research Center for

More information

Module 4: Stochastic Activity Networks

Module 4: Stochastic Activity Networks Module 4: Stochastic Activity Networks Module 4, Slide 1 Stochastic Petri nets Session Outline Places, tokens, input / output arcs, transitions Readers / Writers example Stochastic activity networks Input

More information

Using Queuing theory the performance measures of cloud with infinite servers

Using Queuing theory the performance measures of cloud with infinite servers Using Queuing theory the performance measures of cloud with infinite servers A.Anupama Department of Information Technology GMR Institute of Technology Rajam, India anupama.a@gmrit.org G.Satya Keerthi

More information

Availability measurement of grid services from the perspective of a scientific computing centre

Availability measurement of grid services from the perspective of a scientific computing centre Journal of Physics: Conference Series Availability measurement of grid services from the perspective of a scientific computing centre To cite this article: H Marten and T Koenig 2011 J. Phys.: Conf. Ser.

More information

Determining the Number of CPUs for Query Processing

Determining the Number of CPUs for Query Processing Determining the Number of CPUs for Query Processing Fatemah Panahi Elizabeth Soechting CS747 Advanced Computer Systems Analysis Techniques The University of Wisconsin-Madison fatemeh@cs.wisc.edu, eas@cs.wisc.edu

More information

STRUCTURE-BASED software reliability analysis techniques

STRUCTURE-BASED software reliability analysis techniques IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 31, NO. 8, AUGUST 2005 1 A Simulation Approach to Structure-Based Software Reliability Analysis Swapna S. Gokhale, Member, IEEE, and Michael R. Lyu, Fellow,

More information

Chapter 2 System Models

Chapter 2 System Models CSF661 Distributed Systems 分散式系統 Chapter 2 System Models 吳俊興國立高雄大學資訊工程學系 Chapter 2 System Models 2.1 Introduction 2.2 Physical models 2.3 Architectural models 2.4 Fundamental models 2.5 Summary 2 A physical

More information

these developments has been in the field of formal methods. Such methods, typically given by a

these developments has been in the field of formal methods. Such methods, typically given by a PCX: A Translation Tool from PROMELA/Spin to the C-Based Stochastic Petri et Language Abstract: Stochastic Petri ets (SPs) are a graphical tool for the formal description of systems with the features of

More information

Managing test suites for services

Managing test suites for services Managing test suites for services Kathrin Kaschner Universität Rostock, Institut für Informatik, 18051 Rostock, Germany kathrin.kaschner@uni-rostock.de Abstract. When developing an existing service further,

More information

Path Analysis References: Ch.10, Data Mining Techniques By M.Berry, andg.linoff Dr Ahmed Rafea

Path Analysis References: Ch.10, Data Mining Techniques By M.Berry, andg.linoff  Dr Ahmed Rafea Path Analysis References: Ch.10, Data Mining Techniques By M.Berry, andg.linoff http://www9.org/w9cdrom/68/68.html Dr Ahmed Rafea Outline Introduction Link Analysis Path Analysis Using Markov Chains Applications

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

Queueing Networks analysis with GNU Octave. Moreno Marzolla Università di Bologna

Queueing Networks analysis with GNU Octave. Moreno Marzolla  Università di Bologna The queueing Package Queueing Networks analysis with GNU Octave Moreno Marzolla marzolla@cs.unibo.it http://www.moreno.marzolla.name/ Università di Bologna december 4, 2012 Moreno Marzolla (Università

More information

Mathematical Analysis of Google PageRank

Mathematical Analysis of Google PageRank INRIA Sophia Antipolis, France Ranking Answers to User Query Ranking Answers to User Query How a search engine should sort the retrieved answers? Possible solutions: (a) use the frequency of the searched

More information

DDSS: Dynamic Dedicated Servers Scheduling for Multi Priority Level Classes in Cloud Computing

DDSS: Dynamic Dedicated Servers Scheduling for Multi Priority Level Classes in Cloud Computing DDSS: Dynamic Dedicated Servers Scheduling for Multi Priority Level Classes in Cloud Computing Husnu Saner Narman Md. Shohrab Hossain Mohammed Atiquzzaman School of Computer Science University of Oklahoma,

More information

Mining for User Navigation Patterns Based on Page Contents

Mining for User Navigation Patterns Based on Page Contents WSS03 Applications, Products and Services of Web-based Support Systems 27 Mining for User Navigation Patterns Based on Page Contents Yue Xu School of Software Engineering and Data Communications Queensland

More information

Deriving safety requirements according to ISO for complex systems: How to avoid getting lost?

Deriving safety requirements according to ISO for complex systems: How to avoid getting lost? Deriving safety requirements according to ISO 26262 for complex systems: How to avoid getting lost? Thomas Frese, Ford-Werke GmbH, Köln; Denis Hatebur, ITESYS GmbH, Dortmund; Hans-Jörg Aryus, SystemA GmbH,

More information

Dynamic Time Delay Models for Load Balancing Part II: A Stochastic Analysis of the Effect of Delay Uncertainty. 1. Introduction

Dynamic Time Delay Models for Load Balancing Part II: A Stochastic Analysis of the Effect of Delay Uncertainty. 1. Introduction Dynamic Time Delay Models for Load Balancing Part II: A Stochastic Analysis of the Effect of Delay Uncertainty Majeed M. Hayat, Sagar Dhakal, Chaouki T. Abdallah Department of Electrical and Computer Engineering

More information

IPv6-based Beyond-3G Networking

IPv6-based Beyond-3G Networking IPv6-based Beyond-3G Networking Motorola Labs Abstract This paper highlights the technical issues in IPv6-based Beyond-3G networking as a means to enable a seamless mobile Internet beyond simply wireless

More information

Implicit vs. Explicit Data-Flow Requirements in Web Service Composition Goals

Implicit vs. Explicit Data-Flow Requirements in Web Service Composition Goals Implicit vs. Explicit Data-Flow Requirements in Web Service Composition Goals Annapaola Marconi, Marco Pistore, and Paolo Traverso ITC-irst Via Sommarive 18, Trento, Italy {marconi, pistore, traverso}@itc.it

More information

Mathematics and Computer Science

Mathematics and Computer Science Technical Report TR-2006-010 Revisiting hypergraph models for sparse matrix decomposition by Cevdet Aykanat, Bora Ucar Mathematics and Computer Science EMORY UNIVERSITY REVISITING HYPERGRAPH MODELS FOR

More information

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches Background 20: Distributed File Systems Last Modified: 12/4/2002 9:26:20 PM Distributed file system (DFS) a distributed implementation of the classical time-sharing model of a file system, where multiple

More information

UNIT 4: QUEUEING MODELS

UNIT 4: QUEUEING MODELS UNIT 4: QUEUEING MODELS 4.1 Characteristics of Queueing System The key element s of queuing system are the customer and servers. Term Customer: Can refer to people, trucks, mechanics, airplanes or anything

More information

CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017

CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017 CPSC 531: System Modeling and Simulation Carey Williamson Department of Computer Science University of Calgary Fall 2017 Recap: Simulation Model Taxonomy 2 Recap: DES Model Development How to develop a

More information

Slides 11: Verification and Validation Models

Slides 11: Verification and Validation Models Slides 11: Verification and Validation Models Purpose and Overview The goal of the validation process is: To produce a model that represents true behaviour closely enough for decision making purposes.

More information

Higher-order Testing. Stuart Anderson. Stuart Anderson Higher-order Testing c 2011

Higher-order Testing. Stuart Anderson. Stuart Anderson Higher-order Testing c 2011 Higher-order Testing Stuart Anderson Defining Higher Order Tests 1 The V-Model V-Model Stages Meyers version of the V-model has a number of stages that relate to distinct testing phases all of which are

More information

An Information Model for High-Integrity Real Time Systems

An Information Model for High-Integrity Real Time Systems An Information Model for High-Integrity Real Time Systems Alek Radjenovic, Richard Paige, Philippa Conmy, Malcolm Wallace, and John McDermid High-Integrity Systems Group, Department of Computer Science,

More information

Privacy Policy- Introduction part Personal Information

Privacy Policy- Introduction part Personal Information Privacy policy The Privacy Policy is applicable to the website www.mypitcrew.in registered as MyPitCrew. This privacy statement also does not apply to the websites of our business partners, corporate affiliates

More information

Business Process Modelling

Business Process Modelling CS565 - Business Process & Workflow Management Systems Business Process Modelling CS 565 - Lecture 2 20/2/17 1 Business Process Lifecycle Enactment: Operation Monitoring Maintenance Evaluation: Process

More information

Building and evaluating network simulation systems

Building and evaluating network simulation systems S-72.333 Postgraduate Course in Radiocommunications Fall 2000 Building and evaluating network simulation systems Shkumbin Hamiti Nokia Research Center shkumbin.hamiti@nokia.com HUT 06.02.2001 Page 1 (14)

More information

Network Survivability Performance Evaluation with Applications in WDM Networks with Wavelength Conversion

Network Survivability Performance Evaluation with Applications in WDM Networks with Wavelength Conversion Network Survivability Performance Evaluation with Applications in WDM Networks with Wavelength Conversion Manijeh Keshtgary, Fahad A. Al-Zahrani, Anura P. Jayasumana Electrical and Computer Engineering

More information

OPTIMAL LINK CAPACITY ASSIGNMENTS IN TELEPROCESSING AND CENTRALIZED COMPUTER NETWORKS *

OPTIMAL LINK CAPACITY ASSIGNMENTS IN TELEPROCESSING AND CENTRALIZED COMPUTER NETWORKS * OPTIMAL LINK CAPACITY ASSIGNMENTS IN TELEPROCESSING AND CENTRALIZED COMPUTER NETWORKS * IZHAK RUBIN UCLA Los Angeles, California Summary. We consider a centralized network model representing a teleprocessing

More information

Chapter 1: Number and Operations

Chapter 1: Number and Operations Chapter 1: Number and Operations 1.1 Order of operations When simplifying algebraic expressions we use the following order: 1. Perform operations within a parenthesis. 2. Evaluate exponents. 3. Multiply

More information

AOSA - Betriebssystemkomponenten und der Aspektmoderatoransatz

AOSA - Betriebssystemkomponenten und der Aspektmoderatoransatz AOSA - Betriebssystemkomponenten und der Aspektmoderatoransatz Results obtained by researchers in the aspect-oriented programming are promoting the aim to export these ideas to whole software development

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

An algorithm for Performance Analysis of Single-Source Acyclic graphs

An algorithm for Performance Analysis of Single-Source Acyclic graphs An algorithm for Performance Analysis of Single-Source Acyclic graphs Gabriele Mencagli September 26, 2011 In this document we face with the problem of exploiting the performance analysis of acyclic graphs

More information

Throughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach

Throughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach Throughput Maximization for Energy Efficient Multi-Node Communications using Actor-Critic Approach Charles Pandana and K. J. Ray Liu Department of Electrical and Computer Engineering University of Maryland,

More information

A New Location Caching with Fixed Local Anchor for Reducing Overall Location Management Cost in Wireless Mobile Networks

A New Location Caching with Fixed Local Anchor for Reducing Overall Location Management Cost in Wireless Mobile Networks A New Location Caching with Fixed Local Anchor for Reducing Overall Location Management Cost in Wireless Mobile Networks Md. Mohsin Ali Department of Computer Science and Engineering (CSE) Khulna niversity

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

Design of Distributed Data Mining Applications on the KNOWLEDGE GRID

Design of Distributed Data Mining Applications on the KNOWLEDGE GRID Design of Distributed Data Mining Applications on the KNOWLEDGE GRID Mario Cannataro ICAR-CNR cannataro@acm.org Domenico Talia DEIS University of Calabria talia@deis.unical.it Paolo Trunfio DEIS University

More information

Towards a Task-Oriented, Policy-Driven Business Requirements Specification for Web Services

Towards a Task-Oriented, Policy-Driven Business Requirements Specification for Web Services Towards a Task-Oriented, Policy-Driven Business Requirements Specification for Web Services Stephen Gorton and Stephan Reiff-Marganiec Department of Computer Science, University of Leicester University

More information

CS 556 Advanced Computer Networks Spring Solutions to Midterm Test March 10, YOUR NAME: Abraham MATTA

CS 556 Advanced Computer Networks Spring Solutions to Midterm Test March 10, YOUR NAME: Abraham MATTA CS 556 Advanced Computer Networks Spring 2011 Solutions to Midterm Test March 10, 2011 YOUR NAME: Abraham MATTA This test is closed books. You are only allowed to have one sheet of notes (8.5 11 ). Please

More information

Automatic Reconstruction of the Underlying Interaction Design of Web Applications

Automatic Reconstruction of the Underlying Interaction Design of Web Applications Automatic Reconstruction of the Underlying Interaction Design of Web Applications L.Paganelli, F.Paternò C.N.R., Pisa Via G.Moruzzi 1 {laila.paganelli, fabio.paterno}@cnuce.cnr.it ABSTRACT In this paper

More information

Dependable and Secure Systems Dependability

Dependable and Secure Systems Dependability Dependable and Secure Systems Dependability Master of Science in Embedded Computing Systems Quantitative Dependability Analysis with Stochastic Activity Networks: the Möbius Tool Andrea Domenici DII, Università

More information

Statistical Testing of Software Based on a Usage Model

Statistical Testing of Software Based on a Usage Model SOFTWARE PRACTICE AND EXPERIENCE, VOL. 25(1), 97 108 (JANUARY 1995) Statistical Testing of Software Based on a Usage Model gwendolyn h. walton, j. h. poore and carmen j. trammell Department of Computer

More information

Information Retrieval. Lecture 11 - Link analysis

Information Retrieval. Lecture 11 - Link analysis Information Retrieval Lecture 11 - Link analysis Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 35 Introduction Link analysis: using hyperlinks

More information

CHAPTER 3 FUZZY RELATION and COMPOSITION

CHAPTER 3 FUZZY RELATION and COMPOSITION CHAPTER 3 FUZZY RELATION and COMPOSITION The concept of fuzzy set as a generalization of crisp set has been introduced in the previous chapter. Relations between elements of crisp sets can be extended

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Analytic Performance Models for Bounded Queueing Systems

Analytic Performance Models for Bounded Queueing Systems Analytic Performance Models for Bounded Queueing Systems Praveen Krishnamurthy Roger D. Chamberlain Praveen Krishnamurthy and Roger D. Chamberlain, Analytic Performance Models for Bounded Queueing Systems,

More information

Multi-threaded, discrete event simulation of distributed computing systems

Multi-threaded, discrete event simulation of distributed computing systems Multi-threaded, discrete event simulation of distributed computing systems Iosif C. Legrand California Institute of Technology, Pasadena, CA, U.S.A Abstract The LHC experiments have envisaged computing

More information

Model suitable for virtual circuit networks

Model suitable for virtual circuit networks . The leinrock Independence Approximation We now formulate a framework for approximation of average delay per packet in telecommunications networks. Consider a network of communication links as shown in

More information

Reliability and Dependability in Computer Networks. CS 552 Computer Networks Side Credits: A. Tjang, W. Sanders

Reliability and Dependability in Computer Networks. CS 552 Computer Networks Side Credits: A. Tjang, W. Sanders Reliability and Dependability in Computer Networks CS 552 Computer Networks Side Credits: A. Tjang, W. Sanders Outline Overall dependability definitions and concepts Measuring Site dependability Stochastic

More information

Lecture: Simulation. of Manufacturing Systems. Sivakumar AI. Simulation. SMA6304 M2 ---Factory Planning and scheduling. Simulation - A Predictive Tool

Lecture: Simulation. of Manufacturing Systems. Sivakumar AI. Simulation. SMA6304 M2 ---Factory Planning and scheduling. Simulation - A Predictive Tool SMA6304 M2 ---Factory Planning and scheduling Lecture Discrete Event of Manufacturing Systems Simulation Sivakumar AI Lecture: 12 copyright 2002 Sivakumar 1 Simulation Simulation - A Predictive Tool Next

More information

Usability Evaluation of Tools for Nomadic Application Development

Usability Evaluation of Tools for Nomadic Application Development Usability Evaluation of Tools for Nomadic Application Development Cristina Chesta (1), Carmen Santoro (2), Fabio Paternò (2) (1) Motorola Electronics S.p.a. GSG Italy Via Cardinal Massaia 83, 10147 Torino

More information

Dependability Modelling using AADL and the AADL Error Model Annex

Dependability Modelling using AADL and the AADL Error Model Annex Dependability Modelling using AADL and the AADL Error Model Annex Ana Rugina {aerugina@laas.fr} October 2005 Copyright 2004-2007 ASSERT Project 1 Context Dependability evaluation for embedded real-time

More information

Quantitative Models for Performance Enhancement of Information Retrieval from Relational Databases

Quantitative Models for Performance Enhancement of Information Retrieval from Relational Databases Quantitative Models for Performance Enhancement of Information Retrieval from Relational Databases Jenna Estep Corvis Corporation, Columbia, MD 21046 Natarajan Gautam Harold and Inge Marcus Department

More information

The ITIL v.3. Foundation Examination

The ITIL v.3. Foundation Examination The ITIL v.3. Foundation Examination ITIL v. 3 Foundation Examination: Sample Paper 4, version 3.0 Multiple Choice Instructions 1. All 40 questions should be attempted. 2. There are no trick questions.

More information

Petri Nets ~------~ R-ES-O---N-A-N-C-E-I--se-p-te-m--be-r Applications.

Petri Nets ~------~ R-ES-O---N-A-N-C-E-I--se-p-te-m--be-r Applications. Petri Nets 2. Applications Y Narahari Y Narahari is currently an Associate Professor of Computer Science and Automation at the Indian Institute of Science, Bangalore. His research interests are broadly

More information

2 Discrete Dynamic Systems

2 Discrete Dynamic Systems 2 Discrete Dynamic Systems This chapter introduces discrete dynamic systems by first looking at models for dynamic and static aspects of systems, before covering continuous and discrete systems. Transition

More information

Lecture 5: Performance Analysis I

Lecture 5: Performance Analysis I CS 6323 : Modeling and Inference Lecture 5: Performance Analysis I Prof. Gregory Provan Department of Computer Science University College Cork Slides: Based on M. Yin (Performability Analysis) Overview

More information

(DMCA201) ASSIGNMENT 1 M.C.A. DEGREE EXAMINATION, MAY 2018 Second Year SOFTWARE ENGINEERING. Maximum Marks 30 Answer all questions

(DMCA201) ASSIGNMENT 1 M.C.A. DEGREE EXAMINATION, MAY 2018 Second Year SOFTWARE ENGINEERING. Maximum Marks 30 Answer all questions ASSIGNMENT 1 M.C.A. DEGREE EXAMINATION, MAY 2018 SOFTWARE ENGINEERING Q1) Explain about software process frame work in detail. (DMCA201) Q2) Explain how both waterfall model and prototyping model can be

More information

Capacity Planning for Application Design

Capacity Planning for Application Design WHITE PAPER Capacity Planning for Application Design By Mifan Careem Director - Solutions Architecture, WSO2 1. Introduction The ability to determine or forecast the capacity of a system or set of components,

More information

Airside Congestion. Airside Congestion

Airside Congestion. Airside Congestion Airside Congestion Amedeo R. Odoni T. Wilson Professor Aeronautics and Astronautics Civil and Environmental Engineering Massachusetts Institute of Technology Objectives Airside Congestion _ Introduce fundamental

More information

Microscopic Traffic Simulation

Microscopic Traffic Simulation Microscopic Traffic Simulation Lecture Notes in Transportation Systems Engineering Prof. Tom V. Mathew Contents Overview 2 Traffic Simulation Models 2 2. Need for simulation.................................

More information

Lecture 5 STRUCTURED ANALYSIS. PB007 So(ware Engineering I Faculty of Informa:cs, Masaryk University Fall Bühnová, Sochor, Ráček

Lecture 5 STRUCTURED ANALYSIS. PB007 So(ware Engineering I Faculty of Informa:cs, Masaryk University Fall Bühnová, Sochor, Ráček Lecture 5 STRUCTURED ANALYSIS PB007 So(ware Engineering I Faculty of Informa:cs, Masaryk University Fall 2015 1 Outline ² Yourdon Modern Structured Analysis (YMSA) Context diagram (CD) Data flow diagram

More information

Performance Evaluation of Mobile Agent Network

Performance Evaluation of Mobile Agent Network Performance Evaluation of Mobile Agent Network Ignac LOVREK and Vjekoslav SINKOVIC University of Zagreb Faculty of Electrical Engineering and Computing, Department of Telecommunications HR-10000 Zagreb,

More information

The Grid Monitor. Usage and installation manual. Oxana Smirnova

The Grid Monitor. Usage and installation manual. Oxana Smirnova NORDUGRID NORDUGRID-MANUAL-5 2/5/2017 The Grid Monitor Usage and installation manual Oxana Smirnova Abstract The LDAP-based ARC Grid Monitor is a Web client tool for the ARC Information System, allowing

More information

A GPFS Primer October 2005

A GPFS Primer October 2005 A Primer October 2005 Overview This paper describes (General Parallel File System) Version 2, Release 3 for AIX 5L and Linux. It provides an overview of key concepts which should be understood by those

More information

EXERCISES SHORTEST PATHS: APPLICATIONS, OPTIMIZATION, VARIATIONS, AND SOLVING THE CONSTRAINED SHORTEST PATH PROBLEM. 1 Applications and Modelling

EXERCISES SHORTEST PATHS: APPLICATIONS, OPTIMIZATION, VARIATIONS, AND SOLVING THE CONSTRAINED SHORTEST PATH PROBLEM. 1 Applications and Modelling SHORTEST PATHS: APPLICATIONS, OPTIMIZATION, VARIATIONS, AND SOLVING THE CONSTRAINED SHORTEST PATH PROBLEM EXERCISES Prepared by Natashia Boland 1 and Irina Dumitrescu 2 1 Applications and Modelling 1.1

More information

Lecture Little s Law Flux Queueing Theory Simulation Definition. CMPSCI 691ST Systems Fall 2011

Lecture Little s Law Flux Queueing Theory Simulation Definition. CMPSCI 691ST Systems Fall 2011 CMPSCI 691ST Systems Fall 2011 Lecture 12 Lecturer: Emery Berger Scribe: Nicolas Scarrci 12.1 Little s Law In queueing theory Little s law relates the number of items processed by a queue to the average

More information

Analysis and Design Language (AADL) for Quantitative System Reliability and Availability Modeling

Analysis and Design Language (AADL) for Quantitative System Reliability and Availability Modeling Application of the Architectural Analysis and Design Language (AADL) for Quantitative System Reliability and Availability Modeling Chris Vogl, Myron Hecht, and Alex Lam Presented to System and Software

More information

Safety and Reliability of Embedded Systems. (Sicherheit und Zuverlässigkeit eingebetteter Systeme) Safety and Reliability Analysis Models: Overview

Safety and Reliability of Embedded Systems. (Sicherheit und Zuverlässigkeit eingebetteter Systeme) Safety and Reliability Analysis Models: Overview (Sicherheit und Zuverlässigkeit eingebetteter Systeme) Safety and Reliability Analysis Models: Overview Content Classification Hazard and Operability Study (HAZOP) Preliminary Hazard Analysis (PHA) Event

More information

DATA MODELS FOR SEMISTRUCTURED DATA

DATA MODELS FOR SEMISTRUCTURED DATA Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and

More information

Adaptive packet scheduling for requests delay guaranties in packetswitched computer communication network

Adaptive packet scheduling for requests delay guaranties in packetswitched computer communication network Paweł Świątek Institute of Computer Science Wrocław University of Technology Wybrzeże Wyspiańskiego 27 50-370 Wrocław, Poland Email: pawel.swiatek@pwr.wroc.pl Adam Grzech Institute of Computer Science

More information

* Department of Computer Science, University of Pisa, Pisa, Italy Department of Elect. Engineering, University of Roma Tor Vergata, Rome, Italy

* Department of Computer Science, University of Pisa, Pisa, Italy Department of Elect. Engineering, University of Roma Tor Vergata, Rome, Italy A SURVEY OF PRODUCT-FORM QUEUEING NETWORKS WITH BLOCKING AND THEIR EQUIVALENCES Simonetta BALSAMO * and Vittoria DE NITTO PERSONE' * Department of Computer Science, University of Pisa, Pisa, Italy Department

More information

Introduction to Queuing Systems

Introduction to Queuing Systems Introduction to Queuing Systems Queuing Theory View network as collections of queues FIFO data-structures Queuing theory provides probabilistic analysis of these queues Examples: Average length Probability

More information

F O U N D A T I O N. OPC Unified Architecture. Specification. Part 1: Concepts. Version 1.00

F O U N D A T I O N. OPC Unified Architecture. Specification. Part 1: Concepts. Version 1.00 F O U N D A T I O N Unified Architecture Specification Part 1: Concepts Version 1.00 July 28, 2006 Unified Architecture, Part 1 iii Release 1.00 CONTENTS Page FOREWORD... vi AGREEMENT OF USE... vi 1 Scope...

More information