Predicting the performance of general task graphs with underlying queueing model

In: Proc. 1st Annual Conf. of the Advanced School for Computing and Imaging, May 1995, pp. 293-302

Henk Jonkers, Gerard L. Reijns
Delft University of Technology, Faculty of Electrical Engineering
P.O. Box 5031, 2600 GA Delft, The Netherlands

Abstract

The Glamis methodology provides a framework for performance modelling of parallel applications, based on a combination of task graph models and queueing network models. This paper presents a new, moderately cheap algorithm that enables the analysis of arbitrary task graph models of parallel programs with general precedence relationships, given a queueing model of the underlying machine. The algorithm generalises an earlier algorithm. An extension to this algorithm is proposed to allow for the analysis of a certain type of mutual exclusion at the task level. The features of the methodology are illustrated by means of a case study, modelling a task graph executor running on a distributed-memory parallel machine. Measurements carried out on the actual machine, as well as simulations, are used to validate the predictions.

1 Introduction

For the effective design of parallel applications, fast but accurate performance predictions are required. Traditional performance modelling techniques, such as queueing networks, are not directly suited to modelling parallel applications, because they cannot express certain types of synchronisation. Other techniques, such as Petri nets and simulation, are often too computation-intensive. Our objective is to develop a methodology for performance modelling and prediction of general parallel systems. Our approach combines and extends existing techniques, selecting their best features so that reasonably accurate predictions are obtained while the analytical cost is kept within acceptable limits.
Our predictions will typically be used to provide feedback to the user, in order to support decisions on optimal system parameters and program design. Although fairly efficient, the analysis will generally be too slow to be used within compilers to support compile-time optimisations. For that purpose, very efficient static methods, preferably yielding symbolic predictions, are required to give a first-order estimate of the performance of the different alternatives. An example of such a method, which includes the effect of contention, is serialisation analysis [5].

The Glamis methodology for performance modelling of parallel applications comprises a modelling formalism and analysis algorithms to predict the completion time of parallel programs, using queueing networks to model the influence of the underlying parallel machine. While previous papers introduced the methodology and described algorithms to analyse a subclass of parallel programs [9, 10], this paper describes a new algorithm for the analysis of programs with arbitrary synchronisation patterns, thus generalising the previous algorithms. It will be shown that a similar approach can be followed to include the impact of coarse-grain, program-level mutual exclusion in the predictions. A case study has been carried out, concerning a task graph executor running on a distributed-memory parallel machine, serving both to illustrate the concepts introduced in this paper and to validate the predictions. A model was made of this application, including the influence of communications, different mappings, and two different scheduling strategies. The predictions were compared to simulations and to execution times measured on the actual parallel machine.

The remainder of this paper is organised as follows. The next section gives an overview of the main features of the Glamis methodology and its relation to other approaches. Section 3 describes the representation of queueing models of a machine, while section 4 describes the program modelling formalism. Section 5 introduces the task graph analysis algorithm. Section 6 shows intuitively how this algorithm can be modified to include the analysis of coarse-grain mutual exclusion. Section 7 presents the case study and its results. Finally, section 8 draws some conclusions.

2 Overview of the methodology

Several authors have adopted an approach combining queueing networks and task graphs to model parallel applications [11, 14]. This combination has some attractive features. Both formalisms occupy a favourable position on the trade-off between modelling power and analytical efficiency. Separately, however, they lack the expressiveness to model parallel systems reliably: queueing networks cannot express condition synchronisation, while task graphs cannot express mutual exclusion. These shortcomings are complementary; when the two formalisms are used together, both synchronisation types are covered.

Glamis differs from related approaches on a few important points. A major difference is the distribution of the task completion times. Most related approaches assume exponentially distributed completion times [7, 11, 14].
We assume a negligible variance in completion times. Our experiments, as well as a recent publication by Adve and Vernon [1], support this assumption, provided that certain requirements are met. The complexity of Mak's method [11] is polynomial, but its applicability is restricted to series-parallel (SP) graphs. The worst-case complexity of the analysis of Thomasian's models [14] is exponential.

Machine models in Glamis are closed queueing models. We restrict ourselves to the class of separable networks, in order to keep the analytical cost low. Separable networks can be analysed in polynomial time using algorithms such as mean value analysis (MVA), whereas the worst-case complexity of the analysis of general queueing networks is exponential. Approximate MVA, such as the Schweitzer algorithm [13], can be applied to reduce the complexity further, at the cost of some accuracy.

Program models consist of tasks, condition synchronisations between the tasks, and the workload imposed by each task on the system resources. The workload is specified in terms of instruction counts rather than direct visit counts to the resources, in order to keep the program models machine-independent. This separation is important for obtaining reusable models, and it also leads to more comprehensible models. However, a complete separation is not possible: the machine and program models always have to share the same (logical) instruction set. A mapping is defined to derive visit counts from the instruction counts. The instruction set together with this mapping is the model counterpart of the programming interface.

Parallel systems and programs often display a high degree of symmetry. Examples are identical processors, memory banks or interconnection switches at the machine level, or identical tasks at the program level. Glamis exploits these symmetries through replicated model elements, aiming to reduce the analytical cost and to obtain scalable models. In queueing models, the analytical cost of certain symmetric subsystems is reduced significantly using aggregation.

3 Machine models

A model of a (virtual) parallel machine consists of a queueing model of the architecture, the instruction set of the machine, and a mapping of the instruction set to a workload on the queueing model elements in terms of visit counts. A queueing model is defined as a tuple of building blocks, each block representing one or more identical system resources. Because MVA is applied at the low level to solve the queueing model, only the total workload on each service centre needs to be known; the exact way in which the building blocks are connected to each other is irrelevant. As analysability is a prerequisite of Glamis models, the building blocks are chosen in such a way that queueing models constructed with these blocks are (approximately) separable.

A queueing model block consists of a letter denoting the type of service centre, a superscript denoting the number of service centres in the block (default 1), a subscript denoting the number of servers for each service centre (default 1), and an argument denoting the mean service time (or, for a block of several service centres, the total service demand on the centres for a visit to the block). For example, $Q^m_k(S)$ denotes a block of m identical queueing centres, each with k servers, with argument S. Three types of service centres are distinguished. A delay centre, or infinite server, denoted by I, can have any service time distribution. A queueing centre with first-come, first-served (FCFS) scheduling and a deterministic service time, denoted by D, can only be analysed approximately. All types of queueing centres that are allowed in separable queueing networks [2] are denoted by Q. For FCFS centres this requires an exponentially distributed service time with a common mean for all classes.
For processor sharing (PS) or last-come, first-served with preemptive resume (LCFS-PR) centres, any service time distribution is permitted.

The queueing model describes the physical resources of a (parallel) system. To complete the model of a virtual machine, the programming interface also needs to be described. This interface is modelled as an instruction set and a mapping from the instructions to visit counts, representing the workload on the queueing model elements. Formally, a machine model is described by the tuple

$\langle Q, S, Y, K, M, I, F \rangle$

where Q is the set of queueing model elements. The mean service time of every service centre is specified by the function $S: Q \to \mathbb{R}^+$. The function $Y: Q \to \{\mathrm{I}, \mathrm{D}, \mathrm{Q}\}$ defines the queue type. For a multiple-server centre, the function $K: Q \to \mathbb{N}$ specifies the multiplicity of the centre (for a single server the multiplicity is one). For a block of identical centres, the function $M: Q \to \mathbb{N}$ specifies the number of centres. The instruction set of the machine is denoted by I. The function $F: I \times Q \to \mathbb{R}^+$ maps instructions to visit counts. It is often convenient to impose an order on the elements of these sets, resulting in a vector of queueing model elements $\vec{q} = (q_1, \ldots, q_{|Q|})$ and an instruction vector $\vec{i} = (i_1, \ldots, i_{|I|})$. Vector counterparts $\vec{s}$, $\vec{y}$, $\vec{k}$ and $\vec{m}$ of the functions S, Y, K and M can be defined, which are applied element-wise to their vector arguments. The mapping F can be represented compactly as an $|I| \times |Q|$ matrix [F].
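To make the tuple concrete, the following is a minimal Python sketch of a machine model as a data structure. It is an illustration only, not part of Glamis: the class name MachineModel, the dict-based encoding of the functions S, Y, K, M and F, and the example elements and numbers are all hypothetical. The defaults of 1 for K and M follow the block notation above; K and M are listed last only because Python requires fields with defaults to follow the others.

```python
from dataclasses import dataclass, field

QUEUE_TYPES = ("I", "D", "Q")  # delay centre, deterministic FCFS centre, separable queueing centre


@dataclass
class MachineModel:
    """Hypothetical sketch of the tuple <Q, S, Y, K, M, I, F>, functions stored as dicts."""
    Q: list                                # ordered queueing model elements (vector q)
    S: dict                                # element -> mean service time, S: Q -> R+
    Y: dict                                # element -> queue type, one of QUEUE_TYPES
    I: list                                # ordered instruction set (vector i)
    F: dict                                # (instruction, element) -> visit count; absent = 0
    K: dict = field(default_factory=dict)  # element -> servers per centre (default 1)
    M: dict = field(default_factory=dict)  # element -> centres in the block (default 1)

    def __post_init__(self):
        for q in self.Q:
            if self.Y[q] not in QUEUE_TYPES:
                raise ValueError(f"element {q!r} has unknown queue type {self.Y[q]!r}")
            if self.S[q] <= 0:
                raise ValueError(f"element {q!r} needs a positive service time")

    # Vector counterparts s, y, k, m: the functions applied element-wise to the ordered Q.
    def s(self):
        return [self.S[q] for q in self.Q]

    def y(self):
        return [self.Y[q] for q in self.Q]

    def k(self):
        return [self.K.get(q, 1) for q in self.Q]

    def m(self):
        return [self.M.get(q, 1) for q in self.Q]

    def f_matrix(self):
        """[F] as an |I| x |Q| matrix: rows are instructions, columns are elements."""
        return [[self.F.get((i, q), 0.0) for q in self.Q] for i in self.I]


# Hypothetical machine: one processor and a block of 4 identical memory modules.
machine = MachineModel(
    Q=["cpu", "mem"],
    S={"cpu": 1e-6, "mem": 4e-7},
    Y={"cpu": "Q", "mem": "D"},
    I=["alu", "load"],
    F={("alu", "cpu"): 1.0, ("load", "cpu"): 1.0, ("load", "mem"): 1.0},
    M={"mem": 4},
)
print(machine.m())         # [1, 4]
print(machine.f_matrix())  # [[1.0, 0.0], [1.0, 1.0]]
```

Storing the functions as dicts keeps the sketch close to the set-and-function notation; the ordered lists Q and I supply the vector and matrix views on demand.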

4 Program models

A parallel program is specified in terms of a general task graph. A task graph is a directed acyclic graph (DAG), in which the nodes represent the tasks (i.e. units of computation in a parallel program) and the edges represent task precedence relationships (i.e. condition synchronisations). Tasks are mutually independent except for their precedence relationships and their shared use of the same (hardware or software) resources. To keep task graphs of large programs manageable and to improve scalability, we also use replicated tasks, i.e. a parallel section consisting of k identical tasks (see figure 1). Tasks are identical if they share the same predecessors and successors, and impose the same workload on the system resources. Formally, a task graph is described by a tuple

$\langle T, N, P, I, C \rangle$

where T is the set of tasks. The function $N: T \to \mathbb{N}$ defines the multiplicity of every task in T. The successor function $P: T \to \wp(T)$ defines the precedence relationships between the tasks: $t_j \in P(t_i)$ if and only if $t_j$ is a direct successor of $t_i$. The instruction set used in the program is given by I; this is the only overlap with the machine model. The function $C: T \times I \to \mathbb{R}^+$ specifies the instruction counts, i.e. $C(t_i, i_j)$ is the average number of times every instance of task $t_i$ executes instruction $i_j$. Together with the function F from the machine model, this function determines the visit counts of the tasks to the different queueing model elements. The visit count $V(t, q)$ of a task $t \in T$ to queueing model element $q \in Q$ is given by

$V(t, q) = \sum_{x \in I} C(t, x)\, F(x, q)$

Figure 1: Task with a multiplicity of k
Figure 2: Example task graph

As in the machine model, it will often be convenient to use a task vector $\vec{t} = (t_1, \ldots, t_{|T|})$ and vector counterparts $\vec{n}$ and $\vec{p}$ of the functions N and P. The result of the latter vector function is a vector of subsets of T.
In combination with an instruction vector $\vec{i}$, the function C can be specified compactly as a $|T| \times |I|$ matrix [C]. Visit counts can then be derived by a simple matrix-matrix multiplication:

$[V] = [C]\,[F]$

5 Model analysis

With the first introduction of Glamis [9], an iterative algorithm was presented for the analysis of programs with an SPS structure, i.e. programs consisting of a sequence of parallel sections, each section possibly containing different types (or classes) of tasks. In other words, the only condition synchronisations considered were barrier synchronisations. In a follow-up paper [10], an alternative algorithm was presented, in many cases improving on the first algorithm. This section generalises the latter algorithm, resulting in an algorithm that allows the analysis of general task graph models as described by the formalism in the previous section.
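Before turning to the analysis itself, the visit-count product [V] = [C][F] from section 4 can be sketched in a few lines of Python. This is a hypothetical illustration with made-up tasks, instructions and numbers. The last step, multiplying each column by the mean service time S(q) to obtain service demands, is the usual input format for MVA and is an assumption of this sketch rather than a step spelled out in the text.

```python
def matmul(a, b):
    """Product of an n x k matrix a and a k x m matrix b, both given as lists of rows."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not agree")
    return [[sum(a_row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for a_row in a]


def visit_counts(C, F):
    """[V] = [C][F]: rows are tasks, columns are queueing model elements."""
    return matmul(C, F)


def service_demands(V, s):
    """Assumed step: scale column q of [V] by the mean service time s[q]."""
    return [[v * s_q for v, s_q in zip(row, s)] for row in V]


# Hypothetical numbers: tasks (t1, t2), instructions (compute, send), elements (cpu, link).
C = [[100.0, 2.0],   # t1 executes 100 compute and 2 send instructions
     [50.0, 0.0]]    # t2 only computes
F = [[1.0, 0.0],     # a compute instruction visits the cpu once
     [0.1, 1.0]]     # a send visits the cpu 0.1 times and the link once
s = [1e-3, 1e-4]     # mean service times of cpu and link (seconds)

V = visit_counts(C, F)
D = service_demands(V, s)
print(V)
print(D)
```

Here t1 visits the cpu 100 + 2 × 0.1 = 100.2 times and the link twice; because only [F] depends on the machine, the same [C] can be reused with a different machine model.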

Similarly to previous algorithms, the analysis of a combined machine and program model can be divided into low-level analysis, capturing the influence of machine aspects modelled with a queueing network, and high-level analysis, yielding the overall program performance. The results of the low-level analysis are used as input to the high-level algorithm. Some aspects of the low-level queueing model analysis specific to Glamis, and the high-level task graph analysis algorithm, are presented in the following subsections.

Queueing network analysis. In the high-level program analysis algorithm, described in the next subsection, the response times are calculated using an MVA function mva. Any variant of multiple-class MVA (either exact [12] or approximate [3, 13]) can be used. The choice between an exact and an approximate solution depends solely on the required accuracy and the analytical cost that is still acceptable, once again illustrating the trade-off between efficiency and accuracy.

By means of aggregation [4], the analysis of queueing networks containing replicated service centres can be made more efficient. A block of identical service centres is replaced by a single flow-equivalent service centre which, because of the symmetry, has a service rate given by a closed-form expression. For a block of centres with an exponentially distributed service time, $Q^m(S)$, this rate for population n is given by [9, 15]:

$\mu_m(n) = \frac{n}{(m + n - 1)\,S}$

For a block of FCFS centres with a deterministic service time, $D^m(S)$, an exact flow-equivalent service rate can only be derived for small values of m or n. For the general case, a good approximation is the following expression [10]:

$\mu_m(n) = \frac{1}{S} \sum_{j=0}^{n-1} \left(\frac{m-1}{m}\right)^j \ \text{if } m \ge n, \qquad \mu_m(n) = \frac{1}{S} \sum_{j=0}^{m-1} \left(\frac{n-1}{n}\right)^j \ \text{if } m \le n$

Because deterministic service times for FCFS centres violate the product-form requirements, an additional prediction error is introduced when this block is incorporated in the total queueing network.
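The two closed-form rates above can be evaluated directly. The following Python sketch simply transcribes the expressions as given here (the function names are hypothetical); in an analysis it could serve as the load-dependent rate of the flow-equivalent centre inside a load-dependent MVA, which is not shown.

```python
def exp_block_rate(m, n, S):
    """Flow-equivalent service rate mu_m(n) of an exponential block Q^m(S), as given above."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be at least 1")
    return n / ((m + n - 1) * S)


def det_block_rate(m, n, S):
    """Approximate flow-equivalent rate mu_m(n) of a deterministic FCFS block D^m(S)."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be at least 1")
    # m >= n: sum over j < n of ((m-1)/m)^j;  m <= n: sum over j < m of ((n-1)/n)^j.
    terms, base = (n, m) if m >= n else (m, n)
    ratio = (base - 1) / base
    return sum(ratio ** j for j in range(terms)) / S


# Rates of a hypothetical block of 4 centres with S = 1, for populations 1..6.
for n in range(1, 7):
    print(n, round(exp_block_rate(4, n, 1.0), 3), round(det_block_rate(4, n, 1.0), 3))
```

Both functions return 1/S for a single centre (m = 1). Summing the geometric series shows that the deterministic expression equals $b\,(1 - (1 - 1/b)^a)/S$ with $a = \min(m, n)$ and $b = \max(m, n)$, so it is symmetric in m and n.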
The error introduced by this approximation is generally limited, and in most cases smaller than the error that results from assuming probabilistic service times, especially for the highly symmetric structures that often occur in parallel systems.

High-level task graph analysis algorithm. The high-level algorithm captures the effects of condition synchronisations at the task level, using the results obtained with MVA to include machine influences. For convenience, we introduce a predecessor function E, defined as

$E: T \to \wp(T), \quad \forall u \in T\ \forall t \in T: \ u \in E(t) \Leftrightarrow t \in P(u)$

The algorithm, using the formal description of a task graph from section 4, is presented in figure 3. Set A, containing the active tasks, is initialised with all tasks without predecessors. Set B, containing the completed tasks, is initially empty. The response times of all active tasks are calculated using MVA, as described in the previous subsection. The number of job classes in the queueing network is equal to the number of active tasks, and the number of jobs in a class is equal to the multiplicity of the corresponding task. The visit counts of the different tasks to the queueing model elements are derived using the functions C and F, as indicated in section 4. The function mva returns the response time $R_a$ of each task $a \in A$ obtained with MVA, given the set of active tasks A. All tasks with the minimum response time (denoted by $R_{min}$) are removed from A and added to B. For the tasks that remain active, the instruction counts are scaled by $1 - R_{min}/R_a$, so that only the work not yet completed during the elapsed interval $R_{min}$ is carried over to the next iteration. Successors of the completed tasks are added to A, provided that all their predecessors are members of B. All steps are repeated until A is empty. The total completion time is equal to the sum of the minimum response times over all iterations.
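The loop of figure 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation, and all names are hypothetical. The function mva is instantiated with the Schweitzer approximation [13], one of the admissible MVA variants mentioned above. Per-centre service demands (visit counts times service times) stand in for C; because demands are linear in C, scaling them by $1 - R_{min}/R_a$ is equivalent to line 8. The optional argument excl sketches the mutual-exclusion extension of section 6, with an extra guard, added by this sketch, that never reactivates a completed task.

```python
import math


def schweitzer_mva(demands, populations, delay, tol=1e-10, max_iter=10_000):
    """Schweitzer approximate MVA for a closed multi-class network without think time.

    demands[c][k]  service demand of one class-c job at centre k (visits x service time)
    populations[c] number of jobs N_c >= 1 in class c (the task multiplicity)
    delay[k]       True for a delay (infinite-server) centre, False for a queueing centre
    Returns the response time R_c = sum_k R_ck of each class.
    """
    classes, centres = range(len(demands)), range(len(delay))
    q = [[populations[c] / len(delay) for _ in centres] for c in classes]
    resp = [0.0 for _ in classes]
    for _ in range(max_iter):
        total = [sum(q[c][k] for c in classes) for k in centres]
        # Schweitzer's estimate: a class-c arrival sees the queue minus its own 1/N_c share.
        r = [[demands[c][k] if delay[k]
              else demands[c][k] * (1.0 + total[k] - q[c][k] / populations[c])
              for k in centres] for c in classes]
        resp = [sum(row) for row in r]
        # Little's law with throughput X_c = N_c / R_c gives the new queue lengths.
        new_q = [[populations[c] * r[c][k] / resp[c] if resp[c] > 0 else 0.0
                  for k in centres] for c in classes]
        change = max(abs(new_q[c][k] - q[c][k]) for c in classes for k in centres)
        q = new_q
        if change < tol:
            break
    return resp


def completion_time(succ, mult, demand, delay, excl=None):
    """Sketch of figure 3 (and, when excl is given, of the section 6 variant).

    succ[t]   set of direct successors P(t); every task must be a key
    mult[t]   multiplicity N(t)
    demand[t] per-centre service demands of one instance of t
    excl[t]   optional set U(t) of tasks to be executed in mutual exclusion with t
    """
    tasks = sorted(succ)
    excl = {t: set((excl or {}).get(t, ())) for t in tasks}
    pred = {t: {u for u in tasks if t in succ[u]} for t in tasks}  # E(t)
    work = {t: list(demand[t]) for t in tasks}
    active, done, r_tot = [], set(), 0.0

    def try_activate(t):
        # Called for one task at a time, so at most one of several exclusive tasks starts.
        if t not in done and t not in active and pred[t] <= done and not (excl[t] & set(active)):
            active.append(t)

    for t in tasks:                                                   # line 1
        try_activate(t)
    while active:                                                     # line 3
        resp = schweitzer_mva([work[a] for a in active], [mult[a] for a in active], delay)
        r = dict(zip(active, resp))                                   # line 4
        r_min = min(r.values())                                       # line 5
        r_tot += r_min
        finished = [a for a in active if math.isclose(r[a], r_min)]  # line 6
        done.update(finished)                                         # line 7
        active = [a for a in active if a not in done]
        for a in active:                                              # line 8
            work[a] = [w * (1.0 - r_min / r[a]) for w in work[a]]
        for a in finished:                                            # line 9
            for b in sorted(succ[a] | excl[a]):
                try_activate(b)
    return r_tot


# Diamond t1 -> {t2, t3} -> t4 on one contention-free delay centre.
diamond = {"t1": {"t2", "t3"}, "t2": {"t4"}, "t3": {"t4"}, "t4": set()}
ones = {t: 1 for t in diamond}
print(completion_time(diamond, ones, {"t1": [1.0], "t2": [2.0], "t3": [3.0], "t4": [1.0]}, [True]))
```

On a delay centre there is no contention, so the example reduces to the critical path 1 + 3 + 1, about 5 time units; two independent unit tasks sharing one queueing centre each see a response time of 2, and both finish in the same iteration.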

 1: A := {t ∈ T | E(t) = ∅};
 2: B := ∅; R_tot := 0.0;
 3: while A ≠ ∅ do
 4:   ∀a ∈ A: R_a := mva(a, A);
 5:   R_min := min_{a ∈ A} R_a; R_tot := R_tot + R_min;
 6:   A_min := {a ∈ A | R_a = R_min};
 7:   B := B ∪ A_min; A := A \ A_min;
 8:   ∀a ∈ A ∀i ∈ I: C(a, i) := (1 − R_min / R_a) C(a, i);
 9:   ∀a ∈ A_min: A := A ∪ {b ∈ P(a) | E(b) ⊆ B};
10: endwhile
11: return R_tot;

Figure 3: Task graph analysis algorithm

6 Analysis of coarse-grain mutual exclusion

This section concerns the analysis of a certain type of coarse-grain (task-level) mutual exclusion that cannot easily be expressed in a regular queueing network, e.g. tasks being executed within a critical code section. These synchronisations resemble the regular precedence relationships specified by a task graph; however, the order in which two mutually exclusive tasks are executed is not specified. Because of this fundamental difference between condition synchronisation and mutual exclusion, treating coarse-grain mutual exclusion in the same way as condition synchronisation yields less robust answers. Still, this approach has some attractive features. Firstly, resource usage modelled by the queueing network is still possible within a task participating in such a mutual exclusion relationship. In this way, efficient analysis of a certain type of simultaneous resource possession is made possible, extending the scope of the original methodology. Other types of simultaneous resource possession, e.g. occurring in circuit-switched interconnection networks, cannot be solved with this method. These require different solutions, e.g. the method of surrogates [8], which can be combined directly with our high-level task graph analysis algorithms by replacing the mva function. A second advantage of this treatment of mutual exclusion, which inherently represents an FCFS scheduling policy, is that it allows different service times for different tasks, which is not allowed for an FCFS server in a separable queueing model.
Finally, transient effects are automatically included in the predictions.

A function $U: T \to \wp(T)$ is defined in addition to the functions P and E, yielding the set of tasks to be executed in mutual exclusion with the argument task. For simplicity, we assume that all tasks have a multiplicity of 1. Before a task t is activated, an additional check is necessary to make sure that no task in U(t) is active. When a task t finishes, the members of U(t) are candidates for activation, in addition to the members of P(t). Because several tasks to be executed in mutual exclusion might become ready for execution simultaneously, tasks must be activated strictly sequentially (indicated by the use of a for-statement rather than the ∀-symbol in the code fragments below), to make sure that only one of these tasks actually becomes active. The algorithm from figure 3 only needs to be adapted in two places. The first line becomes

 1: for t ∈ T
      if E(t) = ∅ ∧ U(t) ∩ A = ∅ then A := A ∪ {t}

Line 9 is replaced by

 9: for a ∈ A_min
      for b ∈ P(a) ∪ U(a)
        if E(b) ⊆ B ∧ U(b) ∩ A = ∅ then A := A ∪ {b}

This approach to mutual exclusion can never replace queueing models, because of the completely different nature of this type of synchronisation. The approach assumes that the arbitration of contention is completely deterministic, which is normally not the case. Consequently, the algorithm yields one sample of the execution time distribution, which can lie anywhere in the range of possible execution times. When the mean execution time is of interest, a more complicated strategy is required, enumerating all possibilities, similar to the way coarse-grain conditional statements are analysed [10]. However, when the number of decisions to be taken is large, this becomes infeasible. A final restriction is that the method is only applicable to an FCFS scheduling policy.

7 Case study: task graph executor

The case study concerns the performance prediction of a task graph executor using a farmer-worker strategy, running on a Parsytec GCel distributed-memory machine¹. A central processor (the farmer) distributes tasks ready to be executed to the workers (the number of workers W ranging from 1 to 4). Tasks are distributed asynchronously: the farmer only determines which worker will execute which tasks, while the workers are responsible for scheduling the tasks and, if necessary, queueing them. A star topology is adopted, so all communications take place between the farmer and the workers, and are nearest-neighbour. Two scheduling disciplines for the workers are distinguished. In first-come, first-served (FCFS) scheduling, at most one task is active on a processor at a time; other tasks scheduled on the same processor are queued until that task finishes. This is modelled with mutually exclusive tasks, as described in section 6. The workload within the context of the processor is then simply modelled as a delay centre.
When processor sharing (PS) scheduling is used, tasks running on the same worker are executed concurrently, in different threads. This is modelled as a workload on a single-server queueing centre. The communication model is based on the model of the Parsytec GCel described by Van Gemund [6]. The key property of this model is that every communication link is modelled as a single queueing centre, representing the exclusive use of the link and the DMA devices of sender and receiver. Sending a message also imposes a small workload on the sending and receiving processors. Additional workload that occurs when two communications in opposite directions take place simultaneously (as a result of additional acknowledgement traffic) is ignored.

A queueing model of this system (for W = 4) is shown in figure 4. Every shaded box represents a node. The farmer node consists of one processor queue and W queues for the incoming communication links (accounting for both sender and receiver DMA channels). Every worker node consists of a processor queue and a link queue for incoming communications. One visit to the communication link models the transmission of one 120-byte packet. The transmission time of a packet over a link is 108 µs.

Consider the task graph shown in figure 5. The computation time is 0.3 s for tasks 1 and 8, and 0.2 s for the other tasks. The communication between tasks is negligible.

¹ Kindly made available by the Interdisciplinary Center for Computer-based Complex systems research Amsterdam (IC3A).
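The packet parameters of the communication model translate a message size into link visits and link service demand. The Python sketch below works this out for the 64-byte and 20,000-byte messages used in the random task graph experiments. It assumes, as a hypothetical simplification, that a partial packet costs a full packet time, and it ignores the small processor workload of sending; the names are illustrative only.

```python
import math

PACKET_BYTES = 120       # one link visit transmits one 120-byte packet
PACKET_TIME_S = 108e-6   # transmission time of one packet over a link (seconds)


def link_visits(message_bytes):
    """Link visits for one message; assumes a partial packet counts as a whole packet."""
    return math.ceil(message_bytes / PACKET_BYTES)


def link_demand(message_bytes):
    """Link service demand of one message, ignoring processor and DMA set-up costs."""
    return link_visits(message_bytes) * PACKET_TIME_S


for size in (64, 20_000):  # message sizes from the random task graph experiments
    print(size, link_visits(size), f"{link_demand(size) * 1e3:.3f} ms")
```

Under these assumptions a 20,000-byte message takes 167 packets, i.e. roughly 18 ms of link time, whereas a 64-byte parameter message fits in a single packet.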

Figure 4: Queueing model
Figure 5: Example task graph

This graph is executed on a varying number of processors, using different task-to-processor mappings, and the difference between PS and FCFS scheduling is studied. The results are summarised in table 1 for various configurations.

Table 1: Results of the example task graph (columns: number of processors P, mapping A to D, and measured, simulated and predicted completion times under FCFS and PS scheduling)

In addition to measured execution times and analytical predictions, execution times obtained by simulating the model are also included, in order to distinguish between modelling errors and analytical errors. In all cases, the predictions and simulation results are almost identical to the measured times. The small differences found can be attributed entirely to measurement inaccuracies and to some overhead not included in the models.

In mapping A, tasks 1 to 4 are mapped onto processor 1, and tasks 5 to 8 onto processor 2. Measurement, as well as simulation and prediction, shows that for this mapping FCFS scheduling performs better than PS scheduling. Simulation shows that, due to non-deterministic arbitration of mutual exclusion, the actual completion time in the case of FCFS can vary from 1.2 to 1.6 s. However, due to the implementation of the executor, only 1.2 s is measured in practice, the same value that is predicted by our algorithm. Mapping B (tasks 1, 3, 5, 7 → proc. 1; tasks 2, 4, 6, 8 → proc. 2) is the optimal mapping for 2 processors. In the case of mapping C (tasks 1, 4, 5 → proc. 1; tasks 2, 6, 8 → proc. 2; tasks 3, 7 → proc. 3), a situation similar to that in mapping B occurs. Simulation shows that an execution time of either 0.7 or 0.8 s is possible for FCFS, while measurement and prediction both give only 0.8 s. PS scheduling performs slightly worse. Mapping D is optimal for 3 processors (tasks 1, 5, 7 → proc. 1; tasks 2, 4, 8 → proc. 2; tasks 3, 6 → proc. 3). For four processors and an optimal mapping, the lowest possible execution time of 0.5 s, corresponding to the critical path, is obtained.
As a second experiment, the completion times of random task graphs with different communication behaviour are studied. Only PS scheduling is considered. The results are shown in tables 2 (average communication), 3 (only communication) and 4 (low communication). All times are in seconds. Relative deviations from the numbers in the previous column (in %) are shown in parentheses. In the case of low communication, only task parameters are transferred from farmer to worker (64 data bytes). For average communication and only communication, a message of 20,000 bytes is sent to the worker for each task assigned to it, and a result message of 20,000 bytes is sent back to the farmer after task completion.

Table 2: Random task graphs with 5 and 50 tasks, average communication. Relative deviations (%): 2.2, 0.6, 1.0, 0.5, 2.5, 0.9, 1.4, 0.0, 0.2, 0.1, 1.0, 0.2, 0.9, 0.1, 1.6, 0.3
Table 3: Task graphs with 75 and 200 tasks, only communication. Relative deviations (%): 2.1, 7.3, 3.6, 4.1, 8.0, 6.2, 7.0, 1.6, 1.4, 7.4, 3.5, 5.1, 4.1, 3.0, 1.5, 3.9
Table 4: Task graphs with 35 and 60 tasks, low communication. Relative deviations (%): 0.0, 0.3, 0.0, 0.3, 0.3, 0.4, 0.8, 0.2, 0.0, 0.6, 0.6, 0.7, 0.3, 0.8, 0.2, 0.3

The errors are highest in task graphs with only communication, which is explained by the simplifications in the communication model. Very accurate results are obtained for task graphs with relatively little communication. For most task graphs, an accuracy within 5% can be expected.

8 Conclusions

This paper presents a new algorithm for the analysis of parallel programs with general static synchronisation patterns, given a queueing model that captures the influence of the underlying parallel machine, e.g. contention for hardware resources. This algorithm, which is part of the Glamis methodology for performance modelling of parallel applications, generalises an earlier algorithm [10] that is applicable to only a subclass of task graphs. The algorithm is extended to include the analysis of programs with a certain type of mutual exclusion at the task level, although the results may be less accurate due to the non-deterministic character of mutual exclusion synchronisation. A measurement-based case study is presented, serving to illustrate the main features of the methodology and to validate the algorithm (including the extension).

The Glamis methodology aims at the accuracy required for feedback to the user.
Efficient numerical methods, yielding reasonably accurate performance predictions in polynomial time (e.g. MVA), are used for the analysis of the queueing model. The number of invocations of the queueing model analysis algorithm is in the worst case equal to the number of tasks, since at least one task completes in every iteration; the overall complexity of the analysis therefore remains polynomial. Because of the probabilistic foundation of the methodology, non-deterministic features can easily be taken into account. Other key features of Glamis include modelling ease, flexibility and scalability of the models. Efficiency and scalability can be improved by exploiting the symmetrical structure of many parallel architectures and programs. The case study shows that the most important characteristics of a realistic application, including the effects of communication and resource contention, can be captured in a relatively simple model. The performance predictions obtained with this model, using either simulation or our analytical methods, match the measured values well. The effects of different mappings and scheduling policies are correctly predicted.

References

[1] V.S. Adve and M.K. Vernon, "The influence of random delays on parallel execution times," in Proc. ACM SIGMETRICS Conf. on Measurement and Modelling of Computer Systems, May 1993, pp. 61-73.
[2] F. Baskett, K.M. Chandy, R.R. Muntz and F.G. Palacios, "Open, closed, and mixed networks of queues with different classes of customers," J. of the ACM, vol. 22, Apr. 1975, pp. 248-260.
[3] K.M. Chandy and D. Neuse, "Linearizer: A heuristic algorithm for queueing network models of computing systems," Comm. of the ACM, vol. 25, Feb. 1982, pp. 126-134.
[4] P.J. Courtois, Decomposability: Queueing and Computer System Applications. Academic Press.
[5] A.J.C. van Gemund, "Compiling performance models from parallel programs," in Proc. 8th ACM Int. Conf. on Supercomputing, Manchester, July 1994, pp. 303-312.
[6] A.J.C. van Gemund and G.L. Reijns, "Predicting Parallel System Performance with Pamela," in these proceedings.
[7] P. Heidelberger and K.S. Trivedi, "Analytic queueing models for programs with internal concurrency," IEEE Tr. on Computers, vol. 32, Jan. 1983, pp. 73-82.
[8] P.A. Jacobson and E.D. Lazowska, "Analyzing queueing networks with simultaneous resource possession," Comm. of the ACM, vol. 25, Feb. 1982, pp. 142-151.
[9] H. Jonkers, "Queueing models of parallel applications: The Glamis methodology," in Computer Performance Evaluation: Modelling Techniques and Tools (LNCS 794) (G. Haring and G. Kotsis, eds.), Springer-Verlag, May 1994, pp. 123-138.
[10] H. Jonkers, A.J.C. van Gemund and G.L. Reijns, "A probabilistic approach to parallel system performance modelling," in Proc. 28th Hawaii Int. Conf. on System Sciences, Vol. II, IEEE, Jan. 1995, pp. 412-421.
[11] V.W. Mak and S.F. Lundstrom, "Predicting performance of parallel computations," IEEE Tr. on Parallel and Distributed Systems, vol. 1, July 1990, pp. 257-270.
[12] M. Reiser and S.S. Lavenberg, "Mean value analysis of closed multichain queueing networks," J. of the ACM, vol. 27, Apr. 1980, pp. 313-322.
[13] P. Schweitzer, "Approximate analysis of multiclass closed networks of queues," in Proc. of Int. Conf. on Control and Optimization, Amsterdam.
[14] A. Thomasian and P.F. Bay, "Analytic queueing network models for parallel processing task systems," IEEE Tr. on Computers, vol. 35, Dec. 1986, pp. 1045-1054.
[15] J. Zahorjan et al., "Balanced job bound analysis of queueing networks," Comm. of the ACM, vol. 25, Feb. 1982, pp. 134-141.


More information

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA

A taxonomy of race. D. P. Helmbold, C. E. McDowell. September 28, University of California, Santa Cruz. Santa Cruz, CA A taxonomy of race conditions. D. P. Helmbold, C. E. McDowell UCSC-CRL-94-34 September 28, 1994 Board of Studies in Computer and Information Sciences University of California, Santa Cruz Santa Cruz, CA

More information

Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression

Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression Creating Meaningful Training Data for Dicult Job Shop Scheduling Instances for Ordinal Regression Helga Ingimundardóttir University of Iceland March 28 th, 2012 Outline Introduction Job Shop Scheduling

More information

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu Semantic Foundations of Commutativity Analysis Martin C. Rinard y and Pedro C. Diniz z Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 fmartin,pedrog@cs.ucsb.edu

More information

Ecient Processor Allocation for 3D Tori. Wenjian Qiao and Lionel M. Ni. Department of Computer Science. Michigan State University

Ecient Processor Allocation for 3D Tori. Wenjian Qiao and Lionel M. Ni. Department of Computer Science. Michigan State University Ecient Processor llocation for D ori Wenjian Qiao and Lionel M. Ni Department of Computer Science Michigan State University East Lansing, MI 4884-107 fqiaow, nig@cps.msu.edu bstract Ecient allocation of

More information

A Freely Congurable Audio-Mixing Engine. M. Rosenthal, M. Klebl, A. Gunzinger, G. Troster

A Freely Congurable Audio-Mixing Engine. M. Rosenthal, M. Klebl, A. Gunzinger, G. Troster A Freely Congurable Audio-Mixing Engine with Automatic Loadbalancing M. Rosenthal, M. Klebl, A. Gunzinger, G. Troster Electronics Laboratory, Swiss Federal Institute of Technology CH-8092 Zurich, Switzerland

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

School of Electrical, Electronic & Computer Engineering. Soft arbiters. Andrey Mokhov, Alex Yakovlev. Technical Report Series NCL-EECE-MSD-TR

School of Electrical, Electronic & Computer Engineering. Soft arbiters. Andrey Mokhov, Alex Yakovlev. Technical Report Series NCL-EECE-MSD-TR School of Electrical, Electronic & Computer Engineering Soft arbiters Andrey Mokhov, Alex Yakovlev Technical Report Series NCL-EECE-MSD-TR-2009-149 August 2009 Contact: Andrey.Mokhov@ncl.ac.uk Alex.Yakovlev@ncl.ac.uk

More information

Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar. Computer Science Department and UMIACS

Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar. Computer Science Department and UMIACS Numerical Evaluation of Hierarchical QoS Routing Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar Computer Science Department and UMIACS University of Maryland, College Park CS-TR-395 April 3, 1998

More information

CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song

CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS Xiaodong Zhang and Yongsheng Song 1. INTRODUCTION Networks of Workstations (NOW) have become important distributed

More information

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T A Mean Value Analysis Multiprocessor Model Incorporating Superscalar Processors and Latency Tolerating Techniques 1 David H. Albonesi Israel Koren Department of Electrical and Computer Engineering University

More information

Evaluation of Parallel Programs by Measurement of Its Granularity

Evaluation of Parallel Programs by Measurement of Its Granularity Evaluation of Parallel Programs by Measurement of Its Granularity Jan Kwiatkowski Computer Science Department, Wroclaw University of Technology 50-370 Wroclaw, Wybrzeze Wyspianskiego 27, Poland kwiatkowski@ci-1.ci.pwr.wroc.pl

More information

under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli

under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli Interface Optimization for Concurrent Systems under Timing Constraints David Filo David Ku Claudionor N. Coelho, Jr. Giovanni De Micheli Abstract The scope of most high-level synthesis eorts to date has

More information

Availability of Coding Based Replication Schemes. Gagan Agrawal. University of Maryland. College Park, MD 20742

Availability of Coding Based Replication Schemes. Gagan Agrawal. University of Maryland. College Park, MD 20742 Availability of Coding Based Replication Schemes Gagan Agrawal Department of Computer Science University of Maryland College Park, MD 20742 Abstract Data is often replicated in distributed systems to improve

More information

A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia

A New Theory of Deadlock-Free Adaptive Multicast Routing in. Wormhole Networks. J. Duato. Facultad de Informatica. Universidad Politecnica de Valencia A New Theory of Deadlock-Free Adaptive Multicast Routing in Wormhole Networks J. Duato Facultad de Informatica Universidad Politecnica de Valencia P.O.B. 22012, 46071 - Valencia, SPAIN E-mail: jduato@aii.upv.es

More information

Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints

Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints Layer-Based Scheduling Algorithms for Multiprocessor-Tasks with Precedence Constraints Jörg Dümmler, Raphael Kunis, and Gudula Rünger Chemnitz University of Technology, Department of Computer Science,

More information

. The problem: ynamic ata Warehouse esign Ws are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered

. The problem: ynamic ata Warehouse esign Ws are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered ynamic ata Warehouse esign? imitri Theodoratos Timos Sellis epartment of Electrical and Computer Engineering Computer Science ivision National Technical University of Athens Zographou 57 73, Athens, Greece

More information

A Quantitative Model for Capacity Estimation of Products

A Quantitative Model for Capacity Estimation of Products A Quantitative Model for Capacity Estimation of Products RAJESHWARI G., RENUKA S.R. Software Engineering and Technology Laboratories Infosys Technologies Limited Bangalore 560 100 INDIA Abstract: - Sizing

More information

Quaestiones lnformaticae

Quaestiones lnformaticae Quaestiones lnformaticae An offlclal publication of the Computer Society of and of the n Institute of Computer Scientists 'n Amptellke tydskrif van die Rekenaarverenlging van Suld-Afrika en van die Suld-Afrikaanse

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Zhou B. B., Brent R. P. and Tridgell A. y Computer Sciences Laboratory The Australian National University Canberra,

More information

Shared-Memory Multiprocessor Systems Hierarchical Task Queue

Shared-Memory Multiprocessor Systems Hierarchical Task Queue UNIVERSITY OF LUGANO Advanced Learning and Research Institute -ALaRI PROJECT COURSE: PERFORMANCE EVALUATION Shared-Memory Multiprocessor Systems Hierarchical Task Queue Mentor: Giuseppe Serazzi Candidates:

More information

A Hierarchical Approach to Workload. M. Calzarossa 1, G. Haring 2, G. Kotsis 2,A.Merlo 1,D.Tessera 1

A Hierarchical Approach to Workload. M. Calzarossa 1, G. Haring 2, G. Kotsis 2,A.Merlo 1,D.Tessera 1 A Hierarchical Approach to Workload Characterization for Parallel Systems? M. Calzarossa 1, G. Haring 2, G. Kotsis 2,A.Merlo 1,D.Tessera 1 1 Dipartimento di Informatica e Sistemistica, Universita dipavia,

More information

Analytical Modeling of Routing Algorithms in. Virtual Cut-Through Networks. Real-Time Computing Laboratory. Electrical Engineering & Computer Science

Analytical Modeling of Routing Algorithms in. Virtual Cut-Through Networks. Real-Time Computing Laboratory. Electrical Engineering & Computer Science Analytical Modeling of Routing Algorithms in Virtual Cut-Through Networks Jennifer Rexford Network Mathematics Research Networking & Distributed Systems AT&T Labs Research Florham Park, NJ 07932 jrex@research.att.com

More information

Mean Value Analysis and Related Techniques

Mean Value Analysis and Related Techniques Mean Value Analysis and Related Techniques 34-1 Overview 1. Analysis of Open Queueing Networks 2. Mean-Value Analysis 3. Approximate MVA 4. Balanced Job Bounds 34-2 Analysis of Open Queueing Networks Used

More information

Network. Department of Statistics. University of California, Berkeley. January, Abstract

Network. Department of Statistics. University of California, Berkeley. January, Abstract Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,

More information

Structural Advantages for Ant Colony Optimisation Inherent in Permutation Scheduling Problems

Structural Advantages for Ant Colony Optimisation Inherent in Permutation Scheduling Problems Structural Advantages for Ant Colony Optimisation Inherent in Permutation Scheduling Problems James Montgomery No Institute Given Abstract. When using a constructive search algorithm, solutions to scheduling

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines B. B. Zhou, R. P. Brent and A. Tridgell Computer Sciences Laboratory The Australian National University Canberra,

More information

An Ecient Scheduling Algorithm for Multiprogramming on Parallel Computing Systems

An Ecient Scheduling Algorithm for Multiprogramming on Parallel Computing Systems An Ecient Scheduling Algorithm for Multiprogramming on Parallel Computing Systems Zhou B. B., Brent R. P. and Qu X. Computer Sciences Laboratory The Australian National University Canberra, ACT 0200, Australia

More information

Matrix Unit Cell Scheduler (MUCS) for. Input-Buered ATM Switches. Haoran Duan, John W. Lockwood, and Sung Mo Kang

Matrix Unit Cell Scheduler (MUCS) for. Input-Buered ATM Switches. Haoran Duan, John W. Lockwood, and Sung Mo Kang Matrix Unit Cell Scheduler (MUCS) for Input-Buered ATM Switches Haoran Duan, John W. Lockwood, and Sung Mo Kang University of Illinois at Urbana{Champaign Department of Electrical and Computer Engineering

More information

The Effect of Scheduling Discipline on Dynamic Load Sharing in Heterogeneous Distributed Systems

The Effect of Scheduling Discipline on Dynamic Load Sharing in Heterogeneous Distributed Systems Appears in Proc. MASCOTS'97, Haifa, Israel, January 1997. The Effect of Scheduling Discipline on Dynamic Load Sharing in Heterogeneous Distributed Systems Sivarama P. Dandamudi School of Computer Science,

More information

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems On Object Orientation as a Paradigm for General Purpose Distributed Operating Systems Vinny Cahill, Sean Baker, Brendan Tangney, Chris Horn and Neville Harris Distributed Systems Group, Dept. of Computer

More information

Theoretical Foundations of SBSE. Xin Yao CERCIA, School of Computer Science University of Birmingham

Theoretical Foundations of SBSE. Xin Yao CERCIA, School of Computer Science University of Birmingham Theoretical Foundations of SBSE Xin Yao CERCIA, School of Computer Science University of Birmingham Some Theoretical Foundations of SBSE Xin Yao and Many Others CERCIA, School of Computer Science University

More information

apply competitive analysis, introduced in [10]. It determines the maximal ratio between online and optimal oine solutions over all possible inputs. In

apply competitive analysis, introduced in [10]. It determines the maximal ratio between online and optimal oine solutions over all possible inputs. In Online Scheduling of Continuous Media Streams? B. Monien, P. Berenbrink, R. Luling, and M. Riedel?? University of Paderborn, Germany E-mail: bm,pebe,rl,barcom@uni-paderborn.de Abstract. We present a model

More information

(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX

(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX Towards an Adaptive Distributed Shared Memory (Preliminary Version ) Jai-Hoon Kim Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3 E-mail: fjhkim,vaidyag@cs.tamu.edu

More information

Performance Modeling of a Cluster of Workstations

Performance Modeling of a Cluster of Workstations Performance Modeling of a Cluster of Workstations Ahmed M. Mohamed, Lester Lipsky and Reda A. Ammar Dept. of Computer Science and Engineering University of Connecticut Storrs, CT 6269 Abstract Using off-the-shelf

More information

Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures

Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures Latency Tolerance: A Metric for Performance Analysis of Multithreaded Architectures Shashank S. Nemawarkar and Guang R. Gao School of Computer Science McGill University, Montreal, Quebec H3A 2A7 Canada

More information

Parallel Clustering on a Unidirectional Ring. Gunter Rudolph 1. University of Dortmund, Department of Computer Science, LS XI, D{44221 Dortmund

Parallel Clustering on a Unidirectional Ring. Gunter Rudolph 1. University of Dortmund, Department of Computer Science, LS XI, D{44221 Dortmund Parallel Clustering on a Unidirectional Ring Gunter Rudolph 1 University of Dortmund, Department of Computer Science, LS XI, D{44221 Dortmund 1. Introduction Abstract. In this paper a parallel version

More information

Lecture 9: Load Balancing & Resource Allocation

Lecture 9: Load Balancing & Resource Allocation Lecture 9: Load Balancing & Resource Allocation Introduction Moler s law, Sullivan s theorem give upper bounds on the speed-up that can be achieved using multiple processors. But to get these need to efficiently

More information

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax:

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax: Consistent Logical Checkpointing Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 hone: 409-845-0512 Fax: 409-847-8578 E-mail: vaidya@cs.tamu.edu Technical

More information

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm Henan Zhao and Rizos Sakellariou Department of Computer Science, University of Manchester,

More information

Compilation Issues for High Performance Computers: A Comparative. Overview of a General Model and the Unied Model. Brian J.

Compilation Issues for High Performance Computers: A Comparative. Overview of a General Model and the Unied Model. Brian J. Compilation Issues for High Performance Computers: A Comparative Overview of a General Model and the Unied Model Abstract This paper presents a comparison of two models suitable for use in a compiler for

More information

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t

2 Data Reduction Techniques The granularity of reducible information is one of the main criteria for classifying the reduction techniques. While the t Data Reduction - an Adaptation Technique for Mobile Environments A. Heuer, A. Lubinski Computer Science Dept., University of Rostock, Germany Keywords. Reduction. Mobile Database Systems, Data Abstract.

More information

Symbolic Evaluation of Sums for Parallelising Compilers

Symbolic Evaluation of Sums for Parallelising Compilers Symbolic Evaluation of Sums for Parallelising Compilers Rizos Sakellariou Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL United Kingdom e-mail: rizos@csmanacuk Keywords:

More information

Scheduling with Bus Access Optimization for Distributed Embedded Systems

Scheduling with Bus Access Optimization for Distributed Embedded Systems 472 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 5, OCTOBER 2000 Scheduling with Bus Access Optimization for Distributed Embedded Systems Petru Eles, Member, IEEE, Alex

More information

and therefore the system throughput in a distributed database system [, 1]. Vertical fragmentation further enhances the performance of database transa

and therefore the system throughput in a distributed database system [, 1]. Vertical fragmentation further enhances the performance of database transa Vertical Fragmentation and Allocation in Distributed Deductive Database Systems Seung-Jin Lim Yiu-Kai Ng Department of Computer Science Brigham Young University Provo, Utah 80, U.S.A. Email: fsjlim,ngg@cs.byu.edu

More information

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme

On Checkpoint Latency. Nitin H. Vaidya. In the past, a large number of researchers have analyzed. the checkpointing and rollback recovery scheme On Checkpoint Latency Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 E-mail: vaidya@cs.tamu.edu Web: http://www.cs.tamu.edu/faculty/vaidya/ Abstract

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

A Dag-Based Algorithm for Distributed Mutual Exclusion. Kansas State University. Manhattan, Kansas maintains [18]. algorithms [11].

A Dag-Based Algorithm for Distributed Mutual Exclusion. Kansas State University. Manhattan, Kansas maintains [18]. algorithms [11]. A Dag-Based Algorithm for Distributed Mutual Exclusion Mitchell L. Neilsen Masaaki Mizuno Department of Computing and Information Sciences Kansas State University Manhattan, Kansas 66506 Abstract The paper

More information

Neuro-Remodeling via Backpropagation of Utility. ABSTRACT Backpropagation of utility is one of the many methods for neuro-control.

Neuro-Remodeling via Backpropagation of Utility. ABSTRACT Backpropagation of utility is one of the many methods for neuro-control. Neuro-Remodeling via Backpropagation of Utility K. Wendy Tang and Girish Pingle 1 Department of Electrical Engineering SUNY at Stony Brook, Stony Brook, NY 11794-2350. ABSTRACT Backpropagation of utility

More information

SORT INFERENCE \coregular" signatures, they derive an algorithm for computing a most general typing for expressions e which is only slightly more comp

SORT INFERENCE \coregular signatures, they derive an algorithm for computing a most general typing for expressions e which is only slightly more comp Haskell Overloading is DEXPTIME{complete Helmut Seidl Fachbereich Informatik Universitat des Saarlandes Postfach 151150 D{66041 Saarbrucken Germany seidl@cs.uni-sb.de Febr., 1994 Keywords: Haskell type

More information

A Delayed Vacation Model of an M/G/1 Queue with Setup. Time and its Application to SVCC-based ATM Networks

A Delayed Vacation Model of an M/G/1 Queue with Setup. Time and its Application to SVCC-based ATM Networks IEICE TRANS. COMMUN., VOL. 0, NO. 0 1996 1 PAPER Special Issue on Telecommunications Network Planning and Design A Delayed Vacation Model of an M/G/1 Queue with Setup Time and its Application to SVCCbased

More information

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science MDP Routing in ATM Networks Using the Virtual Path Concept 1 Ren-Hung Hwang, James F. Kurose, and Don Towsley Department of Computer Science Department of Computer Science & Information Engineering University

More information

LIST BASED SCHEDULING ALGORITHM FOR HETEROGENEOUS SYSYTEM

LIST BASED SCHEDULING ALGORITHM FOR HETEROGENEOUS SYSYTEM LIST BASED SCHEDULING ALGORITHM FOR HETEROGENEOUS SYSYTEM C. Subramanian 1, N.Rajkumar 2, S. Karthikeyan 3, Vinothkumar 4 1 Assoc.Professor, Department of Computer Applications, Dr. MGR Educational and

More information

n = 2 n = 1 µ λ n = 0

n = 2 n = 1 µ λ n = 0 A Comparison of Allocation Policies in Wavelength Routing Networks Yuhong Zhu, George N. Rouskas, Harry G. Perros Department of Computer Science, North Carolina State University Abstract We consider wavelength

More information

A Comparison of Allocation Policies in Wavelength Routing Networks*

A Comparison of Allocation Policies in Wavelength Routing Networks* Photonic Network Communications, 2:3, 267±295, 2000 # 2000 Kluwer Academic Publishers. Manufactured in The Netherlands. A Comparison of Allocation Policies in Wavelength Routing Networks* Yuhong Zhu, George

More information

Parallel Program Performance Prediction Using Deterministic Task Graph Analysis

Parallel Program Performance Prediction Using Deterministic Task Graph Analysis Parallel Program Performance Prediction Using Deterministic Task Graph Analysis Vikram S. Adve University of Illinois at Urbana-Champaign and Mary K. Vernon University of Wisconsin-Madison In this paper,

More information

Adaptive Migratory Scheme for Distributed Shared Memory 1. Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University

Adaptive Migratory Scheme for Distributed Shared Memory 1. Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University Adaptive Migratory Scheme for Distributed Shared Memory 1 Jai-Hoon Kim Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 E-mail: fjhkim,vaidyag@cs.tamu.edu

More information

Steering. Stream. User Interface. Stream. Manager. Interaction Managers. Snapshot. Stream

Steering. Stream. User Interface. Stream. Manager. Interaction Managers. Snapshot. Stream Agent Roles in Snapshot Assembly Delbert Hart Dept. of Computer Science Washington University in St. Louis St. Louis, MO 63130 hart@cs.wustl.edu Eileen Kraemer Dept. of Computer Science University of Georgia

More information

task object task queue

task object task queue Optimizations for Parallel Computing Using Data Access Information Martin C. Rinard Department of Computer Science University of California, Santa Barbara Santa Barbara, California 9316 martin@cs.ucsb.edu

More information

Enhancing Integrated Layer Processing using Common Case. Anticipation and Data Dependence Analysis. Extended Abstract

Enhancing Integrated Layer Processing using Common Case. Anticipation and Data Dependence Analysis. Extended Abstract Enhancing Integrated Layer Processing using Common Case Anticipation and Data Dependence Analysis Extended Abstract Philippe Oechslin Computer Networking Lab Swiss Federal Institute of Technology DI-LTI

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the Heap-on-Top Priority Queues Boris V. Cherkassky Central Economics and Mathematics Institute Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Andrew V. Goldberg NEC Research Institute 4 Independence

More information

The element the node represents End-of-Path marker e The sons T

The element the node represents End-of-Path marker e The sons T A new Method to index and query Sets Jorg Homann Jana Koehler Institute for Computer Science Albert Ludwigs University homannjkoehler@informatik.uni-freiburg.de July 1998 TECHNICAL REPORT No. 108 Abstract

More information

T H. Runable. Request. Priority Inversion. Exit. Runable. Request. Reply. For T L. For T. Reply. Exit. Request. Runable. Exit. Runable. Reply.

T H. Runable. Request. Priority Inversion. Exit. Runable. Request. Reply. For T L. For T. Reply. Exit. Request. Runable. Exit. Runable. Reply. Experience with Real-Time Mach for Writing Continuous Media Applications and Servers Tatsuo Nakajima Hiroshi Tezuka Japan Advanced Institute of Science and Technology Abstract This paper describes the

More information

Centre for Parallel Computing, University of Westminster, London, W1M 8JS

Centre for Parallel Computing, University of Westminster, London, W1M 8JS Graphical Construction of Parallel Programs G. R. Ribeiro Justo Centre for Parallel Computing, University of Westminster, London, WM 8JS e-mail: justog@wmin.ac.uk, Abstract Parallel programming is not

More information

Dierential-Linear Cryptanalysis of Serpent? Haifa 32000, Israel. Haifa 32000, Israel

Dierential-Linear Cryptanalysis of Serpent? Haifa 32000, Israel. Haifa 32000, Israel Dierential-Linear Cryptanalysis of Serpent Eli Biham, 1 Orr Dunkelman, 1 Nathan Keller 2 1 Computer Science Department, Technion. Haifa 32000, Israel fbiham,orrdg@cs.technion.ac.il 2 Mathematics Department,

More information

SCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS IAN RAMSAY PHILP. B.S., University of North Carolina at Chapel Hill, 1988
