This article appeared in Proc. 7th IEEE Symposium on Computers and Communications, Taormina/Giardini Naxos, Italy, July 2002, IEEE Computer Society.

Software Supports for Preemptive Rollback in Optimistic Parallel Simulation on Myrinet Clusters

Francesco Quaglia and Andrea Santoro
DIS, Universita di Roma "La Sapienza"
Via Salaria 113, Roma, Italy

Abstract

In this paper we present a communication layer for Myrinet based clusters, designed to efficiently support preemptive rollback operations in optimistic parallel simulation. Beyond standard low latency message delivery functionalities, this layer also embeds functionalities allowing the overlying simulation application to efficiently track whether an incoming message will actually produce a causality inconsistency for the currently executed simulation event upon its receipt at the application level. Exploiting these functionalities, awareness of the inconsistency precedes the message receipt at the application level, thus allowing timely interruption of event execution for activating rollback procedures. Experimental results on a standard simulation benchmark show that the layer we implement allows a strong reduction of the rollback overhead which, in its turn, yields strong performance improvements (up to 33%), especially in case of large parallelism in the simulation model execution.

1 Introduction

In parallel discrete-event simulation, distinct parts of the system to be simulated are modeled by distinct Logical Processes (LPs), with their own local simulation clocks [4]. LPs execute sequences of simulation events, and each event execution possibly schedules new events to be executed by any LP at a later simulation time. The notification of scheduled events among distinct LPs takes place through the exchange of messages carrying the content and the occurrence time, namely the timestamp, of the event. In order to ensure correct simulation results, synchronization mechanisms are used to maintain a non-decreasing timestamp order for the execution of events at each LP. Optimistic synchronization [6] allows any LP to execute events unless its pending event set is empty, and uses checkpoint-based rollback to recover from timestamp order violations, namely causality inconsistencies. A rollback recovers the state of the LP to its value immediately prior to the violation. While rolling back, the LP undoes the effects of the events scheduled during the rolled back portion of the simulation. This is done by sending an antimessage for each event that must be undone. Upon the receipt of an antimessage that cancels an event already executed, the recipient LP rolls back as well. This is also referred to as "cascading rollback".

In current simulation software technology, event execution at an LP is typically handled as an atomic action, and the check for incoming messages/antimessages, which might reveal a causality inconsistency, is performed only in between consecutive event executions (1). This means that rollback operations for an LP, if required, never preempt any in-progress event execution activity [4]. Such preemption would, however, exhibit the following three advantages: (i) The simulation application does not waste CPU time to complete the execution of a causally inconsistent event that must be rolled back. (ii) Interruption of the event execution might avoid sending some event notification messages to other LPs. Those messages, if sent, would eventually have to be revoked through the corresponding antimessages.
Therefore, avoiding sending them might achieve a reduction of the communication overhead. Also, a reduction of the amount of messages to be revoked might reduce the likelihood of cascading rollback. (iii) Interruption of the event execution allows anticipating the sending of the antimessages required by rollback operations. Therefore we get a reduction of the likelihood that, upon the antimessage receipt, the corresponding message has already been processed by the recipient LP. This additionally contributes to reducing the likelihood of cascading rollback.

The reason why current simulation software technology does not employ preemptive rollback is the inadequacy of the functionalities of general purpose communication layers. Indeed, using any of those layers, the only chance an LP has to become aware, during event execution, of any message/antimessage revealing a causality inconsistency is to run the event routine and the message receipt module in time-interleaved mode. This is because the layer itself is not aware of the content of transmitted data, therefore causality inconsistency verification must be performed at the simulation application level.

(1) In the standard approach to optimistic parallel simulation, in case multiple LPs are hosted by the same machine, the schedule sequence of event executions is determined on the basis of the lowest-timestamp-first rule, which means that the LP scheduled for event execution is always the one having, in its pending event set, the non-executed event with the lowest timestamp. According to this scheduling rule, causality inconsistencies can occur only due to interprocessor communication.

However, this solution shows several drawbacks making it not viable. First, time interleaving might cause additional execution costs (i.e. application level context switch costs). More importantly, the event execution time is stretched, thus causing delay in sending notification messages for newly scheduled events, which possibly increases the likelihood of timestamp order violations at the recipient LPs [2, 12]. At worst, such an increase might give rise to rollback thrashing in the parallel execution.

In this paper we present software supports for preemptive rollback in case of optimistic parallel simulations on Myrinet clusters. These supports allow the implementation of preemptive rollback operations with no interleaving between event routine and message receipt at the application level. Actually, Myrinet network cards, as well as other types of network cards of the last generation [7], are equipped with a programmable processor, typically used only for message transfer purposes. We employ that same processor to perform "on-the-fly" verification on incoming messages/antimessages, so as to ascertain whether the LP being executed is causally consistent. Then, the simulation application software reads the verification outcome at negligible cost by means of calls to a lightweight function belonging to the API associated with the communication layer. We test the effectiveness of these software supports by means of an empirical study on a standard simulation benchmark. The data show that rollback costs can be definitely reduced, with a consequent increase (up to 33%) of the speed of the parallel execution.

The remainder of this paper is organized as follows. In Section 2 we describe the software supports for preemptive rollback we have developed. Performance results are reported in Section 3.

2 The Software Supports

The software supports we present are an extension of a classical message passing layer for Myrinet clusters. To make the extension design clear, we preliminarily report a brief description of the Myrinet architecture and of a classical layer organization. Then the extension is presented in detail.

2.1 Myrinet Overview

A Myrinet network consists of a switch based on wormhole technology and of interface cards in between the switch and the hosts, which are based on the LANai chip [8, 9, 10]. At a high level, the interface card architecture can be schematized as shown in Figure 1. Specifically, it is a programmable communication device equipped with: (a) An on-board programmable RISC Processor (RP). (b) A RAM bank memory (Internal Memory, namely IM), accessible by RP, which can be mapped into the host address space. (c) A packet interface towards the switch, associated with Send/Receive DMA engines used for IM/packet-interface data transfer and vice versa. (d) A PCI bridge towards the host. (e) A DMA engine, called EBUS DMA (External Bus DMA), able to transfer data from the IM to the host main memory and vice versa. All the components of the architecture are interconnected through an Internal Bus (IB), and the three DMA engines are programmable by RP. As a final observation, RP cannot access the host memory directly.

[Figure 1. Myrinet Card Architecture: packet interface towards the switch with Send/Receive DMA engines, RISC Processor (RP), RAM bank (IM) and EBUS DMA on the Internal Bus (IB), PCI bridge towards the host and the receive queue in host memory.]
Nonetheless, the program run by RP, typically known as the Control Program (CP), can activate the EBUS DMA for performing data transfer operations to/from the host memory.

As a common choice in fast messaging layers for Myrinet, see for example [11], messages incoming from the switch are temporarily buffered into the IM, with the data transfer between the packet interface and the IM taking place through the Receive DMA. The messages are then transferred into the receive queue, located in host memory, through the EBUS DMA (see the directed dotted line in Figure 1). Once transferred into the receive queue, a message is received by the application program very efficiently by simply performing a memcpy() of the message content into a proper buffer in the application address space. In case multiple messages stored in contiguous slots of the IM must be transferred into the receive queue, a single EBUS DMA transfer is used for the whole set of messages. This technique is referred to as "block-dma". Another common choice consists of performing a send operation according to the "zero-copy" method, which means that the content of the message to be sent is directly copied into the IM (with no intermediate buffering into host memory). Then the message is transferred onto the network through the Send DMA. The responsibility to program the three DMA engines anytime a given message transfer operation must be supported pertains to the CP run by RP. The main loop of the CP for our implementation is as follows:

1. While (1){
2.   if (message needs to be sent) activate_send_dma();
3.   if (message needs to be received) activate_receive_dma();
4.   if (Send DMA completed) complete_send();
5.   if (Receive DMA completed) complete_receive();
6.   if (block-dma needed) block_dma();
7. }
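To make the host-side receive path concrete, the following is a minimal sketch of how an application could drain the receive queue; the slot layout, sizes, and names (rx_slot, receive_next()) are illustrative assumptions, not the layer's actual definitions.

#include <string.h>

#define SLOT_SIZE   2048                      /* assumed fixed slot size       */
#define QUEUE_SLOTS 256                       /* assumed queue depth           */

struct rx_slot {
    volatile int valid;                       /* set by the CP after EBUS DMA  */
    int          len;                         /* message length in bytes       */
    char         data[SLOT_SIZE];             /* raw message content           */
};

static struct rx_slot rx_queue[QUEUE_SLOTS];  /* filled by the EBUS DMA        */
static int rx_head = 0;

/* Copy the next delivered message into an application buffer; returns its
 * length, or -1 if no message has been delivered yet. */
int receive_next(void *app_buf)
{
    struct rx_slot *s = &rx_queue[rx_head];
    int n;

    if (!s->valid)
        return -1;
    n = s->len;
    memcpy(app_buf, s->data, n);              /* the memcpy() mentioned above  */
    s->valid = 0;                             /* hand the slot back to the CP  */
    rx_head = (rx_head + 1) % QUEUE_SLOTS;
    return n;
}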

2.2 Layer Extension for Preemptive Rollback

We decided to extend the classical layer functionalities by actively employing the potential of RP. The aim of the extension is to introduce an on-the-fly causality inconsistency tracking mechanism, as schematized in Figure 2. The idea is to identify, in the incoming data stream, all the messages/antimessages destined to the LP currently executing a simulation event, if any. Then causality inconsistency verification is performed by comparing the timestamp of an identified message/antimessage with the current simulation clock of that LP. Notification of causally inconsistent execution at the application level takes place by setting a flag whose value can be checked by the LP at its own pace during event execution. Note that identification and verification performed at the level of the CP run by RP can take place without altering the order in which messages incoming from the switch are inserted into the receive queue located in host memory. As in our implementation, this order is typically FCFS, which, together with the FCFS property guaranteed by transmission through the Myrinet switch, ensures point-to-point message delivery reflecting the message send order. Actually, identification and verification can take place just after the incoming message/antimessage has been buffered into the IM on-board of the card (recall buffering is performed by the Receive DMA). Note that identification and verification performed at that point do not require structural modifications to the CP, and do not require special handling, such as an additional intermediate buffering of the message/antimessage.

[Figure 2. Causality Inconsistency Tracking: an incoming message/antimessage is verified on the Myrinet card, which sets a flag; the host polls the flag during event execution and interrupts the execution in case a violation is revealed, while receive queue housekeeping operations proceed as usual.]

In our implementation, the function performing identification and verification has been called peek(), and the main loop of the CP has been slightly modified as shown below in order to embed a call to this function as soon as buffering of a message/antimessage into the IM is completed (see lines 5 to 7 of the loop):

1. While (1){
2.   if (message needs to be sent) activate_send_dma();
3.   if (message needs to be received) activate_receive_dma();
4.   if (Send DMA completed) complete_send();
5.   if (Receive DMA completed){
6.     complete_receive();
7.     peek();
8.   }
9.   if (block-dma needed) block_dma();
10.}

The function peek() needs the following information to perform its task: (i) the identifier of the LP, if any, currently executing a simulation event; (ii) the current simulation clock of this LP. This information must be made available in the IM since, as pointed out before, the CP cannot access host memory. To this purpose we have introduced, in the API associated with the layer, the function set_tracking(int LP_id, double simulation_clock), which writes both the LP identifier and its simulation clock to a proper buffer located in the IM on-board of the card (recall the IM is mapped in the host address space, so that the host can perform read/write operations on it). Then the function peek() determines the occurrence of a causality inconsistency on the running LP by simply comparing the current LP simulation clock and the timestamp of any incoming message/antimessage destined to that LP. If the timestamp value is less than the simulation clock value, the current event execution is not causally consistent.
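As an illustration, the core comparison performed by peek() can be sketched as follows. The sketch assumes the message format the layer requires (first four bytes the destination LP id as an integer, the following eight bytes the timestamp as a double, as specified at the end of Section 2.3); the function name and the direct buffer access are illustrative.

#include <string.h>

/* Core of the verification performed by peek(): given a message or
 * antimessage buffered in the IM, check whether it reveals a causality
 * inconsistency for the tracked LP. */
int causally_inconsistent(const char *msg, int tracked_lp,
                          double tracked_clock)
{
    int    dest_lp;
    double timestamp;

    /* Required format: first four bytes the destination LP id (int),
     * following eight bytes the timestamp (double). */
    memcpy(&dest_lp,   msg,               sizeof(int));
    memcpy(&timestamp, msg + sizeof(int), sizeof(double));

    if (dest_lp != tracked_lp)         /* not for the running LP          */
        return 0;
    return timestamp < tracked_clock;  /* timestamp in the LP's past      */
}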
Beyond writing the LP identifier and its simulation clock to a proper buffer in the IM, set_tracking() also has the responsibility to enable the causality inconsistency tracking performed at the level of the CP. This is done by setting a flag, namely control_flag, implemented as a word (4 bytes) located in the IM (2). Flag setting indicates to the function peek() that causality inconsistency tracking is currently required by the application level. peek() checks the value of control_flag as its first action and avoids performing causality inconsistency tracking in case the flag is not set. In other words, set_tracking() acts as an initializer/activator for the causality inconsistency tracking mechanism. This function should be called right after the simulation clock of the LP scheduled for running is updated to the timestamp of the event that is going to be executed. In standard simulation code, this happens as soon as the LP enters the event routine. (A more detailed description of the usage, at the simulation application level, of the whole set of preemptive rollback support functions belonging to the API will be presented in Section 2.3.)

Disabling causality inconsistency tracking, i.e. resetting control_flag, can be performed through the function reset_tracking(), which we have introduced in the API associated with the layer. This function should be called at the application level prior to the start of any period in which causality inconsistency tracking is not required.

The reason why we have introduced control_flag is twofold. First, disabling causality inconsistency tracking whenever it is not required by the application level increases the responsiveness of the CP in programming/controlling the DMA engines on-board of the card. Second, and more important, critical races between the software run at the host side and the CP run by RP might occur, causing errors in the outcome of the causality inconsistency tracking mechanism. Actually, adequate management of control_flag as a three-state flag (see Figure 3) prevents those races.

[Figure 3. Transition State Diagram: control_flag moves among the states OFF, ON and SHUT-DOWN, with set_tracking(), reset_tracking() and the CP each in charge of one of the three transitions.]

(2) We have used a word instead of a single byte for control_flag since the Myrinet cards' internal circuitry is optimized for aligned word access.
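The following sketch shows one plausible realization of peek()'s flag handling, consistent with the description above; the specific transition assignment (set_tracking(): OFF to ON; reset_tracking(): ON to SHUT-DOWN; CP: SHUT-DOWN to OFF) is our reading of the state diagram, and the tracking-buffer layout is illustrative. It sets the inconsistency_flag discussed next.

/* Assumed encoding of the three control_flag states. */
enum { OFF = 0, ON = 1, SHUT_DOWN = 2 };

struct tracking {                    /* illustrative IM-resident buffer   */
    volatile int    control_flag;
    volatile int    inconsistency_flag;
    volatile int    lp_id;
    volatile double simulation_clock;
};

int causally_inconsistent(const char *msg, int lp, double clock);
                                     /* from the previous sketch          */

void peek(struct tracking *t, const char *msg)
{
    if (t->control_flag == SHUT_DOWN) {
        t->control_flag = OFF;       /* only the CP clears the flag, so a
                                        host-side set_tracking() can never
                                        race with a stale shut-down       */
        return;
    }
    if (t->control_flag != ON)       /* tracking not requested            */
        return;
    if (causally_inconsistent(msg, t->lp_id, t->simulation_clock))
        t->inconsistency_flag = 1;   /* TRUE: violation revealed          */
}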

If a causality inconsistency is detected, the function peek() sets to TRUE a flag, namely inconsistency_flag, implemented as a word located in the IM. This flag is reset to FALSE by set_tracking() upon its call. The application software checks the value of inconsistency_flag through the function check_causality_inconsistency(), which we have introduced into the API. This function simply returns the flag value, namely TRUE in case a causality inconsistency has been revealed, FALSE otherwise. In other words, the function check_causality_inconsistency() can be used at the simulation application level for implementing an efficient polling based approach to the detection of causality inconsistency of the running LP.

2.3 Usage at the Application Level

Preemptive rollback support functions are expected to be employed as follows. As soon as the simulation clock of the LP is moved to the event to be executed (this typically happens at the start of event execution), the function set_tracking() should be called. In case this function is called prior to updating the LP simulation clock, causality inconsistency tracking might result less effective, since it cannot reveal a timestamp order violation occurring in the simulation time interval between the not yet updated value of the simulation clock and the timestamp of the event currently executed by the LP. Just before the event execution is completed, the function reset_tracking() should be called to disable causality inconsistency tracking.

The two functions above initialize/enable and disable causality inconsistency tracking. In order to actually read the tracking outcome, periodic calls to check_causality_inconsistency() should be issued. In case one of these calls reveals a causality inconsistency, the event execution should be interrupted. As a last action before interruption, the function reset_tracking() should be called to disable the causality inconsistency tracking mechanism again, until a new call to set_tracking() is issued by any LP that will be scheduled for event execution.

We have measured the time required to execute the function check_causality_inconsistency() on a Pentium II 300 MHz running LINUX and equipped with an M2M-PCI32C Myrinet card (3), and the measurement outcome is about 0.5 microseconds. Considering that parallel discrete event technology is typically adopted for simulation models with event granularity (i.e. event execution time) ranging from several tens or hundreds of microseconds up to some milliseconds or more on current conventional architectures, the overhead due to the polling performed by check_causality_inconsistency() is negligible in practice, even if several calls to this function are performed within a single event execution. With respect to the strategy for placing these calls within the event code, we note that revealing a causality inconsistency right before sending out any notification message for a new event is likely to amplify the benefits from preemptive rollback.

(3) The data we report in Section 3 refer to experiments executed on a cluster of this type of PCs.
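Putting the protocol of this subsection together, an event routine would be structured roughly as follows; event_completed(), do_some_event_work() and interrupt_and_rollback() are hypothetical placeholders for simulator-specific code.

/* API functions of the extended layer, as described above. */
void set_tracking(int LP_id, double simulation_clock);
void reset_tracking(void);
int  check_causality_inconsistency(void);

/* Hypothetical simulator-specific hooks. */
int  event_completed(void);
void do_some_event_work(void);
void interrupt_and_rollback(void);

void execute_event(int lp_id, double event_timestamp)
{
    /* The LP simulation clock has just been moved to the event's
     * timestamp; enable tracking right after that. */
    set_tracking(lp_id, event_timestamp);

    while (!event_completed()) {
        do_some_event_work();
        /* Cheap polling of the tracking outcome (about 0.5 microseconds
         * per call on the platform described above). */
        if (check_causality_inconsistency()) {
            reset_tracking();         /* disable tracking, then interrupt */
            interrupt_and_rollback();
            return;
        }
    }
    reset_tracking();                 /* disable tracking before leaving  */
}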
From the point of view of pure communication, the Myrinet layer we have extended with the previously described functionalities is characterized by the following pair of API functions: send(int msg_type, int machine_id, void *message) and receive(int msg_type, void *message, int *machine_id). Both the send and the receive function treat the message content as a simple sequence of bytes, therefore the overlying application is in charge of defining an adequate structure for the content itself. On the other hand, the layer extension we have presented accesses the content (i.e. the identifier of the destination LP and the timestamp) at the level of the CP through the function peek(). To make this access work properly, any message/antimessage must comply with the following format: (i) the first four bytes must contain the LP id of the destination LP, expressed as an integer; (ii) the following eight bytes must contain the timestamp, expressed as a double precision floating point number. The rest of the message is of no concern to the CP; a sketch of a compliant header construction is shown below.
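Because a C struct holding an int followed by a double would normally get padding bytes between the fields, a compliant header is best built at explicit byte offsets. The helper below is an illustrative sketch, not part of the layer's API.

#include <string.h>

/* Build a header complying with the required format at explicit byte
 * offsets (a struct { int; double; } would typically contain 4 padding
 * bytes between the fields, breaking the expected layout). */
void build_header(char *msg, int dest_lp, double timestamp)
{
    memcpy(msg,               &dest_lp,   sizeof(int));    /* bytes 0-3  */
    memcpy(msg + sizeof(int), &timestamp, sizeof(double)); /* bytes 4-11 */
    /* Application-defined content may follow from byte 12 onward. */
}

An application would then fill its payload after the first twelve bytes and pass the buffer to send().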

3 Performance Data

In this section we report the results of an empirical study on the effectiveness of the software supports for preemptive rollback we have implemented. The experiments were all performed on a cluster of 8 Pentium II 300 MHz PCs (128 Mbytes RAM, 512 Kbytes second level cache). All the PCs of the cluster run LINUX and are equipped with M2M-PCI32C Myrinet cards. We have used a classical parameterized simulation benchmark, namely the PHOLD model [3]. This benchmark consists of a fixed number of LPs and of a constant number of fictitious simulation events circulating among the LPs. The execution of an event at an LP has the only effect of scheduling a new event to be executed by any LP, with a timestamp increment selected according to some stochastic distribution. We have chosen this simulation benchmark for two main reasons: (i) its parameters (e.g. routing of events, timestamp increment distributions, event granularity, etc.) can be easily modified; (ii) it is the most used benchmark for testing the effectiveness of optimizations in support of parallel discrete event simulation [1, 13, 14, 15, 16, 17, 18, 19, 20].

In our experiments we have considered a PHOLD model with an exponential distribution for the timestamp increment, with mean 10 simulation time units, and we have fixed the amount of initial fictitious simulation events at 5 per LP (a similar workload has been used, as an example, in the experimental study in [19]). Also, the event execution time has been set at about 100 microseconds, which is an intermediate value for parallel discrete event applications. As in the common approach [19], this has been done by structuring the event code as a simple busy loop. The independent parameter in the study is the number of LPs per machine, which has been varied from 1 to 8.

[Figure 4. Ratio between the ER with preemptive rollback and the ER with non-preemptive rollback, vs. the number of LPs per machine.]
[Figure 5. EFF vs. the number of LPs per machine (non-preemptive vs. preemptive rollback).]
[Figure 6. F vs. the number of LPs per machine.]
[Figure 7. COPCE vs. the number of LPs per machine (non-preemptive vs. preemptive rollback).]

We report measures related to the following parameters. The event rate (ER), that is, the number of committed simulation events per second. This parameter indicates how fast the simulation execution is; it is therefore representative of the achieved performance. The efficiency (EFF), that is, the ratio between the number of committed simulation events and the total number of executed simulation events (committed plus rolled back). This parameter is an indicator of the percentage of productive simulation work performed. The communication overhead per committed event (COPCE), that is, the number of antimessages plus the number of revoked messages per committed event (simulation messages/antimessages that do not need to be transferred over the network, i.e. those destined to local LPs, are not included in the measurement). This parameter is an indicator of the additional communication cost, which must be paid due to the rollback mechanism, for each productive simulation event execution.

We have measured the previous parameters both for the case of a typical execution with no preemptive rollback and for the case of preemptive rollback based on periodic calls (polling) to check_causality_inconsistency() embedded within the previously mentioned busy loop characterizing the event execution. Polling is done every 10 microseconds; therefore, recalling that a single call to check_causality_inconsistency() takes about 0.5 microseconds on this experimental platform (see Section 2.3), we get an overhead on the event execution of about 5%. For the case of preemptive rollback we have also measured the fraction F of rolled back events that have also been preempted during their execution. This parameter indicates the effectiveness of preemptive rollback in revealing causality inconsistencies just during event execution. Each reported value results as the average over 10 runs, all done with different random seeds for the stochastic timestamp increment. Each run was carried out for at least a fixed minimum number of committed events.
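Before turning to the results, the three measures can be summarized in code as follows; the counter names are illustrative.

/* Illustrative computation of the reported measures from raw counters. */
struct run_counters {
    long   committed;        /* committed simulation events               */
    long   rolled_back;      /* rolled back simulation events             */
    long   antimessages;     /* antimessages sent over the network        */
    long   revoked;          /* revoked messages sent over the network    */
    double wall_clock_secs;  /* duration of the run                       */
};

double ER(const struct run_counters *c)     /* committed events per second */
{
    return c->committed / c->wall_clock_secs;
}

double EFF(const struct run_counters *c)    /* fraction of useful work     */
{
    return (double)c->committed / (c->committed + c->rolled_back);
}

double COPCE(const struct run_counters *c)  /* overhead per committed event */
{
    return (double)(c->antimessages + c->revoked) / c->committed;
}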
In Figure 4, the ratio between the ER achieved in case of preemptive rollback and the ER achieved with classical non-preemptive rollback is plotted. The data show that when the degree of parallelism in the simulation model execution is high (i.e. 1-2 LPs per machine), preemptive rollback allows an execution speed which is between 1.33 and 1.10 times the one achievable with non-preemptive rollback, i.e. a gain between 33% and 10%. The gain then disappears when the degree of parallelism gets lower (i.e. 4-8 LPs per machine). This is an expected behavior when looking at the plots of EFF in Figure 5. Specifically, EFF in the order of 85-90%, achieved for the case of 4-8 LPs per machine, indicates that the likelihood of causally inconsistent execution is definitely low (4). As a consequence, there is a reduced likelihood that, upon the arrival of a message/antimessage at the destination host, the currently executed event, if any, needs to be interrupted due to a causality inconsistency. Therefore, the usage of preemptive rollback actually leads to parallel execution dynamics which are quite similar to those achieved with classical non-preemptive rollback. As an empirical support to this deduction, the plot for F in Figure 6 clearly indicates that the fraction of preempted events among those rolled back becomes minimal for the 4-8 LPs per machine case. However, this is not a flaw of the preemptive technique since, for parallel executions exhibiting high efficiency, different dynamics between preemptive and non-preemptive rollback are anyway expected to produce very similar performance. This is due to the fact that the cost of rollback is limited; therefore, even if preemptive rollback achieved rollback cost reductions, the final impact on performance would be negligible in practice.

As an additional observation, the plots in Figure 5 show that preemptive and non-preemptive rollback exhibit in practice the same values for EFF. Instead, the plots in Figure 7 show that, for the case of a high degree of parallelism in the simulation model execution, preemptive rollback achieves a strong reduction of COPCE (up to 20%). This is an indication that the gain in the execution speed is not due to variations in the amount of rollback, but to the reduced costs associated with any single rollback operation. (Note that the reduction of the rollback cost achieved by preemptive rollback derives not only from the reduced values of COPCE, but also from the reduced waste of CPU time due to the timely interruption of a causally inconsistent event execution.) Additional advantages can be expected whenever preemptive rollback also triggers a reduction in the amount of rollback, as is within its potential.

Finally, we note that a high degree of parallelism in the simulation model execution is typical of the recent parallel simulation approach based on federations of existing sequential simulators (in this solution we get a situation similar to the 1 LP per machine execution [5], since each LP can be considered, in practice, as a sequential discrete event simulator). On the basis of the previous results, adopting an optimistic synchronization mechanism among the federated simulators could take strong advantage of the software supports for preemptive rollback we have developed.

(4) The increase in efficiency for the case of a limited degree of parallelism derives from the fact that the simulation clocks of the LPs typically exhibit less divergence as compared to a high degree of parallelism, which yields reduced amounts of causality violations.

4 Summary

In this paper we have presented software supports for the implementation of preemptive rollback operations relying on the timely interruption of the execution of a causally inconsistent simulation event. These supports have been developed for optimistic parallel discrete event simulators running on top of Myrinet based architectures, and derive from an extension of a classical Myrinet message-passing layer. We have discussed how to use the preemptive rollback functions at the simulation application level and we have also reported the results of an empirical study on a standard simulation benchmark, which demonstrate the effectiveness of the presented extension. Specifically, these results show that the performance can be increased by up to 33% in case of a high degree of parallelism in the simulation model execution.
As discussed, the extension should be particularly useful for the recent approach to parallel discrete event simulation based on federated optimistic execution of multiple sequential discrete event simulators, which typically exhibits a high degree of execution parallelism.

References

[1] D. Bruce, "The Treatment of State in Optimistic Systems", Proc. 9th Workshop on Parallel and Distributed Simulation (PADS'95), pp.40-49, June 1995.
[2] C.D. Carothers, R. Fujimoto and P. England, "Effect of Communication Overheads on Time Warp Performance: An Experimental Study", Proc. 8th Workshop on Parallel and Distributed Simulation (PADS'94), July 1994.
[3] R.M. Fujimoto, "Performance of Time Warp Under Synthetic Workloads", Proc. Multiconf. on Distributed Simulation, Vol.22, No.1, January 1990.
[4] R.M. Fujimoto, "Parallel Discrete Event Simulation", Communications of the ACM, Vol.33, No.10, pp.30-53, 1990.
[5] R.M. Fujimoto, "Exploiting Temporal Uncertainty in Parallel and Distributed Simulations", Proc. 13th Workshop on Parallel and Distributed Simulation (PADS'99), pp.46-53, May 1999.
[6] D.R. Jefferson, "Virtual Time", ACM Transactions on Programming Languages and Systems, Vol.7, No.3, pp.404-425, 1985.
[7] "Intelligent Ethernet Interface Solutions".
[8] MYRICOM, "LANai 4", Draft.
[9] MYRICOM, "LANai 7", Draft.
[10] MYRICOM, "LANai 9".
[11] S. Pakin, M. Lauria and A. Chien, "High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet", Proc. Supercomputing'95, 1995.
[12] B.R. Preiss, W.M. Loucks and D. MacIntyre, "Effects of the Checkpoint Interval on Time and Space in Time Warp", ACM Transactions on Modeling and Computer Simulation, Vol.4, No.3, 1994.
[13] F. Quaglia, "Event History Based Sparse State Saving in Time Warp", Proc. 12th Workshop on Parallel and Distributed Simulation (PADS'98), pp.72-79, May 1998.
[14] F. Quaglia, "A Cost Model for Selecting Checkpoint Positions in Time Warp Parallel Simulation", IEEE Transactions on Parallel and Distributed Systems, Vol.12, No.4, April 2001.
[15] F. Quaglia and V. Cortellessa, "Grain Sensitive Event Scheduling in Time Warp Parallel Discrete Event Simulation", Proc. 14th Workshop on Parallel and Distributed Simulation (PADS'00), May 2000.
[16] R. Ronngren and R. Ayani, "Adaptive Checkpointing in Time Warp", Proc. 8th Workshop on Parallel and Distributed Simulation (PADS'94), May 1994.
[17] T.K. Som and R.G. Sargent, "A Probabilistic Event Scheduling Policy for Optimistic Parallel Discrete Event Simulation", Proc. 12th Workshop on Parallel and Distributed Simulation (PADS'98), pp.56-63, May 1998.
[18] S. Skold and R. Ronngren, "Event Sensitive State Saving in Time Warp Parallel Discrete Event Simulations", Proc. Winter Simulation Conference, December 1996.
[19] S. Srinivasan and P.F. Reynolds Jr., "Elastic Time", ACM Transactions on Modeling and Computer Simulation, Vol.8, No.2, 1998.
[20] D. West and K. Panesar, "Automatic Incremental State Saving", Proc. 10th Workshop on Parallel and Distributed Simulation (PADS'96), pp.78-85, May 1996.


More information

Adaptive Memory Allocations in Clusters to Handle Unexpectedly Large Data-Intensive Jobs

Adaptive Memory Allocations in Clusters to Handle Unexpectedly Large Data-Intensive Jobs IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 15, NO. 7, JULY 2004 577 Adaptive Memory Allocations in Clusters to Handle Unexpectedly Large Data-Intensive Jobs Li Xiao, Member, IEEE Computer

More information

DSP/BIOS Kernel Scalable, Real-Time Kernel TM. for TMS320 DSPs. Product Bulletin

DSP/BIOS Kernel Scalable, Real-Time Kernel TM. for TMS320 DSPs. Product Bulletin Product Bulletin TM DSP/BIOS Kernel Scalable, Real-Time Kernel TM for TMS320 DSPs Key Features: Fast, deterministic real-time kernel Scalable to very small footprint Tight integration with Code Composer

More information

sequence number trillian:1166_==>_marvin:3999 (time sequence graph)

sequence number trillian:1166_==>_marvin:3999 (time sequence graph) Fixing Two BSD TCP Bugs Mark Allman Sterling Software NASA Lewis Research Center 21000 Brookpark Rd. MS 54-2 Cleveland, OH 44135 mallman@lerc.nasa.gov CR-204151 Abstract 2 Two Segment Initial Window This

More information

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad nc. Application Note AN1801 Rev. 0.2, 11/2003 Performance Differences between MPC8240 and the Tsi106 Host Bridge Top Changwatchai Roy Jenevein risc10@email.sps.mot.com CPD Applications This paper discusses

More information

Introduction to Operating Systems. Chapter Chapter

Introduction to Operating Systems. Chapter Chapter Introduction to Operating Systems Chapter 1 1.3 Chapter 1.5 1.9 Learning Outcomes High-level understand what is an operating system and the role it plays A high-level understanding of the structure of

More information

COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION. Jun Wang Carl Tropper. School of Computer Science McGill University Montreal, Quebec, CANADA H3A2A6

COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION. Jun Wang Carl Tropper. School of Computer Science McGill University Montreal, Quebec, CANADA H3A2A6 Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds. COMPILED CODE IN DISTRIBUTED LOGIC SIMULATION Jun Wang Carl

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

Optimistic Implementation of. Bulk Data Transfer Protocols. John B. Carter. Willy Zwaenepoel. Rice University. Houston, Texas.

Optimistic Implementation of. Bulk Data Transfer Protocols. John B. Carter. Willy Zwaenepoel. Rice University. Houston, Texas. Optimistic Implementation of Bulk Data Transfer Protocols John B. Carter Willy Zwaenepoel Department of Computer Science Rice University Houston, Texas Abstract During a bulk data transfer over a high

More information

Routing without Flow Control Hot-Potato Routing Simulation Analysis

Routing without Flow Control Hot-Potato Routing Simulation Analysis Routing without Flow Control Hot-Potato Routing Simulation Analysis Lawrence Bush Computer Science Department Rensselaer Polytechnic Institute Troy, New York December 5, 22 Abstract This paper presents

More information

Analyzing Memory Access Patterns and Optimizing Through Spatial Memory Streaming. Ogün HEPER CmpE 511 Computer Architecture December 24th, 2009

Analyzing Memory Access Patterns and Optimizing Through Spatial Memory Streaming. Ogün HEPER CmpE 511 Computer Architecture December 24th, 2009 Analyzing Memory Access Patterns and Optimizing Through Spatial Memory Streaming Ogün HEPER CmpE 511 Computer Architecture December 24th, 2009 Agenda Introduction Memory Hierarchy Design CPU Speed vs.

More information

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL Jun Sun, Yasushi Shinjo and Kozo Itano Institute of Information Sciences and Electronics University of Tsukuba Tsukuba,

More information

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware. Department of Computer Science, Institute for System Architecture, Operating Systems Group Real-Time Systems '08 / '09 Hardware Marcus Völp Outlook Hardware is Source of Unpredictability Caches Pipeline

More information

Memory. From Chapter 3 of High Performance Computing. c R. Leduc

Memory. From Chapter 3 of High Performance Computing. c R. Leduc Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor

More information

Towards Adaptive Caching for Parallel and Discrete Event Simulation

Towards Adaptive Caching for Parallel and Discrete Event Simulation Towards Adaptive Caching for Parallel and Discrete Event Simulation Abhishek Chugh and Maria Hybinette Computer Science Department The University of Georgia 415 Boyd Graduate Studies Research Center Athens,

More information

A Concurrency Control for Transactional Mobile Agents

A Concurrency Control for Transactional Mobile Agents A Concurrency Control for Transactional Mobile Agents Jeong-Joon Yoo and Dong-Ik Lee Department of Information and Communications, Kwang-Ju Institute of Science and Technology (K-JIST) Puk-Gu Oryong-Dong

More information

Q.1 Explain Computer s Basic Elements

Q.1 Explain Computer s Basic Elements Q.1 Explain Computer s Basic Elements Ans. At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some

More information

Module 11: I/O Systems

Module 11: I/O Systems Module 11: I/O Systems Reading: Chapter 13 Objectives Explore the structure of the operating system s I/O subsystem. Discuss the principles of I/O hardware and its complexity. Provide details on the performance

More information

Chair for Network Architectures and Services Prof. Carle Department of Computer Science TU München. Parallel simulation

Chair for Network Architectures and Services Prof. Carle Department of Computer Science TU München. Parallel simulation Chair for Network Architectures and Services Prof. Carle Department of Computer Science TU München Parallel simulation Most slides/figures borrowed from Richard Fujimoto Parallel simulation: Summary/Outline

More information

Main Points of the Computer Organization and System Software Module

Main Points of the Computer Organization and System Software Module Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network Congestion-free Routing of Streaming Multimedia Content in BMIN-based Parallel Systems Harish Sethu Department of Electrical and Computer Engineering Drexel University Philadelphia, PA 19104, USA sethu@ece.drexel.edu

More information

residual residual program final result

residual residual program final result C-Mix: Making Easily Maintainable C-Programs run FAST The C-Mix Group, DIKU, University of Copenhagen Abstract C-Mix is a tool based on state-of-the-art technology that solves the dilemma of whether to

More information

1 Introduction A mobile computing system is a distributed system where some of nodes are mobile computers [3]. The location of mobile computers in the

1 Introduction A mobile computing system is a distributed system where some of nodes are mobile computers [3]. The location of mobile computers in the Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems Ravi Prakash and Mukesh Singhal Department of Computer and Information Science The Ohio State University Columbus, OH 43210. e-mail:

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Ninth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

FM24C Kb FRAM Serial Memory Features

FM24C Kb FRAM Serial Memory Features Preliminary FM24C512 512Kb FRAM Serial Memory Features 512Kbit Ferroelectric Nonvolatile RAM Organized as 65,536 x 8 bits High Endurance 10 Billion (10 10 ) Read/Writes 45 year Data Retention NoDelay Writes

More information

CSC 553 Operating Systems

CSC 553 Operating Systems CSC 553 Operating Systems Lecture 1- Computer System Overview Operating System Exploits the hardware resources of one or more processors Provides a set of services to system users Manages secondary memory

More information

Efficiency and memory footprint of Xilkernel for the Microblaze soft processor

Efficiency and memory footprint of Xilkernel for the Microblaze soft processor Efficiency and memory footprint of Xilkernel for the Microblaze soft processor Dariusz Caban, Institute of Informatics, Gliwice, Poland - June 18, 2014 The use of a real-time multitasking kernel simplifies

More information

Blocking vs. Non-blocking Communication under. MPI on a Master-Worker Problem. Institut fur Physik. TU Chemnitz. D Chemnitz.

Blocking vs. Non-blocking Communication under. MPI on a Master-Worker Problem. Institut fur Physik. TU Chemnitz. D Chemnitz. Blocking vs. Non-blocking Communication under MPI on a Master-Worker Problem Andre Fachat, Karl Heinz Homann Institut fur Physik TU Chemnitz D-09107 Chemnitz Germany e-mail: fachat@physik.tu-chemnitz.de

More information

TR-CS The rsync algorithm. Andrew Tridgell and Paul Mackerras. June 1996

TR-CS The rsync algorithm. Andrew Tridgell and Paul Mackerras. June 1996 TR-CS-96-05 The rsync algorithm Andrew Tridgell and Paul Mackerras June 1996 Joint Computer Science Technical Report Series Department of Computer Science Faculty of Engineering and Information Technology

More information