TIME WARP PARALLEL LOGIC SIMULATION ON A DISTRIBUTED MEMORY MULTIPROCESSOR. Peter Luksch, Holger Weitlich
Department of Computer Science, Munich University of Technology, P.O. Box, D-W-8-München, Germany
luksch@informatik.tu-muenchen.de
To appear in SCS European Simulation Conference, Lyon, June 7-9, 99

ABSTRACT

In this paper we describe a Time Warp based parallel implementation of an event-driven logic simulator on a distributed memory multiprocessor (iPSC/86). The basic Time Warp mechanism has been complemented with an optimized method for incremental state saving and with a mechanism that optimizes re-simulation of a rolled-back period of simulated time, which is especially worthwhile for complex elements. In addition to static partitioning, where elements are distributed to partitions either randomly or by a min-cut algorithm, our implementation supports dynamic repartitioning. For our measurements, we used a set of well-known benchmark circuits. Speedups proved to be strongly dependent on the circuit being simulated, on its input stimuli, and on the way the circuit is partitioned. However, one observation has been made with most of the workloads: the simulators' LVTs tend to diverge extremely throughout the simulation. Even though memory requirements for state saving have been minimized, simulators whose LVT is far ahead of the other processes run out of memory for larger circuits. We therefore had to limit Time Warp's optimism by preventing simulators from getting too far ahead of GVT.

THE TIME WARP MECHANISM

The Time Warp mechanism [Jefferson, 98] is an optimistic synchronisation protocol that can be used to synchronize parallel discrete event simulation based on model partitioning. The protocol, however, is not restricted to this application. Each process simulates a partition of the circuit's elements and has its own simulation time (LVT) and event list.
Whenever it generates an event affecting a signal that connects to remote partitions, it sends an event message to the corresponding processes. Processes simulate their partition based on their current information about signal values, which, however, may be incorrect because events with time stamps in the local past of the receiving simulators may arrive from remote partitions. Such an event message is referred to as a straggler. Stragglers as well as anti-messages (i.e. messages informing a process that an earlier event message was incorrect) cause the simulator to roll back, i.e. to return to the point where simulation began to be incorrect. Rollback involves restoring the local state for the point in simulated time to which the simulator rolls back and cancelling all messages sent in the rolled-back period. Therefore, the simulation process has to store information about its local state and the messages it has sent.

(This work has been partially funded by the DFG ("Deutsche Forschungsgemeinschaft", German science foundation) under contract No. SFB, TP A.)

One method to undo messages is to send anti-messages immediately upon rollback (aggressive cancellation). The alternative approach, lazy cancellation, is based on the observation that a large portion of events can be expected to be generated again during re-simulation of the rolled-back time interval. Therefore, an anti-message is sent only if the corresponding event is guaranteed not to be generated again. In our implementation, we use lazy cancellation.

Global progress of a Time Warp simulation is measured by the global virtual time (GVT), which is the minimum of the local simulation times and the time stamps of all unprocessed events in the system. A number of GVT algorithms have been proposed in the literature [Samadi, 98, Lin & Lazowska, 99, Bauer & Sporrer, 99].
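The GVT definition above can be illustrated with a minimal sketch. The names `compute_gvt`, `lvts` and `pending_timestamps` are hypothetical and not taken from the simulator; the sketch only evaluates the definition on values that are already collected in one place, whereas the cited distributed algorithms have to gather them by message passing.

```python
from typing import Iterable, Sequence


def compute_gvt(lvts: Sequence[float],
                pending_timestamps: Iterable[float]) -> float:
    """Return GVT: the minimum of all local simulation times (LVTs) and of
    the time stamps of all events that have not been processed yet."""
    candidates = list(lvts) + list(pending_timestamps)
    if not candidates:
        raise ValueError("need at least one LVT or one pending event")
    return min(candidates)


# Three simulators with LVTs 40, 25 and 90. An unprocessed event with
# time stamp 18 lies below every LVT, so it determines GVT.
gvt = compute_gvt([40, 25, 90], [18, 60])
print(gvt)
```

No simulator can be rolled back to a time earlier than the value returned here, which is why GVT serves as the measure of global progress.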
TIME WARP PARALLELIZATION OF A LOGIC SIMULATOR

The basis for our parallel implementation is a gate-level logic simulator that implements most of today's state-of-the-art techniques for modelling digital circuits [Krodel & Antreich, 99]. It uses a six-valued logic and allows ambiguity delays to be modelled explicitly. The program is written in C for
a UNIX environment. The parallel program has been implemented on iPSC/86 and iPSC/ multiprocessors using MMK, a parallel programming library designed within project SFB, TP A.

Communication. Figure displays the performance of the communication system as a function of message length. For each message there is a significant startup time of s on the iPSC/86 and ms on the iPSC/, which is independent of message length. The latency is due to the circuit-switched message passing on the iPSCs. Therefore, a given amount of information should be transferred using few long messages instead of many short ones, i.e. several event messages have to be combined into one message that is transmitted by the communication system. On the other hand, the synchronisation protocol requires remote partitions to be informed about events as soon as possible in order to prevent simulators from having to roll back over long periods of time because they were informed too late about the incorrectness of their computation.

In our implementation communication is controlled by a buffering mechanism that accounts for both of these conflicting requirements. There is one buffer for each remote partition, to which event messages for that partition are written. After each step (i.e. one iteration of the loop of signal value update followed by the evaluation of fanout elements), the buffers are checked to see whether they have reached some minimum length or contain events that were generated more than a maximum number of steps earlier. The number of steps the simulator executes while an event stays in the buffer is referred to as the event's age. If a buffer is long enough or contains events that have been held for too long, its contents are sent. Synchronisation efficiency can be optimized for a given multiprocessor system by adjusting these parameters.

State Saving.
The state of a simulator can be saved either periodically as a whole (checkpointing) or incrementally by storing state changes. Since in logic simulation each event changes only a very small portion of the state, checkpointing would result in inefficient memory usage. Moreover, the target system, like most of today's parallel computers, has only limited physical memory and no virtual memory on the nodes. As state information is quite large when simulating big circuits, incremental state saving has to be used. Memory requirements are reduced further by saving only the first change of a signal value that occurs while processing an active point in simulated time. Since a rolled-back point in simulated time is always re-simulated completely, this state information is sufficient.

[Figure : MMK communication performance as a function of message length. Axes: communication rate [MB/sec] vs. message length [kb]; MMK remote send operation on iPSC/86 and iPSC/.]

Global Virtual Time. We have implemented two GVT algorithms: Samadi's [Samadi, 98] and an algorithm proposed by Lin and Lazowska [Lin & Lazowska, 99]. In contrast to the algorithm by Lin and Lazowska, Samadi's simple GVT algorithm requires all processes to stop simulation during GVT computation. Our implementation of inter-simulator communication permits processes to continue local simulation. However, they must refrain from sending any event messages during GVT computation. Simulation with Samadi's algorithm proved to be faster than with Lin and Lazowska's method because the latter requires more messages to be sent.

Optimized Re-Simulation after Rollback. Lazy cancellation is based on the optimistic assumption that most events will be generated again when re-simulating the rolled-back interval. While lazy cancellation prevents unnecessary re-evaluations in remote partitions, local computation is redone completely. For complex elements like PLAs or even microprocessors, it is desirable to avoid unnecessary evaluations in the local partition, too.
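The lazy cancellation rule referred to above can be sketched as follows. This is an illustration under stated assumptions, not the simulator's actual code: the class `LazyCanceller`, the message layout `(send_time, signal, value, receive_time)` and the `outbox` list standing in for the communication system are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical message layout: (send_time, signal, value, receive_time)
Message = Tuple[int, str, int, int]


@dataclass
class LazyCanceller:
    sent: List[Message] = field(default_factory=list)     # output history
    doubtful: Set[Message] = field(default_factory=set)   # sent in a rolled-back period
    outbox: List[Tuple[str, Message]] = field(default_factory=list)

    def rollback(self, to_time: int) -> None:
        """Do not cancel anything yet: only mark messages sent at or
        after to_time as doubtful."""
        self.doubtful |= {m for m in self.sent if m[0] >= to_time}
        self.sent = [m for m in self.sent if m[0] < to_time]

    def emit(self, msg: Message) -> None:
        """Called whenever (re-)simulation generates an event message."""
        if msg in self.doubtful:
            self.doubtful.discard(msg)        # regenerated: receiver already has it
        else:
            self.outbox.append(("event", msg))
        self.sent.append(msg)

    def advance(self, lvt: int) -> None:
        """Messages whose send time has been re-simulated without being
        regenerated can no longer reappear, so anti-messages are sent."""
        stale = {m for m in self.doubtful if m[0] < lvt}
        for m in sorted(stale):
            self.outbox.append(("anti", m))
        self.doubtful -= stale


lc = LazyCanceller()
lc.emit((10, "a", 1, 12))
lc.emit((20, "b", 0, 23))
lc.outbox.clear()            # pretend both messages were delivered
lc.rollback(10)
lc.emit((10, "a", 1, 12))    # identical event: nothing is sent
lc.emit((20, "b", 1, 23))    # different value: sent as a new event
lc.advance(30)               # old (20, "b", 0, 23) is cancelled only now
print(lc.outbox)
```

In this run only one new event message and one anti-message leave the simulator after the rollback; aggressive cancellation would have sent two anti-messages followed by two event messages.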
In order to skip renewed element evaluation, the simulator must know the events that were generated when the element under consideration was evaluated in the preceding simulation, i.e. the causality relation between events needs to be stored during "normal" simulation. For each event that is executed, pointers to the events generated when evaluating the fanout elements of the signal affected by the event are stored together with the identity of the fanout element whose evaluation caused them to be created, the element's input signal values and its internal state (if any). During rollback the simulator marks local events instead of deleting them as is done in the basic Time Warp mechanism. If during re-simulation of a rolled-back period in simulated
time an element is due to be evaluated because of an event that also occurred in the previous simulation, the simulator has to check whether the element's current inputs and its internal state are the same as just before the corresponding evaluation in the previous simulation. If so, the element need not be evaluated. Instead, the events caused by its previous evaluation can be re-scheduled.

Partitioning. Before simulation, circuits are partitioned based on their topology. We use random partitioning and a min-cut algorithm that generalizes Fiduccia and Mattheyses' bipartitioning method [Vijayan, 989]. At runtime, dynamic repartitioning takes the activity of elements and signals into account in order to distribute work evenly among the processors. Each simulator reports to the GVT process the time of the earliest unprocessed event in its partition that it knows about. These time stamps reflect the simulator's load. A simulator reporting a low time stamp lags far behind the others in its simulation, i.e. it is heavily loaded. A lightly loaded simulator will advance its LVT quickly and thus report a high time stamp. In principle, elements should be moved from the "slowest" simulator to the "fastest" simulator. Elements are selected according to their complexity and their activity. In addition, the effect of possible element migrations on communication topology must be taken into account.

Time Warp synchronisation introduces an additional problem: in order to be able to roll back simulation, a simulator whose partition has been assigned new elements has to know the state information associated with the signals connected to these elements. If a signal was not in the partition before repartitioning, its "history" must be transferred, too. In our implementation the GVT process tells the "slowest" simulator to move elements out of its partition.
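How the reported time stamps identify the "slowest" and "fastest" simulators can be sketched in a few lines. The function name `pick_migration_pair` and the dictionary of reports are assumptions made for illustration; the sketch covers only the in-principle rule stated above and ignores element complexity, activity and communication topology.

```python
from typing import Dict, Optional, Tuple


def pick_migration_pair(reported: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """reported maps a simulator name to the time stamp of its earliest
    unprocessed event. Returns (slowest, fastest), or None if all
    simulators report the same time stamp (no imbalance visible)."""
    if len(reported) < 2:
        raise ValueError("need at least two simulators")
    slowest = min(reported, key=reported.get)   # lowest time stamp: heavily loaded
    fastest = max(reported, key=reported.get)   # highest time stamp: lightly loaded
    if reported[slowest] == reported[fastest]:
        return None
    return slowest, fastest


pair = pick_migration_pair({"sim0": 120, "sim1": 45, "sim2": 300})
print(pair)
```

Here `sim1` would be asked to give up elements, and `sim2` is the natural first candidate to receive them.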
This process will determine the target partition according to the other simulators' load values (provided by the GVT process) and the number of events that it has sent to the other simulator in the past. It selects elements whose outputs are already in the receiver's partition whenever this is possible. For each element that is a candidate to be moved, the effect that moving it would have on communication topology is considered. Migrations resulting in minimal communication costs (i.e. the number of inter-partition signals weighted by their activities) are preferred. Also, moving a few highly active elements is preferred to moving more but less active elements. Element and signal activities are measured by counting the number of evaluations of each element and the number of events for each signal.

EXPERIMENTAL RESULTS

The parallel simulator has been run with several of the ISCAS-89 benchmark circuits. Performance measurements were done by source code instrumentation. Times for different subtasks were measured using the iPSC's hardware clock. Additional statistics were collected by counters. The dynamic behaviour of LVTs on different nodes was observed using the TOPSYS software monitor [Bemmerl et al., 99].

In most of our simulation runs the simulators' LVTs diverged extremely. During simulation of, units of time LVTs diverge by up to more than, time units, i.e. nearly half the total period being simulated. Even though memory consumption has been minimized by incremental state saving, simulators run out of memory for larger circuits or longer input sequences. We therefore had to limit Time Warp's optimism: simulators are prevented from advancing their LVTs too far ahead of GVT by suspending simulation if a maximum value for memory consumption is exceeded.

Speedup. Speedup does not scale linearly with the number of simulators. Instead, the curves show peaks and valleys (see figure ). Although it is not a straight line, the curve clearly has a positive slope.
In addition to speedup, the following statistics are displayed: the time spent in rollback, the time for communication and for processing external events, and the time during which the simulation is suspended to prevent processes from running out of memory. For each measurement (i.e. number of simulators), the figure displays the maximum value over all partitions involved. There is a clear correlation between good speedup on the one hand and low rollback costs and rarely suspended simulation on the other. The correspondence between peaks in speedup and valleys in communication is less distinct. For more than two nodes, memory consumption for state information always reaches the limits set by the nodes' physical memory capacity.

We have also gathered statistics on communication. Having set the parameters for event message buffering to a maximum event age of and a minimum message length of events, we found the average message length to be in the range of to kb for the simulation of c. For this message length, effective bandwidth is still far below its maximum value (see fig. ). Communication performance can be optimized by increasing the maximum event age parameter. For larger circuits, however, the buffer length can be expected to increase since the number of events generated in each simulation step will increase as partitions get larger.

[Figure : GVT and LVTs vs. real time (trace generated by the TOPSYS software monitor). Axes: simulated time vs. real time [sec]; circuit: c; clock resolution: ms.]

For all our test runs, only the time for the simulation proper has been measured. Input and output files had to be accessed using Intel's remote hosting software. Therefore, I/O has been extremely slow. Unfortunately, it was impossible to use the concurrent file system (CFS) because its use is not supported by MMK. However, since parallelization aims at accelerating computations, not I/O, omitting I/O times seems to be justified for the evaluation of a synchronisation protocol.

Monitoring LVTs and GVT. LVTs and GVT have been observed with the help of TOPSYS' distributed monitoring system. An inspect task on one node periodically broadcasts display commands for the LVT and GVT variables to the nodes and stores their replies in a buffer that is written to a file after simulation has finished. The monitoring technique provides the best possible approximation to a global time base in the distributed memory multiprocessor. Figure shows a trace from the simulation of c. Samadi's simple algorithm has been shown to approximate GVT sufficiently well. Its main benefit is the small number of messages per GVT computation. Its disadvantage of having to stop simulation during GVT computation is mitigated by event message buffering, which allows local simulation to proceed as long as no event messages are sent while processes are computing their local minima.

Optimized Simulation after Rollback. Reducing the number of element evaluations during re-simulation of a rolled-back period of simulated time can significantly increase Time Warp's performance if elements are complex to evaluate. Its benefit, however, varies strongly with the number of processes in the parallel simulation. For an element evaluation time of ms the maximum increase in speedup that we have observed in the simulation of c is a factor of more than two (see fig.
). For some numbers of partitions, however, there was no noticeable benefit from optimized re-simulation.

CONCLUSIONS AND FUTURE WORK

Measurements have shown that Time Warp's efficiency strongly depends on an equal distribution of the computational load among processes. Although elements were evenly distributed among processes by static partitioning, LVTs diverge extremely. This observation emphasizes the need for dynamic repartitioning. We have not yet been able to analyze Time Warp's behaviour and the effects of our optimizations comprehensively because a detailed study requires a very large number of measurements in which each of the numerous parameters affecting Time Warp's performance is modified in a controlled way. Moreover, program development and performance measurements were impeded by the fact that our iPSCs have been very unreliable for more than a year now (and still are). Hoping for the system's reliability to improve in the future, we intend to carry out more measurements, especially in order to evaluate our optimizations to the basic Time Warp mechanism.
[Figure : simulation of c (multiple delays). Panels: speedup; rollback time [sec]; communication + processing external events [sec]; simulation suspended [sec].]

[Figure : The effect of optimized re-simulation. Speedup for the basic Time Warp method and for optimized re-simulation; circuit: c (unit delay), element evaluation: ms.]

REFERENCES

[Bauer & Sporrer, 99] Bauer, H. & Sporrer, C. (99). Distributed Logic Simulation and an Approach to Asynchronous GVT-Calculation. In Proceedings of the 99 SCS Western Simulation Multiconference on Parallel and Distributed Simulation (PADS9) (pp. -9). Newport Beach, California.

[Bemmerl et al., 99] Bemmerl, T., Lindhof, R., & Treml, T. (99). The Distributed Monitor System of TOPSYS. In H. Burkhart (Ed.), Proceedings of CONPAR9 VAPP IV, volume 7 of LNCS (pp. 76-76). Zurich, Schweiz: Springer-Verlag.

[Jefferson, 98] Jefferson, D. (98). Virtual Time. ACM Transactions on Programming Languages and Systems, 7(), -.

[Krodel & Antreich, 99] Krodel, T. & Antreich, K. (99). An Accurate Model for Ambiguity Delay Simulation. In 7th ACM/IEEE Design Automation Conference (pp. -7).

[Lin & Lazowska, 99] Lin, Y.-B. & Lazowska, E. (99). Determining the Global Virtual Time in a Distributed Simulation. In Proceedings of the 99 International Conference on Parallel Processing, volume III (pp. -9).

[Luksch, 99] Luksch, P. (99). Parallele Logiksimulation auf Multiprozessoren mit verteiltem Speicher. In H. Fuss & P. Schwarz (Eds.), 8. Workshop Simulationsmethoden und -Sprachen für verteilte Systeme und parallele Prozesse, volume 7 of ASIM-Mitteilungen. Dresden: ASIM.

[Samadi, 98] Samadi, B. (98). Distributed Simulation, Algorithms and Performance Analysis. Technical Report, University of California, Los Angeles (UCLA).

[Vijayan, 989] Vijayan, G. (989). Min-Cost Partitioning on a Tree Structure and Applications. In 6th ACM/IEEE Design Automation Conference (pp. 77-77).

[Weitlich, 99] Weitlich, H. (99). Parallele Logiksimulation nach der Time-Warp-Methode auf einem Multiprozessorsystem mit verteiltem Speicher. Diplomarbeit, Technische Universität München, Institut für Informatik, München.
Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications Michael Eberl 1, Wolfgang Karl 1, Carsten Trinitis 1 and Andreas Blaszczyk 2 1 Technische Universitat Munchen
More information8ns. 8ns. 16ns. 10ns COUT S3 COUT S3 A3 B3 A2 B2 A1 B1 B0 2 B0 CIN CIN COUT S3 A3 B3 A2 B2 A1 B1 A0 B0 CIN S0 S1 S2 S3 COUT CIN 2 A0 B0 A2 _ A1 B1
Delay Abstraction in Combinational Logic Circuits Noriya Kobayashi Sharad Malik C&C Research Laboratories Department of Electrical Engineering NEC Corp. Princeton University Miyamae-ku, Kawasaki Japan
More informationAlgorithms Implementing Distributed Shared Memory. Michael Stumm and Songnian Zhou. University of Toronto. Toronto, Canada M5S 1A4
Algorithms Implementing Distributed Shared Memory Michael Stumm and Songnian Zhou University of Toronto Toronto, Canada M5S 1A4 Email: stumm@csri.toronto.edu Abstract A critical issue in the design of
More informationSystem Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics
System Models Nicola Dragoni Embedded Systems Engineering DTU Informatics 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models Architectural vs Fundamental Models Systems that are intended