The recursive decoupling method for solving tridiagonal linear systems


1 Loughborough University Institutional Repository The recursive decoupling method for solving tridiagonal linear systems This item was submitted to Loughborough University's Institutional Repository by the/an author. Additional Information: A Master's Thesis submitted in partial fulfilment of the requirements for the award of Master of Philosophy of the Loughborough University of Technology. Metadata Record: Publisher: Giulia Spaletta Please cite the published version.

2 This item was submitted to Loughborough University as an MPhil thesis by the author and is made available in the Institutional Repository ( under the following Creative Commons Licence conditions. For the full text of this licence, please go to:

3 LOUGHBOROUGH UNIVERSITY OF TECHNOLOGY LIBRARY [author/filing title and accession/copy number stamp]

4

5 The Recursive Decoupling Method for Solving Tridiagonal Linear Systems by GIULIA SPALETTA, Dott. A Master's Thesis Submitted in partial fulfilment of the requirements for the award of Master of Philosophy of the Loughborough University of Technology September, 1991 Supervisor: Professor D. J. EVANS, D.Sc. by Giulia Spaletta, 1991

6 Loughborough University of Technology Library [date and accession number stamp]

7 Declaration I declare that this thesis is a record of research work carried out by me, and that it is my own composition. I also certify that neither this thesis nor the original work contained therein has been submitted to this or any other institution for a higher degree. G. SPALETTA

8 Acknowledgements I wish to express my deepest and most sincere gratitude to Professor D. J. Evans for giving me the opportunity to carry out this work, in the first instance; and subsequently for his friendly and unfailing guidance, continuous help and inspiring enthusiasm throughout this research. Finally, for his invaluable advice and infinite patience during the writing of this thesis. Thanks to Professor Evans, I can consider the period I spent studying under his supervision as one of the most fruitful experiences both in my academic career and life. I also wish to thank: - Mrs. J. Poulton for her professional help and constant, friendly presence; - Mr. M. Sofroniou for his interest and fruitful collaboration, for many stimulating discussions, not to mention his typing of the whole of chapter 5 (the core of this thesis) and the improvements he has brought to the original manuscript. Most of all, I wish to express to him my special indebtedness for being such a patient, true friend; - Miss H. Y. Sanossian, Mr. N. M. Bahoshy and Dr. A. Osbaldestin for their active co-operation and constant support as colleagues, but most of all as very good friends; - Dr. W. S. Yousif and Mr. G. S. Samra for all their technical advice (and their infinite patience); - Miss L. Howard for her help in typing part of this thesis; - all my colleagues and the staff of the Department of Computer Studies. Finally, I thank my family for their love and understanding.

9 Abstract

10 Abstract The work presented in this thesis mainly concerns the analysis of parallel algorithms for the solution of tridiagonal linear systems and the design of a new tridiagonal equation solver, which can be run on a MIMD (Multiple Instruction Multiple Data stream) type parallel computer, in particular the Balance 8000 Sequent system at Loughborough University of Technology. In the first chapter, an introduction to the existing computer models is given, together with a brief description of the process that has led from the uniprocessor machine to the development of different parallel architectures. Emphasis is given to MIMD shared memory systems. In this respect, the main characteristics of the Sequent system are presented, as well as the main programming features supported by the Balance Operating System, the Dynix. The second chapter presents the fundamentals of parallel programming on the Balance 8000 computer. Terms and concepts that are specific to multitasking programs are introduced. Also, the two multitasking methods, data partitioning and function partitioning, are outlined. In the same chapter, we investigate problems (such as program dependencies, sharing of data, synchronization of concurrent processes) arising from the adaptation of an application to parallel versions, and the related programming techniques. Some of the parallel programming tools are described, with particular attention to the so-called "data partitioning with Sequent Fortran" and "data partitioning with Dynix". Chapter 3 starts with an outline of the most well known algorithms for the solution of tridiagonal systems, one of which is analysed in more detail in chapter 4. Parameters used to evaluate performance are defined, such as

11 speed-up, efficiency and computational complexity, together with the basic principles of Parallel Numerical Analysis. In the fourth chapter, the Wang tridiagonal system solver is presented. We have considered a variant of this partitioning method suitable for MIMD architectures, and we have modified it to run on the Balance 8000. Test matrices have then been used, in order to evaluate the performance of the Wang routine on the Balance computer and to form a comparison with the new Recursive Decoupling routine of chapter 5. The fifth chapter constitutes the core of the whole thesis. The new algorithm also belongs to the class of partitioning methods, since it is based on repeated partitioning of the coefficient matrix into 2x2 submatrices; this strategy, together with a rank-one updating procedure, allows us to calculate the solution explicitly, by solving independent sets of subsystems. Furthermore, the method turns out to be intrinsically parallel and suitable for solution on multiprocessor architectures. The performance of the Recursive Decoupling routine on the Balance 8000 computer has been tested by using the same example matrices as those used to test the Wang method. The thesis concludes with a chapter summarizing the main results and suggestions for further research. Keywords: Tridiagonal Linear Equations; Shared Memory Parallel Computers; Sequent Balance 8000 Multiprocessor; Partition Method; Recursive Decoupling Method; Parallel Numerical Analysis.

12 Contents

13 Contents
Acknowledgements
Abstract
1. Introduction to Parallel Computers
1.1. Introduction
1.2. A Classification of Computer Models
1.3. Shared Memory Systems
1.4. Parallel Numerical Analysis and the Flynn Classification of Computer Models
1.5. The Balance 8000 Parallel Processing System
2. Principles of Parallel Programming on the Balance 8000
2.1. Introduction to Parallel Programming on the Balance 8000
2.2. Parallelizable Applications: Homogeneous and Heterogeneous Multitasking
2.3. Program Dependencies
2.4. Elements of Parallel Programming
2.5. Parallel Programming Tools
3. Parallel Numerical Analysis: the Tridiagonal Linear Systems Problem
3.1. Introduction
3.2. Performance Evaluation Parameters: Speed-up and Computational Complexity
3.3. Fundamentals of Parallel Numerical Analysis

14
4. The Wang Partitioning Method
4.1. Introduction
4.2. The Wang Algorithm
4.3. The Wang Fortran Routine
4.4. Numerical Experiments and Remarks
5. The Recursive Decoupling Method
5.1. Introduction to the Recursive Decoupling Method
5.2. The Partitioning Process
5.3. The Recursive Decoupling Process
5.4. The Recursive Decoupling Algorithm
5.5. An Analytical Example
5.6. A Numerical Example
5.7. The Recursive Decoupling Routine
5.8. Numerical Experiments and Remarks
6. Conclusions and Further Work
6.1. Conclusions and Suggestions for Further Work
References
Appendix. Programs Listings

15 1. Introduction to Parallel Computers

16 1.1. Introduction In the last few years we have seen an explosion in the interest in parallel processors and parallel programming. The aim of parallel processing is to reduce the elapsed time needed to complete a job. This time will basically depend on the coding style, the architecture of the machine and the hardware implementation. The job of everybody in charge of software development (system designers, compiler and library writers, programmers) is to get the actual time required by the calculations as close as possible to the ideal. Tools have been developed to express the parallelism explicitly, either in the form of subroutine libraries or language extensions; furthermore, studies are still in progress concerning the automatic parallelization of sequential code. To date, the only automatic system available is limited to individual loops. Parallelism at a higher level must still be specified by the programmer. 1.2. A Classification of Computer Models A knowledge of the computer architecture and the hardware implementation is not essential to the programmer. However, when performance becomes critical, a good understanding of the hardware parallelism can be fundamental to the program's tuning. In spite of all the efforts made to write portable programs, some algorithms will run efficiently on certain architectures and poorly on others. The situation is worse for parallel processors than for uniprocessors, due to the wider variety of architectures. 1

17 We can state a classification of different computer models, based on those aspects of the hardware implementation of parallelism that most affect the coding style [16]: 1) shared memory systems (figure 1.1); 2) distributed memory systems, also called message passing systems (figure 1.2 and figure 1.4); 3) hybrid systems (figure 1.3). We are mostly interested in the first type of computer architecture, therefore we shall present a brief study of this kind of parallel machine. FIGURE 1.1. Schematic of a shared memory system (several CPUs connected to a single memory). FIGURE 1.2. Schematic of a distributed memory system: fully interconnected message passing machine. FIGURE 1.3. Schematic of a hybrid machine (CPU-memory pairs linked by a connection network). 2

18 FIGURE 1.4. Distributed memory systems. (a) Ring connection machine. (b) Star connection machine. (c) Mesh machine. (d) Hypercube of order 3. (M: memory.) 3

19 1.3. Shared Memory Systems A shared memory machine has a single global memory accessible to all processors. Each processor may have some local memory (such as the "registers" on the Cray X-MP or the "cache" on the IBM 3090). The data organization inside the memory (global and local memory) is totally transparent to the user. The data access time is independent of the processor making the request. This is not to say that there is no memory contention: problems like page faults, memory bank conflicts, etc., still affect the performance. Algorithms are easy to design for shared memory systems. The data input on these machines is done as if running on a uniprocessor. On the other hand, programs are hard to debug. The most common type of error involves picking up wrong data from a global variable. There is no indication of when the error occurred, so the computing process continues, producing an erroneous final result. Data organization, therefore, is a key to parallel algorithms, even on a shared memory computer. Unfortunately, the most commonly used language for scientific purposes (Fortran) only allows quite simple data structures (just scalars and arrays), inducing the programmer to concentrate on program flow rather than on data management. The latest version of the Fortran language permits the use of a wider variety of structures and mechanisms. The data sharing specification, though, still constitutes a fundamental problem on shared memory systems, a problem that becomes even more critical when the parallelism is nested. 4

20 To simplify the programmer's job in this last case, most parallel processors provide only a single level of parallelism; that is to say, a master process is allowed to spawn subprocesses, while the subprocesses may not themselves spawn processes. Data is either known to all the created processes or is private. As a consequence of everything that has been said so far, the shared memory systems need a few language extensions. Firstly, the need arises to declare which data is private to each processor (local data) and which is known to all processors (global data). Secondly, synchronization is needed to prevent out-of-sequence access of different processors to the shared memory. The following considerations answer the above mentioned problems. The work in a shared memory machine is usually divided up in a so-called "fork-join" style: one process spawns the subprocesses (fork) and waits for them to finish (join). A means to restrict access to the code is needed; it is obtained by introducing the concept of a "critical section", which is a section of code executed by all processors, one at a time (such as in the case of a reduction variable). The concept of a "sequential section" is also introduced, which is a part of the code that has to be executed by only one processor and skipped by all the others. A sequential section is typically used to initialise global data. The easiest way of obtaining synchronization is the JOIN construct. When this is not possible, other constructs have to be used, such as "barriers" or 5

21 "semaphores". All these onepts will be more preisely illustrated in the following paragraphs. Finally, sme the ost of sharing data is very small in shared memory mahines, programmers often tend to parallelize the ode at the Do-loop level. In the ase of independent loop iterations, eah proessor an run a different subset of the loop index range, providing that eah index value is used exatly one. There are basially two ways of parallelizing a Do-loop. One way is to assign the first loop index value to the first arrived proessor, the seond index value to the seond proessor, and so on. Whenever a proessor has ompleted its task (its loop iteration), it returns to the top to get more work. In this way, an automati load balaning is realized. On the other hand, this way of obtaining a parallel Do-loop requires some form of synhronization, to assure that eah proessor gets a unique value of the loop index. A seond way to parallelize a Do-loop is to partition it so that eah proessor will do a ertain set of loop iterations. This way of proeeding is to be preferred if the work is naturally load balaned, and expeially if the synhronization ost is high Parallel Numerial Analysis and the Flynn Classifiation of Computer Models In lassial numerial analysis, a universal omputer model is represented by the Von Neumann mahine; this an be shematized as follows (figure 1.5): 6

22 FIGURE 1.5. Scheme of the Von Neumann machine (processor with Logic & Arithmetic Unit and Control Unit, input, output, memory and program). L. A. U.: Logic & Arithmetic Unit. C. U.: Control Unit. The main features of this universal computer are: a) digital representation of variables; b) serial processing, carried out according to the basic operations of arithmetic and logic; c) the program is a coded version of the algorithm to be implemented; d) data are held in the main memory. The algorithms of classical numerical analysis are then based on the Von Neumann model and entail a large number of elementary operations. This basic serial model has been taken as the starting point for all further developments, until the concept of "parallelism" began to be discussed. Parallelism was to be interpreted in the widest sense, that is not just to build a parallel digital computer, but also to create a body of numerical mathematics 7

23 which exploits the possibilities offered by parallel computers. Furthermore, the question arose as to whether there exists a maximal parallelism for a given range of problems. All these facts led to the need for a "parallel numerical analysis". Connected to this need was the problem of formulating a standard machine model for parallel numerical methods. During the last thirty years, the performance of serial machines has been improved greatly, due to the use of new technology and new designs. Parallel features have been introduced: in the organization of input/output channels; by overlapping the execution of instructions; by using interleaved storage techniques. Starting from these, new developments have been realized, leading to a truly parallel machine. Gains have been obtained, such as: 1) increase of computing speed; 2) possibility of solving problems too complex for serial computers; 3) exploitation of the inherent parallelism of some problems; 4) possibility of calculating a solution in real time. On the other hand, parallel computers present new difficulties, due to a complicated organization of the data and also due to machine dependent optimization for efficiency. At present, there is still no standard model for parallel systems. Such a model could be represented as shown in the following figure: 8

24 FIGURE 1.6. General configuration of a parallel computer with different levels of parallelism (M: memory; P: processor; the diagram shows control networks, a decoding-of-instructions and control unit, processors, stores and a data network). In the above diagram parallelism is possible at different levels: within the control unit; among processors; among the stores; in the data network. The above figure, though, is too general both for the building of a functioning computer and for the development of algorithms. Such a standard diagram can only be taken as a theoretical basis for parallel numerical analysis and parallel computers. 9

25 Depending on which level of parallelism is implemented in the diagram of figure 1.6, we can state the following classification of computers (this classification is due to Flynn [13]): 1) SISD machines: this is the Von Neumann model (Single Instruction - Single Data stream); 2) SIMD machines: array processors, pipeline processors and associative machines belong to this class (Single Instruction - Multiple Data stream); 3) MIMD machines: computers with several data processors and multiple processor systems belong to this class (Multiple Instruction - Multiple Data stream); 4) MISD machines: it has been proven that this type of organization (Multiple Instruction - Single Data stream) is equivalent to that of a Von Neumann machine; therefore the MISD class is considered empty. FIGURE 1.7. Scheme of a SISD computer (control unit, processor, memory). 10

26 FIGURE 1.8. Scheme of a SIMD computer. FIGURE 1.9. Scheme of a MISD computer (with data organisation network). FIGURE 1.10. Scheme of a MIMD computer (with data organisation network). NOTE. C: control unit; P: processor; M: memory. 11

27 In the context of parallel numerical analysis, all these computer models involve problems of rounding errors and their propagation, together with questions of numerical stability of the algorithm used. The SIMD organization, in particular, is suitable for classes of numerical problems such as: matrix operations; numerical integration of differential equations; Monte Carlo methods; pattern recognition. MIMD machines consist of a certain number m of independent processors P1, P2, ..., Pm, each having its own control unit (C1, C2, ..., Cm respectively). All these processors share, among other things, a number of input/output units and a main memory. At every instant each processor can carry out different instructions in parallel, that is to say all processors can operate simultaneously. Unlike the SIMD machines, the MIMD computers are considered as "general purpose" computers, because they are much more flexible than the SIMD ones and a greater variety of problems can be solved with them. As mentioned before, in this work we are only concerned with true multiprocessor shared memory machines; an example of this kind of machine is represented by the Balance 8000 computer. In the following paragraph we will briefly introduce the Balance architecture and the parallel programming capabilities of this system. 12

28 1.5. The Balance 8000 Parallel Processing System The Balance 8000 Sequent system is a multiprocessor shared memory machine and therefore it belongs to the MIMD class. Its main features are the following [24]: a) it is a true multiprocessor, consisting of multiple identical processors (CPUs); each CPU is a general purpose 32 bit microprocessor; b) it is a shared memory machine, i.e. there is a single common memory; an application can consist of multiple processes, all accessing shared data held in the memory; c) it is a tightly coupled machine, i.e. all processors share a single pool of memory; sharing memory is a natural way for two processes (running on different processors) to communicate with each other. Note that a tightly coupled multiprocessor can do more than assign non-interacting processes to different processors. It can also distribute a single process among many processors, so that each processor only executes part of the calculation. This is done, as we will see in the following chapter, to get a "speed-up" (that is, if a process takes time t to run on a uniprocessor, it could take time t/n to run on n processors); d) the Balance system has a symmetric architecture, since all processors are identical and can execute both user code and operating system code; e) there is a single high-speed Common Bus, used by all the processors, the memory modules and the input/output controllers: this is done to simplify the adding of processors, memory and input/output bandwidth; f) programs written for a uniprocessor system can run on the Balance system in such a way that it appears transparent to the user; that is, programs do 13

29 not need to be modified for multiprocessing support. Processors can be added or removed, with no need to modify either the operating system or the user applications; g) dynamic load balancing is provided automatically by the processors, to ensure that all processors are kept busy (in the most efficient possible way) as long as there are executable processes available; h) hardware support for mutual exclusion is provided, to enable the user to lock any section of physical memory, whenever there is the need for exclusive access to shared data structures. The following figure illustrates the components of a typical Balance 8000 system (taken from Sequent Computer Systems, "Balance 8000 System Technical Summary" [26]). [Figure: block diagram of a typical Balance 8000 system.] 14

30 Processors The Balance 8000 computer is designed to employ from two to twelve 32 bit CPUs, in a tightly coupled multiprocessing architecture. The CPUs are packaged two per board. To change the number of CPUs in the system it is necessary only to shut down the system and add or remove one or more dual-processor boards. No changes to the operating system or user applications are required. Memory The Balance 8000 can employ from 2 to 28 Megabytes of primary memory and it can provide 16 Megabytes of virtual address space per process. Memory is packaged in one-board or two-board memory modules. Memory can be added or removed in much the same way as the CPUs. SCSI bus The SCSI bus (Small Computer System Interface bus) is used to connect block-oriented devices, such as disk drives or tape drives, to the system. It supports high-speed, high-volume data transfer between memory and peripherals (disks, tape units). SCED board A Balance 8000 system can include from 1 to 4 SCED boards (SCSI Ethernet Diagnostic controller boards). Each SCED board can serve as host adaptor on a SCSI bus. In any Balance 8000 system one SCED board is designated the "master" SCED board: this master board connects to the system console and provides 15

31 power-up diagnostics. It also provides a power-up monitor for any program running on the main CPU, such as programs to boot the operating system. Multibus interface A Balance 8000 system can include up to 4 Multibus interfaces: they enable the system to incorporate any of a variety of peripherals and custom devices. The Balance 8000 System bus It is a high-performance data bus, tailored to multiprocessing in the sense that it provides the high bus bandwidth needed to support multiple CPUs. The Balance System bus is a 64 bit system bus which carries data among the CPUs, the memory modules and the peripheral subsystems. Network interfaces A Balance 8000 can connect to up to 4 other systems both in local area networks (one per SCED board), using Ethernet, and in wide-area networks, using ordinary telephone lines. The connection in local area networks facilitates communication among users as well as the sharing of files and devices. Each of the four connectable Ethernet local area networks can connect hundreds of systems, over distances of one mile or more. Furthermore, the Balance system networking capabilities include those common to all modern Unix systems. Terminal multiplexor This is a two-board module that resides on the Multibus and can connect to a terminal, printer, modem or other compatible device. 16

32 There can be up to 4 terminal multiplexors per Multibus. Operating system: the Dynix The Dynix operating system is a version of Unix 4.2BSD modified to exploit the Balance parallel architecture; differences between Dynix and Unix 4.2BSD are transparent to the user. Dynix also supports most utilities, libraries and system calls provided by Unix System V and, like other versions of Unix, it is a multi-user operating system. Two or more users can use the system simultaneously, while each user seems to have the system's undivided attention. This is achieved through an operating system technique called multiprogramming: a CPU moves from one process to another many times per second, so that the computer system is allowed to execute multiple unrelated processes (programs) concurrently. All the executable processes wait in a "run queue": when the CPU suspends or terminates the execution of one process, it switches to the process at the head of the run queue. The Dynix operating system uses the same technique, except that multiprogramming on Dynix is enhanced by the Balance multiprocessing architecture: in a Balance system a pool of processors is available to execute processes from the run queue. Dynix balances the system load among the available processors, keeping all processors busy as long as there is enough work available. Note that the Dynix operating system does multiprogramming for all the users automatically. Along with the multiprogramming technique, the Balance system also supports another kind of parallel programming: multitasking. 17

33 Multitasking is a programming technique that allows a single application to consist of multiple closely co-operating processes [9]. As a consequence of multitasking and multiprogramming, we can make the following considerations. By definition, parallel programs execute concurrently, meaning that at any instant the system is executing multiple programs. On a Balance system, parallel programs execute simultaneously: at any instant, the Dynix operating system can be executing multiple instructions from multiple processes (one process per CPU). Thus, parallel programming on a Balance system has two special benefits: multiprogramming yields improved "system throughput" for multiple unrelated programs, that is, each program finishes in about the time it would take on a uniprocessor (which is running that program alone); multitasking yields improved "execution speed" for individual programs, that is, the owner of an application (consisting of multiple processes) sees an improvement in the execution speed of the application itself, beyond what would be possible on a uniprocessor. In the following chapter we will analyze parallel programming on the Balance 8000, using the multitasking technique. 18

34 2. Principles of Parallel Programming on the Balance 8000

35 2.1. Introduction to Parallel Programming on the Balance 8000 As illustrated in section 1.5, the two basic kinds of parallel programming are multitasking and multiprogramming. This chapter is primarily about multitasking, since the Dynix operating system of the Balance 8000 does multiprogramming for all users automatically. Many applications can be converted from sequential algorithms to parallel algorithms with relative ease, yielding linear or quasi-linear performance improvements as more CPUs are dedicated to the task. In addition, certain types of applications can be designed specifically to exploit the Balance multiprocessing architecture. The gain in execution speed that can be achieved by means of the multitasking technique is determined by the following factors: the percentage of the program's time that can be spent executing parallel code (a great number of applications need to spend less than 2-3% of their time executing sequential code); the number of processors available to the application; the hardware contention imposed by multiple processors competing for the same resources (such as the system bus, the system common memory, etc.). Note that on a Balance system the overhead due to this hardware contention is negligible, since most CPU memory operations access cache memory, not the system bus; the overhead in creating multiple processes; the overhead in synchronization and communication among multiple processes. 19

36 In adapting an application for multitasking, therefore, we will aim to run as much of the program in parallel as possible; at the same time, we will aim to balance the computational load as evenly as possible among parallel processes. 2.2. Parallelizable Applications: Homogeneous and Heterogeneous Multitasking We also have to determine whether an application can benefit from parallelization and which kind of multitasking technique is the most suitable. A parallel application, in fact, consists of two or more processes executing simultaneously. These processes can be multiple instances of the same program ("homogeneous multitasking" or "data partitioning") or they may be distinct but co-operating programs ("heterogeneous multitasking" or "function partitioning"). Homogeneous multitasking basically consists of running the same code on each CPU. Multiple identical processes are created and work on different portions of the data structure simultaneously. Data partitioning, therefore, applies to applications performing many iterations on large data structures (e.g. matrix multiplications, Fourier transformations). The entire data structure can be divided up evenly among processes before they start work (static load balancing), or each process can work on one portion at a time, going back for more work when it finishes (dynamic load balancing). 20

37 Heterogeneous multitasking, on the contrary, assigns different code to each CPU; that is, all the processes work simultaneously on a shared data set but each process handles a different task. Applications performing many different operations on the same data set are candidates for function partitioning (e.g. flight simulation, program compilation). While some applications require function partitioning or a combination of data and function partitioning, most problems adapt more easily to data partitioning. This last method offers some advantages over function partitioning: for example, less programming effort is required to convert a serial program to a parallel algorithm. Furthermore, with data partitioning it is easier to achieve an even load balancing among processors; it is also easier to adapt the programs automatically to the number of available processors. In the remaining part of this chapter, we will only refer to the homogeneous multitasking technique. As far as the decision whether to parallelize a program is concerned, we can point out that many programs spend the majority of their time executing in very few routines (usually just one or two). When converting a program to a parallel version, it is often possible to achieve maximum gain in execution speed simply by parallelizing these few routines. Furthermore, the typical fraction of code that cannot be parallelized turns out to be just 2-3% for most programs (as already mentioned). Typical sections of code that have to be performed serially are those related to initialisation phases and input/output operations. 21

38 2.3. Program Dependencies Once the portions of parallel code have been identified, the next step is to analyse all the possible program dependencies, for any program unit [2, 3, 24]. Some program operations, in fact, may depend on previous operations, while some may be executed in any order. Program dependence analysis, therefore, is needed to carry out all the ordering necessary to guarantee correct results. When a program unit has no dependencies, the statements in that unit can be executed in any order or even simultaneously. Most of the time, this is not the case; we can group the kinds of dependencies into two classes: data dependencies and control dependencies. Within the data dependencies class, we separate: flow dependence; antidependence; output dependence. Flow dependence occurs when one operation sets a data value that is used by a subsequent operation: I) A = B + C II) D = 3 x A Statement (II) depends on the result of statement (I). Antidependence occurs when one operation uses a memory location that is loaded by a subsequent operation: 22

39 I) A = B + C II) C = 3 x B Statement (I) must execute before statement (II), since the first statement uses the current value of the variable C. Output dependence occurs when one operation loads a memory location which is also loaded by a subsequent operation: I) A = B + C II) A = D - 3 Statement (II) must execute after statement (I), or A will contain the wrong value at the end of this program unit. The second class of program dependencies is the control dependencies class; it includes dependencies due to the required flow of control in a program: I) IF (X.GT.0) II) A = B + 3 Statement (II) is conditionally executed, depending on the result of the test in statement (I). It is necessary to identify all the program dependencies within a program unit (and for all program units), in order to transform a given program, loop or subroutine, to run correctly in parallel. It is also necessary to organise the data structure correctly (shared or private) and to provide synchronization points and locking mechanisms for all the processes. 23
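To make the distinction concrete, the following short Fortran program (a hypothetical illustration, not one of the thesis routines) shows an independent loop, a loop with a loop-carried flow dependence, and a reduction variable of the kind that requires a critical section when the loop is run in parallel.

      PROGRAM DEPEND
C     Hypothetical loops illustrating the dependence classes of section 2.3.
      INTEGER N, I
      PARAMETER (N = 8)
      REAL B(N), C(N), X(N), S
      DO 5 I = 1, N
         B(I) = REAL(I)
         C(I) = 2.0 * REAL(I)
    5 CONTINUE
C     Independent loop: iteration I writes X(I) and reads only B(I), C(I);
C     no iteration depends on another, so the iterations may be divided
C     among processes in any order.
      DO 10 I = 1, N
         X(I) = B(I) + C(I)
   10 CONTINUE
C     Loop-carried flow dependence: iteration I reads X(I-1), which is set
C     by iteration I-1; this loop cannot be parallelized in this form.
      DO 20 I = 2, N
         X(I) = X(I-1) + C(I)
   20 CONTINUE
C     Reduction variable: the updates of S must be protected by a lock
C     (a critical section) or accumulated in private partial sums.
      S = 0.0
      DO 30 I = 1, N
         S = S + X(I)
   30 CONTINUE
      WRITE (*,*) 'SUM = ', S
      END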

40 2.4. Elements of Parallel Programming This section introduces some elements of parallel programming that are not common in sequential programming. We have already discussed the multitasking technique and program dependence analysis. What we still need to consider is: the creation of shared and private data; the creation and termination of multiple processes; the division of computing tasks among parallel processes ("scheduling"); the synchronization of parallel processes; the mutual exclusion of parallel processes (lock mechanisms). Let us study all these subjects, one at a time, in the above order. Shared memory and shared data The Dynix operating system allows any number of processes to share a common region of system memory. Any process that has access to a shared-memory region can read or write in that region, in the same way it reads and writes ordinary memory. Shared memory provides a direct and efficient method for co-operating processes to share data. It also simplifies the conversion of sequential algorithms to parallel ones, much more so than message-passing mechanisms or network-based machines. Multitasking programs include both shared and private data. Shared data is accessible by all the processes, while private data is accessible by only one process. 24

41 The following figure 2.1 illustrates the virtual memory contents of a process (16 Megabytes of virtual memory are allocated for each process). FIGURE 2.1. Comparison of virtual memory contents: in the conventional UNIX model a process has a stack, data and (shared) text; in the DYNIX parallel programming model it has a stack, a shared stack, shared data, private data and (shared) text. If the process forks any child processes (as we will see later), each child process inherits access to the parent's shared memory area and shared stack. Both the parent and the child processes can then access the shared data. This mechanism (besides providing an efficient way of interprocess communication) uses less memory than having multiple copies of shared data; it also avoids the overhead of making such copies of shared data. 25

42 Process creation, scheduling and termination In Dynix, as in other Unix-based operating systems, a new process is created by using a system call called a FORK. The new process (child) is a duplicate of the old process (parent): the child process shares the same files and shared memory accessible to the parent process. A process identification number (process id) distinguishes the parent process from all the created child processes: when some child processes are forked, the process id number 0 (zero) is assigned to the parent, while the process id number 1 is assigned to the first child process, the process id number 2 to the second child process, and so on. From this point on (until reaching the JOIN phase), they are separate entities. The fork operation is relatively expensive. Therefore, a parallel application should fork as many processes as it is likely to need at the beginning of the program and terminate them at the end of the program (on completion of the program itself). If a process is not needed during certain code sequences, the process can wait in a busy loop (spinning) or it can relinquish its processor to other applications (until it is needed again). In multitasking programming, tasks can be scheduled among all the created processes using three different techniques: prescheduling; static scheduling; dynamic scheduling. In prescheduling the task division is determined by the programmer before the program is compiled. The programmer assigns a specific task to each process. 26

43 Automatic load balancing, therefore, is not allowed by the prescheduling technique (which only applies to heterogeneous multitasking). In static scheduling the tasks are scheduled by the processes at run time, but they are divided in some predetermined (static) way. The static scheduling procedure for one process is as follows: 1st step) it works out all the tasks that it will do; 2nd step) it does all its tasks; 3rd step) it waits until all other processes finish their work. Static scheduling produces static load balancing: since the division of tasks is statically determined, some processors may stand idle while one processor completes its work. This static technique only applies to homogeneous multitasking. In dynamic scheduling the tasks are scheduled by the processes at run time and they are taken from a task queue. The dynamic scheduling procedure for one process is: 1st step) it waits until there are some tasks to execute; 2nd step) it removes the first task from the task queue and executes it; 3rd step) if there are any more tasks to execute, it goes on to the second step; otherwise, it goes back to the first step. Dynamic scheduling produces dynamic load balancing: all the processes are kept busy as long as there is work to be done; the work-load is evenly distributed among the processes. This dynamic technique applies to both homogeneous and heterogeneous multitasking. Dynamic scheduling, though, entails more overhead than static scheduling: each time a process schedules a task for itself, it must check the shared task queue (to make sure that there is more work to do) and it must remove that task from the queue. A sketch of the two run-time scheduling styles is given below. 27
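As a sketch of the two run-time scheduling styles, the following hypothetical subroutine distributes the N iterations of an independent loop among the processes created by m_fork, first with static (interleaved) scheduling and then with dynamic self-scheduling through the shared counter routine m_next of Table 2.3. The Fortran calling sequences and the exact behaviour of the shared counter are assumptions of this sketch; they are documented in the Sequent Guide to Parallel Programming [24].

      SUBROUTINE SCHED(X, B, C, N)
C     Sketch of static and dynamic run-time scheduling for one process
C     created by m_fork (Fortran bindings of the microtasking library
C     are assumed; see Table 2.3 and [24]).
      INTEGER N, I, ME, NPROCS
      REAL X(N), B(N), C(N)
      INTEGER M_GET_MYID, M_GET_NUMPROCS, M_NEXT

C     Static scheduling: process ME takes iterations ME+1, ME+1+NPROCS, ...
C     The division of work is fixed before the loop starts.
      ME     = M_GET_MYID()
      NPROCS = M_GET_NUMPROCS()
      DO 10 I = ME + 1, N, NPROCS
         X(I) = B(I) + C(I)
   10 CONTINUE
      CALL M_SYNC()

C     Dynamic scheduling: each process repeatedly takes the next unclaimed
C     iteration from a shared counter (assumed here to deliver 1, 2, 3, ...).
   20 I = M_NEXT()
      IF (I .LE. N) THEN
         X(I) = B(I) + C(I)
         GO TO 20
      END IF
      CALL M_SYNC()
      RETURN
      END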

44 Process synchronization, loop scheduling and lock mechanisms Synchronization is fundamental to ensure that each process performs its work without interfering with the other processes. It is not unusual for a looping subprogram (to be executed in parallel) to contain a code section which depends on all the processes having completed execution of the preceding code. All real application programs contain the program dependencies we have studied in section 2.3. We therefore need some synchronization mechanisms to ensure the correct execution of multiple co-operating processes; these mechanisms are basically: barriers; locks and semaphores. A barrier is a synchronization point: on reaching a barrier, a process marks itself as "present"; then it waits for all the other processes to arrive. There are two kinds of barriers. It is possible to synchronise all processes at a single pre-initialised barrier. 28

45 With the second type of barrier, the programmer is allowed to set more than one barrier or to synchronise just a subset of the processes. A lock is the simplest kind of semaphore in the Balance Dynix system. It ensures that only one process at a time can access a shared data structure. A lock has two values: locked or unlocked. Before attempting to access a shared data structure, a process waits until the lock associated with the data structure is unlocked (indicating that no other process is accessing the structure). The process then locks the lock, accesses the shared data and finally unlocks the lock. While a process is waiting for a lock to become unlocked, it "spins" in a loop, producing no work. It is impossible for two processes to acquire a lock at the same time. Even when several processes attempt the same lock at once, only one succeeds, while all the others have to wait (until the first process has released the lock). Semaphores are synchronization mechanisms based on the locking/unlocking principle; they are used to protect order-dependent sections of code and to manage queues. "Counting/queueing" semaphores, for example, are useful for queue management. When several processes are waiting for a lock, the lock will go to the first process that tries to acquire it right after it is unlocked. Counting/queueing semaphores can ensure that the lock is assigned (instead) to the process that has waited the longest for it. If a barrier is used for synchronization, a process is delayed in a spinning state (called a "busy-wait" state) until a set number of processes have reached the barrier. 29
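As an illustration of this locking discipline, the following hypothetical fragment (again assuming the Fortran calling sequences of the m_lock, m_unlock and m_sync routines of Table 2.3) lets each process add a private partial sum into a shared total, one process at a time, and then waits at a barrier before the total is used. The shared variable TOTAL is assumed to be initialised to zero in a sequential section before the processes are forked.

      SUBROUTINE ACCUM(X, N, TOTAL)
C     Hypothetical reduction step executed by every process created with
C     m_fork: the update of the shared variable TOTAL is a critical
C     section, protected by the single microtasking lock (Table 2.3).
      INTEGER N, I, ME, NPROCS
      REAL X(N), TOTAL, PART
      INTEGER M_GET_MYID, M_GET_NUMPROCS

      ME     = M_GET_MYID()
      NPROCS = M_GET_NUMPROCS()

C     Each process first accumulates a private partial sum.
      PART = 0.0
      DO 10 I = ME + 1, N, NPROCS
         PART = PART + X(I)
   10 CONTINUE

C     Critical section: only one process at a time updates TOTAL.
      CALL M_LOCK()
      TOTAL = TOTAL + PART
      CALL M_UNLOCK()

C     Barrier: no process proceeds until all have added their share.
      CALL M_SYNC()
      RETURN
      END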

46 When using a lock or a semaphore the situation is more complex; in this case, in fact, there exist four possibilities concerning what a process should do while it is waiting for its turn to access the locked code section. These four possibilities are: 1) the process does not wait; it performs a different task and checks the lock again later; 2) the process spins in a "busy-wait" state; 3) the process "blocks", that is to say it relinquishes its processor to another job; 4) the process spins for a specified period of time, then it blocks. We complete this paragraph with a consideration affecting input/output handling. Input/output, in parallel programming, is complicated by the need for caution when multiple processes write to the same file. These complications can usually be reduced by performing input/output only during sequential phases or by designating one process as a server to perform the input/output operations. This chapter is concluded by introducing the parallel programming tools supported by the Balance system. 30

47 2.5. Parallel Programming Tools The applications that can be adapted for parallel programming vary greatly in their requirements for data sharing, interprocess communication, synchronization, etc. [4]. To gain optimum speed-up, the programmer must develop an algorithm that meets these requirements (while still exploiting the application's inherent parallelism). To aid in this effort, the Balance system supports programming tools that adapt to the needs of a wide range of applications. We are mostly interested in two of these parallel programming tools: the Fortran Parallel Programming Directives (Sequent Fortran); the Parallel Programming Library (Dynix). We illustrate these two tools in more detail in the following sections and we show how to employ them for data partitioning. Fortran Parallel Programming Directives: data partitioning with Sequent Fortran The Fortran Parallel Programming Directives support parallel execution of Fortran Do-loops. By interpreting these directives, the Sequent Fortran compiler can restructure a Do-loop for parallel execution. The user prepares the program for the preprocessor by inserting a set of directives: these directives identify the loops to be executed in parallel; they also identify the shared and private data within each loop and any critical sections of the loop under consideration. Furthermore, the Fortran Parallel Programming Directives allow the user to control the scheduling of loop iterations among processes and the data division among all processes. 31

48 Ideally, the loop to be chosen for parallel execution should be an "independent" loop (i.e. a loop in which no iteration depends on the operations in any other iteration). Otherwise, it is reasonable to choose a loop which accounts for a large portion of the computation. Finally, in the case of nested loops, choose the outermost loop (if possible). Once it has been determined which loop to prepare for parallel execution, it is necessary to analyse all the variables in that loop and to classify them into one of the following categories: shared variables; local variables; reduction variables; shared ordered variables; shared locked variables. After this analysis phase, the user is ready to use the Fortran Parallel Programming Directives to prepare the loop under consideration for parallel execution; these directives are listed in the following table (Table 2.2): 32

49 TABLE 2.2. Parallel Programming Directives.
DIRECTIVE      DESCRIPTION
C$DOACROSS     Identify Do-loop for parallel execution
C$ORDER        Start loop section which contains a shared ordered variable
C$ENDORDER     End loop section which contains a shared ordered variable
C$             Add Fortran statement for conditional compilation
C$&            Continue parallel programming directive
At this point, the preprocessor handles all the low-level tasks of data partitioning. By interpreting the directives, the preprocessor produces a program that performs the following tasks: sets up shared data structures; creates a set of identical processes; schedules tasks among processes; handles mutual exclusion and process synchronization. All this is done in a way that is totally transparent to the user. For more detailed information about the loop variable classification and the use of the Parallel Programming Directives, refer to [24]. 33
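As an illustration of how the directives of Table 2.2 are applied, the following hypothetical subroutine marks an independent Do-loop for parallel execution. The clause names SHARE, LOCAL and REDUCTION are written here only to match the variable classes listed above; they are an assumption of this sketch, and the exact clause syntax accepted by the Sequent Fortran preprocessor is defined in [24].

      SUBROUTINE PARLOOP(X, B, C, N, S)
C     Hypothetical loop prepared for parallel execution with the Fortran
C     Parallel Programming Directives of Table 2.2.  The clause spellings
C     (SHARE, LOCAL, REDUCTION) are assumed; see [24] for the exact syntax.
      INTEGER N, I
      REAL X(N), B(N), C(N), S
C     X, B, C and N are shared; I is local to each process; S is a
C     reduction variable, which the generated code updates inside a
C     critical section.
C$DOACROSS SHARE(X, B, C, N), LOCAL(I), REDUCTION(S)
      DO 10 I = 1, N
         X(I) = B(I) + C(I)
         S = S + X(I)
   10 CONTINUE
      RETURN
      END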

50 If necessary, the user is allowed to call the Dynix Parallel Programming Library routines (see the next section), in order to preserve the correct data flow within the loop. Parallel Programming Library: data partitioning with Dynix The Sequent Parallel Programming Library is a collection of C routines which allow the programmer to write parallel Fortran programs (as well as C and Pascal programs). This library includes three sets of routines: 1) microtasking routines (microtasking library); 2) routines for general use with data partitioning programs (data partitioning library); 3) routines for memory allocation in data partitioning programs (memory allocation library). By means of these routines, the user is able to: create sets of processes to execute subprograms in parallel; schedule tasks among processes; synchronise processes among tasks; allocate memory for shared data. As a result, programs that use the Parallel Programming Library can be made to balance loads automatically among processors and to adjust the division of tasks at run time (basing the division on the number of available processors). 34

51 Data partitioning with Dynix consists of the creation of multiple independent processes to execute iteration loops in parallel. This is done as follows: a) each loop to be executed in parallel is contained in a subroutine; b) for each loop, the program calls a special function (m_fork), which forks a set of child processes and assigns a copy of the subroutine to each process; c) each forked process executes some of the loop iterations (either static or dynamic scheduling can be used); d) when necessary, the subroutine may contain calls to synchronization routines (m_sync, m_lock, m_unlock, etc.); e) when all the loop iterations have been executed, control returns from the subroutine to the main program (a minimal sketch of this calling structure is given after the tables below). At this point, the program either terminates the parallel processes (by means of the m_kill_procs routine), or it suspends their execution until they are needed again (m_park_procs and m_rele_procs routines), or it leaves the parallel processes to spin in a busy-wait state and uses them later. A complete list of all the routines available in the microtasking library, in the data partitioning library and in the memory allocation library is given in the following three tables (Tables 2.3, 2.4, 2.5). 35

52 TABLE 2.3. Parallel Programming Library Microtasking Routines.
ROUTINE          DESCRIPTION
m_fork           Execute a subprogram in parallel
m_get_myid       Return process identification number
m_get_numprocs   Return number of child processes
m_kill_procs     Terminate child processes
m_lock           Lock a lock
m_multi          End single-process code section
m_next           Increment global counter
m_park_procs     Suspend child process execution
m_rele_procs     Resume child process execution
m_set_procs      Set number of child processes
m_single         Begin single-process code section
m_sync           Check in at barrier
m_unlock         Unlock a lock
Note: the microtasking library is designed "around" the m_fork routine; any other routine belonging to this library should only be used in combination with the m_fork routine. 36

53 TABLE 2.4. Parallel Programming Library Data Partitioning Routines.
ROUTINE            DESCRIPTION
cpus_online        Return number of CPUs on-line
s_init_barrier     Initialise a barrier
S_INIT_BARRIER     C macro
s_init_lock        Initialise a lock
S_INIT_LOCK        C macro
s_lock, s_clock    Lock a lock
S_LOCK, S_CLOCK    C macro
s_unlock           Unlock a lock
S_UNLOCK           C macro
s_wait_barrier     Wait at a barrier
S_WAIT_BARRIER     C macro
Note: the data partitioning library includes a routine to determine the number of available processors; it also includes several synchronization routines and their analogous C preprocessor macros (these macros are faster than the normal function calls, but they can add to the code size). 37

54 TABLE 2.5. Parallel Programming Library Memory Allocation Routines.
ROUTINE           DESCRIPTION
brk, sbrk         Change private data segment size
shbrk, shsbrk     Change shared data segment size
shfree            De-allocate shared data memory
shmalloc          Allocate shared data memory
Note: the memory allocation library consists of routines that allow data partitioning programs to allocate or de-allocate shared memory; these routines also permit a change in the amount of shared and private memory assigned to a process. For more detailed information concerning the use of the Parallel Programming Library, refer to the Sequent Guide to Parallel Programming [24]. Data partitioning with Dynix, as well as data partitioning with Sequent Fortran, requires an analysis of all the variables concerned with the section of code (Do-loop) to be performed in parallel. It is necessary to identify: shared variables, i.e. "read-only" arrays and scalars, or arrays whose elements are referenced by only one loop iteration; private variables, i.e. variables that are initialised in each loop iteration before their values are used; dependent variables (reduction variables, ordered variables, locked variables). 38
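To close this section, here is a minimal sketch of the data partitioning scheme described in steps a) to e) above, using the microtasking routines of Table 2.3 together with cpus_online of Table 2.4. The Fortran calling sequences are assumptions based on those tables and on [24], the loop body is a hypothetical example rather than one of the thesis routines, and the arrays X, B and C are assumed to reside in shared memory (for instance, memory obtained with shmalloc of Table 2.5, or a shared common block as described in [24]) so that the children's updates are visible to the parent.

      PROGRAM MAIN
C     Sketch of data partitioning with the Dynix microtasking library:
C     the parallel loop is placed in subroutine ADDVEC, a set of child
C     processes is forked to execute it, and the processes are then
C     terminated.  Calling sequences are assumed from Table 2.3 and [24];
C     X, B and C are shown as ordinary arrays for brevity, but must be
C     placed in shared memory for the results to reach the parent.
      INTEGER N
      PARAMETER (N = 1000)
      REAL X(N), B(N), C(N)
      INTEGER I, NPROCS, CPUS_ONLINE
      EXTERNAL ADDVEC

      DO 10 I = 1, N
         B(I) = REAL(I)
         C(I) = 1.0
   10 CONTINUE

C     Use one process per available processor (cpus_online, Table 2.4).
      NPROCS = CPUS_ONLINE()
      CALL M_SET_PROCS(NPROCS)

C     Fork the child processes; each one executes a copy of ADDVEC.
      CALL M_FORK(ADDVEC, X, B, C, N)

C     Terminate the child processes on completion.
      CALL M_KILL_PROCS()

      WRITE (*,*) 'X(N) = ', X(N)
      END

      SUBROUTINE ADDVEC(X, B, C, N)
C     Loop body executed by every process: static (interleaved)
C     scheduling of the iterations, as in section 2.4.
      INTEGER N, I, ME, NP
      REAL X(N), B(N), C(N)
      INTEGER M_GET_MYID, M_GET_NUMPROCS

      ME = M_GET_MYID()
      NP = M_GET_NUMPROCS()
      DO 20 I = ME + 1, N, NP
         X(I) = B(I) + C(I)
   20 CONTINUE
      RETURN
      END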


Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization Zippy - A oarse-grained reonfigurable array with support for hardware virtualization Christian Plessl Computer Engineering and Networks Lab ETH Zürih, Switzerland plessl@tik.ee.ethz.h Maro Platzner Department

More information

Formal Verification by Model Checking

Formal Verification by Model Checking Formal Verifiation by Model Cheking Jonathan Aldrih Carnegie Mellon University Based on slides developed by Natasha Sharygina 15-413: Introdution to Software Engineering Fall 2005 3 Formal Verifiation

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer Communiations and Networ, 2013, 5, 69-73 http://dx.doi.org/10.4236/n.2013.53b2014 Published Online September 2013 (http://www.sirp.org/journal/n) Cross-layer Resoure Alloation on Broadband Power Line Based

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

mahines. HBSP enhanes the appliability of the BSP model by inorporating parameters that reet the relative speeds of the heterogeneous omputing omponen

mahines. HBSP enhanes the appliability of the BSP model by inorporating parameters that reet the relative speeds of the heterogeneous omputing omponen The Heterogeneous Bulk Synhronous Parallel Model Tiani L. Williams and Rebea J. Parsons Shool of Computer Siene University of Central Florida Orlando, FL 32816-2362 fwilliams,rebeag@s.uf.edu Abstrat. Trends

More information

HEXA: Compact Data Structures for Faster Packet Processing

HEXA: Compact Data Structures for Faster Packet Processing Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for

More information

1. Introduction. 2. The Probable Stope Algorithm

1. Introduction. 2. The Probable Stope Algorithm 1. Introdution Optimization in underground mine design has reeived less attention than that in open pit mines. This is mostly due to the diversity o underground mining methods and omplexity o underground

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT 1 ZHANGGUO TANG, 2 HUANZHOU LI, 3 MINGQUAN ZHONG, 4 JIAN ZHANG 1 Institute of Computer Network and Communiation Tehnology,

More information

Interconnection Styles

Interconnection Styles Interonnetion tyles oftware Design Following the Export (erver) tyle 2 M1 M4 M5 4 M3 M6 1 3 oftware Design Following the Export (Client) tyle e 2 e M1 M4 M5 4 M3 M6 1 e 3 oftware Design Following the Export

More information

- 1 - S 21. Directory-based Administration of Virtual Private Networks: Policy & Configuration. Charles A Kunzinger.

- 1 - S 21. Directory-based Administration of Virtual Private Networks: Policy & Configuration. Charles A Kunzinger. - 1 - S 21 Diretory-based Administration of Virtual Private Networks: Poliy & Configuration Charles A Kunzinger kunzinge@us.ibm.om - 2 - Clik here Agenda to type page title What is a VPN? What is VPN Poliy?

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

Evaluation of Benchmark Performance Estimation for Parallel. Fortran Programs on Massively Parallel SIMD and MIMD. Computers.

Evaluation of Benchmark Performance Estimation for Parallel. Fortran Programs on Massively Parallel SIMD and MIMD. Computers. Evaluation of Benhmark Performane Estimation for Parallel Fortran Programs on Massively Parallel SIMD and MIMD Computers Thomas Fahringer Dept of Software Tehnology and Parallel Systems University of Vienna

More information

Space- and Time-Efficient BDD Construction via Working Set Control

Space- and Time-Efficient BDD Construction via Working Set Control Spae- and Time-Effiient BDD Constrution via Working Set Control Bwolen Yang Yirng-An Chen Randal E. Bryant David R. O Hallaron Computer Siene Department Carnegie Mellon University Pittsburgh, PA 15213.

More information

Direct-Mapped Caches

Direct-Mapped Caches A Case for Diret-Mapped Cahes Mark D. Hill University of Wisonsin ahe is a small, fast buffer in whih a system keeps those parts, of the ontents of a larger, slower memory that are likely to be used soon.

More information

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method

Multiple-Criteria Decision Analysis: A Novel Rank Aggregation Method 3537 Multiple-Criteria Deision Analysis: A Novel Rank Aggregation Method Derya Yiltas-Kaplan Department of Computer Engineering, Istanbul University, 34320, Avilar, Istanbul, Turkey Email: dyiltas@ istanbul.edu.tr

More information

High-level synthesis under I/O Timing and Memory constraints

High-level synthesis under I/O Timing and Memory constraints Highlevel synthesis under I/O Timing and Memory onstraints Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn, Eri Martin To ite this version: Philippe Coussy, Gwenolé Corre, Pierre Bomel, Eri Senn,

More information

Drawing lines. Naïve line drawing algorithm. drawpixel(x, round(y)); double dy = y1 - y0; double dx = x1 - x0; double m = dy / dx; double y = y0;

Drawing lines. Naïve line drawing algorithm. drawpixel(x, round(y)); double dy = y1 - y0; double dx = x1 - x0; double m = dy / dx; double y = y0; Naïve line drawing algorithm // Connet to grid points(x0,y0) and // (x1,y1) by a line. void drawline(int x0, int y0, int x1, int y1) { int x; double dy = y1 - y0; double dx = x1 - x0; double m = dy / dx;

More information

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization

Self-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization Self-Adaptive Parent to Mean-Centri Reombination for Real-Parameter Optimization Kalyanmoy Deb and Himanshu Jain Department of Mehanial Engineering Indian Institute of Tehnology Kanpur Kanpur, PIN 86 {deb,hjain}@iitk.a.in

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Chapter 2: Introduction to Maple V

Chapter 2: Introduction to Maple V Chapter 2: Introdution to Maple V 2-1 Working with Maple Worksheets Try It! (p. 15) Start a Maple session with an empty worksheet. The name of the worksheet should be Untitled (1). Use one of the standard

More information

THROUGHPUT EVALUATION OF AN ASYMMETRICAL FDDI TOKEN RING NETWORK WITH MULTIPLE CLASSES OF TRAFFIC

THROUGHPUT EVALUATION OF AN ASYMMETRICAL FDDI TOKEN RING NETWORK WITH MULTIPLE CLASSES OF TRAFFIC THROUGHPUT EVALUATION OF AN ASYMMETRICAL FDDI TOKEN RING NETWORK WITH MULTIPLE CLASSES OF TRAFFIC Priya N. Werahera and Anura P. Jayasumana Department of Eletrial Engineering Colorado State University

More information

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok

More information

arxiv: v1 [cs.db] 13 Sep 2017

arxiv: v1 [cs.db] 13 Sep 2017 An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution

More information

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Arne Hamann, Razvan Rau, Rolf Ernst Institute of Computer and Communiation Network Engineering Tehnial University of Braunshweig,

More information

COMP 181. Prelude. Intermediate representations. Today. Types of IRs. High-level IR. Intermediate representations and code generation

COMP 181. Prelude. Intermediate representations. Today. Types of IRs. High-level IR. Intermediate representations and code generation Prelude COMP 181 Intermediate representations and ode generation November, 009 What is this devie? Large Hadron Collider What is a hadron? Subatomi partile made up of quarks bound by the strong fore What

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT FP7-ICT-2007-1 Contrat no.: 215040 www.ative-projet.eu PROJECT PERIODIC REPORT Publishable Summary Grant Agreement number: ICT-215040 Projet aronym: Projet title: Enabling the Knowledge Powered Enterprise

More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

Facility Location: Distributed Approximation

Facility Location: Distributed Approximation Faility Loation: Distributed Approximation Thomas Mosibroda Roger Wattenhofer Distributed Computing Group PODC 2005 Where to plae ahes in the Internet? A distributed appliation that has to dynamially plae

More information

Reading Object Code. A Visible/Z Lesson

Reading Object Code. A Visible/Z Lesson Reading Objet Code A Visible/Z Lesson The Idea: When programming in a high-level language, we rarely have to think about the speifi ode that is generated for eah instrution by a ompiler. But as an assembly

More information

Series/1 GA File No i=:: IBM Series/ Battery Backup Unit Description :::5 ~ ~ >-- ffi B~88 ~0 (] II IIIIII

Series/1 GA File No i=:: IBM Series/ Battery Backup Unit Description :::5 ~ ~ >-- ffi B~88 ~0 (] II IIIIII Series/1 I. (.. GA34-0032-0 File No. 51-10 a i=:: 5 Q 1 IBM Series/1 4999 Battery Bakup Unit Desription B88 0 (] o. :::5 >-- ffi "- I II1111111111IIIIII1111111 ---- - - - - ----- --_.- Series/1 «h: ",

More information

Reducing Runtime Complexity of Long-Running Application Services via Dynamic Profiling and Dynamic Bytecode Adaptation for Improved Quality of Service

Reducing Runtime Complexity of Long-Running Application Services via Dynamic Profiling and Dynamic Bytecode Adaptation for Improved Quality of Service Reduing Runtime Complexity of Long-Running Appliation Servies via Dynami Profiling and Dynami Byteode Adaptation for Improved Quality of Servie ABSTRACT John Bergin Performane Engineering Laboratory University

More information

RAC 2 E: Novel Rendezvous Protocol for Asynchronous Cognitive Radios in Cooperative Environments

RAC 2 E: Novel Rendezvous Protocol for Asynchronous Cognitive Radios in Cooperative Environments 21st Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communiations 1 RAC 2 E: Novel Rendezvous Protool for Asynhronous Cognitive Radios in Cooperative Environments Valentina Pavlovska,

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

Alleviating DFT cost using testability driven HLS

Alleviating DFT cost using testability driven HLS Alleviating DFT ost using testability driven HLS M.L.Flottes, R.Pires, B.Rouzeyre Laboratoire d Informatique, de Robotique et de Miroéletronique de Montpellier, U.M. CNRS 5506 6 rue Ada, 34392 Montpellier

More information

Implementing Load-Balanced Switches With Fat-Tree Networks

Implementing Load-Balanced Switches With Fat-Tree Networks Implementing Load-Balaned Swithes With Fat-Tree Networks Hung-Shih Chueh, Ching-Min Lien, Cheng-Shang Chang, Jay Cheng, and Duan-Shin Lee Department of Eletrial Engineering & Institute of Communiations

More information

Torpedo Trajectory Visual Simulation Based on Nonlinear Backstepping Control

Torpedo Trajectory Visual Simulation Based on Nonlinear Backstepping Control orpedo rajetory Visual Simulation Based on Nonlinear Bakstepping Control Peng Hai-jun 1, Li Hui-zhou Chen Ye 1, 1. Depart. of Weaponry Eng, Naval Univ. of Engineering, Wuhan 400, China. Depart. of Aeronautial

More information

Establishing Secure Ethernet LANs Using Intelligent Switching Hubs in Internet Environments

Establishing Secure Ethernet LANs Using Intelligent Switching Hubs in Internet Environments Establishing Seure Ethernet LANs Using Intelligent Swithing Hubs in Internet Environments WOEIJIUNN TSAUR AND SHIJINN HORNG Department of Eletrial Engineering, National Taiwan University of Siene and Tehnology,

More information

Reading Object Code. A Visible/Z Lesson

Reading Object Code. A Visible/Z Lesson Reading Objet Code A Visible/Z Lesson The Idea: When programming in a high-level language, we rarely have to think about the speifi ode that is generated for eah instrution by a ompiler. But as an assembly

More information

EXODUS II: A Finite Element Data Model

EXODUS II: A Finite Element Data Model SAND92-2137 Unlimited Release Printed November 1995 Distribution Category UC-705 EXODUS II: A Finite Element Data Model Larry A. Shoof, Vitor R. Yarberry Computational Mehanis and Visualization Department

More information

Sparse Certificates for 2-Connectivity in Directed Graphs

Sparse Certificates for 2-Connectivity in Directed Graphs Sparse Certifiates for 2-Connetivity in Direted Graphs Loukas Georgiadis Giuseppe F. Italiano Aikaterini Karanasiou Charis Papadopoulos Nikos Parotsidis Abstrat Motivated by the emergene of large-sale

More information

Real-Time Control for a Turbojet Engine

Real-Time Control for a Turbojet Engine A Multiproessor mplementation of Real-Time Control for a Turbojet Engine Phillip L. Shaffer ABSTRACT: A real-time ontrol program for a turbojet engine has been implemented on a four-proessor omputer, ahieving

More information

Test Case Generation from UML State Machines

Test Case Generation from UML State Machines Test Case Generation from UML State Mahines Dirk Seifert To ite this version: Dirk Seifert. Test Case Generation from UML State Mahines. [Researh Report] 2008. HAL Id: inria-00268864

More information

Multi-hop Fast Conflict Resolution Algorithm for Ad Hoc Networks

Multi-hop Fast Conflict Resolution Algorithm for Ad Hoc Networks Multi-hop Fast Conflit Resolution Algorithm for Ad Ho Networks Shengwei Wang 1, Jun Liu 2,*, Wei Cai 2, Minghao Yin 2, Lingyun Zhou 2, and Hui Hao 3 1 Power Emergeny Center, Sihuan Eletri Power Corporation,

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification erformane Improvement of TC on Wireless Cellular Networks by Adaptive Combined with Expliit Loss tifiation Masahiro Miyoshi, Masashi Sugano, Masayuki Murata Department of Infomatis and Mathematial Siene,

More information

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks Flow Demands Oriented Node Plaement in Multi-Hop Wireless Networks Zimu Yuan Institute of Computing Tehnology, CAS, China {zimu.yuan}@gmail.om arxiv:153.8396v1 [s.ni] 29 Mar 215 Abstrat In multi-hop wireless

More information

Uplink Channel Allocation Scheme and QoS Management Mechanism for Cognitive Cellular- Femtocell Networks

Uplink Channel Allocation Scheme and QoS Management Mechanism for Cognitive Cellular- Femtocell Networks 62 Uplink Channel Alloation Sheme and QoS Management Mehanism for Cognitive Cellular- Femtoell Networks Kien Du Nguyen 1, Hoang Nam Nguyen 1, Hiroaki Morino 2 and Iwao Sasase 3 1 University of Engineering

More information

Tackling IPv6 Address Scalability from the Root

Tackling IPv6 Address Scalability from the Root Takling IPv6 Address Salability from the Root Mei Wang Ashish Goel Balaji Prabhakar Stanford University {wmei, ashishg, balaji}@stanford.edu ABSTRACT Internet address alloation shemes have a huge impat

More information

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index IJCSES International Journal of Computer Sienes and Engineering Systems, ol., No.4, Otober 2007 CSES International 2007 ISSN 0973-4406 253 An Optimized Approah on Applying Geneti Algorithm to Adaptive

More information

Fuzzy Meta Node Fuzzy Metagraph and its Cluster Analysis

Fuzzy Meta Node Fuzzy Metagraph and its Cluster Analysis Journal of Computer Siene 4 (): 9-97, 008 ISSN 549-3636 008 Siene Publiations Fuzzy Meta Node Fuzzy Metagraph and its Cluster Analysis Deepti Gaur, Aditya Shastri and Ranjit Biswas Department of Computer

More information

'* ~rr' _ ~~ f' lee : eel. Series/1 []J 0 [[] "'l... !l]j1. IBM Series/1 FORTRAN IV. I ntrod uction ...

'* ~rr' _ ~~ f' lee : eel. Series/1 []J 0 [[] 'l... !l]j1. IBM Series/1 FORTRAN IV. I ntrod uction ... ---- --- - ----- - - - --_.- --- Series/1 GC34-0132-0 51-25 PROGRAM PRODUCT 1 IBM Series/1 FORTRAN IV I ntrod ution Program Numbers 5719-F01 5719-F03 0 lee : eel II 11111111111111111111111111111111111111111111111

More information

Improved flooding of broadcast messages using extended multipoint relaying

Improved flooding of broadcast messages using extended multipoint relaying Improved flooding of broadast messages using extended multipoint relaying Pere Montolio Aranda a, Joaquin Garia-Alfaro a,b, David Megías a a Universitat Oberta de Catalunya, Estudis d Informàtia, Mulimèdia

More information

The Tofu Interconnect D

The Tofu Interconnect D 2018 IEEE International Conferene on Cluster Computing The Tofu Interonnet D Yuihiro Ajima, Takahiro Kawashima, Takayuki Okamoto, Naoyuki Shida, Kouihi Hirai, Toshiyuki Shimizu Next Generation Tehnial

More information

The Implementation of RRTs for a Remote-Controlled Mobile Robot

The Implementation of RRTs for a Remote-Controlled Mobile Robot ICCAS5 June -5, KINEX, Gyeonggi-Do, Korea he Implementation of RRs for a Remote-Controlled Mobile Robot Chi-Won Roh*, Woo-Sub Lee **, Sung-Chul Kang *** and Kwang-Won Lee **** * Intelligent Robotis Researh

More information

1. The collection of the vowels in the word probability. 2. The collection of real numbers that satisfy the equation x 9 = 0.

1. The collection of the vowels in the word probability. 2. The collection of real numbers that satisfy the equation x 9 = 0. C HPTER 1 SETS I. DEFINITION OF SET We begin our study of probability with the disussion of the basi onept of set. We assume that there is a ommon understanding of what is meant by the notion of a olletion

More information

Automated System for the Study of Environmental Loads Applied to Production Risers Dustin M. Brandt 1, Celso K. Morooka 2, Ivan R.

Automated System for the Study of Environmental Loads Applied to Production Risers Dustin M. Brandt 1, Celso K. Morooka 2, Ivan R. EngOpt 2008 - International Conferene on Engineering Optimization Rio de Janeiro, Brazil, 01-05 June 2008. Automated System for the Study of Environmental Loads Applied to Prodution Risers Dustin M. Brandt

More information

Compilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A.

Compilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A. Compilation 0368-3133 Leture 11a Text book: Modern ompiler implementation in C Andrew A. Appel Register Alloation Noam Rinetzky 1 Registers Dediated memory loations that an be aessed quikly, an have omputations

More information

CA Release Automation 5.x Implementation Proven Professional Exam (CAT-600) Study Guide Version 1.1

CA Release Automation 5.x Implementation Proven Professional Exam (CAT-600) Study Guide Version 1.1 Exam (CAT-600) Study Guide Version 1.1 PROPRIETARY AND CONFIDENTIAL INFORMATION 2016 CA. All rights reserved. CA onfidential & proprietary information. For CA, CA Partner and CA Customer use only. No unauthorized

More information

INTERPOLATED AND WARPED 2-D DIGITAL WAVEGUIDE MESH ALGORITHMS

INTERPOLATED AND WARPED 2-D DIGITAL WAVEGUIDE MESH ALGORITHMS Proeedings of the COST G-6 Conferene on Digital Audio Effets (DAFX-), Verona, Italy, Deember 7-9, INTERPOLATED AND WARPED -D DIGITAL WAVEGUIDE MESH ALGORITHMS Vesa Välimäki Lab. of Aoustis and Audio Signal

More information

Automatic Generation of Transaction-Level Models for Rapid Design Space Exploration

Automatic Generation of Transaction-Level Models for Rapid Design Space Exploration Automati Generation of Transation-Level Models for Rapid Design Spae Exploration Dongwan Shin, Andreas Gerstlauer, Junyu Peng, Rainer Dömer and Daniel D. Gajski Center for Embedded Computer Systems University

More information

A Formal Hybrid Analysis Technique for Composite Web Services Verification

A Formal Hybrid Analysis Technique for Composite Web Services Verification A Formal Hybrid Analysis Tehnique for Composite Web Servies Verifiation MAY HAIDAR 1,2, HICHAM H. HALLAL 1 1 Computer Siene Department / Department of Eletrial Engineering Fahad Bin Sultan University P.O

More information

represent = as a finite deimal" either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to

represent = as a finite deimal either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to Sientifi Computing Chapter I Computer Arithmeti Jonathan Goodman Courant Institute of Mathemaial Sienes Last revised January, 00 Introdution One of the many soures of error in sientifi omputing is inexat

More information

Scheduling Multiple Independent Hard-Real-Time Jobs on a Heterogeneous Multiprocessor

Scheduling Multiple Independent Hard-Real-Time Jobs on a Heterogeneous Multiprocessor Sheduling Multiple Independent Hard-Real-Time Jobs on a Heterogeneous Multiproessor Orlando Moreira NXP Semiondutors Researh Eindhoven, Netherlands orlando.moreira@nxp.om Frederio Valente Universidade

More information

Uncovering Hidden Loop Level Parallelism in Sequential Applications

Uncovering Hidden Loop Level Parallelism in Sequential Applications Unovering Hidden Loop Level Parallelism in Sequential Appliations Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, and Sott Mahlke Advaned Computer Arhiteture Laboratory University of Mihigan, Ann Arbor,

More information

Reduced-Complexity Column-Layered Decoding and. Implementation for LDPC Codes

Reduced-Complexity Column-Layered Decoding and. Implementation for LDPC Codes Redued-Complexity Column-Layered Deoding and Implementation for LDPC Codes Zhiqiang Cui 1, Zhongfeng Wang 2, Senior Member, IEEE, and Xinmiao Zhang 3 1 Qualomm In., San Diego, CA 92121, USA 2 Broadom Corp.,

More information

Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introduction Information Retrieval... 8

Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introduction Information Retrieval... 8 Contents Contents...I List of Tables...VIII List of Figures...IX 1. Introdution... 1 1.1. Internet Information...2 1.2. Internet Information Retrieval...3 1.2.1. Doument Indexing...4 1.2.2. Doument Retrieval...4

More information

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results An Alternative Approah to the Fuzziier in Fuzzy Clustering to Obtain Better Clustering Results Frank Klawonn Department o Computer Siene University o Applied Sienes BS/WF Salzdahlumer Str. 46/48 D-38302

More information