International Conference on Parallel Processing, St. Charles, IL, August 1994


COMMUNICATION OPTIMIZATIONS USED IN THE PARADIGM COMPILER FOR DISTRIBUTED-MEMORY MULTICOMPUTERS

Daniel J. Palermo, Ernesto Su, John A. Chandy, and Prithviraj Banerjee
Center for Reliable and High-Performance Computing
University of Illinois at Urbana-Champaign
Urbana, IL 61801, U.S.A.
{palermo, ernesto, chandy, banerjee}@crhc.uiuc.edu

Abstract. The PARADIGM (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers) project at the University of Illinois provides a fully automated means to parallelize programs, written in a serial programming model, for execution on distributed-memory multicomputers. To provide efficient execution, PARADIGM automatically performs various optimizations to reduce the overhead and idle time caused by interprocessor communication. Optimizations studied in this paper include message coalescing, message vectorization, message aggregation, and coarse grain pipelining. To separate the optimization algorithms from machine-specific details, parameterized models are used to estimate communication and computation costs for a given machine. The models are also used in coarse grain pipelining to automatically select a task granularity that balances the available parallelism with the costs of communication. To determine the applicability of the optimizations on different machines, we analyzed their performance on an Intel iPSC/2, an Intel iPSC/860, and a Thinking Machines CM-5.

1. INTRODUCTION

Distributed-memory multicomputers such as the Intel iPSC/860, the Intel Paragon, the IBM SP-1, the NCUBE/2, and the Thinking Machines CM-5 offer significant advantages over shared-memory multiprocessors in terms of cost and scalability. However, lacking a global address space, they present a very difficult programming model in which the user must specify how data and computation are to be partitioned across processors and determine which sections of data need to be communicated among which processors. To overcome this difficulty, significant research effort has been aimed at source-to-source parallelizing compilers for multicomputers that relieve the programmer from the task of program partitioning and communication generation, while the specification of data distributions remains a responsibility of the programmer. These compilers take a program written in a sequential or shared-memory parallel language and, based on a user-specified partitioning of the data, generate code for a given multicomputer. Examples include Fortran D [1], Fortran 90D [2], the SUIF compiler [3], and the Superb compiler [4]. However, many of these research efforts are now looking into automated data partitioning. Many researchers in this area are also currently involved in defining High Performance Fortran (HPF) [5] to standardize parallel programming with data distribution directives.

Some related work on the evaluation of compiler optimizations performed by the Fortran D compiler has previously appeared in [6]. The compiler optimizations that they described were selected manually, applied to one-dimensional partitionings, and only evaluated on an iPSC/860. The novel aspects we present in this paper are the automatic selection of multidimensional data partitions, the development of an estimation framework for coarse grain pipelining, and the comparison of existing optimizations on different architectures.

The remainder of this paper is organized as follows. Section 2 provides an overview of the PARADIGM compiler.

(This research was supported in part by the Office of Naval Research under Contract N00014-91-J-1096, and in part by the National Aeronautics and Space Administration under Contract NASA NAG.)
The various communication optimizations used in the compiler, as well as the techniques used to select the granularity for the pipelining transformation, are described in Section 3. An analysis of the results using the presented optimizations is performed in Section 4, and conclusions are presented in Section 5.

2. OVERVIEW OF PARADIGM

Figure 1 presents a functional view of the major components in the PARADIGM compiler. The compiler accepts a sequential program (currently FORTRAN 77) and produces an SPMD (Single Program Multiple Data) parallel program with message passing. Following are brief descriptions of some of the major areas in the compilation strategy:

Program Analysis. Parafrase-2 [7] is used as a preprocessing platform to parse the sequential program into an intermediate representation, to perform useful analysis (such as generating flow, dependence, and call graphs), as well as to facilitate various code transformations (such as constant propagation, induction variable substitution, loop distribution, loop interchange, and scalar expansion). To describe partitioned sets of iterations and regions of data, Processor Tagged Descriptors (PTDs) [8] are used to provide a uniform representation of the partitioning for every processor. Operations on PTDs are extremely efficient, capturing the effect on all processors in a given dimension simultaneously. PTDs are easily extended to an arbitrary number of dimensions and are independent of the total number of processors.
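The PTD representation itself is detailed in [8]; as a rough, hypothetical illustration of the kind of per-processor index sets such descriptors capture, the following Python sketch computes the block of indices owned by each processor in one distributed dimension (the function name and layout here are illustrative assumptions, not PARADIGM's actual interface):

    # Illustrative sketch only: per-processor index ranges for a BLOCK
    # distribution of N elements over P processors in one dimension.
    # PARADIGM's actual PTD operations are described in reference [8].

    def block_range(proc, N, P):
        """Return the (lo, hi) index range owned by processor `proc`
        when N elements are block-distributed over P processors."""
        size = (N + P - 1) // P          # ceiling(N / P) elements per block
        lo = proc * size + 1             # 1-based, Fortran-style indexing
        hi = min((proc + 1) * size, N)
        return lo, hi

    # Every processor can evaluate the ranges of all processors at once,
    # which is the flavor of operation such a descriptor must support.
    ranges = [block_range(p, N=100, P=4) for p in range(4)]
    print(ranges)   # [(1, 25), (26, 50), (51, 75), (76, 100)]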

Figure 1: PARADIGM Compiler Overview (the sequential program is parsed by Parafrase-2 and processed by the automatic data distribution module, which produces data distribution specifications, and by the communication and optimizations module; code generation through a generic library interface produces the SPMD parallel program)

Data Partitioning. Distribution of data is determined automatically by the compiler using a constraint-based algorithm [9, 10], which selects an abstract multidimensional mesh topology along with how program data is to be distributed on the mesh. To minimize the execution time for a particular machine, parameterized models are used to estimate computation and communication costs. For each target machine, there is a set of parameters which interface with the cost models, effectively isolating the partitioning algorithm from a specific architecture.

Computation Partitioning. Computation is divided among processors using the owner computes rule. A direct application of this rule without further optimizations leads to run-time resolution, which results in code that computes the ownership and communication for each reference at run time. An efficient implementation of the owner computes rule, however, can avoid the overhead of computing ownership at run time. For computations enclosed in a loop nest, the loops can be partitioned (known as loop bounds reduction [1]), allowing processors to execute only those iterations which have assignments that write to local memory.

Communication Analysis. The references in assignment statements are also analyzed to detect the need for communication. PTDs are constructed to describe the iterations requiring communication of non-local data, the processors involved, and the exact regions of the arrays to be sent or received. Once the communication descriptors have been computed for individual references, various communication optimizations can be performed (see Section 3). Data dependence and flow information is also used to determine whether a given optimization is applicable, and if so, to what extent.

Processor Mapping. The compiler generates code that views the target machine as a multidimensional mesh of processors. The exact configuration of this mesh is chosen during the automatic data-partitioning phase. Since a mesh topology can be easily mapped onto other topologies, machine-dependent processor mapping is achieved through library support to efficiently map the mesh to a given architecture [11].

Generic Library Interface. Support for specific communication libraries is provided through a generic library interface. For each supported library, abstract functions are mapped to corresponding library-specific code generators at compile time. Library interfaces have been implemented for the Intel iPSC communication library, Thinking Machines CMMD, Parasoft Express [12], PVM [13], and the Portable Instrumented Communication Library (PICL) [14, 15]. Express, PVM, and PICL also provide execution tracing and support for many different machines. The portability of this interface allows the compiler to generate code for a wide variety of machines.

Summary. When fully implemented, PARADIGM will be capable of performing all of the following tasks automatically: generation of data partitioning directives [9, 10], partitioning of computation and generation of communication [11], synthesis of high-level communication [16], exploitation of functional parallelism [17], support of a multithreaded execution model [18], and support of irregular computations [19].
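To make the contrast between run-time resolution and loop bounds reduction concrete, the following sketch (illustrative Python, not compiler output; `owner` and the block bounds are assumptions matching the block distribution sketched earlier) shows the two forms of a partitioned loop:

    # Illustrative sketch: run-time resolution vs. loop bounds reduction
    # for a loop over N elements block-distributed over P processors.
    # `me` is this processor's id; the loop bodies are stand-ins.

    N, P, me = 100, 4, 2
    size = (N + P - 1) // P

    def owner(i):                 # which processor owns element i
        return (i - 1) // size

    # Run-time resolution: every processor scans the FULL iteration
    # space and tests ownership of each reference at run time.
    for i in range(1, N + 1):
        if owner(i) == me:
            pass                  # compute a(i) locally

    # Loop bounds reduction: the compiler shrinks the bounds so each
    # processor visits only the iterations it owns, with no run-time test.
    lo, hi = me * size + 1, min((me + 1) * size, N)
    for i in range(lo, hi + 1):
        pass                      # compute a(i) locally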
3. COMMUNICATION OPTIMIZATIONS

The first three communication optimizations examined in this paper, message coalescing, message vectorization, and message aggregation, are targeted at reducing the overhead associated with communication [1, 6]. These optimizations rely on the fact that the start-up cost of communication for distributed-memory multicomputers is much greater than the per-byte transmission cost (see Table 1). In general, given the start-up latency and transmission rate for a specific architecture (see Table 1), the compiler uses a communication model in which the transfer cost (in µs) of a message of m bytes is defined as:

    transfer(m) = ovhd + rate * m                                   (1)

For a machine such as the iPSC/2, the parameters depend on the length of the message, so the model becomes piecewise: one (ovhd, rate) pair applies to short messages (m <= 100 bytes) and another to long messages (m > 100 bytes).

Table 1: Communication Model Parameters [6] (start-up overhead ovhd, in µs, and transmission rate, in µs per byte, for the CM-5, the iPSC/860, and the iPSC/2, tabulated separately for message sizes m <= 100 and m > 100 bytes)

Several optimizations can be employed to increase the performance of the parallel program by combining messages in various ways to reduce the total amount of communication overhead.
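As a small illustration of how such a parameterized model is evaluated (the numeric parameters below are placeholders chosen only to show the piecewise form, not the measured values of Table 1), the transfer cost can be coded directly:

    # Sketch of the communication cost model of Equation (1).
    # The (ovhd, rate) values are PLACEHOLDERS; the measured machine
    # parameters appear in Table 1 of the paper.

    def transfer(m, params):
        """Transfer cost (in microseconds) of an m-byte message."""
        ovhd, rate = params(m)
        return ovhd + rate * m

    def ipsc2_like(m):
        """Piecewise parameters: short messages (<= 100 bytes) use one
        (ovhd, rate) pair, long messages another."""
        return (350.0, 0.7) if m <= 100 else (660.0, 0.36)

    # Start-up dominates for short messages; only for very long messages
    # does the per-byte term take over.
    for m in (1, 100, 10000):
        print(m, transfer(m, ipsc2_like))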

Figure 2: Message Vectorization (itemwise messages between processors P1 and P2 before, and a single combined message after)

3.1 Message Coalescing

Separate communication for different references to the same data is unnecessary if the data has not been modified between uses. When statically analyzing the access patterns, these redundant communications are detected and coalesced into a single message, allowing the data to be reused rather than communicated for every reference. For sections of arrays, unions of overlapping PTD index sets ensure that each unmodified data element is communicated only once. Coalescing is always beneficial since entire communication operations can be eliminated.

3.2 Message Vectorization

Non-local elements of an array that are indexed within a loop nest can also be vectorized into a single larger message instead of being communicated individually (see Figure 2). Dependence analysis is used to determine the outermost loop at which the combining can be applied. The "itemwise" messages are combined, or vectorized, as they are lifted out of the enclosing loop nests to the selected vectorization level. (Management of available memory may require that large regions of data be only partially vectorized.) Vectorization reduces the total number of communication operations, but also increases the message length.

3.3 Message Aggregation

Messages (corresponding to several array sections) to be communicated between the same source and destination can also be aggregated into a single larger message. Multiple communication operations (to be performed at the same point) are sorted by their destinations during the communication analysis. Messages with identical destinations can then be collected into a single communication operation (see Figure 3). The gain from aggregation is similar to vectorization in that multiple communication operations can be eliminated at the cost of increasing the message length. Aggregation can be performed on communication operations of individual data references as well as on vectorized communication operations. Both of these applications of message aggregation will be examined in Section 4.

Figure 3: Message Aggregation (several messages with the same source and destination before, and one combined message after)

3.4 Coarse Grain Pipelining

In loops where there are no cross-iteration dependences, parallelism is extracted by independently executing groups of iterations on separate processors. However, in cases where there are cross-iteration dependences due to recurrences, it is not possible to immediately execute every iteration. Often, there is the opportunity to overlap parts of the loop execution, using some form of synchronization to ensure that the data dependences are enforced. In Figure 4a, the first processor is performing an operation on every element of the rows it owns before sending the border row to the waiting processor, thereby serializing execution of the entire computation. In the example in Figure 4b, a loop interchange has been applied such that the first processor now can compute one partitioned column of elements. It can then send the border element of that column to the next processor so that processor can begin computation immediately.

Figure 4: Code Example for Loop Pipelining

    (a) Before Transformation:
        do i = 1, Y/p
          do j = 1, X
            a(i, j) = a(i-1, j) + a(i, j-1)

    (b) After Loop Interchange:
        do j = 1, X
          do i = 1, Y/p
            a(i, j) = a(i-1, j) + a(i, j-1)

Such techniques have been used in the design of systolic arrays [23, 24] as well as in software pipelining. Ideally, if communication has zero overhead, this is the most efficient form of computation, since no processor will wait unnecessarily. Unfortunately, this assumption is not valid for distributed-memory systems. When overhead is considered, the cost of performing numerous single element communications can be quite expensive. To address this problem, the total communication overhead can again be reduced by increasing the granularity of the communication.
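The same start-up-versus-volume arithmetic that motivates vectorization and aggregation applies here: communicating a border of X elements as X single-element messages pays X start-ups, while strips of s elements pay only ceil(X/s). A small sketch under the placeholder cost model used earlier (the element size is also an assumption):

    # Sketch: modeled communication cost of a border of X elements sent
    # as single elements vs. as strips of s elements, using Equation (1)
    # with the placeholder parameters from before.
    import math

    OVHD, RATE = 350.0, 0.7      # placeholder start-up (µs) and µs/byte
    ELEM_BYTES = 8               # assumed element size

    def border_cost(X, s):
        """Cost of sending X elements in messages of s elements each."""
        n_msgs = math.ceil(X / s)
        return n_msgs * (OVHD + RATE * s * ELEM_BYTES)

    X = 512
    for s in (1, 8, 64, X):      # fine grain ... fully serialized
        print(s, border_cost(X, s))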
This procedure has become known as coarse grain pipelining [6].

3.4.1 Execution Analysis

In Figure 5, an execution framework of a generalized two-level pipelined loop nest (similar to that in Figure 4) is presented. In this model, coarse grain pipelining is performed by strip-mining the outer dimension of the two-dimensional loop nest, while chaining is applied between consecutive pipelines at an outer iteration level. Definitions of all variables which will be used in the analysis are also listed along with the figure. Since the amount of available parallelism is reduced as the granularity, or strip size s, is increased, this value must be carefully selected.

Figure 5: Estimation Framework for Coarse Grain Pipelining (a timeline of processors P0 through P3 showing strip computation, send overhead (Sovhd), receive overhead (Rovhd), strip transfers, and forward reference synchronization, annotated with the quantities below)

    startup  = (p - 1)(s * comp + comm)
    pipeline = X * comp + (ceil(X/s) - 1) * overhead(s)
    sync     = s * comp + 3 * comm - overhead(s) + scomm
    total    = L * pipeline + (L - 1) * sync   (chained pipelines)

    L           = outer loop iterations
    X, Y        = number of columns and rows
    c           = cost of inner loop instructions
    p           = number of processors
    s           = strip size (s_b = strip size in bytes)
    transfer(m) = communication time of m elements
    overhead(m) = communication overhead for m elements
    comp        = computation time of one column = ceil(Y/p) * c
    comm        = time for transfer(s) (strip of a row)
    scomm       = time for transfer(X) (entire row)

Figure 6: Cross Processor Loop Pipelining

    (a) Fine grain pipelining:
        do l = 1, L
          comm(size = X)
          do j = 1, X
            if (my$p > 0) recv(size = 1)
            do i = 1, ceil(Y/p)
              computation
            if (my$p < p - 1) send(size = 1)

    (b) Coarse grain pipelining:
        do l = 1, L
          comm(size = X)
          do j = 1, X, s
            bb = min(s, X - j + 1)
            if (my$p > 0) recv(size = bb)
            do jj = j, j + bb - 1
              do i = 1, ceil(Y/p)
                computation
            if (my$p < p - 1) send(size = bb)

If s is one, we have fine grain pipelining, and if s is equal to the bounds of the serial loop, X, the execution is serialized in the inner dimensions (no pipelining). Somewhere in between lies an optimal s that maximizes the overlap of communication and computation. In order to investigate this tradeoff, an execution time estimate is developed from the framework to allow automatic selection of a strip size that yields the highest performance.

The first major phase of execution is the time required to fill the pipeline. This is related to the number of processors as well as the strip size. From the diagram, it can be seen that:

    startup = (p - 1)(s * comp + comm)

The next portion of execution is the time spent in the pipeline. Ideally, with no communication overhead, this time should be equal to the amount of computation (X * comp). However, because of the presence of communication, the time for each message communicated in the pipeline must also be taken into account. The number of communication operations is ceil(X/s). Therefore,

    pipeline = X * comp + (ceil(X/s) - 1) * overhead(s)

For loops with flow dependences caused by forward references in the computation, the execution of a sequence of pipelined loop nests will also generate communication, incurring some extra synchronization cost to communicate the required data. Since most parallel machines support only a single channel for memory transfer operations through the communication network, this causes some extra delay, which can be seen during multiple sends in Figure 5. (In some applications there are no forward references and, therefore, no need to synchronize between outer loop iterations. In these cases each processor can proceed without any further synchronization.) The "scomm" term is used to represent the amount of synchronizing communication (if an entire row is communicated, this cost is transfer(X)). Note that this will also be present in the start-up synchronization for the loop nest (see Figure 5).

    sync = s * comp + 3 * comm - overhead(s) + scomm

Since the pipeline is entered L times, the total execution time is then:

    total cost = scomm + startup + L * pipeline + (L - 1) * sync
               = [LX + s(p + L - 2)] * ceil(Y/p) * c
                 + (p + 3L - 4) * transfer(s)
                 + [L(ceil(X/s) - 2) + 1] * overhead(s)
                 + L * transfer(X)                                  (2)

Using the communication cost model previously presented (see Equation (1)), with b denoting the number of bytes per element (so s_b = s * b and X_b = X * b), the total cost becomes:

    total cost = [LX + s(p + L - 2)] * ceil(Y/p) * c
                 + (p + 3L - 4)(ovhd + s_b * rate)
                 + [L(ceil(X/s) - 2) + 1] * ovhd
                 + L(ovhd + X_b * rate)

The total cost can then be minimized with respect to the strip size to select the optimal granularity. Setting the derivative with respect to s to zero,

    d(total cost)/ds = (p + L - 2) * ceil(Y/p) * c
                       + (p + 3L - 4) * b * rate
                       - (X * L * ovhd) / s^2 = 0

yields

    s = sqrt( X * L * ovhd / [ (p + L - 2)(b * rate + ceil(Y/p) * c)
                               + 2b(L - 1) * rate ] )               (3)

Verification of the second derivative will show that this value of s is indeed a minimum. Since ovhd and rate are actually functions of s, Equation (3) is evaluated iteratively.
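Equation (3) lends itself to a simple fixed-point computation: pick a strip size, look up the (ovhd, rate) pair that message size implies, re-evaluate s, and repeat. A sketch under assumed parameters (the piecewise values are again placeholders; the paper's measured values are in Table 1):

    # Sketch: iterative evaluation of the strip size of Equation (3).
    # Machine parameters are placeholders with the same piecewise shape
    # as Table 1; all problem sizes are illustrative.
    import math

    def params(m_bytes):                  # placeholder (ovhd µs, rate µs/byte)
        return (350.0, 0.7) if m_bytes <= 100 else (660.0, 0.36)

    def strip_size(X, Y, L, p, c, b, iters=10):
        """Fixed-point iteration of Equation (3); s in elements."""
        s = 1.0
        for _ in range(iters):
            ovhd, rate = params(s * b)    # ovhd and rate depend on s
            denom = ((p + L - 2) * (b * rate + math.ceil(Y / p) * c)
                     + 2 * b * (L - 1) * rate)
            s = math.sqrt(X * L * ovhd / denom)
            s = min(max(s, 1.0), X)       # keep 1 <= s <= X
        return round(s)

    # e.g. a 512x512 recurrence, 10 outer iterations, 8 processors,
    # c = 1 µs of inner-loop work per element, 8-byte elements:
    print(strip_size(X=512, Y=512, L=10, p=8, c=1.0, b=8))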

3.4.2 Serial Computation Cost Estimation

Since the cost, c, of the inner loop computations appears in this expression, it is necessary to estimate the computation costs. Two techniques can be used: source-level [9] and assembly-level cost estimation. For each method of estimation, a given machine's operation costs are expressed in terms of clock cycles. In the case of source-level estimation, these costs must take into account any support instructions (address computation, register loads, stores, etc.) performed for the actual computation. Assembly-level estimation requires the timing costs of individual machine instructions as well as being able to perform pre-compilation of the source under examination. In both cases, estimation of the cost of a fixed block of code (which may contain loops and other control flow structures) requires computing a dynamic cycle count.
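A minimal sketch of what source-level estimation involves (the per-operation cycle costs are invented for illustration; a real machine parameter set would also charge the address computation and load/store support instructions noted above):

    # Sketch: source-level dynamic cycle count for a fixed loop body.
    # Cycle costs per abstract operation are INVENTED placeholders.

    CYCLES = {"fadd": 3, "fmul": 4, "load": 2, "store": 2}

    def body_cost(op_counts):
        """Cycles for one execution of a loop body, given operation counts."""
        return sum(CYCLES[op] * n for op, n in op_counts.items())

    def dynamic_count(op_counts, trip_counts):
        """Dynamic cycle count: body cost scaled by the product of the
        (compile-time known) trip counts of the enclosing loops."""
        total_trips = 1
        for t in trip_counts:
            total_trips *= t
        return body_cost(op_counts) * total_trips

    # a(i,j) = a(i-1,j) + a(i,j-1): two loads, one add, one store
    print(dynamic_count({"load": 2, "fadd": 1, "store": 1}, [512, 512]))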
Table 2: Data Partitioning for Initial Tests (array sizes and the distributions selected for each machine; one-dimensional block distributions were chosen for ADI and EXPL, and a two-dimensional (block, block) distribution for Jacobi)

4. EVALUATION OF OPTIMIZATIONS

A group of small scientific program kernels is used to examine the performance of the presented communication optimizations. The selected program fragments include:

- ADI Integration (ADI, Livermore kernel 8) [25]
- 2-D Explicit Hydrodynamics (EXPL, Livermore kernel 18) [25]
- Jacobi's Iterative Method

Three other programs which exhibit cross-iteration dependences are selected to examine the pipelining optimization:

- 2-D Implicit Hydrodynamics (IMPL, Livermore kernel 23) [25]
- Successive Over-Relaxation Iterative Method (SOR)
- Block Lower Triangular Solver (BLTS) [26]

These programs will be examined separately in Section 4.6. Array dimensions are statically specified, and loop bounds are determined at compile time when possible. Both the virtual mesh configuration as well as the data partitioning of the arrays are automatically selected by the compiler. The sizes of the major arrays and the chosen distributions are shown in Table 2 for each of the first three programs.

The evaluation of the overhead optimizations is performed on an Intel iPSC/860 as well as on a Thinking Machines CM-5. Larger arrays were used on the CM-5 to reliably obtain measurable times. Traces taken using PICL [15] are analyzed using the ParaGraph [14] visualization tool to examine the effect of each optimization. The ParaGraph "Spacetime diagram" is used to further examine the execution profile, viewing the frequency and amount of communication that takes place. Continuous lines indicate uninterrupted execution, while a break in a line indicates that a processor has blocked awaiting communication. Communication operations are indicated as vertical lines between the cooperating processors' execution profiles (see Figures 7 to 10). Snapshot views of the execution of EXPL compiled with the selected optimizations are shown in each figure, while performance data is presented for all three test programs.

For comparison purposes, the reported execution times have been normalized to the serial execution of the corresponding program and are further separated into two quantities:

- the amount of time spent on useful computation (where useful refers to only the code which carries out the actual computation)
- the time spent executing code related to computation partitioning and communication

The relative effectiveness of each optimization is determined by examining the amount of overhead eliminated as the optimizations are incrementally applied.

Figure 7: Evaluation of Run-time Resolution
Figure 8: Reduced Loop Bounds & Coalescing
Figure 9: Evaluation of Message Vectorization
Figure 10: Message Vectorization & Aggregation

(Each of Figures 7 to 10 pairs normalized performance data for the Intel iPSC/860 and the Thinking Machines CM-5 with a spacetime snapshot; legend: Uniprocessor Execution, Run-time Resolution, Reduced Loop Bounds, Communication Vectorization, Communication Aggregation, All Optimizations applied. Traces taken from Explicit Hydrodynamics (Livermore Kernel 18) using a 16 processor iPSC/860; time in µs.)

4.1 Results of Run-time Resolution

A direct application of the owner computes rule without any further optimization leads to run-time resolution. Since each processor must execute the entire iteration space (to determine if any other processor needs locally owned data), different instances of a communication operation are effectively serialized across the processors in a given mesh dimension (see Figure 7). Multiple communication operations appear to be pipelined, but the time spent between successive messages can be quite large. Examining the execution time for each program, it can be seen that traversing the entire iteration space to compute ownership results in large amounts of overhead. Communication for run-time resolution programs is also very inefficient, since it is comprised of a large number of small (sometimes even redundant) messages, resulting in high communication overhead. The net result is a reduction in performance when compared to the serial case.

4.2 Results of Loop Bounds Reduction and Message Coalescing

By applying loop bounds reduction and statically generating communication operations, the serialization present in the baseline resolution cases can be eliminated. The statically generated communication operations are effectively collapsed in time as compared to the serialized communication present in run-time resolution (see the spacetime diagram in Figure 8). Static analysis of communication allows the loop bounds to be partitioned instead of requiring every processor to check the ownership of each reference for the entire iteration space. The overhead is dramatically reduced, as all ownership and communication is now statically determined. Note, however, that the size of the messages is still identical to that in run-time resolution (single elements), but redundant messages have been eliminated by application of the coalescing optimization. It is still apparent that there is an excessive amount of small messages being communicated (as seen in Figure 8), since communication overhead is now the dominant factor in the execution overhead.

4.3 Results of Message Vectorization

It is also possible to vectorize the communication operations after loop bounds reduction and message coalescing have been applied. Using dependence information to determine when vectorization is applicable, communication operations can be lifted out of the inner-most loops, thereby reducing the communication frequency (this can be seen in Figure 9). Recall that this also increases the size of the messages, but since the start-up cost is much greater than the per-byte cost, there is a large gain in performance. For ADI, it is possible to vectorize all communication completely out of the loop nest. Since there is no longer any synchronization due to communication within the loops, the reduction of the array bounds resulted in super-linear speedup, which can be attributed to a gain in cache performance (also see Figure 11).

4.4 Results of Message Aggregation

Aggregation also reduces communication frequency by grouping multiple communication operations, increasing the size of the resulting message (see Figure 10). Applying aggregation after loop bounds reduction and message coalescing, a performance gain is seen for both ADI and EXPL.

Figure 11: Comparison of Combined Optimizations (normalized execution time for ADI, EXPL, and Jacobi: (a) eight processor Intel iPSC/860, (b) Thinking Machines CM-5)
Snce groups of messages are combned, the performance mprovement s related to the number of derent array sectons that need to be communcated at the same pont n the program. For the small test programs examned, only a few arrays were nvolved n the communcaton, and hence the overall performance mproved only slghtly.. Results of Combned Optmzatons Fgure 11 shows the relatve speedup and the amount of overhead measured wth each optmzaton. Elmnatng the overhead of traversng the entre teraton space to compute ownershp, the combnaton of statc communcaton generaton and loop bounds reducton attaned roughly half of the total avalable performance for most programs. By applyng message vectorzaton t was possble to reduce the maorty of the remanng communcaton overhead resultng n near performance. Message aggregaton was benecal for the two programs (ADI, EXPL) whch contaned references to a number of derent arrays. Due to the more lmted scope of applcaton, ts eect tended to not be as dramatc as the other optmzatons. Snce the compler selected a two-dmensonal parttonng for Jacob's method, an extra run s shown n Fgure 11 where the compler was forced to generate a one-dmensonal parttonng. For the eght processor PSC/, the executon tme ncreased by about %. For the larger, processor, the executon tme has ncreased by over 3% and wll become more apparent wth larger numbers of processors. The hghest performance was acheved by allowng the compler to select the best dstrbuton based on cost models to estmate communca-

Table 3: Mesh Configurations (virtual mesh topologies selected by the automatic partitioning pass for ADI/EXPL and for Jacobi at each machine size N)

Each of the fully optimized test programs was also executed with larger numbers of processors to examine scalability. In Table 3, each program's mesh configuration is shown as selected by the automatic partitioning pass. The speedup curves for each of the test programs run on an iPSC/2, an iPSC/860, and a CM-5 can be seen in Figure 12. The super-linear speedup is again observed for ADI and can be attributed to the cache effect previously described. The execution of these three programs was also compared to an existing data-parallel compiler available on the CM-5 using a language known as Connection Machine Fortran (CMF 2.1 Final, CMOST 7.3). (The vector units were not utilized for either the CMF or message passing runs, since the node compilers would not produce vector code for the message passing programs.) The reduction in performance of the programs compiled with CMF can be attributed to the fact that it uses a SIMD (Single Instruction Multiple Data) model of execution for compilation (carried over from the CM-2). Simulating a SIMD architecture on a MIMD multicomputer (such as the CM-5) incurs fairly high synchronization costs between blocks of computation. Future versions of the CMF compiler will most likely use a more asynchronous model of computation (i.e., SPMD) which is better suited to the CM-5.

Figure 12: Performance Comparison (speedup of (a) ADI Integration, (b) Explicit Hydrodynamics, and (c) Jacobi's Iterative Method on the iPSC/2, iPSC/860, CM-5, and CM-5 with CMF)

4.6 Results of Coarse Grain Pipelining

To evaluate the quality of the strip size estimate developed in Section 3.4.1 (see Equation (3)), each of the test programs (IMPL, SOR, BLTS) was executed with varying strip sizes to compare the estimate with the actual minimum. We also examined a simpler one-level strip size estimate developed in the Fortran D project [6]. It is interesting to note that our Two-Level estimate reduces to their One-Level estimate when several assumptions are applied. In its most general form, the simpler estimate assumes that chaining does not occur between consecutive pipelines, communication is modeled as constant overhead, and the array dimensions are square:

    One-Level = sqrt( (p / (p - 1)) * (ovhd / c) )

Given the two forms of strip size estimates, the optimal strip size is compared with the estimated minimums in Table 5. The execution time using the Two-Level estimate can be seen to be no more than a few percent away from the optimal time, while the most general form of the One-Level estimate was, at times, more than 30% worse than the actual minimum. The One-Level estimate predicted strip sizes of roughly half the size of the Two-Level estimate. In fact, a further approximation is made in Fortran D, resulting in an estimate of sqrt(ovhd / c), which proves to be even farther from the minimum.

The major gain seen with the Two-Level estimate comes from the chaining of consecutive pipelines. In the One-Level estimate, which only models a single pipeline, the pipeline start-up costs have a more significant contribution and tend to reduce the strip size in order to minimize the total execution time. When chaining is taken into account, as in the Two-Level estimate, the overall contribution of the start-up phase is much less. For both machines examined, however, the communication rate did not have a significant effect on the Two-Level estimate, since only small messages were communicated (the overhead of communication was two orders of magnitude greater than the rate). For programs in which larger amounts of data need to be communicated within the pipeline, the communication rate would most likely become more significant.
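The relationship between the two estimates is easy to check numerically: with L = 1, rate = 0 (constant overhead), and square arrays, the Two-Level expression of Equation (3) collapses exactly to the One-Level form. A small sketch, with the same placeholder parameters as before:

    # Sketch: comparing the Two-Level strip size estimate (Equation (3))
    # with the Fortran D One-Level estimates. Parameters are placeholders.
    import math

    def two_level(X, Y, L, p, c, b, ovhd, rate):
        denom = ((p + L - 2) * (b * rate + math.ceil(Y / p) * c)
                 + 2 * b * (L - 1) * rate)
        return math.sqrt(X * L * ovhd / denom)

    def one_level(p, c, ovhd):
        return math.sqrt(p / (p - 1) * ovhd / c)

    def one_level_approx(c, ovhd):
        return math.sqrt(ovhd / c)

    X = Y = 512; L = 10; p = 8; c = 1.0; b = 8; ovhd, rate = 350.0, 0.7
    print(two_level(X, Y, L, p, c, b, ovhd, rate))   # ~38 elements
    print(one_level(p, c, ovhd))                     # ~20, roughly half
    print(one_level_approx(c, ovhd))                 # cruder still

    # Setting L = 1 and rate = 0 with X = Y recovers the One-Level form:
    print(two_level(X, Y, 1, p, c, 0, ovhd, 0.0), one_level(p, c, ovhd))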
In Figure 13, speedup curves are shown for the measured data as well as for the predictions of the Two-Level estimate (with the optimal point indicated for each). On the iPSC/860 the ratio of communication to computation is fairly high, whereas this ratio is much lower on the iPSC/2.

Table 4: Data Partitioning for Pipeline Tests (array sizes and the block distributions selected for IMPL, SOR, and BLTS)

Table 5: Comparison of Estimates (strip size and execution time for IMPL, SOR, and BLTS at each machine and processor count: optimal, Two-Level, and One-Level)

Figure 13: Performance of Pipelining vs. Granularity (measured and estimated speedup as a function of strip size for (a) Implicit Hydrodynamics, (b) Successive Over-Relaxation, and (c) Sparse Block Lower Triangular Solver on the iPSC/2 and iPSC/860)

For this reason, the performance curves tend to be steeper on the iPSC/860, making the selection of the correct strip size more critical. (Comparing the iPSC/2 to the iPSC/860, the communication overhead changes by a considerably smaller factor than the computation, which changes by about a factor of 10.) In Figure 14, traces from the SOR kernel also show the correlation of the execution profile to the framework presented in Section 3.4.1.

Since there tends to be a range of strip sizes which all provide similar performance, predicting the trend is more important than predicting the exact execution time. With the correct trend, it is possible to automatically select a granularity that approaches the minimum execution time. Estimates for the iPSC/860 followed the trend quite well but did not match the measured times, so they are scaled in magnitude to facilitate comparison; this is expected due to the advanced optimizations performed by the target compiler. The estimates for the iPSC/2 were fairly accurate when directly compared to the measured data (except for the BLTS estimate, which was also scaled). For IMPL and SOR, communication is more tightly coupled with computation than in BLTS; for these programs, control of the granularity had a greater effect on the resulting performance. On the other hand, the computation in BLTS was not as fine grained and required little or no increase in grain size.

5. CONCLUSIONS

One of the most complex tasks facing a user in parallelizing serial programs is dealing with interprocessor communication. In this paper, it has been shown that this task can be performed by the compiler through the use of good estimates for communication and computation costs. It has also been shown that the application of the presented optimization techniques yields high performance on several different distributed-memory multicomputers. By applying the presented optimizations, it was possible to amortize communication overhead, obtaining near-linear performance. Through the use of good computation and communication estimates, the compiler was able to select the best distribution even for small differences in performance. For larger machine sizes and more complex programs, the utility of automatic data distribution becomes more apparent, as the communication costs become greater for inferior data distributions.

The performance of coarse grain pipelining is directly influenced by the relative costs of communication and computation on a given machine. The estimates presented in this paper allow for variation in both computational power as well as the communication latency and bandwidth of different machines. Comparing the estimated minimum to the measured data, it is apparent that the compiler is able to automatically select a granularity that gives near-optimal performance. Currently, PARADIGM automatically performs loop bounds reduction, message coalescing, message vectorization, and message aggregation.
The coarse grain pipeline transformation will be integrated into the compiler in the near future.

Acknowledgements: We would like to thank the reviewers for their helpful input, and to thank Antonio Lain, Christy Palermo, and Shankar Ramaswamy for their insightful comments and suggestions.

Figure 14: Traces of SOR with Pipelining (fine grain and coarse grain)

REFERENCES

[1] S. Hiranandani, K. Kennedy, and C. Tseng, "Compiling Fortran D for MIMD Distributed Memory Machines," Communications of the ACM, vol. 35, no. 8, pp. 66-80, Aug. 1992.

[2] Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, and S. Ranka, "Fortran 90D/HPF Compiler for Distributed Memory MIMD Computers: Design, Implementation, and Performance Results," in Proceedings of the 1993 ACM International Conference on Supercomputing, July 1993.

[3] S. P. Amarasinghe and M. S. Lam, "Communication Optimization and Code Generation for Distributed Memory Machines," in Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation, pp. 126-138, June 1993.

[4] B. Chapman, P. Mehrotra, and H. Zima, "Programming in Vienna Fortran," in Third Workshop on Compilers for Parallel Computers, 1992.

[5] High Performance Fortran Forum, "High Performance Fortran Language Specification, version 1.0," Tech. Rep. CRPC-TR92225, CRPC, Rice University, Houston, TX, May 1993.

[6] S. Hiranandani, K. Kennedy, and C.-W. Tseng, "Evaluation of Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines," in Proceedings of the 1992 ACM International Conference on Supercomputing, (Washington, DC), July 1992.

[7] C. D. Polychronopoulos, M. Girkar, M. R. Haghighat, C. L. Lee, B. Leung, and D. Schouten, "Parafrase-2: An Environment for Parallelizing, Partitioning, Synchronizing and Scheduling Programs on Multiprocessors," in Proceedings of the 1989 International Conference on Parallel Processing, pp. II:39-48, Aug. 1989.

[8] E. Su, D. J. Palermo, and P. Banerjee, "Processor Tagged Descriptors: A Data Structure for Compiling for Distributed-Memory Multicomputers," to appear in the 1994 International Conference on Parallel Architectures and Compilation Techniques, 1994.

[9] M. Gupta and P. Banerjee, "Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers," IEEE Transactions on Parallel and Distributed Systems, vol. 3, pp. 179-193, Mar. 1992.

[10] M. Gupta and P. Banerjee, "PARADIGM: A Compiler for Automated Data Partitioning on Multicomputers," in Proceedings of the 1993 ACM International Conference on Supercomputing, (Tokyo, Japan), July 1993.

[11] E. Su, D. J. Palermo, and P. Banerjee, "Automating Parallelization of Regular Computations for Distributed Memory Multicomputers in the PARADIGM Compiler," in Proceedings of the 1993 International Conference on Parallel Processing, pp. II:30-38, Aug. 1993.

[12] Parasoft Corporation, Pasadena, CA, Express Reference Guide for FORTRAN Programmers, 1992.

[13] G. A. Geist, A. Beguelin, J. J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam, "PVM 3 User's Guide and Reference Manual," Oak Ridge National Laboratory, Oak Ridge, TN, Feb. 1993.

[14] M. T. Heath and J. A. Etheridge, "Visualizing the Performance of Parallel Programs," IEEE Software, vol. 8, pp. 29-39, Sept. 1991.

[15] G. A. Geist, M. T. Heath, B. W. Peyton, and P. H. Worley, "PICL: A Portable Instrumented Communication Library, C reference manual," Tech. Rep. ORNL/TM-11130, Oak Ridge National Laboratory, Oak Ridge, TN, July 1990.

[16] J. Li and M. Chen, "Compiling Communication-Efficient Programs for Massively Parallel Machines," IEEE Transactions on Parallel and Distributed Systems, vol. 2, pp. 361-376, July 1991.

[17] S. Ramaswamy, S. Sapatnekar, and P. Banerjee, "A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers," to appear in the 1994 International Conference on Parallel Processing, 1994.

[18] J. G. Holm, A. Lain, and P. Banerjee, "Compilation of Scientific Programs into Multithreaded and Message Driven Computation," to appear in the 1994 Scalable High Performance Computing Conference, 1994.
[19] A. Lain and P. Banerjee, "Techniques to Overlap Computation and Communication in Irregular Iterative Applications," to appear in the 1994 ACM International Conference on Supercomputing, 1994.

[20] T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, "Active Messages: a Mechanism for Integrated Communication and Computation," in Proceedings of the 19th Annual International Symposium on Computer Architecture, pp. 256-266, May 1992.

[21] M. Gerndt, "Updating Distributed Variables in Local Computations," Concurrency: Practice and Experience, vol. 2, pp. 171-193, Sept. 1990.

[22] V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer, "An Interactive Environment for Data Partitioning and Distribution," in Proceedings of the 5th Distributed Memory Computing Conference, (Charleston, SC), Apr. 1990.

[23] H. T. Kung, "Why Systolic Architectures?," Computer, vol. 15, no. 1, pp. 37-46, 1982.

[24] D. I. Moldovan, Parallel Processing: From Applications to Systems. Morgan Kaufmann, 1993.

[25] F. McMahon, "The Livermore Fortran Kernels: A computer test of the numerical performance range," Tech. Rep. UCRL-53745, Lawrence Livermore National Laboratory, 1986.

[26] D. Bailey, J. Barton, T. Lasinski, and H. Simon, "The NAS Parallel Benchmarks," Tech. Rep. RNR-91-002, NASA Ames Research Center, 1991.


More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries

Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Run-Tme Operator State Spllng for Memory Intensve Long-Runnng Queres Bn Lu, Yal Zhu, and lke A. Rundenstener epartment of Computer Scence, Worcester Polytechnc Insttute Worcester, Massachusetts, USA {bnlu,

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

Design of a Real Time FPGA-based Three Dimensional Positioning Algorithm

Design of a Real Time FPGA-based Three Dimensional Positioning Algorithm Desgn of a Real Tme FPGA-based Three Dmensonal Postonng Algorthm Nathan G. Johnson-Wllams, Student Member IEEE, Robert S. Myaoka, Member IEEE, Xaol L, Student Member IEEE, Tom K. Lewellen, Fellow IEEE,

More information

Evaluation of Parallel Processing Systems through Queuing Model

Evaluation of Parallel Processing Systems through Queuing Model ISSN 2278-309 Vkas Shnde, Internatonal Journal of Advanced Volume Trends 4, n Computer No.2, March Scence - and Aprl Engneerng, 205 4(2), March - Aprl 205, 36-43 Internatonal Journal of Advanced Trends

More information

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versons of Jacob teraton Gauss-Sedel and SOR teratve methods Next week: More MPI Debuggng and totalvew GPU computng Read: Class notes and references

More information

EFFICIENT SYNCHRONOUS PARALLEL DISCRETE EVENT SIMULATION

EFFICIENT SYNCHRONOUS PARALLEL DISCRETE EVENT SIMULATION EFFICIENT SYNCHRONOUS PARALLEL DISCRETE EVENT SIMULATION WITH THE ARMEN ARCHITECTURE C. Beaumont, B. Potter J.M. Flloque LIBr I.U.T. de Brest and LIBr Unversté de Bretagne Occdentale Télécom Bretagne BP

More information

A One-Sided Jacobi Algorithm for the Symmetric Eigenvalue Problem

A One-Sided Jacobi Algorithm for the Symmetric Eigenvalue Problem P-Q- A One-Sded Jacob Algorthm for the Symmetrc Egenvalue Problem B. B. Zhou, R. P. Brent E-mal: bng,rpb@cslab.anu.edu.au Computer Scences Laboratory The Australan Natonal Unversty Canberra, ACT 000, Australa

More information

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing EDA222/DIT161 Real-Tme Systems, Chalmers/GU, 2014/2015 Lecture #8 Real-Tme Systems Real-Tme Systems Lecture #8 Specfcaton Professor Jan Jonsson Implementaton System models Executon-tme analyss Department

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

A Parallel Gauss-Seidel Algorithm for Sparse Power System. Matrices. D. P. Koester, S. Ranka, and G. C. Fox

A Parallel Gauss-Seidel Algorithm for Sparse Power System. Matrices. D. P. Koester, S. Ranka, and G. C. Fox A Parallel Gauss-Sedel Algorthm for Sparse Power System Matrces D. P. Koester, S. Ranka, and G. C. Fox School of Computer and Informaton Scence and The Northeast Parallel Archtectures Center (NPAC) Syracuse

More information

WCET-Directed Dynamic Scratchpad Memory Allocation of Data

WCET-Directed Dynamic Scratchpad Memory Allocation of Data WCET-Drected Dynamc Scratchpad Memory Allocaton of Data Jean-Franços Deverge and Isabelle Puaut Unversté Européenne de Bretagne / IRISA, Rennes, France Abstract Many embedded systems feature processors

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

124 Chapter 8. Case Study: A Memory Component ndcatng some error condton. An exceptonal return of a value e s called rasng excepton e. A return s ssue

124 Chapter 8. Case Study: A Memory Component ndcatng some error condton. An exceptonal return of a value e s called rasng excepton e. A return s ssue Chapter 8 Case Study: A Memory Component In chapter 6 we gave the outlne of a case study on the renement of a safe regster. In ths chapter wepresent the outne of another case study on persstent communcaton;

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach Dstrbuted Resource Schedulng n Grd Computng Usng Fuzzy Approach Shahram Amn, Mohammad Ahmad Computer Engneerng Department Islamc Azad Unversty branch Mahallat, Iran Islamc Azad Unversty branch khomen,

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

A Parallelization Design of JavaScript Execution Engine

A Parallelization Design of JavaScript Execution Engine , pp.171-184 http://dx.do.org/10.14257/mue.2014.9.7.15 A Parallelzaton Desgn of JavaScrpt Executon Engne Duan Huca 1,2, N Hong 2, Deng Feng 2 and Hu Lnln 2 1 Natonal Network New eda Engneerng Research

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Parallel Incremental Graph Partitioning Using Linear Programming

Parallel Incremental Graph Partitioning Using Linear Programming Syracuse Unversty SURFACE College of Engneerng and Computer Scence - Former Departments, Centers, Insttutes and roects College of Engneerng and Computer Scence 994 arallel Incremental Graph arttonng Usng

More information

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems Real-tme Fault-tolerant Schedulng Algorthm for Dstrbuted Computng Systems Yun Lng, Y Ouyang College of Computer Scence and Informaton Engneerng Zheang Gongshang Unversty Postal code: 310018 P.R.CHINA {ylng,

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals Agenda & Readng COMPSCI 8 SC Applcatons Programmng Programmng Fundamentals Control Flow Agenda: Decsonmakng statements: Smple If, Ifelse, nested felse, Select Case s Whle, DoWhle/Untl, For, For Each, Nested

More information

CHAPTER 4 PARALLEL PREFIX ADDER

CHAPTER 4 PARALLEL PREFIX ADDER 93 CHAPTER 4 PARALLEL PREFIX ADDER 4.1 INTRODUCTION VLSI Integer adders fnd applcatons n Arthmetc and Logc Unts (ALUs), mcroprocessors and memory addressng unts. Speed of the adder often decdes the mnmum

More information

Computer models of motion: Iterative calculations

Computer models of motion: Iterative calculations Computer models o moton: Iteratve calculatons OBJECTIVES In ths actvty you wll learn how to: Create 3D box objects Update the poston o an object teratvely (repeatedly) to anmate ts moton Update the momentum

More information

APPLICATION OF A COMPUTATIONALLY EFFICIENT GEOSTATISTICAL APPROACH TO CHARACTERIZING VARIABLY SPACED WATER-TABLE DATA

APPLICATION OF A COMPUTATIONALLY EFFICIENT GEOSTATISTICAL APPROACH TO CHARACTERIZING VARIABLY SPACED WATER-TABLE DATA RFr"W/FZD JAN 2 4 1995 OST control # 1385 John J Q U ~ M Argonne Natonal Laboratory Argonne, L 60439 Tel: 708-252-5357, Fax: 708-252-3 611 APPLCATON OF A COMPUTATONALLY EFFCENT GEOSTATSTCAL APPROACH TO

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information