An Evaluation of Automatic and Interactive Parallel Programming Tools


Doreen Y. Cheng
Computer Sciences Corp., NASA Ames Research Center, Moffett Field, CA 9435

Douglas M. Pase
Formerly at NASA (CSC); Cray Research, Inc., 655F Lone Oak Dr., Eagan, MN

1991 ACM

Abstract

We have evaluated two automatic tools and one interactive tool using typical NAS applications on a CRAY Y-MP. It was found that automatic tools produce insufficient performance improvement. Interactive tools can produce better performance because they help users find and eliminate false dependencies. However, simple-minded code transformation has resulted in a significant performance degradation which cancels the speedup obtainable by parallelization. Therefore, tools must perform machine-specific optimization. The benchmarks contain a large number of small to medium size loops, which limits the performance achievable by parallelizing only loops. Features to assess whether a section of code should be parallelized, vectorized, or left sequential are also necessary.

1 Introduction

By the year 2000, the Numerical Aerodynamic Simulation (NAS) program at NASA Ames Research Center will provide scientists with supercomputers having parallel architectures. To make the power of parallel processing available to scientists, it will be necessary to provide a programming environment that will enable them to focus on physical and mathematical modeling. In addition, it will be necessary to provide tools that will help scientists produce correct and efficient code for a variety of supercomputers.

Many parallel programming tools have been developed by research institutes and industry [1]. Previously reported evaluations of parallel programming languages and tools have used small synthetic benchmarks [2]. The benchmarks and hardware used have quite different characteristics from those at NAS. Parallelization depends on the characteristics of both applications and target systems. We have evaluated several current approaches to parallel programming tools by using typical NAS applications and hardware, to give us insight into the proper direction for research into, and design and development of, future NAS parallel programming environments.

The functions of existing tools fall into three categories: tools that convert a sequential program into a parallel one, tools that assist in the creation of a parallel program, and tools that aid parallel debugging and performance optimization. Two approaches have been taken in building tools that convert sequential programs to parallel ones: automatic and interactive. Automatic tools rely on compilers to convert a sequential program into a parallel one in a batch processing fashion. Directives can be inserted manually into a program before compilation, but the tools do not help users find false dependencies. A false dependency is a dependence that will not actually occur during program execution for a given data set. Interactive tools require user interaction to guide the compiler during the course of parallelizing each section of the program. They provide facilities for analyzing dependencies.

One requirement for the NAS parallel programming environment is to minimize the number of changes in the current programming practices of scientists. If either automatic or interactive tools could parallelize existing programs with a reasonable amount of user effort and deliver satisfactory performance, scientists would be able to continue writing sequential applications and use the tools to convert them into parallel ones. We planned to evaluate a number of existing tools in order of increasing user effort: automatic tools, interactive tools, and tools for writing new programs. This paper reports the results of evaluating tools in the first two categories.

The evaluation was performed on a CRAY Y-MP using benchmarks that are representative of current NAS applications. The CRAY Y-MP was chosen because the balance between its processor speed and interconnection performance is similar to what we expect from future NAS computers. Table 1 lists the system software used in the experiment. The CRAY Autotasking facility, fpp, and KAP/CRAY [4] from Kuck & Associates were chosen for the evaluation of automatic tools. The Forge program from Pacific Sierra Research was the most interactive conversion tool we could find that ran on the Y-MP.

We have found that automatic tools produce insufficient performance improvement for our applications. Interactive tools allow users to conveniently access a set of integrated tools. In some cases, interactive tools can produce better performance because they help users find and eliminate false dependencies. The current version of Forge flags a converted loop as parallel by simply using a DO ALL compiler directive. For our applications, this approach has resulted in a significant performance degradation which cancels the potential speedup obtainable by parallelization. For programs optimized for vector machines the degradation is the most severe. To achieve high performance, it is necessary for tools to perform optimization specific to a target machine in addition to parallelizing the program.

We have also found that typical NAS programs contain a large number of small to medium size loops, which limits the performance achievable by parallelizing only loops. Based on these results, we believe that tools are needed for designing parallel algorithms and writing parallel programs. Features to assess whether a section of code should be parallelized, vectorized, or left sequential are necessary to reduce the effort of user-directed optimization. The current version of Forge has been significantly improved based on these results.

Section 2 of this paper describes the evaluation of CRAY fpp and KAP/CRAY. Section 3 presents the results of the Forge evaluation. Section 4 explains the observed phenomenon, and Section 5 presents conclusions and directions for future work.

2 Evaluation of Automatic Tools

At the present time, both fpp and KAP can only parallelize program loops. The first experiment was designed to test the quality of automatic parallelization. For this reason, no compiler directives were inserted into the programs. We selected twenty-five programs from typical NAS private codes and public-domain benchmarks for the evaluation. The objectives of the evaluation were to find out how much performance improvement could be obtained by automatic parallelization tools and how much extra compilation time was required. To compare parallel performance with the best sequential performance, the enhanced vectorization capabilities of these tools were studied as well.

2.1 Benchmarks

We used twenty-five benchmarks: thirteen from the Perfect Benchmarks [6], ten NAS private programs, the Livermore Loops, and the NAS kernels. They represent typical applications currently running on NAS supercomputers. The characteristics of the programs are listed in Table 2. Memory requirements varied from 11 Kwords to 54 Mwords (8 bytes/word). The total number of floating point operations required by these programs varied from 58 million to 78 billion operations. The I/O requirements of the benchmarks were very low, with only a few exceptions. A large solid state storage device was used to reduce the time spent doing I/O.

Each benchmark was first compiled using the default vectorization provided by the CRAY compiler. This version was the base for comparison. To study vectorization, three more versions of each benchmark were generated using different preprocessors. The first version used fpp. The second version used KAP. The last version used both fpp and KAP. The performance of these versions was measured using a single processor on a dedicated Y-MP. Three more versions of each program were generated to study parallelization: using fpp, using KAP, and using both preprocessors. The performance was measured on a dedicated Y-MP using 1, 4, and 8 processors.

2.2 Results

Figures 1-8 summarize the results of evaluating the automatic tools. Each benchmark is represented by a number that is the same number appearing in the first column of Table 2. The data for each benchmark were plotted in Figures 1-7 according to this numbering (the X-axis). Figure 1 shows the performance of default vectorization in MFLOPs. Figure 2 shows the corresponding compilation time in seconds. Figure 3 is a graph of the speedup obtained by enhanced vectorization. The speedup is calculated by dividing the elapsed time for the default vectorization version by the execution time of the corresponding enhanced vectorization version. Figure 4 shows the extra compilation time required by enhanced vectorization. In Figure 4, $C_v$ is the compilation time used by enhanced vectorization and $C_d$ is the compilation time used by default vectorization.

Figures 3 and 4 show that only a few benchmarks benefit significantly from the extra analysis and transformations provided by enhanced vectorization. For many benchmarks, the default vectorization offered by the compiler provides as good or better performance than the enhanced vectorization offered by the preprocessors, and with less compilation overhead. This is because the user-performed optimization for the CRAY vector unit before the evaluation has made most loops vectorizable by the simple analysis and transformations performed by the CRAY compiler.

The performance of a parallel program was compared to the best sequential counterpart implementing the same algorithm. For this reason, the default vectorization version was used as the base instead of the parallel version running on a single processor. Figures 5 and 6 display the speedup of the parallelized versions running on 4 and 8 processors. The speedup is calculated by dividing the elapsed time required by the parallel version ($T_n$, where $n$ is the number of processors used) into the time used by the default vector version ($T_v$). Figure 7 compares the compilation costs ($C_p$ is the compilation time used by parallelization and $C_d$ is the time used by default vectorization).

More programs were improved by parallelization than by enhanced vectorization. On the other hand, the compilation time was lengthened by a factor of 3 on average. The majority of the programs ran less than 5% faster; a few even slowed down. Executing on 8 processors did lead to higher speedup. However, the best efficiency dropped to 50% from 75% on 4 processors. (The efficiency is defined as the speedup divided by the number of processors used.)

The poor performance arises because the programs do not contain enough large-grain loops that are parallelizable by automatic tools. Loops parallelizable by these tools are also vectorizable. The potential performance improvement is largely consumed by vectorization, and not much is left for parallel execution.

To further illustrate the relation between vectorization and parallelization, Figure 8 plots the speedup obtained by parallel execution on 4 processors against the percentage of vectorization in the benchmarks. (Executions on 8 processors have similar characteristics.) The results seem to indicate that programs that do not vectorize well do not parallelize well either. Furthermore, higher than 70% vectorization is required to obtain reasonable speedup on multiple processors. (This is one reason why the programs which improved significantly under enhanced vectorization did not improve as much as might be expected under parallelization.) The dependence of parallelization on the percentage of vectorization arises partially because the tools only try to parallelize loops and the analysis used for parallelization is similar to that for vectorization. As a result, programs that contain large vectorizable loops obtain higher speedup on multiple processors than those with small vectorizable loops.

This study shows that current state-of-the-art automatic tools deliver unsatisfactory performance on multiple processors for these NAS benchmarks. One possible reason is the lack of knowledge at the algorithm level that can be used to create large-grain parallelism. A natural question is whether interactive tools can do better. This led to the second experiment: the evaluation of Forge.

3 Evaluation of Interactive Tools

Just like the automatic tools we studied, Forge only parallelizes loops, although it can be used to analyze dependencies between any two parts of a program. The objectives of the evaluation were to find out how much performance improvement a state-of-the-art interactive tool could produce for the NAS applications, how much user effort would be required, and how to improve such tools if they are not adequate. The results are biased by the benchmark programs and the hardware platform used, and represent characteristics of current NAS applications and facilities.

3.1 Benchmarks

The evaluation used five programs. Four of them, NAS4, NAS7, NAS8, and NAS10, were selected from the ten NAS private programs used in evaluating the automatic tools. ARC3D was from the Perfect Benchmarks. NAS4 and NAS7 show a low MFLOP rate using default vectorization (53 and 37 MFLOPs). Only a low percentage of the code is vectorizable (63% and 59%). Even using enhanced vectorization or automatic parallelization (running on 4 processors), the improvement is less than 3%. NAS8 and NAS10, on the other hand, show a high MFLOP rate (179 and 134 MFLOPs).
A high percentage of these programs is vectorizable (96% for both). Automatic parallelization speeds up the programs by a factor of 2.5 (the largest speedup) on 4 processors.

During the evaluation, Forge was first used to generate the run-time profile of a program. Then all loops in the program were analyzed. When Forge pointed out that dependencies had prohibited parallelization, its database query facility was used to analyze the dependencies and determine whether they could be ignored. The false dependencies were removed. Finally, Forge was used to generate a program in which all parallelizable loops were parallelized.
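To make the notion of a false dependency concrete, the following is a minimal Python sketch, not code from the paper or from Forge. It models a loop that writes through an index array: a compiler must assume two iterations may write the same element, but for a given data set the index values may never repeat, in which case the dependence never occurs. The function names (`indices_conflict`, `scatter`) and the sample data are hypothetical.

```python
def indices_conflict(index_values):
    """Return True if any index occurs twice, i.e. two loop
    iterations would write the same array element."""
    seen = set()
    for k in index_values:
        if k in seen:
            return True
        seen.add(k)
    return False


def scatter(a, idx, b, order):
    """Run the loop body a[idx[i]] = b[i] in the given iteration order.

    Executing the iterations in different orders stands in for
    concurrent execution: if the result depends on the order, the
    dependence is real for this data set.
    """
    out = list(a)
    for i in order:
        out[idx[i]] = b[i]
    return out


a = [0, 0, 0]
b = [10, 20, 30]
forward = [0, 1, 2]
backward = [2, 1, 0]

# Hypothetical data set 1: the index array is a permutation, so the
# dependence assumed by static analysis never occurs (a false dependency).
idx_perm = [2, 0, 1]
same_result = scatter(a, idx_perm, b, forward) == scatter(a, idx_perm, b, backward)

# Hypothetical data set 2: index 0 repeats, so the dependence is real.
idx_dup = [0, 0, 1]
differs = scatter(a, idx_dup, b, forward) != scatter(a, idx_dup, b, backward)
```

A check like `indices_conflict` corresponds to the kind of application-level question a user can answer from knowledge of the input data, which static analysis alone cannot.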

3.2 Results

The results are summarized in Tables 3-12. Each row of a table presents the performance of one version of the program. The first column lists the name of the version. Each entry in the other columns of Tables 3-7 contains three numbers. The first number is the elapsed time in seconds. The second is the speedup of a program running on multiple processors relative to the performance of the same version running on one processor. The third number is the multiprocessor efficiency.

The performance obtained by the automatic parallelization tool, fpp (the entries named "fpp" in Tables 3-12), was used as the base to measure the improvement made by Forge. These data were re-measured since the system software had been changed (Table 1). Up to six additional versions of each benchmark were studied to understand the effect of code transformations and granularity on performance under the condition of fixed data set size.

Version 1 (named "Forge") was the parallel program generated by Forge and compiled without invoking fpp. All parallelizable loops in this version were parallelized. The other versions were manually derived from this version. Version 2 (named "Forge No Dep") parallelized the same loops that were parallelized by fpp. The next four versions were generated to improve performance. Version 3 (named "Forge T > 1%") parallelized only the loops whose execution time was greater than 1% of the total elapsed time. Only loops which consumed more than 1% of the total execution time were parallelized in the fourth version (named "Forge T > 1%"). These three versions were compiled without using fpp.

In the next two versions, Forge was used to optimize the loops with false dependencies while fpp was used for the rest. The two versions differ in how the loops with false dependencies were optimized. Version 5 (named "Forge + fpp Parallel") parallelizes the outer loops and vectorizes the inner loops when possible, whereas Version 6 (named "Forge + fpp Vector") vectorizes the loops. No versions 5 or 6 were produced for NAS8 and NAS10 because they cannot be further improved by Forge, as will be shown.

The execution time of the versions of NAS8 and NAS10 generated by Forge is intolerably long. To reduce the time needed on a dedicated machine, the data set size was reduced (indicated by the label "Small" in the tables). The effect of data size is reported in a separate paper [7]. The performance of the version optimized by fpp using the full-size data is included for comparison. The performance was measured on a dedicated Y-MP using 1, 4, and 8 processors. Tables 8-12 tabulate the performance improvement of the different versions over the version optimized by fpp.

Based on the data presented in Tables 3-12, we have drawn the following conclusions. The first is that, in addition to parallelizing code, tools must perform optimizations specific to a target machine in order to produce reasonable performance. The second is that interactive tools must provide more assistance than is presently available to ease user-performed optimization. The following paragraphs elaborate on these two points.

Table 13 compares the single-processor performance of the fpp Version and the Forge No Dep Version to the performance obtained by using only the default vectorization provided by the CRAY compiler. Table 13 shows that for ARC3D, NAS4, and NAS7, both preprocessors degrade the performance; Forge degrades it more than fpp. For NAS8 and NAS10, fpp improves the performance while Forge degrades it by a factor of 5 to 9. The data in this table provide a reference for the data shown in Table 14.

Table 14 compares the performance of the Forge No Dep Version to the performance of the fpp Version. It shows that when the same loops are parallelized, the Forge No Dep Version runs significantly slower on a single processor than the fpp Version. For highly vectorized programs, such as NAS8 and NAS10, the degradation is as high as a factor of 7 to 10. Since both versions parallelize the same loops, the performance difference reflects the quality of code generation. Table 15 gives a few examples of the difference in code generation. When Forge parallelizes a loop, it simply inserts "DO ALL" directives. It does not optimize for the CRAY architecture, especially for the vector units. The simple-minded code generation significantly decreases the performance.

Table 16 compares the performance of the Forge Version, which parallelizes all the parallelizable loops, to the performance of the fpp Version. Table 17 compares the best performance of each benchmark (obtained using tools) to the fpp Version. The results indicate that the performance of the Forge Version of ARC3D, NAS4, and NAS7 exceeds the performance of the respective fpp Version when multiple processors are used. Only after additional user effort in optimization does the single-processor performance of these three programs exceed the performance of the fpp counterparts. However, the degradation in NAS8 and NAS10 is hardly reduced even in the best versions.

The degradation caused by simple-minded code generation cancels the speedup obtainable through parallelization. Only when the grain size of the loops with false dependencies is large enough can the speedup obtained by the extra parallelization overcome the degradation caused by poor code generation. In NAS4 and NAS7, most parallelizable loops contain false dependencies. In addition, these loops have a much larger grain size than the loops with no dependencies. Forge aids the user in parallelizing loops with false dependencies, which are loops not parallelized by automatic tools. For these two programs, the performance is dominated by the speedup obtainable by parallelization. Therefore, Forge obtains better performance on multiple processors than fpp. In the case of ARC3D, the two types of loops and their granularity are comparable. The performance improvement from using Forge is thus less pronounced for this program. More than 95% of the parallelizable loops in programs NAS8 and NAS10 are dependence free. The few loops with false dependencies in these programs are very small (less than 4% of execution time). For these programs, the degradation introduced by code generation dominates the performance, and Forge cannot improve the performance over fpp. The degradation due to simple code generation is therefore most pronounced for these programs.

The above analysis shows that parallelization alone is insufficient; parallel tools must perform machine-specific optimizations. One possible solution is to let interactive tools insert directives that convey only dependency information and to let vendor-provided preprocessors and/or compilers perform optimization.

Forge provides an environment in which users can conveniently access tools that guide and assist parallelization. However, even with all the help, it is still quite difficult to discover false dependencies and to improve performance. For this kind of tool to be useful for scientists, more functions are needed.

Improving the performance of a program can be difficult and tedious because there is no easy way to choose the loops to be parallelized. If no overhead were involved, the more loops parallelized, the better the performance would be. With overhead, however, parallelizing small loops would degrade the performance. In the experiment, many different versions of each benchmark had to be derived in search of better performance. The following two paragraphs show the importance of tools for helping users make the tradeoffs between parallelism and granularity.

One attempt in searching for better performance was to find a simple measure which allows us to find loops with large enough granularity. Forge uses the percentage of execution time consumed by a loop, combined with the average loop length, to help users select the loops to be parallelized. The percentage of time is defined as $100 \times T_{loop} / T_{total}$, where $T_{loop}$ is the execution time of a loop and $T_{total}$ is the elapsed time of the program. Table 18 shows the ratio of the execution time of the versions with different granularity (measured by percentage of time) to the execution time of the Forge Version, which parallelizes all the parallelizable loops. It would be expected that when parallelizing only larger-grain loops, the performance should be better. This is the case for NAS4 and NAS8. The rest of the benchmarks, however, show little difference. The reason is that the percentage of execution time measures the amount of work contained in a loop relative to the work of the entire program. It is not a good measure of the granularity referred to in parallelization. A better measure of granularity is the ratio of the number of operations in a loop that perform useful work to the number of operations added in order to execute the loop concurrently (overhead). The overhead can be introduced by both the system and code transformation. The system overhead includes the time needed to create, suspend, schedule, and terminate concurrent tasks, and the time spent in task communication and synchronization. The overhead introduced by code transformation is due to the sequential code added to a loop while it is parallelized. Since overhead is system and transformation dependent, it is difficult to measure at user level; tools must help.

The next attempt at optimizing the programs was to use the strengths of both Forge and fpp. In this approach, Forge was used to analyze a program and fpp was used to generate the code. Again, the results were different for different programs. Two of them obtained better performance by vectorizing the loops with false dependencies, and one of them by parallelizing such loops. Better performance can be obtained only after considering the tradeoffs between grain size and parallel overhead. For this reason, it is important for interactive tools to help users make the tradeoffs. Users should be able to query whether a section of code should be parallelized, vectorized, or left sequential. At the very least, a tool should give a performance estimate for execution on the target machine.

The database query facilities provided by Forge are quite useful in discovering false dependencies. However, it is rather difficult for people who are not familiar with the concepts and terminology of dependency analysis to use the facility. Messages such as "use - use conflict" are not very useful to users at the algorithm level. Questions like "Do I+N and J+M have overlapped range?" and "Is the value of M set at each iteration of the loop before it is used?" make it much easier for users to discover false dependencies. Furthermore, use-def chain analysis may lead to the conclusion that a dependency exists where it does not. Examples of this kind can be found when values are read in from a file or are indexed by the values of an array. In these cases, asking questions at the application level may lead to quick discovery of a false dependence.

4 Discussion

To further understand the behavior of the benchmarks, the characteristics of the loops they contain are summarized in Tables 19 and 20. Table 19 lists the total number of loops, the number of loops that executed using the given input, and the number of the executed loops that can be parallelized. The granularity of the loops is roughly characterized by the percentage of execution time. Also listed in the table are the number of parallelized loops that have no dependencies and the number of loops that have false dependencies. Table 20 shows another view of the data in Table 19.

Although the programs have been optimized for the CRAY vector unit, only 56% to 90% of the executed loops are parallelizable. Except in NAS4, more than 2/3 of the loops are very small: their execution time is less than 1% of the total time. Over 95% of the loops consume less than 1% of the elapsed time. Except in NAS7, the largest loop consumes less than 31% of the time. The small to medium grain size of most loops is the reason why parallelization by both automatic and interactive tools does not achieve high speedup even on 4 processors.

Further study using different problem sizes [1] shows that all benchmarks contain serial loops whose bounds are proportional to problem size. When the grain size of these loops is large enough, the speedup on multiple processors remains constant in spite of increasing problem size. Therefore, parallelizing only loops is not sufficient; tools should support parallel algorithm design and program construction.

5 Conclusions and Future Work

To gain insight into the proper direction for research into, and design and development of, future NAS parallel programming environments, we have performed two experiments.

In the first experiment, we used twenty-five typical NAS applications to evaluate two state-of-the-art automatic tools: CRAY fpp and KAP/CRAY. We found that most benchmarks are not significantly improved by automatic tools because they do not spend enough time executing the large-grain loops that are parallelizable by these tools. Loops parallelizable by these tools are also vectorizable. It is possible that the potential performance improvement is largely consumed by vectorization and not much is left for parallel execution.

In the second experiment, we used five typical NAS applications to evaluate a state-of-the-art interactive tool: Forge. We found that interactive tools can produce better performance in some cases because they help users find and eliminate false dependencies. The current version of Forge flags a converted loop as parallel by simply using a DO ALL compiler directive. For our applications, this approach has resulted in a significant performance degradation which cancels the potential speedup obtainable by parallelization. For programs optimized for vector machines the degradation is the most severe. To achieve high performance, it is necessary for tools to perform optimization specific to a target machine in addition to parallelizing the program.

Forge provides users with a convenient environment to parallelize existing programs. Integrating the following features can make such tools more useful. First, tools should generate code that takes advantage of the target machine architecture. Second, tools should be provided to evaluate the tradeoffs between the granularity of a section of code and the overhead introduced by either vectorization or parallelization. Third, users should be able to query whether a section of code should be parallelized, vectorized, or left sequential on particular hardware. Fourth, tools should be provided for developing new parallel algorithms and programs, since parallelizing only loops is not sufficient. In addition, the messages generated during the interactions between a tool and a user should be understandable by application scientists.

Our analysis shows that the benchmarks contain a large number of small to medium size loops, which limits the performance achievable by parallelizing only loops. Based on these results, we believe that tools are needed for designing parallel algorithms and writing parallel programs.

The data obtained depend on the system characteristics of the CRAY Y-MP. The behavior may be quite different on machines that introduce very small overhead to parallel execution.
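The two granularity measures discussed in Section 3.2 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the study: the function names are hypothetical, the timings are made up, and `parallel_pays_off` rests on an assumed cost model (the loop time divides evenly across processors and a fixed overhead is added) that the paper does not state.

```python
def percent_of_time(t_loop, t_total):
    """Percentage of program elapsed time spent in one loop:
    100 * T_loop / T_total."""
    if t_total <= 0:
        raise ValueError("total elapsed time must be positive")
    return 100.0 * t_loop / t_total


def granularity(useful_ops, overhead_ops):
    """Ratio of operations doing useful work to operations added in
    order to execute the loop concurrently (overhead)."""
    if overhead_ops <= 0:
        raise ValueError("overhead must be positive")
    return useful_ops / overhead_ops


def parallel_pays_off(t_loop, t_overhead, n_processors):
    """Assumed model: parallel time = t_loop / n + t_overhead.
    Returns True if that is shorter than running the loop serially."""
    return t_loop / n_processors + t_overhead < t_loop


# Hypothetical loop taking 2.0 s of a 100 s run.
share = percent_of_time(2.0, 100.0)            # 2.0 percent of the run
ratio = granularity(8000, 2000)                # 4 useful ops per overhead op

# The same fixed overhead helps a large loop and hurts a small one.
large_ok = parallel_pays_off(2.0, 0.5, 4)      # 0.5 + 0.5 < 2.0
small_ok = parallel_pays_off(0.4, 0.5, 4)      # 0.1 + 0.5 > 0.4
```

Under this assumed model, percentage of time alone cannot separate the two cases, because it carries no information about overhead; the decision needs the overhead term, which is why the text argues that tools must help measure it.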

The evaluation of parallel tools and environments is an ongoing process at NAS. Evaluation of tools for different machines, such as the Intel iPSC/860, and evaluation of tools for writing new programs have been scheduled. Benchmarks that reflect the characteristics of future NAS applications will be used in future evaluations. Based on the experience gained, the NAS parallel programming environment will be designed and developed.

Acknowledgements

The authors would like to express sincere appreciation to Pacific Sierra Research and Kuck and Associates, Inc. for their generous support of this work and prompt response to suggestions and comments. We would like to thank Katherine E. Fletcher for the data she collected in studying automatic tools and Dr. Jeffrey T. Deutsch for his critical review of the paper and his valuable suggestions for future work.

References

[1] Doreen Y. Cheng, "A Survey of Parallel Programming Tools," NASA Report RND-91-5, NASA Ames Research Center, Feb. 22, 1991.
[2] Alan H. Karp and Robert G. Babb II, "A Comparison of 12 Parallel Fortran Dialects," IEEE Software, Sept. 1988.
[3] "CF77 Compiling System, Volume 4: Parallel Processing Guide," SG-97, CRAY Research, Inc., Mendota Heights, MN, 199.
[4] "KAP/CRAY User's Guide," Kuck & Associates, Inc., Champaign, IL, 1989.
[5] "The Forge User's Guide Version 71," Pacific-Sierra Research, Dec. 199.
[6] L. Kipp, "Perfect Benchmarks Documentation, Suite 1," CSRD, University of Illinois at Urbana-Champaign, IL, 199.
[7] Doreen Y. Cheng, "Forge Evaluation and An Ideal Parallel Programming Environment," submitted to the IEEE, ACM Workshop on Parallel Programming Tools, Hawaii, 1991.

Figure 1 - Default Vectorization Performance (MFLOPS per benchmark)
Figure 2 - Compile Times with Default Vectorization (seconds per benchmark)
Figure 3 - Speedup from Enhanced Vectorization (series: fpp, KAP, KAP+fpp)
Figure 4 - Enhanced Vector Compilation Time Expense ($C_v/C_d$; series: fpp, KAP, KAP+fpp)

Figure 5 - 4 CPU Speedups ($T_v/T_4$; series: fpp, KAP, KAP+fpp)
Figure 6 - 8 CPU Speedups ($T_v/T_8$; series: fpp, KAP, KAP+fpp)
Figure 7 - Parallel Compilation Expense ($C_p/C_d$; series: fpp, KAP, KAP+fpp)
Figure 8 - 4 CPU Speedup vs % Vectorization (series: fpp, KAP, KAP+fpp)

Table 1 - System Software Used
Columns: Automatic, Interactive. Rows: UNICOS, CF77, fmp, fpp (option string -Wd-e46ijt in both columns), KAP (Not Used in the interactive experiment), Forge (Not Used in the automatic experiment).

Table 3 - Elapsed time (seconds), speedup, and multiprocessor efficiency of ARC3D
Columns: 1P, 4P, 8P. Rows: fpp, Forge, Forge No Dep, Forge T > 1%, Forge T > 1%, Forge + fpp Vector, Forge + fpp Parallel.

Table 2 - Characteristics
Columns: Name, Source Lines, Size (MW), Floating Point Adds, Floating Point Multiplies, Floating Point Reciprocals, Floating Point Operations, Data Transferred (Mbytes).
Rows: NAS1, NAS2, NAS3, NAS4, NAS5, NAS6, NAS7, NAS8, NAS9, NAS10, MFLOPS, NASKERN, ADM, ARC2D, BDNA, DYFESM, FLO52, MDG, MG3D, OCEAN, QCD, SPEC77, SPICE, TRACK, TRFD.
Summary rows: Average, Minimum, Maximum, Standard Dev, Total.

Table 4 - Elapsed time, speedup, and multiprocessor efficiency of NAS4. Columns: 1P, 4P, 8P.

Table 5 - Elapsed time, speedup, and multiprocessor efficiency of NAS7. Columns: 1P, 4P, 8P.

Table 6 - Elapsed time, speedup, and multiprocessor efficiency of NAS8. Columns: 1P, 4P, 8P.

Table 7 - Elapsed time, speedup, and multiprocessor efficiency of NAS10. Columns: 1P, 4P, 8P.
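The speedup and multiprocessor efficiency columns in these tables can be derived from elapsed times. The sketch below assumes the usual definitions (speedup is the one-processor elapsed time divided by the P-processor elapsed time, and efficiency is speedup divided by P); the transcription does not state which baseline the authors used, and the timings are hypothetical values chosen only for illustration.

```python
def speedup(t_one_cpu: float, t_p_cpus: float) -> float:
    """Speedup of a P-processor run relative to the one-processor run."""
    return t_one_cpu / t_p_cpus


def efficiency(t_one_cpu: float, t_p_cpus: float, p: int) -> float:
    """Multiprocessor efficiency: speedup divided by the processor count."""
    return speedup(t_one_cpu, t_p_cpus) / p


# Hypothetical elapsed times in seconds, keyed by processor count.
# These are NOT measurements from the paper.
elapsed = {1: 100.0, 4: 40.0, 8: 25.0}

for p, t in elapsed.items():
    s = speedup(elapsed[1], t)
    e = efficiency(elapsed[1], t, p)
    print(f"{p}P: elapsed={t:.1f}s speedup={s:.2f} efficiency={e:.0%}")
```

Under these definitions, an efficiency well below 100% at 8P means that adding processors yields diminishing returns for that version of the code.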

Table 8 - Elapsed time of ARC3D normalized wrt fpp Version. Columns: 1P, 4P, 8P. Rows: fpp, Forge, Forge No Dep, Forge T > 1%, Forge T > 1%, Forge + fpp Vector, Forge + fpp Parallel.

Table 9 - Elapsed time of NAS4 normalized wrt fpp Version. Columns: 1P, 4P, 8P. Rows: fpp, Forge, Forge No Dep, Forge T > 1%, Forge + fpp Vector, Forge + fpp Parallel.

Table 10 - Elapsed time of NAS7 normalized wrt fpp Version. Columns: 1P, 4P, 8P. Rows: fpp, Forge, Forge No Dep, Forge T > 1%, Forge + fpp Vector, Forge + fpp Parallel.

Table 11 - Elapsed time of NAS8 (small data set) normalized wrt fpp Version. Columns: 1P, 4P, 8P. Rows: fpp, Forge, Forge T > 1%, Forge T > 1%.

Table 12 - Elapsed time of NAS10 (small data set) normalized wrt fpp Version. Columns: 1P, 4P, 8P. Rows: fpp, Forge, Forge T > 1%.

Table 13 - Single processor performance of the fpp Version, Forge No Dep Version and the version using only default vectorization. The top number of each entry is the elapsed time. The bottom number is the elapsed time normalized wrt the default time. Columns: Default Vectorization, fpp, Forge No Dep. Rows: ARC3D, NAS4, NAS7, NAS8, NAS10.

Table 14 - Normalized execution time wrt fpp Version (same loops parallelized). Columns: Forge/fpp (1P), Forge/fpp (4P), Forge/fpp (8P). Rows: ARC3D, NAS4, NAS7, NAS8, NAS10.
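As a rough illustration of how a "normalized wrt fpp Version" entry might be computed, the sketch below divides each version's elapsed time by the fpp elapsed time at the same processor count. That pairing is an assumption, since the transcription does not spell out the baseline, and the function name and all timings are hypothetical.

```python
def normalize(elapsed: dict[int, float], baseline: dict[int, float]) -> dict[int, float]:
    """Divide each elapsed time by the baseline time at the same CPU count."""
    return {p: elapsed[p] / baseline[p] for p in elapsed}


# Hypothetical elapsed times in seconds, keyed by processor count.
# These are NOT measurements from the paper.
fpp_times = {1: 200.0, 4: 80.0, 8: 50.0}
forge_times = {1: 220.0, 4: 60.0, 8: 40.0}

ratios = normalize(forge_times, fpp_times)
for p, r in ratios.items():
    # A ratio above 1 means the Forge version ran slower than the fpp version.
    print(f"Forge/fpp ({p}P) = {r:.2f}")
```

With this convention, a ratio above 1 on one processor combined with ratios below 1 on four and eight processors would indicate a version that pays a single-processor penalty but scales better.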

Table 15 - Examples of differences in code generation

Feature                                     fpp   Forge
Insert DO ALL                               Y     Y
Use Options Besides SHARED & PRIVATE        Y     N
Check Run Time Loop Length                  Y     N
Parallel Outer, Vector Inner                Y     N
Take Advantage of CRAY Vector Unit          Y     N

Table 16 - Normalized execution time wrt fpp Version (all loops parallelized in the Forge Version). Columns: Forge/fpp (1P), Forge/fpp (4P), Forge/fpp (8P). Rows: ARC3D, NAS4, NAS7, NAS8, NAS10.

Table 17 - Best performance normalized wrt the fpp Version. Columns: Forge/fpp (1P), Forge/fpp (4P), Forge/fpp (8P). Rows: ARC3D, NAS4, NAS7, NAS8, NAS10.

Table 18 - Effect of using percentage of time as a measure of granularity on performance. Columns: T > 1% at 1P, 4P, and 8P. Rows: ARC3D, NAS4, NAS7, NAS8, NAS10.

Table 19 - Loop characteristics. Columns: ARC3D (%), NAS4 (%), NAS7 (%), NAS8 (%), NAS10 (%). Rows: Parallelizable Loops, T < 1%, T < 1%, T < 1%, MAXT, No Depend, False Depend.

Table 20 - Number and granularity (measured by percentage of time) of the serial loops whose bounds are proportional to the problem size. Columns: ARC3D, NAS4, NAS7, NAS8, NAS10. Rows: Scaling Variables (LMAX, KMAX, M, NEQ, NNX, JDIM, JMAX, NNY, KDIM), Total Number of Loops, Num of Loops with True Dep & Scalable Bnd, Largest Percentage of Time.


More information

特集 Road Border Recognition Using FIR Images and LIDAR Signal Processing

特集 Road Border Recognition Using FIR Images and LIDAR Signal Processing デンソーテクニカルレビュー Vol. 15 2010 特集 Road Border Reognition Using FIR Images and LIDAR Signal Proessing 高木聖和 バーゼル ファルディ Kiyokazu TAKAGI Basel Fardi ヘンドリック ヴァイゲル Hendrik Weigel ゲルド ヴァニーリック Gerd Wanielik This paper

More information

Introduction to Seismology Spring 2008

Introduction to Seismology Spring 2008 MIT OpenCourseWare http://ow.mit.edu 1.510 Introdution to Seismology Spring 008 For information about iting these materials or our Terms of Use, visit: http://ow.mit.edu/terms. 1.510 Leture Notes 3.3.007

More information

BENDING STIFFNESS AND DYNAMIC CHARACTERISTICS OF A ROTOR WITH SPLINE JOINTS

BENDING STIFFNESS AND DYNAMIC CHARACTERISTICS OF A ROTOR WITH SPLINE JOINTS Proeedings of ASME 0 International Mehanial Engineering Congress & Exposition IMECE0 November 5-, 0, San Diego, CA IMECE0-6657 BENDING STIFFNESS AND DYNAMIC CHARACTERISTICS OF A ROTOR WITH SPLINE JOINTS

More information

A scheme for racquet sports video analysis with the combination of audio-visual information

A scheme for racquet sports video analysis with the combination of audio-visual information A sheme for raquet sports video analysis with the ombination of audio-visual information Liyuan Xing a*, Qixiang Ye b, Weigang Zhang, Qingming Huang a and Hua Yu a a Graduate Shool of the Chinese Aadamy

More information

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of

More information

Detection of RF interference to GPS using day-to-day C/No differences

Detection of RF interference to GPS using day-to-day C/No differences 1 International Symposium on GPS/GSS Otober 6-8, 1. Detetion of RF interferene to GPS using day-to-day /o differenes Ryan J. R. Thompson 1#, Jinghui Wu #, Asghar Tabatabaei Balaei 3^, and Andrew G. Dempster

More information

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq

Dr.Hazeem Al-Khafaji Dept. of Computer Science, Thi-Qar University, College of Science, Iraq Volume 4 Issue 6 June 014 ISSN: 77 18X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om Medial Image Compression using

More information

1. Introduction. 2. The Probable Stope Algorithm

1. Introduction. 2. The Probable Stope Algorithm 1. Introdution Optimization in underground mine design has reeived less attention than that in open pit mines. This is mostly due to the diversity o underground mining methods and omplexity o underground

More information

On Dynamic Server Provisioning in Multi-channel P2P Live Streaming

On Dynamic Server Provisioning in Multi-channel P2P Live Streaming On Dynami Server Provisioning in Multi-hannel P2P Live Streaming Chuan Wu Baohun Li Shuqiao Zhao Department of Computer Siene Department of Eletrial Multimedia Development Group The University of Hong

More information

CA Release Automation 5.x Implementation Proven Professional Exam (CAT-600) Study Guide Version 1.1

CA Release Automation 5.x Implementation Proven Professional Exam (CAT-600) Study Guide Version 1.1 Exam (CAT-600) Study Guide Version 1.1 PROPRIETARY AND CONFIDENTIAL INFORMATION 2016 CA. All rights reserved. CA onfidential & proprietary information. For CA, CA Partner and CA Customer use only. No unauthorized

More information

- 1 - S 21. Directory-based Administration of Virtual Private Networks: Policy & Configuration. Charles A Kunzinger.

- 1 - S 21. Directory-based Administration of Virtual Private Networks: Policy & Configuration. Charles A Kunzinger. - 1 - S 21 Diretory-based Administration of Virtual Private Networks: Poliy & Configuration Charles A Kunzinger kunzinge@us.ibm.om - 2 - Clik here Agenda to type page title What is a VPN? What is VPN Poliy?

More information

A Fast Kernel-based Multilevel Algorithm for Graph Clustering

A Fast Kernel-based Multilevel Algorithm for Graph Clustering A Fast Kernel-based Multilevel Algorithm for Graph Clustering Inderjit Dhillon Dept. of Computer Sienes University of Texas at Austin Austin, TX 78712 inderjit@s.utexas.edu Yuqiang Guan Dept. of Computer

More information

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT?

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? 3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? Bernd Girod, Peter Eisert, Marus Magnor, Ekehard Steinbah, Thomas Wiegand Te {girod eommuniations Laboratory, University of Erlangen-Nuremberg

More information

Weak Dependence on Initialization in Mixture of Linear Regressions

Weak Dependence on Initialization in Mixture of Linear Regressions Proeedings of the International MultiConferene of Engineers and Computer Sientists 8 Vol I IMECS 8, Marh -6, 8, Hong Kong Weak Dependene on Initialization in Mixture of Linear Regressions Ryohei Nakano

More information

arxiv: v1 [cs.gr] 10 Apr 2015

arxiv: v1 [cs.gr] 10 Apr 2015 REAL-TIME TOOL FOR AFFINE TRANSFORMATIONS OF TWO DIMENSIONAL IFS FRACTALS ELENA HADZIEVA AND MARIJA SHUMINOSKA arxiv:1504.02744v1 s.gr 10 Apr 2015 Abstrat. This work introdues a novel tool for interative,

More information

ASSESSMENT OF TWO CHEAP CLOSE-RANGE FEATURE EXTRACTION SYSTEMS

ASSESSMENT OF TWO CHEAP CLOSE-RANGE FEATURE EXTRACTION SYSTEMS ASSESSMENT OF TWO CHEAP CLOSE-RANGE FEATURE EXTRACTION SYSTEMS Ahmed Elaksher a, Mohammed Elghazali b, Ashraf Sayed b, and Yasser Elmanadilli b a Shool of Civil Engineering, Purdue University, West Lafayette,

More information

An Approach to Physics Based Surrogate Model Development for Application with IDPSA

An Approach to Physics Based Surrogate Model Development for Application with IDPSA An Approah to Physis Based Surrogate Model Development for Appliation with IDPSA Ignas Mikus a*, Kaspar Kööp a, Marti Jeltsov a, Yuri Vorobyev b, Walter Villanueva a, and Pavel Kudinov a a Royal Institute

More information

CA Agile Requirements Designer 2.x Implementation Proven Professional Exam (CAT-720) Study Guide Version 1.0

CA Agile Requirements Designer 2.x Implementation Proven Professional Exam (CAT-720) Study Guide Version 1.0 Exam (CAT-720) Study Guide Version 1.0 PROPRIETARY AND CONFIDENTIAL INFORMATION 2017 CA. All rights reserved. CA onfidential & proprietary information. For CA, CA Partner and CA Customer use only. No unauthorized

More information