Evaluation of Benchmark Performance Estimation for Parallel Fortran Programs on Massively Parallel SIMD and MIMD Computers

Thomas Fahringer
Dept. of Software Technology and Parallel Systems, University of Vienna
Bruennerstr. 72, A-1210 Vienna, Austria

To be published in: IEEE Proceedings of the 2nd Euromicro Workshop on Parallel and Distributed Processing, Malaga, Spain, Jan. 1994

Abstract

A potential problem encountered when parallelizing programs for massively parallel systems is how to guide the parallelization effort through performance prediction. Estimating the performance of parallel programs based on benchmarking has become increasingly popular in recent years. However, little research has been done so far to evaluate this approach, mainly due to the lack of actual implementations. This paper discusses the advantages and disadvantages of benchmark performance estimation for SIMD and MIMD machines. The design and implementation of a benchmark performance estimator is presented. Even though benchmark performance estimations have been demonstrated to be very useful, experiments based on the described prototype uncover several severe problems of this approach. These include time effort, portability, measurement complexity, performance influence of target machine and compiler, pattern matching of kernels, and prediction accuracy. Concrete experiments for the MasPar MP-1 and the iPSC/860 hypercube are presented.

1 Introduction

Common use of massively parallel SIMD and MIMD systems has been hindered by the difficulty of programming such machines. Even though the development of explicit parallel languages ([19, 8]) and parallelizing compilers ([3, 13, 10]) has been put forth in recent years to alleviate the programming task, the user is still responsible for making most of the strategic program transformation decisions. It is widely accepted that a performance estimator is a key component to guide the parallelization and optimization effort. For some time people have placed great hope in the so-called benchmark performance estimation approach (see footnote 1). This involves
pre-measurement of kernels to cover both sequential and parallel program sections. Parallel kernels are commonly measured for varying data sizes and processor numbers. The measured runtimes are stored in a kernel library. In order to obtain estimated runtimes, a parallel program is parsed to detect existing library kernels by means of pattern matching. For each kernel discovered, the pre-measured runtime is accumulated, which finally yields the overall runtime of the program.

MacDonald ([14]) describes an approach to predict the runtime of Fortran77 programs by mapping language constructs to time formulas. He achieves good results for small kernels with trivial control flow. V. Balasundaram et al. ([2]) built a training set tool to help validate different data layout schemes for parallel programs based on the loosely synchronous communication model. They do not handle procedure calls and incorporate guessing to model control flow. They achieve fairly accurate estimates.

This paper tries to address the practical issue of using benchmarking to obtain performance estimates. The corresponding research started with the creation of a benchmark kernel library for a SIMD machine, namely the MasPar MP-1. The runtime of parallel MasPar Fortran programs was derived by hand using pre-measured kernel runtimes. This work initiated the design of an automatic benchmark performance estimator for message passing Fortran programs, to be integrated in the Vienna Fortran Compilation System (VFCS, [3]), which is a compiler that automatically translates Fortran programs into message passing programs for distributed memory parallel architectures. The target programs for the benchmark estimator are based on the single-program multiple-data (SPMD) programming model ([3]), where each processor executes the same program on a different data domain. The target machine is the iPSC/860 hypercube.

Footnote 1: For the remainder of this paper, performance estimation refers to benchmark performance estimation.

This paper reports on the experience with the manually obtained performance estimates for the MP-1 and the values derived automatically by the described prototype estimator for the iPSC/860 hypercube. The benchmark approach is evaluated with respect to the following issues:

1. time effort to build and maintain a benchmark performance estimator,
2. portability,
3. measurement complexity,
4. local and global performance influence of target machine and compiler,
5. pattern matching of kernels, and
6. prediction accuracy.

The paper is organized as follows. In Section 2 the design and classification of a kernel library is presented. The problems that arose during the pre-measurement of kernels for different target architectures are discussed. Then the overall performance estimation concepts and the techniques incorporated are outlined. A variety of problems discovered during development of the prototype estimator are analyzed. In Section 3 experiments for both the MasPar MP-1 and the iPSC/860 hypercube are presented. The paper concludes with a summary of important observations about this research and future work.

2 A Benchmark Performance Estimator

2.1 Benchmark Kernel Library

The benchmark kernels of the prototype implementation are stored in a benchmark kernel library. As different kernels may require different measurement, pattern matching and evaluation techniques, a classification is naturally imposed on them. This includes four different kernel classes:

1. primitive operations: basic operations (+, -, *, /), logical operations (<, >, ==, etc.), array access kernels (e.g. A(I+1)), etc.
2. primitive statements: DO loop header statements, subroutine and function calls, conditional and unconditional statements, assignment statements, GOTO statements, atomic communication statements (send and receive), etc. This kernel class also contains Fortran 90 array operations as used in MasPar Fortran ([15]) on the MP-1.
3. intrinsic functions: SIN, COS, MOD, LOG, etc. This kernel class also contains implicit reduction functions included in the Fortran77 language specification such as MIN, MAX, INDEX, etc. Other reduction functions are machine specific implementations ([15, 11]) such as DOT_PRODUCT, SHIFT, MAXLOC, COUNT, TRANSPOSE, and a variety of collective communication statements (e.g. broadcast).
4. code patterns: This kernel class includes standard code patterns amenable to recognition, such as elementary operations of linear algebra (matrix multiplication, matrix inversion, determinant computation, etc.) and commonly used stencils such as the Jacobi relaxation, LU decomposition, Gauss-Jordan and others.

Moreover, each kernel class is divided into machine specific and machine independent kernels. In the framework of this project extensive work and implementation for the first 3 kernel classes has been done. About 15 different kernels were collected across these three classes. The current implementation of the described performance tool handles only a few larger code patterns, mainly because of the difficulty of detecting them in a program. Nevertheless, there exist a variety of conceptual ideas ([12, 4]) to approach the problem of pattern matching for more complicated kernels. This will be addressed in future research.

It is frequently assumed that building and maintaining such a kernel library can be done with little time effort. On the one hand, about 1 1/2 man-years were required to develop and implement a different performance estimator at the University of Vienna, namely the P3T (see footnote 2) ([7, 5]). This estimator covers a wide class of parallel programs going way beyond the capabilities of the described benchmark performance estimator. On the other hand, building the benchmark
estimator and in particular the kernel library has been an ongoing effort for more than 2 man-years.

Footnote 2: The P3T is based on an analytical model, which computes a set of parallel program parameters that relate to the parallel program's performance.

A multitude of problems were encountered:

The underlying target architecture and compiler version force the designer of a kernel library to add many different variations of even primitive kernels. E.g. on the Intel i860 it is vital to analyze the number, data types and dimensionality (in the case of arrays) of procedure parameters. A difference in the number of array dimensions of actual and formal parameters may increase a procedure call runtime by up to 5 %.

Contrary to what is claimed in [2], a large portion of kernels, including primitive operations and statements, are portable neither across a variety of target architectures nor across different compiler releases for the same machine. The kernel set had to be modified even for different compiler versions of the iPSC/860 hypercube. E.g. in order to account for the significant performance difference between kernel accesses to main memory and to a register, it is a prerequisite to model the register allocation policy of the underlying target compiler. This effect is referred to as the local kernel measurement effect, because it is frequently local to individual kernels.

The kernels are measured individually on different architectures. However, they occur in the global domain of a parallel program and may strongly influence each other's performance. E.g. for the PSC Fortran Compiler Release 3 on the iPSC/860 hypercube, depending on the problem size (see footnote 3) of a program, a different number of assembly code statements is generated for array and scalar accesses by the target compiler. For small problem sizes only two, otherwise four assembly code statements are generated. Ignoring this fact would induce an estimation error of 4 to 5 % for the corresponding data access runtime. Furthermore, if a primitive operation is detected by the target compiler to be part of a common subexpression ([1]), then its runtime can be significantly reduced. Besides the problem of uncovering how the target compiler handles common subexpressions, it is necessary to add additional kernels which measure the effect of primitive
operations as part of common subexpressions or as individual kernels. This effect is called the global kernel measurement effect, because a kernel may influence the performance of another kernel.

Footnote 3: This refers to the size of allocated arrays.

In order to fine tune the kernel library it was therefore necessary to analyze the target architecture, the target compiler releases, and in particular the resulting assembly code of a parallel program. This strongly speaks against portable kernel libraries, not to mention the intense time effort for this task. Not surprisingly, only a small subset of the kernel library described in this paper could be used for both the iPSC/860 hypercube and the MasPar MP-1. Larger code patterns are more likely to allow modeling of both local and global kernel measurement effects. This, however, confronts the designer of a kernel library with the difficult task of pattern matching for more complicated and larger kernels (cf. [12, 4]).

2.2 Execute kernels on different target architectures

The kernel library is executed on every target architecture for which runtimes are to be estimated. Primitive operations and most primitive statements (except Fortran 90 array operations) are measured for different data types and for constant and variable operand values. Communication statements, Fortran 90 operations and intrinsic functions are designed for different data layout schemes and measured for varying numbers of processors and problem sizes. Similar to [2], the chi-square fit method [17] is used to fit the measured runtime information into piece-wise linear functions, modeling both fixed and variable stepsizes between these functions. Unfortunately, fixed stepsizes cannot be assumed. Even trivial kernels display non-constant stepsizes between piece-wise linear performance functions, which may even have different shapes. Undulations and runaway behavior of performance functions are other frequently observed anomalies.

Fig. 1 shows the benchmarking results of DOT_PRODUCT, which is an intrinsic function in MasPar Fortran implementing the dot product of vectors. First, this
function was benchmarked for two vectors as rows of an array M. The associated performance curve displays step-wise linear curves of different shapes. Sometimes a step is skipped (2 25). For 4 52 the runaway function behavior of a negative performance step was observed. Even the stepsizes vary. The steps between the piece-wise linear functions are due to the wrap-around memory hierarchy of the MasPar MP-1 system ([16]). If a data vector does not fit onto a specific memory hierarchy level, it is wrapped around to the next higher memory level. Data access costs suddenly increase each time a next higher memory hierarchy level has to be accessed. This is visualized by the steps in the performance function.

[Figure 1: Irregular performance behavior of benchmark kernels. Panels: DOT_PRODUCT(M(I,2:N-1),M(I,2:N-1)) and DOT_PRODUCT(V(1:N),V(1:N)); vertical axes in 10^-3 sec.]

Second, the same function is evaluated for a one-dimensional vector of size N. Using one-dimensional arrays instead of rows of a two-dimensional array cuts the runtime overhead drastically. This is due to a reduced memory address computation, which has to be done by the frontend processor of the MP-1. The overall performance behavior is described by a cascade function with discontinuities at multiples of 1024 for N. At these multiples all processors of the utilized MP-1 configuration (1024 processors) are employed in the computation of DOT_PRODUCT. In all other cases some processors have to be disabled for the associated computation cycles, while others are actually computing. This depends on the data layout scheme. It seems that disabling some of the processors is not free, which may explain this benchmark behavior.

The described performance estimator uses the arithmetic mean across all stepsizes to compensate for non-constant stepsizes of performance functions. In some cases the stepsize consistently increases for larger problem sizes (see Fig. 4). In that case a linear function is computed by chi-square fitting to model these non-constant stepsizes. The memory requirement to store every single piece-wise linear function together with the associated stepsize would be infeasible. In general it was observed that more advanced interpolation techniques are required to cover larger classes of parallel programs with the described performance function fitting method. This will be addressed in future research.

When organizing the benchmark kernels in several executables an awkward difficulty was discovered. As the kernel library contains about 15 different kernels, they are combined in several large executables instead of putting every single kernel in a distinct executable. On the Intel i860 it turned out that the
measured performance for most specific kernels in a large executable was very different from that measured in single executables. The reason is obvious: the CPU pipeline and cache behavior differs between the two measurement variants. Deviations of one order of magnitude were observed in the worst case and at least 1 to 2 % in the best case. The same problem is ubiquitous when estimating the performance of a real parallel program by a set of pre-measured kernels. Again, going beyond primitive kernels might help to alleviate this drawback.

2.3 Deriving performance estimates

Fig. 2 illustrates the structure and components of the described benchmark performance estimator. The parallel program to be evaluated by this tool is attributed with concrete values for program unknowns: loop iteration counts, branching probabilities, and statement execution counts. These program unknowns are referred to as sequential program parameters, as they relate to the control flow of an SPMD program, which is equal for all processors. The sequential program parameters are derived by a single preceding profile run of the input program as initiated by the Weight Finder ([6]), which is an advanced profiler for Fortran programs integrated in the VFCS.

1. The parallel program attributed with the sequential program parameters is parsed using its syntax tree and control flow graph representation under the VFCS. The VFCS ([3]) provides a variety of routines which allow convenient traversal of the syntax tree and control flow graph. For each syntax tree node a kernel pattern matching algorithm and subsequently a performance evaluation algorithm are initiated, as explained in the following.

2. Depending on the class of benchmark kernels to be matched, a different pattern matching strategy is applied. Primitive operations, primitive statements and intrinsic functions are simply detected by their syntax tree node representation. The underlying compilation system strongly supports this pattern matching task by normalizing expressions across the entire parallel program. Furthermore, an expression simplifier statically evaluates expressions containing symbolics and constants to reduce them to essentials. For entire code patterns (e.g. matrix multiplication) more advanced techniques are required, such as those mentioned in [12, 4]. The current implementation of the pattern matcher handles all kernels in the kernel library except code patterns.

3. Based on the pre-measured runtimes of the kernels in the kernel library and the sequential program parameters it is straightforward to obtain the estimated runtime for arbitrary program segments. At the lowest level the runtime for a specific primitive statement S (or the sum of all contained primitive operations in S, if S is a non-primitive statement) is multiplied by the corresponding statement execution count. This figure needs to be further weighted by the associated branching probability in the case of a conditional statement, which then yields the estimated runtime for S. In order to compute the estimated runtime for an arbitrary program segment, the estimated runtimes of all of its statements are summed up. The only problem arising with this approach is estimating the runtime of procedure (Fortran subroutine or function) calls. A major assumption is that the runtime of a procedure call is independent of the call site. As a consequence the runtime at a particular call site is the same as the runtime of the procedure over all call sites. This assumption is commonly made ([1, 9]) and avoids expensive, time and memory consuming simulation efforts to evaluate a more precise program behavior. The estimated runtime for a specific procedure call instantiation is therefore obtained by dividing the accumulated estimated runtime for the associated procedure by the sum of the statement execution counts across all call sites of this procedure. Multiplying this value by the statement execution count of a specific procedure call statement makes it possible to weight the importance of different call sites with respect to their runtime overhead.

4. Optionally the parallel program's internal representation (syntax tree) can be annotated with the estimated runtime values derived in the performance evaluation phase. This supports a clean interface to other parallelization and optimization phases under the VFCS, such as the selection of program transformations ([5, 3]) and automatic data distribution strategies ([4]), which require performance estimates.

5. The estimated runtime figures can be selectively visualized together with the associated program statements. This is fully implemented using an X11/Motif window under the VFCS ([5, 7]).

Note that only the absence of a parser for MasPar Fortran programs prevents the automatic performance prediction of MasPar Fortran programs. All other tool components can be equally applied to both SIMD and MIMD programs. Future research will extend the benchmark estimator in this direction.

[Figure 2: A performance estimator based on benchmarking. Input: parallel program + sequential program parameters. Tool phases: parsing, pattern matching, performance evaluation, annotated parallel program, performance visualization. Kernel library: primitive operations, primitive statements, intrinsic functions, code patterns.]
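The runtime accumulation described in Section 2.3 can be illustrated with a minimal sketch. It is not the VFCS implementation: the `Statement` record, the function names and the call-site labels are hypothetical and were chosen only for this illustration. The sketch assumes that pattern matching has already attached the pre-measured kernel runtimes to each statement and that the profile run has supplied execution counts and branching probabilities.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Statement:
    """One statement annotated with profile data (hypothetical structure)."""
    kernel_times: List[float]  # pre-measured runtimes (seconds) of the kernels matched in the statement
    exec_count: float          # statement execution count from the profile run
    branch_prob: float = 1.0   # branching probability; below 1.0 only for conditional statements


def statement_runtime(stmt: Statement) -> float:
    """Sum of the kernel runtimes, weighted by execution count and branching probability."""
    per_execution = sum(stmt.kernel_times)
    return per_execution * stmt.exec_count * stmt.branch_prob


def segment_runtime(statements: List[Statement]) -> float:
    """Estimated runtime of a program segment: the sum over its statements."""
    return sum(statement_runtime(s) for s in statements)


def call_site_runtime(procedure_runtime: float,
                      call_counts: Dict[str, float],
                      site: str) -> float:
    """Share of a procedure's accumulated runtime attributed to one call site.

    Assumes, as in the text, that the cost of a call does not depend on the
    call site: the accumulated runtime is divided by the total number of
    calls and multiplied by the execution count of the chosen call statement.
    """
    total_calls = sum(call_counts.values())
    if total_calls == 0:
        return 0.0
    return procedure_runtime / total_calls * call_counts[site]


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    body = [
        Statement([2.0e-6, 3.0e-6], exec_count=100),
        Statement([4.0e-6], exec_count=100, branch_prob=0.5),
    ]
    accumulated = segment_runtime(body)
    print(accumulated)
    print(call_site_runtime(accumulated, {"site_a": 3, "site_b": 1}, "site_a"))
```

Because every call is charged the same average cost, the call-site shares always add up to the accumulated procedure runtime; a procedure whose cost really depends on its arguments is not captured by this scheme.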

3 Experiments

This section discusses two experiments to validate the described benchmark estimator approach. The first experiment examines the Red-Black checkerboard algorithm ([17]), which is a pointwise relaxation method, on the iPSC/860 hypercube. The second experiment evaluates the JACOBI relaxation iterative method (see footnote 4) ([17]) on the MasPar MP-1.

The current benchmark estimator prototype handles only sequential programs for a single Intel i860 processor. Therefore, the first experiment relates to the automatic estimation of purely sequential kernels. Fig. 3 shows the measured and estimated runtimes of the Red-Black relaxation code. The benchmark estimator was used to automatically estimate the runtime of this code for varying problem sizes. It can be clearly seen that the estimation accuracy consistently deteriorates for increasing problem sizes. For a problem size of N = 124 the error rate is about 27 %. The main reason for this result is that the pre-measured runtimes of the rather small kernels in the kernel library do not model the global kernel measurement effects (cf. Section 2.1). In particular, the inaccurate modeling of the Intel i860 cache and CPU pipeline behavior induces this deviation. Using larger kernels may improve these results.

[Figure 3: Measured versus predicted runtimes for the Red-Black relaxation code. Curves: estimated, measured; vertical axis in sec.]

The second experiment was performed on the MasPar MP-1 with 1024 processors for the JACOBI relaxation iterative method. The performance of a parallel JACOBI program written in MasPar Fortran is measured and predicted for varying problem sizes. Based on the described estimation approach it is possible to predict the runtime of the JACOBI program within 1 % of the actual result.

Footnote 4: This method is used to approximate the solution of a partial differential equation discretized on a grid.

Table 1: Various kernels and program segments

nr  Kernel
1   F(2:N-1,2:N-1) = U(2:N-1,2:N-1)
2   F(2:N-1,2:N-1) = OMEGA * U(2:N-1,2:N-1)
3   F(2:N-1,2:N-1) = U(2:N-1,2:N-1) + U(1:N-2,2:N-1) + U(3:N,2:N-1) + U(2:N-1,1:N-2) + U(2:N-1,3:N)
4   F(2:N-1,2:N-1) = (1-OMEGA) * U(2:N-1,2:N-1) + OMEGA*0.25*(F(2:N-1,2:N-1) + U(1:N-2,2:N-1) + U(3:N,2:N-1) + U(2:N-1,3:N) + U(2:N-1,1:N-2))
5   JACOBI program

The first three entries in Table 1 illustrate library kernels as measured for the MasPar MP-1. The fourth entry displays the code pattern of the main JACOBI relaxation statement. The last entry represents the entire JACOBI program. Fig. 4 illustrates the measured versus predicted runtimes for each of the kernels in Table 1 in the same order, where Kernel-1 corresponds to Fig. 4a, ..., and Kernel-5 to Fig. 4e.

The measured runtime of Kernel-1, which is a Fortran 90 array assignment operation, is plotted as a quadratic function. This behavior is approximated by a step-wise linear function (see dashed function). Kernel-2 is very similar to Kernel-1, but includes a scalar multiplication. This additional operation implies a doubling of the runtime, because it is processed on the frontend processor of the MP-1. Kernel-3 represents a frequently found neighbor computation stencil. The corresponding cascade runtime function in Fig. 4c is due to the wrap-around memory hierarchy of the MasPar MP-1. By approximating this function with a step-wise linear function using the chi-square fitting technique, an estimation accuracy of more than 95 % is achieved. The non-constant stepsize between the step-wise linear functions is modeled by a linear function as outlined in Section 2.2. Kernel-4 is a combination of Kernel-1, 2 and 3. The resulting performance is therefore modeled as a linear combination of these sub-kernels plus two additional scalar operation kernels. The difference between actual and predicted runtime is surprisingly small (within 5 % in the worst case). In Fig. 4e the performance of the entire JACOBI program is visualized, showing a worst case deviation of less than 6 % for the largest data size measured. From this plot it can be clearly seen that the estimation accuracy worsens with increasing problem size. However, for this example the errors of the pre-measured kernel runtime functions compensate each other: Fig. 4c shows an under-estimation while all others over-estimate the actual result. There are other experiments where such good estimation results could not be achieved. This is just another sign of how difficult it is to fine tune the kernel library for a given architecture.

[Figure 4: Measured versus predicted runtimes for the JACOBI program. Panels a to e; curves: measured, estimated; vertical axes in 10^-3 sec.]

4 Conclusion

This paper analyzes the popular benchmark performance estimation approach by evaluating a prototype implementation. Based on experiments done on both the MasPar MP-1 and the Intel i860, the following observations were made:

portability: Contrary to popular belief, there is only little evidence that larger portions of the kernel library are portable across a variety of architectures. In many cases it was necessary to precisely investigate the target compiler's code restructuring policies, register allocation, common subexpression elimination and other strategies down to the assembly code level in order to tune the kernel library.

local and global kernel measurement effects: Two major problems were detected when building a benchmark kernel library. First, there are performance effects purely local to a specific kernel, such as the register allocation impact for complex processors. Second, pre-measured kernels do not sufficiently reflect their occurrence in a real-world program because of the different global cache and CPU pipeline behavior. This is a particularly serious disadvantage of the benchmark method if used for MIMD machines with complex processors.

pattern matching: In order to build a reasonably accurate benchmark performance estimator it is vital to go beyond primitive kernels. This requires larger kernels to be detected in a parallel program, which in turn raises the question of pattern matching for such kernels. This
appears to be an open research topic.

time effort: More than two man-years of work were required to create a kernel library covering a small set of application programs. In order to achieve reasonably accurate performance estimates it was necessary to study the assembly code of kernels and programs.

estimation accuracy: The estimation accuracy depends severely on the target architecture, the target compiler and the quality of the kernel library. This holds in particular for complex parallel systems with processors including CPU pipelines and caches, such as the Intel i860. However, it seems that SIMD machines with relatively simple processors are reasonably well suited for this approach. The experiments done on the MasPar MP-1 (cf. Section 3) demonstrate that.

Even though a good part of this paper reports on the disadvantages of performance estimation based on benchmarking, this method seems to be the best known way of deriving concrete and realistic estimated runtime information. In theory it greatly simplifies the task of modeling target compilers and architectures by simply measuring the kernels without worrying about details of the underlying system. In practice this method requires substantial research before being applicable to larger classes of real-world programs. The current prototype implementation will be extended by a pattern matcher for larger code patterns in future work. Furthermore, the kernel library will be extended and validated for additional architectures.

References

[1] A. Aho, R. Sethi, and J. Ullman. Compilers, Principles, Techniques and Tools. Series in Computer Science. Addison Wesley, 1988.
[2] V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer. A Static Performance Estimator to Guide Data Partitioning Decisions. In 3rd ACM Sigplan Symposium on Principles and Practice of Parallel Programming (PPoPP), Williamsburg, VA, April 1991.
[3] B. Chapman, S. Benkner, R. Blasko, P. Brezany, M. Egg, T. Fahringer, H. M. Gerndt, J. Hulman, B. Knaus, P. Kutschera, H. Moritsch, A. Schwald, V. Sipkova, and H. P. Zima. VIENNA FORTRAN Compilation System - Version 1 - User's Guide, January 1993.
[4] B. Chapman, T. Fahringer, and H. Zima. Automatic Support for Data Distribution. In Proc. of the Sixth Annual Workshop on Languages and Compilers for Parallel Computing, Portland, Oregon, Aug. 1993.
[5] T. Fahringer. Automatic Performance Prediction for Parallel Programs on Massively Parallel Computers. PhD thesis, University of Vienna, Department of Software Technology and Parallel Systems, October 1993.
[6] T. Fahringer. The Weight Finder, An Advanced Profiler for Fortran Programs. In Automatic Parallelization, New Approaches to Code Generation, Data Distribution, and Performance Prediction. Vieweg Advanced Studies in Computer Science, Verlag Vieweg, Wiesbaden, Germany, March 1993.
[7] T. Fahringer and H. Zima. A Static Parameter
based Performance Prediction Tool for Parallel Programs. Invited paper, in Proc. of the 7th ACM International Conference on Supercomputing 1993, Tokyo, Japan, July 1993.
[8] G. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu. Fortran D Language Specification. Technical Report TR90-141, Dept. of Computer Science, Rice University, December 1990.
[9] S. L. Graham, P. B. Kessler, and M. K. McKusick. gprof: A Call Graph Execution Profiler. In Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, pages 120-126, June 1982. SIGPLAN Notices, Vol. 17, No. 6.
[10] S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, and C. Tseng. An Overview of the Fortran D Programming System. In Proc. of the 4th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug. 1991.
[11] Intel Supercomputer Systems Division, Beaverton, OR. iPSC/860 Fortran Compiler User's Guide, March 1992.
[12] C. W. Keßler. Knowledge-Based Automatic Parallelization by Pattern Recognition. In Proceedings (Preprint), International Workshop on Automatic Parallelization 1993. Universität des Saarlandes, Saarbrücken, Germany, March 1993.
[13] D. Loveman. High Performance Fortran: Proposal, January 1992.
[14] B. MacDonald. Predicting the Execution Time of Sequential Scientific Codes. In Proceedings (Preprint), International Workshop on Automatic Parallelization 1993. Universität des Saarlandes, Saarbrücken, Germany, March 1993.
[15] MasPar Computer Corporation, Sunnyvale, CA. MasPar Fortran Reference Manual, July 1992. Software Version 2, Document Part Number 933-, Revision A5.
[16] MasPar Computer Corporation, Sunnyvale, CA. MasPar System Overview, July 1992. Document Part Number 933-1, Revision A5.
[17] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1988.
[18] V. Sarkar. Partitioning and Scheduling Parallel Programs for Multiprocessors. The MIT Press, Cambridge, Massachusetts, 1989.
[19] H. Zima, P. Brezany, B. Chapman, P. Mehrotra, and A. Schwald. Vienna Fortran - a language specification. Technical report, ICASE, Hampton, VA, 1992. ICASE Internal Report 21.


More information

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating

Capturing Large Intra-class Variations of Biometric Data by Template Co-updating Capturing Large Intra-lass Variations of Biometri Data by Template Co-updating Ajita Rattani University of Cagliari Piazza d'armi, Cagliari, Italy ajita.rattani@diee.unia.it Gian Lua Marialis University

More information

splitting tehniques that partition live ranges have been proposed to solve both the spilling problem[5][8] and the assignment problem[8][9]. The parti

splitting tehniques that partition live ranges have been proposed to solve both the spilling problem[5][8] and the assignment problem[8][9]. The parti Load/Store Range Analysis for Global Register Alloation Priyadarshan Kolte and Mary Jean Harrold Department of Computer Siene Clemson University Abstrat Live range splitting tehniques divide the live ranges

More information

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2

On - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2 On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,

More information

Extracting Partition Statistics from Semistructured Data

Extracting Partition Statistics from Semistructured Data Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk

More information

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System

Algorithms, Mechanisms and Procedures for the Computer-aided Project Generation System Algorithms, Mehanisms and Proedures for the Computer-aided Projet Generation System Anton O. Butko 1*, Aleksandr P. Briukhovetskii 2, Dmitry E. Grigoriev 2# and Konstantin S. Kalashnikov 3 1 Department

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

Cluster-Based Cumulative Ensembles

Cluster-Based Cumulative Ensembles Cluster-Based Cumulative Ensembles Hanan G. Ayad and Mohamed S. Kamel Pattern Analysis and Mahine Intelligene Lab, Eletrial and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1,

More information

Space- and Time-Efficient BDD Construction via Working Set Control

Space- and Time-Efficient BDD Construction via Working Set Control Spae- and Time-Effiient BDD Constrution via Working Set Control Bwolen Yang Yirng-An Chen Randal E. Bryant David R. O Hallaron Computer Siene Department Carnegie Mellon University Pittsburgh, PA 15213.

More information

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines

The Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer

More information

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification

Performance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification erformane Improvement of TC on Wireless Cellular Networks by Adaptive Combined with Expliit Loss tifiation Masahiro Miyoshi, Masashi Sugano, Masayuki Murata Department of Infomatis and Mathematial Siene,

More information

13.1 Numerical Evaluation of Integrals Over One Dimension

13.1 Numerical Evaluation of Integrals Over One Dimension 13.1 Numerial Evaluation of Integrals Over One Dimension A. Purpose This olletion of subprograms estimates the value of the integral b a f(x) dx where the integrand f(x) and the limits a and b are supplied

More information

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT?

3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? 3-D IMAGE MODELS AND COMPRESSION - SYNTHETIC HYBRID OR NATURAL FIT? Bernd Girod, Peter Eisert, Marus Magnor, Ekehard Steinbah, Thomas Wiegand Te {girod eommuniations Laboratory, University of Erlangen-Nuremberg

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup

Parallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY Dileep P, Bhondarkor Texas Instruments Inorporated Dallas, Texas ABSTRACT Charge oupled devies (CCD's) hove been mentioned as potential fast auxiliary

More information

CleanUp: Improving Quadrilateral Finite Element Meshes

CleanUp: Improving Quadrilateral Finite Element Meshes CleanUp: Improving Quadrilateral Finite Element Meshes Paul Kinney MD-10 ECC P.O. Box 203 Ford Motor Company Dearborn, MI. 8121 (313) 28-1228 pkinney@ford.om Abstrat: Unless an all quadrilateral (quad)

More information

Writing Libraries in MPI*

Writing Libraries in MPI* Writing Libraries in MPI* Anthony Skjellumt Nathan E. Doss Purushotham V. Bangaloret Computer Siene Departmentt & NSF Engineering Researh Center for Computational Field Simulation Mississippi State University

More information

A {k, n}-secret Sharing Scheme for Color Images

A {k, n}-secret Sharing Scheme for Color Images A {k, n}-seret Sharing Sheme for Color Images Rastislav Luka, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos The Edward S. Rogers Sr. Dept. of Eletrial and Computer Engineering, University

More information

The recursive decoupling method for solving tridiagonal linear systems

The recursive decoupling method for solving tridiagonal linear systems Loughborough University Institutional Repository The reursive deoupling method for solving tridiagonal linear systems This item was submitted to Loughborough University's Institutional Repository by the/an

More information

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1.

Abstract. Key Words: Image Filters, Fuzzy Filters, Order Statistics Filters, Rank Ordered Mean Filters, Channel Noise. 1. Fuzzy Weighted Rank Ordered Mean (FWROM) Filters for Mixed Noise Suppression from Images S. Meher, G. Panda, B. Majhi 3, M.R. Meher 4,,4 Department of Eletronis and I.E., National Institute of Tehnology,

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

the data. Structured Principal Component Analysis (SPCA)

the data. Structured Principal Component Analysis (SPCA) Strutured Prinipal Component Analysis Kristin M. Branson and Sameer Agarwal Department of Computer Siene and Engineering University of California, San Diego La Jolla, CA 9193-114 Abstrat Many tasks involving

More information

Parametric Abstract Domains for Shape Analysis

Parametric Abstract Domains for Shape Analysis Parametri Abstrat Domains for Shape Analysis Xavier RIVAL (INRIA & Éole Normale Supérieure) Joint work with Bor-Yuh Evan CHANG (University of Maryland U University of Colorado) and George NECULA (University

More information

Fuzzy Meta Node Fuzzy Metagraph and its Cluster Analysis

Fuzzy Meta Node Fuzzy Metagraph and its Cluster Analysis Journal of Computer Siene 4 (): 9-97, 008 ISSN 549-3636 008 Siene Publiations Fuzzy Meta Node Fuzzy Metagraph and its Cluster Analysis Deepti Gaur, Aditya Shastri and Ranjit Biswas Department of Computer

More information

Make your process world

Make your process world Automation platforms Modion Quantum Safety System Make your proess world a safer plae You are faing omplex hallenges... Safety is at the heart of your proess In order to maintain and inrease your ompetitiveness,

More information

8 Instruction Selection

8 Instruction Selection 8 Instrution Seletion The IR ode instrutions were designed to do exatly one operation: load/store, add, subtrat, jump, et. The mahine instrutions of a real CPU often perform several of these primitive

More information

Chapter 2: Introduction to Maple V

Chapter 2: Introduction to Maple V Chapter 2: Introdution to Maple V 2-1 Working with Maple Worksheets Try It! (p. 15) Start a Maple session with an empty worksheet. The name of the worksheet should be Untitled (1). Use one of the standard

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application

Performance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah

More information

Institute for Computer Applications in Science and Engineering NASA Langley Research Center Hampton, VIrginia

Institute for Computer Applications in Science and Engineering NASA Langley Research Center Hampton, VIrginia https://ntrs.nasa.gov/searh.jsp?r=19910002904 2017-10-28T01:31:13+00:00Z NA5A- u- 1

More information

arxiv: v1 [cs.db] 13 Sep 2017

arxiv: v1 [cs.db] 13 Sep 2017 An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution

More information

Automated System for the Study of Environmental Loads Applied to Production Risers Dustin M. Brandt 1, Celso K. Morooka 2, Ivan R.

Automated System for the Study of Environmental Loads Applied to Production Risers Dustin M. Brandt 1, Celso K. Morooka 2, Ivan R. EngOpt 2008 - International Conferene on Engineering Optimization Rio de Janeiro, Brazil, 01-05 June 2008. Automated System for the Study of Environmental Loads Applied to Prodution Risers Dustin M. Brandt

More information

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT

DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT DETECTION METHOD FOR NETWORK PENETRATING BEHAVIOR BASED ON COMMUNICATION FINGERPRINT 1 ZHANGGUO TANG, 2 HUANZHOU LI, 3 MINGQUAN ZHONG, 4 JIAN ZHANG 1 Institute of Computer Network and Communiation Tehnology,

More information

Test Case Generation from UML State Machines

Test Case Generation from UML State Machines Test Case Generation from UML State Mahines Dirk Seifert To ite this version: Dirk Seifert. Test Case Generation from UML State Mahines. [Researh Report] 2008. HAL Id: inria-00268864

More information

Real-Time Control for a Turbojet Engine

Real-Time Control for a Turbojet Engine A Multiproessor mplementation of Real-Time Control for a Turbojet Engine Phillip L. Shaffer ABSTRACT: A real-time ontrol program for a turbojet engine has been implemented on a four-proessor omputer, ahieving

More information

Multi-Channel Wireless Networks: Capacity and Protocols

Multi-Channel Wireless Networks: Capacity and Protocols Multi-Channel Wireless Networks: Capaity and Protools Tehnial Report April 2005 Pradeep Kyasanur Dept. of Computer Siene, and Coordinated Siene Laboratory, University of Illinois at Urbana-Champaign Email:

More information

Verifying Interaction Protocol Compliance of Service Orchestrations

Verifying Interaction Protocol Compliance of Service Orchestrations Verifying Interation Protool Compliane of Servie Orhestrations Andreas Shroeder and Philip Mayer Ludwig-Maximilians-Universität Münhen, Germany {shroeda, mayer}@pst.ifi.lmu.de Abstrat. An important aspet

More information

RAC 2 E: Novel Rendezvous Protocol for Asynchronous Cognitive Radios in Cooperative Environments

RAC 2 E: Novel Rendezvous Protocol for Asynchronous Cognitive Radios in Cooperative Environments 21st Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communiations 1 RAC 2 E: Novel Rendezvous Protool for Asynhronous Cognitive Radios in Cooperative Environments Valentina Pavlovska,

More information

Trajectory Tracking Control for A Wheeled Mobile Robot Using Fuzzy Logic Controller

Trajectory Tracking Control for A Wheeled Mobile Robot Using Fuzzy Logic Controller Trajetory Traking Control for A Wheeled Mobile Robot Using Fuzzy Logi Controller K N FARESS 1 M T EL HAGRY 1 A A EL KOSY 2 1 Eletronis researh institute, Cairo, Egypt 2 Faulty of Engineering, Cairo University,

More information

Boosted Random Forest

Boosted Random Forest Boosted Random Forest Yohei Mishina, Masamitsu suhiya and Hironobu Fujiyoshi Department of Computer Siene, Chubu University, 1200 Matsumoto-ho, Kasugai, Aihi, Japan {mishi, mtdoll}@vision.s.hubu.a.jp,

More information

Evolutionary Feature Synthesis for Image Databases

Evolutionary Feature Synthesis for Image Databases Evolutionary Feature Synthesis for Image Databases Anlei Dong, Bir Bhanu, Yingqiang Lin Center for Researh in Intelligent Systems University of California, Riverside, California 92521, USA {adong, bhanu,

More information

特集 Road Border Recognition Using FIR Images and LIDAR Signal Processing

特集 Road Border Recognition Using FIR Images and LIDAR Signal Processing デンソーテクニカルレビュー Vol. 15 2010 特集 Road Border Reognition Using FIR Images and LIDAR Signal Proessing 高木聖和 バーゼル ファルディ Kiyokazu TAKAGI Basel Fardi ヘンドリック ヴァイゲル Hendrik Weigel ゲルド ヴァニーリック Gerd Wanielik This paper

More information

Accommodations of QoS DiffServ Over IP and MPLS Networks

Accommodations of QoS DiffServ Over IP and MPLS Networks Aommodations of QoS DiffServ Over IP and MPLS Networks Abdullah AlWehaibi, Anjali Agarwal, Mihael Kadoh and Ahmed ElHakeem Department of Eletrial and Computer Department de Genie Eletrique Engineering

More information

HEXA: Compact Data Structures for Faster Packet Processing

HEXA: Compact Data Structures for Faster Packet Processing Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

References. December 1992, pp. 71 { 81. pp.457{467. Magazine, June for very large high throughput database systems,"

References. December 1992, pp. 71 { 81. pp.457{467. Magazine, June for very large high throughput database systems, the overall working time for other appliations. In ase, data ltering was the only appliation being run, then using distributed indexing, we an serve 00 times as many requests. 6 Conlusion We have explored

More information

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints Smooth Trajetory Planning Along Bezier Curve for Mobile Robots with Veloity Constraints Gil Jin Yang and Byoung Wook Choi Department of Eletrial and Information Engineering Seoul National University of

More information

Face and Facial Feature Tracking for Natural Human-Computer Interface

Face and Facial Feature Tracking for Natural Human-Computer Interface Fae and Faial Feature Traking for Natural Human-Computer Interfae Vladimir Vezhnevets Graphis & Media Laboratory, Dept. of Applied Mathematis and Computer Siene of Mosow State University Mosow, Russia

More information

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks Query Evaluation Overview Query Optimization: Chap. 15 CS634 Leture 12 SQL query first translated to relational algebra (RA) Atually, some additional operators needed for SQL Tree of RA operators, with

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

Detection and Recognition of Non-Occluded Objects using Signature Map

Detection and Recognition of Non-Occluded Objects using Signature Map 6th WSEAS International Conferene on CIRCUITS, SYSTEMS, ELECTRONICS,CONTROL & SIGNAL PROCESSING, Cairo, Egypt, De 9-31, 007 65 Detetion and Reognition of Non-Oluded Objets using Signature Map Sangbum Park,

More information

Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System

Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System Arhiteture and Performane of the Hitahi SR221 Massively Parallel Proessor System Hiroaki Fujii, Yoshiko Yasuda, Hideya Akashi, Yasuhiro Inagami, Makoto Koga*, Osamu Ishihara*, Masamori Kashiyama*, Hideo

More information

35 th Design Automation Conference Copyright 1998 ACM

35 th Design Automation Conference Copyright 1998 ACM Using Reongurable Computing Tehniques to Aelerate Problems in the CAD Domain: A Case Study with Boolean Satisability Peixin Zhong, Pranav Ashar, Sharad Malik and Margaret Martonosi Prineton University

More information

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks

Unsupervised Stereoscopic Video Object Segmentation Based on Active Contours and Retrainable Neural Networks Unsupervised Stereosopi Video Objet Segmentation Based on Ative Contours and Retrainable Neural Networks KLIMIS NTALIANIS, ANASTASIOS DOULAMIS, and NIKOLAOS DOULAMIS National Tehnial University of Athens

More information

Partial Character Decoding for Improved Regular Expression Matching in FPGAs

Partial Character Decoding for Improved Regular Expression Matching in FPGAs Partial Charater Deoding for Improved Regular Expression Mathing in FPGAs Peter Sutton Shool of Information Tehnology and Eletrial Engineering The University of Queensland Brisbane, Queensland, 4072, Australia

More information

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes Deteting Outliers in High-Dimensional Datasets with Mixed Attributes A. Koufakou, M. Georgiopoulos, and G.C. Anagnostopoulos 2 Shool of EECS, University of Central Florida, Orlando, FL, USA 2 Dept. of

More information

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems

Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Methods for Multi-Dimensional Robustness Optimization in Complex Embedded Systems Arne Hamann, Razvan Rau, Rolf Ernst Institute of Computer and Communiation Network Engineering Tehnial University of Braunshweig,

More information

Tackling IPv6 Address Scalability from the Root

Tackling IPv6 Address Scalability from the Root Takling IPv6 Address Salability from the Root Mei Wang Ashish Goel Balaji Prabhakar Stanford University {wmei, ashishg, balaji}@stanford.edu ABSTRACT Internet address alloation shemes have a huge impat

More information

DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Euncheol Kim, Gwan Choi, Mark Yeary *

DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Euncheol Kim, Gwan Choi, Mark Yeary * DECODING OF ARRAY LDPC CODES USING ON-THE FLY COMPUTATION Kiran Gunnam, Weihuang Wang, Eunheol Kim, Gwan Choi, Mark Yeary * Dept. of Eletrial Engineering, Texas A&M University, College Station, TX-77840

More information

Computing Pool: a Simplified and Practical Computational Grid Model

Computing Pool: a Simplified and Practical Computational Grid Model Computing Pool: a Simplified and Pratial Computational Grid Model Peng Liu, Yao Shi, San-li Li Institute of High Performane Computing, Department of Computer Siene and Tehnology, Tsinghua University, Beijing,

More information

Automatic Generation of Transaction-Level Models for Rapid Design Space Exploration

Automatic Generation of Transaction-Level Models for Rapid Design Space Exploration Automati Generation of Transation-Level Models for Rapid Design Spae Exploration Dongwan Shin, Andreas Gerstlauer, Junyu Peng, Rainer Dömer and Daniel D. Gajski Center for Embedded Computer Systems University

More information

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results

An Alternative Approach to the Fuzzifier in Fuzzy Clustering to Obtain Better Clustering Results An Alternative Approah to the Fuzziier in Fuzzy Clustering to Obtain Better Clustering Results Frank Klawonn Department o Computer Siene University o Applied Sienes BS/WF Salzdahlumer Str. 46/48 D-38302

More information

Gray Codes for Reflectable Languages

Gray Codes for Reflectable Languages Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT FP7-ICT-2007-1 Contrat no.: 215040 www.ative-projet.eu PROJECT PERIODIC REPORT Publishable Summary Grant Agreement number: ICT-215040 Projet aronym: Projet title: Enabling the Knowledge Powered Enterprise

More information

Performance Benchmarks for an Interactive Video-on-Demand System

Performance Benchmarks for an Interactive Video-on-Demand System Performane Benhmarks for an Interative Video-on-Demand System. Guo,P.G.Taylor,E.W.M.Wong,S.Chan,M.Zukerman andk.s.tang ARC Speial Researh Centre for Ultra-Broadband Information Networks (CUBIN) Department

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

We don t need no generation - a practical approach to sliding window RLNC

We don t need no generation - a practical approach to sliding window RLNC We don t need no generation - a pratial approah to sliding window RLNC Simon Wunderlih, Frank Gabriel, Sreekrishna Pandi, Frank H.P. Fitzek Deutshe Telekom Chair of Communiation Networks, TU Dresden, Dresden,

More information

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition

A Coarse-to-Fine Classification Scheme for Facial Expression Recognition A Coarse-to-Fine Classifiation Sheme for Faial Expression Reognition Xiaoyi Feng 1,, Abdenour Hadid 1 and Matti Pietikäinen 1 1 Mahine Vision Group Infoteh Oulu and Dept. of Eletrial and Information Engineering

More information

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking

Algorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking Algorithms for External Memory Leture 6 Graph Algorithms - Weighted List Ranking Leturer: Nodari Sithinava Sribe: Andi Hellmund, Simon Ohsenreither 1 Introdution & Motivation After talking about I/O-effiient

More information

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer Communiations and Networ, 2013, 5, 69-73 http://dx.doi.org/10.4236/n.2013.53b2014 Published Online September 2013 (http://www.sirp.org/journal/n) Cross-layer Resoure Alloation on Broadband Power Line Based

More information

1. Introduction. 2. The Probable Stope Algorithm

1. Introduction. 2. The Probable Stope Algorithm 1. Introdution Optimization in underground mine design has reeived less attention than that in open pit mines. This is mostly due to the diversity o underground mining methods and omplexity o underground

More information

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem Calulation of typial running time of a branh-and-bound algorithm for the vertex-over problem Joni Pajarinen, Joni.Pajarinen@iki.fi Otober 21, 2007 1 Introdution The vertex-over problem is one of a olletion

More information

Gradient based progressive probabilistic Hough transform

Gradient based progressive probabilistic Hough transform Gradient based progressive probabilisti Hough transform C.Galambos, J.Kittler and J.Matas Abstrat: The authors look at the benefits of exploiting gradient information to enhane the progressive probabilisti

More information

Bayesian Belief Networks for Data Mining. Harald Steck and Volker Tresp. Siemens AG, Corporate Technology. Information and Communications

Bayesian Belief Networks for Data Mining. Harald Steck and Volker Tresp. Siemens AG, Corporate Technology. Information and Communications Bayesian Belief Networks for Data Mining Harald Stek and Volker Tresp Siemens AG, Corporate Tehnology Information and Communiations 81730 Munih, Germany fharald.stek, Volker.Trespg@mhp.siemens.de Abstrat

More information

Improved flooding of broadcast messages using extended multipoint relaying

Improved flooding of broadcast messages using extended multipoint relaying Improved flooding of broadast messages using extended multipoint relaying Pere Montolio Aranda a, Joaquin Garia-Alfaro a,b, David Megías a a Universitat Oberta de Catalunya, Estudis d Informàtia, Mulimèdia

More information

Multi-hop Fast Conflict Resolution Algorithm for Ad Hoc Networks

Multi-hop Fast Conflict Resolution Algorithm for Ad Hoc Networks Multi-hop Fast Conflit Resolution Algorithm for Ad Ho Networks Shengwei Wang 1, Jun Liu 2,*, Wei Cai 2, Minghao Yin 2, Lingyun Zhou 2, and Hui Hao 3 1 Power Emergeny Center, Sihuan Eletri Power Corporation,

More information

We P9 16 Eigenray Tracing in 3D Heterogeneous Media

We P9 16 Eigenray Tracing in 3D Heterogeneous Media We P9 Eigenray Traing in 3D Heterogeneous Media Z. Koren* (Emerson), I. Ravve (Emerson) Summary Conventional two-point ray traing in a general 3D heterogeneous medium is normally performed by a shooting

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering

A Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering. Chotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi. Dept. of Computer Science & Engineering,

Deep Rule-Based Classifier with Human-level Performance and Characteristics

Deep Rule-Based Classifier with Human-level Performance and Characteristics. Plamen P. Angelov 1,2 and Xiaowei Gu 1*. 1 School of Computing and Communications, Lancaster University, Lancaster, LA1 4WA, UK. 2 Technical

Reducing Runtime Complexity of Long-Running Application Services via Dynamic Profiling and Dynamic Bytecode Adaptation for Improved Quality of Service

Reducing Runtime Complexity of Long-Running Application Services via Dynamic Profiling and Dynamic Bytecode Adaptation for Improved Quality of Service. ABSTRACT. John Bergin, Performance Engineering Laboratory, University

Facility Location: Distributed Approximation

Facility Location: Distributed Approximation. Thomas Moscibroda, Roger Wattenhofer. Distributed Computing Group. PODC 2005. Where to place caches in the Internet? A distributed application that has to dynamically place

New Fuzzy Object Segmentation Algorithm for Video Sequences *

New Fuzzy Object Segmentation Algorithm for Video Sequences *. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 521-537 (2008). KUO-LIANG CHUNG, SHIH-WEI YU, HSUEH-JU YEH, YONG-HUAI HUANG AND TA-JEN YAO. Department

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization

Zippy - A coarse-grained reconfigurable array with support for hardware virtualization. Christian Plessl, Computer Engineering and Networks Lab, ETH Zürich, Switzerland, plessl@tik.ee.ethz.ch. Marco Platzner, Department

Implementing Load-Balanced Switches With Fat-Tree Networks

Implementing Load-Balanced Switches With Fat-Tree Networks. Hung-Shih Chueh, Ching-Min Lien, Cheng-Shang Chang, Jay Cheng, and Duan-Shin Lee. Department of Electrical Engineering & Institute of Communications

COMP 181. Prelude. Intermediate representations. Today. Types of IRs. High-level IR. Intermediate representations and code generation

Prelude. COMP 181: Intermediate representations and code generation. November, 009. What is this device? Large Hadron Collider. What is a hadron? Subatomic particle made up of quarks bound by the strong force. What

TOWARD HYBRID VARIANT/GENERATIVE PROCESS PLANNING

TOWARD HYBRID VARIANT/GENERATIVE PROCESS PLANNING. Proceedings of DETC 97: 1997 ASME Design Engineering Technical Conferences, September 14-17, 1997, Sacramento, California. DETC97/DFM-4333. Alexei Elinson, Dept.

- 1 - S 21. Directory-based Administration of Virtual Private Networks: Policy & Configuration. Charles A Kunzinger.

S 21. Directory-based Administration of Virtual Private Networks: Policy & Configuration. Charles A Kunzinger, kunzinge@us.ibm.com. Agenda: What is a VPN? What is VPN Policy?
