Modeling and Predicting the Efficiency of Application Execution in Distributed Environments

Size: px
Start display at page:

Download "Modeling and Predicting the Efficiency of Application Execution in Distributed Environments"

Transcription

1 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) Modelig ad Predictig the Efficiecy of Applicatio Executio i Distributed Eviromets AHMED M. AL KINDI a,*, KHAMIS N. AL GHARBI b, GRAHAM NUDD c a Departmet of Computer Sciece, College of Sciece b Iformatio Systems Departmet, College of Commerce ad Ecoomics Sulta Qaboos Uiversity, P.O. Box 36, PC 123, Oma c High Performace Systems Laboratory, Departmet of Computer Sciece, Uiversity of Warwick, Covetry CV4 7AL, UK Abstract: Oe of the mai objectives i the study of performace predictio is to give researchers the optio to select from a variety of distributed systems/resources based o what would optimise the performace of the applicatio. This paper itroduces a optimizatio scheme for modellig ad predictig the efficiecy of a applicatio executio whe ru i a distributed eviromet. The results obtaied from this research show that the suggested modellig ad predictio techique ca provide the same quality of iformatio as a measuremet process for applicatio optimizatio, but i a fractio of the time. Keywords: Performace Optimizatio, Dyamic Performace Predictio/Modellig, DFT, FFT, FFTW, PACE. 1 Itroductio The promise of Iformatio Power Grid (IPG) is based o the cocept of selectig distributed resources to perform a sigle or multiple computatioal tasks with performace optimisatio i mid [1]. It relies o the availability of systems ad the utilisatio of performace iformatio to guide applicatio executio (i.e. commuicatio ad sychroizatio overhead of a applicatio executio whe ru i a distributed eviromet [2]). Istead of writig optimized codes by had, there have bee several examples of automated code geerators geared towards obtaiig the maximum performace o differet platforms. Oe such library is the Portable High Performace ANSI-C (PHiPAC), developed at Berkeley ad iteded for producig high performace matrix-vector libraries i the Basic Liear Algebraic Subprograms (BLAS) suite [3], [4], ad [5]. I a similar way, a adaptive applicatio called The Fastest Furrier Trasform i the West (FFTW) [6] uses automatically geerated blocks of highly optimized C codes called codelets, to calculate a FFT i a miimum amout of time o ay system [7]. This paper utilizes ad models this adaptive applicatio ad shows that efficiecy of applicatio executio i distributed eviromets ca be predicted with reasoable quality of iformatio. This research uses a performace evaluatio tool developed by the Uiversity of Warwick 1 called Performace Aalysis ad Characterizatio Eviromet (PACE), which was developed as a modellig toolset for high performace ad distributed applicatios. The work is a extesio of the sequetial performace evaluatio techiques preseted i [8] ad [9] which led to further ehacemet of the FFTW. The experimets ad evaluatios were coducted usig machies with equal CPU ad memory sizes (Itel Petium 2.4 GHz ad 512 MB RAM ruig uder the Red Hat LINUX operatig system). The ext sectio itroduces PACE while Sectio 3 discuses the FFTW package. These overviews are deliberately brief due 1 This work was coducted at the Uiversity of Warwick (UK) ad fuded i part by the Sulta Qaboos Uiversity, Oma

2 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) i part to space costrait. Iterested researchers are advised to seek further iformatio i literatures refereced at the ed of this paper. Sectios 4 through 7 preset the implemetatio ad results. 2 Performace Aalysis ad Characterizatio Eviromet (PACE) PACE was developed as a modelig toolset for high performace ad distributed applicatios. It icludes tools for the defiitio ad creatio of the applicatio model ad applicatio performace aalysis [9]. It uses associative objects orgaized i a layered framework as a basis for represetig each of a system s compoets [1]. PACE supports ay applicatio that depeds o message passig usig MPI. I fact, ay hardware or platform that uses these programmig tools ca be aalyzed withi PACE [11]. A key feature of the object orgaizatio is the idepedet represetatio of computatio, parallelizatio, ad hardware i a layered framework. The fuctios of the layers are: Applicatio Layer: describes a applicatio i terms of a sequece of subtasks. It acts as the etry poit to the performace study, ad icludes a iterface that ca be used to modify parameters of a performace sceario. Applicatio Subtask Layer: describes the sequetial part of every subtask withi a applicatio that ca be executed i parallel. A applicatio might cosist of a umber of differet subtasks, each represetig the applicatio s parallel kerel. A combiatio of these subtasks by the applicatio object forms a fuctioality model of the whole applicatio. Parallel Template Layer: describes the parallel characteristics of subtasks i terms of expected computatio/commuicatio iteractios betwee processors. This is doe through the use of models i the hardware layer. Hardware Layer: collects system specificatio parameters, micro-bechmark results (e.g. atomic laguage operatios), statistical models (e.g. regressio commuicatio models), aalytical models (e.g. cache cotetio), ad heuristics that characterize the commuicatio ad computatio abilities of a particular system. Whe applicatio source code is available, PACE performs a static aalysis of the code to produce the cotrol flow of the applicatio, operatio couts, ad the commuicatio structure. A compiler traslates C code to scripts that are the liked with a evaluatio egie ad the hardware models. The fial output is a biary file that ca be executed rapidly. The user determies the system/applicatio cofiguratio ad the type of output that is required as commad lie argumets. The model biary performs all the ecessary model evaluatios ad produces the requested results. A MPI versio of PACE is also available where it has the ability to measure ad evaluate MPI-Sed, MPI-Receive, ad other MPI commads. MPI-PACE takes ito accout additioal overhead that is icurred whe sedig ad receivig the data from ad to the processors. This commuicatio/sychroizatio overhead is also modeled ad predicted ad the added to the overall computatioal cost of the evaluated applicatio. 3 The Fastest Fourier Trasform i the West (FFTW) The FFTW is based o the divide-ad-coquer approach of the recursive FFT. To illustrate how a DFT is computed usig the recursive FFT, let us assume that the followig polyomial: 1 = j A ( x) a j x (1) j= is to be evaluated at the complex roots of uity of degree : ω, ω, ω,..., ω ( is assured to be a power of 2). Polyomial A ca be used to represet ay complex valued vector a where: a = (a,a 1,,a -1 ) (2) The the results ca be defied as follows: y k 1 = A( ω k ) kj = aiω (3) j= where k =,1,2,., -1. The DFT of vector a is the vector: y = (y, y 1,, y -1 ) (4) To compute the DFT of the vector a, the array is divided ito a eve-idex sub-array, ad a oddidex sub-array. The two polyomials represetig the divided arrays are hece as follows: A [] (x) = a + a 2 x + a 4 x a -2 x /2-1 (5)

3 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) A [1] (x) = a 1 + a 3 x + a 5 x a -1 x /2-1 (6) This divide-ad-coquer method is based o the followig formula: A(x) = A [] (x 2 ) + A [1] (x 2 ) * x (7) Therefore, the computatio of the DFT of a is ow reduced to two steps: Evaluatig the polyomials: A [] (x) ad A [1] (x) of degree /2 at the /2 complex roots of uity of degree /2: ω, ω, ω,..., ω 2 4 2, ad the applyig formula (7). Sice each root occurs twice, the equatios i (5 ad 6) may ow be recursively computed at the /2 complex roots of uity of degree /2. Therefore, a elemet DFT computatio is successfully divided ito two /2 elemet DFT computatios. This forms the basis of the portable C package called the FFTW which computes oe or multi-dimesioal complex DFT. It is claimed that due to its self-cofigurig ad dyamic ature, the FFTW ca compute the trasform faster tha ay other software. It requires a sequece of measuremets to be made for its software compoets (codelets) o the target system, ad the uses a combiatio of these codelets to determie how best to compute the trasform of ay give size. There are three essetial compoets that make up the FFTW: Codelets: These are a collectio of highly optimized blocks of C code used to compute the trasform. The codelets are geerated automatically. There are two types of codelets: No-Twiddle: used to compute small-sized trasforms (N <= 64), ad Twiddle: used to compute larger trasforms by the combiatio of smaller oes. Plaer: The plaer determies the most efficiet way i which the codelets ca be combied together to calculate the required trasform. Each idividual combiatio of codelets is called a pla. Typically dozes of plas are available for a specific FFT size. Executor: The executio of each pla is performed ad measured by the executor. This results i the ru-time cost of each pla o a specific system. From this, the best pla ca be determied i.e. the pla with the miimum executio time. The iitial stage of the FFTW operatio ivolves the plaer, which fids out the fastest way to compute the DFT of a give complex array. To illustrate the operatio of the plaer, a example of performig a 16-poit FFT is cosidered. The plaer uses the divide-ad-coquer approach to partitio the data set ad to costruct all possible combiatios of codelets. The resultig istructios cotaied withi the pla may look as follows: FFT (16) = Divide & Coquer (16,8) >> 8. FFT (2) or Divide & Coquer (16,4) >> 4. FFT (4) or Divide & Coquer (16,2) >> 2. FFT (8) or Solve (16) I this example, the plaer has selected four types of istructios or sequeces to sed to the executor. Oe is to solve usig a o-twiddle codelet 16, ad the rest are to solve usig a combiatio of twiddle ad o-twiddle codelets (e.g. twiddle codelet 8 ad a o-twiddle codelet 2 for the first istructio). To beefit from the recursive ature of the algorithm, solutios to idividual istructios could also be stored i tables durig this real-time computatio for reusability. The differet alteratives that are available for the calculatio usig this approach i the FFTW result i differet performace due, i part, to the use of the memory hierarchy icludig caches. For istace, the data required for computig a FFT of a small size may fit ito the system s primary cache, resultig i a better processig time, whereas the FFT of a larger size may ot. Thus combiatios of smaller FFTs may out-perform larger FFTs. By iitially measurig all these combiatios, a best pla ca be determied, which will be differet o differet systems. The FFTW provides two parallelizatio techiques: Multithread FFTW ad MPI-FFTW. Because PACE relies o MPI for its predictio aalysis of parallel eviromets, MPI-FFTW is selected for use i this research. Further detail o the FFTW may be foud at 4 Modelig the FFTW The measuremet procedure required by the FFTW to select the best pla for a specific system ca be replaced by a set of performace models. To

4 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) overcome the costly measuremets of its iitial stage, separate models are created to represet all idividual FFTW codelets. The codelets are dyamically profiled ad aalyzed, ad the models for the idividual codelets are geerated. The operatio of the FFTW plaer is the simulated i order to select the miimum predicted executio time of a appropriate set of codelet models. Fially, the predicted optimum pla is passed to the FFTW executor to compute the trasformatio. The evaluatio of each pla ca be performed for a rage of systems ad problem sizes. The evaluatio time for this process is faster tha the correspodig real-time FFTW measuremets. The validity of this optimizatio methodology depeds o the selectio patter of the optimum solutio (or best pla). To achieve this, the performace models require accurate predictio of every pla s executio time. The performaces of six selected o-twiddle codelets are illustrated i Table 1 which shows their modeled performace compared to their real-time executio o a sigle Petium machie (P64 stads for pla for FFT of size 64). Table 1. Executio vs Predictio for six o-twiddle codelets Plas PACE (µs) FFTW (µs) PACE / FFTW (%) P P P P P P Table 2 holds the predicted times for the modeled twiddle ad o-twiddle codelets. Table 2. Mode ls table for all the c odelets Twiddle No-twiddle Time (µs) Codelets Codelets Time (µs) Oce the models have bee created ad depedig o the required FFT size, a performace p redictio ca be extracted. The ext phase i the aalysis requires the creatio of the commuicatio ad sychroizatio models which predict the overhead cost of trasferrig ad receivig the FFT data amog processors. The MPI- FFTW uses a umber of kow MPI calls (such as MPI-Sed, MPI-Receive, MPI-Gather, etc) to commuicate ad sychroize the parallelizatio of the trasform process. For example, it uses MPI_Bcast to broadcast a message (such as trasform size) to all used processors, ad uses MPI_Sed to sed data. These MPI calls form the mai commuicatio/sychroizatio overhead of the MPI-FFTW. The calls are evaluated usig PACE s parallel template where the umber of processors to be used is also declared. The applicatio object (Fig.1) is cofigured, compiled ad liked with the subtask object (Fig.2) alog with the parallel template (Fig.3). The parallel template passes the umber of processors through variable Nproc. The resultig biary would ow produce the commuicatio cost estimates depedig o the required umber of processors. applicatio foo_appl { iclude foo_subt; iclude hardware; var umeric: N = N, Nproc = 1; lik { hardware: Nproc = Nproc; foo_subt: N = N/Nproc; optio { hrduse="itelp4_24"; proc exec iit { call foo_subt; Fig.1. Applicatio object (foo_appl) used i MPI- PACE Table 3 shows the predicted commuicatio / sychroizatio costs for the MPI-FFTW o the give umber of machies. The ability to evaluate ad predict the cost of commuicatio whe usig parallel processig, gives researchers a cosiderable advatage sice there eed ot be ay supportig applicatio executio. This is especially true if the applicatio at had is big ad takes a log time to execute.

5 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) subtask foo_subt { iclude asyc; iclude hardware; var umeric: N; lik { asyc: Tx = mai(), N = N; (* * CHIP3S * Applicatio Characterizatio Tool * Source : foo_subt * RUV Type: clc *) proc cflow mai { compute <is clc, FCAL>; Fig.2. Applicatio subtask (foo_subt) used i MPI- PACE #i clude <mpidefs.h> (* * asyc.la - Parallel template*) partmp asyc { iclude hardware; var umeric: N; var compute: Tx; (*executio time *) optio { stage = 1, seval = 1; proc exec iit { var umeric :i; step mpicom { for (i = 2; i <= hardware.nproc; i = i + 1) { call MPI_Barrier (MPI_COMM_WORLD); call MPI_Bcast (4,1,MPI_COMM_WORLD); call MPI_Gather (4,4,1,MPI_COMM_WORLD); cofdev 1, i, 4, MPI_COMM_WORLD; (* prit "asyc: Tx=", Tx; *) step cpu { cofdev Tx; Fig.3. Parallel template object used i MPI-PACE Oe additioal advatage is that a user ca also evaluate the parallelizatio of more tha the umber of machies curretly available. For example, i this study, up to 256 processors (which is more the the available resources) were defied i the evaluatio models. This helps i tryig to foresee the outcome of the parallelizatio of computig certai FFT ad parallelizatio sizes where it was impossible to do because of either huge commuicatio overhead that icur uacceptable bottleeck, or the fact that there were t eough machies to do the experimets o. Table 3. Predicted commuicatio/sychroizatio costs for MPI-FFTW Number of Processors Commuicatio Overhead (µs) e+6 After creatig the models for the commuicatio / sychroizatio overheads ad the codelets, a aalysis ad performace evaluatio may ow be coducted for ay FFT size ad for ay umber of processors i a parallelized eviromet. Give the fact that all machies used i this study are of the same performace characterizatio, the it is safe to assume that if the predicted executio time for a poit FFT equals Tseq (predicted sequetial time), the the predicted executio time for a /p poit FFT is (Tseq / p). If this is the applied to a distributed memory eviromet where p represets the umber of processors, ad add the predicted overhead cost, the the predicted computatio is as follows: where Tpar is the predicted time for computig a trasform by the FFTW i a parallel eviromet, T seq is the predicted sequetial time for the same trasform, p is the umber of processors used, ad T (c ommu.+ syc) is the overhead for commuicatio / sychroizatio. MPI-FFTW uses a speedup measuremet to evaluate its performace ehacemet through parallelism. Therefore, the experimets coducted here are also measured accordigly so that comparisos could be made i terms of speedups as well as executio times. This measuremet is give by the factor of sequetial executio time over the time obtaied whe ru i a distributed eviromet: o r: Predicted Tpar (p) = (Tseq / p) + T (commu.+syc) (8) Speed up = Tseq / Tpar (9) Speedup = Tseq / [(Tseq / p) + T (commu.+syc)] (1)

6 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) 5 A Parallelized Performace Predictio ad Evaluatio of the MPI-FFTW Up to 16 cocurret processors were used to predict ad measure i real-time the ehacemet gaied by usig the MPI-FFTW to compute small trasforms. Results show that for small FFTs, both the predicted pe rformace as well as the FFTW real- time executio degrades istatly whe distributed memory is used. This is because the iitial cost of commuicatio overhead is greater tha the total computatio cost of the FFT i a sigle machie. Fig.4 shows the predicted speedup for computig a 64 poit FFT o 2, 4, 8, ad 16 processors i compariso to the real-time speedup gaied by the MPI-FFTW. The speedups for oe processor (which is 1 for all FFT sizes) are ot plotted so that the rest of the speedups ca be clearly viewed. eedup Sp FFTW PACE2 Fig.4. Speedup comparisos for a 64 poit FFT Speedup Fig.5. Predicted Speedups 16 N=1K N=2K N=4K Other FFT sizes (256, 512, K, 2K, ad 4K) were used i order to compare the MPI-FFTW with the MPI-PACE. The trasforms were computed usig 16 distributed machies. Fig.5 shows the predicted speedups, while Fig.6 shows the real-time measuremets. The behavior of the MPI-FFTW i the begiig is similar to the predicted oe. The iitial icrease i the umber of processors results i the degradatio of the performace. The behavior chages a little as the real-time speedups idicate a better performace tha predicted whe 8 processors are used for sizes 2K ad 4K, but the overall patter remais the same. While the real-time speedups deped o the availability of the machies (which is ot always possible), ad the possibility of affectig other users due to commuicatio overhead, it is much easier to coduct performace evaluatios ad produce reliable predictios. To illustrate this poit further, a umber of evaluatio experimets were coducted usig 1 to 256 processors ad for FFT sizes ragig from 32 to 128K. As Fig.7 shows, it was possible to coduct a predictio aalysis for up to 256 machies. The figure shows that for the user to beefit from the available distributed resources i the machies, the trasform must be at least of the size 16K. It also idicates that for that size, oly usig 2 processors would be deemed successful i terms of speedup. Trasforms of sizes 32K, 64K, ad 256K may be computed faster whe 4 machies are used i parallel. Otherwise the commuicatio/sychroizatio overhead is provig much costlier ad the speedups start to degrade agai as the umb er of processors icrease. However, although the performace predictio of the MPI- FFTW is close to that of the real-time executio whe the size of the FFT is small, a differet sceario occurs whe a compariso is made for larger trasforms. A umber of factors cotribute to this discrepacy. The FFTW is a adaptive package ad chages its behavior accordig to the circumstaces of the computatio, the fial outcome of the performace of the FFTW depeds also o the FFT size, speed i which idividual/combied trasforms (plas) are computed, ad the load o the system. As these factors icrease i size ad umber, so would the differece betwee the predicted ad the real-time performace. A icrease i the umber of plas meas a icrease i the variatios of best plas chose for each ru ad hece the possibility of sedig a differet pla to compute the trasform the the oe beig aalyzed ad predicted. Furthermore, the load affectig the machies i the real-time eviromet also cotributes to the overall performace of the FFTW, ad to have multiple machies ruig at the same time while beig affected by exteral variables meas that the chace of predictig precisely what the performace would be is more difficult whe the size of the problem is larger. Aother factor is the probability of havig the commuicatio/sychroizatio cost overestimated, which may idicate a eed for further research towards improvig the MPI aalysis especially whe

7 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) dyamically optimized applicatios are studied. However, for most practical sizes of FFTs, the user ca greatly beefit from the modelig techiques as well as the dyamic performace evaluatio preseted here. Speedup Speedup Fig.6. MPI-FFTW Speedups K 2 K 4 K N=32 N=64 N=128 N=1K N=2K N=4K N=8K N=16K N=32K N=64K N=128K Fig.7. Speedup comparisos for various FFT sizes usig up to 256 machies 6 Modelig ad Predictig the Efficiecy The efficiecy of parallel computatio is calculated as the ratio of speedup resultig from distributig a give problem over a umber of processors. Efficiecy iformatio i this case study is helpful i determiig if the predicted speedups are efficiet eough for the size of the give problem. The efficiecy of usig p umber of processors equals the speedup gaied whe usig p processors i a parallel eviromet divided by p, ad it is writte as follows: Efficiecy(p) = Speedup (p) / p (11) For example, if the speedup of usig 4 machies i parallel to compute a give FFT is 2, the the efficiecy of this computatio process is 2/4 = 1/2. That is, the utilisatio power of such eviromet is oly 5% of the total computatioal resources available. Therefore, while speedups i Fig.7 are up to 2.5 faster o certai sizes, this is still ot the best performace the machies ca offer. Uless the machies are beig used i solvig other problems, 1/2 of the resources remai idle while the comput atio is goig o. Fig.8 illustrates that the efficiecy predicted is similar to that gaied by the MPI-FFTW. The figure represets predictio ad executio compariso for the efficiecy of computig a 64 poit trasform i a distributed eviromet cosistig of up to 16 processors. iciecy Eff PACE FFTW Fig.8. Efficiecy comparisos for computig a 64 poit FFT Efficiecy 1.2 N= N=64 N=128 N=1K N=2K N=4K N=8K N=16K N=32K N=64K N=128K Fig.9. Predicted efficiecy for computig various FFT sizes usig up to 256 machies Fig.9 predicts that usig oe machie to solve up to 128K FFT is more efficiet tha parallelisig the computatio eve if there are as may as 256 powerful processors. Oe reaso for this is the large commuicatio overhead cost that is ivolved whe computig this size of FFT. The fact that the data must be divided ad the distributed amog processors for the computatio the they must be reallocated back cotributes to this large overhead. For compariso purposes, the efficiecy of the parallelisatio process of the real-time MPI-FFTW is also measured (although less machies used). As predicted ad as illustrated by Fig.1, there is t much efficiecy i tryig to parallelise the computatio of the FFTs usig a fast machie like the Petium. The decrease i efficiecy is as sharp as the oe predicted by the previous figure.

8 Proceedigs of the 1th WSEAS Iteratioal Coferece o COMPUTERS, Vouliagmei, Athes, Greece, July 13-15, 26 (pp19-26) Efficiecy N=32 N=64 N=128 N=1 K Fig.1. MPI-FFTW efficiecy for computig various FFT sizes Efficiecy K 32 K 64 K Fig.11. MPI-FFTW efficiecy for large FFT sizes usig up to 16 machies However, ad as show i Fig.11, the efficiecy for computig a 64K poit FFT starts to icrease whe more tha 8 processors are used. This last observatio cofirms the previously oted fact that for large FFTs, predictig the overall performace behaviour of the FFTW is more difficult. 7 Summary This paper preseted a modelig techique for evaluatig the performace ad efficiecy of a applicatio executio i distributed eviromet. Besides predictig the behaviour patter of the parallelised computatios, the paper also aalysed ad predicted speedups as well as efficiecy gaied through multiprocessig. Results showed a reliable predictio process with a huge reductio i the overhead cost of the learig process. Predicted speedups suggest a performace ehacemet that is ot efficiet because the commuicatio ad sychroisatio costs are too big for the computatios at had. This is especially true whe large umber of processors is used. These predictios ad overall behaviour patters are matched by the real-time implemetatio. 2 K 4 K 8 K Experimets carried out here also show that parallelizatio schemes ca be aalyzed ad their performaces ca be predicted where either the cost of doig real-time studies is expesive ad/or whe there are t eough resources available. The results also highlight MPI-PACE overestimatio of the commuicatio / sychroizatio i distributed eviromets ad the eed to further re-examie it so that a more precise evaluatio is coducted. Refereces: [1]. Foster, C Kesselma, J. Nick. ad S. Tuecke. Grid Services for Distributed Systems Itegratio. IEEE Computer, 35 (6). 22. [2]. A. M. Alkidi, K. N. Al Gharbi, G. R. Nudd. Modelig ad Predictig the Commuicatio ad Sychroizatio Overhead of a Distributed Eviromet. Proceedigs of the 3 rd Iteratioal Symposium o Telecommuicatios, September 25. Volume 1 of IST25, pp [3]. C. Lawso, R. Haso, D. Kicaid, ad F. Krogh. Basic Liear Algebra Subprograms (BLAS) for FORTRAN usage. ACM Trasactios o Mathematical Software, 5:38-325, [4]. J. Dogarra, W. Getzsch: Computer Bechmarks, Volume 8, [5]. L. Blackford, J. Demmel, J. Dogarra, I. Duff, S. Hammarlig, G. Hery, M. Heroux, L. Kaufma, A. Lumsdaie, A. Petitet, R. Pozo, K. Remigto, R. Whaley. A Updated Set of Basic Liear Algebra Subprograms (BLAS), ACM TOMS, Volume 28, Issue 2, Jue 22, pp , ISSN [6]. M. Frigo, S. Johso. FFTW: A adaptive software architecture for FFT, i Proc. of the IEEE It. Cof. o Acoustics Speech, ad Sigal Processig, 3, Seattle (1998) [7]. M. Frigo. A fast Fourier Trasform Compiler, i Proc. of the ACM SIGPLAN Cof. o Programmig Laguage Desig ad Implemetatio (PLDI 99), Atlata (1999). [8]. A. Alkidi, D. Kerbyso, E. Papaefstathiou, G. Nudd. Ru-Time Optimizatio Usig Dyamic Performace Predictio, Proceedigs of the 8th iteratioal coferece, High-Performace Computig ad Networkig, Amsterdam, May 2. Volume 1823 of LNCS, Spriger, pp [9]. A. Alkidi, D. Kerbyso, E. Papaefstathiou, G. Nudd. Optimizatio of applicatio executio o dyamic systems, Future Geeratio Computer Systems 17 (8) (21), pp [1]. D. Kerbyso, E. Papaefstathiou, J. Harper, S. Perry, ad G. Nudd: Is predictive tracig too late for HPC users, I High-Performace Computig, pp 57-67, Pleum Press, [11]. G. Nudd, D. Kerbyso, E. Papaefstathiou, S. Perry, J. Harper, D. Wilcox: PACE-A Toolset for the performace predictio of parallel ad distributed systems, The Iteratioal Joural of High Performace Computig Applicatios, vol 14, fall 2, pp [12]. G. Nudd, E. Papaefstathiou, Y. Papay, T. Atherto, C. Clarke, D. Kerbyso, A. Stratto, R. Ziai, ad M. Zemerly: A layered approach to the characterizatio of parallel systems for performace predictio, i Proc. of the Performace Evaluatio of Parallel Systems Workshop (PEPS 93), Covetry, U.K. (1993) pp

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

A Study on the Performance of Cholesky-Factorization using MPI

A Study on the Performance of Cholesky-Factorization using MPI A Study o the Performace of Cholesky-Factorizatio usig MPI Ha S. Kim Scott B. Bade Departmet of Computer Sciece ad Egieerig Uiversity of Califoria Sa Diego {hskim, bade}@cs.ucsd.edu Abstract Cholesky-factorizatio

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Bayesian approach to reliability modelling for a probability of failure on demand parameter Bayesia approach to reliability modellig for a probability of failure o demad parameter BÖRCSÖK J., SCHAEFER S. Departmet of Computer Architecture ad System Programmig Uiversity Kassel, Wilhelmshöher Allee

More information

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers *

Load balanced Parallel Prime Number Generator with Sieve of Eratosthenes on Cluster Computers * Load balaced Parallel Prime umber Geerator with Sieve of Eratosthees o luster omputers * Soowook Hwag*, Kyusik hug**, ad Dogseug Kim* *Departmet of Electrical Egieerig Korea Uiversity Seoul, -, Rep. of

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

A Parallel DFA Minimization Algorithm

A Parallel DFA Minimization Algorithm A Parallel DFA Miimizatio Algorithm Ambuj Tewari, Utkarsh Srivastava, ad P. Gupta Departmet of Computer Sciece & Egieerig Idia Istitute of Techology Kapur Kapur 208 016,INDIA pg@iitk.ac.i Abstract. I this

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

Algorithms for Disk Covering Problems with the Most Points

Algorithms for Disk Covering Problems with the Most Points Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

Counting the Number of Minimum Roman Dominating Functions of a Graph

Counting the Number of Minimum Roman Dominating Functions of a Graph Coutig the Number of Miimum Roma Domiatig Fuctios of a Graph SHI ZHENG ad KOH KHEE MENG, Natioal Uiversity of Sigapore We provide two algorithms coutig the umber of miimum Roma domiatig fuctios of a graph

More information

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Analysis of Documents Clustering Using Sampled Agglomerative Technique

Analysis of Documents Clustering Using Sampled Agglomerative Technique Aalysis of Documets Clusterig Usig Sampled Agglomerative Techique Omar H. Karam, Ahmed M. Hamad, ad Sheri M. Moussa Abstract I this paper a clusterig algorithm for documets is proposed that adapts a samplig-based

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting)

MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fitting) MATHEMATICAL METHODS OF ANALYSIS AND EXPERIMENTAL DATA PROCESSING (Or Methods of Curve Fittig) I this chapter, we will eamie some methods of aalysis ad data processig; data obtaied as a result of a give

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

ISSN (Print) Research Article. *Corresponding author Nengfa Hu

ISSN (Print) Research Article. *Corresponding author Nengfa Hu Scholars Joural of Egieerig ad Techology (SJET) Sch. J. Eg. Tech., 2016; 4(5):249-253 Scholars Academic ad Scietific Publisher (A Iteratioal Publisher for Academic ad Scietific Resources) www.saspublisher.com

More information

Course Site: Copyright 2012, Elsevier Inc. All rights reserved.

Course Site:   Copyright 2012, Elsevier Inc. All rights reserved. Course Site: http://cc.sjtu.edu.c/g2s/site/aca.html 1 Computer Architecture A Quatitative Approach, Fifth Editio Chapter 2 Memory Hierarchy Desig 2 Outlie Memory Hierarchy Cache Desig Basic Cache Optimizatios

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM

BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM MATEC Web of Cofereces 79, 01014 (016) DOI: 10.1051/ mateccof/0167901014 T 016 BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM Staislav Shidlovskiy 1, 1 Natioal Research

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

ECE4050 Data Structures and Algorithms. Lecture 6: Searching

ECE4050 Data Structures and Algorithms. Lecture 6: Searching ECE4050 Data Structures ad Algorithms Lecture 6: Searchig 1 Search Give: Distict keys k 1, k 2,, k ad collectio L of records of the form (k 1, I 1 ), (k 2, I 2 ),, (k, I ) where I j is the iformatio associated

More information

Cubic Polynomial Curves with a Shape Parameter

Cubic Polynomial Curves with a Shape Parameter roceedigs of the th WSEAS Iteratioal Coferece o Robotics Cotrol ad Maufacturig Techology Hagzhou Chia April -8 00 (pp5-70) Cubic olyomial Curves with a Shape arameter MO GUOLIANG ZHAO YANAN Iformatio ad

More information

High-Speed Computation of the Kleene Star in Max-Plus Algebra Using a Cell Broadband Engine

High-Speed Computation of the Kleene Star in Max-Plus Algebra Using a Cell Broadband Engine Proceedigs of the 9th WSEAS Iteratioal Coferece o APPLICATIONS of COMPUTER ENGINEERING High-Speed Computatio of the Kleee Star i Max-Plus Algebra Usig a Cell Broadbad Egie HIROYUKI GOTO ad TAKAHIRO ICHIGE

More information

Heuristic Approaches for Solving the Multidimensional Knapsack Problem (MKP)

Heuristic Approaches for Solving the Multidimensional Knapsack Problem (MKP) Heuristic Approaches for Solvig the Multidimesioal Kapsack Problem (MKP) R. PARRA-HERNANDEZ N. DIMOPOULOS Departmet of Electrical ad Computer Eg. Uiversity of Victoria Victoria, B.C. CANADA Abstract: -

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

Τεχνολογία Λογισμικού

Τεχνολογία Λογισμικού ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών Τεχνολογία Λογισμικού, 7ο/9ο εξάμηνο 2018-2019 Τεχνολογία Λογισμικού Ν.Παπασπύρου, Αν.Καθ. ΣΗΜΜΥ, ickie@softlab.tua,gr

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis IOSR Joural of Egieerig Redudacy Allocatio for Series Parallel Systems with Multiple Costraits ad Sesitivity Aalysis S. V. Suresh Babu, D.Maheswar 2, G. Ragaath 3 Y.Viaya Kumar d G.Sakaraiah e (Mechaical

More information

Math Section 2.2 Polynomial Functions

Math Section 2.2 Polynomial Functions Math 1330 - Sectio. Polyomial Fuctios Our objectives i workig with polyomial fuctios will be, first, to gather iformatio about the graph of the fuctio ad, secod, to use that iformatio to geerate a reasoably

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

Τεχνολογία Λογισμικού

Τεχνολογία Λογισμικού ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών Τεχνολογία Λογισμικού, 7ο/9ο εξάμηνο 2018-2019 Τεχνολογία Λογισμικού Ν.Παπασπύρου, Αν.Καθ. ΣΗΜΜΥ, ickie@softlab.tua,gr

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Cache-Optimal Methods for Bit-Reversals

Cache-Optimal Methods for Bit-Reversals Proceedigs of the ACM/IEEE Supercomputig Coferece, November 1999, Portlad, Orego, U.S.A. Cache-Optimal Methods for Bit-Reversals Zhao Zhag ad Xiaodog Zhag Departmet of Computer Sciece College of William

More information

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions Proceedigs of the 10th WSEAS Iteratioal Coferece o APPLIED MATHEMATICS, Dallas, Texas, USA, November 1-3, 2006 316 A Geeralized Set Theoretic Approach for Time ad Space Complexity Aalysis of Algorithms

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

Project 2.5 Improved Euler Implementation

Project 2.5 Improved Euler Implementation Project 2.5 Improved Euler Implemetatio Figure 2.5.10 i the text lists TI-85 ad BASIC programs implemetig the improved Euler method to approximate the solutio of the iitial value problem dy dx = x+ y,

More information

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

Appendix A. Use of Operators in ARPS

Appendix A. Use of Operators in ARPS A Appedix A. Use of Operators i ARPS The methodology for solvig the equatios of hydrodyamics i either differetial or itegral form usig grid-poit techiques (fiite differece, fiite volume, fiite elemet)

More information

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao

More information

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation 6-0-0 Kowledge Trasformatio from Task Scearios to View-based Desig Diagrams Nima Dezhkam Kamra Sartipi {dezhka, sartipi}@mcmaster.ca Departmet of Computig ad Software McMaster Uiversity CANADA SEKE 08

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

Empirical Validate C&K Suite for Predict Fault-Proneness of Object-Oriented Classes Developed Using Fuzzy Logic.

Empirical Validate C&K Suite for Predict Fault-Proneness of Object-Oriented Classes Developed Using Fuzzy Logic. Empirical Validate C&K Suite for Predict Fault-Proeess of Object-Orieted Classes Developed Usig Fuzzy Logic. Mohammad Amro 1, Moataz Ahmed 1, Kaaa Faisal 2 1 Iformatio ad Computer Sciece Departmet, Kig

More information

Proceedings of the 4th Annual Linux Showcase & Conference, Atlanta

Proceedings of the 4th Annual Linux Showcase & Conference, Atlanta USENIX Associatio Proceedigs of the 4th Aual Liux Showcase & Coferece, Atlata Atlata, Georgia, USA October 1 14, 2 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION 2 by The USENIX Associatio All Rights Reserved

More information

APPLICATION NOTE. Automated Gain Flattening. 1. Experimental Setup. Scope and Overview

APPLICATION NOTE. Automated Gain Flattening. 1. Experimental Setup. Scope and Overview APPLICATION NOTE Automated Gai Flatteig Scope ad Overview A flat optical power spectrum is essetial for optical telecommuicatio sigals. This stems from a eed to balace the chael powers across large distaces.

More information

BASED ON ITERATIVE ERROR-CORRECTION

BASED ON ITERATIVE ERROR-CORRECTION A COHPARISO OF CRYPTAALYTIC PRICIPLES BASED O ITERATIVE ERROR-CORRECTIO Miodrag J. MihaljeviC ad Jova Dj. GoliC Istitute of Applied Mathematics ad Electroics. Belgrade School of Electrical Egieerig. Uiversity

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

Enhancing Efficiency of Software Fault Tolerance Techniques in Satellite Motion System

Enhancing Efficiency of Software Fault Tolerance Techniques in Satellite Motion System Joural of Iformatio Systems ad Telecommuicatio, Vol. 2, No. 3, July-September 2014 173 Ehacig Efficiecy of Software Fault Tolerace Techiques i Satellite Motio System Hoda Baki Departmet of Electrical ad

More information

SCI Reflective Memory

SCI Reflective Memory Embedded SCI Solutios SCI Reflective Memory (Experimetal) Atle Vesterkjær Dolphi Itercoect Solutios AS Olaf Helsets vei 6, N-0621 Oslo, Norway Phoe: (47) 23 16 71 42 Fax: (47) 23 16 71 80 Mail: atleve@dolphiics.o

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

Keywords Software Architecture, Object-oriented metrics, Reliability, Reusability, Coupling evaluator, Cohesion, efficiency

Keywords Software Architecture, Object-oriented metrics, Reliability, Reusability, Coupling evaluator, Cohesion, efficiency Volume 3, Issue 9, September 2013 ISSN: 2277 128X Iteratioal Joural of Advaced Research i Computer Sciece ad Software Egieerig Research Paper Available olie at: www.ijarcsse.com Couplig Evaluator to Ehace

More information

GE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III

GE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III GE2112 - FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III PROBLEM SOLVING AND OFFICE APPLICATION SOFTWARE Plaig the Computer Program Purpose Algorithm Flow Charts Pseudocode -Applicatio Software Packages-

More information

Application of rule rewriting system to software automated tuning

Application of rule rewriting system to software automated tuning Applicatio of rule rewritig system to software automated tuig Ivaeko P.A., Dorosheko A.Y. Istitute of Software Systems of Natioal Academy of Scieces of Ukraie, Glushkov av. 40, 03187 Kyiv, Ukraie paiv@ukr.et,

More information

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition.

The University of Adelaide, School of Computer Science 22 November Computer Architecture. A Quantitative Approach, Sixth Edition. Computer Architecture A Quatitative Approach, Sixth Editio Chapter 2 Memory Hierarchy Desig 1 Itroductio Programmers wat ulimited amouts of memory with low latecy Fast memory techology is more expesive

More information

Cluster Computing Spring 2004 Paul A. Farrell

Cluster Computing Spring 2004 Paul A. Farrell Cluster Computig Sprig 004 3/18/004 Parallel Programmig Overview Task Parallelism OS support for task parallelism Parameter Studies Domai Decompositio Sequece Matchig Work Assigmet Static schedulig Divide

More information

Sectio 4, a prototype project of settig field weight with AHP method is developed ad the experimetal results are aalyzed. Fially, we coclude our work

Sectio 4, a prototype project of settig field weight with AHP method is developed ad the experimetal results are aalyzed. Fially, we coclude our work 200 2d Iteratioal Coferece o Iformatio ad Multimedia Techology (ICIMT 200) IPCSIT vol. 42 (202) (202) IACSIT Press, Sigapore DOI: 0.7763/IPCSIT.202.V42.0 Idex Weight Decisio Based o AHP for Iformatio Retrieval

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size

Markov Chain Model of HomePlug CSMA MAC for Determining Optimal Fixed Contention Window Size Markov Chai Model of HomePlug CSMA MAC for Determiig Optimal Fixed Cotetio Widow Size Eva Krimiger * ad Haiph Latchma Dept. of Electrical ad Computer Egieerig, Uiversity of Florida, Gaiesville, FL, USA

More information

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1 COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,

More information

Parallel Polygon Approximation Algorithm Targeted at Reconfigurable Multi-Ring Hardware

Parallel Polygon Approximation Algorithm Targeted at Reconfigurable Multi-Ring Hardware Parallel Polygo Approximatio Algorithm Targeted at Recofigurable Multi-Rig Hardware M. Arif Wai* ad Hamid R. Arabia** *Califoria State Uiversity Bakersfield, Califoria, USA **Uiversity of Georgia, Georgia,

More information

EE123 Digital Signal Processing

EE123 Digital Signal Processing Last Time EE Digital Sigal Processig Lecture 7 Block Covolutio, Overlap ad Add, FFT Discrete Fourier Trasform Properties of the Liear covolutio through circular Today Liear covolutio with Overlap ad add

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods.

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods. Software developmet of compoets for complex sigal aalysis o the example of adaptive recursive estimatio methods. SIMON BOYMANN, RALPH MASCHOTTA, SILKE LEHMANN, DUNJA STEUER Istitute of Biomedical Egieerig

More information

A General Framework for Accurate Statistical Timing Analysis Considering Correlations

A General Framework for Accurate Statistical Timing Analysis Considering Correlations A Geeral Framework for Accurate Statistical Timig Aalysis Cosiderig Correlatios 7.4 Vishal Khadelwal Departmet of ECE Uiversity of Marylad-College Park vishalk@glue.umd.edu Akur Srivastava Departmet of

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

CS 111 Green: Program Design I Lecture 27: Speed (cont.); parting thoughts

CS 111 Green: Program Design I Lecture 27: Speed (cont.); parting thoughts CS 111 Gree: Program Desig I Lecture 27: Speed (cot.); partig thoughts By Nascarkig - Ow work, CC BY-SA 4.0, https://commos.wikimedia.org/w/idex.php?curid=38671041 Robert H. Sloa (CS) & Rachel Poretsky

More information