Heterogeneous Parallel Computing: from Clusters of Workstations to Hierarchical Hybrid Platforms


A.L. Lastovetsky 1

DOI: /jsf
© The Author. This paper is published with open access at SuperFri.org

The paper overviews the state of the art in design and implementation of data parallel scientific applications on heterogeneous platforms. It covers both traditional approaches originally designed for clusters of heterogeneous workstations and the most recent methods developed in the context of modern multicore and multi-accelerator heterogeneous platforms.

Keywords: parallel computing, heterogeneous computing, data partitioning.

Introduction

High performance computing systems are becoming increasingly heterogeneous and hierarchical. A typical compute node integrates multiple (possibly heterogeneous) cores as well as hardware accelerators such as Graphics Processing Units. The integration is often hierarchical. The motivation behind such complicated architecture is to make these systems more energy efficient. The energy consideration is paramount as future large-scale cluster infrastructures will have to have hundreds of thousands of compute nodes to solve Exascale problems and would not be energy sustainable if nodes of traditional architecture were used. Future large-scale systems will exhibit multiple forms of architectural and non-architectural heterogeneity as well as mean-time-to-failure of minutes. How to develop parallel applications and software that efficiently utilize highly heterogeneous and hierarchical computing and communication resources, while scaling them towards Exascale, maintaining a sustainable energy footprint, and preserving correctness are highly challenging and open questions. Heterogeneous parallel computing is the area that emerged in the 1990s to address the challenges posed by ever increasing heterogeneity and complexity of the HPC platforms.
This paper overviews the development of heterogeneous parallel computing technologies as they followed the evolution of heterogeneous HPC platforms from simple single-switched heterogeneous clusters of (uniprocessor) workstations to modern hierarchical clusters of heterogeneous hybrid nodes. It mainly focuses on the design of fundamental data partitioning algorithms supporting the development of data parallel applications able to automatically tune to the executing heterogeneous platform achieving optimal performance (and energy) efficiency.

Data parallel applications are the main target of parallel computing technologies because they dominate the scientific and engineering computing domain, as well as the emerging domain of large-scale ("Big") data analytics. Optimization of data parallel applications on heterogeneous platforms is typically achieved by balancing the load of the heterogeneous processors and minimizing the cost of moving data between the processors. Data partitioning algorithms solve this problem by finding the optimal distribution of data between the processors. They typically require a priori information about the parallel application and platform.

Data partitioning is not the only technique used for load balancing. Dynamic load balancing techniques, such as task queue scheduling and work stealing [5, 9, 26, 39-41], balance the load by moving fine-grained tasks between processors during the calculation. Dynamic algorithms do not require a priori information about execution but may incur significant communication overhead on distributed-memory platforms due to data migration. At the same time, dynamic algorithms often use static data partitioning for their initial step to minimize the amount of data redistributions needed. For example, in the state-of-the-art load balancing techniques for multi-node, multicore, and multi-GPU platforms, the performance gain is mainly due to better initial data partitioning. It was shown that even the static distribution based on simplistic performance models (single values specifying the maximum performance of a dominant computational kernel on CPUs and GPUs) improves the performance of traditional dynamic scheduling techniques by up to 250% [44].

In this overview we focus on parallel scientific applications, where computational workload is directly proportional to the size of data, and dedicated HPC platforms, where: (i) the performance of the application is stable in time and is not affected by varying system load; (ii) there is a significant overhead associated with data migration between computing devices; (iii) optimized architecture-specific libraries implementing the same kernels may be available for different computing devices. On these platforms, for most scientific applications, static load balancing algorithms outperform dynamic ones because they do not involve data migration. Therefore, for the type of applications and platforms we focus on, data partitioning is the most appropriate optimization technique.

One very important aspect of optimization of parallel applications on distributed-memory heterogeneous platforms, the optimization of their communication cost, is not covered in this paper. A recent analytical overview of methods for optimization of collective communication operations in heterogeneous networks can be found in [21].

1 University College Dublin, Dublin, Ireland
Supercomputing Frontiers and Innovations, 2014, Vol. 1, No. 3

1. Optimization of parallel applications on heterogeneous clusters of workstations

1.1. Data partitioning algorithms based on constant performance models

Since the late 1990s, when the first pioneering works in the field were published, the design of heterogeneous parallel algorithms has made significant progress.
At that time, the main target platform for the heterogeneous parallel algorithms being developed was a heterogeneous cluster of workstations, and the simplest possible performance model of this platform was used in the algorithm design. Namely, it was seen as a set of independent heterogeneous (uni)processors, each characterized by a single positive number representing its speed. The speed of the processors can be absolute or relative. The absolute speed of the processors is understood as the number of computational units performed by the processor per one time unit. The relative speed of the processor can be obtained by the normalization of its absolute speed. While this performance model has no communication-related parameters, it still allows for optimization of the communication cost through the minimization of the amount of data moved between processors. This model is also known as the Constant Performance Model, or CPM.

Using the CPM, a fundamental problem of optimal distribution of independent equal units of computation over a set of heterogeneous processors was formulated and solved in [7]. The algorithm [7] solving this problem is of complexity $O(p^2)$ and only needs relative speeds. This algorithm is a basic building block in many heterogeneous parallel and distributed algorithms. It is typical in the design of heterogeneous parallel algorithms that the problem of distribution of computations in proportion to the speed of processors is reduced to the problem of partitioning of some mathematical objects, such as sets, matrices, graphs, etc.

Most of the CPM-based algorithms designed so far have been aimed at numerical linear algebra. For example, the problem of LU factorization of a dense matrix A was reduced to the problem of optimal mapping of its column panels $a_1, \dots, a_n$ to p heterogeneous processors, and the latter problem was further reduced to the problem of partitioning of a well-ordered set (whose elements represent the column panels). Two efficient algorithms solving this partitioning problem have been proposed: the Dynamic Programming (DP) algorithm [7, 10] and the Reverse algorithm [34]. The latter is more suitable for extension to more complex heterogeneous performance models. Other algorithms of partitioning of well-ordered sets, e.g. [6], do not guarantee the return of an optimal solution.

As matrices are probably the most widely used mathematical objects in scientific computing, most of data-partitioning studies deal with them. Matrix partitioning problems occur during the design of parallel linear algebra algorithms for heterogeneous platforms. A typical heterogeneous linear-algebra algorithm is designed as a modification of its homogeneous prototype, and its design is eventually reduced to the problem of optimally partitioning a matrix over heterogeneous processors. From the partitioning point of view, a dense matrix is an integer-valued rectangle. Therefore, if we are only interested in an asymptotically optimal solution (which is typically the case), the problem of its partitioning can be reduced to a problem of the partitioning of a real-valued rectangle. In a general form, the related geometrical problem has been formulated as follows [8]: Given a set of p processors $P_1, P_2, \dots, P_p$, the relative speed of each of which is characterized by a positive constant $s_i$ (with $\sum_{i=1}^{p} s_i = 1$), partition a unit square into p rectangles so that:
- there is a one-to-one mapping between the rectangles and the processors;
- the area of the rectangle allocated to processor $P_i$ is equal to $s_i$;
- the partitioning minimizes the sum of half-perimeters of the rectangles.
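The objective of the geometrical problem above can be made concrete with a small sketch. It assumes a column-based arrangement is already chosen (which processors go into which column); the function names `column_partition` and `sum_half_perimeters` are illustrative and do not come from the cited papers, and the sketch evaluates a given arrangement rather than searching for the optimal one.

```python
def column_partition(columns):
    """Build a column-based partition of the unit square.

    `columns` is a list of columns, each a list of relative speeds; all
    speeds together are assumed to sum to 1. Returns rectangles as
    (x, y, width, height) tuples, one per processor, column by column.
    """
    rects = []
    x = 0.0
    for col in columns:
        width = sum(col)              # column width = total area it must hold
        y = 0.0
        for s in col:
            height = s / width        # so that width * height == s
            rects.append((x, y, width, height))
            y += height
        x += width
    return rects


def sum_half_perimeters(rects):
    """Objective of the geometrical problem: sum of (width + height)."""
    return sum(w + h for (_, _, w, h) in rects)
```

For four processors of equal relative speed 0.25, `sum_half_perimeters(column_partition([[0.25, 0.25], [0.25, 0.25]]))` gives 4.0, while the single-column arrangement `[[0.25] * 4]` gives 5.0, so the 2 x 2 arrangement moves less data under this cost measure.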
This formulation is motivated by the SUMMA matrix multiplication algorithm [23] and aimed at balancing the load of the processors and minimization of the total volume of data communicated between the processors. Fig. 1 shows one iteration of the heterogeneous SUMMA algorithm assuming that matrices A, B and C are identically partitioned into rectangular submatrices. At each iteration of the main loop, the pivot block column of matrix A and the pivot block row of matrix B are broadcast horizontally and vertically, then all processors update their own parts of matrix C in parallel. The blocking factor b is a parameter used to adjust the granularity of communications and computations [13], whose optimal value can be found experimentally.

Figure 1. Heterogeneous parallel matrix multiplication

This geometrical partitioning problem is NP-complete [8], but many restricted and practically important versions of this problem have been efficiently solved. The least restrictive is probably the column-based problem looking for an optimal partitioning, the rectangles of which make up columns as illustrated in Fig. 2. An algorithm of the complexity $O(p^3)$ was proposed in [8]. More restricted forms of the column-based geometrical partitioning problem have also been addressed. The pioneering result in the field was a linear algorithm [27] additionally assuming that the number of columns c in the partitioning and the number of rectangles in each column are given. A column-based partitioning with the same number of rectangles in each column is known as a grid-based partitioning. An algorithm of the complexity $O(p^{3/2})$ solving the grid-based partitioning problem was proposed in [29].

Figure 2. Column-based partitioning of the unit square into 12 rectangles. The rectangles of the partitioning form three columns

A partitioning whose rectangles make both columns and rows is known as a Cartesian partitioning. It is attractive from the implementation point of view because of its very simple and scalable communication pattern. However, the related partitioning problems are very difficult and very little has been achieved in addressing them so far [7].

More recent research [19, 20] challenged the optimality of the rectangular matrix partitioning. Using a specially developed mathematical technique and five different parallel matrix multiplication algorithms, it was proved that the optimal partition shape can be non-rectangular, and the full list of optimal shapes for the cases of two and three processors was identified. Fig. 3 shows these for the case of three processors. The performance model used in this work combined the CPM and the Hockney communication model [24]. These results have a potential to significantly improve the performance of matrix computations on platforms that can be modeled by a small number of interconnected heterogeneous abstract processors, such as hybrid CPU/GPU nodes and clusters of clusters.

Figure 3. The candidate partition shapes previously identified as potentially optimal three processor shapes. Processors P, R, and S are in white, grey, and black, respectively.
(1) Square Corner (2) Rectangle Corner (3) Square Rectangle (4) Block 2D Rectangular (5) L Rectangular (6) Traditional 1D Rectangular

Significant work has been done in partitioning algorithms for graphs, which are then applied to sparse matrices and meshes, the mathematical objects widely used in many scientific applications, e.g. computational fluid dynamics. Algorithms implemented in ParMetis [28], SCOTCH [12], JOSTLE [45] reduce the number of edges between the target subdomains, aiming to minimize the total communication cost of the parallel application. Algorithms implemented in Zoltan [11], PaGrid [4] try to minimize the execution time of the application. All these graph partitioning libraries use performance models combining the CPM and the Hockney model. The models have to be provided by the users.

1.2. Data partitioning algorithms based on functional performance models

The CPM can be a sufficiently accurate approximation of the performance of heterogeneous processors executing a data parallel application if: (i) the processors are general-purpose and execute the same code, (ii) the local tasks are small enough to fit in the main memory but large enough not to fully fit in the processor cache. However, if we consider essentially heterogeneous processors using different code to solve the same task locally, or allow the tasks to span different levels of memory hierarchy on different processors, then the relative speed of the processors can significantly differ for different task sizes. In these situations, the CPM becomes inaccurate, and its use can lead to highly imbalanced load distribution [16].

To address this challenge, a functional performance model (FPM) [35, 37, 38] was proposed. The FPM represents the speed of a processor by a function of problem size. It is built empirically and integrates many important features characterizing the performance of both the architecture and the application. The speed is defined as the number of computation units processed per second. The computation unit can be defined differently for different applications. The important requirement is that its size (in terms of arithmetic operations) should not vary during the execution of the application. One FLOP is the simplest example of a computation unit.
The fundamental problem of optimal distribution of n independent equal units of computation between p heterogeneous processors represented by their speed functions was formulated, and very efficient geometrical algorithms (of complexities $O(p^2 \log_2 n)$ and $O(p \log_2 n)$) solving this problem under different assumptions about the shape of the speed functions were proposed [31, 35]. These algorithms are based on the following observation. Let the speed of processor $P_i$ be represented by a continuous function $s_i(d) = d / t_i(d)$, where $t_i(d)$ is the execution time for processing of d computation units on the processor $P_i$. Then the optimal solution of this problem, which balances the load of the processors, will be achieved when all processors execute their work within the same time: $t_1(d_1) = \dots = t_p(d_p)$. This can be expressed as:

$$\frac{d_1}{s_1(d_1)} = \dots = \frac{d_p}{s_p(d_p)}, \quad \text{where } d_1 + d_2 + \dots + d_p = n \qquad (1)$$

The solution to these equations, $d_1, \dots, d_p$, can be represented geometrically by intersection of the speed functions with a line passing through the origin of the coordinate system as illustrated in Fig. 4.

The geometrical algorithms proceed as follows. As any line passing through the origin and intersecting the speed functions represents an optimum distribution for a particular problem size, the space of solutions of the problem (1) consists of all such lines. The two outer bounds of the solution space are selected as the starting point of the algorithm. The upper line represents the optimal data distribution $x_1^u, \dots, x_p^u$ for some problem size $n_u < n$, $n_u = x_1^u + \dots + x_p^u$, while the lower line gives the solution $x_1^l, \dots, x_p^l$ for $n_l > n$, $n_l = x_1^l + \dots + x_p^l$. The region between the two lines is iteratively bisected as shown in Fig. 5.
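The bisection idea can be sketched in a few lines. This is a simplified illustration, not the published algorithm: it bisects the slope of the line rather than the angle, works with real-valued allocations (no rounding to integers), and assumes that each ratio s_i(d)/d decreases with d so that every line through the origin meets each speed curve at most once. The names `intersect` and `geometric_partition` are made up for this sketch.

```python
def intersect(speed, slope, n, tol=1e-9):
    """Find d in (0, n] where speed(d) == slope * d.

    Assumes speed(d) / d is decreasing in d. If the line stays below the
    curve on the whole range, the allocation is clamped to n.
    """
    lo, hi = 0.0, float(n)
    if speed(hi) >= slope * hi:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if speed(mid) > slope * mid:
            lo = mid          # curve still above the line: intersection is further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def geometric_partition(speeds, n, tol=1e-6, max_iter=200):
    """Distribute n units so that all d_i / s_i(d_i) are (nearly) equal."""
    p = len(speeds)
    # Upper (steep) line: every intersection is at most n / p, so the total is <= n.
    k_hi = max(s(n / p) / (n / p) for s in speeds)
    # Lower (shallow) line: at least one intersection reaches n, so the total is >= n.
    k_lo = min(s(n) / n for s in speeds)
    d = [n / p] * p
    for _ in range(max_iter):
        k = 0.5 * (k_lo + k_hi)
        d = [intersect(s, k, n) for s in speeds]
        total = sum(d)
        if abs(total - n) <= tol:
            break
        if total < n:
            k_hi = k          # line too steep: it becomes the new upper bound
        else:
            k_lo = k          # line too shallow: it becomes the new lower bound
    return d
```

With two constant speed functions of 1 and 3 units per second and n = 100, the sketch returns allocations close to 25 and 75, which is the proportional distribution the CPM would give; the benefit of the functional model appears only when the speeds depend on d.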

Figure 4. Optimal distribution of computational units showing the geometric proportionality of the number of computation units to the speeds of the processors

Figure 5. Geometrical data partitioning algorithm. Line 1 (the upper line) and line 2 (the lower line) represent the two initial outer bounds of the solution space. Line 3 represents the first bisection. Line 4 represents the second one. The dashed line represents the optimal solution

At the iteration k, the problem size corresponding to the new line intersecting the speed functions at the points $x_1^k, \dots, x_p^k$ is calculated as $n_k = x_1^k + \dots + x_p^k$. Depending on whether $n_k$ is less than or greater than n, this line becomes a new upper or lower bound. Making $n_k$ close to n, this algorithm finds the optimal partition of the given problem $x_1, \dots, x_p$: $x_1 + \dots + x_p = n$.

The geometrical algorithms will always find a unique optimal solution if the speed functions satisfy the following assumptions:
1. On the interval $[0, X]$, the function is monotonically increasing and concave.
2. On the interval $[X, \infty]$, the function is monotonically decreasing.

Extensive experiments with many scientific kernels on different workstations have demonstrated that, in general, processor speed can be approximated, within some acceptable degree of accuracy, by a function satisfying these assumptions.

Another algorithm [43] significantly relaxes the restrictions on the shape of speed functions but does not always guarantee the globally optimal solution. This algorithm assumes that the Akima spline interpolation [1] is used to approximate the speed function. Then it formulates the problem of optimal data partitioning in the form of a system of non-linear equations and applies multidimensional solvers to the numerical solution of this system. The algorithm is iterative and always converges in a finite number of iterations, returning a solution that balances the load of the processors. The number of iterations depends on the shape of the functions. In practice, the number can be as little as 2 iterations for very smooth speed functions and up to 30 iterations when partitioning in regions of rapidly changing speed functions. For illustration, Fig. 6 shows speed function approximations used in the geometrical algorithms and in the algorithm based on the multidimensional solvers.

Figure 6. Speed function for non-optimized Netlib BLAS: the piecewise approximation satisfying the restriction of monotonicity (left), and the Akima spline interpolation (right)

These algorithms have been successfully employed in different data-parallel kernels and applications and significantly outperformed their CPM-based counterparts [2, 15, 16, 18, 25, 34].

Algorithms that require full FPMs as input to find the optimal partitioning can be used in applications developed for execution on the same stable platform multiple times. In this case, the cost of building the FPMs for the full range of problem sizes will be insignificant in comparison with the accumulated gains due to the optimal parallelization. However, these algorithms cannot be employed in self-adaptable applications that are supposed to discover the performance characteristics of the executing heterogeneous platform at run-time. To address that type of application, a new class of partitioning algorithms was proposed [36]. They do not need the FPMs as input. Instead, they run on the processors executing the application and iteratively build partial approximations of their speed functions until they become sufficiently accurate to partition the task of the given size with the required precision.
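The run-time approach can be illustrated with the following sketch. It is an assumption-laden simplification rather than the algorithm of [36]: the partial speed functions are piecewise linear through the measured points and constant outside them, the shape corrections applied in the original method are omitted, the balancing step bisects on a common execution time, and `measure(i, d)` is a hypothetical callback that runs d units on processor i and returns the elapsed time.

```python
import bisect


class PartialSpeed:
    """Piecewise-linear speed estimate through measured (d, s) points,
    extended by constants to the left of the first and right of the last point."""

    def __init__(self):
        self.d, self.s = [], []

    def add(self, d, s):
        i = bisect.bisect_left(self.d, d)
        if i < len(self.d) and self.d[i] == d:
            self.s[i] = s                      # re-measured size: overwrite
        else:
            self.d.insert(i, d)
            self.s.insert(i, s)

    def __call__(self, x):
        d, s = self.d, self.s
        if x <= d[0]:
            return s[0]
        if x >= d[-1]:
            return s[-1]
        i = bisect.bisect_right(d, x)
        w = (x - d[i - 1]) / (d[i] - d[i - 1])
        return s[i - 1] + w * (s[i] - s[i - 1])


def units_within(speed, T, n, tol=1e-9):
    """Largest d in [0, n] with d / speed(d) <= T (time assumed to grow with d)."""
    lo, hi = 0.0, float(n)
    if hi / speed(hi) <= T:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / speed(mid) <= T:
            lo = mid
        else:
            hi = mid
    return lo


def balance(speeds, n, iters=100):
    """Bisect on the common execution time T until the allocations sum to n."""
    t_lo, t_hi = 0.0, max(n / s(n) for s in speeds)
    for _ in range(iters):
        T = 0.5 * (t_lo + t_hi)
        if sum(units_within(s, T, n) for s in speeds) < n:
            t_lo = T
        else:
            t_hi = T
    return [units_within(s, t_hi, n) for s in speeds]


def dynamic_partition(measure, p, n, eps=0.05, max_iter=20):
    """Refine partial speed models until measured times differ by at most eps."""
    models = [PartialSpeed() for _ in range(p)]
    dist = [n / p] * p                          # start from the even distribution
    for _ in range(max_iter):
        times = [measure(i, dist[i]) for i in range(p)]
        for i in range(p):
            models[i].add(dist[i], dist[i] / times[i])
        if (max(times) - min(times)) / max(times) <= eps:
            break                               # current distribution is balanced enough
        dist = balance(models, n)
    return dist
```

Each pass costs one real execution per processor, so the loop trades a few benchmark runs for not having to build the full speed functions in advance.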
For example, if we want to distribute n units of computation between p heterogeneous processors using the geometrical data partitioning, but the speed functions $s_i(x)$ of the processors are not known a priori, we will proceed as follows. The first approximations of the partial speed functions, $s_i(x)$, are created as constants $s_i(x) = s_i^0 = s_i(n/p)$ as illustrated in Fig. 7(a). At the iteration k, the piecewise linear approximations $s_i(x)$ are improved by adding the points $(d_i^k, s_i^k)$, Fig. 7(b). Namely, let $\{(d_i^{(j)}, s_i^{(j)})\}_{j=1}^{m}$, $d_i^{(1)} < \dots < d_i^{(m)}$, be the experimentally obtained points of $s_i(x)$ used to build its current piecewise linear approximation, then:
1. If $d_i^k < d_i^{(1)}$, then the line segment $(0, s_i^{(1)}) \to (d_i^{(1)}, s_i^{(1)})$ of the $s_i(x)$ approximation will be replaced by two connected line segments $(0, s_i^k) \to (d_i^k, s_i^k)$ and $(d_i^k, s_i^k) \to (d_i^{(1)}, s_i^{(1)})$;
2. If $d_i^k > d_i^{(m)}$, then the line $(d_i^{(m)}, s_i^{(m)}) \to (\infty, s_i^{(m)})$ of this approximation will be replaced by the line segment $(d_i^{(m)}, s_i^{(m)}) \to (d_i^k, s_i^k)$ and the line $(d_i^k, s_i^k) \to (\infty, s_i^k)$;
3. If $d_i^{(j)} < d_i^k < d_i^{(j+1)}$, the line segment $(d_i^{(j)}, s_i^{(j)}) \to (d_i^{(j+1)}, s_i^{(j+1)})$ of $s_i(d)$ will be replaced by two connected line segments $(d_i^{(j)}, s_i^{(j)}) \to (d_i^k, s_i^k)$ and $(d_i^k, s_i^k) \to (d_i^{(j+1)}, s_i^{(j+1)})$.

Figure 7. Construction of partial speed functions using linear interpolation, panels (a) and (b).

After adding the new data point $(d_i^j, s_i^j)$ to the partial speed function $s_i(x)$, we verify that the shape of the resulting piecewise linear approximation satisfies the above assumptions, and update the value of $s_i^j$ when required. Namely, to keep the partial speed function increasing and concave on the interval $[0, X]$, we ensure that $s_i^{j-1} \le s_i^j \le s_i^{j+1}$ and

$$\frac{s_i^{j-1} - s_i^{j-2}}{d_i^{j-1} - d_i^{j-2}} \ge \frac{s_i^{j} - s_i^{j-1}}{d_i^{j} - d_i^{j-1}} \ge \frac{s_i^{j+1} - s_i^{j}}{d_i^{j+1} - d_i^{j}}.$$

The latter expression states that the slopes of consecutive pieces are non-increasing, which is required for the concave shape of the piecewise linear approximation. On the interval $[X, \infty]$, we ensure that $s_i^{j-1} \ge s_i^j \ge s_i^{j+1}$ for a monotonically decreasing speed function. This approach has proved to be very efficient in practice, typically converging to the optimal solution after very few iterations [16].

While some other non-constant performance models of heterogeneous processors, such as the unit-step functional model [22], the functional model with limits on task size [32] and the band model [30], have been proposed and used for the design of heterogeneous algorithms, they did not go beyond some preliminary studies as they appeared to be not suitable for practical use in high-performance heterogeneous scientific computing due to a variety of reasons.

1.3. Implementation of heterogeneous data partitioning algorithms

It is important to note that the effectiveness of the data partitioning algorithms presented in this section strongly depends on how accurately the performance models employed in these algorithms reflect the real performance of the data parallel applications on the executing platforms. Unfortunately many algorithms, especially CPM-based ones, come without a method for estimation of the employed performance model, leaving this task to the user.
Therefore the use of these algorithms, as well as of tools straightforwardly employing these algorithms, is a challenging task. The graph partitioning libraries [4, 11, 12, 28, 45] give us examples of such tools.

At the same time, some algorithm designers include the method of construction of the employed performance model in the definition of the algorithm. Such algorithms are easy to use and compare. The estimation method helps to better understand: (i) the meaning of the model parameters, leaving no room for interpretation, and (ii) the assumptions made about the application and the target platform. According to this approach, model-based algorithms will be different even if they only differ in the method of model construction. Such algorithms can be found in [15, 16, 35, 43]. For example, [15] proposes a two-dimensional matrix partitioning algorithm designed for heterogeneous SUMMA (see Fig. 1). The definition of this algorithm specifically stipulates that the FPMs of the processors will be built using the computational kernel performing one update of the submatrix C with the portions of pivot block column A and pivot block row B: $C \mathrel{+}= A \times B$, as shown in Fig. 8.

Figure 8. The computational kernel

Moreover, it proposes to use one-dimensional FPMs by combining the height m and width n parameters into one parameter, area $d = m \times n$, measured in $b \times b$ blocks, and to only use square areas in benchmarking, $m = n = \sqrt{d}$, for $0 < d \le M \times N$. The matrix is then partitioned using a one-dimensional FPM-based algorithm to determine the areas of the rectangles that should be assigned to each processor. The CPM-based algorithm [8] is then applied to calculate the optimum shape and ordering of the rectangles so that the total volume of communication is minimized.

The algorithm described above makes the assumption that a benchmark of a square area will give an accurate prediction of computation time of any rectangle of the same area, namely $s(x, x) = s(x/c, c \cdot x)$. However, in general this does not hold true for all c (Fig. 9(a)). Fortunately, in order to minimise the total volume of communication the algorithm [8] arranges the rectangles so that they are as square as possible. This has been verified experimentally [15] by partitioning a medium sized square dense matrix using the new algorithm for 1 to 1000 nodes from the Grid'5000 platform (incorporating 20 unique nodes); the frequency of the ratio m : n is plotted in Fig. 9(c). Fig. 9(b), showing a detail of Fig. 9(a), illustrates that if the rectangle is approximately square the assumption holds.

The efficiency of the FPM-based data-parallel applications strongly depends on the accuracy of the evaluation of the speed function of each heterogeneous processor.
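The square-area benchmarking described above can be sketched as follows. This is only an illustration in pure Python, far slower than an optimized BLAS kernel; the block size `b`, the repeat count, and the names `rank_b_update`, `benchmark_square` and `build_speed_points` are assumptions made for the sketch. The area d is counted in b x b blocks and only perfect squares are benchmarked.

```python
import time


def rank_b_update(C, A, B):
    """One SUMMA-style update C += A x B (A: rows x b, B: b x cols), in place."""
    b = len(B)
    for i, a_row in enumerate(A):
        c_row = C[i]
        for k in range(b):
            a = a_row[k]
            b_row = B[k]
            for j in range(len(c_row)):
                c_row[j] += a * b_row[j]


def benchmark_square(side, b=4, repeats=3):
    """Time the kernel on a square area of side x side blocks.

    Returns (d, speed) with d = side * side blocks and speed in blocks per second,
    using the average execution time over `repeats` runs.
    """
    rows = cols = side * b
    A = [[1.0] * b for _ in range(rows)]
    B = [[1.0] * cols for _ in range(b)]
    C = [[0.0] * cols for _ in range(rows)]
    elapsed = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        rank_b_update(C, A, B)
        elapsed += time.perf_counter() - start
    d = side * side
    return d, d / (elapsed / repeats)


def build_speed_points(sides, b=4):
    """Experimental points (d, s(d)) of a one-dimensional speed function."""
    return [benchmark_square(side, b) for side in sides]
```

Calling `build_speed_points([1, 2, 4, 8])` yields points at areas 1, 4, 16 and 64 blocks, which could then be interpolated into a speed function; the measured speeds themselves depend on the machine.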
Building an accurate speed function is a challenging problem that requires: (i) carefully designed experiments to accurately and efficiently measure the speed of the processor for each problem size; (ii) appropriate interpolation and approximation methods which use the experimental points to construct an accurate speed function of the given shape.

A software tool, FuPerMod, helping the application programmer solve these problems has been recently developed and released [17]. FuPerMod also provides a number of heterogeneous data partitioning algorithms for sets, ordered sets and matrices, both CPM-based and FPM-based. It does not provide graph-partitioning algorithms though. Graph-partitioning algorithms are provided by a number of libraries such as ParMetis [28], SCOTCH [12], JOSTLE [45], Zoltan [11], PaGrid [4]. While the partitioning algorithms implemented in these libraries use performance models, the libraries provide no support for their construction.

Figure 9. Showing speed (GFLOPS) against the ratio m : n of the sides of the partitioned rectangles. Lines connect rectangles of equal area. The centerline at 1 : 1 represents square shape. In general speed is not constant with area (a). However when the ratio is close to 1 : 1, speed is approximately constant (b). (c) Shows the frequency distribution of the ratio of m : n using the new partitioning algorithm for 1 to 1000 machines (incorporating 20 unique hardware configurations)

2. Optimization of parallel applications on hybrid multicore and multi-accelerator heterogeneous platforms

Thus, the traditional heterogeneous performance models and data partitioning algorithms and applications are designed for platforms whose processing elements are independent of each other. In modern heterogeneous multicore and multi-accelerator compute nodes, however, processing elements are coupled and share system resources. In such platforms, the speed of one processing element often depends on the load of others due to resource contention. Therefore, they cannot be considered independent, and hence their associated performance models cannot be considered and built independently. This makes the traditional models, methods of their evaluation and algorithms no longer applicable to the new platforms.

This problem was recently addressed in [46, 47, 48]. In this work, the authors do not study how to develop computational kernels for individual computing devices used in hybrid heterogeneous platforms, such as multicore CPUs or GPUs. They assume that such kernels are available for use in parallel applications on these platforms. While being very challenging and important, this problem has attracted significant attention of the HPC research community and many important kernels have been ported to modern multicores and GPUs. Instead, they focus on a wide open problem of optimal data distribution between kernels of the data-parallel application assuming that the configuration of the application is fixed. Finding the optimal configuration of the application is another challenge to be addressed, which is out of the scope of this work.
The authors however give a few basic empirical rules that, they believe, lead to optimal configurations. For example, never run a NUMA-unaware multi-threaded computational kernel across multiple NUMA nodes. Use instead multiple instances of this kernel, one per NUMA node.

A multicore and multi-GPU system, which is the main target architecture in this work, is modeled by a set of heterogeneous abstract processors determined by the configuration of the parallel application. Namely, a group of processing elements executing one computational kernel of the application will make a combined processing unit and will be represented in the model by one abstract processor. For example, if a single-threaded computational kernel is used, then each CPU core executing this kernel will be represented in the model by a separate abstract processor. If a multi-threaded computational kernel is used, then each group of CPU cores executing the kernel will make a combined processing unit represented in the model by one abstract processor. A GPU is usually controlled by a host process running on a dedicated CPU core. This process instructs the GPU to perform computations and handles data transfers between the host and device memory. In the case of a single-GPU computational kernel, the GPU and its dedicated CPU core will make a combined processing unit represented by an abstract processor. If a multi-GPU computational kernel is used in the application, the GPUs and their dedicated CPU core will make a combined processing unit represented by one abstract processor.

Figure 10. Performance modeling on a GPU-accelerated multicore server of NUMA architecture: single-threaded and single-GPU computational kernels executed

Figure 11. Performance modeling on a GPU-accelerated multicore server of NUMA architecture: multi-threaded and multi-GPU computational kernels executed; two GPUs handled by a single dedicated CPU core

Figures 10 and 11 illustrate this approach showing a GPU-accelerated multicore server of NUMA architecture executing a parallel application in two different configurations. The configuration shown in Fig. 10 is based on the single-threaded and single-GPU computational kernels. It consists of ten processes running the CPU kernels on ten cores of both NUMA nodes, and two processes running the GPU kernels on accelerators and their dedicated cores on the second NUMA node. The configuration in Fig. 11 is based on the multi-threaded and multi-GPU computational kernels. It consists of one process running the 6-thread CPU kernel on one NUMA node, one process running the 5-thread CPU kernel on another NUMA node, and one process running the GPU kernel on the GPUs and their single dedicated core. All processing elements in these diagrams are enumerated. Each number indicates the combined processing unit to which the processing element belongs. For example, in the first configuration, the cores in NUMA node 0 make six processing units, and each GPU with its dedicated CPU core in NUMA node 1 makes a combined processing unit.

In the first configuration, the cores in NUMA node 0 execute six identical processes and are modeled by six abstract processors. These cores are tightly coupled and share memory, therefore, they cannot be considered independent. On the other hand, this group of processing elements is relatively independent of other processing elements of the server. Therefore, their performance should be measured simultaneously in a group but can be measured separately from the others. In the second configuration, these six cores execute one process and are modeled as one combined processing unit. Its performance can be measured separately from other processing elements of the server.

Next steps are to build functional performance models of the abstract processors and perform model-based data partitioning in order to balance the workload between the combined processing units represented by these abstract processors. In order to build the performance models of the abstract processors, the performance of the processing units representing these processors has to be measured. To measure the performance of the processing units accurately, they are grouped by the shared system resources, so that the resources are shared within each group but not shared between groups. The performance of processing units in a group is measured when all processing units in the group are executing some workload simultaneously, thereby taking into account the influence of resource contention. To prevent the operating system from migrating processes excessively, processes are bound to CPU cores. Processes are synchronized to minimize the idle computational cycles, aiming at the highest floating point rate for the application. Synchronization also ensures that the resources will be shared between the maximum number of processes.
To ensure the reliability of the results, measurements are repeated multiple times, and average execution times are used.

One important empirical rule used in this work is that when looking for the optimal distribution of the workload, only the solutions that evenly distribute the workload between identical CPU processing units are considered. This simplification significantly reduces the complexity of the data partitioning problem. It is based both on the authors' extensive experiments, which have shown no evidence that uneven distribution between identical processing units could speed up applications, and on the absence of such evidence in the literature. Therefore, identical processing units that share system resources will always be given the same amount of workload during performance measurements.

To account for different configurations of the application, three types of functional performance models for CPU cores are defined:
1. $s(x)$ approximates the speed of a uniprocessor executing a single-threaded computational kernel. The speed $s(x) = x/t$, where $x$ is the number of computation units, and $t$ is the execution time.
2. $s_c(x)$ approximates the speed of one of $c$ CPU cores all executing the same single-threaded computational kernel simultaneously. The speed $s_c(x) = x/t$, where $x$ is the number of computation units executed by each CPU core, and $t$ is the execution time.
3. $S_c(x)$ approximates the collective speed of $c$ CPU cores executing a multi-threaded computational kernel. The speed $S_c(x) = x/t$, where $x$ is the total number of computation units executed by all $c$ CPU cores, and $t$ is the execution time. $S_c(cx)/c$ is used to approximate the average speed of a CPU core.
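The definitions above reduce to a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the speed is computed from the mean of repeated timings as the text describes, and the way `split_evenly` hands out a remainder is an assumption, since the source only states that identical units receive the same amount of workload.

```python
from statistics import mean


def speed(x, times):
    """Speed x / t, with t taken as the mean of repeated execution times."""
    return x / mean(times)


def average_core_speed(collective_speed, c, x):
    """Approximate the speed of one core as S_c(c * x) / c.

    `collective_speed` is a callable that models S_c.
    """
    return collective_speed(c * x) / c


def split_evenly(total, c):
    """Give c identical units equal shares of `total` computation units.

    Any remainder is spread one unit at a time over the first units
    (an assumption made for this sketch).
    """
    base, extra = divmod(total, c)
    return [base + (1 if i < extra else 0) for i in range(c)]
```

For instance, `speed(100, [2.0, 2.0])` evaluates 100 computation units over a mean time of 2 seconds, and `split_evenly(20, 6)` distributes 20 units over six identical cores.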

Figure 12. Speed functions of a CPU core built in different configurations (speed in GFlops against problem size for $s_1(x)$, $s_6(x)$, $s_{12}(x)$, $S_6(6x)/6$ and $S_{12}(12x)/12$)

Figure 13. Speed functions of a GPU processing unit built in different configurations (speed in GFlops against problem size for $g_1(x)$, $g_2(x)$, $g_4(x)$, $G_2(2x)/2$ and $G_4(4x)/4$)

Fig. 12 shows speed functions of a CPU core built in different configurations on a server consisting of eight NUMA nodes connected by AMD HyperTransport (HT) links, with 6 cores and 16 GB of local memory each. The server is equipped with an NVIDIA Tesla S2050 server, which consists of two pairs of GPUs. Each pair is connected by a PCIe switch and linked to a separate NUMA node by a PCIe bus.

Similarly, three types of functional performance models for GPUs are defined as follows:
1. $g(x)$ approximates the speed of a combined processing unit made of a GPU and its dedicated CPU core that execute a single-GPU computational kernel, exclusively using a PCIe link. The speed $g(x) = x/t$, where $x$ is the number of computation units, and $t$ is the execution time.
2. $g_d(x)$ approximates the speed of one of $d$ combined processing units, each made of a GPU and its dedicated CPU core. All processing units execute identical single-GPU computational kernels simultaneously. The speed $g_d(x) = x/t$, where $x$ is the number of computation units executed by each GPU processing unit, and $t$ is the execution time.
3. $G_d(x)$ approximates the speed of a combined processing unit made of $d$ GPUs and their dedicated CPU core that collectively execute a multi-GPU computational kernel. The speed $G_d(x) = x/t$, where $x$ is the total number of computation units processed by all $d$ GPUs, and $t$ is the execution time. $G_d(dx)/d$ is used to approximate the average speed of a GPU.
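Once speed functions of the CPU and GPU processing units are available, model-based partitioning looks for shares $x_i$ whose execution times $x_i / s_i(x_i)$ are as equal as possible. The sketch below shows one simple way to do this, a bisection on the common execution time. It is not the partitioning algorithm of the work discussed here. It assumes that speed functions are given as Python callables, that the execution time of every unit does not decrease as its share grows, and that the total is at least one computation unit. The names `units_within` and `partition` are hypothetical.

```python
def units_within(speed, limit, n):
    """Largest integer x in [0, n] with execution time x / speed(x) <= limit.

    Assumes the execution time does not decrease as x grows.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid / speed(mid) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo


def partition(speeds, n, iterations=60):
    """Split n units so that the times x_i / s_i(x_i) are roughly equal."""
    # Upper bound on the common time: the quickest single unit does all n.
    t_lo, t_hi = 0.0, min(n / s(n) for s in speeds)
    for _ in range(iterations):
        t_mid = (t_lo + t_hi) / 2
        if sum(units_within(s, t_mid, n) for s in speeds) >= n:
            t_hi = t_mid  # all work fits within t_mid, try a shorter time
        else:
            t_lo = t_mid  # not enough work fits, allow more time
    parts = [units_within(s, t_hi, n) for s in speeds]
    # Several units may cross a threshold at once and leave a small
    # surplus; take it back from the largest shares.
    surplus = sum(parts) - n
    while surplus > 0:
        i = parts.index(max(parts))
        parts[i] -= 1
        surplus -= 1
    return parts
```

With two units of constant speed in the ratio 1:3, `partition` returns shares in the same ratio. A speed function that drops beyond some problem size caps the share of the corresponding unit at that size, which is the kind of behaviour a constant-speed model cannot express.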

Fig. 13 shows the speed functions of a combined GPU processing unit built in different configurations on the same server. From these experiments we can see that, depending on the configuration of the application, the speed of individual cores and GPUs can vary significantly. Therefore, to achieve optimal distribution of computations it is very important to build and use speed functions which accurately reflect their performance during the execution of the application. This work also reveals that the speed of a GPU can depend on the load of CPU cores, which should also be taken into account during the partitioning step. Experiments with linear algebra kernels and a CFD application validated the efficiency of the proposed approach.

At the same time, this work has demonstrated the importance of proper configuration of the application. For example, Fig. 14 demonstrates the impact of NUMA mapping on the performance of a GPU processing unit, comprised of a CPU core and a GPU of the Tesla S2050 deployed in the experimental server. $g_1(x)$ is built by executing one single-GPU gemm kernel, which exclusively uses the data link and the memory of a local or remote NUMA node. $g_2(x)$ is built by executing two single-GPU kernels simultaneously on two GPU units that share the PCIe link and the memory of the same NUMA node, local or remote. In the remote configuration, the GPU units also share an extra HT link to the remote NUMA node. Speed function $g_2(x)$ is also built in the configuration where the two dedicated CPU cores are located on different NUMA nodes, which is denoted as local + remote. In this case, the processing units share PCIe but do not share memory. The difference between speed functions $g_1(x)$ and $g_2(x)$ reflects the performance degradation due to the contention for PCIe, HT and memory. A significant difference is observed for large problem sizes, when many data transfers are required. Communication overhead between NUMA nodes can be estimated by the difference between $g_1(x)$ in the local and remote configurations.
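Both comparisons just described, contention ($g_1$ against $g_2$) and NUMA overhead ($g_1$ local against $g_1$ remote), amount to comparing two speed functions over a range of problem sizes. A minimal sketch follows, assuming that the speed functions are available as Python callables. The name `relative_slowdown` and the choice of a relative rather than an absolute difference are assumptions of this sketch.

```python
def relative_slowdown(reference, other, sizes):
    """Fraction of speed lost at each problem size: 1 - other(x) / reference(x)."""
    return [1.0 - other(x) / reference(x) for x in sizes]
```

With hypothetical callables `g1_local`, `g2_local` and `g1_remote` built as described, `relative_slowdown(g1_local, g2_local, sizes)` would estimate the loss due to contention, and `relative_slowdown(g1_local, g1_remote, sizes)` the overhead of communication between NUMA nodes.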
The combined effect of both phenomena is reflected by the $g_2(x)$ functions in different configurations.

Figure 14. Speed functions of a GPU processing unit built in different configurations (speed in GFlops against problem size for $g_1(x)$ local, $g_2(x)$ local, $g_2(x)$ local+remote, $g_1(x)$ remote and $g_2(x)$ remote)

Multilevel hierarchy in modern heterogeneous clusters represents another challenge to be addressed in the design of data partitioning algorithms. One solution, a hierarchical matrix partitioning algorithm based on realistic performance models at each level of hierarchy, was recently proposed in [14]. To minimize the total execution time of the application, it iteratively partitions a matrix between nodes and partitions these sub-matrices between the devices in a node. This is a self-adaptive algorithm that dynamically builds the performance models at runtime, and it employs an algorithm to minimize the total volume of communication. This algorithm allows scientific applications to perform load balanced matrix operations with nested parallelism on hierarchical heterogeneous platforms. Large scale experiments on a heterogeneous multicluster site incorporating multicore CPUs and GPU nodes have shown that this hierarchical algorithm outperforms all other state of the art approaches and successfully load balances very large problems.

3. Programming tools

In the past, the main platform used for non-trivial heterogeneous parallel computing (as opposed to volunteer computing, such as the SETI@home project) has been a heterogeneous cluster of workstations. MPI is a standard programming model for this platform. However, the implementation of real-world heterogeneous parallel algorithms in an efficient and portable form requires much more than just the code implementing the algorithm for each legal combination of its input parameters. Extra code should be written to find optimal values of some parameters (say, the number of processes and their arrangement in a multi-dimensional shape) or to accurately estimate the others (such as relative speeds of the processors). This extra code may account for at least 95% of all code in common cases. Therefore, for the implementation of heterogeneous parallel algorithms on this platform, a small number of programming tools was developed.

mpC [3] is the first programming language designed for heterogeneous parallel computing. It facilitates the implementation of heterogeneous parallel algorithms by automating the development of the routine code, which comes in two forms: (i) application specific code generated by a compiler from the specification of the implemented algorithm provided by the application programmer; (ii) universal code in the form of a run-time support system and libraries. HeteroMPI [33] is an extension of MPI inspired by mpC. It allows the programmer to re-use the available MPI code when developing applications for heterogeneous clusters of workstations.
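To give a feel for the routine code mentioned above, the sketch below estimates relative speeds of processors from benchmark times and picks a two-dimensional process arrangement. It is plain Python and does not use or reproduce the mpC or HeteroMPI interfaces. The function names are hypothetical, and choosing the most square grid is an assumed criterion, since the source does not state how the optimal arrangement is defined.

```python
import math


def relative_speeds(times):
    """Relative speeds from benchmark times of the same workload.

    A processor that finishes sooner gets a larger value; the result sums to 1.
    """
    inverse = [1.0 / t for t in times]
    total = sum(inverse)
    return [v / total for v in inverse]


def grid_shape(p):
    """Most square 2D arrangement (rows, cols) with rows * cols == p."""
    rows = math.isqrt(p)
    while p % rows != 0:
        rows -= 1
    return rows, p // rows
```

Even this toy version hints at why such code tends to dominate: it is independent of the algorithm itself, yet has to be written, tuned and kept portable for every platform on which the application runs.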
Both mpC and HeteroMPI have been used for development of a wide range of real-life applications. HeteroMPI was also the instrumental tool for implementation of Heterogeneous ScaLAPACK [42], a version of ScaLAPACK optimized for heterogeneous clusters of workstations.

Modern and future heterogeneous HPC systems necessitate the synthesis of multiple programming models in the same code. This will be a result of the use of multiple heterogeneous many-core devices for accelerating code, as well as the use of both shared- and distributed-address spaces in the same code to cope with heterogeneous memory hierarchies and forms of communication. Synthesizing multiple programming models in the same code in a way that would provide a good balance of performance, portability and programmability is far from trivial. Despite long-standing efforts to program parallel applications with hybrid programming models (e.g. MPI/OpenMP) and some recent developments in programming models for hybrid architectures (e.g. OpenCL), it is still a long way towards solutions that would satisfy the HPC community.

This work was conducted with the financial support of Science Foundation Ireland, Grant 08/IN.1/I2054.

This paper is distributed under the terms of the Creative Commons Attribution-Non Commercial 3.0 License which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is properly cited.

References

1. H. Akima. A new method of interpolation and smooth curve fitting based on local procedures. Journal of the ACM, 17.
2. A. Alonazi, D. Keyes, A. Lastovetsky, and V. Rychkov. Design and optimization of OpenFOAM-based CFD applications for hybrid and heterogeneous HPC platforms. 26th International Conference on Parallel Computational Fluid Dynamics (ParCFD 2014), Trondheim, Norway.
3. D. Arapov, A. Kalinov, A. Lastovetsky, I. Ledovskih, and T. Lewis. A programming environment for heterogenous distributed memory machines. In 6th Heterogeneous Computing Workshop (HCW 1997). IEEE.
4. E. Aubanel and X. Wu. Incorporating latency in heterogeneous graph partitioning. In IPDPS 2007, pages 1-8.
5. C. Augonnet et al. Automatic calibration of performance models on heterogeneous multicore architectures. In EuroPar.
6. J. Barbosa, J. Tavares, and A. J. Padilha. Linear algebra algorithms in a heterogeneous cluster of personal computers. In 9th Heterogeneous Computing Workshop (HCW 2000).
7. O. Beaumont, V. Boudet, A. Petitet, F. Rastello, and Y. Robert. A proposal for a heterogeneous cluster ScaLAPACK (dense linear solvers). IEEE Transactions on Computers, 50(10).
8. O. Beaumont, V. Boudet, F. Rastello, and Y. Robert. Matrix multiplication on heterogeneous platforms. IEEE Transactions on Parallel and Distributed Systems, 12(10).
9. R.D. Blumofe and C.E. Leiserson. Scheduling multithreaded computations by work stealing. JACM, 46(5).
10. P. Boulet, J. Dongarra, F. Rastello, Y. Robert, and F. Vivien. Algorithmic issues on heterogeneous computing platforms. Parallel Processing Letters, 9(2).
11. U. Catalyurek, E. Boman, K. Devine, et al. Hypergraph-based dynamic load balancing for adaptive scientific computations. In IPDPS 2007, pages 1-11.
12. C. Chevalier and F. Pellegrini. PT-Scotch: A tool for efficient parallel graph ordering. Parallel Computing, 34(6-8).
13. J. Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers. In HPC Asia.
14. D. Clarke, A. Ilic, A. Lastovetsky, and L. Sousa. Hierarchical partitioning algorithm for scientific computing on highly heterogeneous CPU + GPU clusters. In Euro-Par 2012 Parallel Processing. Springer.
15. D. Clarke, A. Lastovetsky, and V. Rychkov. Column-based matrix partitioning for parallel matrix multiplication on heterogeneous processors based on functional performance models. In HeteroPar 2011. Springer.
16. D. Clarke, A. Lastovetsky, and V. Rychkov. Dynamic load balancing of parallel computational iterative routines on highly heterogeneous HPC platforms. Parallel Processing Letters, 21(2).
17. D. Clarke, Z. Zhong, V. Rychkov, and A. Lastovetsky. FuPerMod: A framework for optimal data partitioning for parallel scientific applications on dedicated heterogeneous HPC platforms. In PaCT 2013, volume 7979 of LNCS. Springer.
18. J. Colaco, A. Matoga, et al. Transparent application acceleration by intelligent scheduling of shared library calls on heterogeneous systems. In PPAM 2013, Part I.
19. A. DeFlumere and A. Lastovetsky. Searching for the optimal data partitioning shape for parallel matrix matrix multiplication on 3 heterogeneous processors. In 23rd Heterogeneity in Computing Workshop (HCW 2014), pages 1-12.
20. A. DeFlumere, A. Lastovetsky, and B. Becker. Partitioning for parallel matrix multiplication with heterogeneous processors: The optimal solution. In 21st Heterogeneity in Computing Workshop (HCW 2012), pages 1-15.
21. K. Dichev and A. Lastovetsky. Optimization of collective communication for heterogeneous HPC platforms. High-Performance Computing on Complex Environments.
22. M. Drozdowski and P. Wolniewicz. Out-of-core divisible load processing. IEEE Transactions on Parallel and Distributed Systems, 14(10).
23. R. Van De Geijn and J. Watts. SUMMA: Scalable universal matrix multiplication algorithm. Concurrency-Practice and Experience, 9(4).
24. R. Hockney. The communication challenge for MPP: Intel Paragon and Meiko CS-2. Parallel Computing, 20(3).
25. A. Ilic, F. Pratas, P. Trancoso, and L. Sousa. High-performance computing on heterogeneous systems: Database queries on CPU and GPU. In High Performance Scientific Computing with Special Emphasis on Current Capabilities and Future Perspectives. IOS Press.
26. A. Ilic and L. Sousa. On realistic divisible load scheduling in highly heterogeneous distributed systems. In PDP 2012. IEEE.
27. A. Kalinov and A. Lastovetsky. Heterogeneous distribution of computations while solving linear algebra problems on networks of heterogeneous computers. In 7th International Conference on High Performance Computing and Networking Europe (HPCN 99).
28. G. Karypis and K. Schloegel. ParMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library. Version 4.0. University of Minnesota, MN, USA.
29. A. Lastovetsky. On grid-based matrix partitioning for heterogeneous processors. In 6th International Symposium on Parallel and Distributed Computing (ISPDC 2007). IEEE.
30. A. Lastovetsky and R. Higgins. Scheduling for heterogeneous networks of computers with persistent fluctuation of load. In 13th International Conference on Parallel Computing (ParCo 2005).
31. A. Lastovetsky and R. Reddy. Data partitioning with a realistic performance model of networks of heterogeneous computers. In IPDPS 2004, pages 1-15.
32. A. Lastovetsky and R. Reddy. Data partitioning for multiprocessors with memory heterogeneity and memory constraints. Scientific Programming, 13(2):93-112.
33. A. Lastovetsky and R. Reddy. HeteroMPI: Towards a message-passing library for heterogeneous networks of computers. Journal of Parallel and Distributed Computing, 66(2).
34. A. Lastovetsky and R. Reddy. Data partitioning for dense factorization on computers with memory heterogeneity. Parallel Computing, 33(12).
35. A. Lastovetsky and R. Reddy. Data partitioning with a functional performance model of heterogeneous processors. International Journal of High Performance Computing Applications, 21:76-90.
36. A. Lastovetsky and R. Reddy. Distributed data partitioning for heterogeneous processors based on partial estimation of their functional performance models. In Euro-Par 09.
37. A. Lastovetsky, R. Reddy, and R. Higgins. Building the functional performance model of a processor. In Proceedings of the 2006 ACM Symposium on Applied Computing (SAC 2006). ACM.
38. A. Lastovetsky and J. Twamley. Towards a realistic performance model for networks of heterogeneous computers. In High Performance Computational Science and Engineering. Springer.
39. M. Linderman, J. Collins, H. Wang, et al. Merge: a programming model for heterogeneous multi-core systems. SIGPLAN Not., 43.
40. G. Quintana-Ortí et al. Solving dense linear systems on platforms with multiple hardware accelerators. SIGPLAN Not., 44.
41. J.N. Quintin and F. Wagner. Hierarchical work-stealing. Euro-Par 2010-Parallel Processing.
42. R. Reddy and A. Lastovetsky. HeteroMPI + ScaLAPACK: towards a ScaLAPACK (dense linear solvers) on heterogeneous networks of computers. In 13th IEEE International Conference on High Performance Computing (HiPC 2006).
43. V. Rychkov, D. Clarke, and A. Lastovetsky. Using multidimensional solvers for optimal data partitioning on dedicated heterogeneous HPC platforms. In PaCT 2011. Springer-Verlag.
44. F. Song et al. Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. In ICS.
45. C. Walshaw and M. Cross. Multilevel mesh partitioning for heterogeneous communication networks. Future Generation Computer Systems, 17(5).
46. Z. Zhong, V. Rychkov, and A. Lastovetsky. Data partitioning on heterogeneous multicore platforms. In Cluster 2011. IEEE.
47. Z. Zhong, V. Rychkov, and A. Lastovetsky. Data partitioning on heterogeneous multicore and multi-GPU systems using functional performance models of data-parallel applications. In Cluster 2012. IEEE.
48. Z. Zhong, V. Rychkov, and A. Lastovetsky. Data partitioning on heterogeneous multicore and multi-GPU platforms using functional performance models. IEEE Transactions on Computers, pages 1-14.

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Wavefront Reconstructor

Wavefront Reconstructor A Dstrbuted Smplex B-Splne Based Wavefront Reconstructor Coen de Vsser and Mchel Verhaegen 14-12-201212 2012 Delft Unversty of Technology Contents Introducton Wavefront reconstructon usng Smplex B-Splnes

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach Dstrbuted Resource Schedulng n Grd Computng Usng Fuzzy Approach Shahram Amn, Mohammad Ahmad Computer Engneerng Department Islamc Azad Unversty branch Mahallat, Iran Islamc Azad Unversty branch khomen,

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce

Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce Performance Study of Parallel Programmng on Cloud Computng Envronments Usng MapReduce Wen-Chung Shh, Shan-Shyong Tseng Department of Informaton Scence and Applcatons Asa Unversty Tachung, 41354, Tawan

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Preconditioning Parallel Sparse Iterative Solvers for Circuit Simulation

Programming in Fortran 90 : 2017/2018

Maintaining temporal validity of real-time data on non-continuously executing resources

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Hermite Splines in Lie Groups as Products of Geodesics

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Related-Mode Attacks on CTR Encryption Mode

Reducing Frame Rate for Object Tracking

On Some Entertaining Applications of the Concept of Set in Computer Science Course

Meta-heuristics for Multidimensional Knapsack Problems

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

An efficient iterative source routing algorithm

Clock Skew Compensator for Wireless Wearable. Computer Systems

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

Assembler. Building a Modern Computer From First Principles.

A Five-Point Subdivision Scheme with Two Parameters and a Four-Point Shape-Preserving Scheme

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Machine Learning: Algorithms and Applications

Two-Stage Data Distribution for Distributed Surveillance Video Processing with Hybrid Storage Architecture

Lecture 5: Multilayer Perceptrons

Research of Dynamic Access to Cloud Database Based on Improved Pheromone Algorithm

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Private Information Retrieval (PIR)

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

An Accurate Evaluation of Integrals in Convex and Non convex Polygonal Domain by Twelve Node Quadrilateral Finite Element Method

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Application of Improved Fish Swarm Algorithm in Cloud Computing Resource Scheduling

CMPS 10 Introduction to Computer Science Lecture Notes

A Facet Generation Procedure. for solving 0/1 integer programs

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

Very simple computational domains can be discretized using boundary-fitted structured meshes (also called grids)

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Analysis of Continuous Beams in General

A New Approach For the Ranking of Fuzzy Sets With Different Heights

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Lecture #15 Lecture Notes

Efficient Distributed File System (EDFS)

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

Virtual Machine Migration based on Trust Measurement of Computer Node

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

Self-tuning Histograms: Building Histograms Without Looking at Data

Hierarchical clustering for gene expression data analysis

Positive Semi-definite Programming Localization in Wireless Sensor Networks

High-Boost Mesh Filtering for 3-D Shape Enhancement

Cost-efficient deployment of distributed software services

Performance Evaluation of Information Retrieval Systems

Unsupervised Learning
