ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY

Size: px
Start display at page:

Download "ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY"

Transcription

1 Parallel Processng Letters c World Scentfc Publshng Company ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY CÉDRIC BASTOUL Laboratore PRSM, Unversté de Versalles Sant Quentn 45 avenue des États-Uns, 785 Versalles Cedex, France cedrc.bastoul@prsm.uvsq.fr and PAUL FEAUTRIER LIP, École Normale Supéreure de Lyon 46 Allée d Itale, 664 Lyon, France paul.feautrer@ens-lyon.fr Receved (August th 24) Revsed (December 2th 24) Communcated by (Marco Danelutto, Domenco Laforenza, Marco Vannesch) ABSTRACT Program transformatons are one of the most valuable compler technques to mprove parallelsm or data localty. However, restructurng complers have a hard tme copng wth data dependences. A typcal soluton s to focus on program parts where the dependences are smple enough to enable any transformaton. For more complex problems s only addressed the queston of checkng whether a transformaton s legal or not. In ths paper we propose to go further. Startng from a transformaton wth no guarantee on legalty, we show how we can correct t for dependence satsfacton. Two drectons are explored: frst when transformaton propertes can be explctly expressed and second when they are mplct as n the data localty transformaton case. Generatng code havng the best propertes s a drect applcaton of ths result. Keywords: Program transformatons, legalty, dependences, polyhedral model, localty.. Introducton The task of optmzng compute-bound programs s crucal for present day supercomputers f we notce that most of these machnes run at a few percent of ther peak performance. The problem can be stated as a combnatoral optmzaton problem, but due to the complexty of real-lfe programs and computers, ths approach s not practcal. Most of the tme, we start from a frst mplementaton, and try to mprove ts performance by successve transformatons. Besde mprovng the performances, a transformaton must be legal,.e. must not change the fnal results of the program. Ths s usually enforced by usng only transformatons that respect dependences [22]. Whle selectng an optmzng transformaton s not too dffcult for an experenced programmer, adustng ths transformaton for legalty s a tedous and error-prone process.

2 Parallel Processng Letters To bypass the dependence problem, most of the exstng methods apply only to perfect loop nests n whch dependences are non-exstent or have a specal form (fully permutable loop nests) [27]. To enlarge ther applcaton doman some preprocessng, e.g. loop skewng or code snkng, may be appled [27,,4]. More ambtous technques do not lay down any requrement on dependences, but are lmted to propose soluton canddates then to check them for legalty [7,8]. If the canddate s proved to volate dependences, then the proposed transformaton s dscarded and another canddate, perhaps havng less nterestng propertes s studed. In ths paper, we present a method that goes beyond checkng by adustng f possble a transformaton for dependence satsfacton, wthout modfyng ts optmzng propertes. Ths technque can be used to correct a transformaton canddate as well as to replace preprocessng. The technque has been desgned n the context of localtymprovng transformatons, but can be appled n many other cases. On the other hand, only transformatons whch can be represented as affne transformatons n teraton space can be corrected n ths way. Ths paper s organzed as follows. In secton 2 we outlne the background of ths work. Secton deals wth the transformatons n the polyhedral model and focuses on ther dependences constrants. Secton 4 shows how t s possble to correct a transformaton for legalty. Secton 5 compares our proposal to prevous work n the feld of localty enhancement then secton 6 concludes and dscusses future work. 2. Background and Notatons A loop n an mperatve language lke C or FORTRAN can be represented usng a n-entry column vector called ts teraton vector: x (, 2... n ) T, where k s the k th loop ndex and n s the nnermost loop. The surroundng loops and condtonals of a statement defne ts teraton doman. The statement s executed once for each element of the teraton doman. When loop bounds and condtonals only depend on surroundng loop counters, formal parameters and constants, the teraton doman can be specfed by a set of lnear nequaltes defnng a polyhedron [8]. The term polyhedron wll be used n a broad sense to denote a convex set of ponts n a lattce (also called Z-polyhedron or lattce-polyhedron),.e. a set of ponts n a Z vector space bounded by affne nequaltes [24]. A maxmal set of consecutve statements n a program wth such polyhedral teraton domans s called a statc control part (SCoP) [7]. Fgure llustrates the correspondence between statc control and polyhedral domans. Each ntegral pont of the polyhedron corresponds to an operaton,.e. an nstance of the statement. The notaton S( x) refers to the nstance of the statement S wth teraton vector x. The executon of the operatons follows lexcographc order. Ths means n a n-dmensonal polyhedron, the operaton correspondng to the ntegral pont defned by the coordnates (a...a n ) s executed before those correspondng to the coordnates (b...b n ) ff, < n, (a...a ) (b...b ) a + < b +. We wll use and for the strct and non strct lexcographc order, respectvely.

3 Adustng a Program Transformaton for Legalty do, n do, n f (<n+2-) S: B[+][2*+]... n+2 n 2 > <n 2 n <n > <n+2 n n n n + 2 C A (a) surroundng control of S (b) teraton doman of S Fgure : Statc control and correspondng teraton doman Each statement may nclude one or several references to arrays (scalars are zero-dmensonal arrays). When the subscrpt functon f( x) of a reference s affne, we can wrte t f( x) F x + a where F s called the subscrpt matrx and a s a constant vector. For nstance, the reference to the array B n fgure (a) s B[f( x)] wth f» 2 +. In ths paper, matrces are always denoted by captal letters, vectors and functons n vector spaces are not. When an element s statement-specfc, t s subscrpted lke A S ; the subscrpt may be omtted when t s clear from the context.. Affne Transformatons.. Formulaton The goal of a transformaton s to modfy the orgnal executon order of the operatons. A convenent way to express the new order s to gve for each operaton an executon date. However, defnng all the executon dates separately would usually requre very large schedulng systems. Thus optmzng complers buld schedules at the statement level by fndng a functon specfyng an executon tme for each nstance of the correspondng statement. These functons are chosen affne for multple reasons: ths s the only case where we are able to decde exactly the transformaton legalty and where we know how to generate the target code. Thus, schedulng functons have the followng shape: θ S ( x S ) T S x S + t S, () where x S s the teraton vector, T S s a constant transformaton matrx and t S s

4 Parallel Processng Letters a constant vector (possbly ncludng structure parameters). It has been extensvely shown that lnear transformatons can express most of the useful transformatons. In partcular, loop transformatons (such as loop reversal, permutaton or skewng) can be modeled as a smple partcular case called unmodular transformatons (the T S matrx has to be square and has determnant ±) [5,25]. Complex transformatons such as tlng [26] can be acheved usng lnear transformatons as well [28]. These transformatons modfy the source polyhedra nto target polyhedra contanng the same ponts, but wth a new lexcographc order. Consderng an orgnal polyhedron defned by the system of affne constrants A x+ c and the transformaton functon θ leadng to the target ndex y T x, we deduce that the transformed polyhedron can be defned by (AT ) y + c (there exts more convenent way to descrbe the target polyhedron as dscussed n [6]). For nstance, let us consder the polyhedron n fgure 2(a) and the transformaton functon θ. The correspondng transformaton s a well» known teraton space skewng and the resultng polyhedron s shown n fgure 2(c) B A h B A (a) orgnal polyhedron (b) transformaton functon (c) target polyhedron A x + c y T x (AT ) y + c Fgure 2: A skewng transformaton.2. Legalty In general, applyng an arbtrary transformaton to a program wll change ts semantcs. Two operatons are sad to be n dependence f they share a varable (memory cell) and at least one the operatons modfes t. Ths defnton was suggested by Bernsten [9] and s the most wdely used one n works about program transformaton. It s a suffcent condton for parallelsm, but s by no means necessary, as the well known case of reductons shows. Many tests have been desgned for dependence checkng. Most of these are based on suffcent condtons for ndependence. They gve an approxmate, conservatve answer. The best known examples are the GCD-test [], and the Baneree test [4]. On the other hand, one can use classcal algorthms from Lnear Integer program-

5 Adustng a Program Transformaton for Legalty mng to get exact answers, as n the Omega-test [2] and the Smplex-Gomory test [2]. In the same way many dependence representatons are possble, from the smplest ones as dependence levels [2] to the most precse as dependence polyhedra [6]. We chose n ths paper to use the most precse representaton of dependences: the dependence polyhedra. However, many authors have notced that approxmate dependences are specal cases of dependence polyhedra. Hence, our method apples whatever representaton s chosen, provded the approxmaton s conservatve. In secton.2., we recall how dependences n a SCoP can be expressed exactly usng lnear (n)equaltes. Then we show n secton.2.2 how to buld the legal transformaton space..2.. Dependence Graph A convenent way to represent the schedulng constrants s the dependence graph. In ths drected graph, each program statement s represented usng a unque vertex, and the exstng dependence relatons are represented usng edges. Each vertex s labelled wth the teraton doman of the correspondng statement and the edges wth the dependence polyhedron descrbng the dependence. The dependence relaton can be defned n the followng way: Defnton A statement R depends on a statement S (wrtten SδR) f there exts an operaton S( x ), an operaton R( x 2 ) and a memory locaton m such that:. S( x ) and R( x 2 ) refer the same memory locaton m, and at least one of them wrtes to that locaton; 2. x and x 2 respectvely belong to the teraton doman of S and R;. n the orgnal sequental order, S( x ) s executed before R( x 2 ). From ths defnton follows the descrpton of the dependence polyhedron by affne (n)equaltes. The constrants systems have the followng components:. Same memory locaton: assumng that m s an array locaton, ths constrant s the equalty of the subscrpt functons of a par of references to the same array: F S x S + a S F R x R + a R. 2. Iteraton domans: both S and R teraton domans can be descrbed usng affne nequaltes, respectvely A S x S + c S and A R x R + c R.. Precedence order: ths constrant can be separated nto a dsuncton of as many parts as there are common loops to both S and R. Each case corresponds to a common loop depth and s called a dependence level. For each dependence level l, the precedence constrants are the equalty of the loop ndex varables at depth lesser to l: x R, x S, for < l and x R,l > x S,l f l s less than the common nestng level. Otherwse, there are no addtonal constrants and the dependence only exsts f S s textually before R. Such constrants can be wrtten usng lnear nequaltes: P S x S P R x R + b.

6 Parallel Processng Letters Thus, the dependence polyhedron for SδR at a gven level l and for a gven par of references p can be descrbed usng the followng system of (n)equaltes: [ ( ) FS F D SδR,l,p : D xs x R + d R ] ( ) A S xs x R + (2) A R P S P R a S a R c S c R b There s a dependence SδR f there exsts an ntegral pont nsde D SδR,l,p. Ths can be easly checked wth some lnear nteger programmng tool lke PpLb a []. If ths polyhedron s not empty, there s an edge n the dependence graph from the vertex correspondng to S up to the one correspondng to R, labelled wth D SδR,l,p. For the sake of smplcty we wll gnore subscrpts l and p and refer n the followng to D SδR as the only dependence polyhedron descrbng SδR Legal Transformaton Space Consderng the transformatons as schedulng functons, the tme nterval n the target program between the executons of two operatons R( x R ) and S( x S ) s R,S ( xs x R ) θ R ( x R ) θ S ( x S ). () ( ) If there exsts a dependence SδR,.e. f D SδR s not empty, then xs R,S x R must be lexcopostve n D SδR (ntutvely, the tme nterval between two operatons R( x R ) and S( x S ) such that R( x R ) depends on S( x S ) must be at least (,...,, ) T, the smallest tme nterval: ths guarantees that the operaton R( x R ) s executed after S( x S ) n the target program). Ths condton represents as many constrants as there are ponts n R,S. Fortunately, all these constrants can be compacted n a small set of affne constrants wth the help of Farkas Lemma []. Lemma (Affne form of Farkas Lemma [24]) Let D be a nonempty polyhedron defned by the nequaltes A x+ b. Then any affne functon f( x) s nonnegatve everywhere n D ff t s a postve affne combnaton: λ and λ T f( x) λ + λ T (A x + b), wth λ and λ T. are called Farkas multplers. R,S s a vector. For t to be lexcopostve, some of ts components must be constraned to be ether non negatve or strctly postve. Let us apply Farkas Lemma to one of the constraned components. We can fnd a non-negatve scalar λ and a non-negatve vector λ T such that: T R, x R + t R (T S, x S + t S ) δ λ + ( ( ) λ T D xs x R + d ) (4) In ths formula, T R, and T S, are correspondng rows n the T R and T S matrces, and δ s zero or one accordng to the poston of the rows. a PpLb s freely avalable at cedb

7 Adustng a Program Transformaton for Legalty Ths formula can be splt n as many equaltes as there are ndependent varables ( x S and x R components and parameters) by equatng ther coeffcents n both sdes of the formula. The Farkas multplers can be elmnated by usng the Fourer- Motzkn proecton algorthm [24]. The result s a system of affne constrants on the coeffcents of the transformaton (the elements of T R, and T S, ). The mportant pont s that ths system s the same for all rows of the schedulng matrces and depends only on the dependence to be satsfed. Furthermore, t depends lnearly on the value of δ. These systems completely characterze the legal transformatons of a program, and can be computed once and for all as soon as the dependences are known. 4. Correctng Transformatons Both optmzng complers and programmers have a tendency to thnk of performances frst and to check legalty afterward. The basc framework s frst to fnd the best transformaton (e.g. n the case of data localty mprovement, whch references carry the most reuse and necesstate new access patterns, whch rank constrants should be respected by the correspondng transformaton functons, etc.), then to check f a canddate transformaton s legal or not b. If the check fals, buld and test another canddate, and so on. The maor advantage of such a framework s to focus frstly on the most nterestng propertes, and the man drawback s to forsake these propertes f a legal transformaton s not drectly found after a smple check of a canddate soluton. In ths secton we wll show how t s possble to correct a canddate transformaton for dependences, frstly when t can be descrbed usng explct constrants as dscussed n secton 4.. Then n secton 4.2 we study the specal case of data localty mprovement where the transformaton propertes are hdden. 4.. Transformatons Wth Explct Propertes Experts or optmzng complers have a wde choce of optmzng transformatons for a gven program. Each transformaton has a more or less precse cost model whch helps n decdng whether to apply the transformaton or not. In the polyhedral framework, many transformatons are related to well chosen schedulng functons [5,,]. For nstance, generalzed loop nterchange s assocated to schedules whose matrx s a permutaton matrx [5]. Tryng to use these transformatons as they are may result n a negatve dependence test. Let us consder the code n Fgure. An expert or an optmzng compler may decde that movng the -loop nnermost would result n better localty. But because of complex dependences, usng drectly the loop nterchange transformaton θ S s not legal. Ths wll lead usually to reectng the transformaton.» b Ths can be done easly by nstantatng the transformaton functons n the space of all affne transformaton as defned n secton.2, then checkng whether t belong to the legal subset usng any lnear algebra tool.

8 Parallel Processng Letters D S do, n do, n S: a(+) / * (a()+a(+)+a(+2)) D SδS,,<a(),a( +)> D SδS,,<a(+),a( )> D SδS,,<a(+),a( +)> D SδS,,<a(+),a( +2)> D SδS,,<a(+2),a( +)> D SδS,2,<a(+),a( )> D SδS,2,<a(+2),a( +)> Fgure : Orgnal Hyperbolc-PDE program and dependence graph The polyhedral model allow more flexblty when defnng such transformatons. Moreover, one can work wth an ncompletely specfed transformatons and use the legalty constrants as a way of solvng for the mssng coeffcents. The method conssts n statng the constrants the transformaton has to satsfy, then solvng these constrants and the legalty constrants (4), usng a lnear algebra tool as PpLb. If the system does not have a soluton, we conclude that there s no legal nstance of the proposed transformaton. For example, nnermostng the -loop n the code n Fgure means that we are lookng for a transformaton functon θ S» T, T,2 T 2, T 2,2 t + t2 wth as only constrants T2, and T 2,2. By solvng the system we fnd the soluton θ S» Expressed usng classcal transformaton technques, t s a combnaton of loop skewng and loop nterchange. It leads to the target program n Fgure 4 and as expected to a better cache behavor (on a 86 GHz system wth 28KB L cache memory and n the number of cache msses of the orgnal program s 68M but 4M for the target one).. do 2, 2*n do max( -n,), mn( -,n) - ; ; S: a(+) / * (a()+a(+)+a(+2)) Fgure 4: Fnal Hyperbolc-PDE program 4.2. Transformatons Wth Implct Propertes: Data Localty Cache are used n most computer systems to compensate for the msmatch

9 Adustng a Program Transformaton for Legalty between processor and memory performance (at the tme of wrtng, factors of to are commonplace). Caches work by explotng localty,.e. the fact that accesses to each memory cell and ts close neghbors have a tendency to cluster n the program code. Whle ths s found to work very well for ordnary programs, t fals for compute-bound codes where large datasets are accessed accordng to very regular patterns. The basc framework for ncreasng cache ht rates s to move references to a gven memory cell (or cache lne) to neghborng teratons of some nnermost loop. Ths reduces the elapsed tme between two accesses and hence decreases the probablty that the cell has been evcted from the cache. Such a transformaton usually changes the executon order of the program, hence t must be checked for legalty before beng appled. Another way of expressng the same ntuton s to assgn an executon date (a schedule) to each operaton, and to take care that the date of accesses to the same memory cell are almost equal. Ths s usually obtaned by requrng that the outer components of the schedule are equal. The methods contnues by applyng a completon procedure to acheve an nvertble transformaton functon (see [27] for references). For nstance, let us consder self-temporal localty and a reference B[f( x)] to an array B wth the affne subscrpt functon f( x) F x + a. Two nstances of ths reference, B[f( x )] and B[f( x 2 )] refers the same memory locaton ff f( x ) f( x 2 ), that s when F x + a F x 2 + a, then ff F x r wth x r x x 2. Thus there s self-temporal reuse when x r ker F. The bass vectors of ker F gve the reuse drectons for the reference B[f( x)]; f ker F s trval, there s no self-temporal reuse for the correspondng reference. Reuse can be exploted f the transformed teraton order follows one of the reuse drectons. Then we have to fnd a vector orthogonal to the chosen reuse drecton to be the frst part of the transformaton matrx T. If ths partal transformaton does not volate dependences, we have many choces for the completon procedure n order for the transformaton functon to be one-toone ether by consderng artfcal dependences [2,5] or not [6]. As an example, consder the followng pseudo-code: do, n do, n S:... B[]... the subscrpt functon of the reference B[] s f», the kernel of the subscrpt matrx s then ker F span {(, )}. Thus there s reuse generated by the reference B[], and we can explot t thank to a transformaton matrx bult wth an orthogonal vector to the reuse drecton, e.g. [ ] and ts completon to a unmodular transformaton matrx as descrbed n [5]: T». The transformaton functon would be θ,.e. a loop nterchange» (the reader may care to verfy that ths soluton do explot the reuse of the reference

10 Parallel Processng Letters B[]). It s easy to generalze ths method to several references by consderng not only a reuse drecton vector, but a reuse drecton space (bult wth one bass vector per reference). It appears that there are a lot of degrees of freedom when lookng for a transformaton mprovng self-temporal localty, snce t s possble to choose the reuse drecton space, the completon method and the constant vector of the transformaton functon. Let us consder self-temporal localty and a transformaton canddate before completon θ Sc ( x S ) T Sc x S. Ths functon has the property that, modfed n the followng way: θ S ( x S ) C S T Sc x S + t S, (5) where C S s an nvertble matrx and t S s a constant vector, the localty propertes are left unmodfed for each tme step. Intutvely, f θ Sc gves the same executon date for x and x 2, then the transformed functon θ S does t as well. In the same way f the dates are dfferent wth θ Sc, then the transformed functon θ S returns dfferent dates. But whle the values of C S and t S do not change the self-temporal localty propertes c, they can change the transformaton from an llegal to a legal one. It s clearly not possble to check all these transformatons for legalty. In the followng we study another way: we show how to fnd, when possble, the unknown components C S T Sc and t S of formula 5 n order to construct a legal transformaton. Correctng formulae smlar to (5) and havng the same type of degrees of freedom can be used to acheve every type of localty (self or group - temporal or spatal) [25,8]. The challenge s, consderng the canddate transformaton matrces T Sc, to fnd the corrected matrces C S T Sc and the constant vectors t S n order for the transformaton system to be legal for dependences. Ths problem can be solved n an teratve way, each dmenson beng consdered as a stand-alone transformaton. Each row of C S T Sc s a lnear combnaton of the rows of T Sc. Thus, the unknown n the th algorthm teraton are, for each statement, the lnear combnaton coeffcents buldng the th row of C S T Sc from T Sc and the constant factor of the correspondng t S entry. After each teraton, we have to update the dependence graph snce, by a property of lexcographc order, there s no need to consder the already satsfed dependences. Thus, to fnd a soluton s easer as the algorthm terates. The algorthm s shown n fgure 5. Let us llustrate how the algorthm works usng the example n fgure 6. Suppose that an optmzng compler would lke to explot the data reuse generated by the references to the array A of the program n fgure 6(a) and that t suggests the transformaton canddates n fgure 6(b). As shown by the graph descrbng the resultng operaton executon order, where each arrow represents a dependence relaton and each backward arrow s a dependence volaton, the transformaton system s not c Ths amount to notcng that the amount of localty n the transformed program s lnked to the rank of T Sc. For a more formal dscusson, see [8].

11 Adustng a Program Transformaton for Legalty Correcton Algorthm: adust a transformaton system to respect dependences Input: a dependence graph DG, the transformaton canddates θ Sc ( x S ) T Sc x S. Output: the legal transformatons θ S ( x S ) C S T Sc x S + t S.. for dmenson to maxmum dmenson of T Sc (a) buld the legal transformaton space wth: for each edge n DG, the constrants of (4) for the th row of T Rc and T Sc the constrants equatng the th row entres of each C S T Sc wth a lnear combnaton of T Sc entres whose coeffcents are unknown (b) for each statement, remove from the soluton space the trval soluton where the lnear combnaton coeffcent of the th row of T Sc s null (c) f the soluton space s empty, return, else. pck the soluton gvng for each statement the mnmum values for the entres of the th row of C S T Sc and the th element of t S. update DG: for each edge n DG, add to the dependence polyhedron the constrant equatng the th dmenson of C S T Sc x S + t S of the statements labellng the source and destnaton vertces (ths may empty the polyhedron for ntegral solutons). f every dependence polyhedra n DG are empty, goto 2 v. for each statement, update the canddate transformaton T Sc : replace a row such that the correspondng lnear combnaton coeffcent s not null wth the th row replace the th row wth the th row of C S T Sc 2. return the transformaton functons θ S ( x S ) C S T Sc x S + t S. Fgure 5: Algorthm to correct the transformaton functons legal. The correcton algorthm modfes successvely each transformaton dmenson. Each stand-alone transformaton splts up the operatons nto sets such that there are no backward arrows between sets. The algorthm stops when there are no more backward arrows or when every dmenson has been corrected. Then any polyhedral code generator, lke CLooG d [6], can generate the target code. Choosng transformaton coeffcents as small as possble (step (c)) s a heurstc helpng code generators to avod control overhead. The correctness of the algorthm comes from two propertes: () the target d CLooG s freely avalable at cedb

12 Parallel Processng Letters transformatons are legal, (2) the C S matrces are nvertble. The legalty s acheved because each transformaton part s chosen n the legal transformaton space (step a). The second property follows from the updatng polcy (step (c)v): at start the C S matrces are denttes. Durng each teraton, we exchange ther rows, multply some rows by non null constants (as guaranteed by step b) and add to these rows a lnear combnaton of the other rows. Each of these transformatons does not modfy the nvertblty property. 5. Related Work Snce they cannot deal wth (complex) dependences, the earlest works on localty mprovement dscuss enablng transformatons to modfy the program n such a way that the proposed method can apply. Wolf and Lam [25] proposed n ther semnal data localty optmzng algorthm to use skewng and reversal to enable tlng as n prevous works on automatc parallelzaton. McKnley et al. [2] proposed a technque based on a detaled cost model that drves the use of fuson and dstrbuton manly to enable loop permutaton. Such methods are lmted by the set of drectves they use (lke fuse or skew) and because they have to apply them n a defnte order. We clam that proposng (and correctng) schedulng functons s more complete and has better compostonalty propertes. A sgnfcant step on preprocessng technques to produce fully permutable loop nests has been acheved by Ahmed et al. []. They use Farkas Lemma to fnd a vald code snkng-lke transformaton f t exsts. But ths transformaton s stll ndependent from the optmzaton tself and t s lmted to produce a fully permutable loop nest. The method proposed n ths paper may fnd solutons even when t s not possble to extract such a loop nest. The method of Grebl et al. [4] s qute dfferent. Ther am s to mnmze the amount of communcaton n a dstrbuted program, whch s ndeed a knd of localty optmzaton. They frst take care of dependences by fndng a legal spacetme transformaton (.e. a schedule and a placement) and then tle n space-tme to acheve the optmal granularty. Adaptng these deas to cache optmzaton seems by no mean obvous, although t s an nterestng subect for further research. Reasonng drectly on schedulng functons, L and Pngal proposed a completon algorthm to buld a non-unmodular transformaton functon from a partal matrx, such that startng from a legal transformaton, the completed transformaton stay legal for dependences [2]. In the same sprt, Grebl et al. [5] extended an arbtrary matrx descrbng a legal transformaton to a square nvertble matrx. In contrast, we show n ths paper how to fnd the vald functons before completon. 6. Concluson and Future Work In ths paper we presented a general method correctng a program transformaton for legalty wth no consequence on ts propertes. It can be appled ether

13 Adustng a Program Transformaton for Legalty do, n do, n do k, n S: A(,k) A(,k) + B(,,k) / A(,k-) S2: c A(n,n) + (a) Orgnal program θ k A k 5 A ; θ A S2 S S S S S S S S,,2 2,,2,, 2,,,2,2 2,2,2,2, 2,2, (b) Transformaton functon canddates θ k A k 5 A ; θ n A S S S S S S S S,,2 2,,2,, 2,,,2,2 2,2,2,2, 2,2, S2 (c) Frst correcton teraton θ k A k 5 A ; θ n n A S S S S S S S S,, 2,,,,2 2,,2,2, 2,2,,2,2 2,2,2 S2 (d) Second and last correcton teraton do, n do k, n do, n S: A(,k) A(,k) + B(,,k) / A(,k-) S2: c A(n,n) + (e) Target program Fgure 6: Iteratve transformaton correcton prncple (n 2 for graphs)

14 Parallel Processng Letters when the propertes can be explctly expressed as affne constrants, ether when they are carred mplctly as data localty propertes. It has been mplemented n the Chunky prototype [8], advantageously replacng usual enablng preprocessng technques and savng a sgnfcant amount of nterestng transformatons from beng gnored. It could be used combned wth a wde range of exstng optmzng technques and n partcular for data localty mprovement methods, for the sngle processor case as well as for parallel systems usng space-tme mappngs [9]. Further mplementaton work s necessary to handle real-lfe benchmarks n our prototype and to provde full statstcs on corrected transformatons. Moreover, the queston of scalablty s left open snce, for several tenth of deeply nested statements, the number of unknown n the constrant systems can become embarrassngly large. Splttng up the problem accordng to the dependence graph s a soluton under nvestgaton. Acknowledgements The authors would lke to thank the CC Internatonal Conference on Compler Constructon anonymous revewers for havng nspred ths paper by manfestng ther nterest on ths part of our work. We also wsh to thank the Euro-Par anonymous revewers for ther help n mprovng the qualty of the paper. References [] N. Ahmed, N. Mateev, and K. Pngal. Tlng mperfectly-nested loop nests. In SC 2 Hgh Performance Networkng and Computng, Dallas, November 2. [2] J. Allen and K. Kennedy. Automatc translaton of FORTRAN programs to vector form. ACM Transactons on Programmng Languages and Systems, 9(4):49 542, october 987. [] U. Baneree. Data dependence n ordnary programs. Master s thess, Dept. of Computer Scence, Unversty of Illnos at Urbana-Champagn, November 976. [4] U. Baneree. Dependence Analyss for Supercomputng. Kluwer Academc, 988. [5] U. Baneree. Unmodular transformatons of double loops. In Advances n Languages and Complers for Parallel Processng, pages 92 29, Irvne, August 99. [6] C. Bastoul. Effcent code generaton for automatc parallelzaton and optmzaton. In ISPDC IEEE Internatonal Symposum on Parallel and Dstrbuted Computng, pages 2, Lublana, October 2. [7] C. Bastoul, A. Cohen, S. Grbal, S. Sharma, and O. Temam. Puttng polyhedral transformatons to work. In LCPC 6 Internatonal Workshop on Languages and Complers for Parallel Computers, LNCS 2958, pages , College Staton, October 2. [8] C. Bastoul and P. Feautrer. Improvng data localty by chunkng. In CC 2 Internatonal Conference on Compler Constructon, LNCS 2622, pages 2 5, Warsaw, Aprl 2. [9] A. Bernsten. Analyss of programs for parallel processng. IEEE Transactons on Electronc Computers, 5(5):757 76, October 966. [] A. Cohen, S. Grbal, and O. Temam. A polyhedral approach to ease the composton of program transformatons. In Euro-Par Internatonal Euro-Par Conference,

15 Adustng a Program Transformaton for Legalty Psa, August 24. [] P. Feautrer. Parametrc nteger programmng. RAIRO Recherche Opératonnelle, 22():24 268, 988. [2] P. Feautrer. Dataflow analyss of scalar and array references. Internatonal Journal of Parallel Programmng, 2():2 5, February 99. [] P. Feautrer. Some effcent solutons to the affne schedulng problem: one dmensonal tme. Internatonal Journal of Parallel Programmng, 2(5): 48, October 992. [4] M. Grebl, P. Faber, and C. Lengauer. Space-tme mappng and tlng a helpful combnaton. Concurrency and Computaton: Practce and Experence, 6():22 246, march 24. [5] M. Grebl, C. Lengauer, and S. Wetzel. Code generaton n the polytope model. In PACT 98 Internatonal Conference on Parallel Archtectures and Complaton Technques, pages 6, 998. [6] F. Irgon and R. Trolet. Computng dependence drecton vectors and dependence cones wth lnear systems. Techncal Report ENSMP-CAI-87-E94, Ecole des Mnes de Pars, Fontanebleau (France), 987. [7] I. Kodukula, N. Ahmed, and K. Pngal. Data-centrc mult-level blockng. In ACM SIGPLAN 97 Conference on Programmng Language Desgn and Implementaton, pages 46 57, Las Vegas, June 997. [8] D. Kuck. The Structure of Computers and Computatons. John Wley & Sons, Inc., 978. [9] C. Lengauer. Loop parallelzaton n the polytope model. In Internatonal Conference on Concurrency Theory, LNCS 75, pages 98 46, Hldeshem, August 99. [2] W. L and K. Pngal. A sngular loop transformaton framework based on non-sngular matrces. Internatonal Journal of Parallel Programmng, 22(2):8 25, Aprl 994. [2] K. McKnley, S. Carr, and C. Tseng. Improvng data localty wth loop transformatons. ACM Transactons on Programmng Languages and Systems, 8(4):424 45, July 996. [22] D. Padua and M. Wolf. Advanced compler optmzatons for supercomputers. Communcatons of the ACM, 29(2):84 2, December 996. [2] W. Pugh. The omega test: a fast and practcal nteger programmng algorthm for dependence analyss. In Proceedngs of the thrd ACM/IEEE conference on Supercomputng, pages 4, Albuquerque, August 99. [24] A. Schrver. Theory of lnear and nteger programmng. John Wley & Sons, 986. [25] M. Wolf and M. Lam. A data localty optmzng algorthm. In ACM SIGPLAN 9 Conference on Programmng Language Desgn and Implementaton, pages 44, New York, June 99. [26] M. Wolfe. Iteraton space tlng for memory herarches. In rd SIAM Conference on Parallel Processng for Scentfc Computng, pages 57 6, December 987. [27] M. Wolfe. Hgh performance complers for parallel computng. Addson-Wesley Publshng Company, 995. [28] J. Xue. On tlng as a loop transformaton. Parallel Processng Letters, 7(4):49 424, 997.

Polyhedral Compilation Foundations

Polyhedral Compilation Foundations Polyhedral Complaton Foundatons Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty Feb 8, 200 888., Class # Introducton: Polyhedral Complaton Foundatons

More information

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont)

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont) Loop Transformatons for Parallelsm & Localty Prevously Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Loop nterchange Loop transformatons and transformaton frameworks

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

Vectorization in the Polyhedral Model

Vectorization in the Polyhedral Model Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton

More information

Efficient Code Generation for Automatic Parallelization and Optimization

Efficient Code Generation for Automatic Parallelization and Optimization Effcent Code Generaton for utomatc Parallelzaton and Optmzaton Cédrc Bastoul Laboratore PRSM, Unversté de Versalles Sant Quentn 5 avenue des États-Uns, 7805 Versalles Cedex, France Emal: cedrcbastoul@prsmuvsqfr

More information

Today Using Fourier-Motzkin elimination for code generation Using Fourier-Motzkin elimination for determining schedule constraints

Today Using Fourier-Motzkin elimination for code generation Using Fourier-Motzkin elimination for determining schedule constraints Fourer Motzkn Elmnaton Logstcs HW10 due Frday Aprl 27 th Today Usng Fourer-Motzkn elmnaton for code generaton Usng Fourer-Motzkn elmnaton for determnng schedule constrants Unversty Fourer-Motzkn Elmnaton

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Loop Transformations, Dependences, and Parallelization

Loop Transformations, Dependences, and Parallelization Loop Transformatons, Dependences, and Parallelzaton Announcements Mdterm s Frday from 3-4:15 n ths room Today Semester long project Data dependence recap Parallelsm and storage tradeoff Scalar expanson

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

A Facet Generation Procedure. for solving 0/1 integer programs

A Facet Generation Procedure. for solving 0/1 integer programs A Facet Generaton Procedure for solvng 0/ nteger programs by Gyana R. Parja IBM Corporaton, Poughkeepse, NY 260 Radu Gaddov Emery Worldwde Arlnes, Vandala, Oho 45377 and Wlbert E. Wlhelm Teas A&M Unversty,

More information

LLVM passes and Intro to Loop Transformation Frameworks

LLVM passes and Intro to Loop Transformation Frameworks LLVM passes and Intro to Loop Transformaton Frameworks Announcements Ths class s recorded and wll be n D2L panapto. No quz Monday after sprng break. Wll be dong md-semester class feedback. Today LLVM passes

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III.

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III. Lecture 15: Memory Herarchy Optmzatons I. Caches: A Quck Revew II. Iteraton Space & Loop Transformatons III. Types of Reuse ALSU 7.4.2-7.4.3, 11.2-11.5.1 15-745: Memory Herarchy Optmzatons Phllp B. Gbbons

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky Improvng Low Densty Party Check Codes Over the Erasure Channel The Nelder Mead Downhll Smplex Method Scott Stransky Programmng n conjuncton wth: Bors Cukalovc 18.413 Fnal Project Sprng 2004 Page 1 Abstract

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming Optzaton Methods: Integer Prograng Integer Lnear Prograng Module Lecture Notes Integer Lnear Prograng Introducton In all the prevous lectures n lnear prograng dscussed so far, the desgn varables consdered

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Algorithmic Transformation Techniques for Efficient Exploration of Alternative Application Instances

Algorithmic Transformation Techniques for Efficient Exploration of Alternative Application Instances In: Proc. 0th Int. Symposum on Hardware/Software Codesgn (CODES 02), Estes Park, Colorado, USA, May 6 8, 2002 Algorthmc Transformaton Technques for Effcent Exploraton of Alternatve Applcaton Instances

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES A SYSOLIC APPROACH O LOOP PARIIONING AND MAPPING INO FIXED SIZE DISRIBUED MEMORY ARCHIECURES Ioanns Drosts, Nektaros Kozrs, George Papakonstantnou and Panayots sanakas Natonal echncal Unversty of Athens

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versons of Jacob teraton Gauss-Sedel and SOR teratve methods Next week: More MPI Debuggng and totalvew GPU computng Read: Class notes and references

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

REDUCING hardware design time is more than ever a

REDUCING hardware design time is more than ever a TCAD-2012-0168 1 Polyhedral Bubble Inserton: A Method to Improve Nested Loop Ppelnng for Hgh-Level Synthess Antone Morvan, Steven Derren, and Patrce Qunton Abstract Hgh-Level Synthess (HLS) allows hardware

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Communication-Minimal Partitioning and Data Alignment for Af"ne Nested Loops

Communication-Minimal Partitioning and Data Alignment for Afne Nested Loops Communcaton-Mnmal Parttonng and Data Algnment for Af"ne Nested Loops HYUK-JAE LEE 1 AND JOSÉ A. B. FORTES 2 1 Department of Computer Scence, Lousana Tech Unversty, Ruston, LA 71272, USA 2 School of Electrcal

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

The relation between diamond tiling and hexagonal tiling

The relation between diamond tiling and hexagonal tiling The relaton between damond tlng and hexagonal tlng Tobas Grosser INRIA and École Normale Supéreure, Pars tobas.grosser@nra.fr Sven Verdoolaege INRIA, École Normale Supéreure and KU Leuven sven.verdoolaege@nra.fr

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Solitary and Traveling Wave Solutions to a Model. of Long Range Diffusion Involving Flux with. Stability Analysis

Solitary and Traveling Wave Solutions to a Model. of Long Range Diffusion Involving Flux with. Stability Analysis Internatonal Mathematcal Forum, Vol. 6,, no. 7, 8 Soltary and Travelng Wave Solutons to a Model of Long Range ffuson Involvng Flux wth Stablty Analyss Manar A. Al-Qudah Math epartment, Rabgh Faculty of

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

F Geometric Mean Graphs

F Geometric Mean Graphs Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 2 (December 2015), pp. 937-952 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) F Geometrc Mean Graphs A.

More information

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm

Non-Split Restrained Dominating Set of an Interval Graph Using an Algorithm Internatonal Journal of Advancements n Research & Technology, Volume, Issue, July- ISS - on-splt Restraned Domnatng Set of an Interval Graph Usng an Algorthm ABSTRACT Dr.A.Sudhakaraah *, E. Gnana Deepka,

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR

SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING USING A CALCULATOR Judth Aronow Rchard Jarvnen Independent Consultant Dept of Math/Stat 559 Frost Wnona State Unversty Beaumont, TX 7776 Wnona, MN 55987 aronowju@hal.lamar.edu

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Constructing Minimum Connected Dominating Set: Algorithmic approach

Constructing Minimum Connected Dominating Set: Algorithmic approach Constructng Mnmum Connected Domnatng Set: Algorthmc approach G.N. Puroht and Usha Sharma Centre for Mathematcal Scences, Banasthal Unversty, Rajasthan 304022 usha.sharma94@yahoo.com Abstract: Connected

More information

A Geometric Approach for Multi-Degree Spline

A Geometric Approach for Multi-Degree Spline L X, Huang ZJ, Lu Z. A geometrc approach for mult-degree splne. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(4): 84 850 July 202. DOI 0.007/s390-02-268-2 A Geometrc Approach for Mult-Degree Splne Xn L

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Ontology Generator from Relational Database Based on Jena

Ontology Generator from Relational Database Based on Jena Computer and Informaton Scence Vol. 3, No. 2; May 2010 Ontology Generator from Relatonal Database Based on Jena Shufeng Zhou (Correspondng author) College of Mathematcs Scence, Laocheng Unversty No.34

More information

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification

12/2/2009. Announcements. Parametric / Non-parametric. Case-Based Reasoning. Nearest-Neighbor on Images. Nearest-Neighbor Classification Introducton to Artfcal Intellgence V22.0472-001 Fall 2009 Lecture 24: Nearest-Neghbors & Support Vector Machnes Rob Fergus Dept of Computer Scence, Courant Insttute, NYU Sldes from Danel Yeung, John DeNero

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Querying by sketch geographical databases. Yu Han 1, a *

Querying by sketch geographical databases. Yu Han 1, a * 4th Internatonal Conference on Sensors, Measurement and Intellgent Materals (ICSMIM 2015) Queryng by sketch geographcal databases Yu Han 1, a * 1 Department of Basc Courses, Shenyang Insttute of Artllery,

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Reading. 14. Subdivision curves. Recommended:

Reading. 14. Subdivision curves. Recommended: eadng ecommended: Stollntz, Deose, and Salesn. Wavelets for Computer Graphcs: heory and Applcatons, 996, secton 6.-6., A.5. 4. Subdvson curves Note: there s an error n Stollntz, et al., secton A.5. Equaton

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information