The relation between diamond tiling and hexagonal tiling


Tobias Grosser (INRIA and École Normale Supérieure, Paris), Sven Verdoolaege (INRIA, École Normale Supérieure and KU Leuven), P. Sadayappan (Ohio State University), Albert Cohen (INRIA and École Normale Supérieure, Paris)

ABSTRACT

Iterative stencil computations are important in scientific computing and, increasingly, in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two successful tiling schemes that enable concurrent start. Each has its own advantages: diamond tiling is integrated in a general-purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas the more flexible tile sizes of hybrid-hexagonal tiling have proven effective for the generation of GPU code. We show that these two approaches are even more interesting when combined. We revisit the formalization of diamond and hexagonal tiling, present the effects of tile size and wavefront choices on tile-level parallelism, and formulate constraints for optimal diamond tile shapes. We then extend the diamond tiling formulation into a hexagonal tiling one, combining the benefits of both. The paper closes with an outlook on hexagonal tiling in higher-dimensional spaces, an important generalization suitable for massively parallel architectures.

1. INTRODUCTION

Stencil computations are an important computational pattern in both scientific and engineering applications, and they are becoming increasingly important in the embedded and mobile domain. Computational electrodynamics [13] and partial differential equations [11] are common use cases of stencils in high-performance computing, whereas image and video processing are about to become driving forces in the embedded market.
(HiStencils 2014: First International Workshop on High-Performance Stencil Computations, January 21, 2014, Vienna, Austria. In conjunction with HiPEAC 2014.)

Even though manual and automatic optimizations of stencil computations have been designed for many years, the generation of efficient code remains a challenge, especially for higher-dimensional stencils or for platforms that allow highly parallel execution on different hardware levels. With the increased use of parallel hardware in mobile markets as well as the foreseeable increase of three-dimensional processing in upcoming embedded devices, a need emerges for solutions that facilitate the automatic generation of high-performance stencil codes for different devices.

For stencil computations, the tiling strategies that enable reuse along the time dimension have been shown to be the most efficient. Unfortunately, the standard approach uses parallel wavefronts in a skewed index space. These skewed wavefronts reduce tile-level parallelism [9] and induce load-imbalanced prologue and epilogue phases. Split tiling [5, 9] and overlapped tiling [8, 9] address this problem by enabling concurrent start along one of the original iteration space dimensions. In other words, the tile schedule allows a wavefront of tiles parallel to one of the original dimensions of the index space to be executed in parallel. However, these two tiling techniques either require periodically alternating tile shapes or induce redundant computations. In contrast, the recently published diamond tiling [2] and hybrid-hexagonal tiling [6] schemes successfully obtain concurrent start without the need for redundant computations or multiple tile shapes.

Diamond tiling is a tiling strategy that uses a single n-dimensional parallelotope (footnote 1), calculated such that the resulting tiling keeps the number of tiles executable in parallel consistent throughout the computation, meaning that the tile schedule enables concurrent start.
The advantages of diamond tiling are its integration in a general-purpose compilation framework and the use of an adaptable cost function to determine tile shapes.

Hybrid hexagonal-classical tiling is a tiling scheme that uses hexagonal tile shapes to enable concurrent start and to provide flexible tile size choices on one dimension. On the remaining dimensions it uses classical parallelogram tiling. The more domain-specific formulation of hybrid-hexagonal tiling does not optimize tile shapes for a certain cost function, but always uses the narrowest dependence cone to derive the tile shape. On the other hand, hybrid-hexagonal tiling has the advantage that it allows the time-tile height and the width along the space dimension to be adjusted individually. It also permits the creation of tiles with a flat summit and can ensure that tiles not only have the same rational shape, but that their integer point placement is identical by construction: all properties that have been shown to be essential for efficient GPU code generation.

Besides these advantages, there are also open problems. Even though diamond tiling generally explains how to derive tiling hyperplanes that enable concurrent start, a tile schedule that includes both the tile sizes and the parallel wavefront coefficients necessary to obtain concurrent start was not presented. Hexagonal tiling has been shown to be beneficial for higher-dimensional stencils when combined with other tiling schemes, but the formulation of hexagonal tiling itself is limited to the 2D case (1 time dimension, 1 space dimension).

This paper combines the two tiling strategies to get the best of both worlds. Its contributions are: a) an in-depth analysis of the constraints that diamond tiling imposes on tile sizes and wavefront coefficients, b) a formulation of conditions that ensure identical placement of integer points within the tiles, c) an extension of the original diamond tiling algorithm to a hexagonal tiling algorithm for 2-dimensional problems (1 time dimension, 1 space dimension), d) ideas for hexagonal tiling of higher-dimensional stencils.

The paper is structured as follows. In Section 2 we revisit diamond tiling, provide insights on tile size and wavefront coefficient constraints, and give conditions that ensure important properties of the diamond tiles. We then introduce the unified hexagonal tiling scheme in Section 3, which includes a full formulation for two-dimensional tiling as well as an outlook on hexagonal tiling for higher-dimensional cases. We discuss related work in Section 4 and conclude in Section 5.

Footnote 1: A parallelotope is a general term for what is known in 2D as a parallelogram and in 3D as a parallelepiped.

2. DIAMOND TILING

Diamond tiling [2] is a tiling technique for stencil computations whose main contribution is the combination of affine transformations and a rectangular tiling that enables concurrent start. The idea of concurrent start is to ensure that the wavefront of tiles that are executed in parallel is aligned to a concurrent start hyperplane (normally an iteration space boundary) such that the number of tiles that are executed in parallel remains constant throughout the entire computation. This ensures that a sufficient amount of parallelism is available already at the beginning of the computation. Even though the name diamond suggests that the tile shapes are rhombi or rhombohedra (a.k.a.
diamonds) and Figure 1 in Bandishti et al. [2] also uses edges of identical length, the tile shapes formed by diamond tiling are not restricted to diamonds, but can be more general parallelograms (parallelotopes in higher dimensions), as can be seen in Figure 3 and Figure 9a. However, some restrictions on the tile shape and sizes must be enforced to ensure that concurrent start is possible.

2.1 The Pluto optimizer

Diamond tiling was presented and implemented as an extension to Pluto [3], a general-purpose optimizer for data locality and parallelism. In contrast to other approaches that directly tile the iteration space (e.g., [5, 6]), the original Pluto tiling as well as diamond tiling are implemented as a two-phase process. As a first step, a program transformation is calculated that exposes sequences of loops (bands) that are tileable with rectangular tiles. In the second step, a rectangular tiling is performed on these bands. Combined, this yields tiles with a possibly non-rectangular, but parallelotope tile shape. There are several benefits of separating these two concerns. First, when calculating the parallel bands Pluto can and does perform other optimizations, e.g., data locality optimizations such as loop fusion. Second, tiling of the transformed program makes the tile shapes independent of the tiling hyperplanes, which makes the tiling easier to describe and analyze.

Pluto calculates program transformations on a polyhedral representation. In this representation the set of executed program statements (the iteration space) is modeled with a multi-dimensional integer set where each element represents an individual statement iteration. The execution order of elements of the iteration space is described by the schedule, an integer map that assigns a possibly multi-dimensional relative execution time to each element of the iteration space. Program transformations are performed by modifying the schedule.
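The role of the schedule can be illustrated with a few lines of Python. This is only a minimal sketch under our own assumptions: the helper names `iteration_space`, `execution_order`, `identity` and `interchange` are hypothetical and are not part of Pluto, and the interchange schedule merely shows the mechanism (it is not claimed to be a valid transformation for a stencil).

```python
from itertools import product


def iteration_space(t_steps, width):
    """All statement instances (t, i) of a small two-deep loop nest."""
    return list(product(range(t_steps), range(width)))


def execution_order(points, schedule):
    """Order the instances by the (lexicographic) time the schedule assigns."""
    return sorted(points, key=schedule)


def identity(x):
    """Original program order: time (t, i)."""
    return x


def interchange(x):
    """Hypothetical transformed schedule: time (i, t)."""
    t, i = x
    return (i, t)


if __name__ == "__main__":
    space = iteration_space(2, 2)
    print(execution_order(space, identity))
    print(execution_order(space, interchange))
```

The iteration space itself is untouched in both cases; only the map that assigns execution times changes, which is the sense in which transformations are performed by modifying the schedule.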
For a single statement and a k-dimensional execution time such a schedule has the form $S = x \to (h_0 \cdot x, \dots, h_k \cdot x)$, where $x$ is an element of the iteration space, $h_i, i \in [0, k]$ are tiling hyperplanes and $h_i \cdot x$ denotes the sum of the per-element products of $h_i$ and $x$. The result of Pluto's first step are exactly these tiling hyperplanes, selected such that the distance between two statements that depend on each other is not only lexicographically non-negative (needed for validity of the schedule), but also non-negative in each individual dimension. For the exact algorithm on how to select such hyperplanes, we refer to [3]. For this paper, it is sufficient to understand that dependence vectors whose components are all non-negative make rectangular tiling valid.

We present the Pluto rectangular tiling as a schedule-only transformation, which we believe is easier to understand than the actual Pluto transformation, which modifies the iteration space as well. Conceptually, there should be no difference. Given a schedule $S$ and a set of tile sizes $s_i, i \in [0, k]$, a rectangularly tiled schedule of $S$ consists of two partial schedules. The first one, $S_t$, is placed at the outer level and enumerates the tiles themselves. It is called the tile schedule. The second one, $S_p$, is placed at the inner level and enumerates the points within each tile. It is called the point schedule. We define

$$S_t = (x_0, \dots, x_k) \to (\lfloor (h_0 \cdot x)/s_0 \rfloor, \dots, \lfloor (h_k \cdot x)/s_k \rfloor)$$

and $S_p = S$. This tiled schedule may already expose parallelism, but it may also be necessary to fall back to pipeline parallelism by forming a wavefront schedule at the outermost tile dimension. Such a wavefront schedule then carries all dependences itself and ensures that the inner loops can be executed in parallel. This yields

$$S_t = (x_0, \dots, x_k) \to (\lambda_0 \lfloor (h_0 \cdot x)/s_0 \rfloor + \dots + \lambda_k \lfloor (h_k \cdot x)/s_k \rfloor,\ \lfloor (h_1 \cdot x)/s_1 \rfloor, \dots, \lfloor (h_k \cdot x)/s_k \rfloor)$$

with $\lambda_i \in \mathbb{Z} : i \in [0, k]$. The coefficients $\lambda_i$ allow the construction of different wavefronts. We call $\lambda_0 = \dots = \lambda_k = 1$ the default wavefront coefficients.
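The two forms of the tile schedule can be sketched in Python as follows. This is an illustration under stated assumptions, not Pluto's implementation: the function names are hypothetical, and the hyperplanes (1, -1) and (1, 1), the dependence distances (1, 1) and (1, -1), and the tile size 4 are example values chosen for the sketch.

```python
def dot(h, x):
    """Sum of the per-element products of hyperplane h and point x."""
    return sum(a * b for a, b in zip(h, x))


def is_tileable(hyperplanes, dependences):
    """Rectangular tiling is valid if every dependence distance is
    non-negative along every hyperplane."""
    return all(dot(h, d) >= 0 for h in hyperplanes for d in dependences)


def tile_schedule(x, hyperplanes, sizes):
    """Tile schedule without wavefront: one floored coordinate per hyperplane."""
    return tuple(dot(h, x) // s for h, s in zip(hyperplanes, sizes))


def wavefront_tile_schedule(x, hyperplanes, sizes, lam=None):
    """Tile schedule whose outermost dimension is the weighted wavefront sum.
    With lam=None the default wavefront coefficients (all ones) are used."""
    tiles = tile_schedule(x, hyperplanes, sizes)
    if lam is None:
        lam = (1,) * len(tiles)
    wave = sum(l * T for l, T in zip(lam, tiles))
    return (wave,) + tiles[1:]


if __name__ == "__main__":
    H = [(1, -1), (1, 1)]        # assumed example hyperplanes
    deps = [(1, 1), (1, -1)]     # assumed dependence distances (dt, di)
    print(is_tileable(H, deps))                          # all distances >= 0
    print(is_tileable([(1, 0), (0, 1)], deps))           # untransformed space
    print(tile_schedule((5, 2), H, (4, 4)))
    print(wavefront_tile_schedule((5, 2), H, (4, 4)))
```

Python's `//` rounds towards negative infinity, so it matches the floor in the definition of $S_t$ also for negative values of $h_i \cdot x$.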
The hyperplanes calculated by the original Pluto algorithm allow the formation of such a wavefront schedule, but it is not always possible to form a tile schedule that is in the same direction as a given concurrent start face $f$.

2.2 The diamond tiling extensions

Diamond tiling [2] extends the Pluto algorithm in a way that ensures that for the tiling hyperplanes computed there always exist wavefront coefficients that yield concurrent start. In the following, we identify a face or hyperplane with its orthogonal vector. The diamond tiling paper shows that a transformation enables tile-wise concurrent start along a face $f$ if and only if the tile schedule is in the same direction as the face and carries all inter-tile dependences. It also shows that concurrent start along a face $f$ can be exposed by a set of hyperplanes if and only if $f$ lies strictly inside the cone formed by the hyperplanes, i.e., if and only if $f$ is a strict conic combination of all the hyperplanes. This means it finds, for a concurrent start hyperplane $f$, tiling hyperplanes $h_i$ such that the following equality holds:

$$m f = \lambda_1 h_1 + \dots + \lambda_k h_k, \qquad \lambda_i, m \in \mathbb{Z}^+ \qquad (1)$$

The main focus of the diamond tiling paper is to prove the conditions necessary to ensure that the calculated hyperplanes can be used to construct a concurrent start schedule as well as to give an algorithm that actually calculates such hyperplanes. We consequently refer to this publication for details.

One question that was explored less is under which conditions, especially for which tile sizes and for which wavefront coefficients, the rectangularly tiled schedule achieves concurrent start. Specifically, it is not clear for which values of $\lambda_i, s_j$ the following holds:

$$m\,(x \cdot f) = \lambda_0 \lfloor (h_0 \cdot x)/s_0 \rfloor + \dots + \lambda_k \lfloor (h_k \cdot x)/s_k \rfloor \qquad (2)$$

2.3 Relation between tile sizes and wavefronts

Even though diamond tiling yields tiling hyperplanes that allow concurrent start, to construct the full tile schedule the tile sizes $s_i$ as well as the wavefront coefficients $\lambda_i$ still need to be chosen. Choosing the correct values is important, not only to ensure that the tiles executed within the wavefront are started concurrently, but also to control the horizontal distance between tiles of the same color relative to their tile size. We call this the density of the schedule, a property important for understanding the amount of computation that can be performed in parallel. Before suggesting good values, we explore the impact of different choices.

Let us first consider a simple example with symmetric dependences:

    for t
      for i
        A[t+1][i] = A[t][i-1] + A[t][i+1]

Pluto's diamond tiling implementation calculates for this kernel the transformation $(t, i) \to (t - i, t + i)$ and applies rectangular tiling in the transformed space. The default wavefront coefficients $\lambda_0 = \lambda_1 = 1$ are then used to enable parallel execution. This results in the tile schedule $(t, i) \to (\lfloor (t - i)/s_0 \rfloor + \lfloor (t + i)/s_1 \rfloor, \lfloor (t + i)/s_1 \rfloor)$. The default square tile shapes ($s_0 = s_1$) yield both concurrent start and a high density of tiles.
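For two hyperplanes, the strict conic combination required by Equation (1) can be checked mechanically. The following is a minimal sketch under our own assumptions: the names `cross` and `conic_coefficients` are hypothetical, the face is taken to be the time-orthogonal face $f = (1, 0)$, and the hyperplane pair (1, 3), (1, 1) is an illustrative pair whose space coefficients have the same sign.

```python
from math import gcd


def cross(a, b):
    """2D cross product (determinant of the 2x2 matrix with rows a, b)."""
    return a[0] * b[1] - a[1] * b[0]


def conic_coefficients(h0, h1, f):
    """Return the smallest positive integers (lam0, lam1, m) such that
    lam0*h0 + lam1*h1 == m*f, or None if f is not a strict conic
    combination of h0 and h1."""
    m = cross(h0, h1)
    if m == 0:
        return None  # hyperplanes are linearly dependent
    # Cramer's rule scaled by the determinant keeps everything integral
    lam0, lam1 = cross(f, h1), cross(h0, f)
    if m < 0:
        lam0, lam1, m = -lam0, -lam1, -m
    if lam0 <= 0 or lam1 <= 0:
        return None  # f lies on the border of the cone or outside it
    g = gcd(gcd(lam0, lam1), m)
    return (lam0 // g, lam1 // g, m // g)


if __name__ == "__main__":
    f = (1, 0)  # assumed concurrent start face: orthogonal to the time axis
    print(conic_coefficients((1, -1), (1, 1), f))
    print(conic_coefficients((1, 3), (1, 1), f))
```

For the hyperplanes $(1, -1)$ and $(1, 1)$ of the symmetric kernel the sketch returns `(1, 1, 2)`, i.e. the default wavefront coefficients already form a multiple of $f$. For the illustrative pair with two positive space coefficients it returns `None`, since no positive combination can cancel the space dimension.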
Figure 1 illustrates the square tiling of this kernel for $s_0 = s_1 =$ […], with the tile wavefront highlighted in red and the concurrent start hyperplane highlighted in black. The two hyperplanes being parallel shows that the tile wavefront has concurrent start. When different tile sizes are chosen for the two dimensions, the default wavefront no longer yields concurrent start. In Figure 2 we illustrate for $s_0 =$ […], $s_1 =$ […] that the default wavefront (red) is no longer parallel to the concurrent start hyperplane (black). It is still possible to get concurrent start using the non-default wavefront coefficients $\lambda_0 =$ […], $\lambda_1 = 3$, which yields the schedule $(t, i) \to (\lambda_0 \lfloor (t - i)/s_0 \rfloor + 3 \lfloor (t + i)/s_1 \rfloor, \lfloor (t + i)/s_1 \rfloor)$. Unfortunately, a non-default wavefront causes a large loss in tile-level parallelism throughout the computation. This effect is illustrated by the yellow wavefront in Figure 2, which is parallel to the concurrent start hyperplane (black).

Next we analyze a kernel with asymmetric dependences:

    for t
      for i
        A[t+1][i] = A[t][i-1] + A[t][i+2]

Pluto derives from this kernel the transformation $(t, i) \to (t - i, 2t + i)$. This transformation combined with square tiling and the default wavefront coefficients allows concurrent start, as shown in Figure 3 for $s_0 = s_1 =$ […]. The reason for this possibly surprising result is that for a 2-dimensional stencil (1 space, 1 time) with dependence distance 1 in the time direction, the coefficient of the space dimension in the normal will always be ±1. This ensures that when adding the two hyperplanes together their coefficients for the space dimension cancel out and we again get the concurrent start hyperplane. Consequently, the default wavefront coefficients combined with square tile sizes yield a concurrent start wavefront. As already found earlier, non-square tile sizes will prevent concurrent start with the default wavefront coefficients.

Another interesting observation is that even though the rational tile shapes in Figure 3 are identical throughout the original iteration space, the set of contained integer points is not.
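The effect of tile sizes and wavefront coefficients on concurrent start can be observed numerically. The sketch below is an illustration under our own assumptions: the function names are hypothetical, and the concrete tile sizes (4, 4), (2, 4) and (4, 6) as well as the coefficients (2, 3) are example values chosen for the sketch, not the values used in the figures. For a fixed time step it measures how much the wavefront number varies across the space dimension; a variation that stays within a small fixed range indicates concurrent start, while a variation that grows with the domain width indicates a skewed wavefront.

```python
def wavefront(t, i, hyperplanes, sizes, lam=(1, 1)):
    """Outermost tile-schedule dimension for point (t, i)."""
    return sum(l * ((h[0] * t + h[1] * i) // s)
               for l, h, s in zip(lam, hyperplanes, sizes))


def wavefront_spread(hyperplanes, sizes, lam=(1, 1), t=0, width=1000):
    """Difference between the largest and smallest wavefront number
    seen along the space dimension at a fixed time step t."""
    values = [wavefront(t, i, hyperplanes, sizes, lam) for i in range(width)]
    return max(values) - min(values)


if __name__ == "__main__":
    symmetric = ((1, -1), (1, 1))    # hyperplanes t - i and t + i
    asymmetric = ((1, -1), (2, 1))   # hyperplanes t - i and 2t + i
    print(wavefront_spread(symmetric, (4, 4)))                # square, default
    print(wavefront_spread(symmetric, (2, 4)))                # non-square, default
    print(wavefront_spread(symmetric, (4, 6), lam=(2, 3)))    # non-square, adapted
    print(wavefront_spread(asymmetric, (4, 4), t=7))          # square, default
```

In this sketch the adapted coefficients are chosen so that $\lambda_0 / s_0 = \lambda_1 / s_1$, which is what makes the space dimension cancel again for non-square tiles.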
The integer point sets differ because, even though we use integral tile sizes in the transformed space, the tile borders may become non-integral in the original space. Varying integer point placements between tiles can cause problems due to additional conditions in the generated code.

As a next step we look into a case with dependence distances of different lengths in the time dimension:

    for t
      for i
        A[t+1][i] = A[t][i-1] + A[t-2][i+1]

For this kernel, the Pluto implementation derives the transformation $(t, i) \to (t + 3i, t + i)$. Note that this result is different from what the algorithm in [2] would produce. Apparently, the Pluto implementation is using a variation of that algorithm. It is not clear whether there is a problem in this variation or whether this is a mere implementation problem. As both hyperplanes have a positive coefficient for the space dimension, it is impossible to create a conic combination that eliminates the space dimension and yields a concurrent start hyperplane. According to the diamond tiling paper, concurrent start is impossible and these are not valid diamond tiling hyperplanes.

Even though the diamond tiling implementation in Pluto did not derive a valid tiling for the last kernel, valid diamond tilings exist for it. One is the transformation $(t, i) \to (t - i, t + i)$. The same transformation was already chosen for the example illustrated in Figure 1 and, according to our understanding of the cost function in Pluto, this is in fact the transformation that the algorithm of [2] would choose. The resulting tiling yields 8 computations for a per-tile memory footprint of 3.

Another valid diamond tiling transformation is $(t, i) \to (t + 3i, t - i)$. The hyperplanes in this transformation are the ones hybrid-hexagonal tiling would read off directly from the dependence cone. Given a different cost function, Pluto may also choose this transformation. The interesting point here is that the normal of the concurrent start hyperplane in the transformed space is no longer (1,1), but rather (1,3). In this case, the standard square tiling illustrated in Figure 4 only yields concurrent start if, instead of the default wavefront coefficients, $\lambda_0 = 1, \lambda_1 = 3$ are chosen. As shown earlier, this severely reduces tile-level parallelism. On the other hand, for the same memory footprint as before, this tiling executes 1[…] computations.

We can restore concurrent start with the default wavefront by using non-square tile sizes. Figure 5 shows a non-square tiling ($s_0 =$ 1[…], $s_1 =$ […]) which enables concurrent start, has maximal tile-level parallelism, and reaches 1[…] computations for a memory footprint of three. Consequently, we would prefer this tiling over the previous two.

[Figure 1: Symmetric dependences & square tiling. Panel (a): original space.]
[Figure 2: Symmetric dependences & non-square tiling. Panel (a): original space.]
[Figure 3: Asymmetric dependences & square tiling. Panel (a): original space.]
[Figure 4: More than one time step - Tiling read off from dependence cone and used by hexagonal tiling. Square tiles cause loss of tile-level parallelism. Panel (a): original space.]
[Figure 5: More than one time step - Tiling read off from dependence cone and used by hexagonal tiling. Non-square tiles ensure good efficiency and maximal tile-level parallelism. Panel (a): original space.]
[Figure 6: Diamond tiling. Panel (a): original space.]

2.4 Optimal tiles with default wavefront

As seen in the previous section, the use of the default wavefront coefficients is necessary to ensure high tile density. However, by itself it guarantees neither concurrent start nor that all tiles share the same integer point placement. As those properties are important, we present the conditions under which they can be reached.

First, we explore the integer point placement. Assuming the tiling hyperplanes $h_i$ are combined into a matrix

$$H = \begin{pmatrix} h_0 \\ \vdots \\ h_k \end{pmatrix}$$

then tile sizes that are multiples of the determinant of $H$ will ensure that all tiles have the same configuration of integer points, since $\det(H)\, H^{-1}$ is an integer matrix. The hyperplanes used, e.g., in Figure 3 yield

$$H = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}$$

and consequently $\det(H) = 1 + 2 = 3$. As $s_0 = s_1 =$ […] are not multiples of 3, the tiles differ in their integer point placement. For the same figure, tile sizes such as $s_0 = s_1 = 3$ would ensure a uniform integer placement across all tiles. The above condition is sufficient independently of the chosen wavefront schedule.

Next, we investigate the conditions on tile sizes that ensure concurrent start with the default wavefront coefficients. Let $h_{x,0}$ be the first component of $h_x$ and $h_{x,1}$ the second. The default wavefront then is $\lfloor (h_{0,0}\, t + h_{0,1}\, i)/s_0 \rfloor + \lfloor (h_{1,0}\, t + h_{1,1}\, i)/s_1 \rfloor$. Now, to achieve concurrent start, we need to ensure that the default wavefront schedule only depends on the time dimension $t$ and that all space dimensions (i.e., $i$) are eliminated. This is true under the condition $s_0 / |h_{0,1}| = s_1 / |h_{1,1}|$.
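Both conditions of this section can be checked with a short Python sketch. It is an illustration under our own assumptions: the function names are hypothetical, the hyperplane matrix ((1, -1), (2, 1)) is one matrix with determinant 3, and the tile sizes in the usage lines are example values. The first part enumerates the integer points of each tile and counts how many different point patterns occur; the second part evaluates the tile size condition with cross-multiplication so that only integers are involved.

```python
def tile_pattern(H, size, T):
    """Integer points of tile T (square tiles of the given size), expressed
    relative to the lexicographically smallest point of the tile."""
    (a, b), (c, d) = H
    det = a * d - b * c
    points = []
    for d0 in range(size * T[0], size * (T[0] + 1)):
        for d1 in range(size * T[1], size * (T[1] + 1)):
            # Solve H * (t, i) = (d0, d1) with the adjugate of H and keep
            # only solutions that are integral in the original space.
            num_t = d * d0 - b * d1
            num_i = -c * d0 + a * d1
            if num_t % det == 0 and num_i % det == 0:
                points.append((num_t // det, num_i // det))
    if not points:
        return frozenset()
    t0, i0 = min(points)
    return frozenset((t - t0, i - i0) for t, i in points)


def distinct_patterns(H, size, radius=3):
    """Number of different integer point patterns among nearby tiles."""
    tiles = [(T0, T1)
             for T0 in range(-radius, radius + 1)
             for T1 in range(-radius, radius + 1)]
    return len({tile_pattern(H, size, T) for T in tiles})


def default_wavefront_condition(H, sizes):
    """s_0 / |h_{0,1}| == s_1 / |h_{1,1}|, cross-multiplied. Assumes the two
    space coefficients have opposite signs so that they can cancel."""
    return sizes[0] * abs(H[1][1]) == sizes[1] * abs(H[0][1])


if __name__ == "__main__":
    H = ((1, -1), (2, 1))                 # assumed matrix, det(H) = 3
    print(distinct_patterns(H, 3))        # size is a multiple of det(H)
    print(distinct_patterns(H, 2))        # size is not a multiple of det(H)
    print(default_wavefront_condition(((1, 3), (1, -1)), (12, 4)))
    print(default_wavefront_condition(((1, 3), (1, -1)), (4, 4)))
```

With tile size 3 every tile is an integral translate of tile (0, 0), so only one pattern appears; with tile size 2 the tiles do not even contain the same number of integer points.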
Note that the wavefront may still depend on the fractional part of the space dimension, but this only results in a variation within a fixed range, independently of the size of the domain. We can see that in Figure 1, where we reach concurrent start for the default wavefront, this condition holds with […]/1 = […]/1. On the other hand, when changing the tile sizes to $s_0 =$ […] and $s_1 =$ […] as in Figure 2, the two sides of the condition, […]/1 and […]/1, are no longer equal and concurrent start is not possible with the default wavefront.

The above shows that to obtain concurrent start the two tile sizes cannot be chosen independently, but need to be scaled together. To make this clearer, we introduce a new variable $s$ which can be chosen freely and which is then used to define $s_0 = s\,|h_{0,1}|$ and $s_1 = s\,|h_{1,1}|$ such that concurrent start is obtained.

3. UNIFIED DIAMOND AND HEXAGONAL TILING

In this section we present an extended formulation of diamond tiling which allows the creation of hexagonal tiles. The hexagonal tiles calculated are similar to the ones presented in [6], but are not identical in shape.

3.1 The schedule for hexagonal tiling (2D case)

Let us first consider a two-dimensional iteration space. To obtain such a schedule we start from the diamond tiling approach, which means we first calculate a set of tiling hyperplanes, transform the index space with these hyperplanes and then apply rectangular tiling in the transformed space. We then (optionally) transform the rectangular tiling by stretching the rectangular tiles along the concurrent start hyperplane. The stretched rectangular tiles in the transformed space form hexagonal tiles in the original space. As a result we have a single schedule that describes diamond tiling, if tiles are stretched by a vector of length zero, and hexagonal tiling, if they are stretched by a non-zero-length vector.

In the following description, we assume that the tiling hyperplanes $h_0, h_1$ are computed by the diamond tiling algorithm as described in [2]. We focus on the description of the (possibly) stretched tiling scheme in the transformed space.
As input for the stretched tiling scheme, we take the tile sizes $s_0, s_1$ as well as a vector $v = (v_0, v_1)$, which is parallel to the concurrent start hyperplane (in the transformed space). We also require that the direction vector of the concurrent start hyperplane $n = (n_0, n_1)$ is strictly positive in all components, as guaranteed by the algorithm of [2].

We first model diamond tiling using a standard 2D rectangular tiling in the transformed space. In this tiling the symbols $s_0, s_1$ define the tile sizes along the dimensions $d_0, d_1$, while $T_0, T_1$ are the resulting tile schedule dimensions (we ignore the point schedule dimensions, as this mapping is not interesting for this discussion). The following map describes such a rectangular tiling:

$$\begin{aligned}(d_0, d_1) \to (T_0, T_1) :\ & s_0 T_0 \le d_0 < s_0 (T_0 + 1) \\ \wedge\ & s_1 T_1 \le d_1 < s_1 (T_1 + 1)\end{aligned}$$

Our goal is to achieve and maintain concurrent start using the default wavefront. Consequently $s_0$ and $s_1$ cannot be chosen freely (see Section 2.4). We require the user to choose tile sizes that ensure concurrent start. Figure 6 illustrates the above rectangular tiling using the transformation $(t, i) \to (t + 2i, t - i)$, as well as the tile sizes $s_0 = 6, s_1 = 3$. The red tiles show the concurrent start wavefront.

Starting from this rectangular tiling we want to stretch the contained tiles by a vector $v$ with components $v_0, v_1$, where $v$ is parallel to the concurrent start hyperplane. In principle, $v$ can have either of two possible directions, but to simplify the schedule formulation we choose $v$ such that $v_0 < 0 \wedge v_1 > 0$. Figure 7 shows a stretching as we obtain it for $v =$ ([…], […]) and $n = (1, 2)$.

Before we implement the actual stretching, we first add two additional constraints to each tile. The first one bounds each tile at its lexicographic minimal point with the concurrent start hyperplane, the second one bounds each tile at its lexicographic maximal point with the same (but translated) hyperplane. We implement the lower boundary by placing the hyperplane at the origin and by offsetting it for each tile according to the tile sizes. To offset the tile along $d_i$ we adjust the right-hand side of the lower bound by $n_0 s_0 T_0$ and $n_1 s_1 T_1$. The upper boundary is implemented by reversing the lower hyperplane. The location of the upper hyperplane for tile $(T_0, T_1)$ is the origin of tile $(T_0 + 1, T_1 + 1)$.

$$\begin{aligned}(d_0, d_1) \to (T_0, T_1) :\ & s_0 T_0 \le d_0 < s_0 (T_0 + 1) \\ \wedge\ & s_1 T_1 \le d_1 < s_1 (T_1 + 1) \\ \wedge\ & n_0 s_0 T_0 + n_1 s_1 T_1 \le n_0 d_0 + n_1 d_1 \\ \wedge\ & n_0 d_0 + n_1 d_1 < n_0 s_0 (T_0 + 1) + n_1 s_1 (T_1 + 1)\end{aligned}$$

As a last step, we now stretch the tiles along $v$. This requires us to increase the size of the rectangular tiles by $v_0$ in the $d_0$ dimension and $v_1$ in the $d_1$ dimension. We also account for the shifted positions of the rectangular tiles by adding offsets $o_0, o_1$ to the upper and lower tile boundaries; these offsets are derived later in this section. Finally we adjust the locations of the concurrent start planes by using $c_0 = n_1 (s_0 + v_0) + n_0 v_1$ and $c_1 = n_1 (s_1 + v_1) + n_0 v_0$.

$$\begin{aligned}(d_0, d_1) \to (T_0, T_1) :\ \exists\, o_0, o_1 :\ & o_0 = v_0 T_0 + v_0 T_1 \ \wedge\ o_1 = v_1 T_0 + v_1 T_1 \\ \wedge\ & s_0 T_0 + o_0 + v_0 \le d_0 < s_0 (T_0 + 1) + o_0 \\ \wedge\ & s_1 T_1 + o_1 \le d_1 < s_1 (T_1 + 1) + v_1 + o_1 \\ \wedge\ & c_0 T_0 + c_1 T_1 \le n_0 d_0 + n_1 d_1 \\ \wedge\ & n_0 d_0 + n_1 d_1 < c_0 (T_0 + 1) + c_1 (T_1 + 1)\end{aligned}$$

Figure 8 illustrates the last step in detail. On the left side we see in red the original square tiles (0,0), (1,0) and (1,1), each of size […]. On the right side, we see the tiles with the same tile numbers, but stretched along $v$. We can see that the rectangular tile shapes have been extended by […] along $d_0$ and by […] along $d_1$, resulting in the light blue tile shapes (the dark blue tile shapes illustrate the contained integer points). We can also see that the position of the red tile shape of tile (0,0) has not moved. However, when going one step up to tile (1,0), which means increasing the tile number $T_0$ by one, we offset the tile by $v_0$ along $d_0$ as well as $v_1$ along $d_1$. Similarly, when going from tile (1,0) to tile (1,1), which means increasing the tile number $T_1$ by one, we offset the tile by $v_0$ along $d_0$ and $v_1$ along $d_1$. Combined this yields the offset $o_0 = v_0 T_0 + v_0 T_1$ for $d_0$ and $o_1 = v_1 T_0 + v_1 T_1$ for $d_1$.

[Figure 7: Hexagonal tiling. Panel (a): original space.]
The new values $c_0$ and $c_1$ now also take into account the offset of the plane. When varying $T_0$ we not only need to take the vertical tile size $s_0$ into account, but in addition we include the additional vertical offset $v_0$ as well as the changed horizontal offset $v_1$. To support concurrent start hyperplanes of different orientations, such offsets are scaled by the relevant components of $n$. The corresponding changes have been added when adjusting $c_1$.

A very important observation to make is that the tiles $(T_0, T_1)$ and $(T_0 + 1, T_1 + 1)$ have overlapping rectangular tiles. However, the concurrent start hyperplanes that have been added right at the position of $v$ ensure that the tiles are non-overlapping and still tile the full space. Also, as our stretching and translation was only along the concurrent start hyperplane, no dependences have been violated. Finally, if the previous tiling had concurrent start, stretching along the concurrent start hyperplane preserves this property.

3.2 Hexagonal tiling for higher dimensions

To extend our unified hexagonal tiling to higher-dimensional kernels we use a shape derived from a truncated octahedron [4] to create a tiling for one time and two space dimensions that not only provides two dimensions of parallelism, but also gives the freedom to adjust the size of the tile shape independently for the different dimensions. Figure 9b illustrates such a tiling. In the illustration the time dimension goes upwards, whereas the space dimensions go to the lower left and the lower right corner of the rendering. The hyperplane orthogonal to the time dimension is the concurrent start hyperplane. Tiles of the same color are executed at the same time step. As visible in the figure, the tiles of a single color lie within a hyperplane parallel to the concurrent start hyperplane. The individual tiles of a single color are independent and can be executed in parallel. There is parallelism along both space dimensions. All tiles share a single tile shape.
The hexagonal tiling is derived from the diamond tiling illustrated in Figure 9a (the same example as used by Bandishti et al. [2]). At the beginning the peak of each tile is formed by a single point (illustrated by the red dot on the lower left blue tile of Figure 9a). Similarly to the construction of hexagonal tiling for one space dimension, we then bound each tile at the top and at the bottom by the concurrent start hyperplane and stretch the peak to form a plane. However, for the case of two space dimensions we stretch along three different vectors, all chosen to be parallel to the concurrent start hyperplane and, in addition, to lie inside one of the tiling hyperplanes. We illustrate these stretching vectors in red in the blue tile at the lower left of Figure 9b. Figure 10 illustrates that the tiling is space filling. By stretching only within the concurrent start hyperplane no dependences have been violated and concurrent start is preserved.

The graphical illustration and the above claims only give an intuition of this tiling scheme. Additional work is required to understand the construction of such a tiling, its properties and its effectiveness. However, the promise we see is that we can translate the advantages of hexagonal tiling into higher-dimensional cases, enabling flexible tile sizes, concurrent start as well as thread-level parallelism along multiple dimensions for higher-dimensional kernels.

[Figure 8: The stretching in the transformed space. Panels: (a) unstretched, (b) stretched.]
[Figure 9: Hexagonal tiling - two space dimensions (3D rendering). Panels: (a) Plain diamond tiling - unstretched, (b) Hexagonal tiling - derived from the tiling in Figure 9a, but stretched along the concurrent start hyperplane.]
[Figure 10: Hexagonal tiling - two space dimensions (time steps […]-[…]).]

4. RELATED WORK

Aside from the already discussed diamond and hybrid-hexagonal tiling [2, 6], there has been a lot of successful research in generating code to efficiently perform stencil computations. There is Pochoir [14], a domain-specific C++ framework, as well as Henretty et al. [7] with a DSL-based approach. Strzodka [12] uses an in-tile wavefront traversal technique to achieve efficient cache use even with tile sizes larger than the available cache memory. All of these approaches generate efficient CPU code.

Then, there is a set of general optimizers. PPCG [16] generates parallel CPU and GPU code using classical (time) tiling. It relies on affine transformations to extract parallelism and improve locality, using a variant of the Pluto algorithm [3]. Reservoir Labs' R-Stream is also a reference polyhedral compiler targeting GPUs [10, 15]. Par4All [1] is an open source parallelizing compiler developed by Silkan targeting multiple architectures. The compiler is not based on the polyhedral model, but uses abstract interpretation for array regions, performing powerful interprocedural analysis on the input code.

Finally, there are tools that generate efficient GPU code. Here Holewinski's Overtile [8] and Grosser's split tiling [5] compilers represent, besides [6], the state of the art for the automatic generation of efficient GPU code, relying on overlapped and split tiling, respectively.
CONCLUSION We presented a formulaton of hexagonal tlng that combnes the benefts of damond tlng and hybrd-hexagonal tlng. Startng from the publshed damond-tlng algorthm, we formulated condtons on tle szes and wavefront coeffcents to ensure concurrent start. We also formulated the condton that ensures the same nteger pont placement across all tles. And most mportantly, we extended the orgnal damond tlng algorthm to hexagonal tles. The added flexblty of hexagonal tles does not only make the choce of tle szes more flexble but also enables the creaton of tles wth a flat summt. Both these features have been shown useful for GPU code generaton. Fnally, we gave an outlook on our plans to extend ths tlng scheme to hgher dmensonal stencls, an extenson that wll brng together flexble tle szes and multple dmensons of parallelsm. Acknowledgments. Ths work greatly benefted from regular dscussons wth Uday Bondhugula. It was partly funded by a Google European Fellowshp n Effcent Computng, by the European FP7 project CARP d. 8777, the COP- CAMS ARTEMIS project, and award 988 from the U.S. NSF.. REFERENCES [1] Mehd Amn, Béatrce Creusllet, Stéphane Even, Ronan Keryell, Ong Gouber, Serge Guelton, Jance Onanan McMahon, Franços-Xaver Pasquer, Grégore Péan, Perre Vllalon, et al. ParAll: From convex array regons to heterogeneous computng. In IMPACT, 1. [] Vnayaka Bandsht, Irshad Pananlath, and Uday Bondhugula. Tlng stencl computatons to maxmze parallelsm. In ACM Supercomputng Conf., 1. [3] Uday Bondhugula, J. Ramanujam, and et al. PLuTo: A practcal and fully automatc polyhedral program optmzaton system. In PLDI, 8. [] H.S.M. Coxeter. Regular and sem-regular polytopes.. Mathematsche Zetschrft, (1):38 7, 19. [] Tobas Grosser, Albert Cohen, Paul HJ Kelly, J Ramanujam, P Sadayappan, and Sven Verdoolaege. Splt tlng for gpus: automatc parallelzaton usng trapezodal tles. In GPGPU-. ACM, 13. [] Tobas Grosser, Albert Cohen, Sven Verdoolaege, P. Sadayappan, and Justn Holewnsk. 
Hybrid hexagonal/classical tiling for GPUs. In International Symposium on Code Generation and Optimization, 2014.
[7] Tom Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, and P. Sadayappan. A stencil compiler for short-vector SIMD architectures. In ICS. ACM, 2013.
[8] Justin Holewinski, Louis-Noël Pouchet, and P. Sadayappan. High-performance code generation for stencil computations on GPU architectures. In ICS. ACM, 2012.
[9] Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Effective automatic parallelization of stencil computations. In PLDI, pages 235-244, 2007.
[10] Allen Leung, Nicolas Vasilache, Benoît Meister, Muthu Baskaran, David Wohlford, Cédric Bastoul, and Richard Lethin. A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction. In GPGPU-3, 2010.
[11] G. Smith. Numerical Solution of Partial Differential Equations: Finite Difference Methods. Oxford University Press, 2004.
[12] Robert Strzodka, Mohammed Shaheen, Dawid Pajak, and H. Seidel. Cache accurate time skewing in iterative stencil computations. In Parallel Processing (ICPP), 2011 International Conference on. IEEE, 2011.
[13] A. Taflove. Computational electrodynamics: The finite-difference time-domain method. Artech House, 1995.
[14] Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. The Pochoir stencil compiler. In SPAA. ACM, 2011.
[15] Nicolas Vasilache, Benoit Meister, Muthu Baskaran, and Richard Lethin. Joint scheduling and layout optimization to enable multi-level vectorization. In IMPACT, Paris, France, January 2012.
[16] Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, and Francky Catthoor. Polyhedral parallel code generation for CUDA. ACM TACO, 9(4):54, 2013.


Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Self-tuning Histograms: Building Histograms Without Looking at Data

Self-tuning Histograms: Building Histograms Without Looking at Data Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com

More information

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL) Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

SAO: A Stream Index for Answering Linear Optimization Queries

SAO: A Stream Index for Answering Linear Optimization Queries SAO: A Stream Index for Answerng near Optmzaton Queres Gang uo Kun-ung Wu Phlp S. Yu IBM T.J. Watson Research Center {luog, klwu, psyu}@us.bm.com Abstract near optmzaton queres retreve the top-k tuples

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Concurrent models of computation for embedded software

Concurrent models of computation for embedded software Concurrent models of computaton for embedded software and hardware! Researcher overvew what t looks lke semantcs what t means and how t relates desgnng an actor language actor propertes and how to represent

More information

PHYSICS-ENHANCED L-SYSTEMS

PHYSICS-ENHANCED L-SYSTEMS PHYSICS-ENHANCED L-SYSTEMS Hansrud Noser 1, Stephan Rudolph 2, Peter Stuck 1 1 Department of Informatcs Unversty of Zurch, Wnterthurerstr. 190 CH-8057 Zurch Swtzerland noser(stuck)@f.unzh.ch, http://www.f.unzh.ch/~noser(~stuck)

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning Computer Anmaton and Vsualsaton Lecture 4. Rggng / Sknnng Taku Komura Overvew Sknnng / Rggng Background knowledge Lnear Blendng How to decde weghts? Example-based Method Anatomcal models Sknnng Assume

More information

REDUCING hardware design time is more than ever a

REDUCING hardware design time is more than ever a TCAD-2012-0168 1 Polyhedral Bubble Inserton: A Method to Improve Nested Loop Ppelnng for Hgh-Level Synthess Antone Morvan, Steven Derren, and Patrce Qunton Abstract Hgh-Level Synthess (HLS) allows hardware

More information