The relation between diamond tiling and hexagonal tiling
Tobias Grosser, INRIA and École Normale Supérieure, Paris
Sven Verdoolaege, INRIA, École Normale Supérieure and KU Leuven
P. Sadayappan, Ohio State University
Albert Cohen, INRIA and École Normale Supérieure, Paris

ABSTRACT

Iterative stencil computations are important in scientific computing and more and more also in the embedded and mobile domain. Recent publications have shown that tiling schemes that ensure concurrent start provide efficient ways to execute these kernels. Diamond tiling and hybrid-hexagonal tiling are two successful tiling schemes that enable concurrent start. Both have different advantages: diamond tiling is integrated in a general purpose optimization framework and uses a cost function to choose among tiling hyperplanes, whereas the more flexible tile sizes of hybrid-hexagonal tiling have proven to be effective for the generation of GPU code. We show that these two approaches are even more interesting when combined. We revisit the formalization of diamond and hexagonal tiling, present the effects of tile size and wavefront choices on tile-level parallelism, and formulate constraints for optimal diamond tile shapes. We then extend the diamond tiling formulation into a hexagonal tiling one, combining the benefits of both. The paper closes with an outlook of hexagonal tiling in higher dimensional spaces, an important generalization suitable for massively parallel architectures.

1. INTRODUCTION

Stencil computations are an important computational pattern in both scientific and engineering applications and they are becoming increasingly important in the embedded and mobile domain. Computational electrodynamics [13] or partial differential equations [11] are common use cases of stencils in high performance computing, whereas image and video processing are about to become driving forces in the embedded market.
HiStencils 2014, First International Workshop on High-Performance Stencil Computations, January 21, 2014, Vienna, Austria. In conjunction with HiPEAC 2014.

Even though manual and automatic optimizations of stencil computations have been designed for many years, the generation of efficient code remains a challenge, especially for higher-dimensional stencils or for platforms which allow highly parallel execution on different hardware levels. With the increased use of parallel hardware in mobile markets as well as the foreseeable increase of three-dimensional processing in upcoming embedded devices, a need emerges for solutions that facilitate the automatic generation of high-performance stencil codes for different devices.

For stencil computations, the tiling strategies that enable reuse along the time dimension have shown to be most efficient. Unfortunately, the standard approach uses parallel wavefronts in a skewed index space. These skewed wavefronts reduce tile-level parallelism [9] and induce load-imbalanced prologue and epilogue phases. Split tiling [5, 9] and overlapped tiling [8, 9] address this problem by enabling concurrent start along one of the original iteration space dimensions. In other words, the tile schedule allows a wavefront of tiles parallel to one of the original dimensions of the index space to be executed in parallel. However, these two tiling techniques either require periodically alternating tile shapes or induce redundant computations. In contrast, the recently published diamond tiling [2] and hybrid-hexagonal tiling [6] schemes successfully obtain concurrent start without the need for redundant computations or multiple tile shapes.

Diamond tiling is a tiling strategy that uses a single n-dimensional parallelotope¹ that is calculated such that it is possible to create a tiling that ensures that the number of tiles executable in parallel remains consistent throughout the computation, meaning that the tile schedule enables concurrent start.
The advantages of diamond tiling are its integration in a general purpose compilation framework and the use of an adaptable cost function to determine tile shapes.

Hybrid hexagonal-classical tiling is a tiling scheme that uses hexagonal tile shapes to enable concurrent start and to provide flexible tile size choices on one dimension. On the remaining dimensions it uses classical parallelogram tiling. The more domain-specific formulation of hybrid-hexagonal tiling does not optimize tile shapes for a certain cost function, but always uses the narrowest dependence cone to derive the tile shape. On the other hand, hybrid-hexagonal tiling has the advantage that it allows the time-tile height and the width along the space dimension to be adjusted individually. It also permits the creation of tiles with a flat summit and can ensure that tiles not only have the same rational shape, but that their integer point placement is by construction identical; all of these properties have been shown to be essential for efficient GPU code generation.

¹ A parallelotope is a general term for what is known in 2D as parallelogram and in 3D as parallelepiped.

Besides these advantages, there are also open problems. Even though diamond tiling
generally explains how to derive tiling hyperplanes that enable concurrent start, a tile schedule that includes both the tile sizes as well as the parallel wavefront coefficients necessary to obtain concurrent start was not presented. Hexagonal tiling has shown beneficial for higher dimensional stencils when combined with other tiling schemes, but the formulation of hexagonal tiling itself is limited to the 2D case (1 time dimension, 1 space dimension).

This paper combines the two tiling strategies to get the best of both worlds. Its contributions are:
a) an in-depth analysis of the constraints that diamond tiling imposes on tile sizes and wavefront coefficients,
b) a formulation of conditions that ensure identical placement of integer points within the tiles,
c) an extension of the original diamond tiling algorithm to a hexagonal tiling algorithm for two-dimensional problems (1 time dimension, 1 space dimension),
d) ideas for hexagonal tiling of higher dimensional stencils.

The paper is structured as follows. In Section 2 we revisit diamond tiling, provide insights on tile size and wavefront coefficient constraints and give conditions that ensure important properties of the diamond tiles. We then introduce the unified hexagonal tiling scheme in Section 3, which includes a full formulation for two-dimensional tiling as well as an outlook on hexagonal tiling for higher-dimensional cases. We discuss related work in Section 4 and conclude in Section 5.

2. DIAMOND TILING

Diamond tiling [2] is a tiling technique for stencil computations where the main contribution is the combination of affine transformations and a rectangular tiling that enables concurrent start. The idea of concurrent start is to ensure that the wavefront of tiles that are executed in parallel is aligned to a concurrent start hyperplane (normally an iteration space boundary) such that the number of tiles that are executed in parallel remains constant throughout the entire computation. This ensures that already at the beginning of the computation a sufficient amount of parallelism is available. Even though the name diamond suggests that the tile shapes are rhombi or rhombohedra (a.k.a.
diamonds) and Figure 1 in Bandishti et al. [2] also uses edges of identical length, the tile shapes formed by diamond tiling are not restricted to diamonds, but can be more general parallelograms (parallelotopes in higher dimensions) as can be seen in Figure 3 and Figure 9a. However, some restrictions on the tile shape and sizes must be enforced to ensure that concurrent start is possible.

2.1 The Pluto optimizer

Diamond tiling was presented and implemented as an extension to Pluto [3], a general-purpose optimizer for data locality and parallelism. In contrast to other approaches that directly tile the iteration space (e.g., [5, 6]), the original Pluto tiling as well as diamond tiling are implemented as a two-phase process. As a first step a program transformation is calculated that exposes sequences of loops (bands) that are tileable with rectangular tiles. In the second step a rectangular tiling is performed on these bands. Combined, this yields tiles with a possibly not rectangular, but parallelotope tile shape. There are several benefits of separating these two concerns. First, when calculating the parallel bands Pluto can and does perform other optimizations, e.g., data locality optimizations such as loop fusion. Second, tiling of the transformed program makes the tile shapes independent of the tiling hyperplanes, which makes the tiling easier to describe and analyze.

Pluto calculates program transformations on a polyhedral representation. In this representation the set of executed program statements (the iteration space) is modeled with a multi-dimensional integer set where each element represents an individual statement iteration. The execution order of elements of the iteration space is described by the schedule, an integer map that assigns a possibly multi-dimensional relative execution time to each element of the iteration space. Program transformations are performed by modifying the schedule.
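The role of the schedule can be illustrated with a minimal Python sketch. It assumes a tiny rectangular iteration space and two hypothetical schedules (identity and loop interchange) chosen only for illustration; it is not how Pluto represents or computes schedules internally.

```python
from itertools import product

def execution_order(domain, schedule):
    """Order statement instances by their multi-dimensional execution time.

    Tuples compare lexicographically in Python, which matches the
    lexicographic order of multi-dimensional schedule values.
    """
    return sorted(domain, key=schedule)

# Hypothetical 2 x 3 iteration space of a loop nest over (t, i).
domain = list(product(range(2), range(3)))

def identity(p):
    # Original loop order: t outer, i inner.
    return (p[0], p[1])

def interchange(p):
    # Transformed schedule: i becomes the outer dimension.
    return (p[1], p[0])

original = execution_order(domain, identity)
transformed = execution_order(domain, interchange)
```

Only the schedule changes between the two orders; the iteration space itself is untouched, which is the separation the text describes.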
For a single statement and a k-dimensional execution time such a schedule has the form $S = x \to (h_0 x, \dots, h_k x)$, where $x$ is an element of the iteration space, $h_i$, $i \in [0, k]$, are tiling hyperplanes and $h_i x$ denotes the sum of the per-element products of $h_i$ and $x$. The result of Pluto's first step are exactly these tiling hyperplanes, selected such that the distance between two statements that depend on each other is not only lexicographically nonnegative (needed for validity of the schedule), but that the distance is also nonnegative at each individual dimension. For the exact algorithm on how to select such hyperplanes, we refer to [3]. For this paper, it is sufficient to understand that the all-nonnegative dependence vectors make rectangular tiling valid.

We present the Pluto rectangular tiling as a schedule-only transformation, which we believe is easier to understand than the actual Pluto transformation, which modifies the iteration space as well. Conceptually, there should be no difference. Given a schedule $S$ and a set of tile sizes $s_i$, $i \in [0, k]$, a rectangularly tiled schedule of $S$ consists of two partial schedules. The first one, $S_t$, is placed at the outer level and enumerates the tiles themselves. It is called the tile schedule. The second one, $S_p$, is placed at the inner level and enumerates the points within each tile. It is called the point schedule. We define

$S_t = (x_0, \dots, x_k) \to (\lfloor (h_0 x)/s_0 \rfloor, \dots, \lfloor (h_k x)/s_k \rfloor)$ and $S_p = S$.

This tiled schedule may already expose parallelism, but it may also be necessary to fall back to pipeline parallelism by forming a wavefront schedule at the outermost tile dimension. Such a wavefront schedule then itself carries all dependences and ensures that the inner loops can be executed in parallel. This yields

$S_t = (x_0, \dots, x_k) \to (\lambda_0 \lfloor (h_0 x)/s_0 \rfloor + \dots + \lambda_k \lfloor (h_k x)/s_k \rfloor, \lfloor (h_1 x)/s_1 \rfloor, \dots, \lfloor (h_k x)/s_k \rfloor)$ with $\lambda_i \in \mathbb{Z}$, $i \in [0, k]$.

The coefficients $\lambda_i$ allow the construction of different wavefronts. We call $\lambda_0 = \dots = \lambda_k = 1$ the default wavefront coefficients.
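The two tile schedules defined above can be written down directly. The following Python sketch is a plain transcription of the formulas under the assumption that hyperplanes are given as integer coefficient tuples; the function names and the example hyperplanes and tile sizes are illustrative choices, not part of Pluto.

```python
def tile_schedule(x, hyperplanes, sizes):
    """Rectangular tile schedule: floor((h_i . x) / s_i) for every hyperplane."""
    return tuple(
        sum(h_j * x_j for h_j, x_j in zip(h, x)) // s  # // is floor division
        for h, s in zip(hyperplanes, sizes)
    )

def wavefront_schedule(x, hyperplanes, sizes, lambdas=None):
    """Tile schedule whose outermost dimension is the weighted wavefront.

    With lambdas=None the default wavefront coefficients (all 1) are used.
    The first tile dimension is replaced by the wavefront, as in the text.
    """
    tiles = tile_schedule(x, hyperplanes, sizes)
    if lambdas is None:
        lambdas = (1,) * len(tiles)
    wave = sum(l * t for l, t in zip(lambdas, tiles))
    return (wave,) + tiles[1:]

# Illustrative values: hyperplanes (1, -1) and (1, 1), tile sizes 4 x 4.
H = ((1, -1), (1, 1))
sizes = (4, 4)
tile = tile_schedule((5, 2), H, sizes)
wave = wavefront_schedule((5, 2), H, sizes)
```

Python's `//` rounds towards negative infinity, so points with negative hyperplane values are still assigned to the correct tile.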
The hyperplanes that are calculated by the original Pluto algorithm allow the formation of such a wavefront schedule, but it is not always possible to form a tile schedule that is in the same direction as a given concurrent start face $f$.

2.2 The diamond tiling extensions

Diamond tiling [2] extends the Pluto algorithm in a way that ensures that for the tiling hyperplanes computed there always exist wavefront coefficients that yield concurrent start. In the following, we identify a face or hyperplane with its orthogonal vector. The diamond tiling paper shows that a transformation enables tile-wise concurrent start along a face $f$ if and only if the tile schedule is in the same direction as the face and carries all inter-tile dependences. It also shows that concurrent start along a face $f$ can be exposed by a set of hyperplanes if and only if $f$ lies strictly inside the cone formed
by the hyperplanes, i.e., if and only if $f$ is a strict conic combination of all the hyperplanes. This means it finds for a concurrent start hyperplane $f$ tiling hyperplanes $h_i$ such that the following equality holds:

$m f = \lambda_1 h_1 + \dots + \lambda_k h_k$, with $\lambda_i, m \in \mathbb{Z}$   (1)

The main focus of the diamond tiling paper is to prove the conditions necessary to ensure that the calculated hyperplanes can be used to construct a concurrent start schedule as well as to give an algorithm that actually calculates such hyperplanes. We consequently refer to this publication for details. One question that was explored less is under which conditions, especially for which tile sizes and for which wavefront coefficients, the rectangularly tiled schedule achieves concurrent start. Specifically, it is not clear for which values of $\lambda_i$, $s_j$ the following holds:

$m\,(f x) = \lambda_0 \lfloor (h_0 x)/s_0 \rfloor + \dots + \lambda_k \lfloor (h_k x)/s_k \rfloor$   (2)

2.3 Relation between tile sizes and wavefronts

Even though diamond tiling yields tiling hyperplanes that allow concurrent start, to construct the full tile schedule the tile sizes $s_i$ as well as the wavefront coefficients $\lambda_i$ still need to be chosen. Choosing the correct values is important, not only to ensure that the tiles executed within the wavefront are started concurrently, but also to control the horizontal distance between tiles of the same color relative to their tile size. We call this the density of the schedule, a property important to understand the amount of computation that can be performed in parallel. Before suggesting good values, we explore the impact of different choices.

Let us first consider a simple example with symmetric dependences:

    for t
      for i
        A[t+1][i] = A[t][i-1] + A[t][i+1]

Pluto's diamond tiling implementation calculates for this kernel the transformation $(t, i) \to (t - i, t + i)$ and applies rectangular tiling in the transformed space. The default wavefront coefficients $\lambda_0 = \lambda_1 = 1$ are then used to enable parallel execution. This results in the tile schedule $(t, i) \to (\lfloor (t - i)/s_0 \rfloor + \lfloor (t + i)/s_1 \rfloor, \lfloor (t + i)/s_1 \rfloor)$. The default square tile shapes ($s_0 = s_1$) yield both concurrent start as well as a high density of tiles.
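That square tiles give concurrent start for the symmetric kernel can be checked numerically. The sketch below assumes the transformation $(t, i) \to (t - i, t + i)$ with equal tile sizes and the default wavefront; the tile size 4 and the domain width are arbitrary illustrative values. For each time step it collects the wavefront numbers seen across the space dimension.

```python
def default_wavefront(t, i, s):
    """Default wavefront (both coefficients 1) for the schedule (t - i, t + i)."""
    return (t - i) // s + (t + i) // s

def wavefront_values(t, s, width):
    """All wavefront numbers that occur at time t for i in [-width, width]."""
    return {default_wavefront(t, i, s) for i in range(-width, width + 1)}

s = 4  # illustrative square tile size
spreads = [
    max(values) - min(values)
    for values in (wavefront_values(t, s, 100) for t in range(32))
]
```

Because the two floor terms sum to either $\lfloor 2t/s \rfloor$ or $\lfloor 2t/s \rfloor - 1$, the wavefront number varies by at most one along the space dimension, no matter how wide the domain is. This bounded variation is what makes the wavefront parallel to the concurrent start hyperplane.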
Figure 1 illustrates this for equal tile sizes $s_0 = s_1$, with the tile wavefront highlighted in red and the concurrent start hyperplane highlighted in black. The two hyperplanes being parallel shows that the tile wavefront has concurrent start. When different tile sizes are chosen for the two dimensions, the default wavefront no longer yields concurrent start. In Figure 2 we illustrate for unequal tile sizes $s_0 \neq s_1$ that the default wavefront (red) is no longer parallel to the concurrent start hyperplane (black). It is still possible to get concurrent start using non-default wavefront coefficients (here with $\lambda_1 = 3$), which yields the schedule $(t, i) \to (\lambda_0 \lfloor (t - i)/s_0 \rfloor + 3 \lfloor (t + i)/s_1 \rfloor, \lfloor (t + i)/s_1 \rfloor)$. Unfortunately, a non-default wavefront causes a large loss in tile-level parallelism throughout the computation. This effect is illustrated by the yellow wavefront in Figure 2, which is parallel to the concurrent start hyperplane (black).

Next we analyze a kernel with asymmetric dependences:

    for t
      for i
        A[t+1][i] = A[t][i-1] + A[t][i+2]

Pluto derives from this kernel the transformation $(t, i) \to (t - i, 2t + i)$. This transformation combined with square tiling and the default wavefront coefficients allows concurrent start, as shown in Figure 3 for square tiles. The reason for this, possibly surprising, result is that for a two-dimensional stencil (1 space, 1 time) with dependence distance 1 in the time direction, the coefficient of the space dimension in the normal will always be $\pm 1$. This ensures that when adding the two hyperplanes together their coefficients for the space dimension cancel out and we again get the concurrent start hyperplane. Consequently, the default wavefront coefficients combined with square tile sizes yield a concurrent start wavefront. As already found earlier, non-square tile sizes will prevent concurrent start with the default wavefront coefficients. Another interesting observation is that even though the rational tile shapes in Figure 3 are identical throughout the original iteration space, the set of contained integer points is not.
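The varying integer point placement can be reproduced with a short Python sketch. It assumes the hyperplanes $(1, -1)$ and $(2, 1)$ for the asymmetric kernel, that is the transformation $(t, i) \to (t - i, 2t + i)$, and uses tile size 4 purely as an illustrative value. For every tile it records which positions inside the tile correspond to an integer point $(t, i)$ of the original space.

```python
def is_integer_point(H, d):
    """True if the transformed point d is the image of an integer (t, i).

    Solves H * (t, i) = d with Cramer's rule and checks that both
    numerators are divisible by det(H).
    """
    (a, b), (c, e) = H
    det = a * e - b * c
    t_num = e * d[0] - b * d[1]
    i_num = -c * d[0] + a * d[1]
    return t_num % det == 0 and i_num % det == 0

def tile_pattern(H, s, tile):
    """Positions (relative to the tile origin) occupied by integer points."""
    return frozenset(
        (u, v)
        for u in range(s)
        for v in range(s)
        if is_integer_point(H, (s * tile[0] + u, s * tile[1] + v))
    )

def distinct_patterns(H, s, n=4):
    """Set of different point placements over a (2n x 2n) block of tiles."""
    return {
        tile_pattern(H, s, (T0, T1))
        for T0 in range(-n, n)
        for T1 in range(-n, n)
    }

H = ((1, -1), (2, 1))               # assumed hyperplanes of the asymmetric kernel
patterns_4 = distinct_patterns(H, 4)  # illustrative tile size 4
patterns_3 = distinct_patterns(H, 3)  # tile size 3 for comparison
```

With tile size 4 the tiles fall into three different placements, while tile size 3 gives a single placement shared by all tiles.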
The reason for this difference is that even though we use integral tile sizes in the transformed space, the borders may become non-integral in the original space. Varying integer point placements between tiles can cause problems due to additional conditions in the generated code.

As a next step we look into a case that has dependence distances of different lengths in the time dimension.

    for t
      for i
        A[t+1][i] = A[t][i-1] + A[t-2][i+1]

For this kernel, the Pluto implementation derives the transformation $(t, i) \to (t + 3i, t + i)$. Note that this result is different from what the algorithm in [2] would produce. Apparently, the Pluto implementation is using a variation of that algorithm. It is not clear whether there is a problem in this variation or whether this is a mere implementation problem. As both hyperplanes have a positive coefficient for the space dimension, it is impossible to create a conic combination that eliminates the space dimension and yields a concurrent start hyperplane. According to the diamond tiling paper, concurrent start is then impossible and these are not valid diamond tiling hyperplanes.

Even though the diamond tiling implementation in Pluto did not derive a valid tiling for the last kernel, there exist valid diamond tilings for it. One is the transformation $(t, i) \to (t - i, t + i)$. The same transformation was already chosen for the example illustrated in Figure 1 and, according to our understanding of the cost function in Pluto, this is in fact the transformation that the algorithm of [2] would choose. The resulting tiling yields 8 computations for a per-tile memory footprint of 3. Another valid diamond tiling transformation is $(t, i) \to (t + 3i, t - i)$. The hyperplanes in this transformation are the ones hybrid-hexagonal tiling would read off directly from the dependence cone. Given a different cost function, Pluto may also choose this transformation. The interesting point here is that the normal of the concurrent start hyperplane in the transformed space is no longer (1,1), but rather (1,3). In this case, the standard square tiling illustrated in Figure 4 only yields concurrent start if, instead of the default
[Figure 1: Symmetric dependences & square tiling. Panel (a): original space.]
[Figure 2: Symmetric dependences & non-square tiling. Panel (a): original space.]
[Figure 3: Asymmetric dependences & square tiling. Panel (a): original space.]
[Figure 4: More than one time step - Tiling read off from dependence cone and used by hexagonal tiling. Square tiles cause loss of tile-level parallelism. Panel (a): original space.]
[Figure 5: More than one time step - Tiling read off from dependence cone and used by hexagonal tiling. Non-square tiles ensure good efficiency and maximal tile-level parallelism. Panel (a): original space.]
[Figure 6: Diamond tiling. Panel (a): original space.]
wavefront coefficients, $\lambda_0 = 1$, $\lambda_1 = 3$ are chosen. As shown earlier, this severely reduces tile-level parallelism. On the other hand, for the same memory footprint as before, this tiling executes more computations. We can restore concurrent start with the default wavefront by using non-square tile sizes. Figure 5 shows a non-square tiling which enables concurrent start, which has maximal tile-level parallelism and which keeps the higher computation count for a memory footprint of three. Consequently, we would prefer this tiling over the previous two.

2.4 Optimal tiles with default wavefront

As seen in the previous section, the use of the default wavefront coefficients is necessary to ensure high tile density. However, by itself it guarantees neither concurrent start nor that all tiles share the same integer point placement. As those properties are important, we present the conditions under which they can be reached.

First, we explore the integer point placement. Assuming the tiling hyperplanes $h_i$ are combined into a matrix

$H = \begin{pmatrix} h_0 \\ \vdots \\ h_k \end{pmatrix}$

then tile sizes that are multiples of the determinant of $H$ will ensure that all tiles have the same configuration of integer points, since $\det(H)\, H^{-1}$ is an integer matrix. The hyperplanes used, e.g., in Figure 3 yield

$H = \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}$

and consequently $\det(H) = 1 + 2 = 3$. As the tile sizes used in that figure are not multiples of 3, the tiles differ in their integer point placement. For the same figure, tile sizes such as, e.g., $s_0 = s_1 = 3$ would ensure a uniform integer placement across all tiles. The above condition is sufficient independently of the chosen wavefront schedule.

Next, we investigate the conditions on tile sizes to ensure concurrent start with the default wavefront coefficients. Let $h_{x,0}$ be the first component of $h_x$ and $h_{x,1}$ the second. The default wavefront then is $\lfloor (h_{0,0} t + h_{0,1} i)/s_0 \rfloor + \lfloor (h_{1,0} t + h_{1,1} i)/s_1 \rfloor$. Now, to achieve concurrent start, we need to ensure that the default wavefront schedule only depends on the time dimension $t$ and that all space dimensions (i.e., $i$) are eliminated. This is true under the condition $s_0 / |h_{0,1}| = s_1 / |h_{1,1}|$.
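The condition can be derived in a few lines. The following is a sketch of the reasoning rather than a formal proof; it assumes, as required for a strict conic combination that eliminates the space dimension, that $h_{0,1}$ and $h_{1,1}$ have opposite signs. The symbols $w$ and $\varepsilon$ are introduced here only for this derivation.

```latex
\begin{align*}
w(t, i) &= \left\lfloor \frac{h_{0,0}\,t + h_{0,1}\,i}{s_0} \right\rfloor
         + \left\lfloor \frac{h_{1,0}\,t + h_{1,1}\,i}{s_1} \right\rfloor \\
        &= \left( \frac{h_{0,0}}{s_0} + \frac{h_{1,0}}{s_1} \right) t
         + \left( \frac{h_{0,1}}{s_0} + \frac{h_{1,1}}{s_1} \right) i
         - \varepsilon, \qquad 0 \le \varepsilon < 2 .
\end{align*}
% The coefficient of i vanishes exactly when
\[
  \frac{h_{0,1}}{s_0} = -\frac{h_{1,1}}{s_1}
  \quad\Longleftrightarrow\quad
  \frac{s_0}{|h_{0,1}|} = \frac{s_1}{|h_{1,1}|}
  \qquad (h_{0,1} \text{ and } h_{1,1} \text{ of opposite sign}).
\]
% Example: for the hyperplanes h_0 = (1, 3) and h_1 = (1, -1) the condition
% reads s_0 / 3 = s_1 / 1, i.e. s_0 = 3 s_1.
```

The term $\varepsilon$ collects the two fractional parts removed by the floor operations. It stays below 2 regardless of the domain size, so the wavefront is parallel to the concurrent start hyperplane up to a bounded variation.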
Note that the wavefront may still depend on the fractional part of the space dimension, but this only results in a variation within a fixed range, independently of the size of the domain. We can see that in Figure 1, where we reach concurrent start for the default wavefront, this condition holds, as $s_0 = s_1$ and $|h_{0,1}| = |h_{1,1}| = 1$. On the other hand, for the unequal tile sizes of Figure 2, the condition turns into $s_0/1 \neq s_1/1$ and concurrent start is not possible with the default wavefront.

The above shows that to obtain concurrent start the two tile sizes cannot be chosen independently, but need to be scaled together. To make this clearer we introduce a new variable $s$ which can be chosen freely and which is then used to define $s_0 = s\,|h_{0,1}|$ and $s_1 = s\,|h_{1,1}|$ such that concurrent start is obtained.

3. UNIFIED DIAMOND AND HEXAGONAL TILING

In this section we present an extended formulation of diamond tiling which allows the creation of hexagonal tiles. The hexagonal tiles calculated are similar to the ones presented in [6], but are not identical in shape.

3.1 The schedule for hexagonal tiling (2D case)

Let us first consider a two-dimensional iteration space. To obtain such a schedule we start from the diamond tiling approach, which means we first calculate a set of tiling hyperplanes, transform the index space with these hyperplanes and then apply rectangular tiling in the transformed space. We then (optionally) transform the rectangular tiling by stretching the rectangular tiles along the concurrent start hyperplane. The stretched rectangular tiles in the transformed space form hexagonal tiles in the original space. As a result we have a single schedule that describes diamond tiling, if tiles are stretched by a vector of length zero, and hexagonal tiling, if they are stretched by a non-zero-length vector.

In the following description, we assume that the tiling hyperplanes $h_0$, $h_1$ are computed by the diamond tiling algorithm as described in [2]. We focus on the description of the (possibly) stretched tiling scheme in the transformed space.
As input for the stretched tiling scheme, we take the tile sizes $s_0$, $s_1$ as well as a vector $v = (v_0, v_1)$, which is parallel to the concurrent start hyperplane (in the transformed space). We also require that the direction vector of the concurrent start hyperplane $n = (n_0, n_1)$ is strictly positive in all components, as guaranteed by the algorithm of [2].

We first model diamond tiling using a standard 2D rectangular tiling in the transformed space. In this tiling the symbols $s_0$, $s_1$ define the tile sizes along the dimensions $d_0$, $d_1$ while $T_0$, $T_1$ are the resulting tile schedule dimensions (we ignore the point schedule dimensions, as this mapping is not interesting for this discussion). The following map describes such a rectangular tiling.

$(d_0, d_1) \to (T_0, T_1) :$
$\quad s_0 T_0 \le d_0 < s_0 (T_0 + 1) \;\land$
$\quad s_1 T_1 \le d_1 < s_1 (T_1 + 1)$

Our goal is to achieve and maintain concurrent start using the default wavefront. Consequently $s_0$ and $s_1$ cannot be chosen freely (see Section 2.4). We require the user to choose tile sizes that ensure concurrent start. Figure 6 illustrates the above rectangular tiling for a diamond tiling transformation with tile size $s_1 = 3$. The red tiles show the concurrent start wavefront.

Starting from this rectangular tiling we want to stretch the contained tiles by a vector $v$ with components $v_0$, $v_1$, where $v$ is parallel to the concurrent start hyperplane. In principle, $v$ can have either of two possible directions, but to simplify the schedule formulation we choose $v$ such that $v_0 < 0 \land v_1 > 0$. Figure 7 shows the result of such a stretching.

Before we implement the actual stretching, we first add two additional constraints to each tile. The first one bounds each tile at its lexicographic minimal point with the concurrent start hyperplane, the second one bounds each tile at its lexicographic maximal point with the same (but translated) hyperplane. We implement the lower boundary by placing the hyperplane at the origin and by offsetting it for each tile
according to the tile sizes. To offset the hyperplane for each tile we adjust the right-hand side of the lower bound by $n_0 s_0 T_0$ and $n_1 s_1 T_1$. The upper boundary is implemented by reversing the lower hyperplane. The location of the upper hyperplane for tile $(T_0, T_1)$ is the origin of tile $(T_0 + 1, T_1 + 1)$.

[Figure 7: Hexagonal tiling. Panel (a): original space.]

$(d_0, d_1) \to (T_0, T_1) :$
$\quad s_0 T_0 \le d_0 < s_0 (T_0 + 1) \;\land$
$\quad s_1 T_1 \le d_1 < s_1 (T_1 + 1) \;\land$
$\quad n_0 s_0 T_0 + n_1 s_1 T_1 \le n_0 d_0 + n_1 d_1 \;\land$
$\quad n_0 d_0 + n_1 d_1 < n_0 s_0 (T_0 + 1) + n_1 s_1 (T_1 + 1)$

As a last step, we now stretch the tiles along $v$. This requires us to increase the size of the rectangular tiles by $v_0$ in the $d_0$ dimension and $v_1$ in the $d_1$ dimension. We also account for the shifted positions of the rectangular tiles by adding offsets $o_0$, $o_1$ to the upper and lower tile boundaries; these offsets are derived later in this section. Finally we adjust the locations of the concurrent start planes by using $c_0 = n_0 (s_0 + v_0) + n_1 v_1$ and $c_1 = n_1 (s_1 + v_1) + n_0 v_0$.

$(d_0, d_1) \to (T_0, T_1) : \exists\, o_0, o_1 :$
$\quad o_0 = v_0 T_0 + v_0 T_1 \;\land$
$\quad o_1 = v_1 T_0 + v_1 T_1 \;\land$
$\quad s_0 T_0 + o_0 + v_0 \le d_0 < s_0 (T_0 + 1) + o_0 \;\land$
$\quad s_1 T_1 + o_1 \le d_1 < s_1 (T_1 + 1) + v_1 + o_1 \;\land$
$\quad c_0 T_0 + c_1 T_1 \le n_0 d_0 + n_1 d_1 \;\land$
$\quad n_0 d_0 + n_1 d_1 < c_0 (T_0 + 1) + c_1 (T_1 + 1)$

Figure 8 illustrates the last step in detail. On the left side we see in red the original square tiles (0,0), (1,0) and (1,1). On the right side, we see the tiles with the same tile numbers, but stretched along $v$. We can see that the rectangular tile shapes have been extended by $|v_0|$ along $d_0$ and by $v_1$ along $d_1$, resulting in the light blue tile shapes (the dark blue tile shapes illustrate the contained integer points). We can also see that the position of the red tile shape of tile (0,0) has not moved. However, when going one step up to tile (1,0), which means increasing the tile number $T_0$ by one, we offset the tile by $v_0$ along $d_0$ as well as $v_1$ along $d_1$. Similarly, when going from tile (1,0) to tile (1,1), which means increasing the tile number $T_1$ by one, we offset the tile by $v_0$ along $d_0$ and $v_1$ along $d_1$. Combined this yields the offset $o_0 = v_0 T_0 + v_0 T_1$ for $d_0$ and $o_1 = v_1 T_0 + v_1 T_1$ for $d_1$.
The new values $c_0$ and $c_1$ now also take into account the offset of the plane. When varying $T_0$ we now not only need to take the vertical tile size $s_0$ into account, but in addition we include the additional vertical offset $v_0$ as well as the changed horizontal offset $v_1$. To support concurrent start hyperplanes of different orientations such offsets are scaled by the relevant components of $n$. The corresponding changes have been added when adjusting $c_1$.

A very important observation to make is that the tiles $(T_0, T_1)$ as well as $(T_0 + 1, T_1 + 1)$ have overlapping rectangular tiles. However, the concurrent start hyperplanes that have been added right at the position of $v$ ensure that the tiles are non-overlapping and still tile the full space. Also, as our stretching and translation was only along the concurrent start hyperplane, no dependences have been violated. Finally, if the previous tiling had concurrent start, stretching along the concurrent start hyperplane preserves this property.

3.2 Hexagonal tiling for higher dimensions

To extend our unified hexagonal tiling to higher dimensional kernels we use a shape derived from a truncated octahedron [4] to create a tiling for one time and two space dimensions, that not only provides two dimensions of parallelism, but that also gives the freedom to adjust the size of the tile shape independently for the different dimensions. Figure 9b illustrates such a tiling. In the illustration the time dimension goes upwards whereas the space dimensions go to the lower left and the lower right corner of the rendering. The hyperplane orthogonal to the time dimension is the concurrent start hyperplane. Tiles of the same color are executed at the same time step. As visible in the figure, the tiles of a single color are within a hyperplane parallel to the concurrent start hyperplane. The individual tiles of a single color are independent and can be executed in parallel. There is parallelism along both space dimensions. All tiles share a single tile shape.
The hexagonal tiling is derived from the diamond tiling illustrated in Figure 9a (the same example as used by Bandishti et al. [2]). At the beginning the peak of all tiles is formed by a single point (illustrated by the red dot on the lower left blue tile of Figure 9a). Similarly to the construction of hexagonal tiling for one space dimension, we then bound each tile at the top and at the bottom by the concurrent start hyperplane and stretch the peak to form a plane. However, for the case of two space dimensions we stretch along three different vectors, all chosen to be parallel to the concurrent start hyperplane and, in addition, to be inside one of the tiling hyperplanes. We illustrate in the blue
[Figure 8: The stretching in the transformed space. Panels: (a) unstretched, (b) stretched.]
[Figure 9: Hexagonal tiling - two space dimensions (3D rendering). Panels: (a) Plain diamond tiling - unstretched, (b) Hexagonal tiling - derived from the tiling in Figure 9a, but stretched along the concurrent start hyperplane.]
[Figure 10: Hexagonal tiling - two space dimensions (individual time steps).]
tile at the lower left of Figure 9b these stretching vectors in red. Figure 10 illustrates that the tiling is space filling. By stretching only within the concurrent start hyperplane no dependences have been violated and concurrent start is preserved.

The graphical illustration and the above claims only give an intuition of this tiling scheme. Additional work is required to understand the construction of such a tiling, its properties and its effectiveness. However, the promise we see is that we can translate the advantages of hexagonal tiling into higher dimensional cases, enabling flexible tile sizes, concurrent start as well as thread-level parallelism along multiple dimensions for higher-dimensional kernels.

4. RELATED WORK

Aside from the already discussed diamond and hybrid-hexagonal tiling [2, 6], there has been a lot of successful research in generating code to efficiently perform stencil computations. There is Pochoir [14], a domain-specific C++ framework, as well as Henretty et al. [7] with a DSL-based approach. Strzodka [12] uses an in-tile wavefront traversal technique to achieve efficient cache use even with tile sizes larger than the available cache memory. All these approaches generate efficient CPU code.

Then, there is a set of general optimizers. PPCG [16] generates parallel CPU and GPU code using classical (time) tiling. It relies on affine transformations to extract parallelism and improve locality, using a variant of the Pluto algorithm [3]. Reservoir Labs' R-Stream is also a reference polyhedral compiler targeting GPUs [10, 15]. Par4All [1] is an open source parallelizing compiler developed by Silkan targeting multiple architectures. The compiler is not based on the polyhedral model, but uses abstract interpretation for array regions, performing powerful interprocedural analysis on the input code.

Finally, there are tools that generate efficient GPU code. Here Holewinski's Overtile [8] and Grosser's split tiling [5] compilers represent, besides [6], the state of the art for the automatic generation of efficient GPU code, relying on overlapped and split tiling, respectively.
5. CONCLUSION

We presented a formulation of hexagonal tiling that combines the benefits of diamond tiling and hybrid-hexagonal tiling. Starting from the published diamond tiling algorithm, we formulated conditions on tile sizes and wavefront coefficients to ensure concurrent start. We also formulated the condition that ensures the same integer point placement across all tiles. And most importantly, we extended the original diamond tiling algorithm to hexagonal tiles. The added flexibility of hexagonal tiles does not only make the choice of tile sizes more flexible but also enables the creation of tiles with a flat summit. Both these features have been shown useful for GPU code generation. Finally, we gave an outlook on our plans to extend this tiling scheme to higher dimensional stencils, an extension that will bring together flexible tile sizes and multiple dimensions of parallelism.

Acknowledgments. This work greatly benefited from regular discussions with Uday Bondhugula. It was partly funded by a Google European Fellowship in Efficient Computing, by the European FP7 project CARP id. 8777, the COPCAMS ARTEMIS project, and award 988 from the U.S. NSF.

6. REFERENCES

[1] Mehdi Amini, Béatrice Creusillet, Stéphane Even, Ronan Keryell, Onig Goubier, Serge Guelton, Janice Onanian McMahon, François-Xavier Pasquier, Grégoire Péan, Pierre Villalon, et al. Par4All: From convex array regions to heterogeneous computing. In IMPACT, 2012.
[2] Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula. Tiling stencil computations to maximize parallelism. In ACM Supercomputing Conf., 2012.
[3] Uday Bondhugula, J. Ramanujam, et al. PLuTo: A practical and fully automatic polyhedral program optimization system. In PLDI, 2008.
[4] H.S.M. Coxeter. Regular and semi-regular polytopes. I. Mathematische Zeitschrift, 46(1):380-407, 1940.
[5] Tobias Grosser, Albert Cohen, Paul H.J. Kelly, J. Ramanujam, P. Sadayappan, and Sven Verdoolaege. Split tiling for GPUs: automatic parallelization using trapezoidal tiles. In GPGPU-6. ACM, 2013.
[6] Tobias Grosser, Albert Cohen, Sven Verdoolaege, P. Sadayappan, and Justin Holewinski.
Hybrd hexagonal/classcal tlng for GPUs. In Internatonal Symposum on Code Generaton and Optmzaton, 1. [7] Tom Henretty, Rchard Veras, Franz Franchett, Lous-Noël Pouchet, J. Ramanujam, and P. Sadayappan. A stencl compler for short-vector smd archtectures. In ICS. ACM, 13. [8] Justn Holewnsk, Lous-Noël Pouchet, and P Sadayappan. Hgh-performance code generaton for stencl computatons on gpu archtectures. In ICS, pages ACM, 1. [9] Srram Krshnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Effectve automatc parallelzaton of stencl computatons. In PLDI, pages 3, 7. [] Allen Leung, Ncolas Vaslache, Benoît Mester, Muthu Baskaran, Davd Wohlford, Cédrc Bastoul, and Rchard Lethn. A mappng path for mult-gpgpu accelerated computers from a portable hgh level programmng abstracton. In GPGPU-3,. [11] G. Smth. Numercal Soluton of Partal Dfferental Equatons: Fnte Dfference Methods. Oxford Unversty Press,. [1] Robert Strzodka, Mohammed Shaheen, Dawd Pajak, and H Sedel. Cache accurate tme skewng n teratve stencl computatons. In Parallel Processng (ICPP), 11 Internatonal Conference on. IEEE, 11. [13] A. Taflove. Computatonal electrodynamcs: The Fnte-dfference tme-doman method. Artech House, 199. [1] Yuan Tang, Rezaul Alam Chowdhury, Bradley C Kuszmaul, Ch-Keung Luk, and Charles E Leserson. The pochor stencl compler. In SPAA. ACM, 11. [1] Ncolas Vaslache, Benot Mester, Muthu Baskaran, and Rchard Lethn. Jont schedulng and layout optmzaton to enable mult-level vectorzaton. In IMPACT, Pars, France, January 1. [1] Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignaco Gómez, Chrstan Tenllado, and Francky Catthoor. Polyhedral parallel code generaton for cuda. ACM TACO, 9():, 13.