

TCAD

Polyhedral Bubble Insertion: A Method to Improve Nested Loop Pipelining for High-Level Synthesis

Antoine Morvan, Steven Derrien, and Patrice Quinton

Abstract: High-Level Synthesis (HLS) allows hardware to be directly produced from behavioral descriptions in C/C++, thus accelerating the design process. Loop pipelining is a key transformation of HLS, as it improves the throughput of the design at the price of a small hardware overhead. However, for small loops, its use often results in poor hardware utilization due to the pipeline latency overhead. Overlapping the iterations of the whole loop nest instead of the innermost loop only is a way to overcome this difficulty, but currently available techniques are restricted to perfectly nested loops with constant bounds, involving uniform dependences only. Using the polyhedral model, we extend the applicability of the nested loop pipelining transformation by proposing a new legality check and a new loop correction technique, called polyhedral bubble insertion. This method was implemented in a source-to-source compiler targeting High-Level Synthesis, and results on benchmark kernels show that polyhedral bubble insertion is effective in practice on a much larger class of loop nests.

Index Terms: High-Level Synthesis, Source-To-Source Transformations, Nested Loop Pipelining, Polyhedral Model, Loop Coalescing

I. INTRODUCTION

REDUCING hardware design time is more than ever a priority for chip vendors. To this end, designers are shifting away from register transfer level descriptions in favor of design flows that operate at higher levels of abstraction. High-Level Synthesis (HLS) addresses this need by enabling hardware components to be directly designed from behavioral specifications in C or C++. There now exist several mature and robust commercial tools [1], [2] that are used in production by major chip makers. However, designs generated by HLS are far from delivering performance comparable to those produced by experts.
This is mainly due to the difficulty for HLS to extract from the source code the information needed to enable some loop transformations. This lack of performance can be overcome by letting the designer either manually drive the HLS tool, or manually expose appropriate structures (data and/or algorithms) directly in the source code. It is our belief that such processes can be automated within a source-to-source optimizing compiler.

The goal of the work reported here is to improve the applicability and the efficiency of nested loop pipelining, also known as nested software pipelining, in C-to-hardware translation tools. The contributions of this research are as follows:
- A fast approximation along with an accurate legality check is described. Given a pipeline latency, it indicates whether pipelining a loop nest enforces the data dependences of a program.
- When the legality check fails, a loop correction algorithm is proposed. It consists in adding, at compile time, so-called wait-state instructions, also known as pipeline bubbles, to make sure that the aforementioned pipelining becomes legal.
- In order to make the loop nest amenable to pipelining, the loop nest is flattened at the source level using an automatic loop coalescing transformation.

These techniques leverage the well-known polyhedral model [3], [4], [5]. Using the high-level representation of loops of this model, these methods are applicable to a much wider class of programs (namely imperfectly nested loops with affine bounds and index functions) than previously published works [6], [7], [8], [9].

Thanks to tools available in the polyhedral model community, these new methods were implemented within a source-to-source compiler. Their applicability was validated on a set of representative kernels, and the trade-offs between the performance improvements provided by the full nested loop pipelining transformation on the one hand, and the area overhead induced by guards that are added to the control code on the other hand, are discussed.

This article is organized as follows.
Section II provides an in-depth description of the problem addressed by this research, and mentions existing approaches. Section III summarizes the principles of program analysis and transformations in the polyhedral framework that are needed to understand this work. The new pipeline legality analysis and the loop correction technique are presented in Sections IV and V. Section VI describes their implementation and provides a quantitative and qualitative analysis of their performance. In Section VII, relevant related work is presented, and the novelty of this contribution is highlighted. Conclusion and future work are described in Section VIII.

II. MOTIVATIONS

The goal of this section is to present and motivate the problem addressed in this research, that is nested loop pipelining. To help the reader understand the contributions of this work, a running toy loop-nest example shown in Figure 1 is used throughout the remainder of this article. This example is a simplified excerpt from the QR decomposition algorithm. It consists of a doubly nested loop operating on a triangular iteration domain (the iteration domain of a loop is the set of values taken by its loop indices).

    /* original source code */
    for(int i=0;i<N;i++) {
      for(int j=0;j<N-i;j++){
        S0: Y[j] = func(Y[j]);
      }
    }

    (a) Simplified QR loop    (b) Graphical representation of (a)

Fig. 1: A simplified QR decomposition loop (a) and the representation of its iteration domain and of the data-dependences of the Y array (solid arrows) for N = 5 (b). The red dashed arrow shows the execution order of the loop.

A. Loop pipelining in HLS

Loop pipelining consists in executing the body of a loop using several pipeline stages. The effectiveness of this transformation comes from the fact that several loop iterations can be executed simultaneously by the different stages. To produce an equivalent loop, one must however make sure that the executions of successive iterations are independent. Loop pipelining is characterized by two important parameters:
- The initiation interval (denoted by $\Phi$ in the following) is the number of clock cycles separating the execution of two successive loop iterations.
- The pipeline latency (denoted by $\Delta$) gives the number of clock cycles required to completely execute one iteration of the loop. The latency usually corresponds to the number of stages of the pipeline.

In the example of Figure 1, the reader can observe that the inner loop (along the $j$ index) exhibits no data-dependences between calculations done at different iterations (also called loop-carried dependences). As a consequence, one can pipeline the execution of this loop by overlapping the execution of several iterations of its inner loop. As an illustration, Figure 2 depicts the pipelined execution of the example of Figure 1 with an initiation interval $\Phi = 1$ and a latency $\Delta = 4$.

[Figure 2: pipeline diagram. Vertical axis: Stage 1 to Stage 4; horizontal axis: t (clock cycles). Each inner loop execution is framed by an initialization phase and a flush phase.]

Fig. 2: Representation of the pipelined execution of the simplified QR decomposition loop of Fig. 1, for size parameter N = 5, initiation interval $\Phi = 1$ and pipeline latency $\Delta = 4$. Arrows represent dependences between operations.
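The cycle count implied by the two parameters $\Phi$ and $\Delta$ can be sketched in C as follows. This is only an illustrative model, assuming that an inner loop of n iterations starts its last iteration at cycle $\Phi(n-1)$ and is fully flushed before the next outer iteration begins; the function names are hypothetical and do not come from any HLS tool.

```c
#include <assert.h>

/* Cycles needed to run n iterations of a pipelined loop with initiation
 * interval phi and latency delta, including the final flush: the last
 * iteration starts at cycle phi*(n-1) and needs delta cycles to complete. */
long pipelined_loop_cycles(long n, long phi, long delta)
{
    assert(n >= 0 && phi >= 1 && delta >= 1);
    return n == 0 ? 0 : phi * (n - 1) + delta;
}

/* Total cycles for the triangular nest of Fig. 1 when only the inner loop
 * (N - i iterations) is pipelined and flushed after each outer iteration. */
long inner_pipelined_nest_cycles(long N, long phi, long delta)
{
    long total = 0;
    for (long i = 0; i < N; i++)
        total += pipelined_loop_cycles(N - i, phi, delta);
    return total;
}

/* Number of operations S0(i, j) in the triangular iteration domain. */
long triangular_ops(long N)
{
    return N * (N + 1) / 2;
}
```

Under these assumptions, N = 5, $\Phi = 1$ and $\Delta = 4$ give 30 cycles for 15 operations, that is one operation every two cycles on average.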
In practice the value of the initiation interval $\Phi$ is constrained by two factors:
- the presence of loop-carried dependences, which prevent loop iterations from being completely overlapped;
- resource constraints on the available hardware, since for a completely pipelined execution, each operation executed in the loop has to be mapped on its own hardware functional unit.

In this example, notice that between two iterations of the external loop, there is a flush phase which is needed to prevent dependences between these iterations from being violated. We shall see in this paper how this flush phase can be avoided, leading to more efficient pipelined implementations.

Because it helps maximize the computation throughput and because it improves hardware utilization, loop pipelining is a key transformation of high-level synthesis. Besides, as designers generally seek to get the best performance from their implementation, fully pipelining the loop, that is initiating a new inner loop iteration every cycle by choosing $\Phi = 1$, is a very common practice.

However, the performance improvements obtained through pipelining are often hindered by the fact that these tools rely on very imprecise data-dependency analysis algorithms, and hence they may fail to detect when a pipelined execution is possible, especially when the inner loop involves complex memory access patterns. To help designers cope with these limitations, most tools offer the ability to bypass part of the dependency analysis using compiler directives (generally in the form of #pragma). These directives force the tool to ignore user-specified memory references in its dependency analysis. Of course, this possibility comes at the risk of generating an illegal pipeline and then an incorrect circuit, and hence it puts on the designer the burden of deciding whether the transformation is legal or not.

B. The Pipeline Latency Overhead

For loops with large iteration counts (the iteration count of a loop is the number of iterations it executes), the impact of the pipeline latency on performance can be neglected, and the hardware is then almost 100% utilized.
However, whenever the iteration count of the loop becomes comparable to its latency, one may observe a very significant performance degradation, as the pipeline flush phases dominate the execution time. This is the case in the example of Figure 2. For values N = 5 and $\Delta = 4$, the hardware utilization rate is only 50%. On this example, experienced designers would certainly have reached a hardware utilization close to 100% using a handcrafted schedule in which the executions of successive iterations of the outer loop are overlapped.

C. Nested loop pipelining

Initially proposed by Doshi et al. [6], nested loop pipelining is a means of improving the pipelined execution of a loop. It is the method considered in this article, and as done in other works [10], it was chosen here to apply it in two steps:
- first rewrite the loop nest to be pipelined so that it becomes a single-level loop; this is called loop coalescing;
- then pipeline the coalesced loop.

The goal of loop coalescing (also known as loop flattening) is to transform the control of the loop so that a single loop scans the original loop nest domain. Different versions of this transformation are discussed in Section V-C. Loop coalescing can be done independently of the pipeline transformation.

It is worth noticing that until now, nested loop pipelining was only studied either for perfectly nested loops with constant bounds and uniform dependences (a very restrictive subset of loop nests), or with relatively imprecise dependency information, and this significantly restricts its applicability and its efficiency.

While these restrictions may seem over-precautious, it happens that implementing nested loop pipelining (and more particularly enforcing its correctness) is far from trivial and requires a lot of attention. As an example, Figure 3 shows a coalesced version of the loop nest of Figure 1. Here, because the array accesses in the coalesced version are difficult to analyze (they do not depend on loop indices as in Figure 1), one would be tempted to bypass some of the dependency analysis through a compiler directive (#pragma ignore_mem_depcy Y) to force loop pipelining, as explained in Subsection II-A. Without such a directive, the conservative dependence analysis would forbid pipelining.

    i=0; j=0;
    while(i<N) {
      #pragma ignore_mem_depcy Y
      S0: Y[j] = func(Y[j]);
      if(j < N-i-1) j++;
      else j=0,i++;
    }

[Figure 3: pipeline diagram. Vertical axis: Stage 1 to Stage 4; horizontal axis: t (clock cycles). A single initialization phase and a single flush phase frame the whole execution.]

Fig. 3: Illustration of an illegal nested loop pipelining. The example shown is a coalesced version of the simplified QR decomposition loop, for N = 5, $\Phi = 1$ and $\Delta = 4$. Thick red arrows show violated dependences.
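As a sanity check of the coalescing step alone, the following C sketch compares the traversal of the original nest of Figure 1 with that of the coalesced loop of Figure 3. It is a brute-force illustration with hypothetical helper names and a fixed maximum size of 64, not part of the compiler described in the paper; it says nothing about pipelining legality.

```c
#include <assert.h>

/* Visit the triangular domain with the original loop nest of Fig. 1 and
 * record the visited (i, j) pairs; returns the number of iterations. */
int scan_nested(int N, int out[][2])
{
    int k = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N - i; j++) {
            out[k][0] = i; out[k][1] = j; k++;
        }
    return k;
}

/* Same traversal with a single coalesced loop, as in Fig. 3. */
int scan_coalesced(int N, int out[][2])
{
    int k = 0, i = 0, j = 0;
    while (i < N) {
        out[k][0] = i; out[k][1] = j; k++;
        if (j < N - i - 1) j++;       /* stay in the current row */
        else { j = 0; i++; }          /* jump to the next row */
    }
    return k;
}

/* Returns 1 when both versions visit the same points in the same order. */
int same_traversal(int N)
{
    assert(N >= 0 && N <= 64);
    static int a[64 * 65 / 2][2], b[64 * 65 / 2][2];
    int na = scan_nested(N, a), nb = scan_coalesced(N, b);
    if (na != nb) return 0;
    for (int k = 0; k < na; k++)
        if (a[k][0] != b[k][0] || a[k][1] != b[k][1]) return 0;
    return 1;
}
```

The coalesced loop keeps the sequential order intact; what changes is that the dependence between Y accesses is no longer visible from the loop structure.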
While at first glance this scheduling seems correct, it appears that some Read-after-Write dependences are violated when $i \geq 3$, as shown in Figure 3. Indeed, the dependency between two successive iterations of the outer loop prevents the end of one inner loop pipeline from being overlapped with the beginning of the next one. For example, the memory read operation on Y[0] of ($i = 3$, $j = 0$) scheduled at t = 12 happens before Y[0] is updated by the write operation of ($i = 2$, $j = 0$), also scheduled at t = 12 on the last stage.

As an illustration of this difficulty, among the numerous commercial and academic C-to-hardware tools that the authors have evaluated, only one actually provides the ability to perform automatic nested loop pipelining. (This tool is called the reference HLS tool in the following, RHLS for short.) However, its implementation suffers from severe flaws and generates illegal schedules whenever the domain has non-constant loop bounds. From what the authors understand, even without directives to ignore data dependences, RHLS fails for the very same reasons as depicted in Figure 3, that is, its analysis assumes that the dependences carried by the outer loop over the Y array are never violated.

D. Contributions of this work

In the following sections, a formalization of the conditions under which nested loop pipelining is legal w.r.t. data dependences is provided, in the case of imperfectly nested loops with affine dependences (so-called SCoPs [3]), where exact (i.e. iteration-wise) data-dependency information is available.

In addition to this legality check, a technique to correct an a priori illegal nested pipeline schedule by inserting wait states in the coalesced loop is proposed, in order to derive the most efficient legal pipelined schedule. These wait states correspond to properly inserted bubbles in the pipeline, hence the name polyhedral bubble insertion of this new method.

Finally, to enable experimentation and to remain as vendor-independent as possible, an implementation of the polyhedral bubble insertion in the context of a source-to-source compiler is described.
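The violation described for Figure 3 can be reproduced with a small timing model. The C sketch below is a brute-force illustration under stated assumptions: initiation interval 1, reads in the first pipeline stage, writes in the last of delta stages, and a single dependence from S0(i-1, j) to S0(i, j). The function names are hypothetical.

```c
#include <assert.h>

/* Start cycle of S0(i, j) in the coalesced loop with initiation interval 1:
 * its rank in lexicographic order over the triangular domain of Fig. 1. */
int start_cycle(int N, int i, int j)
{
    return i * N - i * (i - 1) / 2 + j;
}

/* Counts read-after-write violations for the dependence
 * S0(i-1, j) -> S0(i, j) when the coalesced loop is pipelined. */
int count_violations(int N, int delta)
{
    assert(N >= 1 && delta >= 1);
    int violated = 0;
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N - i; j++) {
            int write_cycle = start_cycle(N, i - 1, j) + delta - 1;
            int read_cycle  = start_cycle(N, i, j);
            if (read_cycle <= write_cycle)   /* value not yet written */
                violated++;
        }
    return violated;
}
```

With N = 5 and a latency of 4, S0(2, 0) writes at cycle 12 and S0(3, 0) reads at cycle 12, which is the conflict quoted in the text.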
This implementation can be incorporated as a preprocessing tool to be used ahead of third-party HLS tools.

III. BACKGROUND

In order to perform a precise dependence analysis and, if needed, to realize a cycle-accurate schedule correction, an accurate representation of loops is necessary. In this respect, the polyhedral model is a robust mathematical framework. It also comes with a set of techniques to analyze and transform loops and to regenerate source code. Figure 4 illustrates a standard source-to-source flow within the polyhedral model. This section details each one of these steps.

[Figure 4: flow diagram. Source code -> Extraction + ADA -> polyhedral model -> Scheduling -> polyhedral model -> Code generation -> source code.]

Fig. 4: Overview of a classical source-to-source flow within the polyhedral model. After extracting static control parts (SCoPs) from the source code, the array data-flow analysis (ADA) produces the polyhedral representation of the SCoP. Then scheduling transforms the domain and the execution order of the loop. Finally, code generation produces a loop nest that scans the new domain according to the new execution order.

A. Static Control Parts Detection and Extraction

The polyhedral model is a representation of a subset of programs called Static Control Parts (SCoPs), or alternatively Affine Control Loops (ACLs). Such programs are composed only of loops and conditional control structures, and the only allowed statements are array assignments of arbitrary expressions with array reads. (Scalar variables are viewed as zero-dimensional arrays.) The loop bounds, the conditions and array subscripts have to be affine expressions of loop indexes and parameters. Extracting SCoPs is the first step in an automatic polyhedral flow, as depicted in Figure 4.

Each statement $S$ in a SCoP, surrounded by $n$ loops, has an associated domain which represents the set of values the indices of these loops can take. Let $\mathbb{Z}^n$ denote the set of integral coordinate vectors of dimension $n$. Loop indices are represented by iteration vectors of $\mathbb{Z}^n$. The domain of a statement $S$ is called its iteration domain, and is denoted by $\mathcal{D}_S \subseteq \mathbb{Z}^n$. In SCoPs, $\mathcal{D}_S$ is defined by a set of affine constraints, i.e. the set of loop bounds and conditionals on its indexes, and it is therefore a parameterized polyhedron. In what follows, we call operation a particular iteration of a statement, i.e., a statement with a given iteration vector. Figure 1 shows the graphical representation of such a domain, where each full circle represents an operation. The domain constraints for the only statement $S_0$ of the loop of Figure 1 are defined by:

\[ \mathcal{D}_{S_0} = \{ i, j \mid 0 \leq i < N \wedge 0 \leq j < N - i \}. \]

We shall denote by $S(\vec{v})$ the operation that corresponds to statement $S$ and iteration vector $\vec{v}$.

The polyhedral model is limited to the aforementioned class of programs. This class can however be extended to while loops and data-dependent bounds and indexes, at the price of a loss of accuracy in the dependence analysis [11], [12]. The detection of SCoPs is done by a mere syntactic analysis in the compiler front-end.

B. Array Data-flow Analysis (ADA)

The strength of the polyhedral model is its capacity to allow an iteration-wise dependency analysis on arrays [13]. The goal of dependency analysis is to answer questions such as Q: what operation produced the value being read by the currently executing operation? For example, in the program of Figure 1, what operation wrote the last value of the right-hand side reference Y[j]? Answering such a question is the second part of the first step in the polyhedral flow presented in Figure 4.

Iterations of one statement in a loop nest can be ordered by the lexicographic order of their iteration vectors. Consider two iteration vectors $\vec{a} \in \mathbb{Z}^m$ and $\vec{b} \in \mathbb{Z}^n$. Denote by $\vec{a}[q]$ the $q$-th component of $\vec{a}$, and by $\vec{a}[0..q]$ the left-most sub-vector $(a_0, \ldots, a_q)$ of $\vec{a}$. Then $\vec{a}$ is said to be lexicographically greater than $\vec{b}$, noted $\vec{a} \succ \vec{b}$, iff either $\vec{a}[0] > \vec{b}[0]$ or there exists a value $q \in [1..\min(m, n)]$ such that $\vec{a}[0..q-1] = \vec{b}[0..q-1] \wedge \vec{a}[q] > \vec{b}[q]$.

Notice that iterations of several statements in a loop nest can be ordered by combining the lexicographic order of their iteration vectors with their textual order in the loop. This combination defines the precedence order, noted $\prec$. When considering sequential loop nests, the precedence order is a total order. To simplify matters, iteration vectors can be extended using the textual rank of the statements in the loop body (see Bastoul [3]), so that the precedence order reduces to the lexicographic order of the iteration vectors; this will be assumed in the remainder of this paper. Also, in order to simplify our presentation, it will be assumed, without loss of generality, that a statement in a SCoP has at most one array write reference and one array read reference. With this assumption, read or write array references are uniquely identified by the iteration vector of their operations.
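The lexicographic comparison defined above can be sketched in C as follows. This is a minimal illustration with a hypothetical function name; it assumes that vectors of different lengths are compared on their common leading components only, which is one reading of the definition.

```c
#include <assert.h>

/* Returns 1 when iteration vector a (length m) is lexicographically greater
 * than b (length n), comparing the common leading components only. */
int lex_greater(const int *a, int m, const int *b, int n)
{
    assert(m >= 0 && n >= 0);
    int len = m < n ? m : n;
    for (int q = 0; q < len; q++) {
        if (a[q] > b[q]) return 1;   /* first differing component decides */
        if (a[q] < b[q]) return 0;
    }
    return 0;                        /* equal on the common prefix */
}
```

For the loop nest of Figure 1, this order on (i, j) coincides with the sequential execution order.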
The precedence order allows an exact answer to be given to question Q: the operation that last modified an array reference in the currently executing operation is just the latest write to the same array reference according to the precedence order. In the example of Figure 1, the operation that modified the right-hand side reference Y[j] in operation $S_0(i, j)$ is the same statement of the loop, when it was executed at the previous iteration $S_0(i-1, j)$.

A dependency is represented by a function $d$ that associates to each read the operation that produced the value being read. In our example, $d(S_0(i, j)) = S_0(i-1, j)$. Another way of representing this is to use a relation notation à la Omega [14] (functions can be considered as a special case of binary relations). This is also the standard notation of the ISL library [15] that we shall use extensively in this paper:

\[ d = \{ (i, j) \to (i', j') \mid (i, j) \in \mathcal{D}_{S_0} \wedge (i', j') \in \mathcal{D}_{S_0} \wedge (i', j') = (i - 1, j) \} \qquad (1) \]

Since they represent all the instances of the dependency in a compact polyhedral relation, dependency functions are called polyhedral reduced dependences. The graph representing all the dependences for one SCoP is called the polyhedral reduced dependency graph (PRDG). In the remainder of this paper, $sink(d) \subseteq \mathcal{D}$ denotes the domain of the dependence function $d$, that is to say, the set of array reads on which $d$ can be applied, and $src(d) \subseteq \mathcal{D}$ denotes the range of function $d$, i.e. the set of array writes that it leads to.

In summary, the second part of the first step of the polyhedral model flow, ADA, is to build the PRDG of a SCoP, which is much more involved than the SCoP detection and extraction (see [13] for details on ADA).
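Question Q can be answered by brute force on the running example, which gives a way to check the dependence function of Equ. (1) for a fixed N. The C sketch below replays the loop nest and tracks the last writer of each array cell; it is only an illustration with hypothetical names and a fixed maximum size, not the array data-flow analysis of [13], which works symbolically.

```c
#include <assert.h>

#define MAX_N 64

/* Replays the loop nest of Fig. 1 and returns, through (*src_i, *src_j),
 * the operation that last wrote Y[j] before S0(i, j) reads it.
 * Returns 0 when the value read comes from outside the loop nest
 * or when (i, j) is not in the iteration domain. */
int producer_of_read(int N, int i, int j, int *src_i, int *src_j)
{
    assert(N >= 1 && N <= MAX_N);
    int last_i[MAX_N], last_j[MAX_N], written[MAX_N] = {0};
    for (int ii = 0; ii < N; ii++)
        for (int jj = 0; jj < N - ii; jj++) {
            if (ii == i && jj == j) {        /* the read of interest */
                if (!written[jj]) return 0;
                *src_i = last_i[jj];
                *src_j = last_j[jj];
                return 1;
            }
            last_i[jj] = ii;                 /* S0(ii, jj) writes Y[jj] */
            last_j[jj] = jj;
            written[jj] = 1;
        }
    return 0;
}
```

For every operation with i > 0 in the domain, the producer found this way is S0(i-1, j), in agreement with Equ. (1).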

C. Scheduling

In the polyhedral model, the precedence order is known exactly. Therefore, transformations of the loop execution order, also known as scheduling transformations, can be constrained to enforce data-flow dependences. This is the second step in the polyhedral flow of Figure 4. A one-dimensional, integral schedule $\sigma$ enforces a dependency $d : \{ \vec{a} \to \vec{b} \}$ if $\sigma(\vec{a}) > \sigma(\vec{b})$. More generally, a multidimensional, integral schedule $\sigma$ enforces a dependency if $\sigma(\vec{a}) \succ \sigma(\vec{b})$.

To enforce all dependences, a schedule must meet a set of affine constraints. The intersection of these constraints for all the dependences in the PRDG gives a polyhedron. Not only can some properties be checked on this polyhedron, for example the legality of a given transformation, but one can also automatically compute the space of all possible transformations, in order to find the best one according to some criterion. A considerable amount of work has been done on this topic, and the reader is referred to Feautrier [16] and Pouchet et al. [5] for more details.

As far as loop pipelining is concerned, scheduling is not a central transformation, as one may assume that pipelining is applied to the sequential version of the loop without changing its schedule. However, it is worth mentioning this step, as in some cases it may be useful to change the schedule of a loop in order to reach a better pipelining transformation.

D. Code Generation

The last step of source-to-source transformation within the polyhedral model is to regenerate a sequential code that scans the new domain, as shown in Figure 4. Two approaches to solve this problem dominate in the literature.

1) Loop Nests Generation: The first one was developed by Quilleré et al. [17] and later extended and implemented by Bastoul in the CLooG software [3]. CLooG allows regenerated loops to be guardless, thus avoiding useless iterations at the price of an increase in code size. With the same goal, the code generator in the Omega project [14] tries to regenerate guardless loops, and provides options to find a trade-off between code size and guards.
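The difference between guarded and guardless scanning can be illustrated on the triangular domain of Figure 1. The two C functions below are hand-written for this purpose and are not the output of CLooG or Omega; the function names and the counting of visited iterations are assumptions made for the illustration.

```c
#include <assert.h>

/* Guarded scan: iterate over the N x N bounding box and test membership in
 * the triangular domain {0 <= i < N, 0 <= j < N - i} with a guard.
 * *visited counts all loop iterations; the return value counts the ones
 * that actually execute the statement. */
long scan_with_guard(long N, long *visited)
{
    assert(visited != 0);
    long executed = 0;
    *visited = 0;
    for (long i = 0; i < N; i++)
        for (long j = 0; j < N; j++) {
            (*visited)++;
            if (j < N - i)        /* guard: only these points belong to D */
                executed++;
        }
    return executed;
}

/* Guardless scan: the affine bound N - i is folded into the inner loop. */
long scan_guardless(long N, long *visited)
{
    assert(visited != 0);
    long executed = 0;
    *visited = 0;
    for (long i = 0; i < N; i++)
        for (long j = 0; j < N - i; j++) {
            (*visited)++;
            executed++;
        }
    return executed;
}
```

On this single convex domain the guardless version costs nothing in code size; the size increase mentioned above typically shows up when several statements with different domains have to be scanned by the same nest.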
2) Finite State Machine Generation: The second approach, developed by Boulet et al. [18], aims at generating code without loops. The principle is to determine during one iteration the value of the next iteration vector, until the entire iteration domain has been visited. Rather than generating nested loops, this amounts to deriving a finite state machine that scans the iteration space. To do so, Boulet et al. introduce a $next_{\mathcal{D}}$ function which, given an iteration $\vec{x} \in \mathcal{D}$, provides its immediate successor $next_{\mathcal{D}}(\vec{x})$ in $\mathcal{D}$ according to the lexicographic order. The construction of this function is detailed in Section IV-C.

Since this second approach behaves like a finite-state machine, it is believed to be more suitable for hardware implementation [19], though there is still very little quantitative evidence to back up this claim. We discuss one main aspect of this approach in Section V-C, that is its efficiency for coalescing loops.

IV. LEGALITY CHECK

This section considers the problem of checking that a given nested loop pipelining transformation does not violate dependences. Section IV-A describes the pipeline model and presents the legality condition. Section IV-B shows how this condition can be checked by computing the reuse distance of dependences. Another method, based on the computation of the successors of the iteration points, is described in Section IV-C. Section IV-D explains how to build the set of violated dependences. Finally, the complete algorithm is described in Section IV-E.

A. Pipeline Model and Legality Condition

Let $\Delta$ be the number of stages of the pipeline, i.e., its latency. In our pipeline model, we consider that all the reads are executed during the first stage of the pipeline, and all the writes during its last stage. (These assumptions are not essential to our method, but they simplify the explanations.) Let us call reuse distance of a dependence the number of points in the iteration domain between a source iteration $\vec{x}$ and its sink $\vec{y}$.
Since the execution of the loop follows the lexicographic order on the iteration domain, one can observe that the executions of two successive iteration points are separated by one cycle. Therefore, the number of cycles that separate the execution of the source iteration $\vec{x}$ and that of the sink iteration $\vec{y}$ is equal to their reuse distance. On the other hand, the value produced by the execution of iteration $\vec{x}$ is available $\Delta$ cycles after its beginning, according to the pipeline model. Therefore, the nested loop pipelining does not violate data dependences provided the distance (in number of iteration points) between the production of the value (at iteration $\vec{x}$, the source) and its use (at iteration $\vec{y}$, the sink) is equal to or larger than $\Delta$.

This condition is trivially enforced in one particular case, that is when the loop nest to be pipelined does not carry dependences, i.e. when the loops are parallel. This happens, for example, if the dependences in the loop nest are carried only by the outermost loop. One can then pipeline the $n - 1$ inner loops, and if the pipeline is flushed at each iteration of the outermost loop, the latency does not violate dependences.

To apply the nested loop pipelining transformation on loops that carry dependences, or to pipeline a whole loop nest (as in the example of Figure 3), a deeper analysis is required, and this is just what our legality condition provides.

B. Checking the Legality by Estimating the Reuse Distance

Computing the reuse distance between a source and a sink amounts to counting the minimum number of iterations that separate the source and the sink in the iteration domain. Consider the relation $R$ given by

\[ R = \{ \vec{x} \to \vec{z} \mid \vec{x} \in src(d) \wedge \vec{x} \prec \vec{z} \prec d^{-1}(\vec{x}) \wedge \vec{z} \in \mathcal{D} \} \qquad (2) \]

For a given source point $\vec{x}$, $R$ gives all the iteration points $\vec{z}$ which are lexicographically between $\vec{x}$ and one element of the set $d^{-1}(\vec{x})$, that is, the set of all possible sinks of $\vec{x}$.
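For the running example, the cardinality of the range of R for a given source can be obtained by plain enumeration. The C sketch below does so for the triangular domain of Figure 1 and the dependence of Equ. (1); it is a brute-force stand-in for a fixed N, with hypothetical function names, and not the parametric counting performed by polyhedral libraries.

```c
#include <assert.h>

/* Number of iteration points lying strictly between the source S0(i, j)
 * and its sink S0(i+1, j) in lexicographic order, i.e. the cardinality of
 * the range of R in Equ. (2) for the triangular domain of Fig. 1.
 * Returns -1 if (i, j) is not the source of any dependence. */
int points_between(int N, int i, int j)
{
    if (i < 0 || j < 0 || i + 1 >= N || j >= N - (i + 1))
        return -1;
    int count = 0;
    for (int jj = j + 1; jj < N - i; jj++)   /* rest of row i */
        count++;
    for (int jj = 0; jj < j; jj++)           /* beginning of row i + 1 */
        count++;
    return count;
}

/* Smallest count over all sources; -1 if there is no dependence (N < 2). */
int min_points_between(int N)
{
    assert(N >= 1);
    int best = -1;
    for (int i = 0; i + 1 < N; i++)
        for (int j = 0; j < N - (i + 1); j++) {
            int p = points_between(N, i, j);
            if (best < 0 || p < best) best = p;
        }
    return best;
}
```

The enumeration returns N - i - 1 for every source (i, j) of this example, so the count only depends on the row of the source.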

[Figure 5: two copies of the iteration domain with the dependence d drawn as an arrow. (a) R when (i, j) = (1, 1) and N = 5. (b) R when (i, j) = (N - 1, 0) (for all N > 1).]

Fig. 5: Representation of the range of the relation R (enclosed points) over the domain of the example in Figure 1, given the source operation of d.

$R$ is a parameterized polyhedron which can be computed using the ISL software [15]. Given $R$, the number of points between a source $\vec{x}$ and its closest sink is a parametric multivariate pseudo-polynomial $P$, which depends on the parameters of the domain and on $\vec{x}$ (see [20]). A closed form of $P$ can be computed using the Barvinok library [21]. Finally, one can compute the minimum Bernstein expansion [22] of $P$, which gives a lower bound of this expression. If the resulting bound is greater than $\Delta - 1$, then applying the pipeline is legal.

Example: Starting from the dependency $d$ defined in Equ. (1), we can compute the inverse of $d$ as follows:

\[ d^{-1} = \{ (i, j) \to (i', j') \mid (i, j) \in \mathcal{D}_{S_0} \wedge (i', j') \in \mathcal{D}_{S_0} \wedge (i', j') = (i + 1, j) \} \]

Using Equ. (2):

\[ R = \{ (i, j) \to (i', j') \mid (i, j) \in src(d) \wedge (i', j') \in \mathcal{D}_{S_0} \wedge (i, j) \prec (i', j') \prec d^{-1}(i, j) \} \]

After simplification of this polyhedral relation with a polyhedral library [15], one obtains:

\[ R = \left\{ (i, j) \to (i', j') \;\middle|\; \begin{array}{l} (i' = i \wedge 0 \leq i \wedge 0 \leq j < j' < N - i \wedge j < N - i - 1) \\ \vee\; (i' = i + 1 \wedge 0 \leq i \wedge 0 \leq j' < j < N - i - 1) \end{array} \right\} \]

When $(i, j) = (1, 1)$ and N = 5, the range of $R$ is the highlighted set in Figure 5a, that is $\{ (i' = 1 \wedge 2 \leq j' \leq 3) \vee (i' = 2 \wedge j' = 0) \}$. The number of integer points between a source and a sink in $\mathcal{D}_{S_0}$, according to the dependency $d$, is expressed as follows:

\[ P = card(R) = \{ (i, j) \to (N - i - 1) \} \]

that is 3 when $(i, j) = (1, 1)$ and N = 5. Using the Bernstein expansion, one can compute the minimum value of $P$ over the sources of $d$, for all the possible values of N. As shown in Figure 5b, the minimum is 1, and it is reached for the dependence between source $(N - 2, 0)$ and sink $(N - 1, 0)$. Thus applying nested loop pipelining with $\Delta = 4$ on this loop nest is not ensured to be legal.

The above method is fast, but it does not always give a good estimate of the lower bound. Besides, the result of this analysis does not provide a means to fix the loop if the legality condition is not met.

C. Constructing the $next^{\Delta}_{\mathcal{D}}(\vec{x})$ function

To avoid the drawbacks of the previous method, one can construct a function $next^{\Delta}_{\mathcal{D}}(\vec{x})$ that gives, for a given iteration vector $\vec{x}$, its successor $\Delta$ iterations away in domain $\mathcal{D}$. Then by checking that all the sink iteration vectors $\vec{y} \in d^{-1}(\vec{x})$ verify $\vec{y} \succeq next^{\Delta}_{\mathcal{D}}(\vec{x})$, one is sure that the value produced at iteration $\vec{x}$ is used at the earliest $\Delta$ iterations later.

The $next^{\Delta}_{\mathcal{D}}(\vec{x})$ function can be derived from the $next_{\mathcal{D}}$ function introduced by Boulet et al. [18] in their code generation technique (see Section III-D2). By convention, let $next_{\mathcal{D}}(\vec{x}) = \bot$ when an iteration vector $\vec{x}$ has no successor inside $\mathcal{D}$, and let $next_{\mathcal{D}}(\bot) = \bot$.

Algorithm 1 recalls Boulet et al.'s method, where $dim(\mathcal{D})$ is the number of dimensions of $\mathcal{D}$, $lexmin(succ)$ (given by ISL) provides the lexicographic minimum of the relation $succ$, and $domain(next_{\mathcal{D}})$ denotes the iteration domain on which $next_{\mathcal{D}}$ is applicable. As expressed here, this algorithm computes the $next_{\mathcal{D}}$ function only on the $depth$ innermost loops, and this feature will be used later on to avoid possibly useless computations.

Algorithm 1 Builds the $next_{\mathcal{D}}$ and $next^{\Delta}_{\mathcal{D}}$ functions
    Require: $1 \leq depth \leq dim(\mathcal{D})$
    procedure NEXTBOULET($\mathcal{D}$, $depth$)
        $n \leftarrow dim(\mathcal{D})$
        $next_{\mathcal{D}} \leftarrow \emptyset$
        $R \leftarrow \mathcal{D}$
        for $p = n \to (n - depth)$ do
            $lexgt_p \leftarrow \{ \vec{x} \to \vec{y} \mid \vec{x}[0..p-1] = \vec{y}[0..p-1] \wedge \vec{x}[p] < \vec{y}[p] \}$
            $succ_p \leftarrow \{ \vec{x} \to \vec{y} \mid \vec{x} \in R \wedge \vec{y} \in \mathcal{D} \} \cap lexgt_p$
            $next_{\mathcal{D}} \leftarrow next_{\mathcal{D}} \cup lexmin(succ_p)$
            $R \leftarrow \mathcal{D} - domain(next_{\mathcal{D}})$
        end for
        return $next_{\mathcal{D}}$
    end procedure

    Require: $1 \leq depth \leq dim(\mathcal{D})$
    procedure NEXTPOWER($\mathcal{D}$, $\Delta$, $depth$)
        $next_{\mathcal{D}} \leftarrow$ nextBoulet($\mathcal{D}$, $depth$)
        return $next_{\mathcal{D}} \circ next_{\mathcal{D}} \circ \ldots \circ next_{\mathcal{D}}$    ($\Delta$ times)
    end procedure

Algorithm 1 is best explained by following its operation on the example shown in Figure 6. It starts by generating the function giving the immediate successor on the innermost loop at depth $p$ ($\mathcal{D}_2$ and $p = 2$ in the example of Figure 6). When there is no successor on that innermost dimension, that is when the iterators are along the upper bound of the iteration domain, the algorithm looks for a successor on the next outer dimension, at depth $p - 1$ ($\mathcal{D}_1$ and $p = 1$ on Figure 6).
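This procedure is then repeated until all dimensions of the domain have been scanned by the analysis, or until the limit set by $depth$ is reached. At termination, the remaining point is the lexicographic maximum of the domain, and its successor is $\bot$ ($\mathcal{D}_{\bot}$ and $p = 0$ on Figure 6).

[Figure 6: the iteration domain of Figure 1 split into three regions.
    $\mathcal{D}_2 = \mathcal{D} \cap \{ i, j \mid j < N - i - 1 \}$:  $next_{\mathcal{D}}(i, j) = (i, j + 1)$  (p = 2)
    $\mathcal{D}_1 = \mathcal{D} \cap \{ i, j \mid j \geq N - i - 1, i < N - 1 \}$:  $next_{\mathcal{D}}(i, j) = (i + 1, 0)$  (p = 1)
    $\mathcal{D}_{\bot} = \mathcal{D} \cap \{ i, j \mid i \geq N - 1 \}$:  $next_{\mathcal{D}}(i, j) = \bot$  (p = 0)]

Fig. 6: Expression of the immediate successor (the function $next_{\mathcal{D}}$) for the example of Figure 1. The expression differs according to whether the current iteration is in $\mathcal{D}_1$, $\mathcal{D}_2$ or $\mathcal{D}_{\bot}$.

The domains involved in this algorithm are parameterized polyhedra. Therefore, computing the $next_{\mathcal{D}}$ function can be done using parametric integer linear programming [23], [18]. A solution then has the form of a piecewise quasi-affine function. (Quasi-affine functions are affine functions where division or modulo by an integer constant are allowed.) Since we only need to look a constant number of iterations ahead, the $next^{\Delta}_{\mathcal{D}}$ function is built by compositions of $next_{\mathcal{D}}$.

Example: When $\vec{x} = (i, j)$, the value of $next_{\mathcal{D}_{S_0}}(\vec{x})$ for the example of Figure 6 is as follows:

\[ next_{\mathcal{D}_{S_0}}(i, j) = \begin{cases} (i, j + 1) & \text{if } j < N - i - 1 \\ (i + 1, 0) & \text{elseif } i < N - 1 \\ \bot & \text{else} \end{cases} \]

Composing this relation four times, one obtains the $next^{4}_{\mathcal{D}_{S_0}}(i, j)$ function, which is given by:

\[ next^{4}_{\mathcal{D}_{S_0}}(i, j) = \begin{cases} (i, j + 4) & \text{if } j \leq N - i - 5 \\ (i + 1, 3) & \text{elseif } i \leq N - 5 \wedge j = N - i - 1 \\ (i + 1, 2) & \text{elseif } i \leq N - 4 \wedge j = N - i - 2 \\ (i + 1, 1) & \text{elseif } i \leq N - 3 \wedge j = N - i - 3 \\ (i + 1, 0) & \text{elseif } i \leq N - 4 \wedge j = N - i - 4 \\ (N - 1, 0) & \text{elseif } i = N - 3 \wedge j = 1 \wedge N \geq 3 \\ (N - 2, 0) & \text{elseif } i = N - 4 \wedge j = 3 \wedge N \geq 4 \\ \bot & \text{else} \end{cases} \qquad (3) \]

For example, when N = 5 (the chosen parameter) and $(i, j) = (1, 1)$, the 4th line of Equ. (3) is active ($j = N - i - 3$ and $i \leq N - 3$). Therefore the expression of the successor 4 iterations away is $(i + 1, 1) = (2, 1)$, which can be checked on Figure 7a.

[Figure 7: two copies of the iteration domain for N = 5. (a) Using the 4th line of Equ. (3), $next^{4}_{\mathcal{D}_{S_0}}(1, 1) = (2, 1)$. (b) Using the 6th line of Equ. (3), $next^{4}_{\mathcal{D}_{S_0}}(2, 1) = (4, 0)$.]

Fig. 7: Representation of the function $next^{4}_{\mathcal{D}_{S_0}}$ on the example of Figure 1 when N = 5, for two example operations.

D. Building the Violated Dependency Set

As mentioned previously, a given dependency $d$ is enforced by the nested loop pipelining transformation iff, for all $\vec{x} \in src(d)$ and all $\vec{y} \in d^{-1}(\vec{x})$, we have $\vec{y} \succeq next^{\Delta}_{\mathcal{D}}(\vec{x})$.

A consequence of this condition is that if $next^{\Delta}_{\mathcal{D}}(\vec{x}) = \bot$, that is when the successor $\Delta$ iterations later is out of the iteration domain, then the dependency $d$ will be violated by the pipelined execution, because at least one sink of $\vec{x}$ will get the value computed by $\vec{x}$ too late due to the pipeline latency.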
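The successor function of the example and the condition just stated can be checked by direct enumeration. In the C sketch below, `iter2d`, `next_iter`, `next_power` and `violates` are hypothetical names; the code hard-wires the triangular domain of Figure 1 and the dependence of Equ. (1), and composes the successor function explicitly instead of deriving a closed form as a polyhedral library would.

```c
#include <assert.h>

/* An iteration point; valid == 0 encodes the "no successor" value. */
typedef struct { int i, j, valid; } iter2d;

/* Immediate successor of x in the triangular domain of Fig. 1. */
iter2d next_iter(int N, iter2d x)
{
    iter2d bottom = {0, 0, 0};
    if (!x.valid) return bottom;                    /* next(bottom) = bottom */
    if (x.j < N - x.i - 1) { x.j++; return x; }     /* same row */
    if (x.i < N - 1) { x.i++; x.j = 0; return x; }  /* start of next row */
    return bottom;                                  /* lexicographic maximum */
}

/* Successor delta iterations away, obtained by composing next_iter. */
iter2d next_power(int N, iter2d x, int delta)
{
    assert(delta >= 0);
    for (int k = 0; k < delta; k++)
        x = next_iter(N, x);
    return x;
}

/* Returns 1 when source (i, j) violates the dependence towards its sink
 * (i+1, j): the sink precedes next_power(x), or next_power(x) leaves D. */
int violates(int N, int i, int j, int delta)
{
    if (i < 0 || j < 0 || i + 1 >= N || j >= N - (i + 1))
        return 0;                                   /* not a source of d */
    iter2d x = {i, j, 1};
    iter2d s = next_power(N, x, delta);
    if (!s.valid) return 1;
    return (i + 1 < s.i) || (i + 1 == s.i && j < s.j);
}
```

For N = 5 and a latency of 4, the two compositions shown in Figure 7 are reproduced, and the sources (2, 0), (2, 1) and (3, 0) are reported as violating.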
The observation that $next^{\Delta}_{\mathcal{D}}(\vec{x}) = \bot$ implies a violation allows the set $\mathcal{D}^{\dagger}_d$ of all the source iterations violating the dependency $d$ to be built:

\[ \mathcal{D}^{\dagger}_d = \{ \vec{x} \in src(d) \mid d^{-1}(\vec{x}) \prec next^{\Delta}_{\mathcal{D}}(\vec{x}) \vee next^{\Delta}_{\mathcal{D}}(\vec{x}) = \bot \} \qquad (4) \]

Checking the legality of a nested loop pipelining w.r.t. the dependency $d$ then amounts to checking the emptiness of this parameterized domain, which can be done with ISL [15]. Checking the legality condition for a whole SCoP involves checking the emptiness of the set $\mathcal{D}^{\dagger} = \bigcup_{d \in PRDG} \mathcal{D}^{\dagger}_d$.

Example: Using the inverse of $d$ given in Section IV-B, and the $next^{4}_{\mathcal{D}_{S_0}}(i, j)$ function of Equ. (3), one can then find the domain $\mathcal{D}^{\dagger}_d$ of the source iterations violating the data-dependency $d$ using Equ. (4). In our example, and after simplification of this polyhedral domain with a polyhedral library [15], one obtains:

\[ \mathcal{D}^{\dagger}_d = \{ i, j \mid (i, j) \in \mathcal{D}_{S_0} \wedge N - 4 < i < N - 1 \wedge j < N - i - 1 \} \]

Since $d$ is the only dependency in the loop nest, $\mathcal{D}^{\dagger} = \mathcal{D}^{\dagger}_d$. When one substitutes N by 5 (the chosen value in our example), one gets $\mathcal{D}^{\dagger} = \{ (2, 0), (2, 1), (3, 0) \}$, which is the set of points that causes a dependency violation in Figure 3.

E. The New Legality Check Algorithm

Algorithm 2 presents the legality check for nested loop pipelining. Argument $depth$ represents the number of inner loops on which the legality check is applied. A few explanations are in order. The Bernstein expansion is used as a means to avoid some computations of $next^{\Delta}_{\mathcal{D}}$, since they are costly. Function ehrhart_card(R) is directly provided by the Barvinok library and bernstein_bound_min(P) by ISL. Function $next^{\Delta}_{\mathcal{D}}$ is built according to Algorithm 1. Function restrict(PRDG, depth) removes the dependences of the PRDG that are not carried by the $depth$ innermost loops, by intersecting the PRDG with the lexicographic equality, only for dimensions that are not within the $depth$ innermost loops.

Algorithm 2 Checks nested loop pipeline legality
    procedure BERNSTEINBOUNDING($\mathcal{D}$, $d$)
        $R \leftarrow \{ \vec{x} \to \vec{z} \mid \vec{x} \in src(d) \wedge \vec{z} \in \mathcal{D} \wedge \vec{x} \prec \vec{z} \prec d^{-1}(\vec{x}) \}$
        $P \leftarrow$ ehrhart_card($R$)
        return bernstein_bound_min($P$)
    end procedure

    Require: $1 \leq depth \leq dim(\mathcal{D})$
    procedure LEGALITYCHECK($PRDG$, $\mathcal{D}$, $\Delta$, $depth$)
        $\mathcal{D}^{\dagger} \leftarrow \emptyset$
        $PRDG \leftarrow$ restrict($PRDG$, $depth$)
        for all $d \in PRDG$ do
            $l \leftarrow$ bernsteinBounding($\mathcal{D}$, $d$)
            if $l < \Delta - 1$ then
                $next^{\Delta}_{\mathcal{D}} \leftarrow$ nextPower($\mathcal{D}$, $\Delta$, $depth$)
                $\mathcal{D}^{\dagger}_d \leftarrow \{ \vec{x} \in src(d) \mid d^{-1}(\vec{x}) \prec next^{\Delta}_{\mathcal{D}}(\vec{x}) \vee next^{\Delta}_{\mathcal{D}}(\vec{x}) = \bot \}$
                $\mathcal{D}^{\dagger} \leftarrow \mathcal{D}^{\dagger} \cup \mathcal{D}^{\dagger}_d$
            end if
        end for
        return $\mathcal{D}^{\dagger} = \emptyset$
    end procedure

Finally, notice that this method could be extended to a more general model of pipeline execution, where reads and writes can occur during any stage. $\Delta$ would then have to be computed for each pair of read and write, and moreover, write-after-read and write-after-write dependences would have to be taken into consideration.

V. POLYHEDRAL BUBBLE INSERTION

A legality condition is an important step toward automated nested loop pipelining. But it is possible to do better by correcting a given loop to make nested loop pipelining legal when the legality check fails. Our idea is to determine at compile time an iteration domain where wait states, or bubbles, are inserted in order to stall the pipeline so that the pipelined execution of the loop becomes legal. These bubbles should be inserted between the sources and the sinks of violated dependences.

Two constraints are imposed on this correction method. First, bubbles are inserted in the domain scanned by the coalesced loop, not in the original loop nest. The reason is related to the behavior of most HLS tools, whose aggressive optimization techniques are less likely to discard bubbles placed on the coalesced loop, which would remove their effect. (This technicality could probably be overcome by introducing some kind of NOP instruction that the HLS tool would not optimize.) Second, this correction mechanism is restricted to loop nests where at least the innermost loop can be pipelined without bubble insertion. In such loop nests, the violated dependences are not carried by the innermost loop, and one can add the bubbles at the end of the innermost loop, only for iterations that are the source of a violated dependency. It turns out that the corrected loop is then quite simple.
On the contrary, and although it is perfectly possible to correct other kinds of loops, experience has shown us that the resulting corrected loop contains a large number of new guards, which makes an improvement very unlikely. The key question is to determine how many bubbles are actually required to fix the loop, as adding a single bubble in a loop may incidentally fix several violated dependences. In the following, two solutions to this problem are proposed: simple padding, and optimized padding.

    i=0; j=0;
    while(i<N) {
      #pragma ignore_mem_depcy Y
      if(j<N-i) S0: Y[j] = func(Y[j]);
      if((i>N-4&&j<N-i+2&&i<N-1) || j<N-i-1) j++;
      else j=0,i++;
    }
Fig. 8: Illustration of simple padding for N = 5 and $\Delta$ = 4. White points correspond to inserted bubbles when $D^{\dagger}$ is padded with $\Delta - 1 = 3$ bubbles.

A. Simple Padding

In a previous work [24], the solution proposed was to pad every inner loop containing an iteration in $D^{\dagger}$ with $\Delta - 1$ bubbles. As a matter of fact, this amounts to recreating the whole epilogue of the pipelined loop, but only for the outer loops that actually need it. Although simple, this approach turns out to be too conservative. Indeed one can notice that padding $D^{\dagger}$ only, instead of the whole inner loops enclosing it, still adds a sufficient (and smaller) number of bubbles. How to build this set of bubbles is described in Algorithm 3, and the result is illustrated in Figure 8.

Algorithm 3 Builds the set of polyhedral bubbles
  Require: $size \ge 0$
  procedure PAD(D, size)
    $n \leftarrow \dim(D)$
    $R \leftarrow \{ \vec{x} \to \vec{y} \mid \vec{y}[0..n-1] = \vec{x}[0..n-1] \wedge \vec{x}[n] \le \vec{y}[n] \le \vec{x}[n] + size \}$
    return R(D)
  end procedure
  Require: $D^{\dagger} \subseteq D$
  procedure BUBBLESV1(D, $D^{\dagger}$, $\Delta$)
    $B \leftarrow \emptyset$
    for all $D^{\dagger}_d \in D^{\dagger}$ do
      $B \leftarrow B \cup$ pad($D^{\dagger}_d$, $\Delta - 1$)
    end for
    return $B \setminus D$
  end procedure

Nevertheless, this method is still too conservative. For example, the reader will notice that the inner loop iterations in $D^{\dagger}$ for index i = 2 in the example of Figure 8 do not actually need 2 bubbles, but only one.

B. Optimized Padding

The idea behind this second method is to build the set of bubbles while doing the legality check, and to pad every inner loop in D enclosing $D^{\dagger}$ with the exact number of iterations required to preserve the dependency when applying nested loop pipelining.
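The gap between Algorithm 3 and the exact requirement can be replayed by brute force on the running example (N = 5, latency 4). The following sketch is a model on explicit point sets rather than polyhedra. It takes the violated set {(2,0), (2,1), (3,0)} from the text as input, and it assumes, for this triangular domain only, that the reuse distance of the dependence from (i, j) to (i + 1, j) equals the row length N - i. All function names are hypothetical.

```python
def domain(n):
    """Explicit list of the points of D = {(i, j) | 0 <= i < n, 0 <= j < n - i}."""
    return [(i, j) for i in range(n) for j in range(n - i)]


def pad(points, size):
    """PAD of Algorithm 3 on a point set: extend each point by 0..size along j."""
    return {(i, j + k) for (i, j) in points for k in range(size + 1)}


def bubbles_v1(n, violated, delta):
    """BUBBLESV1 of Algorithm 3: pad the violated sources, then remove real iterations."""
    return sorted(pad(violated, delta - 1) - set(domain(n)))


def needed_per_row(n, delta):
    """Minimum number of bubbles per row i so that every sink is delta iterations away.

    Assumption: the reuse distance from (i, j) to (i + 1, j) is n - i.
    """
    need = {}
    for i in range(n - 1):  # the last row has no sink
        reuse_distance = n - i
        if reuse_distance < delta:
            need[i] = delta - reuse_distance
    return need
```

On this example `bubbles_v1(5, [(2, 0), (2, 1), (3, 0)], 4)` yields two bubbles in row 2 and two in row 3, whereas `needed_per_row(5, 4)` reports that row 2 needs one bubble and row 3 needs two.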
Consider a dependency between a source $\vec{x} \in D^{\dagger}$ and a sink $\vec{y}$, and let r be the reuse distance associated to these iteration points. One can note that if $r < \Delta$, then the dependence is violated, but inserting at least $\Delta - r$ bubbles will remove the dependency violation. Thus, if the legality check is performed for increasing values of r, $r = 1..\Delta-1$, and if one computes for each value of r the set of source points $D^{\dagger r}$ for which the dependency is violated, it becomes possible to pad the iteration domain with a much more precise number of bubbles, $\Delta - r$. This is exactly what Algorithm 4 describes and what is illustrated in Figure 9. In this example, with $\Delta = 4$, one has $D^{\dagger 1} = \emptyset$, $D^{\dagger 2} = \{(3,0)\}$ and $D^{\dagger 3} = \{(2,0), (2,1)\}$. Therefore padding i = 2 by 1 bubble and i = 3 by 2 bubbles makes the pipeline legal.

    i=0; j=0;
    while(i<N) {
      #pragma ignore_mem_depcy Y
      if(j<N-i) S0: Y[j] = func(Y[j]);
      if((i>N-4&&j<4&&i<N-1) || j<N-i-1) j++;
      else j=0,i++;
    }
Fig. 9: Optimized padding for N = 5 and $\Delta$ = 4. White points correspond to the inserted pipeline bubbles in the iteration domain.

Algorithm 4 Mixed legality check with bubble insertion to build the optimized set of bubbles
  Require: $D_1 \subseteq D_2$
  procedure ENCLOSING($D_1$, $D_2$)
    $n \leftarrow \dim(D_1)$
    return projectOut($D_1$, $n - 1$) $\cap\ D_2$
  end procedure
  Require: $1 \le depth \le \dim(D)$
  procedure BUBBLESV2(PRDG, D, $\Delta$, depth)
    $PRDG \leftarrow$ restrict(PRDG, depth)
    $B \leftarrow \emptyset$
    for all $d \in PRDG$ do
      $l \leftarrow$ bernsteinBounding(D, d)
      if $l < \Delta - 1$ then
        for all $r \in [1..\Delta-1]$ do
          $\mathrm{next}^{r}_D \leftarrow$ nextPower(D, r, depth)
          $D^{\dagger r}_d \leftarrow \{ \vec{x} \in \mathrm{src}(d) \mid d^{-1}(\vec{x}) = \mathrm{next}^{r}_D(\vec{x}) \}$
          if $D^{\dagger r}_d \ne \emptyset$ then
            $B \leftarrow B \cup$ pad(enclosing($D^{\dagger r}_d$, D), $\Delta - r$)
          end if
        end for
      end if
    end for
    return $B \setminus D$
  end procedure

C. The Loop Coalescing Transformation

Once the bubble domain has been computed, the final step consists in regenerating C code for the HLS tool, after coalescing the loop. To do so, the loop nest structure is translated into a software finite state machine expressed as a while loop. Starting from a polyhedral representation of the loop nest, there are two possible approaches for implementing this transformation.

(a) Original sample loop.
    for (i = 0; i < 100; i++)
      for (j = 0; j < 2; j++)
        S(i,j);
(b) CFG-coalesced loop.
    i = 0, j = 0;
    while (i < 100)
      if (j < 2) { S(i,j); j++; }
      else j = 0, i++;
(c) Boulet et al.-coalesced loop.
    i = 0, j = 0;
    while (i < 100) {
      S(i,j);
      if (j < 1) j++;
      else j = 0, i++;
    }
Fig. 10: Difference between (b) CFG- and (c) Boulet et al. loop coalescing. In CFG loop coalescing, whenever j reaches 2, that is 1/3 of the iterations, time is spent on control only. Using the approach of Boulet et al., j never reaches the value 2.

The first approach relies on the CLooG [3] code generator to produce a loop nest that scans the iteration domains (including bubbles). Coalescing can then be done by rewriting the Control Flow Graph (CFG) of the generated loop nest, as is done by Ylvisaker et al. [10]. The main advantage of this approach is its simplicity. (From what we understand, this is the approach followed by RHLS when implementing the nested loop pipelining transformation.) But it has the drawback that the automaton is built from an implicit representation of the iteration domain rather than from its formal representation as a polyhedron. Consequently the resulting automaton may contain extra idle states that do not correspond to an actual iteration of the loop nest. This situation is shown in Figure 10b. Indeed, whenever j reaches the value 2, the then branch of the conditional is not taken, and statement S is not executed. This results in one cycle of outer loop execution spent only for control purposes, that is 1/3 of the total execution time.

The second approach follows the method of Boulet et al. [18], and consists in building a finite state machine directly from the loop nest iteration domain using the $\mathrm{next}_D(\vec{x})$ function introduced in Section IV. The generated code visits the exact loop nest iteration domain. (See the example of Figure 10c, where j never reaches 2.) The drawback of this second approach is that the resulting code tends to be complex in terms of guards, which increases the hardware complexity.

Since the full loop nest need not be coalesced, we combine both generation methods by letting the designer add directives to specify how many inner loops should be coalesced. The CLooG software is then used to generate the outer loops (using the stop option), and the method of Boulet et al. to generate the inner loops.
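The difference between the two coalescing styles of Figure 10 can be checked with a small simulation. The sketch below transcribes the two while loops into Python and counts, for each style, how many iterations execute the statement and how many are spent on control only. The function names and the counters are illustrative additions, not part of the generated code.

```python
def cfg_coalesced(outer, inner):
    """Figure 10b style: one extra control-only iteration per outer iteration."""
    i = j = 0
    executed = idle = 0
    while i < outer:
        if j < inner:
            executed += 1  # statement S(i, j) runs
            j += 1
        else:
            idle += 1      # only the counters are updated
            j = 0
            i += 1
    return executed, idle


def boulet_coalesced(outer, inner):
    """Figure 10c style: the state machine visits exactly the iteration domain."""
    i = j = 0
    executed = idle = 0
    while i < outer:
        executed += 1      # statement S(i, j) runs on every iteration
        if j < inner - 1:
            j += 1
        else:
            j = 0
            i += 1
    return executed, idle
```

For the 100 x 2 nest of Figure 10, the CFG style runs 300 iterations of the while loop of which 100 are control only (one third), while the second style runs exactly 200 iterations.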

VI. EXPERIMENTAL RESULTS

This section describes how the pipeline legality check and the polyhedral bubble insertion are implemented within a compiler framework, and provides qualitative and quantitative evidence showing that they lead to significant performance improvements at the price of a moderate increase in hardware complexity.

A. The GeCoS source-to-source compiler

GeCoS (Generic Compiler Suite) is an open source-to-source compiler infrastructure [25] integrated within the Eclipse framework and entirely based on Model Driven Software Development tools. GeCoS is specifically targeted to HLS back-ends, and it provides built-in support for Mentor algorithmic data types C++ templates. GeCoS also contains a loop transformation framework based on the polyhedral model, which extensively uses third party libraries (ISL [15] for manipulating polyhedral domains and solving parametric integer linear problems, and CLooG [3] for polyhedral code generation). All the transformations presented in this work were implemented within this framework.

B. Benchmark Kernels and Experiment Conditions

To evaluate this approach, a set of application kernels representing good candidates for our legality check and for the bubble insertion algorithms was selected. These kernels were chosen to exercise the robustness and correctness of RHLS when confronted with non-trivial cases. When needed, they were modified so that they would allow pipelining for their innermost loop but still expose loop carried dependences on all their outer loops. As a consequence, nested loop pipelining could not be used blindly, as it might have led to a dependency violation. The benchmark kernels are as follows:

Prodmat: a product of 2 matrices where the dependency of the accumulation is moved to the second loop using a loop interchange transformation.
BBFIR: a block based FIR where the loop nest is skewed to remove the dependences on the innermost loop. It is the only 2-nested loop kernel.
Jacobi: a 2D Jacobi stencil with the same transformation as BBFIR.
FW: an implementation of the Floyd-Warshall algorithm where the loop nest is also skewed.
QRC: a QR Decomposition using CORDIC operators where the original C implementation already shows an innermost loop without dependency.

The benchmark kernels were submitted to GeCoS, for parsing, polyhedral analysis, and application of pipeline source-to-source transformations. The transformed kernels were then processed by RHLS, which produced VHDL code. The resulting VHDL code was then synthesized using Quartus, to target an Altera Stratix IV FPGA. Unless otherwise stated, data types of the kernels were 32 bit fixed point numbers.

Each kernel was assigned a fixed target pipeline latency $\Delta$ in the following way. RHLS was forced, by appropriate directives, to generate a possibly incorrect pipelined hardware description of the kernel, targeting a frequency of 100 MHz on the Altera Stratix IV FPGA. Since the pipeline latency is essentially related to the complexity of the statement, and much less to the pipeline control, this method provides realistic latency values.

    Kernel    Δ    RHLS 1D   RHLS 2D
    Prodmat   4    ok        illegal
    BBFIR     4    fail      prevent
    Jacobi    8    fail      prevent
    FW        3    fail      prevent
    QRC       13   fail      prevent
    (Run-time (ms) columns for PBI 1D, PBI V1 2D/3D and PBI V2 2D/3D: values not recoverable from the source.)

    ok: HLS tool pipelines and it is legal
    prevent: HLS tool does not pipeline and it is effectively illegal
    fail: HLS tool does not pipeline whereas it is legal
    illegal: HLS tool does pipeline whereas it is illegal
TABLE I: Result of RHLS on application kernels, and run-time (Xeon at 2.4GHz) of two versions of the Polyhedral Bubble Insertion (PBI) for the given latency. The analysis for 1 dimension (column 1D) takes exactly the same time for both algorithms, while Algorithm 3 (V1) takes more time than Algorithm 4 (V2) when bubble insertion is required.

C. Qualitative results

The first experiment was to check whether RHLS could pipeline the innermost loop without any directive (1D), and whether it would prevent the second loop from being pipelined (2D). Results for the benchmark kernels are shown in the left part of Table I.
RHLS could only pipeline Prodmat, whose array accesses are very simple (see column 1D). It would also allow the two innermost loops of Prodmat to be pipelined, even though this leads to a dependency violation when the sizes of the matrices are smaller than the data-path latency (see column 2D). For all other kernels, the dependency analysis of RHLS was too conservative, and it failed at pipelining the innermost loop. It also prevented the second loop from being pipelined.

The rightmost part of Table I presents the run time of our methods (PBI, standing for Polyhedral Bubble Insertion). We ran both Algorithm 3 (V1) and Algorithm 4 (V2) on the kernels. Column 1D gives the time needed to check that there is no dependency on the innermost loop (identical for both algorithms). Columns 2D V1 and 2D V2 give respectively the time needed for Algorithm 3 and Algorithm 4 to build the set of bubbles for the second loop, and 3D for the whole loop nest. These run-times depend on the shape of the loop iteration domains, and on the latency, but they are acceptable in the aforementioned context. One can note that V2 is in general much faster than V1. This is because the linear programming problems solved by V2 are simpler than those of V1.

Generating the $\mathrm{next}^{\Delta}_D$ function is very compute intensive. Table II shows that the run-time is still acceptable when the latency and the depth are reasonable. Although the run-time grows exponentially with the latency and depth, the analysis still finishes in extreme cases (last row of the table).

TABLE II: Run-time (Xeon at 2.4GHz) in milliseconds to generate the $\mathrm{next}^{\Delta}_D$ function for several kernels (Prodmat, BBFIR, Jacobi, FW, QRC), with different $\Delta$ and at different depths. (Table values not recoverable from the source.)

TABLE III: Hardware characteristics (ALUT, REG, DSP, Freq. (MHz)) for our nested pipeline implementations (PBI 2D V1, PBI 2D V2 and PBI 3D V2) compared to innermost pipeline (RHLS 1D), for the kernels Prodmat, BBFIR, FW, Jacobi and QRC. (Table values not recoverable from the source.)

To push the methods, Algorithm 4 was applied to QRC, with a target frequency of 200 MHz, a 72 bit data-type, and trying to pipeline the 3 loops. For this extreme scenario, RHLS gave a latency of 67 cycles, and Algorithm 4 was able to build the set of bubbles, with a run-time of 25 min. This shows that the method, although costly, is realistic for quite complex examples.

D. Quantitative results

Inserting bubbles in the loop nest makes the control of the loop more complex, since the bubble domain adds guards to the statements, and constraints to the loop bounds. This extra control code results in additional control hardware in the description generated by HLS tools, and it can reduce the maximum frequency achieved by the logic synthesis. To evaluate the actual impact of these methods, the hardware complexity and the run-time of the algorithms on the benchmark kernels were estimated. Hardware was generated by RHLS using directives to make the tool ignore the dependences known to be false positives.

Table III provides an evaluation of the hardware complexity and frequency. For each problem size, four versions were considered: RHLS 1D generates hardware using RHLS. PBI 2D V1 and PBI 2D V2 are Algorithm 3 and Algorithm 4 respectively, applied on the second level of the loop nest. PBI 3D V2 is Algorithm 4 applied on the whole loop nest.
(Notice that Table III does not contain pipelining results of RHLS, since as explained in subsection VI-C, RHLS failed to detect possible pipelining or would generate illegal pipelines on the kernels.) For each method, Table III displays an estimation of the area cost in terms of registers (REG), adaptive lookup tables (ALUT) and DSP operators (DSP), and it provides an estimation of the maximum frequency of the synthesized design.

The area overhead when inserting bubbles is moderate (less than 25%), except for FW, for which the cost doubles. The reasons for this exception are, first, that FW only involves additions and comparisons of integer values, which cost little compared to the control, hence the large overhead when control is added; second, that the bubble domains of FW contain a number of additional constraints.

The frequency of the generated hardware is lower when the domain of the bubbles is complex compared to the original iteration domain (for Prodmat and FW), and is equivalent when the bubble domain is relatively simple (QRC, Jacobi). This was expected, since additional constraints make the control data-path longer, thus increasing the critical path. For unknown reasons, RHLS achieved a much higher frequency when pipelining the 2 loops in the BBFIR kernel, even though the bubble domain is complex.

Table IV displays the number of cycles required to execute the pipelined kernels, and it provides the wall clock time w.r.t. the maximum frequency in Table III. For each kernel, two problem sizes were considered in order to evaluate the effect of this parameter. As shown by the Ratio column, the number of cycles is in general smaller when using nested loop pipelining, because the approach inserts fewer bubbles than the number of cycles required by a full flush at each iteration of the outer loops. However, when the problem size increases, the loop trip count is high, and the flush overhead decreases. Therefore, the reduction of the number of cycles does not compensate for the lower frequency and the area overhead.
Note that Algorithm 4 (PBI 2D V2) always inserts fewer bubbles than Algorithm 3 (PBI 2D V1), or at worst, the same number (see Prodmat). Moreover, this does not come at the expense of more complex hardware. For QRC with problem size = …, the gain is relatively small. This is because the loop trip count is small compared to the latency, thus the set of inserted bubbles is comparable in size to the number of flush cycles of the single loop pipeline. Results for large problem sizes show that in general, as expected, nested loop pipelining is effective only when the loop count is small compared to the pipeline latency.
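The trend described above can be illustrated with a back-of-the-envelope cycle model. This is a sketch under stated assumptions, not the measurement setup of Table IV: it assumes a rectangular two-level nest with an initiation interval of one, a pipeline flush of latency - 1 cycles, and a single dependence carried by the outer loop whose reuse distance equals the inner trip count. All names are hypothetical.

```python
def bubbles_needed(outer, inner, delta):
    """Bubbles inserted between consecutive outer iterations (assumed reuse distance = inner)."""
    return (outer - 1) * max(0, delta - inner)


def cycles_innermost_only(outer, inner, delta):
    """Innermost loop pipelined, with a full flush at the end of every outer iteration."""
    return outer * (inner + delta - 1)


def cycles_nested(outer, inner, delta):
    """Whole nest pipelined: real iterations, inserted bubbles, and one final flush."""
    return outer * inner + bubbles_needed(outer, inner, delta) + (delta - 1)


def cycle_ratio(outer, inner, delta):
    """Nested-pipelining cycles relative to innermost-only pipelining (smaller is better)."""
    return cycles_nested(outer, inner, delta) / cycles_innermost_only(outer, inner, delta)
```

Under these assumptions, with 100 outer iterations and a latency of 8, an inner trip count of 4 gives 803 cycles against 1100 (a ratio of about 0.73), while an inner trip count of 64 gives 6407 against 7100 (about 0.90): the benefit shrinks as the trip count grows relative to the latency.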

TABLE IV: Performance for different problem sizes with architecture characteristics described in Table III. Columns: Kernel, Problem Size, Version (RHLS 1D, PBI 2D V1, PBI 2D V2, PBI 3D V2), # Cycles, Time (ns), Ratio. Latencies: Prodmat ($\Delta$ = 4), BBFIR ($\Delta$ = 4), FW ($\Delta$ = 3), Jacobi ($\Delta$ = 8), QRC ($\Delta$ = 13). (Table values not recoverable from the source.)

VII. RELATED WORK AND DISCUSSION

This section compares our approach to previous work on loop pipelining and nested loop pipelining.

A. Loop pipelining in hardware synthesis

Earlier work on systolic architectures addressed the problem of fine grain parallelism extraction. Among others, Derrien et al. [8] propose to use iteration domain partitioning to help combine operation-level (pipeline) and loop-level parallelism. A somewhat similar problem is addressed by Teich et al. [9], who propose to combine modulo scheduling with loop-level parallelization techniques. The main limitation of these contributions is that they only support one-dimensional schedules [26], which significantly limits their applicability.

Alias et al. [27] address the problem of generating efficient nested loop pipelined hardware accelerators leveraging custom floating-point data-paths. Their approach (also based on the polyhedral model) consists in finding a parallel hyperplane for the loop nest, and then in deriving a tiling (hyperplanes and tile sizes) chosen in such a way that a pipeline of depth $\Delta$ is legal. This research only targets perfectly nested loops, and it also requires that incomplete tiles be padded to behave like full tiles. Besides, the authors restrict themselves to uniform dependences, so as to guarantee that the reuse distance is always constant for a given tile size. In contrast, our methods are more general and support imperfectly nested loops with non-uniform (i.e. affine) dependences.
In addition, in the case of tiled iteration domains, we can provide a more precise correction (in terms of extra bubbles) that would not require padding all incomplete tiles.

The Compaan/Laura [20] tool set takes another point of view, as it does not try to find a global schedule for the program statements. Instead, each statement of the program is mapped on its own process. Dependences between statements are then materialized as communication buffers, following the so-called polyhedral process network semantics [28]. Because the causality of the schedule is enforced by the availability of data on the channel output, there is no need for taking statement execution latency into account in the process schedule [29]. On the other hand, this approach suffers from a significant hardware complexity overhead, as each statement requires its own hardware controller plus possibly complex reordering memory structures. In our opinion, this research is geared toward task level parallelism rather than toward fine grain parallelism/pipelining.

B. Nested loop software pipelining

Software pipelining has proved to be a key optimization for leveraging the instruction level parallelism available in most compute intensive kernels. Since its introduction by Lam et al. [30], a lot of work has been carried out on this topic. Two directions have mainly been addressed:

Many contributions tried to extend software pipelining applicability to wider classes of program structures, by taking control flow into consideration [31].
The other main research direction focused on integrating new architectural specificities and/or additional constraints when trying to solve the optimal software pipelining problem [32].

Among these numerous contributions, some tackle problems very close to ours. Rong et al. [7] study software pipelining for nested loops. Their goal is to pipeline a loop that is not innermost by using loop interchange, and to merge the flush and initialization parts of the pipeline to reduce the impact of the latency.
This is similar to our research, although they do not target hardware synthesis; moreover, they restrict themselves to a narrow subset of loops (only constant bound rectangular domains) and they do not leverage exact instance-wise dependency information.

Fellahi et al. [33] address the problem of prologue/epilogue merging in sequences of software pipelined loops. Their work is also motivated by the fact that the software pipeline overhead tends to be a severe limitation, as many embedded multimedia algorithms exhibit low trip count loops. Again, our approach differs from theirs in the scope of its applicability, as we are able to deal with loop nests (not only sequences of loops), and as we solve the problem in the context of HLS tools at the source level through a loop coalescing transformation. In contrast, their approach handles the problem at machine code level, which is not possible in our context (source-to-source transformation).

Thanks to the specificities of the Itanium EPIC architecture, Muthukumar et al. [6] are able to control the flush of the pipeline. Their research aims at counting the number of iterations separating the definition of a value from its use. However, their approach is only applicable to bounded loops with uniform dependences. This work also proposes a correction mechanism that partially drains the pipeline when memory dependences may be violated. Since the same number of bubbles is used for all the iterations of the immediate outer loop, this method implies fewer guards, but also more bubbles than our approach.

C. Loop coalescing

Loop coalescing was initially used in the context of parallelizing compilers in order to reduce the synchronization overhead [34]. Indeed, since synchronization occurs at the end of each innermost loop, coalescing loops reduces the number of synchronizations during the program execution. Such an approach has some similarity to ours (indeed, one could see the flush of the innermost loop pipeline as a kind of synchronization operation). However, in our case we can benefit from an exact timing model of the synchronization overhead, which can be used to remove unnecessary synchronization steps.

D. Correcting illegal loop transformations

The idea of correcting a schedule as a post-transformation step is not new; it was introduced by Bastoul et al. [35]. Their idea was to first look for interesting combinations of loop transformations (be they legal or not), and then to try to fix possible illegal schedule instances with loop shifting transformations. Their result was later extended by Vasilache et al. [36], who considered a wider space of correcting transformations. Our work differs from theirs in that we do not propose to modify the existing loop schedule, but rather to add artifact statements to improve the behavior of the loop.

E. Generality of the approach

The technique presented in this work can be applied to a subset of imperative programs known as Static Control Parts.
Some extensions to this model have been proposed to handle dynamic control structures such as while loops and/or non-affine memory accesses [37]. Proposed approaches suggest approximating a non-affine memory index function by a parameterized polyhedral domain (the parameter being used to model the fuzziness introduced by the non-affine array references). As a matter of fact, the technique presented in this work is able to deal with arbitrary (non-affine) memory access functions, by considering a conservative name based data-dependency analysis whenever non-affine index functions are involved in the program. Extending the approach to program constructs where the iteration space cannot be represented as a parametric polyhedron is however likely to be much more challenging.

VIII. CONCLUSION

In this paper, a new technique, called polyhedral bubble insertion, was proposed to support nested loop software pipelining in C-to-hardware synthesis tools. A nested pipeline legality check that can be combined with a compile-time bubble insertion mechanism was described. This bubble insertion allows the causality in the pipelined schedule to be enforced, for a large class of loop nests called SCoPs, thanks to the polyhedral model representation of loops. This technique was implemented as a proof of concept in a source-to-source compiler, and experiments show promising results for nested loops operating on small iteration domains (up to 45% execution time reduction in terms of clock cycles, with a moderate hardware complexity overhead).

This research also demonstrates the potential of source-to-source compilation as a means to overcome the shortcomings of state of the art HLS tools. In particular, source-to-source compilers are very well suited to implementing program transformations using high-level representations such as the polyhedral model.

Future research could go in several directions. First, we believe that these methods could be used in more classical optimizing compiler back-ends, for example for deeply pipelined VLIW architectures with many functional units.
In that case one simply needs to use the value of the loop body initiation interval as additional information to determine which dependences may be violated. Since all the control within the polyhedral model fits into (quasi-)affine expressions, one possible enhancement would be to apply aggressive strength reduction in order to reduce the overhead induced by the extra guards. Another research direction is to investigate the case when dependences cross several iterations of the outer loop, since our optimized bubble insertion has been shown to be suboptimal in this case.

The value of $\Delta$ in the pipeline model is a conservative over-approximation. For example, operations may have different latencies, thus the distance between read and write may differ depending on the operation, resulting in several $\Delta$ values (one per read/write pair). More accurate values for $\Delta$ could, for example, be obtained by analyzing more precisely the schedule provided by the HLS tool. This might result in less conservative pipeline legality conditions.

ACKNOWLEDGMENTS

The authors would like to thank Sven Verdoolaege, Cédric Bastoul and all the contributors to the wonderful pieces of software that are ISL and CLooG. This work was funded by the INRIA STMicroelectronics Nano2012-S2S4HLS project.

REFERENCES

[1] AutoESL Design Technologies.
[2] M. Graphics, Catapult-C Synthesis.

[3] C. Bastoul, "Code Generation in the Polyhedral Model Is Easier Than You Think," in Proceedings of PACT'13, (Juan-les-Pins, France), pp. 7-16, Sept.
[4] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, "PLuTo: A Practical and Fully Automatic Polyhedral Program Optimization System," in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, (Tucson, AZ), ACM, June.
[5] L.-N. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos, "Iterative Optimization in the Polyhedral Model: Part II, Multidimensional Time," in Proceedings of PLDI'08, (Tucson, Arizona), ACM Press, June.
[6] K. Muthukumar and G. Doshi, "Software Pipelining of Nested Loops," in Proceedings of the 10th Int. Conf. on Compiler Construction, CC'01, (London, UK), Springer-Verlag.
[7] H. Rong, Z. Tang, R. Govindarajan, A. Douillet, and G. R. Gao, "Single-Dimension Software Pipelining for Multidimensional Loops," ACM Trans. Archit. Code Optim., vol. 4, March.
[8] S. Derrien, S. Rajopadhye, and S. Kolay, "Combined Instruction and Loop Parallelism in Array Synthesis for FPGAs," in Proceedings of the 14th Int. Symp. on System Synthesis.
[9] J. Teich, L. Thiele, and L. Z. Zhang, "Partitioning Processor Arrays under Resource Constraints," VLSI Signal Processing, vol. 17, no. 1, pp. 5-20.
[10] B. Ylvisaker, C. Ebeling, and S. Hauck, "Enhanced Loop Flattening for Software Pipelining of Arbitrary Loop Nests," tech. rep., University of Washington.
[11] M. W. Benabderrahmane, L.-N. Pouchet, A. Cohen, and C. Bastoul, "The Polyhedral Model Is More Widely Applicable Than You Think," in Proceedings of Int. Conf. on Compiler Construction, Springer.
[12] J. F. Collard, D. Barthou, and P. Feautrier, "Fuzzy Array Dataflow Analysis," in Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM.
[13] P. Feautrier, "Dataflow Analysis of Array and Scalar References," International Journal of Parallel Programming, vol. 20, no. 1.
[14] W. Kelly, W. Pugh, and E. Rosser, "Code Generation for Multiple Mappings," Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation, February.
[15] S. Verdoolaege, "ISL: An Integer Set Library for the Polyhedral Model," in ICMS (K. Fukuda, J. Van Der Hoeven, M. Joswig, and Y. Takayama, eds.), Lecture Notes in Computer Science, (Kobe, Japan), Springer, Sept.
[16] P. Feautrier, "Some Efficient Solutions to the Affine Scheduling Problem. Part II. Multidimensional Time," International Journal of Parallel Programming, vol. 21, no. 6.
[17] F. Quilleré, S. Rajopadhye, and D. Wilde, "Generation of Efficient Nested Loops from Polyhedra," International Journal of Parallel Programming, vol. 28.
[18] P. Boulet and P. Feautrier, "Scanning Polyhedra Without Do-Loops," in Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, (Washington, DC, USA), p. 4, IEEE Computer Society.
[19] A.-C. Guillou, P. Quinton, and T. Risset, "Hardware Synthesis for Multi-Dimensional Time," in Proceedings of ASAP 2003, (The Hague, The Netherlands), IEEE Computer Society, June.
[20] A. Turjan, B. Kienhuis, and E. F. Deprettere, "Classifying Interprocess Communication in Process Network Representation of Nested-Loop Programs," ACM Transactions on Embedded Computing Systems (TECS), vol. 6, no. 2.
[21] S. Verdoolaege, R. Seghir, K. Beyls, V. Loechner, and M. Bruynooghe, "Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions," Algorithmica, vol. 48, no. 1.
[22] P. Clauss and V. Loechner, "Parametric Analysis of Polyhedral Iteration Spaces," The Journal of VLSI Signal Processing, vol. 19.
[23] P. Feautrier, "Parametric Integer Programming," RAIRO Recherche opérationnelle, vol. 22, no. 3.
[24] A. Morvan, S. Derrien, and P. Quinton, "Efficient Nested Loop Pipelining in High Level Synthesis using Polyhedral Bubble Insertion," in Proceedings of Int. Conf. on Field Programmable Technologies (R. Tessier, ed.), pp. 1-10, IEEE.
[25] The GeCoS (Generic Compiler Suite) Source-to-Source Compiler Infrastructure.
[26] P. Feautrier, "Some Efficient Solutions to the Affine Scheduling Problem. I. One-Dimensional Time," International Journal of Parallel Programming, vol. 21, no. 5.
[27] C. Alias, B. Pasca, and A. Plesco, "Automatic Generation of FPGA-Specific Pipelined Accelerators," in Proceedings of Int. Symp. on Applied Reconfigurable Computing, Mar.
[28] S. Verdoolaege, "Polyhedral Process Networks," in Handbook of Signal Processing Systems (S. Bhattacharyya, R. Leupers, J. Takala, and E. Deprettere, eds.), Heidelberg, Germany: Springer, first ed.
[29] C. Zissulescu, B. Kienhuis, and E. F. Deprettere, "Increasing Pipelined IP Core Utilization in Process Networks Using Exploration," in Proceedings of FPL 2004 (J. Becker, M. Platzner, and S. Vernalde, eds.), Lecture Notes in Computer Science, (Leuven, Belgium), Springer, Aug.
[30] M. S. Lam, "Software Pipelining: An Effective Scheduling Technique for VLIW Machines," in Proceedings of PLDI'88.
[31] H.-S. Yun, J. Kim, and S.-M. Moon, "Time Optimal Software Pipelining of Loops with Control Flows," International Journal of Parallel Programming, vol. 31.
[32] C. Akturan and M. F. Jacome, "CALiBeR: a Software Pipelining Algorithm for Clustered Embedded VLIW Processors," in Proceedings of ICCAD'01, (Piscataway, NJ, USA), IEEE Press.
[33] M. Fellahi and A. Cohen, "Software Pipelining in Nested Loops with Prolog-Epilog Merging," in HiPEAC (A. Seznec, J. S. Emer, M. F. P. O'Boyle, M. Martonosi, and T. Ungerer, eds.), Lecture Notes in Computer Science, Springer.
[34] M. T. O'Keefe and H. G. Dietz, "Loop Coalescing and Scheduling for Barrier MIMD Architectures," IEEE Trans. Parallel Distrib. Syst., vol. 4, September.
[35] C. Bastoul and P. Feautrier, "Adjusting a Program Transformation for Legality," Parallel Processing Letters, vol. 15, pp. 3-17, Mar.
[36] N. Vasilache, A. Cohen, and L.-N. Pouchet, "Automatic Correction of Loop Transformations," in Proceedings of the 16th Int. Conf. on Parallel Architecture and Compilation Techniques, PACT'07, (Washington, DC, USA), IEEE Computer Society.
[37] M. Belaoucha, D. Barthou, A. Eliche, and S.-A.-A. Touati, "FADAlib: an Open Source C++ Library for Fuzzy Array Dataflow Analysis," in Proceedings of the Seventh International Workshop on Practical Aspects of High-Level Parallel Programming (PAPP 2010).

Antoine Morvan is a PhD student in Computer Science at Ecole Normale Supérieure of Cachan, antenne de Bretagne. He is also a member of the Cairn research group at IRISA, Rennes. His research interests include High-Level Synthesis, optimizing compilers, and computer aided design.

Steven Derrien obtained his PhD from University of Rennes 1 in 2003, and is now professor at University of Rennes 1. He is also a member of the Cairn research group at IRISA. His research interests include High-Level Synthesis, loop parallelization, and reconfigurable systems design.

Patrice Quinton obtained a degree of Engineer in Computer Science from ENSIMAG (Grenoble, France) in 1972, and a Thèse d'État in Mathematics from the University of Rennes (France) in . He was Directeur de Recherches of the CNRS and head of the VLSI Parallel Architectures group of IRISA in Rennes between 1982 and 1997, and has since been professor at the University of Rennes 1. Patrice Quinton is currently deputy director of the Brittany branch of Ecole Normale Superieure of Cachan and a member of the Cairn research group at IRISA, Rennes. His interests include parallel architectures, VLSI, systolic arrays, computer aided design, and sensor networks. Patrice Quinton is coauthor of one book, and author or co-author of about one hundred journal papers, international conference communications, and book chapters.


More information

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning Computer Anmaton and Vsualsaton Lecture 4. Rggng / Sknnng Taku Komura Overvew Sknnng / Rggng Background knowledge Lnear Blendng How to decde weghts? Example-based Method Anatomcal models Sknnng Assume

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011 9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals

More information

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing EDA222/DIT161 Real-Tme Systems, Chalmers/GU, 2014/2015 Lecture #8 Real-Tme Systems Real-Tme Systems Lecture #8 Specfcaton Professor Jan Jonsson Implementaton System models Executon-tme analyss Department

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Multi-stable Perception. Necker Cube

Multi-stable Perception. Necker Cube Mult-stable Percepton Necker Cube Spnnng dancer lluson, Nobuuk Kaahara Fttng and Algnment Computer Vson Szelsk 6.1 James Has Acknowledgment: Man sldes from Derek Hoem, Lana Lazebnk, and Grauman&Lebe 2008

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information