GLORE: Generalized Loop Redundancy Elimination upon LER-Notation

YUFEI DING, XIPENG SHEN, North Carolina State University, United States

This paper presents GLORE, a novel approach to enabling the detection and removal of large-scope redundant computations in nested loops. GLORE works on LER-notation, a new representation of computations in both regular and irregular loops. Together with a set of novel algorithms, it makes GLORE able to systematically consider computation reordering at both the expression level and the loop level in a unified manner. GLORE shows an applicability much broader than prior methods have, and frequently lowers the computational complexities of some nested loops that are elusive to prior optimization techniques, producing significantly larger speedups.

CCS Concepts: Software and its engineering → General programming languages;

Additional Key Words and Phrases: program optimization, loop redundancy elimination, operation minimization

ACM Reference Format: Yufei Ding, Xipeng Shen. GLORE: Generalized Loop Redundancy Elimination upon LER-Notation. Proc. ACM Program. Lang. 1, OOPSLA, Article 74 (October 2017), 28 pages.

1 INTRODUCTION

Removing redundant computations is an effective way to speed up applications. The traditional approach, loop redundancy elimination, detects and removes computations that are invariant across the innermost loop. Many redundancies, however, span a much larger scope and often remain hidden to prior methods. Detecting them would require some careful large-scope computation reordering and reassociation at both the expression level and the loop level. They are elusive to traditional methods because of the small analysis scope of those methods and their weaknesses in handling irregular loops and complex control flows and dependences.

For instance, Example 1 in Figure 1 (a) shows a code containing a while loop and a nested for loop. If we only focus on the expression in the innermost loop body, we could not find any redundant computation: variable d gets updated in every iteration of the for loop, and w gets updated in every while loop iteration.
However, taking a broader view, we can see that with some large-scope reordering and reassociation of the computations, the entire code is equivalent to the form in Figure 1 (b), in which the reduction loops Σa[i] and Σb[i] are both redundantly recomputed across the while loop. If we take the redundant computations out of the outer-level loop appropriately, we can save many computations and speed up the execution by orders of magnitude. Both the needed large-scope analysis and the presence of the while loop prevent the traditional methods from finding and removing such redundancies.

Authors' Emails: Yufei Ding, yding8@ncsu.edu; Xipeng Shen, xshen5@ncsu.edu. Authors' address: Department of Computer Science, North Carolina State University, Raleigh, North Carolina, 27606, United States, US.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. Association for Computing Machinery. /2017/10-ART74

(a) Example 1

    w = w0;
    while (d > 0.01){
      d = 0;
      for (i = 0; i <= M; i++){
        d += a[i] + b[i] * w;
      }
      w = w * d;
    }

(b) A form equivalent to Example 1

    w = w0;
    while (d > 0.01){
      A = Σ a[i];
      B = Σ b[i];
      d = A + B * w;
      w = w * d;
    }

(c) Example 2

    for (i = 0; i <= M; i++){
      r[i] = 0;
      for (k = 0; k <= i; k++){
        for (j = 0; j <= i; j++){
          r[i] += x[i,j] * y[j,k];
        }
      }
    }

(d) A form equivalent to Example 2

    for (j = 0; j <= M; j++){
      temp[j,0] = y[j,0];
    }
    for (i = 1; i <= M; i++){
      for (j = 0; j <= M; j++){
        temp[j,i] = temp[j,i-1] + y[j,i];
      }
    }
    for (i = 0; i <= M; i++){
      r[i] = 0;
      for (j = 0; j <= i; j++){
        r[i] += x[i,j] * temp[j,i];
      }
    }

Fig. 1. Illustration of large-scope loop redundancies.

Example 2 in Figure 1 (c) shows large-scope redundancies in affine for loops. The code computes the products of elements in two arrays, reduces them along two axes (k and j), and then stores them in a new array r. Again, if we only focus on the expressions in the innermost loop, we could not find any redundant computations, as the expression x[i, j] * y[j, k] computes different values across different loop iterations. But redundancies become visible in a larger scope: switch the order of loops k and j, and the inner loop computes the product of x[i, j] and the prefix sum of y[j, k] along the k dimension (up to i). We can have a separate loop to compute the prefix sum. If we further notice that temp[j, i] equals temp[j, i-1] + y[j, i], the prefix sum can be simplified even further into the first two loops in Figure 1 (d). By reusing the prefix sums, the computation of r is simplified into the bottom loop in Figure 1 (d). The computational complexity reduces to $O(M^2)$ from the original $O(M^3)$.

In both examples, the redundancies require large-scope (across multiple levels of loops) computation reordering to detect and remove. As noticed in numerous studies [Cooper et al. 2008; Deitz et al. 2001; Ding et al. 2017, 2015; Drake and Hamerly 2012; Elkan 2003; Fahim et al. 2006; Goldberg and Harrelson 2005; Greenspan et al.
2000; Gupta and Rajopadhye 2006; Gutman 2004; Hamerly 2010; Ngai et al. 2006; Wang et al. 2012; Wang 2011], such large-scope loop redundancies are common, especially in applications in computational physics, chemistry, data analytics, and other domains that involve relatively complex formulae or algorithms. When translating those complex formulae or algorithms into computer programs, the developers often intuitively follow the formulae or algorithms step by step, producing logically easy-to-understand and practically easy-to-maintain code rather than trying to minimize redundant computations.

There have been some efforts in extending the scope of traditional loop redundancy elimination [Cooper et al. 2008; Deitz et al. 2001; Gupta and Rajopadhye 2006]. They have made some significant contributions towards large-scope redundancy elimination, but they are subject to two major limitations. First, they all have some strict requirements on the forms of the loops. None of them can handle mathematical operations (e.g., sin, mod) or irregular loops with complex control flows (e.g., while loops with breaks) and complicated dependences. Second, almost none of them can systematically consider the combination of loop-level computation reordering and expression-level algebraic reordering, and deal with their interplay. Tensor contraction optimizations [Hartono et al. 2006, 2005] have considered both levels of reordering, but they are designed specifically for tensor contraction in regular for loops with constant loop bounds, and are inapplicable to common loops. As a result, both examples in Figure 1 are elusive to all prior techniques.
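To make the Example 2 discussion concrete, here is a minimal C sketch of the loop nest of Figure 1 (c) next to a prefix-sum form in the spirit of Figure 1 (d). It is an illustration under stated assumptions, not GLORE's generated code: the function names, the flat row-major array layout, and the heap-allocated `temp` buffer are choices made only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Original nest of Figure 1 (c): O(M^3) multiply-adds.
   x and y are (M+1) x (M+1) row-major matrices; r has M+1 entries. */
void example2_original(int M, const double *x, const double *y, double *r)
{
    int n = M + 1;
    for (int i = 0; i <= M; i++) {
        r[i] = 0.0;
        for (int k = 0; k <= i; k++)
            for (int j = 0; j <= i; j++)
                r[i] += x[i * n + j] * y[j * n + k];
    }
}

/* Reordered form: O(M^2). temp[j][i] holds y[j][0] + ... + y[j][i]. */
void example2_reordered(int M, const double *x, const double *y, double *r)
{
    int n = M + 1;
    double *temp = malloc((size_t)n * (size_t)n * sizeof *temp);
    assert(temp != NULL);

    for (int j = 0; j <= M; j++)
        temp[j * n] = y[j * n];
    /* Prefix sums are filled for every row j, so temp[j][i-1] is always defined. */
    for (int i = 1; i <= M; i++)
        for (int j = 0; j <= M; j++)
            temp[j * n + i] = temp[j * n + i - 1] + y[j * n + i];

    /* Each r[i] now reuses the prefix sums instead of re-summing over k. */
    for (int i = 0; i <= M; i++) {
        r[i] = 0.0;
        for (int j = 0; j <= i; j++)
            r[i] += x[i * n + j] * temp[j * n + i];
    }
    free(temp);
}
```

Calling both functions on the same inputs should give the same `r`, up to floating-point reassociation effects when the inputs are not exactly representable.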

(a) Category 1: Loop-invariant expression

    for (i = 0; i <= M; i++){
      result = a * b;
      x[i] = result + y[i];
    }

(b) Category 2: Partially loop-invariant expression

    for (i = 2; i <= M; i++){
      x[i] = y[i-2] + y[i-1] + y[i+1] + y[i+2];
    }

(c) Category 3: Loop-invariant loop

    for (i = 0; i <= M; i++){
      for (j = 0; j <= M; j++){
        for (k = 0; k <= N; k++){
          for (l = 0; l <= N; l++){
            r[i,k] += x[i,l] * y[l,j] * s[j,k];
          }
        }
      }
    }

(d) Category 4: Partially loop-invariant loop

    for (i = 1; i <= M; i++){
      for (j = 1; j <= i; j++){
        y[i] += x[j];
      }
    }

Fig. 2. Examples of the four main categories of loop redundancies.

A key observation made in our work is that the limitations of the prior techniques fundamentally stem from the lack of a proper representation of loops of various forms. For instance, a prior work [Gupta and Rajopadhye 2006] uses a high-level equational form to represent loops. As a result, it cannot accommodate while loops or data dependences. Moreover, it focuses on loop reordering but ignores its interplay with expression-level reordering, causing it to miss many redundancies hidden even in regular loops that it can represent.

This paper presents a new solution that addresses those challenges through the development of GLORE, which stands for generalized loop redundancy elimination. This new method introduces a notation scheme named loop-reduction notation (LER-notation), which provides the first unified symbolic abstraction for systematically conducting computation reordering across both loops and expressions upon the laws of associativity, commutativity, and distributivity. LER-notation equips GLORE with an applicability much broader than prior methods have, covering both regular and irregular loops and applicable to code with complex control flows, complicated dependences, and math operations. At the same time, LER-notation offers a form more friendly for the exploration of both loop and expression reordering.
To translate the increased flexibility to actual redundancy removal, we further propose a set of novel transformations and algorithms, including operand folding, alternating form generation, minimum-union algorithm, a linear-time closure-based algorithm, and so on. These techniques allow GLORE to treat the various loop complexities with ease, and to effectively detect loop redundancies by exploring loop and expression reordering and reassociations in a unified, comprehensive manner.

Experiments on 21 benchmarks from four sources show that GLORE excels in both generality and effectiveness. Working as an end-to-end framework, GLORE is able to detect and remove the most common cases in four major categories of loop redundant computations. Those cases include some large-scope redundancies that have been elusive to prior techniques, on which GLORE gives orders of magnitude speedups by lowering their computational complexities. On loops that prior methods can handle, GLORE produces similar or significantly higher speedups.

2 FOUR CATEGORIES OF LOOP REDUNDANCY

In general, any computation that occurs in multiple iterations of a loop is a loop redundancy. We classify loop redundancies into four major categories according to their granularities and repetition patterns, and GLORE is designed to tackle programs with these four major loop redundancies.

Category 1: Loop-invariant expressions. If an expression's operands and operations are invariant across all iterations of a loop, the expression is a loop-invariant expression, illustrated by a * b in Figure 2 (a).

Category 2: Partially loop-invariant expressions. If an expression is recomputed across some but not all iterations of a loop, that expression is a partially loop-invariant expression of that loop. Such redundancy could be the outcome of array expressions that appear in some aligned formats. Figure 2 (b) offers such an example, in which y[i+1] + y[i+2] in the i-th iteration of the loop executes the same computation as y[i-2] + y[i-1] does in the (i+3)-th iteration.

Fig. 3. Summary of prior methods.

    Loop invariant removal [Allen and Kennedy 2001]: category 1 redundancies only.
    ASE [Deitz et al. 2001]: for sum-of-products in stencils only (part of category 2), requiring single-rank array references as operands and index expressions in a stringent form.
    ESR [Cooper et al. 2008]: for common array subexpressions in categories 1 and 2.
    REDUCTION [Gupta and Rajopadhye 2006]: for reductions only. No quantitative results reported. No support of while loops, imperfectly nested loops, and complex operations (e.g., sin, cos, etc.) in an expression.
    Paige [Paige and Koenig 1982]: incremental computation across function calls.

Category 3: Loop-invariant loops. Let loop $L_2$ be a loop nested in loop $L_1$. If every invocation of $L_2$ contains exactly the same set of computations, we say that $L_2$ is a loop-invariant loop of $L_1$. Figure 2 (c) gives such an example: the reduction over products of x[i,l] and y[l,j] along axis l is repeatedly computed across loop k, and the reduction over products of y[l,j] and s[j,k] is repeatedly computed across loop i.

Category 4: Partially loop-invariant loops. Let $L_2$ be a loop nested in loop $L_1$. If the computations by some invocations of $L_2$ form a subset of the computations by some other invocations of $L_2$, we say that $L_2$ is a partially loop-invariant loop of $L_1$. Figure 2 (d) shows an example. The i-th iteration of the outer loop computes $\sum_{j=1}^{i} x[j]$. That computation is repeated in the first part of every later iteration of the outer loop. It differs from Category 3 in that later iterations have some extra computations.

The examples in Figure 2 are intentionally made simple for understanding. The actual code could involve data dependences, irregular loops, and other complexities, as the motivating examples in Figure 1 show.
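As a concrete illustration of Category 3, the following C sketch contrasts the loop nest of Figure 2 (c) with a version in which the reduction over y and s is computed once, outside loop i. The function names, the flat row-major layout, and the `temp` buffer are assumptions made for this sketch only; it is hand-written to show the effect, not output produced by GLORE.

```c
#include <assert.h>
#include <stdlib.h>

/* Layout assumed here: x is (M+1) x (N+1), y is (N+1) x (M+1),
   s and r are (M+1) x (N+1), all row-major. r must be zeroed by the caller. */
void cat3_original(int M, int N, const double *x, const double *y,
                   const double *s, double *r)
{
    int cn = N + 1, cm = M + 1;
    for (int i = 0; i <= M; i++)
        for (int j = 0; j <= M; j++)
            for (int k = 0; k <= N; k++)
                for (int l = 0; l <= N; l++)
                    r[i * cn + k] += x[i * cn + l] * y[l * cm + j] * s[j * cn + k];
}

void cat3_hoisted(int M, int N, const double *x, const double *y,
                  const double *s, double *r)
{
    int cn = N + 1, cm = M + 1;
    double *temp = calloc((size_t)cn * (size_t)cn, sizeof *temp);
    assert(temp != NULL);

    /* temp[l][k] = sum over j of y[l][j] * s[j][k]; it does not depend on i,
       so it is computed once instead of once per iteration of loop i. */
    for (int j = 0; j <= M; j++)
        for (int l = 0; l <= N; l++)
            for (int k = 0; k <= N; k++)
                temp[l * cn + k] += y[l * cm + j] * s[j * cn + k];

    /* The remaining reduction over l reuses temp. */
    for (int i = 0; i <= M; i++)
        for (int k = 0; k <= N; k++)
            for (int l = 0; l <= N; l++)
                r[i * cn + k] += x[i * cn + l] * temp[l * cn + k];
    free(temp);
}
```

The original nest performs on the order of M^2 N^2 multiply-adds, while the hoisted version performs two passes of order M N^2 each.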
One important ability of GLORE is to systematically conduct computation reordering and reassociation across both loops and expressions, so that these four categories of redundancies, if hidden in the original program, could still be exposed and removed.

3 RELATED WORK

Figure 3 summarizes the main methods developed in prior studies on removing loop redundancies. Traditional loop invariant removal [Allen and Kennedy 2001] is designed for only category-1 redundancy. Efforts to expand the scope have each been designed for a special type of redundancy. Without establishing a general, flexible way to analyze and reorder computations in a large scope, these efforts show limited applicability and effectiveness.

Array Subexpression Elimination (ASE) [Deitz et al. 2001] is designed only for sum-of-products computation stencils, where the operands must be single-rank array references and the index expressions must be of a stringent form.

ESR [Cooper et al. 2008] combines value numbering and scalar replacement to explore common subexpressions made up of array references. It is designed for some loop redundancies in only categories 1 and 2, and misses redundancies that require sophisticated computation reordering.

REDUCTION [Gupta and Rajopadhye 2006] is specially designed for simplifying reductions. It utilizes polyhedral models to explore computations shared among multiple reductions. It might be able to find some redundancies in categories 3 and 4, but cannot handle while loops, imperfectly nested loops, and complex operations (e.g., sin, cos, etc.). Moreover, its description stops at a theoretical level, giving neither implementation nor guidelines for implementation; no quantitative results have been reported on that technique.

Hartono et al. [Hartono et al. 2006, 2005] try to identify the most cost-effective common subexpressions for tensor contraction in electronic structure calculations so that the total number of operations could be minimized. They have strict requirements on the loop type and expression format: only for loops with constant loop bounds are supported, the expressions must be products of array references, and their index expressions must be of a particular form.

Paige [Paige and Koenig 1982] studies how finite differencing can be used to optimize incremental computations. The optimization is at function level based on a predefined transformation library, without considering sophisticated computation reordering.

CLARITY [Olivo et al. 2015] is a recent work that shows promising results in detecting repeated traversals of arrays. Even though repeated traversals could hint at possible (not necessarily true) redundant computations, they are insufficient for precisely detecting or removing redundant computations.

Some tiling techniques for imperfect loops [Song and Li 1999] may expose some possible redundant computations as a result of the loop tiling transformations. But their main purpose and effects are on improving cache performance by restricting data footprint size, rather than detecting loop redundancies.

A recent work [Luporini et al. 2017] manages to reduce the operation counts for a class of finite element integration loop nests by exploiting fundamental mathematical properties of finite element operators. It can find some redundancies in category 3, but does not handle imperfectly nested loops or complex operations (e.g., sin, cos, etc.).
It also gives no systematic consideration to the combination of loop-level computation reordering and expression-level algebraic reordering, and thus may miss some large-scope optimization opportunities.

Overall, for lack of a general approach to analyzing large-scope redundancy and comprehensive reordering, these methods are each limited to a special type of redundant computations with relatively narrow applicability. Even merging them together still leaves lots of cases uncovered and opportunities untapped, as Section 9 will show.

4 GLORE OVERVIEW

GLORE overcomes the limitations of the previous studies through two features: an LER-notation to enable flexible analysis and reordering of large-scope computations on both regular and irregular loops, and a series of novel algorithms to effectively determine the appropriate orders that minimize the amount of redundant computations. As a result, GLORE treats the most common cases in all four categories of redundancy (a much broader range of loop redundancies than any prior method does), finds better computation orders, and is amenable to the aforementioned code complexities: reduction loops, regular loops, and irregular loops that are perfectly or imperfectly nested, carrying dependences or not, involving simple or complex operations (e.g., sin, cos, log).

We next present LER-notation and then explain how the algorithms in GLORE leverage the flexibility of the notation to analyze and reorder computations to remove each of the four categories of redundancies. We describe the conversion between code and LER-notation at the end.¹

¹ This paper uses C language terminology as GLORE is currently implemented for C programs, but the principled technique should be extensible to code in some other languages.

5 LER-NOTATION

With LER-notation, a nested loop can be concisely captured in a few formulae, making symbolic analysis easier to apply. Our survey finds that no existing notation of loops can handle all the complexities mentioned in the previous section, whereas LER-notation solves the problem.

In LER-notation, a nested loop is represented in one or more formulae (called LER-formulae). Although LER-formulae can represent calculations in an arbitrary level of loops, in this work we use them to represent the computations in the innermost loop along with all the levels of loops enclosing the computations, because the innermost computations are typically the most costly part of a nested loop. There could be data dependences flowing into the innermost loop from other levels of loops, which are captured by some subscripts of operands in LER-notation as described at the end of this section.

The general format of a formula in LER-notation is as follows:²

    $L \; E \rightarrow R$,

where L represents a sequence of loop notations, E represents an expression inside those loops, and $\rightarrow R$ represents that the computation results are stored into variable R. The LER-representation of a nested loop is a collection of such formulae. We next explain E and L in more detail.

E and Operands Folding. The expression E may contain arbitrary mathematical computations (e.g., $\sin^2(x[i])$) as long as the computations do not alter the value of the operands or other variables (i.e., they are free of side effects). For computations using operators beyond the common basic mathematical operators (+, -, *, /), the computations are folded into a single synthetic operand with a unique ID and with all the loop indices used in the original computations included in the operand's indexing subscript. For instance, sin(a[i] + b[j]) is represented as synthetic_ab1[i, j], where synthetic_ab1 is the unique ID of the created synthetic operand and [i, j] is its indexing subscript. We call this transformation operands folding.
By hiding the detailed complexities in expressions but explicitly exposing the connections with the enclosing loops, operands folding makes it possible for GLORE to handle loops with complex expressions.

L and Dependence Subscripts. The loop sequence L is a combination of $L$, $\Sigma$, $\Pi$, and $W$, each of which represents one kind of loop:

(a) Regular for loops ($L$): $L^{l,u}_{i}$ represents a regular for loop with i as the loop index variable. It is assumed that the loops have already gone through normalization such that the index goes from a lower bound (l) to an upper bound (u) (which are affine expressions of loop index variables) with 1 as the step size. The following code, for instance, is represented as $L^{1,N}_{i} L^{1,M}_{j} \, (a[i] \times b[i] \times c[j]) \rightarrow x[i,j]$ in LER-notation:

    for (i = 1; i <= N; i++){
      for (j = 1; j <= M; j++){
        x[i,j] = a[i] * b[i] * c[j];
      }
    }

(b) Reduction loops ($\Sigma$, $\Pi$): If the loop conducts a reduction operation (e.g., summation or product) across iterations, the loop is represented as a reduction loop. In the current implementation, GLORE considers just summation ($\Sigma$) and product ($\Pi$), which are the most commonly seen semirings. Other semirings can be handled with minor extensions. The notation of a reduction loop is the same as that of a regular for loop except that $L$ is replaced with either $\Sigma$ or $\Pi$.

² The name LER comes from this general form.

Loop normalization and affine loop bounds are also assumed. So, $\Sigma^{l,u}_{i}$ represents a loop in which a summation is done across its iterations with i as the loop index and l and u as the loop bounds. The following code, for example, can be represented as $b + \Sigma^{1,N}_{i} \Sigma^{1,M}_{j} \, a[j] \rightarrow x$:

    x = b;
    for (i = 1; i <= N; i++){
      for (j = 1; j <= M; j++){
        x = x + a[j];
      }
    }

Note that if the innermost loop contains multiple statements, multiple formulae could be created, each corresponding to one of the statements. When there are values flowing between those statements, some vectorization of scalar variables may be necessary. For instance, if there is a statement c[i,j] = x in the previous example loop right after the x = x + a[j] statement, the LER-notation would be as follows:

    $b + \Sigma^{1,N}_{i} \Sigma^{1,M}_{j} \, a[j] \rightarrow x[i,j]$
    $L^{1,N}_{i} L^{1,M}_{j} \, x[i,j] \rightarrow c[i,j]$.

(c) While loops and other irregular loops ($W$): Unlike the previous two kinds of loops, a while loop has no loop index variable or lower or upper bound. To help identify a particular while loop in the notation, LER-notation gives a unique identity to each while loop, represented as a subscript of $W$. For instance, $W_t$ represents a while loop whose identity is set as t. Irregular for loops and reduction loops (e.g., with non-affine loop bounds or control flow statements such as break that may cause early termination of the loops) are represented and treated in the same way as the while loops, except that their loop indices are used as their identities.

When the lower bound of a for loop or reduction loop is 1, the lower bound can be omitted.

In LER-notation, the subscript of $L$, $\Sigma$, $\Pi$, or $W$ is called the ID of that loop. If a variable (say x) gets assigned and a loop (either a for or a while loop) with ID i is the innermost loop that contains that assignment, the variable carries a subscript i (e.g., $x_i$) in the LER-representation to explicitly indicate possible data dependences caused by the update to that operand.
For instance, the w in Figure 1 (a) gets updated in the while loop, so its subscript shall carry the ID of the while loop to indicate the possible cross-loop dependences. (In this paper, to simplify the representation, we explicitly write out such subscripts only when it is necessary.) Such dependences propagate: variables whose value comes from calculations involving a variable with such a subscript carry that subscript themselves. To obtain these subscripts, the examination starts from the operands in the outermost loop, gradually moves to the inner loops, and propagates the subscripts throughout the process. A synthetic operand carries subscript i if any of its original operands would carry subscript i.

Examples. In LER-notation, the imperfectly nested loop in Figure 1 (a) is expressed in the following formula:

    $W_t \, \Sigma^{1,N}_{i} \, (a[i] + b[i] \times w_t) \rightarrow d_t$.   (1)

It explicitly represents the statement in the innermost loop, as that is the focus of optimization. It uses the subscript t to capture the dependences of w and d over the iterations of the while loop due to the statements in the outer loop.
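One way to picture an LER-formula as a data structure is sketched below in C. This is purely illustrative: the paper does not describe GLORE's internal representation in this section, so the type names (`LerLoop`, `LerFormula`), the string-based fields, and the textual rendering produced by `ler_format` are all hypothetical choices made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The four loop kinds of LER-notation: regular for, sum, product, while. */
typedef enum { LOOP_FOR, LOOP_SUM, LOOP_PROD, LOOP_WHILE } LoopKind;

typedef struct {
    LoopKind kind;
    const char *id;     /* loop index, or the identity of a while loop */
    const char *lower;  /* NULL for while loops */
    const char *upper;  /* NULL for while loops */
} LerLoop;

typedef struct {
    const LerLoop *loops;  /* L: outermost loop first */
    int nloops;
    const char *expr;      /* E: the folded expression, kept as text here */
    const char *result;    /* R: the variable receiving the result */
} LerFormula;

/* Render a formula as plain text, e.g. "W_t Sum_i[1,N] (E) -> R".
   Returns 0 on success, -1 if buf is too small. */
int ler_format(const LerFormula *f, char *buf, size_t cap)
{
    static const char *names[] = { "L", "Sum", "Prod", "W" };
    size_t used = 0;
    int w;

    assert(cap > 0);
    buf[0] = '\0';
    for (int n = 0; n < f->nloops; n++) {
        const LerLoop *lp = &f->loops[n];
        if (lp->kind == LOOP_WHILE)
            w = snprintf(buf + used, cap - used, "W_%s ", lp->id);
        else
            w = snprintf(buf + used, cap - used, "%s_%s[%s,%s] ",
                         names[lp->kind], lp->id, lp->lower, lp->upper);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, cap - used, "(%s) -> %s", f->expr, f->result);
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return 0;
}
```

For Formula 1, one could build a two-element loop array (a `LOOP_WHILE` with id `"t"` followed by a `LOOP_SUM` over `i` from `"1"` to `"N"`), set `expr` to `"a[i] + b[i]*w_t"` and `result` to `"d_t"`, and pass it to `ler_format`. Keeping the dependence subscript inside the operand name (`w_t`, `d_t`) mirrors how the notation attaches the while-loop ID to updated variables.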

Fig. 4. GLORE transforms formulae through a series of steps to remove its loop redundancies. A final cross-formula optimization step is omitted. [Pipeline shown in the figure: preprocess (operand folding; alternating form generation), operand abstraction, loop encapsulation, cat-3 opt (minimum union algorithm and closure-based algorithm), loop decapsulation, cat-4 opt (incremental representation), operand concretization, cat-1 and cat-2 opt (reuse lists and groups). Upper case denotes a set of formulae; lower case denotes a formula.]

The example in Figure 1 (c) is expressed as

    $L^{1,M}_{i} \, \Sigma^{0,i}_{j} \, \Sigma^{0,i}_{k} \; x[i,j] \times y[j,k] \rightarrow r[i]$.   (2)

LER-notation offers a way to concisely represent both regular and irregular loops and explicitly encode possible data dependences into the representation. These properties prove essential for GLORE to achieve a much broader range of applicability than prior methods, and to explore computation reordering of all scopes more effectively when detecting and removing large-scope redundancies.

6 GLORE ANALYSIS AND OPTIMIZATIONS

This section describes GLORE and explains how it works on the LER representations of loops to find and remove the four categories of loop redundancies.

6.1 Overview

Figure 4 outlines the main steps of the GLORE algorithm. The input to GLORE is a set of LER-formulae corresponding to a nested loop. Its output is a new set of LER-formulae with all redundancies GLORE finds removed.

The algorithm first preprocesses the input formulae. This step includes two main operations. The first is operand folding (described in Section 5), after which operations beyond the basic arithmetic operations are replaced with synthetic operands. The second is alternating form generation, in which minus is turned into negative signs associated with each of the relevant operands, and division is folded into operands as inverse. After that, the formula contains only plus or times, a form we call alternating form. In such a form, the expression can be regarded as a hierarchy with the levels alternating between PLUS and TIMES, as illustrated in Figure 5.
The hierarchical view allows a divide-and-conquer strategy to be used, with redundancy detected and removed at each level of the hierarchy. Because the composition at each level involves either only plus or only times, it allows free reassociation and commutation of the operations among the children of an arbitrary node in the tree.

GLORE then optimizes each of the original formulae individually to remove the loop redundancies contained in each. After that, it examines the new set of formulae and removes redundancies that exist across the formulae.
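The alternating form generation step described above can be sketched on a small binary expression tree in C. This is a simplified illustration, not GLORE's implementation: the `Node` type and function names are hypothetical, leaves hold numeric values only so that the result can be checked by evaluation, and the sketch does not flatten nested PLUS or TIMES nodes into the multi-child hierarchy of Figure 5.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { LEAF, PLUS, TIMES, MINUS, DIVIDE, INVERSE } Kind;

typedef struct Node {
    Kind kind;
    double value;            /* used by LEAF only */
    struct Node *lhs, *rhs;  /* INVERSE uses lhs only */
} Node;

Node *node(Kind kind, double value, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Push a negative sign down to the operands of a subtree that is
   already free of MINUS and DIVIDE. */
static void negate(Node *n)
{
    switch (n->kind) {
    case LEAF: n->value = -n->value; break;
    case PLUS: negate(n->lhs); negate(n->rhs); break; /* -(a+b) = (-a)+(-b) */
    default:   negate(n->lhs); break; /* TIMES, INVERSE: negate one factor */
    }
}

/* In place: a - b becomes a + (-b); a / b becomes a * inverse(b). */
void to_alternating(Node *n)
{
    if (n->kind == LEAF)
        return;
    to_alternating(n->lhs);
    if (n->rhs != NULL)
        to_alternating(n->rhs);
    if (n->kind == MINUS) {
        n->kind = PLUS;
        negate(n->rhs);
    } else if (n->kind == DIVIDE) {
        n->kind = TIMES;
        n->rhs = node(INVERSE, 0.0, n->rhs, NULL);
    }
}

double eval(const Node *n)
{
    switch (n->kind) {
    case LEAF:   return n->value;
    case PLUS:   return eval(n->lhs) + eval(n->rhs);
    case MINUS:  return eval(n->lhs) - eval(n->rhs);
    case TIMES:  return eval(n->lhs) * eval(n->rhs);
    case DIVIDE: return eval(n->lhs) / eval(n->rhs);
    default:     return 1.0 / eval(n->lhs); /* INVERSE */
    }
}

/* True if no MINUS or DIVIDE remains (INVERSE counts as a folded operand). */
int only_plus_times(const Node *n)
{
    if (n->kind == LEAF) return 1;
    if (n->kind == MINUS || n->kind == DIVIDE) return 0;
    if (!only_plus_times(n->lhs)) return 0;
    return n->rhs == NULL || only_plus_times(n->rhs);
}
```

Because each leaf is negated in place, the sketch assumes a proper tree in which no node is shared between two parents. Evaluating a tree before and after `to_alternating` should give the same value, while `only_plus_times` switches from false to true for any tree that contained a subtraction or division.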

Fig. 5. The alternating form of formula $5x - (3d[i] + a[j] - d[i] \times (b[j] + c[k]))$ is $5x + (-3d[i]) + (-a[j]) + d[i] \times (b[j] + c[k])$, regarded as a hierarchy with the levels alternating between PLUS and TIMES. [Tree shown in the figure: a PLUS root whose children are the TIMES nodes (5, x), (-3, d[i]), (-1, a[j]), and (d[i], PLUS(b[j], c[k])).]

When optimizing an individual formula, GLORE takes a series of steps, and through the process a formula is transformed into a series of forms. As Figure 4 shows, for a given formula f, GLORE first conducts operand abstraction, which replaces the index of each operand with the set of IDs of its relevant loops. A loop is relevant to an operand if its ID appears in the index of the operand. We use relloops(x) to denote the set of loops relevant to an operand x. For instance, for the following formula:

    $L^{1,N}_{i} \, \Sigma^{1,i}_{j} \, \Sigma^{1,N}_{k} \; x[i+j] \times y[i, j+k] \times z[k] \rightarrow w[i]$,   (3)

the relevant loop set is {i, j} for x, {i, j, k} for y, and {k} for z. After operand abstraction, the formula becomes

    $L^{1,N}_{i} \, \Sigma^{1,i}_{j} \, \Sigma^{1,N}_{k} \; x\{i,j\} \times y\{i,j,k\} \times z\{k\} \rightarrow w[i]$.   (4)

This step simplifies the removal of loop redundancies of categories 3 and 4, in which what is relevant is the set of loop indices in the indexing expressions of each operand rather than the indexing expressions themselves. We denote the resulting formula with f*. (The concrete indexing expressions of each operand are restored in later steps.)

GLORE then applies loop encapsulation on f* to convert it into an encapsulated form which, through the use of pseudo-bounds, hides the complexities in loop bounds such that every loop in it, other than while loops, has only constant bounds. GLORE then uses the minimum union algorithm to detect and remove category-3 redundancies (loop-invariant loops) from the encapsulated form, yielding a new set of formulae G (Section 6.2). For each formula g in G, GLORE decapsulates it to get a form with the complexities of the loop bounds restored. GLORE then finds and removes category-4 redundancies (partially loop-invariant loops) by converting the decapsulated form into an incremental representation, resulting in a new set of formulae H.
GLORE then restores the concrete index expressions of operands (operand concretization), and removes the other two categories of redundancies by building up reuse lists and reuse groups of the expressions in the formulae.

We next give a detailed explanation of the algorithms for the removal of loop-invariant loops (category 3). As the most complex category to handle, it demonstrates how LER-notation facilitates the large-scope analysis and computation reordering for redundancy removal. The treatments of the other categories follow a similar approach; we describe them briefly at the end.
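Operand abstraction, as described for Formulae 3 and 4, amounts to collecting the loop IDs that occur in an operand's index expressions. The C sketch below does this by scanning the text of an operand for identifiers; representing operands as strings and loop sets as bitmasks is an assumption made for this illustration, as is the function name `rel_loops`.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns a bitmask in which bit n is set iff loop_ids[n] occurs as a whole
   identifier inside the index part of operand (everything from the first '['). */
unsigned rel_loops(const char *operand, const char *const loop_ids[], int nloops)
{
    unsigned mask = 0;
    const char *p = strchr(operand, '[');

    assert(nloops >= 0 && nloops <= 32);
    if (p == NULL)
        return 0; /* a scalar has no relevant loops */

    while (*p != '\0') {
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* Scan one identifier and compare it with every loop ID. */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            size_t len = (size_t)(p - start);
            for (int n = 0; n < nloops; n++)
                if (strlen(loop_ids[n]) == len &&
                    strncmp(start, loop_ids[n], len) == 0)
                    mask |= 1u << n;
        } else {
            p++;
        }
    }
    return mask;
}
```

With loop IDs `{"i", "j", "k"}`, the operands of Formula 3 map to the sets given in the text: `x[i + j]` to {i, j}, `y[i, j + k]` to {i, j, k}, and `z[k]` to {k}. Matching whole identifiers keeps an index such as `idx` from being mistaken for loop `i`.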

6.2 Removal of Loop-Invariant Loops (Category 3)

To help understanding, we start with the case where every loop bound is constant across the iterations of the nested loop of interest, there are no loop-carried data dependences (except regular reductions), and all loops are interchangeable in order. We discuss the other complexities later.

A loop-invariant loop can be either a reduction loop or a for loop. GLORE treats redundant reduction loops first and then treats redundant regular loops.

Loop-Invariant Reduction Loops. We will draw on the example in Figure 2 (c) in this section. In LER-notation, it can be expressed in the following formula:

    $L^{1,M}_{i} \, \Sigma^{1,M}_{j} \, L^{1,N}_{k} \, \Sigma^{1,N}_{l} \; x[i,l] \times y[l,j] \times s[j,k] \rightarrow r[i,k]$.   (5)

RelLoops. The removal of redundant reductions is based on relloops of operands and relloops of reduction loops. Recall that the relloops of an operand is the set of loop IDs that appear in the indexing expressions of that operand (as defined in Section 6.1). The relloops of a reduction loop R is the union of the relloops of all its operands whose relloops contain R. Formally, it is defined as follows:

    $relloops(R) = \bigcup_{o \,:\, o \in operands(R) \,\wedge\, R \in relloops(o)} relloops(o)$.   (6)

For instance, $relloops(\Sigma_j)$ in Formula 5 is {j, l, k} because j appears in the indexing expressions of operands y[l, j] and s[j, k], and the union of their loop index sets is {j, l, k}. The relevant loops of a reduction tell what loops must be involved when doing the corresponding reduction.

Our following discussion assumes that the reduction is a summation. The design principle is the same for multiplication-based reduction.

Formula Simplification. The input to the step for removing cat-3 redundancies is the form after the preprocessing steps and the operand abstraction. Array indexing expressions are already replaced with the relloops of the array access.
For nstance, for formula N N N k d[2 k] + 2 a[] sn(d[]) (a[3 + 3]b[] + c[k]) r, ts nput form to ths step s N N N k d{k + 2 a{ d_{ (a{b{ + c{k) r, where the ndexng of each array only ndcates the relevance of the loops rather than the exact locaton of the element to access, and d_ s a synthetc operand. Before detectng redundant reductons, we further smplfy the formula. If the top level of the expresson n the formula s a plus node (lke Fgure 5 shows), the formula s broken nto several formulae wth each correspondng to a chld of the root node. Moreover, for each of the new formula (a tmes node), ts operands are grouped to further smplfy the representaton: The operands represented by ts mmedate chld nodes are grouped nto a sngle synthetc operand f they have the same relloops, and each non-leaf chld node turns nto a sngle synthetc operand. For nstance, the smplfcaton result of formula N N N k d{k + 2 a{ d_{ (a{b{ +c{k) r becomes k d{k tmp1, k 2 дroup_a_d{ дroup_a_b_c{,, k tmp2, tmp1 + tmp2 r, where, the frst two formulae correspond to each of the two terms of the orgnal plus expresson. The operands n the second formula are further grouped: дroup_a_d{ s derved from a{ d_{

for their identical relLoops, and group_a_b_c{i, j, k} is derived from the single term (a{i} · b{j} + c{k}). The final formula adds the results of the previous formulae to get the final result. This simplification puts each formula into a product form, which is convenient for removing redundant reductions, as shown next.

Detecting Redundancy. Based on the simplified product form and the concept of relLoops, the detection of loop-invariant loops becomes easy. Under the assumption that all loops in a formula are interchangeable, if loop $i \notin \mathit{relLoops}(R)$, where R is a reduction loop $\Sigma_j$, then the reduction loop R is invariant to loop i. This means that we may move out of loop i the reduction R of the subexpression consisting of the operands that include j in their relevant loop sets.

An Example. In Formula 5, for instance, loop i is not in $\mathit{relLoops}(\Sigma_j)$, which equals {j, l, k}. The formula is hence equivalent to the following:

$$\Sigma^{1,M}_{j}\,\Gamma^{1,N}_{l}\,\Gamma^{1,N}_{k}\; y[l,j] \cdot s[j,k] \Rightarrow temp[l,k]$$
$$\Gamma^{1,M}_{i}\,\Gamma^{1,N}_{k}\,\Sigma^{1,N}_{l}\; x[i,l] \cdot temp[l,k] \Rightarrow r[i,k]. \tag{7}$$

The equivalence is intuitive: because the sum of the products of y and s has nothing to do with loop i, it does not need to be recomputed inside loop i. Moving it out removes the redundant computations, and the cost of the reduction drops from $O(N^2M^2)$ to $O(N^2M)$. Another way to understand the benefit is that the transformation changes the order of computation in the two reductions by leveraging the distributive property of multiplication. Given i and k, the original formula computes r[i, k] as $\Sigma_j \Sigma_l\; x[i,l] \cdot y[l,j] \cdot s[j,k]$, while the new formula computes it as $\Sigma_l\; x[i,l] \cdot \left(\Sigma_j\; y[l,j] \cdot s[j,k]\right)$. With the inner summation moved into a separate formula, the computational complexity decreases.

For that example, an alternative form of the resulting formulae is as follows:

$$\Gamma^{1,M}_{i}\,\Gamma^{1,M}_{j}\,\Sigma^{1,N}_{l}\; x[i,l] \cdot y[l,j] \Rightarrow temp[i,j]$$
$$\Gamma^{1,M}_{i}\,\Gamma^{1,N}_{k}\,\Sigma^{1,M}_{j}\; temp[i,j] \cdot s[j,k] \Rightarrow r[i,k]. \tag{8}$$

This form differs from the previous one in the order in which the reduction loops are computed, and hence in computational complexity ($O(NM^2)$ vs. $O(N^2M)$). This example demonstrates a critical aspect of removing redundant loops: finding the best order of the computations. We solve the problem through an algorithm named the minimum union algorithm.

Minimum Union Algorithm.
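As motivation for the ordering problem, the sketch below evaluates Formula 5 directly and in the two reordered forms (Formulae 7 and 8) and counts multiplications. It is only an illustration in plain Python, not GLORE's generated code; the helper `make`, the function names, and the sizes M = 3 and N = 5 are assumptions chosen for the example.

```python
def make(rows, cols, seed):
    # Deterministic small integers (hypothetical test data) so results compare exactly.
    return [[(seed + 3 * r + 5 * c) % 7 + 1 for c in range(cols)] for r in range(rows)]

def formula5(x, y, s, M, N):
    """r[i][k] = sum over j and l of x[i][l] * y[l][j] * s[j][k], computed directly."""
    r = [[0] * N for _ in range(M)]
    mults = 0
    for i in range(M):
        for k in range(N):
            for j in range(M):
                for l in range(N):
                    r[i][k] += x[i][l] * y[l][j] * s[j][k]
                    mults += 2
    return r, mults

def formula7(x, y, s, M, N):
    """Hoist the j reduction out of loop i: temp[l][k] = sum_j y[l][j] * s[j][k]."""
    mults = 0
    temp = [[0] * N for _ in range(N)]
    for l in range(N):
        for k in range(N):
            for j in range(M):
                temp[l][k] += y[l][j] * s[j][k]
                mults += 1
    r = [[0] * N for _ in range(M)]
    for i in range(M):
        for k in range(N):
            for l in range(N):
                r[i][k] += x[i][l] * temp[l][k]
                mults += 1
    return r, mults

def formula8(x, y, s, M, N):
    """Hoist the l reduction out of loop k instead: temp[i][j] = sum_l x[i][l] * y[l][j]."""
    mults = 0
    temp = [[0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            for l in range(N):
                temp[i][j] += x[i][l] * y[l][j]
                mults += 1
    r = [[0] * N for _ in range(M)]
    for i in range(M):
        for k in range(N):
            for j in range(M):
                r[i][k] += temp[i][j] * s[j][k]
                mults += 1
    return r, mults

M, N = 3, 5
x, y, s = make(M, N, 0), make(N, M, 1), make(M, N, 2)
r5, c5 = formula5(x, y, s, M, N)
r7, c7 = formula7(x, y, s, M, N)
r8, c8 = formula8(x, y, s, M, N)
assert r5 == r7 == r8          # all three orders compute the same r
print(c5, c7, c8)              # prints 450 150 90
```

With M < N the second order (Formula 8) is the cheaper one, and with M > N the first would be, which is why the best order depends on the loop bounds.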

When there are many nested loops and operands whose indexing expressions each cover some subset of the loop indices, finding the best computation order can be a difficult problem. In fact, a previous paper [Chi-Chung et al. 1997] shows that even a much simplified version of the problem (footnote 3) is already NP-complete. We design a heuristic algorithm, called the minimum union algorithm, to help quickly determine a good order of reduction loops (regular loops are discussed later). It produces a forest, which encodes the desired order of the reduction loops. Each node in the forest corresponds to a reduction loop. Loops on separate trees can have an arbitrary order in the produced formulae, while the loops in one tree follow a post order (children before parent) in the produced formulae. We call the forest an ordering forest. Consider the following example:

$$\Sigma^{N}_{i}\,\Sigma^{N}_{j}\,\Sigma^{N}_{k}\,\Sigma^{N}_{t}\; a[i][j] \cdot b[i][k] \Rightarrow result \tag{9}$$

Figure 6 shows the produced ordering forest. The t loop can be either before or after the other three loops; reduction loop j and loop k should be computed before reduction loop i.

[Figure 6: tree 1 is the single node t: {}; tree 2 has root i: {i, j, k} with children j: {i, j} and k: {i, k}.]
Fig. 6. The ordering forest produced by the minimum union algorithm for Formula 9 (the brackets show the relLoops of each reduction).

The minimum union algorithm builds on two insights. First, if the relLoops of reduction loop i is a subset of that of reduction loop j and does not include j, computing loop i first allows loop j to use its results (rather than recompute them in every iteration of loop j), lowering the computational complexity. Second, if the relLoops of two reduction loops i and j have no overlap, then the two reduction loops do not need to use the results of each other, and hence their order does not matter to the cost.

Figure 7 outlines our algorithm. Its input is the set of reduction loops in a given LER-formula. GLORE stores with each reduction loop its relLoops, and estimates its cost as the product of the ranges of the index values of all loops in its relLoops (symbolically represented).
It produces separate trees in the result forest based on the second insight given in the previous paragraph. It takes a greedy strategy, attempting to maximize the amount of reuse. In each iteration of the while loop in Figure 7, the loop with the minimum cost is selected and added into the forest. It is temporarily put as a child of all the yet-to-process reduction loops that use its results (they are its temporary parents), and the algorithm (see Figure 7) updates the costs of those loops by considering the replacement of the (re)computation of that loop with the reuse of its result. The parent of a node is settled later: the first of its temporary parents that is put into the forest is set as its actual parent (see Figure 7). Note that we do not need to undo the cost reduction to the other temporary parents, because they are computed after this loop and can still use its results thanks to the post order in the generation of new formulae. When several orders could be the best depending on the actual values of the loop bounds, the analysis records all of them and their respective favorable conditions. Application of the algorithm to Formula 9 gives the forest shown in Figure 6.

Footnote 3: In the simplified problem, only regular for and summation loops with constant bounds are allowed, and the operands must be the product of arrays whose indices must follow some strict form; for instance, a[i, j] is allowed while neither a[i, i] nor a[2, j] is.

    /* inputs:  a set of reduction loops S; each element has an estimated
                computation cost (cost) and relevant loops (relLoops) recorded.
       outputs: a forest F that records an optimized order to compute the reductions. */
    workList = createInitialNodes(S);  // a tree node is created for each loop, with
                                       // children and parent set to NULL
    while (workList != ∅) {
        thisLoop = the loop with the minimum cost in workList;
        remove thisLoop from workList and add it into F;
        foreach l in workList {
            // add thisLoop into the children list of loops that rely on its results
            if ((l.ID ∩ thisLoop.relLoops) != ∅) {
                add thisLoop into the children list of l;
                // update the cost of l
                l.cost /= thisLoop.indexRange;
            }
        }
        // confirm the parent relation with its children
        foreach c in thisLoop.children {
            if (c.parent != NULL) {    // c has a parent already
                remove c from the children list of thisLoop;
            } else {
                c.parent = thisLoop;
            }
        }
    }

Fig. 7. Minimum Union Algorithm for selecting an optimized order for reduction loops.

Removing Redundancy through Formula Generation. After getting the ordering forest, GLORE generates new formulae with the redundancy removed. This step involves not just the reduction loops, but also all other loops and all operands in the original formula. The generation works on the trees in the forest one after another; the order makes no difference. We explain the algorithm first and then provide an example. When starting to work on a tree T, the algorithm fills a list A with all the operands that appear in the original formula. It traverses T in a post order (children before parent). Consider a node corresponding to reduction loop R in an ordering forest.
The algorithm creates a formula $L\; E \Rightarrow r$. L is a sequence of loop representations corresponding to the loops in relLoops(R). E represents the product α · β, where α is the product of the results produced by the children of this node in the ordering forest, and β is the product of all the operands in A that have the index of R in their relLoops. Those operands are then removed from A. For a single-node tree with no relevant operands (e.g., node 1 in Figure 6), E is just 1; the formula is replaced with the computation of the range (ub − lb) of the loop index. The right-hand-side notation r represents a new name created by the algorithm to record the result; if L contains some regular loops, then r has the index $[d_1, d_2, \ldots, d_k]$, where $d_i$ is the ID of the i-th regular loop in L. After all trees in the forest have been processed, a final formula is generated to get the product of all those results.
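The ordering step of Figure 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GLORE's implementation: the function names, the dictionary-based node layout, the concrete range of 10 per loop, and in particular the tie-breaking rule (prefer the loop with fewer relLoops, then the smaller name) are assumptions of this sketch. Costs are kept as exact fractions because the division in the cost update need not be exact.

```python
from fractions import Fraction

def rel_loops(index, operands):
    """Union of the index sets of all operands that mention `index`."""
    out = set()
    for idx_set in operands.values():
        if index in idx_set:
            out |= idx_set
    return frozenset(out)

def minimum_union(reductions, operands, ranges):
    """Return (processing order, parent map) of the ordering forest."""
    nodes = {}
    for r in reductions:
        rl = rel_loops(r, operands)
        cost = Fraction(1)
        for loop in rl:                      # cost = product of the ranges in relLoops
            cost *= ranges[loop]
        nodes[r] = {"rel": rl, "cost": cost, "children": [], "parent": None}

    work = set(reductions)
    order = []
    while work:
        # Greedy choice: minimum cost; ties broken by fewer relLoops, then by
        # name (the tie-break is an assumption of this sketch).
        this = min(work, key=lambda r: (nodes[r]["cost"], len(nodes[r]["rel"]), r))
        work.remove(this)
        order.append(this)
        for l in sorted(work):
            if l in nodes[this]["rel"]:      # l can reuse the results of `this`
                nodes[l]["children"].append(this)
                nodes[l]["cost"] /= ranges[this]
        # Confirm the parent relation: the first temporary parent to be
        # processed becomes the actual parent.
        kept = []
        for c in nodes[this]["children"]:
            if nodes[c]["parent"] is None:
                nodes[c]["parent"] = this
                kept.append(c)
        nodes[this]["children"] = kept
    return order, {r: nodes[r]["parent"] for r in reductions}

# Formula 9: reductions over i, j, k, t; operands a[i][j] and b[i][k].
order, parent = minimum_union(
    ["i", "j", "k", "t"],
    {"a": {"i", "j"}, "b": {"i", "k"}},
    {"i": 10, "j": 10, "k": 10, "t": 10},
)
print(order)   # ['t', 'j', 'k', 'i']
print(parent)  # {'i': None, 'j': 'i', 'k': 'i', 't': None}
```

The parent map describes the forest of Figure 6: t forms a tree of its own, and i is the root of a second tree with children j and k. Processing the nodes in the returned order visits children before parents, which is the post order that formula generation requires.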

Applied to Formula 9, the algorithm generates the following formulae based on the ordering forest shown in Figure 6.

$$N \Rightarrow tmp0;$$
$$\Gamma^{N}_{i}\,\Sigma^{N}_{j}\; a[i][j] \Rightarrow tmp1[i];$$
$$\Gamma^{N}_{i}\,\Sigma^{N}_{k}\; b[i][k] \Rightarrow tmp2[i];$$
$$\Sigma^{N}_{i}\; (tmp1[i] \cdot tmp2[i]) \Rightarrow tmp3;$$
$$tmp0 \cdot tmp3 \Rightarrow result.$$

The first formula corresponds to the reduction loop t, which contains no relevant operand or child. The second formula comes from the left child of the second tree, which corresponds to the reduction loop j. As its relLoops contains only i and j, the formula contains loop i as a regular loop and the reduction loop j. That node has no children, and hence the expression in the formula contains only the product of all the operands that have j in their relLoops, which is just a[i][j]. The result is stored into a new name tmp1, whose index contains only i. The third formula comes from the right child of the second tree in a similar manner. The fourth formula comes from the root of the second tree. Because both a[i][j] and b[i][k] have been removed from the operand list A after the generation of the second and third formulae, there is no operand left in A that has i in its relLoops. Therefore, the expression of this fourth formula contains only the results from its two child nodes, tmp1[i] and tmp2[i]. The final formula gets the final result by multiplying the results from the different trees. The overall computational complexity reduces from $O(n^4)$ to $O(n^2)$.

Loop-Invariant Regular Loops. Redundant regular loops can be detected and removed upon LER-notation in a similar manner, through a different algorithm named the closure-based algorithm. It works on each of the formulae produced in the previous step. We use the following example for our discussion.

$$\Gamma^{N}_{i}\,\Gamma^{N}_{j}\,\Gamma^{N}_{k}\; x[i] \cdot d[j,k] \cdot c[i,j] \Rightarrow w[i,j,k]. \tag{10}$$

Such computations are cross products that are commonly seen in computational physics, where each operand represents some values in a lower-dimensional space, and the result gives the values in a higher-dimensional space. There are two kinds of optimization opportunities for such regular loops. The first is about synthetic operands.
If x[i] in Formula 10, for instance, is a synthetic operand (defined in Section 5) that involves non-trivial computations (e.g., $\sin^2(t[i])$), then computing x[i] across loops j and k would be redundant. It could be avoided if we put the computations of all x[i] ($1 \le i \le N$) into a separate formula. Transformations to exploit this kind of opportunity are easy to do: just put the operand and its relevant loops into a separate formula. The second is about reuse across subexpressions. When there are multiple operands, their computations could be split into multiple steps (each as a separate formula), such that later steps can reuse, rather than repeatedly compute, the results of earlier steps. For example, the following formulae compute the same results as Formula 10 does, but require only $N^2 + N^3$ multiplications, rather than the $2N^3$ multiplications needed by the original formula.

$$\Gamma^{N}_{i}\,\Gamma^{N}_{j}\; x[i] \cdot c[i,j] \Rightarrow temp1[i,j]$$
$$\Gamma^{N}_{i}\,\Gamma^{N}_{j}\,\Gamma^{N}_{k}\; temp1[i,j] \cdot d[j,k] \Rightarrow w[i,j,k]. \tag{11}$$

The complexity in exploiting this kind of opportunity is again in the ordering: there may be many possible orders in which the expressions could get computed. For Formula 10, a form alternative to

Formula 11 is as follows:

$$\Gamma^{N}_{i}\,\Gamma^{N}_{j}\,\Gamma^{N}_{k}\; x[i] \cdot d[j,k] \Rightarrow temp1[i,j,k]$$
$$\Gamma^{N}_{i}\,\Gamma^{N}_{j}\,\Gamma^{N}_{k}\; temp1[i,j,k] \cdot c[i,j] \Rightarrow w[i,j,k], \tag{12}$$

which is more costly than Formulae 11 due to the order in which it involves the operands in the computation. For an arbitrary expression, finding the optimal order is NP-complete in general [Chi-Chung et al. 1997]. We design a linear-time closure-based heuristic algorithm to solve the problem. It is based on a concept we introduce, the operand closure tree. Each node in the tree, except the root, corresponds to an operand in the expression of the LER-formula to optimize, and carries the relLoops of that operand. The root is an artificially added node that puts all nodes into one tree structure; its relLoops consists of all the loop IDs in the formula to optimize. An important property of the tree is that a child's relLoops must be a subset of its parent's, hence the name operand closure tree. This property helps GLORE find good orders. Figure 8 (a) shows the operand closure tree of Formula 10. Figure 8 (b) outlines the closure-based algorithm, which finds a good order through a post-order walk over the closure tree.

[Figure 8 (a): operand closure tree; image not recovered.]

    (b)
    // nodeSet: nodes that share a parent in the closure tree;
    foreach nd in nodeSet {
        nd.remainingIndexSpace = nd.indexSpace;
    }
    seenIndexSet = ∅;
    while (nodeSet is not empty) {
        thisNode = the node having the smallest remainingIndexSpace;
        generateFormula(thisNode);
        remove thisNode from nodeSet;
        extraIndexSet = thisNode.indexSet - seenIndexSet;
        extraIndexSpace = computeSpace(extraIndexSet);
        if (extraIndexSpace > 0) {
            // update the remainingIndexSpace of the other nodes
            foreach nd in nodeSet {
                nd.remainingIndexSpace /= extraIndexSpace;
            }
            seenIndexSet = seenIndexSet ∪ extraIndexSet;
        }
    }

Fig. 8. (a) The operand closure tree for Formula 10. (b) Closure-based algorithm for finding a good order for the operands that share a parent in an operand closure tree.
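Independently of how the order is found, the cost gap between Formulae 10, 11, and 12 can be checked by counting multiplications. The sketch below is a plain-Python illustration only; the function names, the test data, and N = 4 are assumptions made for the example.

```python
def formula10(x, d, c, N):
    """w[i][j][k] = x[i] * d[j][k] * c[i][j], computed in a single step."""
    mults = 0
    w = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                w[i][j][k] = x[i] * d[j][k] * c[i][j]
                mults += 2
    return w, mults

def formula11(x, d, c, N):
    """First x[i] * c[i][j] (N^2 products), then multiply by d[j][k] (N^3)."""
    mults = 0
    temp1 = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            temp1[i][j] = x[i] * c[i][j]
            mults += 1
    w = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                w[i][j][k] = temp1[i][j] * d[j][k]
                mults += 1
    return w, mults

def formula12(x, d, c, N):
    """First x[i] * d[j][k] (N^3 products), then multiply by c[i][j] (N^3)."""
    mults = 0
    temp1 = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                temp1[i][j][k] = x[i] * d[j][k]
                mults += 1
    w = [[[0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                w[i][j][k] = temp1[i][j][k] * c[i][j]
                mults += 1
    return w, mults

N = 4
x = [i + 1 for i in range(N)]                            # hypothetical integer data
d = [[2 * j + k + 1 for k in range(N)] for j in range(N)]
c = [[i + 3 * j + 2 for j in range(N)] for i in range(N)]
w10, m10 = formula10(x, d, c, N)
w11, m11 = formula11(x, d, c, N)
w12, m12 = formula12(x, d, c, N)
assert w10 == w11 == w12
print(m10, m11, m12)   # prints 128 80 128
```

Pairing x with c first keeps the intermediate result two-dimensional, which is where the saving of Formulae 11 comes from; pairing x with d first already spans all three loops and saves nothing.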
Before the walk, each non-root node has an indexSpace computed, which equals the product of the ranges of all the loops in its relLoops. Through the post-order walk, the algorithm uses a greedy strategy to iteratively decide the order of the children of each node. Its design tries to make the union of the index sets of the selected operands grow slowly, which helps maximize the amount of result reuse and hence effectively avoids unnecessary computations. Through the ordering process, new formulae are generated to incrementally compute the product of the children of a node (putting one more child into each new formula), and then a formula is created to compute the multiplications between that product and the parent node. The search algorithm is for general cases. For loops with only a small number of operands and loops, exhaustive search could be used to find the best order.

Extra Complexities. This subsection describes how GLORE handles non-constant loop bounds and data dependences when it removes category-3 redundancies.

Non-Constant Bounds. GLORE uses loop encapsulation to handle loops with non-constant bounds. Consider the following example.

$$\Gamma^{1,N}_{i}\,\Sigma^{1,i}_{j}\,\Sigma^{1,N}_{k}\; x[i] \cdot y[j] \cdot z[k] \Rightarrow w[i] \tag{13}$$

where the upper bound of loop j is i. The basic idea of loop encapsulation is to use a pseudo-loop with constant bounds to replace a group of loops that may contain non-constant bounds and have dependences among their indices. Its application to Formula 13 gives

$$\Sigma^{1,N^2}_{t,\{i\}}\,\Sigma^{1,N}_{k}\; x\{t\} \cdot y\{t\} \cdot z\{k\} \Rightarrow w[i],$$

where loop t is a pseudo-loop for the group of loops {i, j}, and the subscript {i} records the regular loop (loop i) in that group; the upper bound of loop t ($N^2$) is a simple rough estimate of the size of the combined iteration space; the operands x and y both have t in their relLoops because they are relevant to some (or all) loops in that loop group. After encapsulation, the formulae turn into a form to which the previously described optimization algorithms can apply. (The technical report [TR 2017] contains the full algorithm of loop encapsulation.)

Data Dependences. When there are loop-carried data dependences, there may be certain restrictions on loop reordering in the optimizations. Many classic techniques have been developed for detecting loop-carried data dependences and for recognizing the legality of a new order of loops according to the data dependences [Allen and Kennedy 2001]. These techniques can be used to reveal the data dependences in a nested loop. Based on these analysis results, GLORE can ensure that its transformations produce legal formulae. Specifically, GLORE avoids dependence violations through an annotation scheme and two principled rules. The annotation scheme uses subscripts on operands to specify that the operands are subject to some data dependence across certain loops. An example is the subscript t in $w_t$ in Formula 1, which indicates the data dependence of w across the while loop in Figure 1(a). Such annotations apply to for and reduction loops as well.
The annotations allow GLORE to follow two conservative rules to prevent any dependence violations: (1) If the expression of a new formula contains no operands that have dependence subscripts, the formula is safe to create. The correctness comes from the classic loop transformation theorem [Allen and Kennedy 2001]: loop reordering is safe if there are no loop-carried data dependences. (2) Whenever GLORE tries to create a new formula containing operands with dependence subscripts, it must include in the formula all the loops that any of the operands in the new formula depends on, and at the same time, loop reordering is allowed only among the loops inside the innermost dependence-carrying loop, to avoid dependence violations.

Control Flow Statements. Our technique applies regardless of whether the loops contain if, continue, break, or other control flow statements. These statements are not explicitly expressed in our LER notations, but the dependences they induce are kept in the optimizations through dependence subscripts of variables in the notations and some constraints on loop reordering. If the computations in the expression of an LER-notation have control or data dependences on one of these control flow statements, the variables in those expressions are marked with dependence subscripts of all the loops enclosing that control flow statement. This prevents any reordering that involves those loops, so the dependences are observed.

[Figure 9, top step: $\Gamma^{N}_{i}\,\Sigma^{1,i}_{j}\; x[i] \cdot y[j] \Rightarrow tmp1[i]$ becomes $\Gamma^{N}_{i}\,\Sigma^{1,i}_{j}\; y[j] \Rightarrow tmp2[i]$ and $\Gamma^{N}_{i}\; x[i] \cdot tmp2[i] \Rightarrow tmp1[i]$. Bottom step: $\Gamma^{N}_{i}\; x[i] \cdot (incl[i] : incl[i-1] + y[i] \rightarrow incl[1] = y[1]) \Rightarrow tmp1[i]$, where $incl[i] : incl[i-1] + y[i]$ is the relation between iterations (incl[i] is a new name) and $incl[1] = y[1]$ is the initial condition.]
Fig. 9. Example for removing redundancies of category 4.

Other Categories

This section describes the algorithms for other categories briefly. Readers may refer to our technical report [TR 2017] for details.

Category 4: GLORE first decapsulates the encapsulated loops. It then detects possible category-4 redundancies (partially loop-invariant loops, as in Figure 2 (d)) by matching some common patterns (e.g., affine loop bounds) with the formulae. From the innermost to the outermost loops, it moves an operand out of a loop if its relLoops do not contain the index of the loop, as the top step in Figure 9 illustrates. It then reformulates the partially loop-invariant loops into an incremental form, as the bottom step in Figure 9 (a) shows. In the code generation step, it removes the original redundant computations by replacing them with incremental computations. When there are multiple reduction loops, GLORE applies the minimum union algorithm discussed in Section 6.2 to decide the reduction order. After that, for each reduction formula it generates, GLORE checks, in the LER-notation, whether the union of the relLoops of all the operands in R is a proper subset of the union of the loop IDs in L. If so, it tries to reformulate the partially loop-invariant loops into an incremental form. This category of redundancy is often seen in cases where so-called incremental computing [Hammer et al. 2015, 2014] has been applied. What GLORE contributes in this part is a mechanism for incorporating the detected incremental computations into the optimization of LER-formulae for automatically removing the redundancies.

Category 2: In all the cases we have discussed, some operands use only a subset of the loop indices, creating the redundancies. When all operands cover all loop indices, there can still be redundant computations.
One typical example is stencil-like computations, such as $\Gamma_{i}\,\Gamma_{j}\; (A[i][j-1] + A[i-1][j] + A[i+1][j] + A[i][j+1]) \Rightarrow C[i,j]$, where the first half of the expression ($A[i][j-1] + A[i-1][j]$) in iteration (i, j) conducts the same computations as the second half of the expression ($A[i+1][j] + A[i][j+1]$) does in iteration (i − 1, j − 1). GLORE deals with these redundancies after removing the redundancies of categories 3 and 4. GLORE first takes an operand concretization step to reverse the effects of operand abstraction, such that the operands in the formulae now have their concrete indexing expressions. We draw on the following example to explain the algorithm for removing category-2 redundancies:


Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach

Data Representation in Digital Design, a Single Conversion Equation and a Formal Languages Approach Data Representaton n Dgtal Desgn, a Sngle Converson Equaton and a Formal Languages Approach Hassan Farhat Unversty of Nebraska at Omaha Abstract- In the study of data representaton n dgtal desgn and computer

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL)

Circuit Analysis I (ENGR 2405) Chapter 3 Method of Analysis Nodal(KCL) and Mesh(KVL) Crcut Analyss I (ENG 405) Chapter Method of Analyss Nodal(KCL) and Mesh(KVL) Nodal Analyss If nstead of focusng on the oltages of the crcut elements, one looks at the oltages at the nodes of the crcut,

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

Loop Transformations, Dependences, and Parallelization

Loop Transformations, Dependences, and Parallelization Loop Transformatons, Dependences, and Parallelzaton Announcements Mdterm s Frday from 3-4:15 n ths room Today Semester long project Data dependence recap Parallelsm and storage tradeoff Scalar expanson

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

MATHEMATICS FORM ONE SCHEME OF WORK 2004

MATHEMATICS FORM ONE SCHEME OF WORK 2004 MATHEMATICS FORM ONE SCHEME OF WORK 2004 WEEK TOPICS/SUBTOPICS LEARNING OBJECTIVES LEARNING OUTCOMES VALUES CREATIVE & CRITICAL THINKING 1 WHOLE NUMBER Students wll be able to: GENERICS 1 1.1 Concept of

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

LOOP ANALYSIS. The second systematic technique to determine all currents and voltages in a circuit

LOOP ANALYSIS. The second systematic technique to determine all currents and voltages in a circuit LOOP ANALYSS The second systematic technique to determine all currents and voltages in a circuit T S DUAL TO NODE ANALYSS - T FRST DETERMNES ALL CURRENTS N A CRCUT AND THEN T USES OHM S LAW TO COMPUTE

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

such that is accepted of states in , where Finite Automata Lecture 2-1: Regular Languages be an FA. A string is the transition function,

such that is accepted of states in , where Finite Automata Lecture 2-1: Regular Languages be an FA. A string is the transition function, * Lecture - Regular Languages S Lecture - Fnte Automata where A fnte automaton s a -tuple s a fnte set called the states s a fnte set called the alphabet s the transton functon s the ntal state s the set

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Preconditioning Parallel Sparse Iterative Solvers for Circuit Simulation

Preconditioning Parallel Sparse Iterative Solvers for Circuit Simulation Precondtonng Parallel Sparse Iteratve Solvers for Crcut Smulaton A. Basermann, U. Jaekel, and K. Hachya 1 Introducton One mportant mathematcal problem n smulaton of large electrcal crcuts s the soluton

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A fault tree analysis strategy using binary decision diagrams

A fault tree analysis strategy using binary decision diagrams Loughborough Unversty Insttutonal Repostory A fault tree analyss strategy usng bnary decson dagrams Ths tem was submtted to Loughborough Unversty's Insttutonal Repostory by the/an author. Addtonal Informaton:

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

c 2009 Society for Industrial and Applied Mathematics

c 2009 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 31, No. 3, pp. 1382 1411 c 2009 Socety for Industral and Appled Mathematcs SUPERFAST MULTIFRONTAL METHOD FOR LARGE STRUCTURED LINEAR SYSTEMS OF EQUATIONS JIANLIN XIA, SHIVKUMAR

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

A Facet Generation Procedure. for solving 0/1 integer programs

A Facet Generation Procedure. for solving 0/1 integer programs A Facet Generaton Procedure for solvng 0/ nteger programs by Gyana R. Parja IBM Corporaton, Poughkeepse, NY 260 Radu Gaddov Emery Worldwde Arlnes, Vandala, Oho 45377 and Wlbert E. Wlhelm Teas A&M Unversty,

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Math Homotopy Theory Additional notes

Math Homotopy Theory Additional notes Math 527 - Homotopy Theory Addtonal notes Martn Frankland February 4, 2013 The category Top s not Cartesan closed. problem. In these notes, we explan how to remedy that 1 Compactly generated spaces Ths

More information

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Test-Cost Modeling and Optimal Test-Flow Selection of 3D-Stacked ICs

Test-Cost Modeling and Optimal Test-Flow Selection of 3D-Stacked ICs Test-Cost Modelng and Optmal Test-Flow Selecton of 3D-Stacked ICs Mukesh Agrawal, Student Member, IEEE, and Krshnendu Chakrabarty, Fellow, IEEE Abstract Three-dmensonal (3D) ntegraton s an attractve technology

More information

Communication-Minimal Partitioning and Data Alignment for Af"ne Nested Loops

Communication-Minimal Partitioning and Data Alignment for Afne Nested Loops Communcaton-Mnmal Parttonng and Data Algnment for Af"ne Nested Loops HYUK-JAE LEE 1 AND JOSÉ A. B. FORTES 2 1 Department of Computer Scence, Lousana Tech Unversty, Ruston, LA 71272, USA 2 School of Electrcal

More information

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III.

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III. Lecture 15: Memory Herarchy Optmzatons I. Caches: A Quck Revew II. Iteraton Space & Loop Transformatons III. Types of Reuse ALSU 7.4.2-7.4.3, 11.2-11.5.1 15-745: Memory Herarchy Optmzatons Phllp B. Gbbons

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information