Run Time Methods for Parallelizing Partially Parallel Loops*


Lawrence Rauchwerger† (University of Illinois), Nancy M. Amato‡ (Texas A&M University), David A. Padua† (University of Illinois)

* Due to space limitations, this paper is an extended abstract of [].
† Center for Supercomputing Research & Development, 08 W. Main St., Urbana, IL 680, rwerger,padua@csrd.uiuc.edu. Research supported in part by Intel and NASA Graduate Fellowships, and Army contract #DABT6-9-C-00. This work is not necessarily representative of the positions or policies of the Army or the Government.
‡ Department of Computer Science, Texas A&M University, College Station, TX 778-, amato@cs.tamu.edu. Research supported in part by an AT&T Bell Laboratories Graduate Fellowship, NSF Grant CCR , and the International Computer Science Institute, Berkeley, CA.

Abstract

In this paper we give a new run time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that performs run time preprocessing of the loop's access pattern, and scheduler code that schedules (and executes) the loop iterations. The inspector is fully parallel, uses no synchronization, and can be applied to any loop. In addition, it can implement at run time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelization (element-wise). We also describe a new scheme for constructing an optimal parallel execution schedule for the iterations of the loop.

1 Introduction

To achieve a high level of performance for a particular program on today's supercomputers, software developers are often forced to tediously hand-code optimizations tailored to a specific machine. Such hand-coding is difficult, error-prone, and often not portable to different machines. Restructuring, or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades [, ], current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. This is an extremely important issue because a large class of complex simulations used in industry today have irregular domains and/or dynamically changing interactions. Examples include SPICE for circuit simulation, DYNA-3D and PRONTO-3D for structural mechanics modeling, GAUSSIAN and DMOL for quantum mechanical simulation of molecules, CHARMM and DISCOVER for molecular dynamics simulation of organic systems, and FIDAP for modeling complex fluid flows [8].

Thus, since the available parallelism in these types of applications cannot be determined statically by present parallelizing compilers [6, 8], compile-time analysis must be complemented by new methods capable of automatically extracting parallelism at run time. Run time techniques are needed because the access pattern of some programs cannot be statically determined, either because of limitations of current analysis algorithms or because the access pattern is input data dependent. For example, most dependence analysis algorithms conservatively assume dependences when presented with non-linear or subscripted subscript expressions.

During the past few years, techniques have been developed for the run time analysis and scheduling of loops [5, 9,, 7, 0,, 5, 6, 7, 8, 9, 0,, ]. The majority of this work has concentrated on developing run time methods for constructing execution schedules for partially parallel loops, i.e., loops whose parallelization requires synchronization to ensure that the iterations are executed in the correct order.
Given the original, or source, loop, most of these techniques generate inspector code that analyzes, at run time, the cross-iteration dependences in the loop, and scheduler/executor code that schedules and executes the loop iterations using the dependence information extracted by the inspector [0].

Our Results. We give a new inspector/scheduler/executor method for finding an optimal parallel execution schedule for a partially parallel loop. Our inspector is fully parallel, uses no synchronization, and can be applied to any loop (from which an inspector can be extracted). In addition, our inspector can implement at run time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelization (element-wise). The ability to identify privatizable and reduction variables is very powerful since it eliminates the data dependences involving these variables and increases the available parallelism in the loop. The scheduler partitions the set of iterations into subsets called wavefronts. Iterations in each wavefront can be executed in parallel, i.e., there are no data dependences between iterations in a wavefront. Although the wavefronts themselves are constructed one after another, the computation of each wavefront is fully parallel and requires no synchronization. The scheduling can be dynamically overlapped with the parallel execution of the loop iterations to utilize the machine more uniformly. Our new method improves on the previous techniques since none of them has all of these properties (a comparison to previous work is contained in Section 4).

2 Preliminaries

In order to guarantee the semantics of a loop, the parallel execution schedule for its iterations must respect the data dependence relations between the statements in the loop body [, 5,,, 5]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program.
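As a small illustration of the three dependence types just listed, the sketch below classifies the dependence between two accesses to the same memory location from their access types alone. It is written in Python rather than the Fortran used in the figures, and the function name classify_dependence and the "R"/"W" encoding are assumptions made for this example only.

```python
def classify_dependence(earlier, later):
    """Classify the dependence between two accesses to one memory location.

    `earlier` and `later` are access types, "R" (read) or "W" (write),
    listed in sequential program order. (Hypothetical helper.)
    """
    if earlier == "W" and later == "R":
        return "flow"    # read after write
    if earlier == "R" and later == "W":
        return "anti"    # write after read
    if earlier == "W" and later == "W":
        return "output"  # write after write
    return None          # two reads never conflict


print(classify_dependence("W", "R"))  # flow
print(classify_dependence("R", "W"))  # anti
print(classify_dependence("W", "W"))  # output
print(classify_dependence("R", "R"))  # None
```

Only the first case constrains the order in which values are produced and consumed; the anti and output cases come from reusing the same storage.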
Anti and output dependences, also known as memory-related dependences, are caused by the reuse of memory, e.g., program variables. If there are flow dependences between accesses in different iterations of a loop, then the semantics of the loop cannot be guaranteed unless those iterations are executed in order of iteration number, because values that are computed (produced) in an iteration of the loop are used (consumed) during some later iteration. If there are no flow dependences, but there are anti or output dependences between iterations of a loop, then the loop must be modified to remove all such dependences before these iterations can be executed in parallel. In some cases, even flow dependences can be removed by simple algorithm substitution, e.g., reductions. Unfortunately, not all such situations can be handled efficiently.

    (a)
        do i = 1, n/2
    S1:    tmp = A(2*i)
           A(2*i) = A(2*i-1)
    S2:    A(2*i-1) = tmp
        enddo

    (b)
        do i = 1, n
           do j = 1, m
    S3:       A(j) = A(j) + exp()
           enddo
        enddo

    Figure 1

In order to remove certain types of dependences two transformations can be applied to the loop: privatization and reduction parallelization. Privatization creates, for each processor cooperating on the execution of the loop, private copies of the program variables that give rise to anti or output dependences (see, e.g., [7, 8, 9, ]). The loop shown in Figure 1(a) is an example of a loop that can be executed in parallel by using privatization; the anti dependences between statement S2 of iteration i and statement S1 of iteration i+1, for 1 ≤ i < n/2, can be removed by privatizing the temporary variable tmp. In this paper, the following criterion is used to determine whether a variable may be privatized.

Privatization Criterion: Let A be a shared array (or array section) that is referenced in a loop L. A can be privatized if and only if every read access to an element of A is preceded by a write access to that same element of A within the same iteration of L.

In general, dependences that are generated by accesses to variables that are only used as workspace (e.g., temporary variables) within an iteration can be eliminated by privatizing the workspace.

Reduction parallelization is another important technique for transforming certain types of data dependent loops for concurrent execution.
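Returning to the Privatization Criterion stated above, it can be checked mechanically on a recorded access trace. The following Python sketch is only an illustration: the trace format (a list of (iteration, access type) pairs for a single element, in execution order) and the name is_privatizable are assumptions, not part of the method described in this paper.

```python
def is_privatizable(trace):
    """Check the Privatization Criterion for one array element.

    `trace` lists (iteration, kind) pairs in execution order, where kind
    is "R" or "W". The element is privatizable if and only if every read
    is preceded by a write to it within the same iteration.
    """
    written_in = set()  # iterations that have already written the element
    for iteration, kind in trace:
        if kind == "W":
            written_in.add(iteration)
        elif iteration not in written_in:
            return False  # read with no earlier write in this iteration
    return True


# A workspace variable that is written, then read, in every iteration
# (the pattern of tmp in a pairwise swap loop):
print(is_privatizable([(1, "W"), (1, "R"), (2, "W"), (2, "R")]))  # True
# A value produced in iteration 1 and consumed in iteration 2:
print(is_privatizable([(1, "W"), (2, "R")]))                      # False
```

The second trace fails because the read in iteration 2 consumes a value produced by another iteration, which is a flow dependence that a private copy cannot satisfy.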
Definition: A reduction variable is a variable whose value is used in one associative operation of the form x = x ⊗ exp, where ⊗ is the associative operator and x does not occur in exp or anywhere else in the loop.

If the operator is not commutative then the implementation of the parallel equivalent reduction operation is more constrained. Reduction variables are therefore accessed in a certain specific pattern (which leads to a characteristic data dependence graph). A simple but typical example of a reduction is statement S3 in Figure 1(b). The operator ⊗ is exemplified by the + operator, the access pattern of array A(:) is read, modify, write, and the function performed by the loop is to add a value computed in each iteration to the value stored in A(:). Once reduction variables are identified, methods are known for performing the reduction operation in parallel (see, e.g., [,, 6, 5]).

3 Run Time Analysis of Loops

Given a do loop whose access pattern cannot be statically analyzed, compilers have traditionally generated sequential code. Since compile time data dependence analysis techniques cannot be used on such programs, methods of performing the analysis at run time are required. Several techniques have been developed for the run time analysis and scheduling of loops with cross-iteration dependences [5,9,,7,0,,8,9,0,,]. However, for various reasons, such techniques have not achieved widespread use in current parallelizing compilers. In the following we describe a new run time scheme for constructing a parallel execution schedule for the iterations of a loop. The general structure of our method is similar to the above cited run time techniques: given the original, or source, loop, the compiler generates inspector code that analyzes, at run time, the cross-iteration dependences in the loop, scheduler code that schedules the loop iterations using the dependence information extracted by the inspector, and executor code that executes the loop iterations.
In the previous techniques, the scheduler and the executor are tightly coupled codes which are collectively referred to as the executor, and the inspector and the scheduler/executor codes are usually decoupled [0]. Although our methods can also interleave the scheduler and the executor, we treat them separately since they tackle distinct tasks.

3.1 The Inspector

In this section we describe a new inspector scheme that processes the memory references in a loop and constructs a data structure which the scheduler can use to efficiently assign iterations to wavefronts. In addition, our inspector can implement at run time two important transformations: (element-wise) array privatization and reduction parallelization (see Section 2). The ability to identify privatizable and reduction variables is very powerful since it eliminates the data dependences involving these variables. In particular, these transformations increase the available parallelism in the loop and also reduce the work required of the scheduler, since it need not consider dependences involving such variables when it constructs the parallel execution schedule for the loop iterations.

The basic strategy of our method is for the inspector to preprocess the memory references and determine the data dependences for each memory location accessed. Later, the scheduler uses this memory-location dependence information to determine the data dependences between the iterations. We describe the method as applied to a shared array A that is accessed through subscript arrays (see Figure 2(a)). For simplicity, we first consider only the problem of identifying the cross-iteration dependences for each array element (memory location). After describing the inspector, we discuss how the dependence information it discovers can be used to identify the array elements that are read-only, privatizable, or reduction variables.

The inspector has two main tasks.

1. For each array element A[x], the inspector collects all the references to it into an array (or list) R_x and stores them in iteration order.
   For each reference it stores the iteration number and access type (i.e., read or write) (see Figure 2(b)).

2. For each array element A[x], the inspector determines the data dependences between all its references and stores them in a data structure H_x for later use by the scheduler.

Below we discuss how the references to each array element can be collected and stored in the array (or list) R_x. Assuming R_x is available, we first describe how the inspector determines the dependences among the references to A[x] and computes the data structure H_x.

The relations between the references to A[x] can be organized (conceptually) into an array element dependence graph D_x. If adjacent references in R_x have different access types, then a flow or anti dependence exists, and if they are both writes, then an output dependence is signaled. These dependences are reflected by parent-child relationships in D_x. If adjacent references are both reads, then there is no dependence between the elements, but they may have a common parent (child) in D_x: the last write preceding (first write following) them in R_x. For example, the dependence graph for one array element is shown in Figure 2(c).

    (a)
        do i = 1, 8
           A(W(i)) = ...
           ...     = A(R(i))
           work(i)
        enddo

    Figure 2: (a) A source loop, (b) the array R_x for one array element A[x] (fields: iteration, access type, level), (c) its dependence graph D_x, and (d) its hierarchy vector H_x (entries: index in R_x).

Our goal is to encode the predecessor/successor information of the (conceptual) dependence graph D_x in a hierarchy vector H_x so that the scheduler can easily look up the dependence information for the references to A[x]. First, we add a level field to the records in R_x, and store in it the reference's level in the dependence graph D_x (see Figure 2(b)). Then, for each level, we store in H_x the index (pointer to location) in R_x of the first reference at that level. Specifically, H_x is an array and H_x[i] contains the index in R_x of the first reference at level i, i.e., H_x will serve as a look-up table for the first reference in R_x at any level (see Figure 2(d)). Note that this implies that H_x records the position in R_x of every write access and of the first read access in any run of reads.

We now give an example of how the hierarchy vector serves as a look-up table for the predecessors and successors of all the accesses. Consider the read access in the 6th iteration to the array element of Figure 2, which appears as the 6th entry in R. Its level is 5, and thus it finds its successor by looking at the (5+1) = 6th element of the hierarchy vector H, which contains the value 8, indicating that its successor is the 8th element in R. Similarly, its predecessor is found by looking in the (5−1) = 4th element of H, which indicates that its predecessor is the 5th element of R.

Implementing the Inspector.
We now consider how to collect the accesses to each array element A[x] into the arrays R_x. Regardless of the technique used to construct these arrays, to ensure the scalability of our methods we must process (mark) the references to the shared array A in a doall (see Figure 3(a) and (b)). The computation performed in the marking operations will depend upon the technique used to construct the arrays R_x. In any case, note that since we are interested in cross-iteration data dependences we need only record at most one read and one write access in R_x for any particular iteration, i.e., subsequent reads or writes to A[x] in the same iteration can be ignored.

    (a)
        do i = 1, 8
           A(W(i)) = ...
           ...     = A(R(i))
           work(i)
        enddo

    (b)
        doall p = 1, nproc
           private integer j
           do j = start(p,niter), end(p,niter)
              markwrite(W(j))
              markread(R(j))
           enddo
        enddoall

    Figure 3: An example of the private element arrays pr and hierarchy vectors ph (c) when two processors are used in the inspector doall loop (b) for the source do loop (a). Panel (c) lists, for each processor, the pr records (iteration, type, level) and the ph entries (index in pr); entries that must be supplied by another processor are marked "?".

Perhaps the simplest method of constructing the element arrays R_x is to first place a record for each memory reference into an array R_A, and then sort the records lexicographically by array element number (first key) and iteration number (second key). After sorting, each array R_x will occupy a contiguous portion (a subarray) in the array R_A. In this case the marking operations simply record the information about the access into R_A. After the lexicographic sort, the level of each reference in D_x can be computed by a prefix sum computation. However, since the range of the values to be sorted is known in advance (it is given by the dimension of the shared array A), a linear time bucket or bin sort can be used in place of the more general O(n log n) lexicographic sort. Moreover, if the inspector's marking phase is chunked (i.e., statically scheduled), then further optimization is possible.
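The bucket-based construction of the element arrays and hierarchy vectors described above can be sketched sequentially as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function name build_inspector_tables, the (iteration, element, kind) record format, and the 0-based levels and indices (the figures count from 1) are all choices made for this example. It also assumes that the records arrive in increasing iteration order, so that appending to a bucket keeps it sorted by iteration.

```python
def build_inspector_tables(accesses, num_elements):
    """Build R[x] and H[x] for every element x of a shared array.

    `accesses` holds (iteration, element, kind) records in increasing
    iteration order; kind is "R" or "W". R[x] lists (iteration, kind,
    level) records, and H[x][l] is the index in R[x] of the first
    reference at level l.
    """
    R = [[] for _ in range(num_elements)]  # one bucket per array element
    H = [[] for _ in range(num_elements)]
    for iteration, element, kind in accesses:
        refs = R[element]
        # Record at most one read and one write per iteration.
        if any(it == iteration and k == kind for it, k, _ in refs[-2:]):
            continue
        if not refs:
            level = 0
        else:
            _, prev_kind, prev_level = refs[-1]
            # Adjacent reads share a level; any other pair is a dependence.
            if prev_kind == "R" and kind == "R":
                level = prev_level
            else:
                level = prev_level + 1
        if level == len(H[element]):
            H[element].append(len(refs))  # first reference at a new level
        refs.append((iteration, kind, level))
    return R, H


accesses = [(0, 0, "W"), (1, 0, "R"), (2, 0, "R"), (3, 0, "W"), (3, 1, "R")]
R, H = build_inspector_tables(accesses, num_elements=2)
print(R[0])  # [(0, 'W', 0), (1, 'R', 1), (2, 'R', 1), (3, 'W', 2)]
print(H[0])  # [0, 1, 3]
```

In this example the read recorded at index 2 of R[0] has level 1, so its successor is found at H[0][2] = 3 and its predecessor at H[0][0] = 0, which is the same look-up pattern as in the worked example of the hierarchy vector.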
With a chunked marking phase, processor i will be assigned iterations i⌈n/p⌉ through (i+1)⌈n/p⌉ − 1, where p is the total number of processors, n is the number of iterations in the loop, and 0 ≤ i < p. The basic idea is as follows. First, in a private marking phase, each processor marks the references in its assigned iterations, and constructs element arrays R_x and hierarchy vectors H_x as described above, but only for the references in its assigned iterations. Then, in a cross-processor analysis phase, the hierarchy vectors for the whole iteration space of the loop are formed using the processors' hierarchy (sub)vectors.

The private marking phase proceeds as follows. Let A[1:s] be the shared array under scrutiny, and suppose each processor has a separate array pr[1:s, 1:2n/p] in which to store the records of the references in its set of iterations. Each record contains the iteration, type of reference, and level as described above. (The second dimension of 1:2n/p follows since at most one read and one write to any element need to be marked in each iteration, and each processor has n/p iterations.) Assuming a processor marks its iterations in order of increasing iteration number, it can immediately place the records for the references into its array pr in sorted order of iteration number. In addition to the array pr, each processor has a separate array ph[1:s, 1:2n/p] used to store the hierarchy vectors for the references in its assigned set of iterations. Again, assuming that iterations are processed in increasing order of iteration number, the hierarchy vectors can be filled in at the same time that the references are recorded in pr (see Figure 3(c)).

In the cross-processor analysis phase we need to find for each array element A[x] the predecessor, if any, of the first reference recorded by each processor, i.e., we need to fill in the value in processor i's hierarchy vector for the reference that immediately precedes (in the dependence graph D_x) the first reference to A[x] that was assigned to processor i. Similarly, we must find the immediate successor of the last reference to A[x] that was assigned to processor i. Processor i can find the predecessors (successors) needed for its hierarchy vectors by scanning the arrays of the processors less than (larger than) i. For example, the ? at the end of a hierarchy vector ph[x] of the first processor in Figure 3 would be filled in with a pointer to the first element in the array pr[x] of the second processor. Hence, the initial and final entries in the hierarchy vectors also need to store the processor number that contains the predecessor and successor. These scans can be made more efficient by maintaining some auxiliary information, e.g., for each array element, each processor computes the total number of accesses it recorded, and the indices in pr of the first and last write to that element. In any case, we note that filling in the processors' hierarchy vectors requires a minimal amount of interprocessor communication, i.e., it requires only a connecting and not a full merging of the different hierarchy vectors.

There are several ways in which the above sketched analysis phase can be optimized. For example, in order to determine which array elements need predecessors and successors (i.e., the elements with non-empty arrays R_x), the processor needs to check each row of its array pr (row i of pr corresponds to the array R_i). This could be a costly operation if the dimension of the original array is large and the processor's assigned iterations have a sparse access pattern. However, the need to check each row in pr can be avoided by maintaining a list of the non-empty rows.
This list can be constructed during the marking phase, and then traversed in the analysis phase. Another source of inefficiency for machines with many processors is the search for a particular predecessor (or successor), since each processor might need to look for a predecessor in all the preceding (succeeding) processors' iterations. The cost of these searches can be reduced from p to O(log p) using a standard parallel divide-and-conquer pair-wise merging approach [6], where p is the total number of processors.

Privatization and Reduction Recognition. The basic inspector described above can easily be augmented to find the array elements that are independent (i.e., accessed in only one iteration), read-only, privatizable, or reduction variables. We first consider the problem of identifying independent, read-only, and privatizable array elements. During the marking phase, a processor maintains the status of each element referenced in its assigned iterations with respect to only these iterations. In particular, if it finds that an element is written in any of its assigned iterations, then it is not read-only. If an element is accessed in more than one of its assigned iterations, then it is not independent. If an element was read before it was written in any of its assigned iterations, then it is not privatizable. Next, the final status of each element is determined in the cross-processor analysis phase as follows. An element is independent if and only if it was classified as independent by exactly one processor, and was not referenced on any other processor. An element is read-only if and only if it was determined to be read-only by every processor that referenced it. Similarly, an element is privatizable if and only if it was privatizable on every processor that accessed it. Thus, the elements can be categorized by a similar process to the one used to find the predecessors and successors when filling in the processors' hierarchy vectors. Finally, if we maintain a linked list of the non-empty rows of pr as mentioned above, then the rows corresponding to elements that were found to be independent, read-only, or privatizable are removed from the list, i.e., accesses to these elements need not be considered when constructing the parallel execution schedule for the loop iterations.

    (a)
        do i = 1, n
    S1:    A(K(i)) = ...
    S2:    ...     = A(L(i))
    S3:    A(R(i)) = A(R(i)) + exp()
        enddo

    (b)
        doall i = 1, n
           markwrite(K(i))
           markredux(K(i))
    S1:    A(K(i)) = ...
           markread(L(i))
           markredux(L(i))
    S2:    ...     = A(L(i))
           markwrite(R(i))
    S3:    A(R(i)) = A(R(i)) + exp()
        enddoall

    Figure 4: The transformation of the do loop in (a) is shown in (b). The markwrite (markread) operation adds a record to the processor's array pr (if it is not a duplicate), and updates the hierarchy vector ph appropriately. The markredux operation invalidates the indicated array element as a reduction variable since it is accessed outside the reduction statement S3.

We now consider the problem of verifying that a statement is a reduction using run time data dependence analysis. Recall that potential reduction statements are generally identified by syntactically matching the statement with the generic reduction template x = x ⊗ exp, where x is the reduction variable and ⊗ is an associative operator. The statement is validated as a reduction if it can be shown that x is neither referenced in exp nor anywhere in the loop body outside the reduction statement. For example, although statement S3 in the loop in Figure 4(a) matches a reduction statement, it is still necessary to prove that the elements of array A referenced in S1 and S2 do not overlap with those accessed in statement S3, i.e., that K(i) ≠ R(j) and L(i) ≠ R(j), for all 1 ≤ i, j ≤ n. It turns out that this condition can be tested in the same way that read-only and privatizable array elements are identified. In particular, during the marking phase, whenever an element is accessed outside the reduction statement the processor invalidates that element as a reduction variable. Again, the final status of each element is determined in the cross-processor analysis phase, i.e., an element is a reduction variable if and only if it was not invalidated as such by any processor.
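The two-step classification described above (a private status per processor, then a cross-processor combination) can be sketched as follows. This Python fragment is an illustration under assumptions: the names mark_chunk and combine, the dictionary layout, and the record format (iteration, element, kind, in_reduction) are invented for the example, and the reduction statement is modeled as a read followed by a write that both carry in_reduction=True.

```python
def mark_chunk(accesses):
    """Private marking phase for the iterations assigned to one processor.

    `accesses` holds (iteration, element, kind, in_reduction) records in
    execution order; kind is "R" or "W".
    """
    status = {}
    for iteration, element, kind, in_reduction in accesses:
        s = status.setdefault(element, {
            "iterations": set(), "written_in": set(),
            "read_only": True, "privatizable": True, "reduction": True})
        s["iterations"].add(iteration)
        if not in_reduction:
            s["reduction"] = False      # accessed outside the reduction statement
        if kind == "W":
            s["read_only"] = False
            s["written_in"].add(iteration)
        elif iteration not in s["written_in"]:
            s["privatizable"] = False   # read before any write in this iteration
    for s in status.values():
        s["independent"] = len(s["iterations"]) == 1
    return status


def combine(per_processor):
    """Cross-processor analysis: merge the per-processor classifications."""
    final = {}
    for x in set().union(*per_processor):
        seen = [st[x] for st in per_processor if x in st]
        final[x] = {
            "independent": len(seen) == 1 and seen[0]["independent"],
            "read_only": all(s["read_only"] for s in seen),
            "privatizable": all(s["privatizable"] for s in seen),
            "reduction": all(s["reduction"] for s in seen),
        }
    return final


# Two processors; element 0 is workspace, element 1 is only read,
# element 2 is touched only by the reduction statement.
p0 = [(0, 0, "W", False), (0, 0, "R", False), (0, 1, "R", False),
      (0, 2, "R", True), (0, 2, "W", True)]
p1 = [(1, 0, "W", False), (1, 0, "R", False), (1, 1, "R", False),
      (1, 2, "R", True), (1, 2, "W", True)]
final = combine([mark_chunk(p0), mark_chunk(p1)])
print(final[0]["privatizable"], final[1]["read_only"], final[2]["reduction"])
# True True True
```

Each processor touches only its own records in mark_chunk, and combine needs one flag per element and processor, which matches the observation that the categories can be settled with little interprocessor communication.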
This basic reduction-validation strategy can be extended to handle more complex reduction operations (refer to [] for details).

Complexity of the Inspector. The worst case complexity of the inspector is O(a log p), where a is the maximum number of references assigned to each processor and p is the total number of processors. In particular, using the bucket sort implementation, each processor spends constant time on each of its O(a) accesses in the marking phase, and the analysis phase takes time O(a log p) using a parallel divide-and-conquer pair-wise merging strategy [6]. We remark that since the cost of the analysis phase is proportional to the number of distinct elements accessed (i.e., the number of non-empty rows in the pr array), the complexity of this phase could be significantly less than O(a log p) if there are many repeated references in the loop. Also, if a log p > s, then the merge among the processes can be improved to O(s + log p) time by chunking the pr arrays.

3.2 The Scheduler

The scheduler derives the more restrictive iteration-wise dependence relations from the memory location dependence information found by the inspector. A valid parallel execution schedule for a loop is a partition of the set of iterations into ordered subsets called wavefronts, so that all cross-iteration dependences go from an iteration in a lower numbered wavefront to an iteration in a higher numbered wavefront. We say that a valid parallel execution schedule is optimal if it has a minimum number of wavefronts, i.e., it has as many wavefronts as the longest path (the critical path) in the directed acyclic graph (dag) describing the cross-iteration dependences in the loop. We remark that the schedulers described below can be used to construct the full iteration schedule in advance (as described) or they can be interleaved with the executor, i.e., the iterations could be executed as they are found to be ready.

A simple scheduler. A simple scheduler that finds an optimal schedule is sketched in Figure 5(a). In the figure, an array wf(i) stores the wavefront found for iteration i, the global variable done flags if all iterations have been scheduled, rdy(i) signals if iteration i is ready to be executed, lower case letters (a, b) are used for references to array elements, a.iter is the iteration which contains reference a, and Pred(a) is the set of immediate predecessors of a in the array element dependence graphs. The scheduling is performed in phases (line 4) so that in phase i the iterations belonging to the ith wavefront are identified. In each phase, all the references recorded in the pr arrays are processed (lines 7-16), and the predecessors of all references whose iterations have not been scheduled (line 10) are examined. An iteration is not ready if the iterations of any of its references' predecessors were not assigned to previous wavefronts (lines 11-13). After all the references are processed, all the iterations are examined (lines 17-19) to see which can be added to the current wavefront: an iteration i is ready (line 18) if none of its references set rdy(i) to false.
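The phase structure of the simple scheduler can be mirrored in a short Python sketch. It is a sequential illustration under assumptions, not the Cedar Fortran code of the paper: the two doall loops are run as ordinary loops, iterations and levels are numbered from 0, the function name simple_schedule is hypothetical, and Pred(a) is recomputed by scanning each element's reference list instead of using the hierarchy vectors. The input R[x] is a per-element list of (iteration, kind, level) records sorted by iteration, in which adjacent reads share a level.

```python
def simple_schedule(R, num_iterations):
    """Return wf, where wf[i] is the 1-based wavefront of iteration i."""
    # Pred(a): references of the same element on the previous level.
    # References made by the same iteration are not cross-iteration
    # dependences, so they are skipped.
    refs = []  # (a.iter, [b.iter for b in Pred(a)])
    for element_refs in R:
        for it, _, level in element_refs:
            preds = [p for p, _, l in element_refs
                     if l == level - 1 and p != it]
            refs.append((it, preds))

    wf = [0] * num_iterations  # 0 means "not yet scheduled"
    cpl = 1
    done = False
    while not done:
        rdy = [True] * num_iterations
        done = True
        for a_iter, preds in refs:            # first doall
            if wf[a_iter] == 0 and any(wf[b] == 0 for b in preds):
                done = False
                rdy[a_iter] = False
        for i in range(num_iterations):       # second doall
            if wf[i] == 0 and rdy[i]:
                wf[i] = cpl
        cpl += 1
    return wf


R = [[(0, "W", 0), (1, "R", 1), (2, "R", 1), (3, "W", 2)],
     [(3, "R", 0)]]
print(simple_schedule(R, 4))  # [1, 2, 2, 3]
```

In the example, iteration 0 writes the first element, iterations 1 and 2 read it and therefore share the second wavefront, and iteration 3 overwrites it and must wait for both reads, which gives three wavefronts.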
Advantages of this scheduler are that it is conceptually very simple and quite easy to implement.

     1  wf(1:numiter) = 0
     2  done = .false.
     3  cpl = 1
     4  do while (done .eq. .false.)
     5     rdy(1:numiter) = .true.
     6     done = .true.
     7     doall i = 1, numaccess
     8        a = access(i)
     9        if (wf(a.iter) .eq. 0) then
    10           for each (b in Pred(a))
    11              if (wf(b.iter) .eq. 0) then
    12                 done = .false.
    13                 rdy(a.iter) = .false.
    14           endfor
    15        endif
    16     enddoall
    17     doall i = 1, numiter
    18        if (wf(i) .eq. 0 .and. rdy(i) .eq. .true.) wf(i) = cpl
    19     enddoall
    20     cpl = cpl + 1
    21  enddo while

    Figure 5: A simple scheduler (a), and the dependence graph D_x for one of the memory locations A[x] accessed in the loop, with its nodes labelled by iteration and arranged by level (b).

Optimizing the simple scheduler. There are some sources of inefficiency in this scheduler. First, since a write access could potentially have many parent read accesses, it could prove expensive to require each write to check all its parents (line 10). Fortunately, this problem is easily circumvented by requiring an unscheduled read access to inform its successor's iteration that it is not ready. Then, a write access only needs to check its predecessor if the (single) predecessor is also a write. Another source of inefficiency arises from the fact that each inner doall (lines 7-16) requires time O(n_a/p) to identify unscheduled iterations (line 9), where n_a is the total number of accesses to the shared array and p is the number of processors. Thus, the scheduler takes time O((n_a/p) cpl), where cpl is the length of the critical path. If cpl ≈ p, then it cannot be expected to offer any speedup over sequential execution, and even worse, it could yield slowdowns for longer critical paths. However, note that in any single iteration of the scheduler, the only iterations that could potentially be added to the next wavefront must have all their accesses at the lowest unscheduled level in their respective element-wise dependence graphs. For example, consider the dependence graph shown in Figure 5(b). If an iteration at some level has not been scheduled yet, then none of the iterations with accesses at higher levels of that graph could be added to the current wavefront. Thus, in each of the cpl iterations of the do while loop, we would like to examine only those references that are in the topmost unscheduled level of their respective dependence graph.

First note that we can easily identify the accesses on each level of the array element dependence graphs since references are stored in increasing level order in the pr arrays and the ph arrays contain pointers to the first access at each level. To process only the accesses on the lowest unscheduled level it is useful to have a count of the total number of (recorded) accesses in each iteration, which can easily be extracted in the marking phase. Then, in the scheduler, a count of the number of ready accesses for each iteration is computed on a per-processor basis in the first doall (lines 7-16). In the second doall (lines 17-19), the cross-processor sum of the ready access counts for each unscheduled iteration is compared to its total access count, and if they are equal the iteration is added to the current wavefront.

In summary, we would expect the optimized version to outperform the original scheduler if there are multiple levels in the array element dependence graphs. Hence, the determination of which version to use should be made using knowledge gained about the access pattern by the inspector. In [], we discuss ways to reduce scheduling overhead such as overlapping wavefront computation with actual loop execution and using dynamic ready queues [].

4 A Comparison with Previous Methods

We now compare the methods described in this paper to several other techniques that have been proposed for analyzing and scheduling do loops at run time. Most of this work has concentrated on developing inspectors. A high level comparison of the various methods is given in Table 1.

                  obtains    contains   requires   restricts   privat.
                  optimal    serial     global     type of     or
    Method        sched.     portions   synch.     loop        reduct.
    -------------------------------------------------------------------
    New           Yes        No         No         No          P,R
    ZY []         No         No         Yes        No          No
    MP [0]        Yes        No         Yes        No          No
    KS []         No         No         Yes        No          P
    CYT [9]       No         No         Yes        No          No
    SM [8]        No         No         Yes        Yes (5)     No
    SMC [0]       Yes        Yes        Yes        Yes (5)     No
    LZ [7]        Yes        No         Yes        Yes (5)     No
    P []          No         No         No         No          No
    RP [5, 6]     No (6)     No         No         No          P,R

    Table 1: A comparison of run time parallelization techniques for do loops. In the table entries, P and R show that the method identifies privatizable and reduction variables, respectively. The superscripts have the following meanings: 1, the method serializes all read accesses; 2, performance can degrade significantly in the presence of hotspots; 3, the scheduler/executor is a doacross loop (iterations are started in a wrapped manner) and busy waits are used to enforce certain data dependences; 4, the inspector loop sequentially traverses the access pattern; 5, the method is applicable only to loops without output dependences (i.e., each memory location is written at most once); 6, the method identifies only fully parallel loops.

Methods utilizing critical sections. The method of Zhu and Yew [] computes the wavefronts one after another using a method similar to the simple scheduler described in Section 3.2. During a phase, an iteration is added to the current wavefront if none of the data accessed in that iteration is accessed by any lower unassigned iteration; the lowest unassigned iteration to access any array element is found using atomic compare-and-swap synchronization primitives and a shadow version of the array. Midkiff and Padua [0] extended this method to allow concurrent reads from a memory location in multiple iterations. These methods run the risk of a severe degradation in performance for access patterns containing hot spots (i.e., many accesses to the same memory location). A feature of them is that they use only a shadow version of the shared array, whereas all other methods (except [, 5, 6]) unroll the loop and store all accesses to the shared array.

Krothapalli and Sadayappan [] proposed a run time scheme for removing anti and output dependences from loops.
For each memory location, their inspector counts the number of references to it (using critical sections as in []), places them in a dynamically allocated array, and then sorts them by iteration number. After building a dependence graph for each memory location (similar to our arrays R_x), the inspector removes all anti and output dependences by redirecting the accesses to dynamically allocated storage (using an additional level of indirection). Flow dependences are enforced using full/empty bits. To our knowledge, this is the only other run time privatization technique except for the one described in [5, 6].

Recently, Chen, Yew, and Torrellas [9] proposed an inspector that first builds (in private storage) access lists for each memory location referenced in a processor's assigned iterations (similar to [] and our inspector's marking phase, except they serialize read accesses), and then links them across processors using a global Zhu/Yew algorithm []. Their scheduler/executor uses doacross parallelization [8] (see below). Although this scheme potentially has less communication overhead than [], it is still sensitive to hot spots and there are cases (e.g., doalls) in which it proves inferior to [].

Methods for loops without output dependences. This problem has also been studied extensively by Saltz et al. [5, 8, 9, 0, ]. Most of their work assumes that there are no output dependences in the source loop. In doacross parallelization [8], an inspector finds the (at most one) iteration in which each variable is written. The scheduler/executor starts iterations in a wrapped manner and processors busy wait until their operands are available. In [0], the inspector constructs wavefronts that respect the flow dependences by performing a sequential topological sort of the accesses in the loop, and the scheduler/executor enforces any anti dependences using old and new versions of each variable (possible since each variable in the source loop is written at most once). The topological sort can be parallelized somewhat using doacross parallelization.
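The sequential wavefront construction for loops without output dependences can be illustrated with a small sketch. It is not the code of any of the cited systems: the function name `flow_wavefronts` and the list-of-lists encoding of the access pattern are assumptions made for illustration. Because every dependence points from a lower to a higher iteration, visiting iterations in increasing order is already a topological order.

```python
def flow_wavefronts(reads, writes):
    """Assign each iteration to the earliest wavefront allowed by flow dependences.

    reads[i]  -- memory locations read by iteration i
    writes[i] -- memory locations written by iteration i
    Assumes every location is written at most once in the whole loop.
    """
    writer = {}  # location -> the single iteration that writes it
    for i, locs in enumerate(writes):
        for loc in locs:
            if loc in writer:
                raise ValueError(f"location {loc!r} is written more than once")
            writer[loc] = i

    wavefront = []
    for i, locs in enumerate(reads):
        level = 1
        for loc in locs:
            w = writer.get(loc)
            # Only a write in an earlier iteration creates a flow dependence.
            # A write in a later iteration is an anti dependence, which the
            # executor is assumed to handle with old/new copies of the variable.
            if w is not None and w < i:
                level = max(level, wavefront[w] + 1)
        wavefront.append(level)
    return wavefront


# Hypothetical five-iteration loop: iteration 3 reads "d" before iteration 4 writes it.
reads = [[], ["a"], ["b"], ["d"], ["a", "c"]]
writes = [["a"], ["b"], ["c"], [], ["d"]]
print(flow_wavefronts(reads, writes))
```

Iterations that share a wavefront number can run concurrently, and the largest number is the critical path length of the flow dependence graph.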
Leung and Zahorjan [7] proposed methods of parallelizing the sequential inspector of [0]. In their sectioning method, the loop is chunked and each processor computes an optimal schedule for its chunk, and then these schedules are concatenated together separated by synchronization barriers. In their bootstrapping technique, the inspector is parallelized (not optimally) using sectioning, but an optimal schedule is produced.

Other methods. In contrast to the above methods which place iterations in the lowest possible wavefront, Polychronopoulos [] gives a method where wavefronts are maximal sets of contiguous iterations with no cross-iteration dependences. Dependences are detected using shadow versions of the variables, either sequentially, or in parallel with the aid of critical sections as in [].

All of the above mentioned methods attempt to find a valid parallel execution schedule for the source do loop. Recently, we considered a related problem [5, 6]: testing at run time whether the loop is fully parallel, i.e., whether there are any cross-iteration dependences in the loop. Our interest in fully parallel loops is motivated by the observation that they arise frequently in real programs.

5 Implementation and Experimental Results

We present experimental results obtained on two modestly parallel machines with 8 (Alliant FX/80 []) and processors (Alliant FX/800 []). However, we remark that the results scale with the number of processors and the data size and thus they may be extrapolated for massively parallel processors (MPPs), the actual target of our run time methods. To demonstrate that the new methods can achieve speedups, we applied them to three loops contained in the PERFECT Benchmarks []. To analyze the overhead incurred by the methods we applied them to access patterns taken from actual programs and to synthetic access patterns.

The methods were implemented in Cedar Fortran []. The inspector was essentially as described in Section .. In particular, we implemented the bucket sort version using separate pr and ph data structures for each processor.
Each processor constructed a linked list of the non-empty rows in its pr array during the marking phase. Checks for independent, read only, and privatizable elements were implemented in the inspector (we have not yet included the test for reduction variables). In the analysis phase, these elements are classified at the same time that the predecessors and successors are found for each row. An optimization that we did not yet implement was the pairwise merge across processors when searching for predecessors or successors in the analysis phase (or when classifying elements as independent, read only, or privatizable). However, this is an important optimization since, as previously noted, without it the analysis phase of the inspector may fail to scale with the number of processors. Since we implemented the optimized version of the simple scheduler described in Section ., a count of the total number of accesses in each iteration was computed in the marking phase (no inter-processor communication is needed to determine these counts since each iteration is assigned to a single processor). For simplicity, the scheduler and the executor were completely decoupled in the implementation, but better speedups should be obtainable by interleaving these two tasks (see Section .). We remark that there are other issues to be considered when applying these methods in a real application environment such as memory requirements and known bounds on the source loop's available parallelism (refer to [] for more details).

Synthetic Loops

Using synthetic loops, we studied the sensitivity of the overhead of the methods to two characteristics of the source do loop: its average parallelism (#iterations/cpl) and its hotspot degree (the maximum number of repeated accesses to any array element). To simplify the generation of the synthetic workloads, we did not identify independent, read only, or privatizable elements in the analysis phase.

Average parallelism. To isolate the effect of the average parallelism in the source loop on the overhead of the methods, we generated access patterns that were as similar as possible in all aspects except for the average parallelism: each iteration had two accesses (a read followed by a write), and every array element was accessed approximately twice. We would not expect the inspector's execution time to be dependent on the average parallelism in the source loop since it is fully parallel. However, as the scheduler runs in cpl steps, its execution time should be inversely correlated with the average parallelism. In Figures 6 and 7 we display results from a loop with 08 iterations run on 0 processors. The plot shows the overhead incurred for a loop with a critical path length of Step. As expected, the overhead of the inspector is invariant with the length of the critical path, and that of the scheduler grows linearly with this length. We also studied how overhead speedup relates to average parallelism.
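To make the quantities cpl and average parallelism (#iterations/cpl) concrete, the following sketch builds one possible synthetic access pattern with a chosen critical path length and then recomputes its wavefronts. The generator `synthetic_pattern` is a hypothetical construction, not the workload generator used in the experiments; it only mimics the stated shape (one read followed by one write per iteration, most elements accessed twice). `wavefront_levels` honors flow, anti, and output dependences.

```python
def synthetic_pattern(n_iters, cpl):
    """Return [(element_read, element_written), ...] with critical path length cpl."""
    if cpl < 1 or n_iters % cpl != 0:
        raise ValueError("cpl must be positive and divide n_iters")
    width = n_iters // cpl  # iterations per wavefront
    pattern = []
    for i in range(n_iters):
        # Iteration i writes element i. It reads the element written `width`
        # iterations earlier, which chains the iterations into cpl levels.
        # The first `width` iterations read fresh elements instead.
        read_elem = i - width if i >= width else n_iters + i
        pattern.append((read_elem, i))
    return pattern


def wavefront_levels(pattern):
    """Earliest wavefront of each iteration under flow, anti and output dependences."""
    last_write = {}  # element -> level of the most recent writer
    last_read = {}   # element -> highest level among readers since that write
    levels = []
    for read_elem, write_elem in pattern:
        level = 1
        if read_elem in last_write:       # flow dependence
            level = max(level, last_write[read_elem] + 1)
        if write_elem in last_write:      # output dependence
            level = max(level, last_write[write_elem] + 1)
        if write_elem in last_read:       # anti dependence
            level = max(level, last_read[write_elem] + 1)
        levels.append(level)
        last_read[read_elem] = max(last_read.get(read_elem, 0), level)
        last_write[write_elem] = level
        last_read.pop(write_elem, None)   # earlier reads are covered by this write
    return levels


def average_parallelism(levels):
    return len(levels) / max(levels)


levels = wavefront_levels(synthetic_pattern(12, 3))
print(levels, max(levels), average_parallelism(levels))
```

With a fixed iteration count, raising `cpl` lowers the average parallelism in proportion, which is the parameter varied in the experiment described above.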
The inspector's overhead is independent of the average parallelism since it is fully parallel. Although the scheduler consists of cpl steps, it may still exhibit substantial speedups since each step is fully parallel. In fact, in Figures 8 and 9 we show that almost identical speedups are obtained for sequential, partially parallel, and fully parallel loops for both the inspector and scheduler. The slightly diminished slope of the inspector's speedup curve after about 0 processors is because our implementation did not use a pairwise merge among the processors (Section .).

Hotspots. To isolate the effect of the hotspot degree in the source loop on the overhead of the methods, we generated similar access patterns differing only in hotspot degree: all loops had 08 iterations (each with two accesses), a critical path length of 0, and a loop with hotspot value h contained h references to each of 08/h array elements. We would not expect the methods to be negatively affected by the hot spot degree. In fact, a larger hotspot degree implies fewer non-empty rows in the pr array, and thus we might see improved results in the analysis and scheduling phases. The results in Figure 0 show that in fact the total overhead (inspector + scheduler) is nearly the same for all hotspot degrees.

Loops from the MA8 Solver

We applied the new methods to loops from real applications, both to demonstrate the diversity of partially parallel access patterns and also to reconfirm the conclusions reached above using synthetic loops. For this purpose we chose Loop MA0cd/DO 0 from MA8 (a blocked sparse non-symmetric linear solver [0]). We selected this loop, which performs the forward backward substitution in the final phase of the blocked sparse linear system solve, because it can generate many diverse access patterns when using the Harwell-Boeing matrices as input. Unfortunately, the loop itself is not a good candidate for parallelization since it performs very little work and is highly imbalanced. We discuss two input sets: gemat, which generates 99 iterations, and bp 600, which generates 8 iterations.
After extracting and precomputing the linear recurrences from the source loop (based on the methods in [7]), we generated a parallel inspector and computed an optimal parallel execution schedule for the loop. The parallelism profiles obtained (Figures and ) show the wavefront sizes of the optimal parallel execution schedule and illustrate how the same loop can generate vastly different dependence graphs given different input. Figure shows that most of the iterations of the loop can be executed in the initial wavefronts (cpl = ), which suggests that interleaving the wavefront computation and execution would be more beneficial than overlapping them, so that parallelization can be abandoned when the sequential tail of the profile is reached. Although in Figure most of the iterations are also executed in the initial wavefronts, in this case it appears that some benefit could be gained by overlapping, i.e., we can take advantage of the pauses in parallelism to compute future (hopefully larger) wavefronts. The histograms in Figures and underscore the need for scheduling and execution strategies that can adapt dynamically depending upon the type of parallelism encountered. Figures 5 and 6 show that overhead speedup is invariant with the parallelism profile. Larger speedups were not obtained since the loop is heavily imbalanced due to the blocked nature of the algorithm used in MA8.

Perfect Benchmark Loops

We applied the methods to three loops contained in the PERFECT Benchmarks []. In the analysis phase it was found that one of the loops was fully parallel, and that the other two could be transformed into doalls by privatizing the shared array under test. Figures 7 through 9 show the speedup measured for each loop as a function of the number of processors used. As a reference, we give the ideal speedup, which was measured using an optimally parallelized (by hand) version of the loop. These graphs show that the speedup scales with the number of processors and is a significant percentage of the ideal speedup.
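The element classes checked in the analysis phase (independent, read only, privatizable) can be sketched as follows. This is an illustrative sequential version under stated assumptions, not the parallel inspector of the paper: the name `classify_elements`, the access encoding, and the use of the standard privatization criterion (within every iteration that touches the element, it is written before it is read) are all assumptions, and the sketch ignores whether a privatized value is needed after the loop.

```python
def classify_elements(iterations):
    """Classify each array element from a per-iteration access trace.

    iterations[i] is the program-order list of (op, element) pairs of
    iteration i, where op is "r" for a read and "w" for a write.
    """
    touched_by = {}   # element -> set of iterations that access it
    written = set()   # elements written anywhere in the loop
    exposed = set()   # elements read before being written within some iteration

    for i, accesses in enumerate(iterations):
        defined_here = set()  # elements already written by this iteration
        for op, elem in accesses:
            touched_by.setdefault(elem, set()).add(i)
            if op == "w":
                written.add(elem)
                defined_here.add(elem)
            elif elem not in defined_here:
                exposed.add(elem)

    classes = {}
    for elem, iters in touched_by.items():
        if len(iters) == 1:
            classes[elem] = "independent"   # no cross-iteration sharing at all
        elif elem not in written:
            classes[elem] = "read-only"
        elif elem not in exposed:
            classes[elem] = "privatizable"  # each iteration defines it before use
        else:
            classes[elem] = "dependent"     # may carry a cross-iteration dependence
    return classes


# Hypothetical trace: "t" is a per-iteration temporary, "s" a running sum.
trace = [
    [("w", "t"), ("r", "t"), ("r", "c"), ("w", "a0")],
    [("w", "t"), ("r", "t"), ("r", "c"), ("r", "s"), ("w", "s")],
    [("r", "s"), ("w", "s"), ("r", "c")],
]
print(classify_elements(trace))
```

If no element ends up in the "dependent" class, the loop can run as a doall once the privatizable elements are given per-iteration copies, which is the situation reported for the benchmark loops above.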
We note that these loops could also be identified by the LRPD test [5, 6], a run time test for identifying fully parallel loops, i.e., loops that can be transformed into doalls using privatization and reduction parallelization. Although the LRPD test has a smaller overhead than the methods presented here, it cannot extract partial parallelism. In BDNA ACTFOR Loop 0, the shared array under test is accessed through a subscript array computed inside the loop which is found to be privatizable in the analysis phase (Figure 7). In MDG INTERF Loop 000, it is also found that the shared array under test is privatizable in the analysis phase (Figure 8). In OCEAN FTRVMT Loop 09, all accesses to the shared array are found to

be unique in the analysis phase. Since this loop is invoked 6,000 times, and accounts for 0% of the sequential execution time of the program, it is an excellent candidate for schedule reuse [0]. The access pattern for each instantiation of the loop is determined by a set of five scalars. In order to apply schedule reuse, we checked whether the current set of scalars matched a previously analyzed set. If not, then we applied the parallelization techniques, and if they did match then we simply executed the loop as a doall. As can be seen in Figure 9, with schedule reuse we obtain scalable speedups that are comparable to the ideal speedup.

6 Conclusion

Parallelizing statically intractable loops at run time is an important task since automatic, compile time parallelization had stopped with regular, well behaved, statically defined programs which represent only a fraction of all applications. We believe that aggressive, dynamic techniques such as those described here can break this barrier and extract much of the available parallelism from even the most complex programs. The scalability of our methods ensures that their run time overhead can be reduced to an insignificant fraction of the program's sequential execution time, which implies that their significance will only increase with the advent of massively parallel processors (MPPs). Although these new methods illustrate the potential benefits of run time parallelization, there is still much work left to be done. For example, there are many potential scheduling strategies that need to be studied. Another important task is to devise effective, automatable strategies for determining when and how to use run time parallelization. Since speedups obtainable from run time parallelization are upper bounded by the inherent parallelism of the loop, the compiler needs to estimate obtainable parallelism. Such estimates can be produced only through collection and interpretation of valid statistics from programs in different application domains.
The new methods provide a useful tool for such studies since they determine the dependence graph and parallelism profile of the loop. It should be noted that run time overhead could be significantly reduced through architectural support. We view the methods described in this paper as a building block in an evolving framework of run time parallelization as a complement to the existing techniques [5, 6, 7].

Acknowledgment. We would like to thank Paul Petersen for his useful advice, and William Blume and Brett Marsolf for identifying and clarifying applications for our experiments. We are also grateful to Richard Cole for suggestions regarding sorting algorithms.

References

[] Alliant Computer Systems Corporation. FX/Series Architecture Manual, 986.
[] Alliant Computer Systems Corporation. Alliant FX/800 Series System Description, 99.
[] U. Banerjee. Dependence Analysis for Supercomputing. Kluwer, Boston, MA, 988.
[] M. Berry and others. The PERFECT club benchmarks: Effective performance evaluation of supercomputers. TR. 87, Ctr. for Supercomputing R.&D., Univ. of Illinois, Urbana, IL, May 989.
[5] H. Berryman and J. Saltz. A manual for PARTI runtime primitives. Interim Report 90-, ICASE, 990.
[6] W. Blume and R. Eigenmann. Performance analysis of parallelizing compilers on the Perfect Benchmarks TM Programs. IEEE Trans. on Parallel and Distributed Systems, (6):6 656, Nov. 99.
[7] M. Burke, R. Cytron, J. Ferrante, and W. Hsieh. Automatic generation of nested, fork-join parallelism. J. of Supercomputing, pp. 7 88, 989.
[8] W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm. ACM, 7():, April 99.
[9] D. K. Chen, P. C. Yew, and J. Torrellas. An efficient algorithm for the run-time parallelization of doacross loops. In Proc. of Supercomputing 99, pp , Nov. 99.
[0] I. S. Duff. Ma8 a set of Fortran subroutines for sparse unsymmetric linear equations. Tech. Rept. AERE R870, HMSO, London, 977.
[] R. Eigenmann, J. Hoeflinger, Z. Li, and D. Padua. Experience in the automatic parallelization of four Perfect-Benchmark programs. In Lecture Notes in Comp. Science 589. Proc. of the th Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, pp. 65 8, Aug. 99.
[] M. Guzzi, D. Padua, J. Hoeflinger, and D. Lawrie. Cedar Fortran and other vector and parallel Fortran dialects. J. Supercomput., ():7 6, March 990.
[] V. Krothapalli and P. Sadayappan. An approach to synchronization of parallel computing. In Proc. of the 988 Int. Conf. on Supercomputing, pp , June 988.
[] C. Kruskal. Efficient parallel algorithms for graph problems. In Proc. of the 986 Int. Conf. on Parallel Processing, pp , Aug
[5] D. J. Kuck, R. H. Kuhn, B. Leasure, D. A. Padua, and M. Wolfe. Dependence graphs and compiler optimizations. In Proc. of 8th ACM Symp. Princip. Prog. Lang., pp. 07 8, Jan. 98.
[6] F. Thomson Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, 99.
[7] S. Leung and J. Zahorjan. Improving the performance of runtime parallelization. In th PPOPP, pp. 8 9, May 99.
[8] Zhiyuan Li. Array privatization for parallel execution of loops. In Proc. of the 9th Int. Symp. on Comput. Arch., pp., 99.
[9] D. E. Maydan, S. P. Amarasinghe, and M. S. Lam. Data dependence and data-flow analysis of arrays. In Proc. 5th Workshop on Programming Languages and Compilers for Parallel Computing, Aug. 99.
[0] S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-6():85 95, 987.
[] J. Moreira and C. Polychronopoulos. Autoscheduling in a distributed shared-memory environment. TR. 7, Ctr. for Supercomputing R.&D., Univ. of Illinois, Urbana, June 99.
[] D. Padua and M. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM, 9:8 0, Dec
[] C. Polychronopoulos. Compiler optimizations for enhancing parallelism and their impact on architecture design. IEEE Trans. Comput., C-7(8):99 00, Aug
[] L. Rauchwerger, N. Amato, and D. Padua. Run-time methods for parallelizing partially parallel loops. TR. 00, Ctr. for Supercomputing R.&D., Univ. of Illinois, Urbana, IL, May 989.
[5] L. Rauchwerger and D. Padua. The privatizing doall test: A run-time technique for doall loop identification and array privatization. In Proc. of the 99 Int. Conf. on Supercomputing, pp., July 99.
[6] L. Rauchwerger and D. Padua. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. In ACM SIGPLAN Conf. on Programming Language Design and Implementation, June, 995.
[7] L. Rauchwerger and D. Padua. Parallelizing while loops for multiprocessor systems. In 9th Int. Parallel Process. Symp., April, 995.

[8] J. Saltz and R. Mirchandaney. The preprocessed doacross loop. In D. H.D. Schwetman, editor, Proc. of the 99 Int. Conf. on Parallel Processing, pp CRC Press, Inc., 99. Vol. II - Software.
[9] J. Saltz, R. Mirchandaney, and K. Crowley. The doconsider loop. In Proc. of the 989 Int. Conf. on Supercomputing, pp. 9 0, June 989.
[0] J. Saltz, R. Mirchandaney, and K. Crowley. Run-time parallelization and scheduling of loops. IEEE Trans. Comput., 0(5), May 99.
[] P. Tu and D. Padua. Automatic array privatization. In Proc. 6th Annual Workshop on Languages and Compilers for Parallel Computing, Portland, OR, Aug. 99.
[] M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 989.
[] J. Wu, J. Saltz, S. Hiranandani, and H. Berryman. Runtime compilation methods for multicomputers. In D. H.D. Schwetman, editor, Proc. of the 99 Int. Conf. on Parallel Processing, pp CRC Press, Inc., 99. Vol. II - Software.
[] C. Zhu and P. C. Yew. A scheme to enforce data dependence on large multiprocessor systems. IEEE Trans. Softw. Eng., (6):76 79, 987.
[5] H. Zima. Supercompilers for Parallel and Vector Computers. ACM Press, New York, NY, 99.


More information

A Full-mode FME VLSI Architecture Based on 8x8/4x4 Adaptive Hadamard Transform For QFHD H.264/AVC Encoder

A Full-mode FME VLSI Architecture Based on 8x8/4x4 Adaptive Hadamard Transform For QFHD H.264/AVC Encoder 20 IEEE/IFIP 9th Intenational Confeence on VLSI and System-on-Chip A Full-mode FME VLSI Achitectue Based on 8x8/ Adaptive Hadamad Tansfom Fo QFHD H264/AVC Encode Jialiang Liu, Xinhua Chen College of Infomation

More information

DYNAMIC STORAGE ALLOCATION. Hanan Samet

DYNAMIC STORAGE ALLOCATION. Hanan Samet ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu

More information

High performance CUDA based CNN image processor

High performance CUDA based CNN image processor High pefomance UDA based NN image pocesso GEORGE VALENTIN STOIA, RADU DOGARU, ELENA RISTINA STOIA Depatment of Applied Electonics and Infomation Engineeing Univesity Politehnica of Buchaest -3, Iuliu Maniu

More information

HISTOGRAMS are an important statistic reflecting the

HISTOGRAMS are an important statistic reflecting the JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 D 2 HistoSketch: Disciminative and Dynamic Similaity-Peseving Sketching of Steaming Histogams Dingqi Yang, Bin Li, Laua Rettig, and Philippe

More information

Conversion Functions for Symmetric Key Ciphers

Conversion Functions for Symmetric Key Ciphers Jounal of Infomation Assuance and Secuity 2 (2006) 41 50 Convesion Functions fo Symmetic Key Ciphes Deba L. Cook and Angelos D. Keomytis Depatment of Compute Science Columbia Univesity, mail code 0401

More information
