Control CPR: A Branch Height Reduction Optimization for EPIC Architectures

Michael Schlansker, Scott Mahlke, Richard Johnson
HP Laboratories Palo Alto
HPL, February
[shlansk,mahlke]@hpl.hp.com, rjohnson@transmeta.com
Keywords: ILP, critical path reduction, compilers
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Control CPR can reduce the dependence height of critical paths through branch operations as well as decrease the number of executed branches. In this paper, we present an approach to control CPR that recognizes sequences of branches using profiling statistics. The control CPR transformation is applied to the predominant path through this sequence. Our approach, its implementation, and experimental results are presented. This work demonstrates that control CPR enhances instruction-level parallelism for a variety of application programs and improves their performance across a range of processors.
Copyright Hewlett-Packard Company 1999

Michael Schlansker and Scott Mahlke, Hewlett-Packard Laboratories, Palo Alto, CA
Richard Johnson, Transmeta Corporation, Santa Clara, CA
1 Introduction
Increases in microprocessor performance are driven by both increased clock speed and the use of hardware parallelism to exploit instruction-level parallelism. Explicitly Parallel Instruction Computing (EPIC) architectures, as exemplified by Intel's recently announced IA-64, represent an emerging class of processors which support higher levels of instruction-level parallelism by enabling additional parallelization to be performed at compile time. EPIC architectures use three major features to facilitate compile-time parallelization: explicit parallel issue, speculation, and predication. To fully exploit EPIC processors, compilers must also transform and schedule application programs to make more parallelism available at run time.
There is significant concern regarding the amount of instruction-level parallelism available in important applications. Applications with insufficient parallelism fail to exploit hardware parallelism and suffer substantial performance penalties on EPIC processors.
Limits on application parallelism come in two basic forms: data dependences and branch dependences. Either type of dependence limits performance by requiring the sequential execution of dependent operations. Rather than accept dependences as hard limits to achieved performance, compiler researchers need to develop techniques to alleviate their limiting effects. While traditional optimizations focus only on minimizing operation count, critical path reduction (CPR) is a family of techniques for transforming programs to reduce branch and data dependence height in order to enhance parallelism. CPR techniques generally face a tradeoff between the need to reduce path length and the cost of introducing redundant operations. In this work, we focus on control critical path reduction (control CPR) to reduce branch dependence height. We define a new control CPR technique referred to as the irredundant consecutive branch method (ICBM). ICBM reduces height without executing redundant operations. In fact, ICBM can greatly reduce the number of executed branches, thus improving performance on processors with exposed branch latency or inadequate branch throughput. The approach is suitable for processors with substantial hardware parallelism as well as for processors with minimal hardware parallelism. While control CPR is broadly applicable to both EPIC and superscalar architectures, our implementation focuses on EPIC processors.
This paper makes several important contributions. First, our implementation of an approach for control CPR, ICBM, is presented. While previous papers discussed a conceptual framework for control CPR, this paper describes a working implementation of ICBM within our experimental compiler. We have extended previous control CPR techniques to treat input programs containing arbitrary uses of predicated code, including both conventional if-converted code and other more complex uses of predicates. ICBM also generalizes previous control CPR techniques to provide more efficient treatment for predicted taken branches. This paper presents effective heuristics that use profile data to control the application of control CPR.
Finally, the paper presents experimental results demonstrating the effectiveness of ICBM for enhancing performance in a variety of applications. For instance, a geometric mean speedup of 18% across all the benchmarks is observed for an EPIC processor with modest hardware resources.
2 Background
Researchers have developed compiler techniques to alleviate the performance limiting effects of data dependences. Tree height reduction has been used to parallelize arithmetic computations [Ku78]. Optimization techniques such as renaming, reassociation, and expression simplification have been used to reduce data dependence height. Height reduction has also been applied to loop data recurrences [DT93] [SK93].
There has also been prior work in reducing the performance limiting effects of branch dependences. Speculative execution reduces height by moving operations above a previous guarding branch. Speculative execution has been used to accelerate program regions such as traces [LFK+93], superblocks [H+93], software pipelined loops [TLS90], and global regions [ME92]. Predicated execution uses an additional boolean operand as a guard which conditionally nullifies each operation. Operations execute to completion when their predicate is true and are nullified when their predicate is false. If-conversion using predicated execution has been used to eliminate branches associated with if-then-else constructs [AKPW83] [DT93] [MLC+92].
Compiler optimization techniques have been developed to reduce the height of critical paths threading through branches and the number of executed branches in application programs. Loop unrolling has been used to reduce the number of executed branches in counted do-loops [LFK+93]. The height and number of executed branches have also been reduced for while loops [SK95]. Optimizations for very specific code patterns have been used to eliminate conditional branches [GK92]. The number of executed conditional branches can also be reduced using code duplication [MW92] [MW95] [BGS95]. Finally, the number of executed branches can be reduced by reordering multi-way switch statements so that commonly executed cases appear first [YUW98].
Scalar control CPR provides the potential for a compiler to treat a broader variety of scalar program branches more systematically [SK95]. Prior work outlines an approach for scalar control CPR which identifies new avenues for enhancing program parallelism. This paper is based on a working implementation of control CPR, which provides a more detailed understanding of the issues surrounding the design and implementation of control CPR technology within a working compiler.
In order to accomplish control CPR without redundant code, our approach uses profile information to expedite common program paths at the expense of rare program paths. Prior work has shown that profiles are relatively consistent across multiple data sets [FF92].
3 PlayDoh: An example EPIC architecture
PlayDoh, an EPIC architecture intended to support publicly available research, supports our experiments [KSR93]. Two central features of PlayDoh are important in our discussion and implementation of control CPR: the parallel execution model and predicated execution.
Parallel execution. PlayDoh exposes hardware parallelism directly to the compiler. Parallelism is exposed in two forms: simultaneous wide issue and visible latency. While exposed parallelism is commonly seen in specialized signal and media processors, most general-purpose processors use a strictly sequential computational model. Explicit parallelism introduces a number of architectural issues surrounding branches. PlayDoh assumes that branch operations have one or more cycles of exposed latency. This allows the construction of simpler branch units that have no predictive hardware and do not stall on a mispredict. On the other hand, this places a substantial burden on the compiler to efficiently use the delay slots of exposed-latency branches. The execution of a taken branch does not nullify arithmetic operations within its delay slots, and PlayDoh assumes that branches are treated in a like fashion. Hardware priority has previously been used to nullify lower priority branches when branches are executed concurrently. This concept can be extended to nullify lower priority branches (and only branches) within the exposed delay slots of a taken branch. This extension is considered awkward and difficult to implement in hardware. Instead, PlayDoh assumes that all branches are naturally pipelined and take effect at their visible latency. When multiple branches take simultaneously, execution semantics are indeterminate. To simplify the treatment of overlapped branches, our compiler ensures that no branch takes when it is located within a delay slot of another taken branch. Branches can be statically overlapped only when the compiler guarantees that their guarding predicates are not simultaneously true.
Predicated execution. The PlayDoh predicate architecture is based on that of the Cydra 5 [DT93]. Predicated execution uses boolean predicates to represent information about flow of control within the program. For each execution of the program, a predicate's value is true when control flow reaches a specific point in the program's control flow graph, and false when control flow does not reach that point. PlayDoh has a number of enhancements over the Cydra 5 which allow a more general and efficient computation of predicates. These enhancements include compares which compute a pair of predicates in a single operation, as well as compares which streamline the evaluation of the multi-input logical operations needed for control CPR. A predicate computing compare operation has the form:
    p, q = cmpp.<x>.<y> cond(a,b) if r
The compare operation is interpreted as follows: p and q are destination predicate registers; cmpp is the generic compare opcode; <x> and <y> are two-letter action specifiers, one for each compare destination; cond(a,b) is the comparison itself; and r is a source (guarding) predicate register. PlayDoh action specifiers allowed for each result include the following: unconditionally set (UN or UC), wired-or (ON or OC), and wired-and (AN or AC). The first character (U, O or A) indicates the action type (unconditional, or, and), while the second character (N or C) indicates the action mode (normal or complemented). When an action executes in complemented mode, the compare condition is complemented before the action is performed on the target predicate. Table 1 shows the execution behavior of these compare operations in normal and complemented modes. Each entry describes the result on the destination predicate; note that the destination may be assigned a value or may be left untouched (denoted as -).
Table 1. Behavior of compare operations
Input predicate | Compare result | UN | UC | ON | OC | AN | AC
0 | 0 | 0 | 0 | - | - | - | -
0 | 1 | 0 | 0 | - | - | - | -
1 | 0 | 0 | 1 | - | 1 | 0 | -
1 | 1 | 1 | 0 | 1 | - | - | 0
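To make the action semantics of Table 1 concrete, the following Python sketch models the six destination actions and then uses AC and ON actions to accumulate an on-trace and an off-trace predicate from several conditions. The function names (`action_effect`, `apply_action`, `lookahead_frps`) are hypothetical, and initializing the on-trace register to a copy of the root predicate is an assumption made for the example; this is an illustrative model, not the compiler's implementation. The final loop checks that every issue order of the wired compares leaves the same result.

```python
from itertools import permutations, product
from typing import List, Optional, Tuple

def action_effect(action: str, guard: bool, cond: bool) -> Optional[bool]:
    """Value a compare action writes to its destination, or None if untouched.

    `action` is a two-letter specifier: U/O/A selects the action type
    (unconditional, wired-or, wired-and); N/C selects normal or complemented mode.
    """
    kind, mode = action[0], action[1]
    c = cond if mode == "N" else not cond    # complemented mode inverts the condition first
    if kind == "U":
        return guard and c                   # always writes guard AND condition
    if kind == "O":
        return True if (guard and c) else None       # may only set the predicate
    if kind == "A":
        return False if (guard and not c) else None  # may only clear the predicate
    raise ValueError(f"unknown action specifier {action!r}")

def apply_action(action: str, guard: bool, cond: bool, old: bool) -> bool:
    """Destination value after the action, given its previous value `old`."""
    new = action_effect(action, guard, cond)
    return old if new is None else new

def lookahead_frps(root: bool, conds: List[bool], order: List[int]) -> Tuple[bool, bool]:
    """Accumulate on-trace (AC) and off-trace (ON) predicates in the given order.

    Assumption: the on-trace register starts as a copy of `root`, the
    off-trace register starts cleared, and every compare is guarded by `root`.
    """
    on_trace, off_trace = root, False
    for i in order:
        on_trace = apply_action("AC", root, conds[i], on_trace)
        off_trace = apply_action("ON", root, conds[i], off_trace)
    return on_trace, off_trace

# Every issue order of the wired compares yields the same pair of predicates.
for root in (False, True):
    for conds in product([False, True], repeat=3):
        results = {lookahead_frps(root, list(conds), list(p)) for p in permutations(range(3))}
        assert results == {(root and not any(conds), root and any(conds))}
```

Because a wired-or action can only write true and a wired-and action can only write false, no two writes to the same register ever disagree, which is why the order of accumulation does not matter.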

From the table, we see that an unconditional compare operation always writes a value into its destination register. The value written is the logical and of the guarding predicate and the comparison result (or its complement). The unconditional compare is commonly used to compute predicates for the taken and not-taken successor blocks after a program branch. The wired-or operation conditionally sets its destination predicate to true if both its guarding predicate and its comparison result are true. This form can be used to efficiently compute disjunctions by accumulating terms into a single predicate register that was initially cleared. Since all operations that write the same register conditionally write the same value (true), they can execute in any order. After all conditional writes have completed, the correct final value is left in the target of the wired-or operation. Wired-or writes to a common location are not treated as output dependences and are considered unordered by the scheduler. Further, in an EPIC architecture like PlayDoh, simultaneous wired-or writes to a common register are well-defined and readily implemented in hardware. Similarly, the wired-and compare operation writes the value false if its guarding predicate is true and its comparison result is false. The conjunction of multiple compare conditions is computed by first setting a predicate register to true, then accumulating the and-ed terms into this predicate register, possibly in parallel. Wired-and and wired-or compares are used extensively to accumulate the terms of conjunctions and disjunctions in the control CPR transformation discussed in this paper.
4 Approach to Control CPR
This section presents an overview of control CPR and defines a specific approach called the Irredundant Consecutive Branch Method (ICBM). Some approaches to control CPR are redundant, like full CPR [SK95], which aggressively accelerates all paths within a region at the cost of a quadratic growth in the number of compares.
The use of profile data allows us to expedite some program paths at the expense of others; ICBM limits code growth by accelerating only a single, statically predicted program path. While the static number of operations typically increases, the dynamic number of executed operations does not. Thus, ICBM is attractive for processors with limited parallelism. Approaches that accelerate multiple paths can further improve performance for highly parallel processors or where static branch prediction is difficult.
4.1 Basic approach
Control CPR is introduced by considering a single-entry, linear sequence of operations containing one or more branches, referred to as a superblock. A superblock consisting of three branches is given in Figure 1(a). Each branch has a condition computation, ai < bi in the figure, that determines whether it is taken. We assume that the preferred execution path in the superblock (the on-trace path) is traversed by falling through each successive branch. An off-trace path is traversed when any of the exit branches takes. Non-speculative operations are guarded by scheduling them below (and outside the delay slots of) previous branches. Store operations are used in Figure 1(a) to represent generic non-speculative operations. Branches are sequentially ordered; a chain of branch dependences exists between the branches and exposes all branch latencies.
FRP conversion. With traditional control flow, operations are guarded by confining them to basic blocks guarded by branches. Predicates can be used to guard operations without confining them to sequentially executed basic blocks. In prior work, if-conversion was primarily used to eliminate branches within single-entry single-exit program regions [DT93] [MLC+92]. Dependent chains of branches (e.g., those found in superblocks) were not treated and remained dependent during program scheduling. With predicates, dependent chains of branches found in superblocks may be transformed using a variant of traditional if-conversion. The program is first partitioned into single-entry acyclic regions, and a predicate is associated with each basic block. The predicate is used as a guard for operations within the block.
The computation of these predicates and their use as guards eliminates branch dependences between non-speculative operations (including branches) and the previous branches upon which they depend. These predicates are referred to as fully-resolved predicates (FRPs). FRPs for both basic blocks and branches within a single-entry acyclic region can be defined recursively as follows. The FRP for the entry block is true. The FRP for any other block is computed by or-ing one term for each control-flow edge entering the block. The term for each edge is computed by conjoining the FRP of the block from which the edge originates with the condition that causes control to traverse that edge into the selected block. The FRP for a conditional branch is the conjunction of the FRP for the block in which the branch resides and the condition that causes the branch to take. Control dependence analysis and boolean expression manipulation are used to optimize FRP expressions. Our compiler transforms regions by inserting code to compute FRPs for each basic block and for each conditional branch. Operations are guarded by referencing these FRPs as predicate operands. This process is called FRP conversion. Chains of branch dependences are converted into chains of data dependences through the operations that evaluate predicates.
The FRP-converted superblock corresponding to Figure 1(a) is shown in Figure 1(b). Each rectangle provides the functionality of a single PlayDoh compare with a UC left-hand output and a UN right-hand output (see Table 1). Each compare computes both an FRP for a basic block and an FRP for a branch. The UC output computes a new block FRP by conjoining the previous block FRP with the complement of the branch condition. Analogously, the UN output computes the branch FRP using the branch condition itself. Block FRPs guard non-speculative operations, which are no longer dependent upon prior branches. When any branch FRP in the sequence is true, all other branch FRPs are false. As a result, the branches are mutually exclusive; they may be reordered during scheduling and may execute in parallel.
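The recursive FRP definition, specialized to a superblock like the one in Figure 1(a), can be sketched in Python. Each loop iteration stands in for one compare with a UC output (the next block FRP) and a UN output (the branch FRP), both guarded by the current block FRP. The function names and the encoding of branch conditions as a list of booleans are assumptions for illustration; the check at the end compares the FRPs against a plain sequential reading of the superblock.

```python
from itertools import product
from typing import List, Tuple

def frp_convert(conds: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Evaluate block and branch FRPs for a superblock.

    conds[i] is true when the i-th branch would take. Returns
    (block_frps, branch_frps); block_frps[0] is the entry block and
    block_frps[-1] is the fall-through successor (E4 in Figure 1).
    """
    block_frps = [True]            # the entry block's FRP is true
    branch_frps = []
    for c in conds:
        guard = block_frps[-1]     # compare guarded by the current block FRP
        branch_frps.append(guard and c)       # UN output: this exit branch takes
        block_frps.append(guard and not c)    # UC output: fall through to the next block
    return block_frps, branch_frps

def first_taken(conds: List[bool]) -> int:
    """Sequential superblock semantics: index of the exit taken,
    or len(conds) if control falls through every branch."""
    for i, c in enumerate(conds):
        if c:
            return i
    return len(conds)

# Exactly one outcome FRP is true, and it agrees with sequential execution.
for conds in product([False, True], repeat=3):
    blocks, branches = frp_convert(list(conds))
    outcomes = branches + [blocks[-1]]
    assert sum(outcomes) == 1
    assert outcomes.index(True) == first_taken(list(conds))
```

Since at most one branch FRP can be true, the guarded branches are mutually exclusive, so a scheduler may reorder them or issue them together.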
In Figure 1(b), the fall-through successor E4 is reached after all three FRP-converted branches execute and fall through. The conventional superblock in Figure 1(a) is limited by branch dependences, while the FRP-converted superblock in Figure 1(b) is limited by the data dependence height through a sequence of compares. Boolean expression optimization can be used to height-reduce the data dependences through the sequences of conjunctions needed to compute FRPs for dependent branches.
Figure 1. FRP conversion. (a) Original superblock, sequential branches. (b) FRP-converted superblock, independent branches.
Transformation. Figure 2 illustrates a transformation that both height-reduces dependences through sequences of branches and height-reduces the dependences through expressions that compute the requisite FRPs. Note that in this figure the branch conditions are simply expressed as ci. We begin again with the traditional superblock from Figure 1(a), which is contained inside the rectangle of Figure 2(a). We assume that the program usually falls through all three branches. The original code of Figure 2(a) is augmented with a new branch operation, referred to as a bypass branch. The bypass branch behaves as a composite branch that takes when any of the original branches takes and falls through when all of the original branches fall through. Compare operations are added to compute the off-trace (bypass branch) FRP and an on-trace FRP corresponding to falling through all of the original branches. These compares are shown as multi-input logical gates to indicate that the FRP expression can be freely re-associated to accelerate evaluation. Note that the bypass branch in Figure 2(a) is redundant, since it never takes: whenever one of its conditions is true, control has already left the superblock through the corresponding original branch.
The next step of the transformation is shown in Figure 2(b). Each of the original branches, together with any non-speculative operations dependent upon these branches, is moved down across the bypass branch. Normally, code moved below a branch is replicated along both paths. However, the bypass branch condition guarantees that if the bypass branch falls through, then none of the branches moved on trace below it will take. After these branches are eliminated, the bypass branch is the only branch remaining on trace. Non-speculative operations (e.g. the stores in the figure) that were originally trapped between branches can now be scheduled in parallel. After the transformation, only a single branch remains on trace, and both the on-trace and off-trace FRPs are computed in a height-reduced (freely re-associated) manner. Thus, control CPR reduces both the branch dependence height and the data dependence height needed to evaluate branch conditions.
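The height benefit of the bypass branch can be illustrated with a small Python sketch. It compares the sequential FRP chain of Figure 1(b), whose depth grows by one compare per branch, with a balanced re-association of the conditions ci using two-input operations, and checks that both give the same on-trace and off-trace FRPs. Measuring depth in two-input operations and folding the root predicate in with one final operation are simplifying assumptions, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Callable, List, Tuple

def chain_frps(root: bool, conds: List[bool]) -> Tuple[bool, List[bool], int]:
    """Sequential FRP conversion: (fall-through FRP, branch FRPs, depth).
    Each compare depends on the previous block FRP, so depth == len(conds)."""
    block, branches = root, []
    for c in conds:
        branches.append(block and c)
        block = block and not c
    return block, branches, len(conds)

def tree_reduce(op: Callable[[bool, bool], bool], xs: List[bool]) -> Tuple[bool, int]:
    """Balanced pairwise reduction of a non-empty list; returns (value, depth)."""
    level, depth = list(xs), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])          # an odd leftover passes to the next level
        level, depth = nxt, depth + 1
    return level[0], depth

def bypass_frps(root: bool, conds: List[bool]) -> Tuple[bool, bool, int]:
    """Height-reduced (on-trace FRP, off-trace FRP, depth) for a bypass branch."""
    any_taken, depth = tree_reduce(lambda a, b: a or b, conds)
    # One more operation combines the disjunction with the root predicate.
    return root and not any_taken, root and any_taken, depth + 1
```

For eight branches, the chain needs eight dependent compares, while `bypass_frps(True, [False] * 8)` reports a depth of 4 (three levels for the disjunction plus one for the root). With PlayDoh wired-and and wired-or compares, the scheduler can obtain a similar re-association without building an explicit tree.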
There are a number of ways to implement the height-reduced computation of FRPs. On conventional processors, it can be implemented using two-input logical operations; height reduction then uses the associative property to carefully reorganize the tree of two-input operations. The approach presented here height-reduces FRP computation using PlayDoh-style wired-and and wired-or compares. These compares are unordered, so the code motion of a static scheduler naturally re-associates FRP evaluation by accumulating input terms as they become available. Further, on wide machines, more than two terms can be combined in each machine cycle.
It is important to note that the transformation of Figure 2 can be applied either to the original superblock of Figure 1(a) or to the FRP-converted superblock of Figure 1(b). This leads to a general approach for control CPR of predicated code, which correctly accommodates input code of arbitrary complexity and achieves height-reduction benefits in most cases of interest, including conventional and FRP-converted superblocks with embedded if-conversion. This is important because predicated execution is often introduced prior to control CPR (e.g. when if-converted intrinsics are inlined).
Blocking. When control CPR is uniformly applied to an entire superblock, a number of difficulties arise. First, in the context of a heuristic that requires irredundant on-trace code, it may be illegal to apply control CPR to an entire superblock. To ensure irredundant on-trace code, compares associated with exit branches that are moved off trace also move off trace. Irredundant code is defined in more detail, in the context of PlayDoh operations, in Section 4.2. This motion requires that predicates computed by these compares are not used on trace. When redundant compares would be required on trace, the code is referred to as inseparable, and our control CPR transformation is not applied. For example, assume that the compare condition c1 in Figure 2(a) depends upon a load operation that in turn depends upon store 1. Then the code after transformation (Figure 2(b)) is illegal: the load is used to compute c1 and must therefore execute before store 1, yet it also depends upon (must execute after) the same store.
Even when correctness is not an issue, superior results are often achieved when control CPR is applied to smaller subregions. The application of control CPR can delay the execution of non-speculative operations, including superblock exit branches. Exit branches are pushed below the bypass branch and out of the superblock. When an exit is taken, control CPR can introduce a performance penalty. Even when execution remains on trace, all on-trace branches and non-speculative operations are guarded by an on-trace FRP that depends upon all compare conditions. This can delay the execution of non-speculative operations and their dependent operations, and can compromise performance, especially for long superblocks.
Blocking long superblocks into smaller subregions alleviates these problems. A CPR block is a linear sequence of basic blocks from the original superblock over which control CPR is applied. Figure 3(a) illustrates a superblock prior to the application of control CPR; dashed lines show its blocking into three CPR blocks. Figure 3(b) shows the code after control CPR. Each nontrivial CPR block has been transformed into code with a single on-trace bypass branch and a compensation block, while the middle (unit-length) CPR block remains unchanged. The final code is scheduled as three distinct hyperblocks: one hyperblock for the entire on-trace region and a separate hyperblock for each compensation block. This allows scheduling overlap between adjacent on-trace CPR blocks. Note that when each CPR block is traversed on trace, both on-trace and off-trace paths below this point are accelerated. Thus, the successful traversal of the first CPR block (B1) in Figure 3(b) accelerates paths to the on-trace exit E7 as well as to the off-trace exit E5. Consequently, when CPR blocking is applied, the performance of the resulting code is more tolerant of unbiased branches than if control CPR were uniformly applied to the entire superblock.
Figure 2. Control CPR schema. (a) Insertion of bypass branch. (b) Final height-reduced code.
Figure 3. Partitioning into multiple CPR blocks.
4.2 ICBM approach
The principal goals of ICBM (Irredundant Consecutive Branch Method) are to reduce critical-path length and to transform code without increasing the average number of executed operations.
This is accomplished in part by moving branches off trace. ICBM operates on linear single-entry, multi-exit regions of code that contain predicated operations, such as superblocks and hyperblocks [H+93]. For the remainder of our discussion, we use the term hyperblock to refer to a candidate ICBM input region.
Figure 4 shows the ICBM transformation for an input comprising a single CPR block. The initial code is shown to the left of the arrow and the transformed code to the right. The symbols in the figure are as follows: squares represent two-target compare operations, diamonds represent branches, and circles represent all other operations. The solid edges represent the control and data dependences that are necessary to explain the transformation, while the dotted edges on the right-hand side of each shaded rectangle represent the remaining dependences, which are peripheral to the transformation. The example initial code represents a CPR block region within an FRP-converted superblock.
Figure 4. Overview of the ICBM schema. (a) Original code. (b) Transformed on-trace and off-trace code.
The transformed code shows the new operations inserted by ICBM and the code motion performed during the transformation. Now consider a basic block i (other than the entry block) in the original code. The block contains a compare (with condition Ci), a branch (bi+1), and sets of operations Oi and Pi. These sets are used to precisely describe the code motion performed during ICBM. The operations that are neither compares nor branches are partitioned into two sets: Oi contains operations that are independent of FRPs, and Pi contains operations that directly or indirectly depend upon FRPs. The first basic block requires special treatment, because none of its operations are forced off trace by the motion of compares; all operations from the first block are contained in set A0.
The root predicate holds the entry condition for a CPR block. Figure 4 represents a single CPR block taken from a sequence as in Figure 3. Each CPR block in the sequence is responsible for computing an on-trace FRP in terms of its root predicate. This on-trace FRP becomes the root predicate for the next CPR block. The root predicate for the first CPR block in a region is true.
The transformed code of Figure 4(b) is divided into on-trace code on the left and off-trace code on the right. On-trace code includes all operations from the first basic block (A0) together with the operations from subsequent basic blocks that are independent of FRPs (O1, ..., On-1). The on-trace path also includes lookahead compares: operations used to compute the on-trace and off-trace FRPs. The lookahead compares implement the multi-input logical gates of Figure 2 using PlayDoh wired-and and wired-or compares. Each compare computes two results using the semantics described in Section 3.
An AC term is the complement of a branch condition and-ed with the CPR block root predicate; all AC terms are combined by wired-and to form the on-trace FRP. An ON term is a branch condition and-ed with the CPR block root predicate; all ON terms are combined by wired-or to form the off-trace FRP. The bypass branch follows the lookahead compare operations; both the lookahead compares and the bypass branch are introduced by the ICBM transformation. Finally, the added compare and branch operations are followed by the operations that depend upon FRPs in the original code, namely the operations in sets P1, ..., Pn-1. Operations in these sets are replicated both on and off trace. Considering both the on-trace and off-trace copies of the operations in Pi, each of these operations executes in the transformed program under conditions identical to its execution conditions in the original program.
Note that the on-trace code is said to be irredundant since it has fewer operations than the original code. This can be seen by first considering the sets of operations A, O and P. Every operation within these sets in the original code appears within these sets in the on-trace code. For each compare operation in the original code, a single height-reduced compare operation appears in the on-trace code. Finally, all of the branches in the original code are replaced by a single bypass branch in the on-trace code. The net effect is to conserve the operation count, except that n branches in the original code are replaced by one branch in the on-trace code. It is in this sense that the code is considered irredundant (and in fact to have a reduced operation count) for the target PlayDoh architecture.
Off-trace code consists of each of the compares in the original code and all operations that depended on those compares. Because the original compares are moved off trace to eliminate redundancy, operations that depend upon them also move off trace.
5 ICBM Implementation
The ICBM schema has been implemented within Elcor, our compiler for research in high-performance EPIC architectures.
ICBM accepts general single-entry linear regions as input; common examples are conventional superblocks, FRP-converted superblocks, and hyperblocks. The ICBM transformation consists of a sequence of four phases, each of which either analyzes or transforms code: 1) predicate speculation, 2) match, 3) restructure, and 4) off-trace motion. After ICBM, a dead code elimination pass removes any unnecessary operations, such as operations that compute predicates which are never referenced. The ICBM code modules take advantage of Elcor's family of predicate-cognizant analysis tools. Classic tools for data-flow analysis and dependence edge construction have been upgraded to analyze predicated code in a manner that is conservative (with respect to correctness) yet reasonably accurate. Without these enhancements, the benefits of predicate-based control CPR would not be realized.
5.1 Predicate speculation
ICBM begins with predicate speculation. Predicate speculation serves two purposes. First, it reduces dependence height by eliminating an operation's dependence on its predicate calculation [MLC+92]. More importantly, predicate speculation eliminates dependences that would cause ICBM's separability condition (discussed in the next subsection) to fail. Often, the predicate for a basic block guards operations required to compute the predicate for the next block. These dependences prevent compares from moving off trace during ICBM and constitute a separability failure. In FRP-converted code, separability systematically fails at almost every basic block. Predicate speculation removes most of these dependences, allowing separability to pass more frequently.

Predicate speculation operates in two bottom-up traversals of a hyperblock. An array, called liveness, contains boolean expressions representing the predicate conditions under which each register or memory location is live [JS96]. Initially, liveness is available only at hyperblock exit points; it is computed on the fly at each point (i.e. at each operation) during the backward traversal. All operations are candidates for promotion, with the exception of compare-to-predicate operations that unconditionally compute result predicates from input predicates. In the first pass, the guarding predicate (p) of each operation is conditionally promoted to a predicate (q) such that p implies q (that is, q is larger than p). While in principle predicate promotion could promote a predicate to a variety of larger predicates, the only larger predicate considered here is predicate true. For each operation, promotion occurs only if the promoted operation will not overwrite a live register or memory value.
The second pass selectively demotes predicates. Demotion is the inverse of promotion: an operation's guard is demoted to a predicate that evaluates to true less frequently. The operation may be partially demoted or fully demoted (i.e. returned to its original guard). Demotion provides two benefits: first, it may reduce dependence height, and second, it may narrow an operation's execution condition without adding dependence height. Demoting an operation's predicate produces second-order benefits such as reduced memory traffic, fewer cache misses, and improved predicate-sensitive register allocation.
Demotion is demonstrated by a simple example. Consider two dependent operations initially guarded by the same predicate. During predicate speculation, assume that the first operation's predicate cannot be promoted without violating correctness, while the second operation's predicate is promoted to true. However, since the second operation depends upon the first, this speculation does not reduce height. During the second pass, the predicate of the second operation is lowered back to its original value.
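The two-operation demotion example can be modelled with a toy dependence-height calculation in Python. An operation starts once its guarding predicate and all of its data inputs are ready, and the demotion pass restores the original guard only when doing so does not delay the operation. The `Op` record, unit latencies, predicate-ready times, and the restriction to full demotion are all assumptions of this sketch. It also walks operations in program order rather than bottom-up, which only works because each decision here is local; it is not meant as the Elcor algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Op:
    name: str
    guard: str                       # current guarding predicate ("true" = promoted)
    orig_guard: str                  # guard before predicate speculation
    srcs: List[str] = field(default_factory=list)  # ops this op data-depends on
    latency: int = 1

def start_times(ops: List[Op], pred_ready: Dict[str, int]) -> Dict[str, int]:
    """Earliest start of each op (ops listed in dependence order)."""
    start: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    for op in ops:
        t = pred_ready.get(op.guard, 0)          # wait for the guarding predicate
        for s in op.srcs:
            t = max(t, finish[s])                # and for every data input
        start[op.name], finish[op.name] = t, t + op.latency
    return start

def demote(ops: List[Op], pred_ready: Dict[str, int]) -> None:
    """Second pass (simplified): fully demote a promoted op if that does not delay it."""
    for op in ops:
        if op.guard == op.orig_guard:
            continue
        before = start_times(ops, pred_ready)[op.name]
        promoted, op.guard = op.guard, op.orig_guard
        if start_times(ops, pred_ready)[op.name] > before:
            op.guard = promoted                  # demotion would add height: keep promotion

# op2 depends on op1, which must keep predicate p (assumed ready at cycle 2).
ops = [Op("op1", "p", "p"), Op("op2", "true", "p", srcs=["op1"])]
demote(ops, {"p": 2})
print(ops[1].guard)  # p: the promotion bought no height, so it is undone
```

If op2 did not depend on op1, restoring its guard would delay it from cycle 0 to cycle 2, so the sketch keeps the promotion in that case.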
In the demotion example, where there is a data dependence between the two operations, demotion undoes ineffective promotion without changing dependence height. Where there is a branch dependence after the first pass, demotion reduces dependence height by replacing the branch dependence with a data dependence on the branch's guarding compare.

5.2 Match

Match identifies CPR blocks within a program region which is to be transformed. CPR block identification addresses two major issues: correctness and performance heuristics. Match processes hyperblocks, creating a description of a transformation to be subsequently performed by the CPR transformation module. The result of match is a list of CPR blocks, where each CPR block is a subregion of the hyperblock that is to be independently transformed.

A preliminary pass within match generates a list of the branches in the hyperblock in their sequential order. Next, reaching-definition analysis is performed on predicate variables. For every branch or compare operation, this analysis identifies the unique compare-to-predicate operation that computes the guarding predicate, if such an operation exists within the region. This allows compares that do not control branches (e.g., those used for conventional if-conversion in hyperblocks) to be ignored.

In a subsequent pass, match grows a list of CPR blocks to cover all branches in the hyperblock. The process is seeded with a CPR block containing only the first branch. Match grows CPR blocks by appending consecutive branches until an exit condition terminates the block. The process is then re-seeded with the next branch (not yet appended to any CPR block), and continues until all branches in the original hyperblock have been treated. Pseudo code for match is shown in Figure 5.

Match performs a number of tests, including the suitability, separability, exit-weight, and predict-taken tests. Each of these tests can terminate a CPR block. The suitability and separability tests guarantee that the transformation can be correctly applied, while the exit-weight and predict-taken tests are heuristics to improve performance.
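The driver loop of Figure 5 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Elcor's implementation: branches are indexed 0..n-1 in sequential order, suitability and separability are passed in as hypothetical callbacks `(block, cand) -> bool`, and the exit-weight and predict-taken heuristics described below are modeled as simple ratios over hypothetical per-branch profile counts (`reach_freq`, `taken_freq`) with assumed thresholds of 0.2 and 0.6.

```python
from typing import Callable, List, Sequence

# Hypothetical callback type: (current CPR block, candidate branch index) -> test passes?
BlockTest = Callable[[List[int], int], bool]


def icbm_match(reach_freq: Sequence[float],
               taken_freq: Sequence[float],
               suitable: BlockTest,
               separable: BlockTest,
               exit_weight_threshold: float = 0.2,     # assumed value
               predict_taken_threshold: float = 0.6,   # assumed value
               ) -> List[List[int]]:
    """Partition branches 0..n-1 (in sequential order) into CPR blocks."""
    n = len(reach_freq)
    result: List[List[int]] = []
    first = 0
    while first < n:
        block = [first]                       # seed a new CPR block
        entry = reach_freq[first] or 1.0      # avoid dividing by a zero count
        exit_sum = taken_freq[first]          # cumulative exit frequency so far
        pred_taken = False
        curr = first
        while not pred_taken:                 # grow the CPR block from its seed
            cand = curr + 1
            if cand >= n:
                break                         # no branches left to append
            if not suitable(block, cand) or not separable(block, cand):
                break                         # correctness tests
            if taken_freq[cand] / entry > predict_taken_threshold:
                pred_taken = True             # cand becomes the final, likely-taken branch
            if not pred_taken and (exit_sum + taken_freq[cand]) / entry > exit_weight_threshold:
                break                         # exit-weight heuristic
            block.append(cand)
            exit_sum += taken_freq[cand]
            curr = cand
        result.append(block)
        first = curr + 1                      # re-seed with the next untreated branch
    return result


always = lambda block, cand: True
print(icbm_match([100, 95, 83, 78], [5, 12, 5, 70], always, always))  # [[0, 1], [2, 3]]
```

With these made-up profile counts, the third branch pushes the cumulative exit weight of the first block past the threshold, so it seeds a second block, which the heavily taken final branch closes as a likely-taken CPR block. As in Figure 5, the predict-taken check takes priority over the exit-weight check, and growth stops right after the likely-taken candidate is appended.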
The correctness tests are designed to detect situations where ICBM can be applied and is guaranteed to produce both correct and efficient code. Other approaches might broaden the applicability of control CPR by generalizing ICBM.

Suitability. The suitability test generalizes control CPR to handle input programs containing arbitrary predication. Suitability always passes when match processes simple superblocks or FRP-converted superblocks. However, in the presence of more complex uses of predicates, the suitability test ensures that CPR blocks use predicates and conditions in a fashion consistent with the production of correct code using the CPR schema of Figure 4. Consider the following linear sequence of compare and branch operations taken from a hyperblock:

    f1, t1 = cmpp.uc.un(b1) if g0
    branch <exit 1> if t1
    f2, t2 = cmpp.uc.un(b2) if g1
    branch <exit 2> if t2
    ...
    fn, tn = cmpp.uc.un(bn) if g(n-1)
    branch <exit n> if tn
    FTEXIT:

The ICBM schema must generate code which computes an off-trace FRP for the bypass branch; this FRP must be true exactly when any of the branches in the CPR block takes. The ith branch takes exactly when $b_i$ and $g_{i-1}$ are both true, so one of the exit branches takes when $(g_0 \wedge b_1) \vee \cdots \vee (g_{n-1} \wedge b_n)$. If the guards and conditions for the compares which compute branch predicates are arbitrary, then we must evaluate this fully general expression for the off-trace FRP. The ICBM schema addresses commonly occurring predicate usage more efficiently by actually generating code for the off-trace FRP using the simpler expression $g_0 \wedge (b_1 \vee b_2 \vee \cdots \vee b_n)$. Suitability guarantees that this simpler expression produces correct code before the control CPR transformation is applied to a given CPR block. When suitability fails, code is left unchanged over an input subregion in order to ensure correctness, rather than generating the more complex, fully general expression for the off-trace FRP.

Suitability is divided into an initialization step, which treats a CPR block of length one, and a growth step, which decides whether the CPR block can be legally augmented with the next branch. The suitability test is initialized as follows.
A current pointer is initialized to point at the first branch in a length-one CPR block. A suitable predicate set (SP) is initialized to the empty set.

    Procedure ICBM_match {
        final_in_prev_block = 0;
        result.clear();                    // result is initially null
        while (TRUE)                       // form a list of CPR blocks
            first = final_in_prev_block + 1;
            curr = first;
            if (curr > total_number_of_branches)
                break;
            cpr_block.clear();             // initialize a new CPR block
            cpr_block.append(curr);        // defines seed
            suitability_test_init(curr);
            separability_test_init(curr);
            exit_weight_test_init(curr);
            pred_taken_flag = FALSE;
            while (TRUE)                   // grow CPR block from seed
                if (pred_taken_flag)
                    break;
                cand = curr + 1;
                if (cand > total_number_of_branches)
                    break;                 // no branches left to append
                if (suitability_test_failure(cand))
                    break;
                if (separability_test_failure(cand))
                    break;
                if (predict_taken(cand_br))
                    pred_taken_flag = TRUE;
                if (!pred_taken_flag && exit_weight_test_failure(cand))
                    break;
                // passed all tests, append tuple to CPR block
                cpr_block.append(cand);
                curr = cand;
            endwhile
            result.append(cpr_block);
            final_in_prev_block = curr;
        endwhile
        return result;
    }

Figure 5. Match pseudo code

For the initial branch (i.e., the 0th branch), if the controlling compare operation unconditionally computes its branch-guarding predicate (i.e., using the UN target modifier), then the compare operation's guarding predicate is added to SP. This guarding predicate is the CPR block's root predicate, as shown in Figure 4. If the compare operation also unconditionally computes a complementary fall-through predicate (i.e., using the UC modifier), then the fall-through predicate is added to SP as well. The following three conditions form an induction hypothesis which is readily verified for an initial CPR block of length one: (1) if $g_0$ is false, then all members of SP (if any) are also false; (2) if $g_0$ is true and no exit branch is taken, then all members of SP are true; (3) the off-trace FRP computed as $g_0 \wedge (b_1 \vee \cdots \vee b_n)$ is true exactly when one of the branches in the CPR block takes. We continue growing the current CPR block by inspecting candidate branches in order. For the current branch (i.e.,
the ith branch), if the controlling compare operation unconditionally computes its branch-guarding predicate (i.e., using the UN target modifier) and if the compare's guarding predicate $g_i$ is in SP, then the candidate branch can be appended to the current CPR block. Otherwise, growth of the current CPR block is terminated, and the current branch becomes the initial seed for a subsequent CPR block. If the compare operation also unconditionally computes a complementary fall-through predicate (i.e., using the UC modifier), then the fall-through predicate is added to SP. One can show that when a candidate branch is appended to a CPR block, all three conditions of the induction hypothesis remain true. Thus, CPR blocks formed after passing suitability have the property that their schematically generated off-trace FRP is true exactly when one of the branches in the CPR block takes.

Separability. The separability test is also needed to ensure the correctness of the control CPR schema. The CPR transformation shown in Figure 2 moves compares from the original code off-trace and replaces them with lookahead compares which remain on-trace. If improperly applied, the code motion required by the ICBM schema can violate dependence constraints. This might occur when a condition required to compute the on-trace and off-trace FRPs is dependent upon a predicate which, after ICBM, is computed only off-trace. Since ICBM is designed to produce irredundant code by moving the original compares off-trace, dependences from a compare which will be moved off-trace to a lookahead compare which must remain on-trace are not allowed.

The separability test repeatedly applies the function append-successors, which, for a given branch, computes the set of operations that are dependent upon the compare which guards the branch. Note that because the control CPR schema computes the off-trace FRP as $g_0 \wedge (b_1 \vee b_2 \vee \cdots \vee b_n)$, only the guard for the initial compare operation ($g_0$) is used. All guards for subsequent compares ($g_1, \ldots, g_{n-1}$) are ignored due to the successful application of the suitability test.
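The succ-set bookkeeping of the separability test can be sketched as follows. This is a minimal sketch, assuming the dependence graph is available as an adjacency map in which each edge carries a flag marking the ignorable kind of dependence (a compare's fall-through predicate guarding a later branch-guarding compare). `DepGraph`, `append_successors`, and `separable_prefix` are hypothetical names, and the example graph encodes the alias scenario from the strcpy example of Section 6: ops 6, 13, and 20 are the branch-guarding compares, and store 16 is assumed to alias load 18.

```python
from typing import Dict, List, Set, Tuple

# op -> [(successor op, ignorable)]; ignorable=True marks a dependence that exists only
# because a compare's fall-through predicate guards a later branch-guarding compare.
DepGraph = Dict[int, List[Tuple[int, bool]]]


def append_successors(graph: DepGraph, compare: int, succ: Set[int]) -> None:
    """Add every op transitively dependent on `compare`, skipping ignorable edges."""
    stack = [compare]
    while stack:
        op = stack.pop()
        for nxt, ignorable in graph.get(op, []):
            if not ignorable and nxt not in succ:
                succ.add(nxt)
                stack.append(nxt)


def separable_prefix(graph: DepGraph, compares: List[int]) -> List[int]:
    """Return the longest prefix of branch-guarding compares that passes separability."""
    succ: Set[int] = set()
    append_successors(graph, compares[0], succ)    # initialization step
    block = [compares[0]]
    for cand in compares[1:]:                      # separability step
        if cand in succ:
            break       # cand depends on a compare that would move off-trace
        block.append(cand)
        append_successors(graph, cand, succ)
    return block


graph: DepGraph = {
    6:  [(7, False), (13, True)],   # p51 -> branch 7; p61 guards compare 13 (ignorable)
    13: [(14, False), (20, True)],  # p52 -> branch 14; p62 guards compare 20 (ignorable)
    14: [(16, False)],              # control dependence: branch 14 -> store 16
    16: [(18, False)],              # assumed alias: store 16 -> load 18
    18: [(20, False)],              # flow dependence: load 18 -> compare 20
    20: [(21, False)],              # p53 -> branch 21
}
print(separable_prefix(graph, [6, 13, 20]))   # [6, 13]
```

Dropping the assumed alias edge out of op 16 lets compare 20 join the block, which matches the observation that separability fails only if the compiler cannot disambiguate the store and the load.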
Because only $g_0$ enters the off-trace FRP, when append-successors computes a set of successors from a given compare, any dependence resulting from the use of the compare's fall-through predicate as the guard of a compare which in turn guards a subsequent branch can be ignored. The separability test is also divided into an initialization step, which treats a CPR block of length one, and a subsequent separability step, which decides whether the CPR block can be legally augmented with an additional basic block. During initialization, a set called succ is initialized to the empty set, and append-successors is invoked on the initial branch, which is unconditionally included in the CPR block. After computing the appropriate set of successor operations for the compare guarding the initial branch, this set is accumulated into succ. The separability step is invoked to test each subsequent candidate branch for inclusion in the current CPR block. First, the compare which guards the candidate branch is tested for membership in succ. If the compare is a member of succ, separability is violated and the candidate cannot join the CPR block. Otherwise, the candidate may be included in the CPR block, and append-successors is invoked on the candidate, thus accumulating the appropriate successors of the compare guarding the branch into succ.

Exit-weight and predict-taken tests. The exit-weight and predict-taken tests are heuristics which truncate the formation of a CPR block using profile data that provides taken and not-taken frequencies for each branch. The exit-weight test monitors the ratio of the cumulative exit frequencies of all of the branches within the CPR block to the entry frequency into the CPR block. When the inclusion of a candidate branch in a CPR block causes this ratio to exceed a threshold, the candidate is not included and CPR block growth is terminated. The predict-taken test identifies likely exits in the input code that are selected as the final branch in a CPR block. Such a CPR block is tagged as a likely-taken CPR block and receives special treatment during ICBM restructure. The code generation and code motion schema for the likely-taken CPR block allows a CPR block to reach a predicted taken target without first branching to an off-trace compensation block and again branching from the compensation block to the final target. The predict-taken test monitors the ratio of the candidate exit frequency to the CPR block entry frequency. When this ratio exceeds a threshold, a likely-taken CPR block is formed. This test takes priority over the exit-weight test, which would otherwise truncate the CPR block. Further, when the candidate is identified as satisfying all conditions required for a predict-taken CPR block, growth is terminated after the candidate is appended to the current CPR block.

5.3 Restructure

Restructure performs the actual height-reducing transformation on each non-trivial CPR block identified in the previous step. This phase introduces all new operations, including the lookahead compares and the bypass branch. The re-wiring of guarding predicates and conditions to these new predicates is also performed during this phase. Lastly, a new region known as the compensation block is created to hold the off-trace code after code motion. Two variations of CPR blocks are treated: first, a sequence in which all the branches are likely fall-through (fall-through variation); and second, a sequence in which all branches are likely fall-through except the last, which is likely taken (taken variation).

Restructure is rather mechanical in nature. It consists of the following steps, performed for each CPR block: insert on-trace and off-trace predicate computation; create the compensation block; insert the bypass branch; re-wire guarding predicates. These steps are explained for the fall-through variation. Afterwards, the changes for the taken variation are described.

Two predicates, the on-trace and off-trace FRPs, are introduced for each CPR block. These hold the conditions that execution remains on-trace (the on-trace FRP evaluates to true) or goes off-trace (the off-trace, or bypass, FRP evaluates to true) in the course of the entire CPR block.
The on-trace FRP is computed as the conjunction of the fall-through conditions of the branches using wired-AND semantics. The off-trace FRP is computed as the disjunction of the taken conditions using wired-OR semantics. These FRPs are computed using lookahead compares inserted after each of the original compares. Each lookahead compare uses the same condition and source operands as the original. All of the lookahead compares target the new on-trace and off-trace FRPs using the dual-target AC/ON semantics. One subtlety of the transformation is that the lookahead compares are not guarded by the predicates of the corresponding original compares. Rather, all are guarded by the root predicate of the CPR block. This substitution is legal due to the success of the suitability test. The use of wired-AND and wired-OR predicates requires that they be properly initialized. The off-trace predicate (wired-OR) is simply initialized to 0. The on-trace predicate (wired-AND) is initialized to the root predicate of the CPR block, which is always at least as large as the wired-AND result.

The next two steps create the compensation block and insert a conditional branch to that block. An empty compensation block is added to the function body. The bypass branch is added as a conditional branch to the compensation block, taken when the off-trace FRP evaluates to true. This branch is inserted immediately after the final branch within the original CPR block.

The final step of restructure eliminates uses of predicates computed by the original compares from operations subsequent to the bypass branch. One of the goals of the schema is to allow the original compares to move off-trace to eliminate redundancy. However, this can only be accomplished if there are no uses of the predicates they compute in the remainder of the hyperblock. It can be shown that such uses of any of the original predicates can be safely replaced by a use of the on-trace FRP.

Taken variation. Several small changes to restructure are necessary for the taken variation. Instead of accelerating the fall-through direction of the final branch, the taken direction is accelerated.
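For the fall-through variation, the lookahead-compare computation of the two FRPs can be modeled in Python and checked exhaustively against the sequential behavior of the original branches. This is a minimal sketch, assuming the HPL PlayDoh action semantics for the two targets (AC clears its target when the guard and the condition are both true; ON sets its target under the same circumstances) and assuming the original guards form the fall-through chain produced by UC targets. `lookahead_frps` and `original_exits` are hypothetical names; conditions are plain booleans.

```python
from itertools import product
from typing import List, Tuple


def lookahead_frps(root: bool, conds: List[bool]) -> Tuple[bool, bool]:
    """Model the AC/ON lookahead compares; returns (on_trace_frp, off_trace_frp)."""
    on_trace = root      # wired-AND target, initialized to the root predicate
    off_trace = False    # wired-OR target, initialized to 0
    for b in conds:
        # Every lookahead compare is guarded by the root predicate, not a chained guard.
        if root and b:
            on_trace = False   # AC action: clear when guard and condition hold
            off_trace = True   # ON action: set when guard and condition hold
    return on_trace, off_trace


def original_exits(root: bool, conds: List[bool]) -> bool:
    """Sequential semantics of the original FRP-converted compare/branch chain."""
    guard = root
    for b in conds:
        if guard and b:              # UN target: this branch takes
            return True
        guard = guard and not b      # UC target: fall-through predicate guards the next compare
    return False


# Exhaustive check for CPR blocks of up to four branches.
for n in range(1, 5):
    for root, *conds in product([False, True], repeat=n + 1):
        on, off = lookahead_frps(root, conds)
        assert off == original_exits(root, conds)
        assert on == (root and not off)
print("ok")
```

Because wired-AND and wired-OR updates commute, the lookahead compares can be evaluated in any order, which is what lets them stay on-trace while the original compares move off-trace.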
In the taken variation, the sense of the final lookahead compare is inverted to accelerate the taken direction; e.g., a less-than condition in the original compare becomes a greater-than-or-equal condition in the new compare. The more interesting part of the variation is that a new bypass branch is not required. Rather, the last branch in the CPR block serves as the bypass branch. Its taken condition corresponds to remaining on-trace, and its fall-through corresponds to going off-trace. As a result, a new compensation block is also not required. Instead, the remainder of the hyperblock serves as the compensation block.

5.4 Off-trace motion

After the preceding height-reducing transformations are complete, redundant operations are moved off-trace to benefit the more likely on-trace path. This motion is performed prior to scheduling so that we can use our existing superblock/hyperblock scheduler. An alternate approach would rely on a tree-region scheduler [HBC98] to perform code motion between the height-reduced hyperblock and its associated compensation blocks.

Three passes are performed over the hyperblock region to identify the set of operations that must move off-trace, and to further identify the subset of moved operations that must be split so that a copy remains on-trace. A final step performs the required operation splitting and code motion. The steps are as follows. First, identify all data dependence successors of the compare and branch operations that must be moved off-trace (set 1). Second, identify the subset of the operations in set 1 that produce a value that is also needed on-trace (set 2). These operations must be split, i.e., replicated along both paths. Stores are the most common operations that require splitting. Third, identify any of the remaining operations not in set 1 whose results are used only along the off-trace path, since their motion off-trace will benefit the on-trace path (set 3). Finally, move operations in sets 1 and 3 to compensation blocks, while replicating operations in set 2.

6 Code Example

The application of ICBM is illustrated in this section using a simple code example.
The example chosen for this section is the inner loop of a common string copy routine. The example source, shown in Figure 6(a), is a while loop which copies elements of string A into string B. To expose instruction-level parallelism, the loop body is unrolled four times. PlayDoh assembly code after unrolling and other traditional code optimizations is shown in Figure 6(b). Each iteration stores a current value into array B, loads the next value from array A, computes the necessary addresses, and conditionally exits the loop after the end of A is reached.

    a = &A[0]; b = &B[0];
    while (*a != 0) {
        *b++ = *a++;
    }

    (a) source code

    Loop:
     1. r21 = add (r2, 0) if T
     2. store (r21, r34) if T
     3. r11 = add (r1, 1) if T
     4. r31 = load (r11) if T
     5. r41 = pbr (Exit, 0) if T
     6. p51 = cmpp.un eq (r31, 0) if T
     7. branch (p51, r41)
     8. r22 = add (r2, 1) if T
     9. store (r22, r31) if T
    10. r12 = add (r1, 2) if T
    11. r32 = load (r12) if T
    12. r42 = pbr (Exit, 0) if T
    13. p52 = cmpp.un eq (r32, 0) if T
    14. branch (p52, r42)
    15. r23 = add (r2, 2) if T
    16. store (r23, r32) if T
    17. r13 = add (r1, 3) if T
    18. r33 = load (r13) if T
    19. r43 = pbr (Exit, 0) if T
    20. p53 = cmpp.un eq (r33, 0) if T
    21. branch (p53, r43)
    22. r24 = add (r2, 3) if T
    23. store (r24, r33) if T
    24. r14 = add (r1, 4) if T
    25. r34 = load (r14) if T
    26. r44 = pbr (Loop, 1) if T
    27. r1 = add (r1, 4) if T
    28. r2 = add (r2, 4) if T
    29. p54 = cmpp.un ne (r34, 0) if T
    30. branch (p54, r44)
    Exit:

    (b) unrolled assembly code

    Loop:
     1. r21 = add (r2, 0) if T
     2. store (r21, r34) if T
     3. r11 = add (r1, 1) if T
     4. r31 = load (r11) if T
     5. r41 = pbr (Exit, 0) if T
     6. p51, p61 = cmpp.un.uc eq (r31, 0) if T
     7. branch (p51, r41)
     8. r22 = add (r2, 1) if p61
     9. store (r22, r31) if p61
    10. r12 = add (r1, 2) if p61
    11. r32 = load (r12) if p61
    12. r42 = pbr (Exit, 0) if p61
    13. p52, p62 = cmpp.un.uc eq (r32, 0) if p61
    14. branch (p52, r42)
    15. r23 = add (r2, 2) if p62
    16. store (r23, r32) if p62
    17. r13 = add (r1, 3) if p62
    18. r33 = load (r13) if p62
    19. r43 = pbr (Exit, 0) if p62
    20. p53, p63 = cmpp.un.uc eq (r33, 0) if p62
    21. branch (p53, r43)
    22. r24 = add (r2, 3) if p63
    23. store (r24, r33) if p63
    24. r14 = add (r1, 4) if p63
    25. r34 = load (r14) if p63
    26. r44 = pbr (Loop, 1) if p63
    27. r1 = add (r1, 4) if p63
    28. r2 = add (r2, 4) if p63
    29. p54 = cmpp.un ne (r34, 0) if p63
    30. branch (p54, r44)
    Exit:

    (c) after FRP conversion

Figure 6. Example: transformation applied to strcpy

In PlayDoh, conditional branches are realized using three operations: a prepare-to-branch (op 5), a comparison (op 6), and a predicated control transfer (op 7). For the last iteration, two additional operations increment the array pointers by the unroll amount (four). At this point, all operations are guarded by the predicate true (denoted by "if T"). Figure 6(c) shows the FRP-converted superblock.
FRP conversion is accomplished using cmpp operations which generate two predicate outputs: the taken condition (UN output) and the fall-through condition (UC output). Operations which were dependent on the branch are now guarded by the fall-through condition to complete the transformation. For example, in Figure 6(c), op 6 has a second target, p61, that generates the fall-through condition; operations that were dependent on op 7, namely ops 8-13, are now guarded by p61. The code in Figure 6(c) represents the preferred input for the ICBM schema.

Predicate speculation. The first phase of the ICBM schema is predicate speculation. Predicate speculation is applied to the FRP-converted superblock in Figure 6(c). The resultant code after speculation is shown in Figure 7(a). This example is somewhat uninteresting from the viewpoint of predicate speculation. In the first pass of predicate speculation, all eligible operations in the block are promoted to true. Note that promotion of operations within an FRP-converted superblock is always legal, since promotion faithfully mirrors the original code. The second pass demotes the predicates of ops 9, 16, and 23 back to their original values. Each of these stores is dependent on a prior branch. For instance, op 16 is dependent on op 14, and speculating op 16 to true was not useful. Demotion lowers op 16's predicate to the fall-through predicate, p62. Not only does demotion undo a useless promotion, it also lowers dependence height by enabling the store and the branch to be freely reordered.

Match. The next phase of the ICBM schema is to apply match to the code in Figure 7(a). Match identifies the CPR blocks, i.e., the set of subregions that will be transformed. Recall that match consists of a set of four tests (suitability, separability, exit weight, and predict taken) that are conditions for terminating a CPR block. For this example, the only predicates that are used are those generated via FRP conversion, and thus the suitability test succeeds across the entire block. In addition, there are no dependences that cause the separability test to fail.
It is illustrative to note that if the compiler could not determine that a store and a subsequent load were independent in this example, the separability test would fail. For instance, assume there is an alias between operations 16 and 18. The match algorithm would attempt to append the compare/branch tuple 20/21 to the CPR block containing the previous two compare/branch tuples (6/7, 13/14). However, there is a chain of dependences connecting ops 13 and 20 (13 to 14 via a flow dependence, 14 to 16 via a control dependence, 16 to 18 via the assumed memory dependence, and 18 to 20 via a flow dependence). This would cause a separability violation, preventing the addition of the compare/branch tuple.

The exit weight and predict taken tests are based on profile data. For this example, it is assumed that the last branch (op 30) is predominantly taken, since it is the loop back branch. Further, it is assumed that the exit weight threshold is exceeded by the second branch (op 14), but its predominant direction is fall-through. As a result, the match algorithm identifies two CPR blocks in Figure 7(a). The first CPR block includes the first two branches and will use the fall-through restructure schema. The second CPR block includes the last two branches and will use the taken restructure schema. In the actual application of ICBM, such small CPR blocks are not typically formed. However, they are formed in this example to jointly illustrate both the fall-through and taken restructure schemas.

Restructure. The next phase of the ICBM schema is to restructure each of the CPR blocks (i.e., apply the control CPR transformation to them). The overall result of the restructure is shown in Figure 7(b). Focusing on the first CPR block, the new on-trace and off-trace predicates are p71 and p81, respectively. After each of the original cmpps that compute the branch conditions (ops 6 and 13), the lookahead AC/ON cmpps are inserted (ops 32 and 33). The lookahead cmpps look identical to the original cmpps in terms of source operands and compare condition. The lookahead cmpps are guarded by the root predicate of the CPR block.
Since this is the first CPR block in the hyperblock, the root predicate is true. Again focusing just on the first CPR block, op 35 is the bypass branch and the new block labeled Cm1 is the compensation block. The bypass branch is inserted after the last branch in the CPR block. The bypass branch predicate is p81. Note that the PlayDoh architecture requires a prepare-to-branch operation for each branch; hence op 34 is inserted in conjunction with the bypass branch. The final phase of restructure rewires operations in the hyperblock subsequent to the last branch in the CPR block (op 14) that use predicates computed by the original cmpps. For this


A Novel Validity Index for Determination of the Optimal Number of Clusters IEICE TRANS. INF. & SYST., VOL.E84 D, NO.2 FEBRUARY 2001 281 LETTER A Novel Validity Index for Determination of the Optimal Number of Clusters Do-Jong KIM, Yong-Woon PARK, and Dong-Jo PARK, Nonmembers

More information

Introductory Programming, IMM, DTU Systematic Software Test. Software test (afprøvning) Motivation. Structural test and functional test

Introductory Programming, IMM, DTU Systematic Software Test. Software test (afprøvning) Motivation. Structural test and functional test Introdutory Programming, IMM, DTU Systemati Software Test Peter Sestoft a Programs often ontain unintended errors how do you find them? Strutural test Funtional test Notes: Systemati Software Test, http://www.dina.kvl.dk/

More information

Automated Generation of Interactive 3D Exploded View Diagrams

Automated Generation of Interactive 3D Exploded View Diagrams Automated Generation of Interative 3D Exloded View Diagrams Abstrat We resent a system for reating and viewing interative exloded views of omlex 3D models. In our aroa, a 3D inut model is organized into

More information

Equality-Based Translation Validator for LLVM

Equality-Based Translation Validator for LLVM Equality-Based Translation Validator for LLVM Michael Ste, Ross Tate, and Sorin Lerner University of California, San Diego {mste,rtate,lerner@cs.ucsd.edu Abstract. We udated our Peggy tool, reviously resented

More information

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks

Query Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks Query Evaluation Overview Query Optimization: Chap. 15 CS634 Leture 12 SQL query first translated to relational algebra (RA) Atually, some additional operators needed for SQL Tree of RA operators, with

More information

Compilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A.

Compilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A. Compilation 0368-3133 Leture 11a Text book: Modern ompiler implementation in C Andrew A. Appel Register Alloation Noam Rinetzky 1 Registers Dediated memory loations that an be aessed quikly, an have omputations

More information

Boosted Random Forest

Boosted Random Forest Boosted Random Forest Yohei Mishina, Masamitsu suhiya and Hironobu Fujiyoshi Department of Computer Siene, Chubu University, 1200 Matsumoto-ho, Kasugai, Aihi, Japan {mishi, mtdoll}@vision.s.hubu.a.jp,

More information

Detection and Recognition of Non-Occluded Objects using Signature Map

Detection and Recognition of Non-Occluded Objects using Signature Map 6th WSEAS International Conferene on CIRCUITS, SYSTEMS, ELECTRONICS,CONTROL & SIGNAL PROCESSING, Cairo, Egypt, De 9-31, 007 65 Detetion and Reognition of Non-Oluded Objets using Signature Map Sangbum Park,

More information

Direct-Mapped Caches

Direct-Mapped Caches A Case for Diret-Mapped Cahes Mark D. Hill University of Wisonsin ahe is a small, fast buffer in whih a system keeps those parts, of the ontents of a larger, slower memory that are likely to be used soon.

More information

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall

Department of Electrical and Computer Engineering University of Wisconsin Madison. Fall Department of Eletrial and Computer Engineering University of Wisonsin Madison ECE 553: Testing and Testable Design of Digital Systems Fall 2014-2015 Assignment #2 Date Tuesday, September 25, 2014 Due

More information

Graph-Based vs Depth-Based Data Representation for Multiview Images

Graph-Based vs Depth-Based Data Representation for Multiview Images Graph-Based vs Depth-Based Data Representation for Multiview Images Thomas Maugey, Antonio Ortega, Pasal Frossard Signal Proessing Laboratory (LTS), Eole Polytehnique Fédérale de Lausanne (EPFL) Email:

More information

Implicit Representation of Molecular Surfaces

Implicit Representation of Molecular Surfaces Imliit Reresentation of Moleular Surfaes Julius Parulek Ivan Viola Deartment of Informatis, University of Bergen Deartment of Informatis, University of Bergen. (a) (b) () (d) (e) (f) (g) (h) Figure : Ray-asting

More information

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar

Plot-to-track correlation in A-SMGCS using the target images from a Surface Movement Radar Plot-to-trak orrelation in A-SMGCS using the target images from a Surfae Movement Radar G. Golino Radar & ehnology Division AMS, Italy ggolino@amsjv.it Abstrat he main topi of this paper is the formulation

More information

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem Calulation of typial running time of a branh-and-bound algorithm for the vertex-over problem Joni Pajarinen, Joni.Pajarinen@iki.fi Otober 21, 2007 1 Introdution The vertex-over problem is one of a olletion

More information

Analysis of input and output configurations for use in four-valued CCD programmable logic arrays

Analysis of input and output configurations for use in four-valued CCD programmable logic arrays nalysis of input and output onfigurations for use in four-valued D programmable logi arrays J.T. utler H.G. Kerkhoff ndexing terms: Logi, iruit theory and design, harge-oupled devies bstrat: s in binary,

More information

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model.

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model. U.C. Berkeley CS273: Parallel and Distributed Theory Lecture 18 Professor Satish Rao Lecturer: Satish Rao Last revised Scribe so far: Satish Rao (following revious lecture notes quite closely. Lecture

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information

Chapter 2: Introduction to Maple V

Chapter 2: Introduction to Maple V Chapter 2: Introdution to Maple V 2-1 Working with Maple Worksheets Try It! (p. 15) Start a Maple session with an empty worksheet. The name of the worksheet should be Untitled (1). Use one of the standard

More information

arxiv: v2 [cs.cv] 25 Nov 2015

arxiv: v2 [cs.cv] 25 Nov 2015 Pose-Guided Human Parsing with Dee-Learned Features Fangting Xia, Jun Zhu, Peng Wang, Alan Yuille University of California, Los Angeles arxiv:158.3881v2 [s.cv] 25 Nov 215 Abstrat Parsing human body into

More information

8 Instruction Selection

8 Instruction Selection 8 Instrution Seletion The IR ode instrutions were designed to do exatly one operation: load/store, add, subtrat, jump, et. The mahine instrutions of a real CPU often perform several of these primitive

More information

Incremental Mining of Partial Periodic Patterns in Time-series Databases

Incremental Mining of Partial Periodic Patterns in Time-series Databases CERIAS Teh Report 2000-03 Inremental Mining of Partial Periodi Patterns in Time-series Dataases Mohamed G. Elfeky Center for Eduation and Researh in Information Assurane and Seurity Purdue University,

More information

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded Folding is verse of Unfolding Node A A Folding by N (N=folding fator) Folding A Unfolding by J A A J- Hardware Mapped vs. Time multiplexed l Hardware Mapped vs. Time multiplexed/mirooded FI : y x(n) h

More information

Allocating Rotating Registers by Scheduling

Allocating Rotating Registers by Scheduling Alloating Rotating Registers by Sheduling Hongbo Rong Hyunhul Park Cheng Wang Youfeng Wu Programming Systems Lab Intel Labs {hongbo.rong,hyunhul.park,heng..wang,youfeng.wu}@intel.om ABSTRACT A rotating

More information

DECT Module Installation Manual

DECT Module Installation Manual DECT Module Installation Manual Rev. 2.0 This manual desribes the DECT module registration method to the HUB and fan airflow settings. In order for the HUB to ommuniate with a ompatible fan, the DECT module

More information

SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation

SPITFIRE: Scalable Parallel Algorithms for Test Set Partitioned Fault Simulation To aear in IEEE VLSI Test Symosium, 1997 SITFIRE: Scalable arallel Algorithms for Test Set artitioned Fault Simulation Dili Krishnaswamy y Elizabeth M. Rudnick y Janak H. atel y rithviraj Banerjee z y

More information

13.1 Numerical Evaluation of Integrals Over One Dimension

13.1 Numerical Evaluation of Integrals Over One Dimension 13.1 Numerial Evaluation of Integrals Over One Dimension A. Purpose This olletion of subprograms estimates the value of the integral b a f(x) dx where the integrand f(x) and the limits a and b are supplied

More information

The recursive decoupling method for solving tridiagonal linear systems

The recursive decoupling method for solving tridiagonal linear systems Loughborough University Institutional Repository The reursive deoupling method for solving tridiagonal linear systems This item was submitted to Loughborough University's Institutional Repository by the/an

More information

Multiple Assignments

Multiple Assignments Two Outputs Conneted Together Multiple Assignments Two Outputs Conneted Together if (En1) Q

More information

1 The Knuth-Morris-Pratt Algorithm

1 The Knuth-Morris-Pratt Algorithm 5-45/65: Design & Analysis of Algorithms September 26, 26 Leture #9: String Mathing last hanged: September 26, 27 There s an entire field dediated to solving problems on strings. The book Algorithms on

More information

Constructing Transaction Serialization Order for Incremental. Data Warehouse Refresh. Ming-Ling Lo and Hui-I Hsiao. IBM T. J. Watson Research Center

Constructing Transaction Serialization Order for Incremental. Data Warehouse Refresh. Ming-Ling Lo and Hui-I Hsiao. IBM T. J. Watson Research Center Construting Transation Serialization Order for Inremental Data Warehouse Refresh Ming-Ling Lo and Hui-I Hsiao IBM T. J. Watson Researh Center July 11, 1997 Abstrat In typial pratie of data warehouse, the

More information

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and

More information

On the Relationship Between Dual Photography and Classical Ghost Imaging

On the Relationship Between Dual Photography and Classical Ghost Imaging 1 On the Relationshi Between Dual Photograhy and Classial Ghost Imaging Pradee Sen University of California, Santa Barbara arxiv:1309.3007v1 [hysis.otis] 12 Se 2013 Abstrat Classial ghost imaging has reeived

More information

Definitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework

Definitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework EECS 33 There be Dragons here http://ziyang.ees.northwestern.edu/ees33/ Teaher: Offie: Email: Phone: L477 Teh dikrp@northwestern.edu 847 467 2298 Today s material might at first appear diffiult Perhaps

More information

ARABIC OCR SYSTEM ANALOGOUS TO HMM-BASED ASR SYSTEMS; IMPLEMENTATION AND EVALUATION

ARABIC OCR SYSTEM ANALOGOUS TO HMM-BASED ASR SYSTEMS; IMPLEMENTATION AND EVALUATION ARABIC OCR SYSTEM ANALOGOUS TO HMM-BASED ASR SYSTEMS; IMPLEMENTATION AND EVALUATION M.A.A. RASHWAN, M.W.T. FAKHR, M. ATTIA, M.S.M. EL-MAHALLAWY 4 ABSTRACT Desite 5 years of R&D on the roblem of Otial harater

More information

Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

Efficient Parallel Hierarchical Clustering

Efficient Parallel Hierarchical Clustering Efficient Parallel Hierarchical Clustering Manoranjan Dash 1,SimonaPetrutiu, and Peter Scheuermann 1 Deartment of Information Systems, School of Comuter Engineering, Nanyang Technological University, Singaore

More information

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks

A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr

More information

Approximate logic synthesis for error tolerant applications

Approximate logic synthesis for error tolerant applications Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu

More information

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks A Dual-Hamiltonian-Path-Based Multiasting Strategy for Wormhole-Routed Star Graph Interonnetion Networks Nen-Chung Wang Department of Information and Communiation Engineering Chaoyang University of Tehnology,

More information

Uncovering Hidden Loop Level Parallelism in Sequential Applications

Uncovering Hidden Loop Level Parallelism in Sequential Applications Unovering Hidden Loop Level Parallelism in Sequential Appliations Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, and Sott Mahlke Advaned Computer Arhiteture Laboratory University of Mihigan, Ann Arbor,

More information

Performance Benchmarks for an Interactive Video-on-Demand System

Performance Benchmarks for an Interactive Video-on-Demand System Performane Benhmarks for an Interative Video-on-Demand System. Guo,P.G.Taylor,E.W.M.Wong,S.Chan,M.Zukerman andk.s.tang ARC Speial Researh Centre for Ultra-Broadband Information Networks (CUBIN) Department

More information

Establishing Secure Ethernet LANs Using Intelligent Switching Hubs in Internet Environments

Establishing Secure Ethernet LANs Using Intelligent Switching Hubs in Internet Environments Establishing Seure Ethernet LANs Using Intelligent Swithing Hubs in Internet Environments WOEIJIUNN TSAUR AND SHIJINN HORNG Department of Eletrial Engineering, National Taiwan University of Siene and Tehnology,

More information

Object and Native Code Thread Mobility Among Heterogeneous Computers

Object and Native Code Thread Mobility Among Heterogeneous Computers Object and Native Code Thread Mobility Among Heterogeneous Comuters Bjarne Steensgaard Eric Jul Microsoft Research DIKU (Det. of Comuter Science) One Microsoft Way University of Coenhagen Redmond, WA 98052

More information

An Efficient and Scalable Approach to CNN Queries in a Road Network

An Efficient and Scalable Approach to CNN Queries in a Road Network An Effiient and Salable Approah to CNN Queries in a Road Network Hyung-Ju Cho Chin-Wan Chung Dept. of Eletrial Engineering & Computer Siene Korea Advaned Institute of Siene and Tehnology 373- Kusong-dong,

More information

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks

Flow Demands Oriented Node Placement in Multi-Hop Wireless Networks Flow Demands Oriented Node Plaement in Multi-Hop Wireless Networks Zimu Yuan Institute of Computing Tehnology, CAS, China {zimu.yuan}@gmail.om arxiv:153.8396v1 [s.ni] 29 Mar 215 Abstrat In multi-hop wireless

More information

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks

A Load-Balanced Clustering Protocol for Hierarchical Wireless Sensor Networks International Journal of Advanes in Computer Networks and Its Seurity IJCNS A Load-Balaned Clustering Protool for Hierarhial Wireless Sensor Networks Mehdi Tarhani, Yousef S. Kavian, Saman Siavoshi, Ali

More information

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY

COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY COST PERFORMANCE ASPECTS OF CCD FAST AUXILIARY MEMORY Dileep P, Bhondarkor Texas Instruments Inorporated Dallas, Texas ABSTRACT Charge oupled devies (CCD's) hove been mentioned as potential fast auxiliary

More information

COMP 181. Prelude. Intermediate representations. Today. Types of IRs. High-level IR. Intermediate representations and code generation

COMP 181. Prelude. Intermediate representations. Today. Types of IRs. High-level IR. Intermediate representations and code generation Prelude COMP 181 Intermediate representations and ode generation November, 009 What is this devie? Large Hadron Collider What is a hadron? Subatomi partile made up of quarks bound by the strong fore What

More information

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study

What are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The

More information

Drawing lines. Naïve line drawing algorithm. drawpixel(x, round(y)); double dy = y1 - y0; double dx = x1 - x0; double m = dy / dx; double y = y0;

Drawing lines. Naïve line drawing algorithm. drawpixel(x, round(y)); double dy = y1 - y0; double dx = x1 - x0; double m = dy / dx; double y = y0; Naïve line drawing algorithm // Connet to grid points(x0,y0) and // (x1,y1) by a line. void drawline(int x0, int y0, int x1, int y1) { int x; double dy = y1 - y0; double dx = x1 - x0; double m = dy / dx;

More information

Staircase Join: Teach a Relational DBMS to Watch its (Axis) Steps

Staircase Join: Teach a Relational DBMS to Watch its (Axis) Steps Stairase Join: Teah a Relational DBMS to Wath its (Axis) Steps Torsten Grust Maurie van Keulen Jens Teubner University of Konstanz Department of Computer and Information Siene P.O. Box D 88, 78457 Konstanz,

More information

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality

Multi-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of

More information

Space- and Time-Efficient BDD Construction via Working Set Control

Space- and Time-Efficient BDD Construction via Working Set Control Spae- and Time-Effiient BDD Constrution via Working Set Control Bwolen Yang Yirng-An Chen Randal E. Bryant David R. O Hallaron Computer Siene Department Carnegie Mellon University Pittsburgh, PA 15213.

More information

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Test I Solutions

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Test I Solutions Department of Eletrial Engineering and Computer iene MAACHUETT INTITUTE OF TECHNOLOGY 6.035 Fall 2016 Test I olutions 1 I Regular Expressions and Finite-tate Automata For Questions 1, 2, and 3, let the

More information

represent = as a finite deimal" either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to

represent = as a finite deimal either in base 0 or in base. We an imagine that the omputer first omputes the mathematial = then rounds the result to Sientifi Computing Chapter I Computer Arithmeti Jonathan Goodman Courant Institute of Mathemaial Sienes Last revised January, 00 Introdution One of the many soures of error in sientifi omputing is inexat

More information

The Happy Ending Problem

The Happy Ending Problem The Happy Ending Problem Neeldhara Misra STATUTORY WARNING This doument is a draft version 1 Introdution The Happy Ending problem first manifested itself on a typial wintery evening in 1933 These evenings

More information

Detecting Moving Targets in Clutter in Airborne SAR via Keystoning and Multiple Phase Center Interferometry

Detecting Moving Targets in Clutter in Airborne SAR via Keystoning and Multiple Phase Center Interferometry Deteting Moving Targets in Clutter in Airborne SAR via Keystoning and Multiple Phase Center Interferometry D. M. Zasada, P. K. Sanyal The MITRE Corp., 6 Eletroni Parkway, Rome, NY 134 (dmzasada, psanyal)@mitre.org

More information

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints

Smooth Trajectory Planning Along Bezier Curve for Mobile Robots with Velocity Constraints Smooth Trajetory Planning Along Bezier Curve for Mobile Robots with Veloity Constraints Gil Jin Yang and Byoung Wook Choi Department of Eletrial and Information Engineering Seoul National University of

More information

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Malaysian Journal of Computer Siene, Vol 10 No 1, June 1997, pp 36-41 A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Md Rafiqul Islam, Harihodin Selamat and Mohd Noor Md Sap Faulty of Computer Siene and

More information

High Quality Offset Printing An Evolutionary Approach

High Quality Offset Printing An Evolutionary Approach High Quality Offset Printing An Evolutionary Aroach Ralf Joost Institute of Alied Microelectronics and omuter Engineering University of Rostock Rostock, 18051, Germany +49 381 498 7272 ralf.joost@uni-rostock.de

More information

Video Data and Sonar Data: Real World Data Fusion Example

Video Data and Sonar Data: Real World Data Fusion Example 14th International Conferene on Information Fusion Chiago, Illinois, USA, July 5-8, 2011 Video Data and Sonar Data: Real World Data Fusion Example David W. Krout Applied Physis Lab dkrout@apl.washington.edu

More information

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections

SVC-DASH-M: Scalable Video Coding Dynamic Adaptive Streaming Over HTTP Using Multiple Connections SVC-DASH-M: Salable Video Coding Dynami Adaptive Streaming Over HTTP Using Multiple Connetions Samar Ibrahim, Ahmed H. Zahran and Mahmoud H. Ismail Department of Eletronis and Eletrial Communiations, Faulty

More information

Total 100

Total 100 CS331 SOLUTION Problem # Points 1 10 2 15 3 25 4 20 5 15 6 15 Total 100 1. ssume you are dealing with a ompiler for a Java-like language. For eah of the following errors, irle whih phase would normally

More information

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer

Cross-layer Resource Allocation on Broadband Power Line Based on Novel QoS-priority Scheduling Function in MAC Layer Communiations and Networ, 2013, 5, 69-73 http://dx.doi.org/10.4236/n.2013.53b2014 Published Online September 2013 (http://www.sirp.org/journal/n) Cross-layer Resoure Alloation on Broadband Power Line Based

More information

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering

Volume 3, Issue 9, September 2013 International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advaned Researh in Computer Siene and Software Engineering Researh Paper Available online at: www.ijarsse.om A New-Fangled Algorithm

More information

Projector Calibration for 3D Scanning Using Virtual Target Images

Projector Calibration for 3D Scanning Using Virtual Target Images INTERNATIONAL JOURNAL OF RECISION ENGINEERING AND MANUFACTURING Vol. 13, No. 1,. 125-131 JANUARY 2012 / 125 DOI: 10.1007/s12541-012-0017-3 rojetor Calibration for 3D Sanning Using Virtual Target Images

More information

Research Article Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms

Research Article Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms Hindawi Publishing Cororation Advanes in Fuzzy Systems Volume 2015, Artile ID 2827, 17 ages htt://dx.doi.org/10.1155/2015/2827 Researh Artile Intuitionisti Fuzzy Possibilisti C Means Clustering Algorithms

More information

High Quality Offset Printing An Evolutionary Approach

High Quality Offset Printing An Evolutionary Approach High Quality Offset Printing An Evolutionary Aroach Ralf Joost Institute of Alied Microelectronics and omuter Engineering University of Rostock Rostock, 18051, Germany +49 381 498 7272 ralf.joost@uni-rostock.de

More information