Balancing Register Allocation Across Threads for a Multithreaded Network Processor


Xiaotong Zhuang
Georgia Institute of Technology, College of Computing, Atlanta, GA, xt2000@cc.gatech.edu

Santosh Pande
Georgia Institute of Technology, College of Computing, Atlanta, GA, santosh@cc.gatech.edu

ABSTRACT

Modern network processors employ multi-threading to allow concurrency among multiple packet processing tasks. We studied the properties of applications running on network processors and observed that their imbalanced register requirements across different threads at different program points could lead to poor performance. Applications often need some threads to be more performance critical than others; by controlling the register allocation across threads, one can influence the performance of each thread and obtain the desired performance properties for concurrent threads. This prompts our work. Our register allocator aims to distribute the available registers to different threads according to their needs. The compiler analyzes the register needs of each thread both at the points of context switch and internally. The compiler then designates some registers as shared and some as private to each thread. Shared registers are allocated across all threads explicitly by the compiler. Values that are live across a context switch cannot be kept in shared registers for safety reasons; thus, only those live ranges that are internal to the context switch can be safely allocated to shared registers. Spills can cause context switches, and thus the problems of context switching and allocation are closely coupled; we propose a solution to this problem. The proposed interference graphs (GIG, BIG, IIG) distinguish variables that must use a thread's private registers from those that can use shared registers. We first estimate the register requirement bounds, then reduce from the upper bound gradually to achieve a good register balance among threads. To reduce the register needs, move instructions are inserted at program points that split the live ranges, or the nodes on the interference graphs. We show that the lower bound is reachable via live range splitting and is adequate for our benchmark programs for simultaneously assigning them on different threads.
As our objective, the number of move instructions is minimized. Empirical results show that the compiler is able to effectively control the register allocation across threads by maximizing the number of shared registers. Speed-ups for performance-critical threads range from 18 to 24%, whereas the degradation in performance of non-critical threads ranges only from 1 to 4%.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PLDI'04, June 9-11, 2004, Washington, DC, USA. Copyright 2004 ACM /04/ $5.00.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors - Optimization, Code generation, Run-time environments.
General Terms: Algorithms, Languages, Performance.
Keywords: Network Processor, Register Allocation, Multithreaded Processor.

1. INTRODUCTION

The dramatic growth in Internet traffic has motivated a specialized category of embedded processors called Network Processors (NPs), with fast processing speeds and specialized hardware support for network applications. Network processors are distinguished by their fast processing cores and are programmed in a dedicated manner to cater to the specific needs of the underlying applications. Compiler optimization for network processors is an emerging research topic [3][4][5][19]. In this paper, we address the register allocation problem for a multithreaded network processor, the IXP. The IXP network processor model can be applied to any network processor with a shared CPU and register file for multiple threads and with fast context switches to hide long latency operations such as memory accesses. Typically, network processor applications consist of multiple threads concurrently executing multiple tasks of a network processing application. The tasks range from ones as simple as packet routing to complex ones that process packet contents for viruses, malignant code, etc.
In contrast to general-purpose processors, the tasks that execute on different threads of a network processor are bound to them at compilation time; in other words, no run-time thread assignment takes place. Since low-level operations originally done by the OS or hardware, such as context switches, are exposed to the programmer, the compiler has knowledge of thread interactions, which are predictable. It is obvious that different tasks have different complexities and also different levels of desired performance. Some tasks may be more performance critical than others. Implementing (effecting) such performance needs across threads is currently impossible for any user. This is so since the compiler allocates a fixed number (32) of registers to each thread and does not undertake inter-thread analysis to balance their overall register needs. It may be noted that the performance of a thread is quite sensitive to register needs; even though the number of spills may be small for a larger number of registers, each spill is very expensive (latency of about 20 cycles). Our experience with the Intel IXP network processor family, which largely follows this model, tells us that: 1) we can achieve register balancing among different threads, 2) we can reduce spills through the safe use of shared registers which are not

live across context switch instructions for individual threads, and 3) through the use of register sharing, overall, we make more registers available to threads, boosting their performance. Thus, overall, by balancing register needs across threads we can meet their performance requirements. These optimizations are necessary due to the disparity of register pressure across threads and across different regions of code in each thread. We first discuss the network processor architecture to gain some understanding of the problem of balancing register requirements across threads.

1.1 Network processors

State-of-the-art network processors like the Intel IXP1200/2400/2800, the MMC np series, the IBM power NP, etc. [21] have programmable processing cores that can be coded for application needs. In contrast to traditional processors, network processors have their own special properties.

Speed vs. Flexibility

Network processors face the dilemma of offering both prompt processing of the network traffic and flexibility to the software programmer to meet the requirements of different applications. As network speeds continue to increase, the time to process each packet must be shortened to avoid packet loss. For example, processing at OC-192 allows only 52ns for each packet and OC-768 leaves only 13ns for processing. The higher speed requires both a shorter processing time for each packet and a shorter time that a packet can stay in the system (waiting time + processing time). To speed up the critical path for packet processing, normally a number of RISC processor cores are equipped to work in parallel. Although sometimes a co-processor (typically a general-purpose processor) is added to handle other slow tasks, the packet processing cores must be optimized for speed. Therefore, features such as explicit multithreading, explicit and fast context switching (only the pc is saved), and direct memory access (without the complication of caches) are commonly seen in network processor designs. As memory operations are extremely time-consuming, solutions should focus on hiding the latency with tolerable hardware and software complexity. With fast context switches, each processor core can hide latencies by context switching to other threads when accessing the peripherals.
Even if caches are enabled, context switching to other threads is generally a clever way to avoid the deviation in memory access times, as in the MMC np series. Network processors also aim to provide a plethora of solutions for network applications, which were originally implemented with dedicated hardware (not flexible) or general-purpose processors (too slow). Recent research [1][2][22] has attempted complicated tasks such as content inspection, software routers, intrusion detection, etc. As more code is added to network processors, writing in assembly becomes error-prone and time-consuming, which greatly hampers fast prototyping and increases time to market. To provide programmability, a High Level Language (HLL) compiler is sometimes provided, although typically with limited language feature support. There are several on-going research efforts to build proper optimizing compilers for network processors [3][4][5]. As mentioned earlier, none of the current compilers undertakes inter-thread analysis, forcing programmers to manage the register pressure across threads. Without any help from the compiler it is impossible for a user to hand-tune (multi-threaded) code. This mo

Intel IXP Network Processor

In this paper, we base our work on the Intel IXP network processor. Since its successful design has made it a very popular product in the network processor market, we generalize its features as a general model in section 2. Here, we present several prominent features of the IXP network processor, which prompt the thread register allocation problem (details in section 1.2). Figure 1 shows the block diagram of the IXP1200 network processor. The chip has 6 micro-engines (processing units or PUs), and 4 threads share the same PU. The chip has connections to off-chip SRAM, SDRAM, the PCI bus, etc. As shown in Figure 2.a, typically, each PU gets packets from its input queue, processes them and then writes to its output queue, or to the input queue of the next PU in the next pipeline stage. With pipelined processing, typically, some PUs are in charge of getting packets from the input ports, some handle packet processing and some are for the output ports. Our optimization focuses on the code on different threads of the same PU. Figure 2.b shows the major components inside each PU.
Some of the important features are as follows:

1. Shared register file but typically non-overlapped partitions. Figure 2.b shows that the general purpose register (GPR) file is shared by the 4 threads. Each thread has access to all registers; however, without optimization, each thread is normally allocated a non-overlapping part of the register file. The reason for the register file partition is the light-weight context switch, as discussed below.

2. Non-preemptable thread execution. There is no operating system and no control present over the threads sharing the CPU. A thread gives up the CPU only when it blocks on I/O or other long latency operations, or executes a context switch (ctx_switch) instruction voluntarily (footnote 1).

3. Light-weight context switch. Context switching is cheap (only the PC is saved); this is also the reason registers are normally allocated in a non-overlapped fashion from the register file. If a register is allocated to two threads, after a context switch, the content of that register may be modified by the other thread. Since registers are neither automatically saved nor restored during a context switch, such possibilities exist, and this is where managing registers becomes a compiler problem.

4. Cheap ALU, expensive memory access. No cache is available for memory accesses; at least 20 cycles are needed for each load/store instruction. Context switches typically follow to hide the long latency of memory accesses. In contrast, all ALU instructions can be completed in 1 cycle. The large memory latency makes overall performance sensitive to spills even though they may be few in number.

Figure 1. IXP1200 block diagram.

Footnote 1: ctx_switch instructions can be inserted by the programmer to achieve fair sharing of the CPU.

Figure 2. IXP1200 threads and register file on a PU.

The above features of the IXP network processor are driven by a design philosophy of simplifying hardware so as to increase the clock rate and execution speed. For instance, the context switch is kept very simple and fast (1 cycle latency). For this, only the program counter (pc) is saved and no registers are saved, because saving them would cause long delays in context switches, which may offset the benefits of CPU sharing. On the other hand, since all the hardware details are exposed, the compiler can make prudent decisions regarding register sharing, etc. Next, we present the multi-threaded register allocation problem.

1.2 The Register Allocation Problem

As mentioned above, although the register file can be accessed by all threads, it has to be partitioned without overlap across threads because no register is saved/restored during a context switch. Here, we argue that some registers can be safely shared by all threads through compiler analysis, since thread switching is predictable. The example in Figure 3 illustrates the problem and the possible ways to solve it. In Figure 3.a, the code for two threads is shown. Assume all variables are dead after their last use in the code. In thread 1, a code segment contains 12 instructions, including two context switch instructions: ctx_switch gives up the CPU voluntarily, and a load causes a context switch to wait for the I/O operation. Any pair of the 3 variables interfere with each other (they are co-live at some program point), so in Figure 3.b they are assigned 3 different physical registers. Notice that variable a is live across the ctx_switch instruction, so it must be allocated to a physical register that is not used by any other thread: when thread 1 is context switched at this point, other threads should not modify the physical register of variable a, which means only thread 1 should use the register. On the contrary, variables b and c are only used between two context switch instructions. In other words, when thread 1 is switched out of the CPU, both b and c must be dead. Therefore it is safe to reuse the physical registers allocated to b and c in other threads. Thread 2 has 4 instructions, with two context switch instructions.
d is only live between two context switch instructions, therefore d can share a physical register with other threads. Simply, r2 is shared: it is used for b in thread 1 and for d in thread 2, because the code guarantees that when context is switched to thread 2, r2 contains a dead value for thread 1. Similarly, when context is switched to thread 1, r2 contains a dead value (d) for thread 2. This example shows the benefit of sharing registers, lowering the total register requirement from four to three. We now show that through another technique (live range splitting) one can reduce the total register requirement further. Three registers seem necessary for thread 1; however, we notice that at any program point, only two variables are co-live. This prompts our technique of splitting one of the variables and inserting a move instruction at a certain point. This is demonstrated in Figure 3.c. In instruction 6, r3 is replaced by r1, while from instruction 8 to 9, r3 is replaced by r2. Instruction 10 copies r2 to r1, so in instruction 12 we have a consistent replacement (r3 -> r1). We have now managed to reduce the total register requirement down to two.

(a) Original code

    Thread 1                    Thread 2
     1.     a=                   1. ctx_switch
     2.     ctx_switch           2. d=
     3.     if( )br L1           3. =d+
     4.     b=                   4. store
     5.     =a+b
     6.     c=
     7.     br L2
     8. L1: c=
     9.     =a+c
    10.     b=
    11. L2: =b+c
    12.     load

(b) After register sharing

    Thread 1                    Thread 2
     1.     r1=                  1. ctx_switch
     2.     ctx_switch           2. r2=
     3.     if( )br L1           3. =r2+
     4.     r2=                  4. store
     5.     =r1+r2
     6.     r3=
     7.     br L2
     8. L1: r3=
     9.     =r1+r3
    10.     r2=
    11. L2: =r2+r3
    12.     load

(c) Thread 1 after move insertion (* marks the inserted move instruction)

     1.     r1=
     2.     ctx_switch
     3.     if( )br L1
     4.     r2=
     5.     =r1+r2
     6.     r1(r3)=
     7.     br L2
     8. L1: r2(r3)=
     9.     =r1+r2(r3)
    10.     r1(r3)=r2(r3) *
    11.     r2=
    12. L2: =r2+r1(r3)
    13.     load

Figure 3. Example of register sharing and move insertion.

The above example illustrates the potential benefits of register sharing across threads and of live range splitting. To further justify that multi-threaded register allocation is important and that a compiler solution is feasible, we list some properties of the programs that run on network processors to support this argument.

1. For the IXP1200, the hardware provides seemingly enough registers. 128 general purpose registers (GPRs) can be used for each PU. However, for each thread, only 32 GPRs are available if no GPR is shared across threads.
Register sharing in the IXP is a purely software solution, unlike in some SMT (Simultaneous Multi-threading) processors, where it is hardware managed. The compiler designates and allocates a register either as a shared or as a private one.

2. Since there is no operating system to manage threads, memory accesses, context switches, etc. are all explicit, and thus context switches are predictable at compile time.

3. As shown in our experiments, context switch instructions are typically less than 10% of the total instructions and many variables are not live across context switch instructions.

4. PUs are assigned different tasks. Packets are processed in a pipelined fashion (Figure 2.a). Currently, task assignment cannot be done automatically, although in most cases the same task is assigned to the threads on the same micro-engine. This actually leads to low utilization of the CPU, because it is hard to chop tasks properly so that they all take roughly ¼ of the computation power of the PU. Therefore, we should assume tasks might be different for threads on the same PU.

Item 1 indicates that the registers may not be sufficient on the network processor. Items 2 and 3 support the feasibility of a compiler solution to optimize the register allocation. Finally, item 4 prompts two kinds of problems, i.e. symmetric vs. asymmetric register allocation, which will be defined in the next section.

This paper is organized as follows. Section 2 describes the system model and problem formulation, section 3 talks about the construction of the interference graphs, section 4 is the overall framework, section 5 proposes the algorithms to estimate the bounds of

register numbers, sections 6 and 7 are for inter-thread and intra-thread register allocation, section 8 mentions the SRA problem briefly, section 9 shows performance evaluation results, section 10 talks about related work and section 11 is the conclusion.

2. PRELIMINARIES

System Model

In this paper, we study a multithreaded network processor that can run multiple threads on a single processing unit (PU, i.e. a micro-engine for the IXP). The threads on one PU share the computation power of the PU, the register file, etc. Formally, the model is as follows:

1. There are in total N_reg registers that can be used by the N_thd threads sharing a single PU.

2. Explicit context switch. A thread won't give up the CPU once it starts execution on it, until a context switch instruction is met. Context switches can happen due to explicit instructions or long latency instructions like a load or a store.

3. Context switching is very cheap (only the pc is saved) and it is intended to hide long latency operations.

4. Since network packets are mostly independent of each other, so are threads. The purpose of multithreading on the same PU is mainly latency hiding and concurrency. When one thread is stalled due to I/O or other long latency operations, other threads can take the CPU. Therefore, code on different threads is almost independent (Figure 2.a). Thread communication or synchronization rarely happens; however, our current solution still works under such circumstances. As future work, knowledge about thread communication or synchronization might be exploited to improve the register allocator.

5. All registers are accessible by all threads, but the registers used by one thread at the point of a context switch should not be used anywhere by other threads (later, we will define these registers as private registers), because this might cause unexpected modification of the registers and lead to unsafe code.

6. A move instruction is much cheaper than a spill.

7. Code on different threads of the same PU can be different.

Problem Classification

As mentioned in section 1.2, programs executing on different threads can be identical. We call the register allocation problem under such circumstances Symmetric Register Allocation (SRA).
On the contrary, Asymmetric Register Allocation (ARA) assumes different programs for different threads. Mixing threads with different computation requirements can achieve better CPU utilization. Since SRA is a sub-problem of ARA, in this paper we develop our approaches based on ARA. Notice that, although currently most real programs are for SRA, we are not intentionally complicating the problem, because our algorithms are equally necessary and important to SRA. As will be illustrated later, SRA only reduces the search space during inter-thread register allocation, while all techniques in this paper are applicable to both problems. Our goal is to develop general techniques that apply without undue restrictions.

Objectives

The total number of available registers is limited. Therefore, in a multithreaded network processor model, we aim (for ARA) to balance the register allocation among all threads, so that more registers are allocated to the threads with higher register pressure and the register allocation is catered to the requirements of the different threads in the system. Furthermore, designating a larger number of shared registers can help all threads to internally adjust their register pressure without causing spills. In case there are not enough registers available for all threads, we attempt to split the live ranges inside a thread by using move instructions. Also, our objective is to minimize the number of move instructions inserted. The results show that move insertion is cheap and effective.

Problem Formulation

To formalize the problem, we define several concepts.

DEFINITIONS:
PR_i: Number of private registers for thread i; these are physical registers used only (exclusively) by thread i.
SR_i: Number of shared registers needed by thread i; these are physical registers used by thread i, but other threads may use them as well.
R_i: Total number of physical registers needed by thread i; equals PR_i + SR_i.
SGR: Number of globally shared registers needed. Since shared registers can be used by all threads, this is the maximum of the shared register demands SR_i over all threads.
N_reg: Total number of physical registers available in a PU.
For a thread, PR is the number of physical registers that are exclusively allocated to it, or the number of physical registers that can be live across context switch instructions, while SR is the number of allocated physical registers that are dead during context switches, which means they can be shared across threads. For example, in Figure 3.b, for thread 1, PR_1 = 1 and SR_1 = 2; for thread 2, PR_2 = 0 and SR_2 = 1; therefore, SGR = 2. The relationships and restrictions among these variables are given by the following conditions:

\[ SGR = \max(SR_1, SR_2, \ldots, SR_{N_{thd}}) \]
\[ \sum_i PR_i + SGR \le N_{reg} \]
\[ PR_i + SR_i = R_i \]

For SRA, all PR_i and SR_i are equal. Given these restrictions, we need to assign registers in a way that the overall register need is satisfied and spills are minimized.

3. CONSTRUCTION OF INTERFERENCE GRAPHS

3.1 Non-Switch Region

DEFINITIONS:
Non-Switch Region (NSR): A non-switch region is a maximal connected sub-graph of the CFG without any internal context switch instruction. It contains connected parts from several basic blocks. The boundaries of the NSR are either context switch instructions or program entry/exit points.
Context Switch Boundary (CSB): The program point of a context switch instruction. A CSB separates the basic block in which it resides, and thus becomes the boundary of NSR(s).

An NSR can be constructed by starting from an individual instruction and growing it until all nearby instructions are context switch instructions or program entry/exit points.
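The region-growing construction of NSRs can be sketched as a flood fill over an instruction-level CFG. This is a minimal illustration, not the authors' implementation: the function name `build_nsrs`, the dictionary encoding of the CFG and the choice to treat context switch instructions purely as boundaries (rather than as members of the regions they delimit) are all assumptions made for this example.

```python
from collections import deque


def build_nsrs(successors, switch_instrs):
    """Group non-switch instructions into non-switch regions (NSRs).

    successors: dict mapping an instruction id to the ids it can branch to.
    switch_instrs: set of ids of context switch instructions (the CSBs).
    Both encodings are hypothetical, chosen only for this sketch.
    """
    # Treat the CFG as undirected: a region grows in both directions.
    neighbors = {}
    for src, dsts in successors.items():
        neighbors.setdefault(src, set())
        for dst in dsts:
            neighbors[src].add(dst)
            neighbors.setdefault(dst, set()).add(src)

    seen = set()
    regions = []
    for start in sorted(neighbors):
        if start in switch_instrs or start in seen:
            continue
        region = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            cur = queue.popleft()
            region.add(cur)
            for nxt in neighbors[cur]:
                # Growth stops at context switch instructions.
                if nxt not in switch_instrs and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        regions.append(region)
    return regions


# Thread 1 of Figure 3.a: instruction 2 (ctx_switch) and 12 (load) switch context.
thread1 = {1: [2], 2: [3], 3: [4, 8], 4: [5], 5: [6], 6: [7], 7: [11],
           8: [9], 9: [10], 10: [11], 11: [12], 12: []}
regions = build_nsrs(thread1, {2, 12})
```

For thread 1 this yields two regions, one before the ctx_switch and one between the ctx_switch and the load, which matches the two NSRs the text attributes to that thread once the boundary instructions are attached to their neighbouring regions.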

To illustrate, Figure 4.a shows the CFG and NSRs for a code segment from the benchmark frag in the Commbench suite [15]. This code segment is from one of the functions that calculate the IP checksum. The CFG consists of 10 basic blocks. Notably, there are four context switch instructions, i.e. the read instructions in BB3 and BB7 and the explicit ctx_switch instructions in BB5 and BB6. The ctx_switch instructions are inserted by the programmer to avoid monopoly of the CPU. Figure 4.b shows the NSRs. After terminating the CFG at the points of the context switch instructions (boundaries), we get 3 NSRs. The NSRs are bounded by either program entry/exit points or context switch instructions (CSBs). We can assume all terminations are inside basic blocks, therefore some basic blocks are split; for example, BB5 is split into BB5.a in NSR2 and BB5.b in NSR1. Sometimes, the two parts of a separated basic block still belong to the same NSR, like BB7 in Figure 4. For the example in Figure 3, thread 1 has two NSRs: instructions 1 and 2 are in NSR1 and instructions 2 to 12 constitute NSR2. For thread 2, all instructions form one NSR.

(a) CFG of the code segment

    BB1:  sum=0
    BB2:  if (len<2) br BB6
    BB3:  read tmp1 [buf],1
          sum+=tmp1&0xffff
          buf=buf+2
          if !(sum&0x ) br BB5
    BB4:  sum=(sum&0xFFFF)+(sum>>16)
    BB5:  len-=2
          ctx_switch
          goto BB2
    BB6:  ctx_switch
          if !(len) goto BB8
    BB7:  read tmp2 [buf],1
          sum+=tmp2&0xFFFF
    BB8:  if !(sum>>16) br BB10
    BB9:  sum=(sum&0xFFFF)+(sum>>16)
          goto BB8
    BB10: return ~sum

(b) The three NSRs (NSR1, NSR2, NSR3) obtained by splitting BB3, BB5, BB6 and BB7 at their context switch instructions.

Figure 4. Program CFG and the constructed NSRs.

3.2 Interference Graph

After building the NSRs, we build the interference graphs, which will guide the register requirement estimation and register allocation. We need to distinguish two kinds of interference and introduce some other definitions for the interference graphs.

DEFINITIONS:
Node: Live range of a virtual register or variable (footnote 2).
Boundary Node: A node that is live across a CSB, which may interfere with other boundary nodes.
Internal Node: A node that is not live across any CSB.
Boundary Interference: If two boundary nodes are co-live across the same CSB, they are said to be boundary interfering with each other.
Internal Interference: Two nodes (internal or boundary nodes) are said to be internally interfering if they interfere (are co-live at a program point) within an NSR.
Boundary Interference Graph (BIG): A graph consisting of all boundary nodes, with edges representing only boundary interference.
Internal Interference Graph (IIG): For each NSR, we have an IIG, which only includes the internal nodes live within this NSR and their interference edges.
Global Interference Graph (GIG): The global interference graph includes both boundary nodes and internal nodes. An edge is added if any two nodes (internal or boundary) interfere with each other.

The GIG of the code for the example in Figure 4 is drawn in Figure 5. We assume both len and buf are live at the entry point, as the length and the buffer pointer of the packet to be calculated. Also, we assume all variables are dead after their last use in the code. From Figure 4.b, we can see that both variables tmp1 and tmp2 are only live within an NSR, so they are internal nodes. The other variables are live across CSB boundaries; they are boundary nodes. For memory reads, since all data is first loaded into transfer registers (footnote 3), the destination register is not assumed to be live across the memory read, i.e. the CSB. At BB1, sum, buf and len interfere with each other internally (they also interfere at CSBs), thus the 3 nodes form a clique on the GIG. tmp1 interferes with sum, buf and len in BB3.b, but at the live point of tmp2 in BB7.b, both buf and len are dead. Thus, sum, buf and len form the BIG; IIG_1 for NSR1 is empty; IIG_2 for NSR2 includes only tmp1; IIG_3 for NSR3 includes only tmp2. Obviously, we have the following claims for each thread.

Claim 1: To avoid spills, the GIG should be colored with R colors and the BIG should be colored with PR colors. Each IIG, as a part of the GIG, should be colored with no more than R colors.
Claim 2: Internal nodes on different IIGs are not connected, i.e. they do not interfere with each other.

Figure 5.
Global interference graph for the example.

Notice that NSRs and interference graphs can be constructed inter-procedurally. The CFGs and NSRs of different functions are connected with edges linking function calls and return points.

Footnote 2: Here, we assume each live range represents one variable.
Footnote 3: Transfer registers are special registers on the IXP used to store data from/to the memory; generally we can assume they are temporary registers dedicated to memory accesses but unavailable as GPRs.
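The distinction between boundary and internal nodes, and between BIG and GIG edges, can be made concrete with a small sketch. It assumes liveness has already been computed and is given as plain sets; the function names and the hand-written live sets (chosen to mirror the description of Figure 5) are assumptions for illustration, not data taken from the benchmark.

```python
from itertools import combinations


def co_live_pairs(live_sets):
    """Return the set of unordered pairs that are co-live in some set."""
    edges = set()
    for live in live_sets:
        for u, v in combinations(sorted(live), 2):
            edges.add((u, v))
    return edges


def build_graphs(live_across_csb, live_at_points):
    """Classify nodes and derive BIG and GIG edges.

    live_across_csb: list of sets, the variables live across each CSB.
    live_at_points: list of sets, the variables co-live at program
        points inside the NSRs.
    """
    boundary = set().union(*live_across_csb)
    all_nodes = boundary.union(*live_at_points)
    internal = all_nodes - boundary
    big_edges = co_live_pairs(live_across_csb)          # boundary interference
    gig_edges = big_edges | co_live_pairs(live_at_points)  # any interference
    return boundary, internal, big_edges, gig_edges


# Hypothetical live sets mirroring the checksum example of Figures 4 and 5.
csb_live = [{"sum", "buf", "len"}]
point_live = [{"sum", "buf", "len"},
              {"sum", "buf", "len", "tmp1"},
              {"sum", "tmp2"}]
boundary, internal, big_edges, gig_edges = build_graphs(csb_live, point_live)
```

With these inputs the boundary nodes are sum, buf and len, the internal nodes are tmp1 and tmp2, the BIG is a 3-node clique and the GIG contains a 4-node clique (sum, buf, len, tmp1), in line with the description of Figure 5.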

4. OVERALL FRAMEWORK

Figure 6. Overall framework (stages: build NSRs and interference graphs; estimate lower/upper bounds; inter-thread register allocation; intra-thread register allocation).

Figure 6 shows our framework for performing register allocation. Our first step is to build the NSRs and interference graphs; we then estimate the lower and upper bounds of PR and R for each thread. Starting from the upper bounds, the inter-thread register allocator reduces the overall register requirement gradually until it is within N_reg. During this process, when the inter-thread register allocator intends to reduce PR or SR, it calls the intra-thread allocator for all threads. The inter-thread allocator moves in the direction of the smallest cost increase. The framework allows the intra-thread register allocator to be built separately from the inter-thread register allocator.

5. REGISTER NUMBER ESTIMATION

As the first step toward assigning registers to multiple threads, we need to estimate the number of registers each thread needs based on the interference graphs. The estimation helps to guide the distribution of registers to threads at the beginning. Here, we are concerned with finding the bounds for R and PR as defined below. We do not estimate bounds for SR, since SR is always equal to R - PR.

DEFINITIONS:
MinPR, MaxPR: Minimal and maximal number of PR.
MinR, MaxR: Minimal and maximal number of R.

Lower Bound Estimation

The lower bound is the minimum number of registers a thread needs. First, we can get an estimate of the minimum number of private registers (MinPR) one thread needs. A rough estimate is

\[ MinPR \ge RegPCSB_{max} = \max(\text{number of co-live registers at a CSB}) \]

It is obvious that if, at a CSB point, there are RegPCSB_max nodes (variables) co-live, we need at least this number of private registers, since they cannot be shared during a context switch. In other words, the minimal number of private registers needed is at least equal to the maximal number of nodes co-live at the CSB boundaries. The following lemma says this bound can be reached if enough move instructions are inserted. We will explain more about move instruction insertion in Section 7.
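The lower-bound estimate amounts to counting co-live variables. The sketch below assumes liveness is available as sets, one per CSB and one per program point; the function name `lower_bounds` and the example live sets (read off thread 1 of Figure 3.a under the stated assumption that variables die after their last use) are illustrative assumptions. The same counting over all program points gives the corresponding bound for MinR.

```python
def lower_bounds(live_across_csb, live_at_points):
    """Estimate (MinPR, MinR) for one thread from liveness sets.

    live_across_csb: list of sets of variables live across each CSB.
    live_at_points: list of sets of variables co-live at each program point.
    """
    # MinPR: the most variables that are simultaneously live across a CSB.
    min_pr = max((len(live) for live in live_across_csb), default=0)
    # MinR: the most variables co-live anywhere; it can never be below MinPR.
    min_r = max((len(live) for live in live_at_points), default=0)
    return min_pr, max(min_pr, min_r)


# Thread 1 of Figure 3.a: only `a` is live across the ctx_switch, nothing is
# live across the final load, and at most two variables are co-live anywhere.
thread1_csb = [{"a"}, set()]
thread1_points = [{"a"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
min_pr, min_r = lower_bounds(thread1_csb, thread1_points)
```

For this thread the estimate is one private register and two registers overall, which agrees with the two-register allocation reached in Figure 3.c after move insertion.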
Lemma 1: Regardless of shared registers, MinPR can be made equal to RegPCSB_max by inserting move instructions.

Proof: Suppose we are given private registers PR_1, PR_2, ..., PR_{RegPCSB_max}, and at a certain CSB there are n variables V_1, V_2, ..., V_n live across it, with n <= RegPCSB_max. Simply inserting n move instructions PR_1 = V_1, PR_2 = V_2, ..., PR_n = V_n before the CSB and n move instructions V_1 = PR_1, V_2 = PR_2, ..., V_n = PR_n after the CSB makes the code equivalent to the original, and the number of private registers needed is no more than RegPCSB_max.

However, in reality, a move instruction still costs 1 cycle in our model; although this is much cheaper than a spill, we still need to keep the number of inserted move instructions small. Similarly, we can estimate the MinR needed.

\[ MinR \ge RegP_{max} = \max(\text{number of co-live registers at a program point}) \]

This lower bound is also achievable given enough move instructions. The proof is similar to the one above.

Upper Bound Estimation

The upper bound gives the maximal number of registers required without any extra move instructions inserted. According to claim 1 in section 3.2, the best estimates for MaxPR and MaxR are the minimal numbers of colors required to color the BIG and the GIG. However, for the GIG the coloring problem is slightly different from traditional graph coloring. The problem is to find a coloring scheme for a thread which satisfies:

1. All boundary nodes are colored with at most MaxPR colors.
2. All nodes are MaxR colorable.
3. Any two interfering nodes are colored differently.

For the GIG in Figure 5, all boundary nodes can be minimally colored with 3 colors; thus MaxPR = 3. All nodes can be minimally colored with 4 colors (there is one 4-node clique), so MaxR = 4 and SR = 1. Actually, there is a tradeoff between the MaxPR and MaxR estimates. Reducing MaxPR may induce a larger MaxR. To minimize MaxPR, we can first remove all internal nodes and color the BIG minimally, then insert back the internal nodes and color the graph assuming all boundary nodes have fixed colors. To find the tightest (minimal) value of MaxR sufficient to color, we should ignore condition 1 above, i.e.
we could assume that all nodes are indistinguishable and simply color the GIG as usual using any coloring allocator. Such a coloring would then minimize MaxR but may give a higher MaxPR.

Figure 7. Estimate the maximal register requirement. (a) Flowchart: set PR = mini_color(BIG) and R = Max(mini_color(IIG1), mini_color(IIG2), ...); while there is a conflict edge between the BIG and an IIG, try to change the color of one end node of the edge (possibly adjusting its neighbors' colors); if that does not succeed, increment R and color one end node of the edge with the new color; when no conflict edge remains, the procedure is done. Here mini_color(g) is a graph coloring algorithm and a conflict edge is an edge whose two end nodes have the same color. (b) The BIG is colored with PR colors and each of IIG1, ..., IIGk is colored with up to R colors.

We take an approach slightly different from the first one, i.e. we minimize MaxPR first. This approach is motivated by the fact that an increase in PR causes a direct increase in the total number of registers, while an increase in SR only affects the total number of registers when this SR is the maximum among all threads (refer to the formulas at the end of section 2). Based on claim 2 mentioned in section 3.2 (i.e. IIGs are not connected with each other), we can color the IIGs and the BIG separately and then merge them together to keep a tight control on colorability. After merging, the edges added between the BIG and the IIGs may cause conflicts. For example, in Figure

5, when the IIGs and the BIG are colored separately, variable sum may get the same color as tmp1, leading to a color conflict when the edge between them is added during the merge. A general algorithm that colors the whole graph altogether may take much more time, since the graph can be big (it includes all live ranges in the program; some codes in our experiments contain hundreds of nodes). Our approach is similar to fusion-based or region-based register allocation [23], except that our regions are chosen as the IIGs and the BIG. The algorithm (Figure 7.a) first builds the BIG and the IIGs from the GIG and colors each of them independently. In other words, the BIG is colored with color numbers from 1 to PR, while each IIG is colored with color numbers from 1 up to R. Some IIGs may be colored with fewer than R colors, but an IIG can be colored with at most R colors. The next step tries to merge each IIG with the BIG. The edges between an IIG and the BIG can cause problems if the two end nodes of an edge have the same color. Such edges are called Conflict Edges. The loop in Figure 7.a shows how to resolve all the conflict edges. We illustrate the procedure in Figure 7.b. Suppose a boundary node and an internal node joined by an edge are colored with the same color. If the boundary node's color can be changed to another color within color numbers 1 to PR, or the internal node's color can be changed to another color within color numbers 1 to R, then one of them is changed to another color to remove this conflict edge. If that fails, we heuristically try to change their neighbors' colors to see whether the two nodes can be recolored after that. If all these attempts fail, we have to increase R, and one end node of the edge is re-colored with the new color. The algorithm finally gives MaxPR and MaxR. The complexity of the algorithm is

\[ \sum_i O(mini\_color(IIG_i)) + O(mini\_color(BIG)) + O(\#\text{edges between BIG and IIGs}) \]

In contrast, the complexity of coloring the whole graph is O(mini_color(GIG)). This means the algorithm is also quite fast for trying out a given coloring for a thread.

6. INTERTHREAD REGISTER ALLOCATION

6.1 Our approach

One of the difficulties in register allocation for multiple threads is that we do not know exactly how many registers each thread needs.
Trying all combinations to find the best register allocation would take a tremendous amount of compilation time and would be infeasible to build into any practical system. Our approach is to first get an estimate (a range) of how many registers are needed by each individual thread via the algorithm proposed in the previous section. From this starting point, we use a greedy heuristic algorithm to approach a sub-optimal solution by gradually reducing the total number of required physical registers. The algorithm also encapsulates the intra-thread register allocator, so that it can be developed independently.

6.2 The Register Allocation Algorithm

After getting the estimated upper bounds $MaxPR_i$ and $MaxR_i$ for each thread, let $SR_i = MaxR_i - MaxPR_i$ and $PR_i = MaxPR_i$. We can then check the following condition:

$$\sum_i PR_i + \max(SR_1, SR_2, \ldots, SR_{N_{thd}}) \le N_{reg} \qquad (**)$$

If this holds, we can assign $SGR = \max(SR_1, SR_2, \ldots, SR_{N_{thd}})$ as the number of globally shared registers and $MaxPR_i$ as the number of private registers for each thread to satisfy all register requirements. If condition (**) does not hold, the register requirement is too high, and we must reduce either the PR(s) or the SR(s) to satisfy (**). From (**) we can see that there are two ways to reduce the left-side value. One is to reduce one of the $PR_i$, which directly reduces the left-side value. The other is to reduce $SR_i$, in which case we should reduce the one(s) with the maximal value. If multiple $SR_i$ have the same maximal value, we should consider reducing one of the $PR_i$ instead if that costs less.

The inter-thread register allocation algorithm is shown in Figure 8. The algorithm first builds the GIG and gets the estimate for each thread. If the available registers are enough (the requirement is no more than $N_{reg}$), the program simply allocates registers and returns. Otherwise, it enters a loop that gradually reduces the overall register requirement through a greedy algorithm, i.e. every time we choose the direction that achieves the minimal cost. To reduce the register requirement (i.e. the left side of (**)) by 1, we have several choices: either reduce one of the PRs by 1, or reduce all the maximal SR(s) by 1 to cut down $\max(SR_1, SR_2, \ldots, SR_{N_{thd}})$.
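As a rough illustration of condition (**) and the greedy loop just described, the following Python sketch may help. It is written under assumptions: `cost_fn` is a hypothetical stand-in for the intra-thread allocator's move cost, the function names are invented for this example, and the lower-bound checks follow the PR > MinPR and PR + SR > MinR tests of Figure 8.

```python
def total_registers(pr, sr):
    """Left-hand side of condition (**): sum of PR_i plus the largest SR_i."""
    return sum(pr) + max(sr)


def inter_thread_allocate(max_pr, max_r, min_pr, min_r, n_reg, cost_fn):
    """Greedy reduction of PR/SR until (**) holds.

    cost_fn(pr, sr) is a hypothetical callback returning the allocation
    cost (e.g. number of inserted moves) for a candidate assignment.
    Returns (PR list, SR list, SGR), or None if the bounds block progress.
    """
    pr = list(max_pr)
    sr = [r - p for r, p in zip(max_r, max_pr)]
    while total_registers(pr, sr) > n_reg:
        candidates = []
        # Option 1: reduce a single PR_i by one, respecting the lower bounds.
        for i in range(len(pr)):
            if pr[i] > min_pr[i] and pr[i] + sr[i] > min_r[i]:
                trial = pr.copy()
                trial[i] -= 1
                candidates.append((cost_fn(trial, sr), trial, sr))
        # Option 2: reduce every SR_i that equals the current maximum.
        top = max(sr)
        idx = [i for i, s in enumerate(sr) if s == top]
        if top > 0 and all(pr[i] + sr[i] > min_r[i] for i in idx):
            trial = sr.copy()
            for i in idx:
                trial[i] -= 1
            candidates.append((cost_fn(pr, trial), pr, trial))
        if not candidates:
            return None  # requirement cannot be lowered any further
        _, pr, sr = min(candidates, key=lambda c: c[0])
    return pr, sr, max(sr)
```

Each iteration lowers the left side of (**) by exactly one, so the loop runs at most as many times as the initial excess over N_reg.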
Every time we reduce the PR of one thread, we check that it is still larger than its lower bound. Also, the lower bound on $R_i$, i.e. $R_i = PR_i + SR_i \ge MinR_i$, is verified whenever either $PR_i$ or $SR_i$ is reduced.

    INPUT: N_thd, N_reg, CFGs of all threads
    OUTPUT: all PR_i and SR_i, SGR, CFGs after register allocation

    /* Intra-thread register allocator, returns move cost */
    Intra_thd_allocator(CFG, GIG, PR, SR);

    ALGORITHM: Inter_thd_reg_allocation
        Build_GIG()
        Estimate_reg_requirement()
        While (sum(PR_i) + max(SR_1, SR_2, ..., SR_Nthd) > N_reg)
            Foreach PR_i > MinPR_i and PR_i + SR_i > MinR_i do
                cost_PR_i = register allocation cost after reducing PR_i by 1
            od
            max_SR = max(SR_1, SR_2, ..., SR_Nthd)
            cost_SR = register allocation cost after reducing all SRs
                      that equal max_SR by 1, if all such SRs can
                      be reduced (by checking PR + SR > MinR)
            Find the minimum among cost_SR, cost_PR_1, cost_PR_2, ...
            Choose the one with minimal cost, modify PR and SR.
        Endw
        Actually modify the CFG based on the new PR and SR
        SGR = Max(SR_i)
        Return all PR_i and SR_i, SGR, all CFGs

Figure 8. Algorithm for inter-thread register allocation.

The function Intra_thd_allocator is an intra-thread register allocator. It accepts PR and SR, then tries to return an allocation using PR private and SR shared registers. This function is called when we calculate the register allocation cost for each thread and when we finally modify the CFG. It returns the allocation cost. The interference graph and coloring scheme produced by Estimate_reg_requirement can be passed to the intra-thread register allocator as a starting point. However, to provide more flexibility, we leave this to the implementation of Intra_thd_allocator.

The complexity of our heuristic algorithm is O(N_reg * N_thd) * O(Intra_thd_allocator), which largely depends on the complexity of the intra-thread register allocator. Our register allocation algorithm generates satisfactory solutions for all benchmark programs within almost negligible compilation time.

7. INTRATHREAD REGISTER ALLOCATION

The intra-thread register allocator attempts to allocate up to PR physical registers to boundary nodes and up to R = PR + SR physical registers to all nodes.

7.1 Move Insertion and Live Range Splitting

Our intra-thread register allocation is based on live range splitting and move instruction insertion. Live range splitting has been used in register allocation [11] to spill part of a live range to memory. In this paper, we instead split live ranges by inserting move instructions in order to reduce the chromatic number. Lemma 1 has shown that MinPR can be reached through live range splitting. Figure 9 gives another example. In Figure 9.a, live ranges A, B and C interfere with each other at three different CSB points. The lower bound lemma in Section 3 gives MinPR = 2, but the interference graph must be colored with 3 colors, because A, B and C form a clique. In Figure 9.b we split the live range of variable A into A1 and A2 by inserting a move instruction at the split point. The resulting interference graph can be colored with 2 colors, which is equal to MinPR. Notice that this is also the way we reduced the number of registers required in the first example (Figure 3.c).

In our intra-thread allocation algorithm, we focus on live range splitting through move insertion because spills are too expensive on network processors and our experiments show that MinPR (MinR) is much smaller than MaxPR (MaxR). This gives us room to reduce the chromatic number toward the lower bound by inserting move instructions.

[Figure 9. Live range splitting via move insertion. (a) Before splitting: A = 1, B = 2, C = 3. (b) After splitting A into A1 and A2 with a move: A1 = 1, A2 = 2, B = 2, C = 1.]

7.2 Intra-thread Register Allocation Algorithm

Our register allocator works incrementally, i.e. it records the contexts (the interference graph with split nodes and the positions of move instructions) of the last 2 invocations and modifies a context to satisfy the new PR and SR values. Notice that the inter-thread allocation algorithm in Figure 8 calls Intra_thd_allocator multiple times. In each step, it either accepts the previous context and reduces PR or SR by 1, or it rejects the previous modification, starts from the context before the previous one, and reduces PR or SR by 1. Incremental modification saves time that would otherwise be spent on repetitive work.
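Returning to the splitting example of Figure 9 in Section 7.1, the effect on the chromatic number can be checked mechanically. The brute-force Python sketch below is only for illustration; the assignment of edges to A1 and A2 after the split (A1 interferes with B, A2 with C) is an assumption chosen to be consistent with the coloring shown in the figure, and `chromatic_number` is a hypothetical helper suitable only for tiny graphs.

```python
from itertools import product


def chromatic_number(nodes, edges):
    """Smallest k such that the graph has a proper k-coloring (brute force)."""
    for k in range(1, len(nodes) + 1):
        for assignment in product(range(k), repeat=len(nodes)):
            color = dict(zip(nodes, assignment))
            if all(color[u] != color[v] for u, v in edges):
                return k
    return 0  # empty graph


# Figure 9.a: A, B and C pairwise interfere, forming a clique.
before = chromatic_number(["A", "B", "C"],
                          [("A", "B"), ("B", "C"), ("A", "C")])

# Figure 9.b: A is split into A1 and A2 by a move; A1 and A2 do not interfere.
after = chromatic_number(["A1", "A2", "B", "C"],
                         [("A1", "B"), ("B", "C"), ("A2", "C")])

print(before, after)  # -> 3 2
```

The split turns the triangle into a path, which is why two colors become sufficient at the price of one move instruction.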
Further, based on the record of the two contexts, we can assume that each time the allocator is invoked, it attempts to reduce either PR or SR by 1 from one of the recorded contexts. We name these two kinds of invocations the Reduce-PR invocation and the Reduce-SR invocation.

Reduce-PR Invocation

In this type of invocation the allocator wants to reduce PR by one from its last invocation. In other words, the last accepted context can color all boundary nodes with PR colors, and this invocation wants to color them with PR-1 colors. In this stage, we assume all move instructions are inserted near the CSB. With this assumption, we do not need to alter the colors of internal nodes. Normally, changing the colors of both internal and boundary nodes might induce more move instructions (in that case we must split a live range to recolor an internal node) and increase the cost accordingly. Later, we will show that some of the move instructions at the CSB can be eliminated by merging them with move instructions inside the NSR. This actually relocates the move instructions away from the CSB boundary.

Before discussing our algorithm, we first define the Neighbor Color Number (NCN).

Definition: Neighbor Color Number (NCN): the number of colors used by the neighbors of a given node in a colored graph.

    INPUT: PR, SR
    OUTPUT: cost (number of inserted move instructions)
    Static context_pre, context_pre_pre

    FUNCTION Reduce_PR(context): cost
    Begin
        Foreach color c in PR do
            Cost = 0
            Foreach node in Set_color_node(c, BIG) do
                If NCN(node, BIG) < PR-1 then
                    Change the node to another color c' in PR other than c.
                    Cost += min(Cut_if_conflict(node, c, c')) for all possible c'
                Else
                    Cost += min(NSR_exclusion_cost(node, c, c')) for each
                            color c' in PR other than c
                    Add the newly split node with color c to Set_color_node(c, BIG)
                            if it is a boundary node
                Endif
            od
            Eliminate_unnecessary_move()
            Record to min_cost if this cost is smaller and record the context.
        od
        Keep the minimal cost context and return min_cost
    End

    FUNCTION Reduce_SR(context): cost
    Begin
        Foreach color c in SR
            Cost = 0
            Foreach NSR_i in which color c is used do
                Foreach internal node in Set_color_node(c, IIG_i) do
                    If NCN(node, GIG) < R-1 then
                        Color the node with a color other than c.
                    Else
                        Cost += min(live_range_exclusion_cost(node, c, c'))
                                for each color c' in R other than c
                        Add the newly split node with color c to Set_color_node(c, IIG_i)
                    Endif
                od
            od
            Eliminate_unnecessary_move()
            Record to min_cost if this cost is smaller and record the context.
        od
        Keep the minimal cost context and return min_cost
    End

    FUNCTION Intra_thd_allocator(PR, SR): cost
    Begin
        According to the accepted context, pick either the stored context_pre
            or context_pre_pre => context.
        If (PR is reduced) return Reduce_PR(context)
        Else if (SR is reduced) return Reduce_SR(context)
        Else return cost for the context // no change
    End

Figure 10. Algorithm for intra-thread register allocation.

The algorithm in Figure 10 uses the function NCN(node, BIG) to get the neighbor color number of a node on the BIG. The algorithm also works in a greedy manner. It tries each color c among the PR colors and checks the cost of eliminating that color. Then the color with the least elimination cost is selected to be eliminated, and all needed move instructions are inserted. The function Set_color_node(c, BIG) returns the set of nodes on the BIG with color c. We need to change every node in this set to a different color in PR.
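The NCN test used in Figure 10 is easy to state in code. The following Python sketch, with hypothetical helper names and a dictionary-based graph representation chosen for this example, computes the NCN of a node and lists the colors it could move to without any neighbor conflict.

```python
def ncn(node, graph, coloring):
    """Neighbor Color Number: how many distinct colors the neighbours use."""
    return len({coloring[n] for n in graph[node]})


def free_colors(node, graph, coloring, num_colors):
    """Colors in 1..num_colors, other than the node's own, unused by neighbours."""
    used = {coloring[n] for n in graph[node]}
    return [c for c in range(1, num_colors + 1)
            if c not in used and c != coloring[node]]


# Hypothetical BIG fragment with PR = 3 colors.
graph = {"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}
coloring = {"x": 1, "y": 2, "z": 2}
print(ncn("x", graph, coloring), free_colors("x", graph, coloring, 3))  # -> 1 [3]
```

In a proper coloring no neighbor shares the node's own color, so `ncn(...) < PR - 1` holds exactly when `free_colors` is non-empty; when NCN equals PR-1 the list is empty and the exclusion branch of Figure 10 applies.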

First, we check the NCN of each boundary node that has color c on the BIG. If this number is less than PR-1 (which means there is at least one color available in PR that is not used by its neighbors), we can change the node to another color c'. Since we have changed the node's color on the BIG, and the node may also interfere, inside an NSR, with internal nodes or with other boundary nodes (two boundary nodes can interfere inside an NSR without interfering at a CSB), we need to check whether there is a color conflict. The function Cut_if_conflict(node, c, c') attempts to insert move instructions to disconnect such edges. Figure 11 shows how the disconnection is done and the corresponding change on the GIG. In Figure 11.a, an internal node is originally colored with color c'; after the boundary node is changed from color c to color c', it conflicts with that internal node. We insert a move at the CSB, so the live range of the boundary node is split. The part of the live range in NSR2 becomes a new node, and this part can keep color c, so it does not conflict with the internal node, while on the BIG the boundary node is changed to color c'. Figure 11.b shows the change on the GIG. The edge between the boundary node and the internal node is eliminated after the new node is split off. The new node keeps the original color c, so in the IIG it is compatible with the internal node, while on the BIG the color of the boundary node is changed. In the algorithm, we try every candidate color for the boundary node and pick the one with minimal cost.

[Figure 11. Node splitting to change the color of a boundary node. (a) Live ranges: a move is inserted at the CSB between NSR1 and NSR2. (b) The corresponding change on the GIG.]

If this step fails, i.e. NCN(node, BIG) = PR-1, the algorithm calls the function NSR_exclusion_cost(node, c, c') to get the cost of changing the node to another color c' while excluding all the NSRs that contain conflicting nodes. NSR_exclusion_cost looks at each NSR where the node is live to see whether any node with color c' is in it. If so, the NSR is excluded by splitting the live range of the node in that NSR and inserting move instructions. In our approach, NSRs are split as a whole, i.e. either the live range in that NSR is kept with color c' (if there is no conflict) or the live range is split (after splitting, the part in that NSR keeps color c).

[Figure 12. NSR exclusion to reduce PR. (a) Live ranges across NSR1 to NSR4 with moves inserted at the CSBs. (b) The corresponding change on the GIG.]

Figure 12 shows how NSR exclusion is done.
In Figure 12, the boundary node being recolored cannot change to color c' because boundary node r and an internal node are using color c'. The conflicting NSRs are NSR2 and NSR3, where that internal node and r are live. So these two NSRs are excluded from the live range of the original boundary node. On the GIG, we see that a new node is split from the original node, and the original node can now be colored with c'. The new node keeps color c and is still compatible with the internal node and r. Notice that, after splitting, the edges originally connected to the original node from r and the internal node are now connected to the new node. Therefore, the NCN of the original node is reduced and it can be recolored with c'. The algorithm tries each color other than c for the recoloring and picks the one with minimal cost. Also notice that, after this step, the new node is colored with c and, if it is a boundary node, we should add it to Set_color_node(c, BIG) so that it is given some other color during a later iteration. Set_color_node(c, BIG) will not grow indefinitely, since further splitting will eventually generate internal nodes.

Reduce-SR Invocation

To reduce SR, we check each color c in SR to see which one can be eliminated with minimal cost. The cost is calculated by adding up the costs in every NSR where this color is used. Also notice that in this step all boundary nodes are assumed to have fixed colors, so that this phase does not affect the PR number. The algorithm tries to recolor the nodes with color c in an NSR to other colors. If a node on the GIG has an NCN less than R-1, we can just pick a free color and recolor the node without any cost. Otherwise, live range splitting is needed.

Live range splitting is illustrated in Figure 13. In Figure 13.a, the example has 3 basic blocks. One live range is to be recolored with color c'; however, another live range that overlaps it also uses color c'. Our algorithm then splits the first live range at the boundary where the two live ranges overlap. After splitting, the overlapping part becomes a new node that can still use color c, and the rest of the live range changes to c'. We assign the node the color with minimal cost. After the splitting, the new node is pushed into Set_color_node(c, IIG_i), because it now bears color c. This process will eventually stop: after each split, the live range with color c becomes shorter.
Since $R - 1 \ge RegP_{max}$ (according to the lower bound estimation in Section 5 and the algorithm in Figure 7), in the extreme case where each live range is a single program point, there will be at most $RegP_{max}$ co-live nodes, so a live range with color c can always be recolored.

[Figure 13. Excluding a live range within an NSR to reduce SR.]

Eliminate Unnecessary Moves

During the attempt to reduce PR, we assume that all move instructions are inserted near the CSB boundary, and during the attempt to reduce SR, some move instructions are inserted inside the NSR. At this point, we can merge some of the internal move instructions with those at the boundary. For two consecutive moves, the first move instruction on the live range is unnecessary if the color at the entrance to the first move is also acceptable in the region between the two move instructions. We can safely eliminate the first move, and this effectively relaxes the restriction in Reduce_PR that binds moves to the CSB.

8. THE SRA PROBLEM

For the SRA problem (defined in Section 2), the PRs are given to be equal and the SRs are also equal. The restriction can be rewritten in a simple form:

$$N_{thd} \cdot PR + SR \le N_{reg}$$

Thus, the inter-thread register allocation algorithm can also be simplified. There are only two possibilities for reducing the register requirement. Because the solution space shrinks, for the algorithm in Figure 8 we can actually traverse all the possible PR and SR values to find the best solution.

9. EXPERIMENTAL RESULTS

The evaluation of our algorithm is done with the Intel-provided simulation environment, the IXP1200 Developer Workbench. The IXP1200 workbench supports cycle-accurate simulation of the IXP microengines and other peripherals with high fidelity. In this section, we experiment with 11 benchmark programs and some of their combinations to see the effectiveness of the register allocator. These benchmarks are collected from CommBench [15], NetBench [16], Intel-provided example code and a packet scheduling algorithm from [18].

To evaluate our algorithm, the benchmark programs are rewritten in IXP C code (a subset of standard C), and a few of them are written directly in assembly (microcode). For those written in assembly code, we restore the virtual registers so that our register allocator can work on the live ranges from scratch. Our pass builds the CFG and interference graph from the assembly code after a simple translation of the assembly directives. The assembly code is then passed to the assembler to generate machine code. The IXP assembly consists of only 40 RISC instructions, which makes the translation easy. The assembler simply exits if too many registers are required. However, after our pass the register requirements are always satisfied, so the machine code can be generated properly.

Table 1 shows the properties of the benchmark programs. The code size is the number of instructions after code generation. The cycle counts are measured as follows: some programs, like L2l3forward, cannot run to a stop in finite time, since they run in a while loop to accept and process packets, so their cycle counts are average numbers per iteration of the main loop. We list the number of CTX instructions (context switch instructions, which include load/store, voluntary context switch and other I/O operations that can cause a context switch) each benchmark has.
Roughly, about 10% of the instructions are CTX instructions. The CTX instructions here do not include spill instructions, as we have removed all spills and reconstructed the original live ranges (we did this based on the source code and the annotations embedded in the generated assembly code by the Intel IXP compiler). The number of live ranges (nodes on the GIG) is listed in the 5th column. These numbers come from the restored virtual registers. Columns 6 and 7 are the maximal register pressure in the program ($RegP_{max}$) and the maximal register pressure at the CSBs ($RegPCSB_{max}$). These are the lower bound estimates for the register requirement of the thread. Columns 8 and 9 are the upper bound estimates for R and PR based on the algorithm in Figure 7. The 10th and 11th columns give statistics for the number of NSRs and their average size. One observation is that larger NSRs normally lead to a bigger difference between the maximal and minimal values of R and PR. Because more internal nodes can exist in a larger NSR, the register pressure for the GIG should exceed that of the BIG by a larger margin.

Figure 14 evaluates our inter-thread register allocation algorithm for SRA. The same evaluation for ARA is combined in Table 3. For each benchmark program, we show two relevant bars. The first bar is the number of registers allocated to the benchmark assuming only a single thread is available. We use a Chaitin [9] style register allocator for comparison with our shared register allocator. The second and third bars are the numbers of private registers and shared registers assigned by our inter-thread register allocation algorithm. The same benchmark is assumed to execute on four threads. The algorithm continues until the cost returned is non-zero, which means we want to test how many PRs and SRs are needed by the inter-thread allocation algorithm without inserting any move instructions. The figure shows that the number of private registers allocated in the multi-threaded case is less than the number of registers needed for standalone register allocation. This is not surprising, because shared registers can take care of the higher register pressure inside the NSRs. If no shared registers are used and each thread runs the single-thread register allocator, many registers are wasted.
Compared with that case, the multi-threaded register requirement, i.e. 4*PR+SR, gives an average total register saving of 24% over all benchmarks.

In Table 2, we collect data for the extreme case of our register allocation algorithm, i.e. the maximal number of move instructions that will be inserted if only the minimal number of registers is allocated. This means our algorithm must split many live ranges to reach the minimal number of registers. The move insertion overhead in the extreme case is mostly within 10% of the total number of instructions of the benchmark. This cost is affordable compared with the overhead due to register spills that arises when the register number is out of range for the single-thread register allocation algorithm.

Finally, Table 3 evaluates our register allocation algorithm for ARA with 3 scenarios. Notice that all tasks are periodic, independently share the CPU and execute forever. Thus, we measure the performance improvement of each thread in terms of the percentage reduction of cycles per iteration. The first scenario puts two Md5 programs on threads 0 and 1 and two fir2dim programs on threads 2 and 3. This can be a processing module between the receiving and sending modules. Our data show the PR and SR assigned, the number of live ranges after the register allocation (#Live Range), the reduction in the number of context switch instructions and the cycle changes. The column #CTX Reg Spill is for the original code generated by the Intel compiler, which allocates registers with spilling and without register sharing across threads (it allocates only 32 registers for each thread). #CTX Reg Sharing is the number with our allocator (actually unchanged from Table 1, because we avoid spills). The same is true for the cycle counts (#Cycle Reg Spill and #Cycle Reg Sharing). The fir2dim program actually runs slower due to inserted moves, but this is profitable due to the big saving from Md5. Thus, the allocator is able to boost the performance-critical threads (Md5) by slightly slowing down the less performance-critical ones (fir2dim).

The second scenario consists of L2l3fwd receive and send on threads 0 and 1 and Md5 on threads 2 and 3. This can be a complete processing module serving one sending and one receiving port.
The results still show that the spills are saved for Md5, with a minor cost for moves on the L2l3fwd threads. The last scenario runs wrap receive and send on threads 0 and 1, and fir2dim and frag on threads 2 and 3. The allocator balances register allocation to satisfy the wrap threads. Due to high register pressure, wrap receive and send can run much slower (due to spills) if registers are not allocated properly. Our results show that over 20% speedup is achieved for wrap, whereas only a slight slowdown is incurred for the other two benchmarks, which is in accordance with our optimization objective of boosting performance-critical threads.
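For readers who want to reproduce the two metrics used in this section, the arithmetic is sketched below in Python. The helper names are hypothetical, and the input numbers are made up purely for illustration; they are not measurements from Tables 1 to 3 or Figure 14.

```python
def total_registers_sra(n_thd, pr, sr):
    """Register requirement under SRA: N_thd * PR + SR."""
    return n_thd * pr + sr


def percent_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Made-up illustration: 4 threads, a single-thread allocation of 10 registers
# per thread versus PR = 6 private plus SR = 5 shared registers.
standalone = 4 * 10
shared = total_registers_sra(4, 6, 5)
print(shared, percent_reduction(standalone, shared))  # -> 29 27.5

# Made-up cycle counts per main-loop iteration, with and without spills.
print(percent_reduction(200, 160))  # -> 20.0
```

The same `percent_reduction` helper covers both the register saving and the reduction of cycles per iteration, since both are expressed relative to the non-sharing baseline.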

Flow graph/networks MAX FLOW APPLICATIONS. Flow constraints. Max flow problem 4/26/12

Flow graph/networks MAX FLOW APPLICATIONS. Flow constraints. Max flow problem 4/26/12 4// low graph/nework MX LOW PPLIION 30, pring 0 avid Kauchak low nework direced, weighed graph (V, ) poiive edge weigh indicaing he capaciy (generally, aume ineger) conain a ingle ource V wih no incoming

More information

Fuzzy LPT Algorithms for Flexible Flow Shop Problems with Unrelated Parallel Machines for a Continuous Fuzzy Domain

Fuzzy LPT Algorithms for Flexible Flow Shop Problems with Unrelated Parallel Machines for a Continuous Fuzzy Domain The IE Nework Conference 4-6 Ocober 007 Fuzzy LPT Algorihm for Flexible Flow Shop Problem wih Unrelaed Parallel Machine for a Coninuou Fuzzy Domain Jii Jungwaanaki * Manop Reodecha Paveena Chaovaliwonge

More information

Outline. CS38 Introduction to Algorithms 5/8/2014. Network flow. Lecture 12 May 8, 2014

Outline. CS38 Introduction to Algorithms 5/8/2014. Network flow. Lecture 12 May 8, 2014 /8/0 Ouline CS8 Inroducion o Algorihm Lecure May 8, 0 Nework flow finihing capaciy-caling analyi Edmond-Karp, blocking-flow implemenaion uni-capaciy imple graph biparie maching edge-dijoin pah aignmen

More information

6.8 Shortest Paths. Chapter 6. Dynamic Programming. Shortest Paths: Failed Attempts. Shortest Paths

6.8 Shortest Paths. Chapter 6. Dynamic Programming. Shortest Paths: Failed Attempts. Shortest Paths 1 Chaper.8 Shore Pah Dynamic Programming Slide by Kein Wayne. Copyrigh 5 Pearon-Addion Weley. All righ reered. Shore Pah Shore Pah: Failed Aemp Shore pah problem. Gien a direced graph G = (V, E), wih edge

More information

DEFINITION OF THE LAPLACE TRANSFORM

DEFINITION OF THE LAPLACE TRANSFORM 74 CHAPER 7 HE LAPLACE RANSFORM 7 DEFINIION OF HE LAPLACE RANSFORM REVIEW MAERIAL Improper inegral wih infinie limi of inegraio Inegraion y par and parial fracion decompoiion INRODUCION In elemenary calculu

More information

RULES OF DIFFERENTIATION LESSON PLAN. C2 Topic Overview CALCULUS

RULES OF DIFFERENTIATION LESSON PLAN. C2 Topic Overview CALCULUS CALCULUS C Topic Overview C RULES OF DIFFERENTIATION In pracice we o no carry ou iffereniaion from fir principle (a ecribe in Topic C Inroucion o Differeniaion). Inea we ue a e of rule ha allow u o obain

More information

4. Minimax and planning problems

4. Minimax and planning problems CS/ECE/ISyE 524 Inroducion o Opimizaion Spring 2017 18 4. Minima and planning problems ˆ Opimizing piecewise linear funcions ˆ Minima problems ˆ Eample: Chebyshev cener ˆ Muli-period planning problems

More information

Christoph Kessler, IDA, Linköpings universitet, C. Kessler, IDA, Linköpings universitet. C. Kessler, IDA, Linköpings universitet.

Christoph Kessler, IDA, Linköpings universitet, C. Kessler, IDA, Linköpings universitet. C. Kessler, IDA, Linköpings universitet. 00100 dvanced ompiler onrucion T86 ompiler Opimizaion and ode eneraion Sofware Pipelining of Loop (1) Sofware Pipelining Lieraure:. Keler, ompiling for VLW SP, 2009, Secion 7.2 (handed ou) LSU2e Secion

More information

COSC 3213: Computer Networks I Chapter 6 Handout # 7

COSC 3213: Computer Networks I Chapter 6 Handout # 7 COSC 3213: Compuer Neworks I Chaper 6 Handou # 7 Insrucor: Dr. Marvin Mandelbaum Deparmen of Compuer Science York Universiy F05 Secion A Medium Access Conrol (MAC) Topics: 1. Muliple Access Communicaions:

More information

Overview. From Point Visibility. From Point Visibility. From Region Visibility. Ray Space Factorization. Daniel Cohen-Or Tel-Aviv University

Overview. From Point Visibility. From Point Visibility. From Region Visibility. Ray Space Factorization. Daniel Cohen-Or Tel-Aviv University From-Region Viibiliy and Ray Space Facorizaion Overview Daniel Cohen-Or Tel-Aviv Univeriy Shor inroducion o he problem Dual Space & Parameer/Ray Space Ray pace facorizaion (SIGGRAPH 0) From Poin Viibiliy

More information

the marginal product. Using the rule for differentiating a power function,

the marginal product. Using the rule for differentiating a power function, 3 Augu 07 Chaper 3 Derivaive ha economi ue 3 Rule for differeniaion The chain rule Economi ofen work wih funcion of variable ha are hemelve funcion of oher variable For example, conider a monopoly elling

More information

Structural counter abstraction

Structural counter abstraction Srucural couner abracion Proving fair-erminaion of deph bounded yem Khiij Banal 1 wih Eric Kokinen 1, Thoma Wie 1, Damien Zufferey 2 1 New York Univeriy 2 IST Auria March 18, 2013 TACAS, Rome, Ialy Inroducion

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Compuer Archiecure and Engineering Lecure 7 - Memory Hierarchy-II Krse Asanovic Elecrical Engineering and Compuer Sciences Universiy of California a Berkeley hp://www.eecs.berkeley.edu/~krse hp://ins.eecs.berkeley.edu/~cs152

More information

GPU-Based Parallel Algorithm for Computing Point Visibility Inside Simple Polygons

GPU-Based Parallel Algorithm for Computing Point Visibility Inside Simple Polygons GPU-Baed Parallel Algorihm for Compuing Poin Viibiliy Inide Simple Polygon Ehan Shoja a,, Mohammad Ghodi a,b, a Deparmen of Compuer Engineering, Sharif Univeriy of Technology, Tehran, Iran b Iniue for

More information

Sam knows that his MP3 player has 40% of its battery life left and that the battery charges by an additional 12 percentage points every 15 minutes.

Sam knows that his MP3 player has 40% of its battery life left and that the battery charges by an additional 12 percentage points every 15 minutes. 8.F Baery Charging Task Sam wans o ake his MP3 player and his video game player on a car rip. An hour before hey plan o leave, he realized ha he forgo o charge he baeries las nigh. A ha poin, he plugged

More information

The Laplace Transform

The Laplace Transform 7 he Laplace ranform 7 Definiion of he Laplace ranform 7 Invere ranform and ranform of Derivaive 7 Invere ranform 7 ranform of Derivaive 73 Operaional Properie I 73 ranlaion on he -Axi 73 ranlaion on he

More information

Scheduling. Scheduling. EDA421/DIT171 - Parallel and Distributed Real-Time Systems, Chalmers/GU, 2011/2012 Lecture #4 Updated March 16, 2012

Scheduling. Scheduling. EDA421/DIT171 - Parallel and Distributed Real-Time Systems, Chalmers/GU, 2011/2012 Lecture #4 Updated March 16, 2012 EDA421/DIT171 - Parallel and Disribued Real-Time Sysems, Chalmers/GU, 2011/2012 Lecure #4 Updaed March 16, 2012 Aemps o mee applicaion consrains should be done in a proacive way hrough scheduling. Schedule

More information

Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms

Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms Fiheye Len Diorion Correcion on Mulicore and Hardware Acceleraor Plaform Konani Dalouka 1 Chrio D. Anonopoulo 1 Nikolao Sek M. Bella 1 Chai 1 Deparmen of Compuer and Communicaion Engineering Univeriy of

More information

Multi-layer Global Routing Considering Via and Wire Capacities

Multi-layer Global Routing Considering Via and Wire Capacities Muli-layer Global Rouing Conidering Via and Wire Capaciie Chin-Hiung Hu, Huang-Yu Chen, and Yao-Wen Chang Graduae Iniue of Elecronic Engineering, Naional Taiwan Univeriy, Taipei, Taiwan Deparmen of Elecrical

More information

Implementing Ray Casting in Tetrahedral Meshes with Programmable Graphics Hardware (Technical Report)

Implementing Ray Casting in Tetrahedral Meshes with Programmable Graphics Hardware (Technical Report) Implemening Ray Casing in Terahedral Meshes wih Programmable Graphics Hardware (Technical Repor) Marin Kraus, Thomas Erl March 28, 2002 1 Inroducion Alhough cell-projecion, e.g., [3, 2], and resampling,

More information

The Planar Slope Number of Planar Partial 3-Trees of Bounded Degree

The Planar Slope Number of Planar Partial 3-Trees of Bounded Degree The Planar Slope Number of Planar Parial 3-Tree of Bounded Degree Ví Jelínek 1,2,EvaJelínková 1, Jan Kraochvíl 1,3, Bernard Lidický 1, Marek Teař 1,andTomáš Vykočil 1,3 1 Deparmen of Applied Mahemaic,

More information

4 Error Control. 4.1 Issues with Reliable Protocols

4 Error Control. 4.1 Issues with Reliable Protocols 4 Error Conrol Jus abou all communicaion sysems aemp o ensure ha he daa ges o he oher end of he link wihou errors. Since i s impossible o build an error-free physical layer (alhough some shor links can

More information

Network management and QoS provisioning - QoS in Frame Relay. . packet switching with virtual circuit service (virtual circuits are bidirectional);

Network management and QoS provisioning - QoS in Frame Relay. . packet switching with virtual circuit service (virtual circuits are bidirectional); QoS in Frame Relay Frame relay characerisics are:. packe swiching wih virual circui service (virual circuis are bidirecional);. labels are called DLCI (Daa Link Connecion Idenifier);. for connecion is

More information

On Romeo and Juliet Problems: Minimizing Distance-to-Sight

On Romeo and Juliet Problems: Minimizing Distance-to-Sight On Romeo and Julie Problem: Minimizing Diance-o-Sigh Hee-Kap Ahn 1, Eunjin Oh 2, Lena Schlipf 3, Fabian Sehn 4, and Darren Srah 5 1 Deparmen of Compuer Science and Engineering, POSTECH, Souh Korea heekap@poech.ac.kr

More information

NEWTON S SECOND LAW OF MOTION

NEWTON S SECOND LAW OF MOTION Course and Secion Dae Names NEWTON S SECOND LAW OF MOTION The acceleraion of an objec is defined as he rae of change of elociy. If he elociy changes by an amoun in a ime, hen he aerage acceleraion during

More information

CS 428: Fall Introduction to. Geometric Transformations (continued) Andrew Nealen, Rutgers, /20/2010 1

CS 428: Fall Introduction to. Geometric Transformations (continued) Andrew Nealen, Rutgers, /20/2010 1 CS 428: Fall 2 Inroducion o Compuer Graphic Geomeric Tranformaion (coninued) Andrew Nealen, Ruger, 2 9/2/2 Tranlaion Tranlaion are affine ranformaion The linear par i he ideni mari The 44 mari for he ranlaion

More information
