Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor


Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh
Department of Electrical & Computer Engineering
University of California, Irvine, CA, USA
{jinfengl, chou, nader}@ece.uci.edu

Abstract

High-speed serial network interfaces are becoming the primary way for modern embedded systems and systems-on-chip to connect with each other and with peripheral devices. Modern communication interfaces are capable of operating at multiple speeds and are opening a new dimension of trade-offs between computation and communication. Unfortunately, today's CPU-centric techniques often fail to consider multi-speed communication and the balance between communication and computation for time and energy; as a result, they yield sub-optimal if not incorrect designs. This paper presents a new technique for global energy optimization through coordinated functional partitioning and speed selection for the processors and their communication interfaces. We propose a multi-dimensional dynamic programming formulation for energy-optimal functional partitioning with CPU/communication speed selection for a class of data-regular applications under performance constraints. We demonstrate the effectiveness of our optimization techniques with an image processing application mapped onto a multi-processor architecture with a multi-speed Ethernet.

Keywords: communication speed selection, functional partitioning, on-chip networked multi-processor, low-power design

1 Introduction

Towards High-Speed Serial Busses on SoC

A key trend in systems-on-chip is towards the use of high-speed serial busses for system-level interconnect. Serial busses offer many compelling advantages, including modularity, composability, scalability, form factor, and power efficiency [5,, 3]. Modularity and composability are extremely important, because the sheer complexity of these chips forces designers to raise the level of abstraction.
Most SoC designs are done by integration of intellectual property (IP) components as a way to manage complexity while meeting time-to-market deadlines. Serial protocols are well understood and have long been used in automotive control (e.g., CAN from Bosch) and consumer electronics (e.g., I2C from Philips). More recent protocols such as FireWire (IEEE 1394) and USB are commonly used not only for peripheral devices but also for connecting multiple embedded processors. They provide a simple, standardized, efficient, and scalable way of building loosely coupled systems. High-speed serial controllers such as Ethernet are now an integral part of many embedded processors.

Serial busses also have power and form factor advantages. From automobiles to computer peripherals, serial interconnects such as FireWire and USB are compact and low power compared to SCSI or parallel ports, which are bulky, high power, and limited in length. This is especially important for systems-on-chip, where gates are virtually free, but wires are the most expensive part of the chip real estate. Long, parallel, shared wires are not only high power but also suffer from clock skew and even crosstalk as the feature size shrinks. Serial controllers provide a clean abstraction by shielding components from these low-level concerns. Moreover, modern protocols also support plug-and-play and power management features such as subnet shutdown or link suspension. These features and more make high-speed serial protocols an attractive choice for rapid integration of SoC architectures.

(This research was sponsored by DARPA grant F and Printronix Fellowship.)

Power/Performance Issues with Serial Networks

Of course, serial controllers come at a price. The area and IP licensing will have a cost, but this cost might be justified by time-to-market or other overriding business concerns. In fact, it might be even less of an issue for future IP, which will likely have these serial controllers integrated. For example, AMD's newly announced Au [] is a MIPS-based microcontroller with integrated 10/100Base-T Ethernet, USB, and many other I/O.
However, power and performance will become the critical issues, as they directly affect the correctness of the design. For power optimization, previous efforts focused on the processor for several reasons. The CPU was the main consumer of power, and it also offered the most options for power management, including voltage scaling. However, recent advances in both processors and communication interfaces are driving a shift in how power should be managed. CPU-centric power management has given rise to a new generation of processors with dramatically improved power efficiency, and the CPU is now drawing a smaller percentage of the overall system power. The insatiable demand for bandwidth has also resulted in high-speed communication interfaces. Even though their power efficiency (i.e., energy per bit transmitted) has also been improved, communication power now matches or surpasses the CPU, and is thus a larger fraction of the system power. For instance, the Intel XScale processor consumes .6W at full speed, while a Gigabit Ethernet interface consumes 6W.

System Management with Speed Selection

Many communication interfaces today support multiple data rates. However, the scaling effects tend to be opposite to those of voltage-scalable CPUs. For CPUs, slower speed generally means lower power and lower energy per instruction; but for communication, faster speed means higher power but often less energy per bit. This is highly dependent on the specific controller. Few research works to date have explored communication speed as a key parameter for power optimization.

Speed selection cannot be performed for just communication or computation in isolation, because a local decision can have a global impact. One reason is that communication now goes through a shared medium rather than point-to-point links. The CPUs cannot all be run at the slowest, most power-efficient speeds, because they must compete for the available time and power with each other and with the communication interfaces. A faster communication speed, even at a higher energy per bit, can save energy by creating opportunities for subsystem shutdown or voltage scaling of the processors. Greedily saving communication power may actually result in higher overall energy. At the same time, functional partitioning must be an integral part of the optimization loop, because different partitioning schemes can dramatically alter the communication payload and computation workload for each node.
Approach

For a given workload on a networked architecture, our problem statement is to generate a functional partitioning scheme and to select the speeds of communication interfaces and processors, such that the total energy is minimized. In general, such a problem is extremely difficult. Fortunately, for a class of systems with pipelined tasks under an overall latency constraint, efficient, exact solutions exist. This paper presents a multi-dimensional dynamic programming solution to such a problem. It formulates the energy consumed by the processors and communication interfaces with their power/speed scaling factors within their available time budget. We demonstrate the effectiveness of this technique with an image processing algorithm mapped onto a multi-processor architecture interconnected by a Gigabit Ethernet. This technique is also applicable as a heuristic to general dataflow problems.

2 Related Work

Previous works have explored communication synthesis and optimization in distributed multi-processor systems. [7] presents communication scheduling to work with rate-monotonic tasks, while [7] assumes the more deterministic time-triggered protocol (TTP). [] distributes timing constraints on communication among segments through priority assignment on serial busses (such as controller area network) and customization of device drivers. While these assume a bus or a network protocol, LYCOS [9] integrates the ability to select among several communication protocols (with different delays, data sizes, burstiness) into the main partitioning loop. Although these and many other works can be extended to SoC architectures, they do not specifically optimize for energy minimization by exploiting the processors' voltage scaling capabilities.

Related techniques that optimize for power consumption of processors typically assume a fixed communication data rate. [4] uses simulated heating search strategies to find low-power design points for voltage-scalable embedded processors. [] performs battery-aware task post-scheduling for distributed, voltage-scalable processors by moving tasks to smooth the power profile.
[6, 5] propose partitioning the computation onto a multi-processor architecture that consumes significantly less power than a single processor. [6] reduces switching activities of both functional units and communication links by partitioning tasks onto a multi-chip architecture, while [8] maximizes the opportunity to shut down idle processors through functional partitioning. All these techniques focus on the computational aspect without exploring the speed/power scalability of the communication interfaces.

Existing techniques cannot be readily combined to explore many timing/power trade-offs between computation and communication. The quadratic voltage scaling properties of CPUs do not generalize to communication interfaces. Even if they did, these techniques have not considered the partitioning of power and timing budgets among computation/communication components across the network. Selecting communication attributes by only considering deadlines without power will lead to unexpected, often incorrect results at the system level.

3 System Model

This section defines a system-level performance/energy model for both computation and communication components in a networked on-chip multi-processor architecture. In this paper, a system consists of $M$ processing nodes $N_i$, $i = 1, 2, \ldots, M$, connected by a shared communication medium. Each processing node (or node for short) consists of a processor, a local memory, and one or more communication interfaces that send and/or receive data from other nodes.

3.1 Jobs and Tasks

A processing job assigned to a node has three tasks: RECV, PROC, and SEND, which must be executed serially in that order. RECV and SEND are communication tasks on the interfaces, and PROC is a computation task on the processor. The workload for each task is defined as follows. For communication tasks RECV and SEND, workloads $W_r$ and $W_s$ indicate the number of bits to be received and sent, respectively. For the computation task PROC, the workload $W_p$ is the number of cycles. Let $T_p$, $T_r$, $T_s$ denote the delays of tasks PROC, RECV, and SEND, respectively. Let $F_p$ denote the clock frequency of the processor, and $F_r$ and $F_s$ the respective data bit rates for receiving and sending. We have

$$T_p = \frac{W_p}{F_p}; \quad T_r = \frac{W_r}{F_r}; \quad T_s = \frac{W_s}{F_s} \qquad (1)$$

(1) is reasonable for processors executing data-dominated programs, where the total cycles $W_p$ can be analyzed and bounded statically. However, it does not hold true in general if the effective data rate can be reduced by collisions and errors on the shared communication medium. We present the collision-free condition of the shared medium in Section 4. To model the non-ideal aspect of the medium, we introduce the communication efficiency terms $\rho_r$ and $\rho_s$, $0 < \rho_r, \rho_s \le 1$, such that $T_r = \frac{W_r}{\rho_r F_r}$ and $T_s = \frac{W_s}{\rho_s F_s}$. Note that $\rho_r$ and $\rho_s$ need not be constants, but may be functions of the communication speeds $F_r$, $F_s$. For brevity, our experimental results assume an ideal communication medium ($\rho_r = \rho_s = 1$) without loss of generality. A more practical communication model can be directly applied, since $\rho_r$ and $\rho_s$ can be very well bounded for a collision-free medium.

$D$ is a deadline on each processing job, which requires $T_r + T_p + T_s \le D$ for the three serialized tasks. If any slack time exists, then we can slow down task PROC by voltage scaling to reduce energy. Therefore, we assume the job finishes at the deadline. That is,

$$D = T_r + T_p + T_s \qquad (2)$$

3.2 Scaling

On each node, we assume only the processor and the communication interfaces are power-manageable by speed selection.
The power consumption by the communication medium is interpreted to be the total power consumed by all active communication interfaces. We assume a processor's voltage-scaling characteristics can be expressed by a scaling function $Scale_p$ that maps the CPU frequency to its power level. A communication interface also has scaling functions that characterize the power levels at different communication data rates for sending and receiving. (2) implies that $Scale_p$ is continuous, while communication interfaces support only a few discrete scaling points.

Figure 1: Timing and power properties of a processing node. (a) Block diagram. (b) Timing-power diagram.

Let $P_p$, $P_r$, and $P_s$ denote the power levels of tasks PROC, RECV, and SEND, respectively. Then,

$$P_p = Scale_p(F_p); \quad P_r = Scale_r(F_r); \quad P_s = Scale_s(F_s) \qquad (3)$$

Let $P_{ovh}$ denote the power overhead when introducing an additional node into the system. It captures the power of the memory, the minimum power of the CPU and communication interface, the CPU's power during RECV and SEND (DMA), and the communication interfaces' power during PROC. The energy consumption of a task is the power-delay product. Let $E_p$, $E_r$, $E_s$, and $E_{ovh}$ denote the energy consumption of tasks PROC, RECV, SEND, and the overhead of a node:

$$E_p = P_p T_p; \quad E_r = P_r T_r; \quad E_s = P_s T_s; \quad E_{ovh} = P_{ovh} D \qquad (4)$$

For one node $N$ with tasks PROC, RECV, and SEND, the total energy of node $N$ is

$$E_N = E_p + E_r + E_s + E_{ovh} \qquad (5)$$

Fig. 1 shows the structure of a processing node. The gray bar represents the overhead and white bars represent tasks RECV, PROC, and SEND. The area of the bars refers to the energy contribution of the tasks and overhead. Finally, the total energy of the system is the sum of the energy consumption on each node,

$$E_{sys} = \sum_{i=1}^{M} E_{N_i} \qquad (6)$$

3.3 M-Node Pipeline

This paper considers a special case called an $M$-node pipeline.
It consists of $M$ identical nodes $N_i$, $i = 1, 2, \ldots, M$, as characterized by $Scale_p$, $Scale_r$, $Scale_s$, $E_{ovh}$. Each node $N_i$ receives $W_{r_i}$ bits of data from the previous node $N_{i-1}$ (except $N_1$), processes the data in $W_{p_i}$ cycles, and sends the $W_{s_i}$-bit result to the next node $N_{i+1}$ (except $N_M$). Each $SEND_i$-$RECV_{i+1}$ communication pair sends and receives the same amount of data at the same communication speed, with the same communication delay, and we assume they start and finish at the same time. That is, $W_{s_i} = W_{r_{i+1}}$, $F_{s_i} = F_{r_{i+1}}$, $T_{s_i} = T_{r_{i+1}}$. All nodes have the same deadline $D$, and each node acts as a pipeline stage with delay $D$. Fig. 2 shows an example of a three-node pipeline. For brevity, the overhead is not shown.

Figure 2: A 3-node pipeline. (a) Block diagram. (b) Serialized timing-power diagram. (c) Pipelined timing-power diagram.

An $M$-node pipeline can be partitioned and mapped onto an $M'$-node pipeline ($M' \le M$) by merging adjacent nodes $N_j, N_{j+1}, \ldots, N_{j+k}$ ($k \ge 0$) into a new node $N'_i$. The new node $N'_i$ combines all computation workload, receives $W_{r_j}$ bits of data, and sends $W_{s_{j+k}}$ bits of data. Communication within a node becomes local data access. That is, $W'_{p_i} = \sum_{l=0}^{k} W_{p_{j+l}}$, and $W'_{r_i} = W_{r_j}$, $W'_{s_i} = W_{s_{j+k}}$. The new $M'$-node pipeline is called a partitioning of the initial $M$-node pipeline.

4 Schedulability Conditions

This section presents the schedulability conditions for the pipelined on-chip multi-processor system. In the pipelined timing diagram Fig. 2(c) of the three-node pipeline, we fold the tasks in Fig. 2(b) into a common interval with duration $D$, which is the delay of each pipeline stage. Note that there appear to be two instances of task $PROC_3$ on node $N_3$. This does not mean that task $PROC_3$ is preempted. In fact, each instance is a part of one integrated task $PROC_3$ that spans the boundary between pipeline stages. In other words, the boundary between pipeline stages falls in the middle of the execution of task $PROC_3$. Fig. 2(c) shows that due to the common deadline, communication activities are shifted to different time slots, such that at any given time, there is at most one active communication instance (a $SEND_i$-$RECV_{i+1}$ pair; e.g., $SEND_2$-$RECV_3$ and $SEND_1$-$RECV_2$ are serialized). This is especially meaningful if all nodes share a communication medium such as Ethernet instead of point-to-point connections.
If collision does not occur, then our estimation of both performance and energy of the whole system can be well bounded. Collision is always undesirable because retransmission costs both time and energy. Communication activities should be scheduled such that the system is collision-free.

Lemma 1 (Collision-free Condition) In an $M$-node pipeline with a deadline $D$, let $T_i$, $i = 0, 1, \ldots, M$ indicate the delays of the $M+1$ instances of data communication:

$$T_i = \begin{cases} T_{r_1} & (i = 0) \\ T_{s_i} = T_{r_{i+1}} & (i = 1, 2, \ldots, M-1) \\ T_{s_M} & (i = M) \end{cases}$$

The system does not have collision on the shared communication medium iff the utilization of the shared communication medium is less than or equal to 1. That is,

$$U = \frac{1}{D} \sum_{i=0}^{M} T_i \le 1 \qquad (7)$$

Note that for a general multi-processor, Lemma 1 expresses the overload condition and can be only a necessary condition for a collision-free schedule. However, it is also a sufficient condition for $M$-node pipelines as defined in Section 3.3, because this special case of pipelining has the property of serializing all communication instances. Lemma 1 is also the schedulability condition for the shared communication medium.

Lemma 2 (Schedulability Condition of One Node) In an $M$-node pipeline with a deadline $D$ and nodes $N_i$, $i = 1, 2, \ldots, M$, node $N_i$ is able to meet the deadline $D$ iff $N_i$ is not overloaded, that is,

$$\frac{W_{p_i}}{\max(F_p)} \le D - T_{r_i} - T_{s_i} \qquad (8)$$

Lemma 2 states the overload condition of one node: given the communication speeds (which determine the communication delays $T_{r_i}$, $T_{s_i}$), if its computation task cannot be completed within the time budget $D - T_{r_i} - T_{s_i}$ by operating at the maximum CPU clock rate, then this node will fail to meet the deadline and thus the whole pipeline will malfunction. If Lemma 2 cannot be satisfied, then the only way to meet the deadline is to select higher communication speeds to reduce $T_{r_i}$, $T_{s_i}$, in order to allocate additional time budget for computation. High-speed communication can also reduce communication collision to satisfy Lemma 1.
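As an illustration, the collision-free condition on the shared medium (Lemma 1) and the overload condition of a single node (Lemma 2) can be checked numerically. The following Python fragment is only a minimal sketch under stated assumptions: the function names, the 0-based list layout (`w_bits[i]` and `f_bits[i]` describe communication instance $T_i$), and an ideal medium are choices made for this example and are not part of the formulation above.

```python
def medium_utilization(comm_delays, deadline):
    """Utilization of the shared medium: sum of T_0..T_M divided by D."""
    return sum(comm_delays) / deadline


def is_collision_free(comm_delays, deadline):
    """Collision-free condition: utilization must not exceed 1."""
    return medium_utilization(comm_delays, deadline) <= 1.0


def node_meets_deadline(w_p, f_max, t_r, t_s, deadline):
    """One-node condition: W_p / max(F_p) <= D - T_r - T_s."""
    budget = deadline - t_r - t_s
    return budget > 0 and w_p / f_max <= budget


def pipeline_schedulable(w_p, w_bits, f_bits, f_max, deadline):
    """Check every node and the shared medium of an M-node pipeline.

    w_p    : M cycle counts, one per node (assumed layout)
    w_bits : M+1 payloads in bits for communication instances T_0..T_M
    f_bits : M+1 bit rates in bit/s for the same instances
    """
    delays = [w / f for w, f in zip(w_bits, f_bits)]
    nodes_ok = all(
        # node i receives during T_i and sends during T_{i+1}
        node_meets_deadline(w_p[i], f_max, delays[i], delays[i + 1], deadline)
        for i in range(len(w_p))
    )
    return nodes_ok and is_collision_free(delays, deadline)
```

Raising the bit rates in `f_bits` shortens every delay, which relaxes both checks at once; this is the effect the closing remark on high-speed communication refers to.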

Figure 3: Functional blocks of the ATR algorithm. The five blocks are N1: Target Detection, N2: FFT, N3: Filter, N4: IFFT, and N5: Compute Distance, each annotated with its computation workload in cycles and its input and output data sizes.

Figure 4: The impact of different partitioning schemes and communication speed settings. (a) A fine-grain partitioning scheme reduces energy on computation, at the cost of inter-processor communication and overhead of additional nodes. (b) The combined node reduces communication and overhead, but it requires more energy for computation. (c) The computation energy can be reduced by increasing communication speeds, which leaves more time for computation.

Lemma 3 (Schedulability Condition of the System) An $M$-node pipeline is schedulable to meet a deadline $D$ iff (1) every node $N_i$, $i = 1, 2, \ldots, M$, meets the deadline (Lemma 2), and (2) the shared communication medium is collision-free (Lemma 1).

Lemma 3 says that the system's schedulability is determined by the schedulability of all resources, including the nodes and the communication medium. If and only if none of them is overloaded, the system can be pipelined with the deadline $D$. Lemma 3 holds true only for this $M$-node pipeline organization; it is a necessary but not sufficient condition for a general multi-processor system.

5 Motivating Example

We use an automatic target recognition (ATR) algorithm (Fig. 3) as our motivating example. Originally it is a serial algorithm. We reconstructed a parallel version and mapped it onto multiple pipelined processors. Pipelining allows each processor to run at a much slower speed with a lower voltage level to reduce overall computation energy, while parallelism compensates for the performance. Of course, a multi-processor platform incurs energy for inter-processor communication, extra processors, memory, and other overhead.
Mapping Tasks to Nodes through Partitioning

Given the five functional blocks (tasks) of the ATR algorithm, several partitioning schemes are possible for mapping the tasks to a number of pipelined nodes. Fig. 4 shows an example by considering how the first two tasks are mapped onto nodes. In Fig. 4(a), they are mapped onto two nodes N1 and N2 that are both allowed to operate at a lower speed (3Hz) for computation. This scheme has lower computation energy than if they were mapped onto one node, but it requires energy for the communication tasks $SEND_1$-$RECV_2$, plus overhead. Fig. 4(b) shows a mapping onto one node. It eliminates the communication $SEND_1$-$RECV_2$ and the overhead of an extra node. However, the combined node has much more computation workload and must run at a faster clock rate (6Hz), a less energy-efficient level.

Zooming out, many partitioning schemes are possible, even when limited to a pipelined organization. For example, one partitioning [N1, N2], [N3, N4, N5] may be optimal for nodes N1 and N2; but it will preclude another solution [N1], [N2, N3], [N4, N5] that may lead to lower energy for the whole system.

Speed Selection for CPU and Communication

In addition to partitioning, the selection of communication speed is an equally critical issue. For example, we consider a 10/100/1000Base-T Ethernet interface. It consumes more power than the CPU at high (100/1000 Mbps) speeds, but less power than the CPU at the slower, 10 Mbps data rate. In Fig. 4(b), the processor must operate at a high clock rate due to the low-speed communication at 10 Mbps. Because of the deadline $D$, communication and computation compete for this budget. Low-speed communication leaves less time for computation, thereby forcing the processor to run faster to meet the deadline. Conversely, high-speed communication could free more time budget for computation, as shown in Fig. 4(c), where the CPU's clock rate is dropped to 3Hz with higher-speed communication. Although extra energy could be allocated to communication, if the energy saving on the CPU could compensate for this cost, then (c) would be more energy-efficient than (b).
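The trade-off between (b) and (c) can be made concrete with a small numeric sketch. Everything quantitative below is hypothetical and chosen only for illustration: the cubic CPU power model `c_cpu * F_p**3`, the constant interface powers (0.1 W at 10 Mbps, 0.5 W at 100 Mbps), the workloads, and the 20 ms deadline are assumptions, not measurements from the paper.

```python
def node_energy(w_r, w_s, w_p, f_comm, p_comm, deadline, f_max, c_cpu):
    """Energy in joules of one node that receives and sends at rate f_comm.

    Assumes CPU power = c_cpu * F_p**3 (a hypothetical voltage-scaling
    model) and a constant interface power p_comm while transferring.
    """
    t_r, t_s = w_r / f_comm, w_s / f_comm
    t_p = deadline - t_r - t_s           # time left for computation
    if t_p <= 0 or w_p / t_p > f_max:    # node overloaded, deadline missed
        return float("inf")
    f_p = w_p / t_p                      # slowest clock that meets the deadline
    return c_cpu * f_p**3 * t_p + p_comm * (t_r + t_s)


# Hypothetical node: 40 kbit in, 40 kbit out, 600k cycles, 20 ms deadline.
slow = node_energy(40e3, 40e3, 600e3, 10e6, 0.1, 0.02, 100e6, 1e-24)
fast = node_energy(40e3, 40e3, 600e3, 100e6, 0.5, 0.02, 100e6, 1e-24)
print(f"10 Mbps: {slow * 1e3:.3f} mJ, 100 Mbps: {fast * 1e3:.3f} mJ")
```

With these assumed numbers the faster interface draws five times more power, yet the node's total energy is lower because the CPU can run at a slower clock for a longer time. With a different power model the comparison could go the other way, which is why the choice has to be evaluated rather than fixed in advance.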
The communication-computation interaction becomes more intricate in a multi-processor environment. Any data dependency between different nodes must involve their communication interfaces. The communication speed of a sender will not only determine the receiver's communication speed but also influence the choice of the receiver's computation speed. The communication speed on the first node of the pipeline will have a chain effect on all other nodes in the system. A locally optimal speed for the first node will not necessarily lead to a globally optimal solution.

Combining Partitioning and Speed Selection

Partitioning and communication speed selection are mutually enabling. Given a fixed partitioning scheme, the designers can always find the corresponding optimal speed setting that minimizes energy for that scheme. However, energy-optimal speed selection for one partitioning is not necessarily optimal over all partitionings, so the two decisions must be made together. In this paper, we take a multi-dimensional optimization approach that considers performance requirement, schedulability, load balancing, communication-computation trade-offs, and multi-processor overhead in a system-level context.

6 Problem Formulation

Given an $M$-node pipeline, the choices of partitioning and communication speed settings will lead to different energy consumption at the system level. This section formulates the energy minimization problems by means of partitioning and communication speed selection. In both cases, the optimal solutions can be obtained by dynamic programming. Finally, the combined optimization problem with both partitioning and communication speed selection can be addressed synergistically by multi-dimensional dynamic programming.

Problem 1 (Optimal Partitioning) Given (a) $M$ pipelined nodes $N_i$ with workload $W_{p_i}$, $W_{r_i}$, $W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the constraint that the speed settings of all communication instances must match: $F_{r_1}$, $F_{s_i} = F_{r_{i+1}}$, $F_{s_M}$, for $i = 1, 2, \ldots, M-1$, find a partitioning scheme that minimizes energy $E_{sys}$.

To avoid exhaustive enumeration in the $O(2^M)$ solution space, we construct a series of sub-problems as follows. We consider a sub-problem $P[i, j]$ that maps the first $j$ original nodes $N_1, N_2, \ldots, N_j$ onto an $i$-node sub-partitioning $N'_1, N'_2, \ldots, N'_i$. The optimal solution of $P[i, j]$ has the minimum energy $E[i, j]$. It can be decomposed into two parts, shown in Fig. 5: (a) a sub-partitioning $P[i-1, l]$ that maps the first $l$ original nodes to $i-1$ new nodes, plus (b) the $i$-th new node $N'_i$ that combines the original nodes $N_{l+1}, \ldots, N_j$, with its energy denoted as $E_{N'_i}$. In order to achieve the minimum energy $E[i, j]$, the energy consumption of (a) must also be an optimal sub-solution $E[i-1, l]$. Since $l$ can be any value in the range $i-1 \le l \le j-1$, $E[i, j]$ must also be the minimum value of $E[i-1, l] + E_{N'_i}$ over all these possible values of $l$. That is, $E[i, j] = \min_{i-1 \le l \le j-1} \{E[i-1, l] + E_{N'_i}\}$. Any optimal sub-solution $E[i, j]$ can be derived from other optimal sub-solutions $E[i-1, l]$. Therefore, the problem has an optimal sub-structure and a dynamic programming approach is appropriate. It is illustrated in Fig. 6. Matrix $E[i, j]$ is initialized to $\infty$ for all $0 \le i, j \le M$.
We define $E[0, 0] = 0$, and it can be used to compute the first row $E[1, j]$, $j = 1, 2, \ldots, M$. For any entry $E[i, j]$, its value can be computed from entries in the previous row $E[i-1, l]$, $i-1 \le l \le j-1$. These entries are shaded in Fig. 6. Thus, a series of optimal sub-solutions $E[2, j], E[3, j], \ldots, E[M, j]$ in each row of the matrix can be computed subsequently. Finally, these sub-solutions lead to the global optimal solution $\min_i \{E[i, M]\}$, which maps all $M$ original nodes onto a new partitioning with minimum energy. Note that the same algorithm can also solve the optimal partitioning onto a fixed number of nodes. For example, $E[i, M]$ is the optimal energy for mapping $M$ nodes onto an arbitrary $i$-node new partitioning.

Figure 5: The optimal sub-structure of Problem 1. The $i$-node optimal sub-partitioning of $j$ original nodes with minimum energy $E[i, j]$ consists of (a) a sub-partitioning that maps $l$ nodes $N_1, \ldots, N_l$ onto $i-1$ new nodes $N'_1, \ldots, N'_{i-1}$ with minimum energy $E[i-1, l]$, and (b) the last new node $N'_i$ that combines nodes $N_{l+1}, \ldots, N_j$ with energy $E_{N'_i}$.

Figure 6: The dynamic programming approach to solve Problem 1. Each entry $E[i, j]$ can be computed from the shaded entries $E[i-1, l]$, $l = i-1, \ldots, j-1$, in the previous row. The global optimal energy $E_{opt} = \min_{i = 1, 2, \ldots, M} \{E[i, M]\}$ is the minimum value of the last column.

To summarize, the optimal cost function $E$ is defined as follows:

$$E[i, j] = \begin{cases} 0 & \text{for } i = j = 0 \\ \min\limits_{i-1 \le l \le j-1} \left\{ E[i-1, l] + E_{N'_i} \right\} & \text{if } U[i-1, l] + \dfrac{W_{s_j}}{F_{s_j} D} \le 1, \text{ for } 1 \le i \le j \le M \\ \infty & \text{otherwise} \end{cases} \qquad (9)$$

To guarantee that each optimal sub-solution is schedulable, by Lemma 3, the communication medium must be collision-free, and no node in the new sub-partitioning may be overloaded. We define a utilization matrix $U[i, j]$ indicating the utilization of the communication medium corresponding to the optimal solution of a sub-problem $P[i, j]$, which is guarded by $U[i, j] \le 1$ (Lemma 1). $U$ is initialized to $\infty$, while setting $U[0, 0] = \frac{W_{r_1}}{F_{r_1} D}$ ($= \frac{T_0}{D}$ in (7)), indicating the bandwidth used by the first communication instance $RECV_1$. We also define the energy consumption of a node $N'_i$ as $E_{N'_i}$, which refines (5) by Lemma 2. If a node is overloaded, then its energy consumption is $\infty$, indicating an invalid solution.
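The recurrence (9), together with the utilization guard and the overload rule just described, can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names, 0-based indexing, the use of callables `p_comm` and `p_cpu` for the scaling functions, and the returned list of half-open index ranges are all assumptions of this sketch.

```python
import math


def node_energy(w_r, f_r, w_s, f_s, w_p, deadline, f_max, p_comm, p_cpu, p_ovh):
    """Energy of one (possibly merged) node; infinity if it is overloaded."""
    t_r, t_s = w_r / f_r, w_s / f_s
    t_p = deadline - t_r - t_s
    if t_p <= 0 or w_p / t_p > f_max:
        return math.inf
    f_p = w_p / t_p  # the job is stretched to finish exactly at the deadline
    return (p_comm(f_r) * t_r + p_comm(f_s) * t_s
            + p_cpu(f_p) * t_p + p_ovh * deadline)


def optimal_partition(w_r, w_s, w_p, f_r, f_s, deadline, f_max,
                      p_comm, p_cpu, p_ovh):
    """Return (minimum energy, groups); groups are (start, end) ranges of
    original nodes, 0-based and half-open, merged into one new node each."""
    M = len(w_p)
    E = [[math.inf] * (M + 1) for _ in range(M + 1)]
    U = [[math.inf] * (M + 1) for _ in range(M + 1)]
    P = [[None] * (M + 1) for _ in range(M + 1)]
    E[0][0] = 0.0
    U[0][0] = w_r[0] / f_r[0] / deadline  # first receive instance
    for i in range(1, M + 1):             # number of new nodes
        for j in range(i, M + 1):         # number of original nodes covered
            for l in range(i - 1, j):     # originals l..j-1 form new node i
                if math.isinf(E[i - 1][l]):
                    continue
                e = E[i - 1][l] + node_energy(
                    w_r[l], f_r[l], w_s[j - 1], f_s[j - 1], sum(w_p[l:j]),
                    deadline, f_max, p_comm, p_cpu, p_ovh)
                u = U[i - 1][l] + w_s[j - 1] / f_s[j - 1] / deadline
                if u <= 1.0 and e < E[i][j]:
                    E[i][j], U[i][j], P[i][j] = e, u, l
    best_i = min(range(1, M + 1), key=lambda i: E[i][M])
    if math.isinf(E[best_i][M]):
        return math.inf, []               # no schedulable partitioning
    groups, i, j = [], best_i, M
    while i > 0:                          # walk the recorded choices back
        l = P[i][j]
        groups.append((l, j))
        i, j = i - 1, l
    groups.reverse()
    return E[best_i][M], groups
```

In this sketch a larger per-node overhead `p_ovh` pushes the result towards fewer, merged nodes, while a steeper CPU power curve pushes it towards more nodes running at lower clock rates.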

partitioning(W_r[1:M], W_s[1:M], W_p[1:M], F_r[1:M], F_s[1:M], scale_r, scale_s, scale_p, D, P_ovh)
    for i := 0 to M do
        for j := 0 to M do
            E[i, j] := ∞; U[i, j] := ∞; P[i, j] := 0
    E[0, 0] := 0
    U[0, 0] := W_r[1] / F_r[1] / D
    for i := 1 to M do
        for j := i to M do
            for l := i-1 to j-1 do
                e := E[i-1, l] + E_N'
                u := U[i-1, l] + W_s[j] / F_s[j] / D
                if u ≤ 1 and e < E[i, j] then
                    E[i, j] := e; U[i, j] := u; P[i, j] := l
    E_opt, P_opt := retrieve from matrices E, P
    return E_opt, P_opt

Figure 7: Optimal partitioning algorithm.

$$U[i, j] = \begin{cases} \dfrac{W_{r_1}}{F_{r_1} D} & \text{for } i = j = 0 \\ U[i-1, l] + \dfrac{W_{s_j}}{F_{s_j} D} & \text{for the } l \text{ that achieves } \min\{E[i, j]\} \text{ in (9), for } 1 \le i \le j \le M \end{cases} \qquad (10)$$

$$E_{N'_i} = \begin{cases} scale_r(F'_r)\, T'_r + scale_s(F'_s)\, T'_s + scale_p(F'_p)\, T'_p + P_{ovh} D & \text{if } F'_p = \dfrac{W'_p}{D - T'_r - T'_s} \le F_{max} \quad \left(T'_r = \dfrac{W'_r}{F'_r},\ T'_s = \dfrac{W'_s}{F'_s}\right) \\ \infty & \text{otherwise} \end{cases} \qquad (11)$$

Fig. 7 shows the optimal partitioning algorithm derived from (9) and (10). The partitioning matrix $P[i, j]$ records the previous optimal sub-solution for each sub-problem. This information can be used to retrieve the optimal partitioning $P_{opt}$. The time complexity of this algorithm is $O(M^3)$, determined by the three-level nested loop.

Problem 2 (Optimal Speed Selection) Given (a) a fixed partitioning scheme with $M$ pipelined nodes $N_i$ with workload $W_{p_i}$, $W_{r_i}$, $W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the $C$ available choices for communication speed settings $F_{c_k}$, $k = 1, 2, \ldots, C$, find all processor speeds $F_{p_i}$ and communication speeds $F_{r_i}$, $F_{s_i}$ that minimize energy $E_{sys}$.

We also perform dynamic programming as opposed to exhaustive search in the $O(C^{M+1})$ solution space. Since communication speeds decide processor speeds, we only select communication speeds for each node. Given that the sending speed and receiving speed are equal for each communication instance, selecting only the sending speed is sufficient.

Figure 8: The optimal sub-structure of Problem 2. The first $i$ nodes, where the last sending speed is $F_{s_i} = F_{c_k}$ with minimum energy $E[i, k]$, consist of (a) a sub speed selection problem where node $N_{i-1}$'s sending speed is selected as $F_{s_{i-1}} = F_{c_m}$ with minimum energy $E[i-1, m]$, and (b) the last node $N_i$, whose receiving speed is $F_{c_m}$ and sending speed is $F_{c_k}$, with energy $E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})$.
Figure 9: The dynamic programming approach to solve Problem 2. Each entry $E[i, k]$ can be computed from the shaded row $E[i-1, m]$. The global optimal energy $E_{opt} = \min_{k = 1, 2, \ldots, C} \{E[M, k]\}$ is the minimum value of the last row.

We define a sub-problem $S[i, k]$ that selects communication speeds for the first $i$ nodes, with the last node $N_i$'s sending speed selected to be the $k$-th choice of speed settings, $F_{s_i} = F_{c_k}$. Its optimal sub-solution has minimum energy $E[i, k]$. As illustrated in Fig. 8, a sub-problem $S[i, k]$ consists of two parts: (a) another sub-problem $S[i-1, m]$ that selects speed settings for the first $i-1$ nodes with node $N_{i-1}$'s sending speed $F_{s_{i-1}} = F_{c_m}$, combined with (b) node $N_i$ with receiving speed $F_{r_i} = F_{c_m}$ and sending speed $F_{s_i} = F_{c_k}$. (a) must be an optimal sub-solution with minimum energy $E[i-1, m]$. (b) has only one node $N_i$ that receives data from (a) at speed $F_{c_m}$; its sending speed is $F_{c_k}$. Its energy is denoted as $E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})$. Therefore, $E[i, k] = E[i-1, m] + E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})$. In the sub-problem $S[i-1, m]$, $F_{c_m}$ can be any choice among $F_{c_1}, F_{c_2}, \ldots, F_{c_C}$. In order to achieve the minimum energy $E[i, k]$, it must be the minimum value among all possible $F_{c_m}$. That is, the optimal sub-structure of this problem can be defined as

$$E[i, k] = \min_{1 \le m \le C} \left\{ E[i-1, m] + E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k}) \right\}$$

The dynamic programming algorithm is illustrated in Fig. 9. Since each $E[i, k]$ can be derived from the previous row $E[i-1, m]$, $m = 1, 2, \ldots, C$, the algorithm can compute all rows of matrix $E$ from $E[0, k], E[1, k], \ldots$, to $E[M, k]$, $k = 1, 2, \ldots, C$, sequentially. The global optimal energy is the minimum value in the last row, $\min_k \{E[M, k]\}$.

The energy matrix $E[i, k]$ and utilization matrix $U[i, k]$ are defined as follows. $U[i, k]$ guarantees that each optimal sub-solution $E[i, k]$ is schedulable. Both $E$ and $U$ are initialized to $\infty$, except that $E[0, k] = 0$ and $U[0, k]$ is set to the utilization of the first communication instance $RECV_1$ using communication speed $F_{c_k}$, for $k = 1, 2, \ldots, C$.

speedselection(W_r[1:M], W_s[1:M], W_p[1:M], F_c[1:C], scale_r, scale_s, scale_p, D, P_ovh)
    for i := 0 to M do
        for k := 1 to C do
            E[i, k] := ∞; U[i, k] := ∞; S[i, k] := 0
    for k := 1 to C do
        E[0, k] := 0
        U[0, k] := W_r[1] / F_c[k] / D
    for i := 1 to M do
        for k := 1 to C do
            for m := 1 to C do
                e := E[i-1, m] + E_N(F_r = F_c[m], F_s = F_c[k])
                u := U[i-1, m] + W_s[i] / F_c[k] / D
                if u ≤ 1 and e < E[i, k] then
                    E[i, k] := e; U[i, k] := u; S[i, k] := m
    E_opt, S_opt := retrieve from matrices E, S
    return E_opt, S_opt

Figure 10: Optimal speed selection algorithm.

$$E[i, k] = \begin{cases} 0 & \text{for } i = 0,\ k = 1, 2, \ldots, C \\ \min\limits_{1 \le m \le C} \left\{ E[i-1, m] + E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k}) \right\} & \text{if } U[i-1, m] + \dfrac{W_{s_i}}{F_{c_k} D} \le 1, \text{ for } 1 \le i \le M,\ 1 \le k \le C \\ \infty & \text{otherwise} \end{cases} \qquad (12)$$

$$U[i, k] = \begin{cases} \dfrac{W_{r_1}}{F_{c_k} D} & \text{for } i = 0 \\ U[i-1, m] + \dfrac{W_{s_i}}{F_{c_k} D} & \text{for the } m \text{ that achieves } \min\{E[i, k]\} \text{ in (12), for } 1 \le i \le M \end{cases} \qquad (13)$$

The algorithm is shown in Fig. 10. The speed matrix $S$ records the previous optimal sub-solutions. The optimal speed setting $S_{opt}$ will be retrieved from $S$. The time complexity of this algorithm is $O(MC^2)$. Note that the algorithm can be modified trivially if the first communication speed $F_{r_1}$ and the last communication speed $F_{s_M}$ are fixed. This refers to the situation where the pipelined multi-processor has a fixed communication speed setting to other components while its internal communication speeds can be selected optimally.

Problem 3 (Optimal Partitioning and Speed Selection) Given (a) $M$ pipelined nodes $N_i$ with workload $W_{p_i}$, $W_{r_i}$, $W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the $C$ available choices for communication speed settings $F_{c_k}$, $k = 1, 2, \ldots, C$, find a partitioning scheme and corresponding communication speed settings that minimize energy $E_{sys}$.

Due to the inter-dependency between speed settings and partitioning schemes, the optimal solution cannot be achieved by solving the two previous problems individually. Exhaustively enumerating over one dimension and dynamic programming over the other is quite expensive, with a time complexity of either $O(2^M M C^2)$ or $O(C^{M+1} M^3)$. We propose a multi-dimensional dynamic programming algorithm, given the fact that the two previous problems are both characterized by optimal sub-structures.
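Before turning to the combined formulation, the speed-selection recurrence of Problem 2 can be sketched in Python. This is a minimal sketch under stated assumptions: the names `select_speeds` and `node_energy`, 0-based indexing, and the callables `p_comm` and `p_cpu` standing in for the scaling functions are choices made for this example and do not come from the paper.

```python
import math


def node_energy(w_r, f_r, w_s, f_s, w_p, deadline, f_max, p_comm, p_cpu, p_ovh):
    """Energy of one node; infinity if the node cannot meet the deadline."""
    t_r, t_s = w_r / f_r, w_s / f_s
    t_p = deadline - t_r - t_s
    if t_p <= 0 or w_p / t_p > f_max:
        return math.inf
    f_p = w_p / t_p
    return (p_comm(f_r) * t_r + p_comm(f_s) * t_s
            + p_cpu(f_p) * t_p + p_ovh * deadline)


def select_speeds(w_r, w_s, w_p, speeds, deadline, f_max, p_comm, p_cpu, p_ovh):
    """Return (minimum energy, chosen); chosen[0] is the first receive speed
    and chosen[i] is the sending speed of node i (1-based) for i >= 1."""
    M, C = len(w_p), len(speeds)
    E = [[math.inf] * C for _ in range(M + 1)]
    U = [[math.inf] * C for _ in range(M + 1)]
    S = [[None] * C for _ in range(M + 1)]
    for k in range(C):
        E[0][k] = 0.0
        U[0][k] = w_r[0] / speeds[k] / deadline   # first receive instance
    for i in range(1, M + 1):
        for k in range(C):            # sending speed of node i
            for m in range(C):        # receiving speed of node i
                if math.isinf(E[i - 1][m]):
                    continue
                e = E[i - 1][m] + node_energy(
                    w_r[i - 1], speeds[m], w_s[i - 1], speeds[k], w_p[i - 1],
                    deadline, f_max, p_comm, p_cpu, p_ovh)
                u = U[i - 1][m] + w_s[i - 1] / speeds[k] / deadline
                if u <= 1.0 and e < E[i][k]:
                    E[i][k], U[i][k], S[i][k] = e, u, m
    k = min(range(C), key=lambda c: E[M][c])
    if math.isinf(E[M][k]):
        return math.inf, []           # no schedulable speed setting
    best, chosen = E[M][k], [speeds[k]]
    for i in range(M, 0, -1):         # walk the recorded choices back
        k = S[i][k]
        chosen.append(speeds[k])
    chosen.reverse()
    return best, chosen
```

The three nested loops over nodes and speed choices give the $O(MC^2)$ running time stated above.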
Based on the dynamic programming approaches to the previous problems, we define a sub-problem PS[i,j,k] that maps original nodes N_1, N_2, ..., N_j onto an i-node new sub-partitioning N'_1, N'_2, ..., N'_i, with the last node N'_i's sending speed F'_si = F_ck. The optimal sub-solution has minimum energy E[i,j,k]. Similar to the previous problems, a sub-problem PS[i,j,k] can be decomposed with an optimal sub-structure, shown in Fig. 11. Part (a) is a previous sub-problem PS[i-1,l,m], which maps the first l original nodes N_1, N_2, ..., N_l onto i-1 new nodes, with node N'_(i-1)'s sending speed selected as F_cm. Part (b) is the new node N'_i, which combines original nodes N_(l+1), ..., N_j, with receiving speed F_cm and sending speed F_ck. Part (a) must be an optimal sub-solution with the minimum energy E[i-1,l,m]. Note that (b) has only one node N'_i, and its energy is denoted as E_N'i(F_r = F_cm, F_s = F_ck). For sub-solution E[i-1,l,m], l can be any value in the range i-1 ≤ l ≤ j-1, and F_cm is one of the speed choices F_c1, F_c2, ..., F_cC. E[i,j,k] must be derived from all possible pairs (l,m) to achieve the minimum value. Therefore,

\[
E[i,j,k] = \min_{i-1 \le l \le j-1,\ 1 \le m \le C} \left\{ E[i-1,l,m] + E_{N'_i}(F_r = F_{cm}, F_s = F_{ck}) \right\}
\]

The algorithm is illustrated in Fig. 12. The three-dimensional matrix E[i,j,k] is represented by a series of two-dimensional sub-matrices indexed by i = 0,1,...,M. Any E[i,j,k] can be computed from entries in the sub-matrix E[i-1,l,m], i-1 ≤ l ≤ j-1, 1 ≤ m ≤ C. The algorithm constructs all optimal sub-solutions from E[0,j,k], E[1,j,k], ..., to E[M,j,k], for all j and k. The global minimum energy is min_{i,k} {E[i,M,k]}, that is, the minimum value over the last rows of all sub-matrices. The energy matrix E[i,j,k] and the utilization matrix U[i,j,k] are defined as follows.

Figure 11: The optimal sub-structure of Problem 3. An i-node optimal sub-partitioning of the original nodes N_1, ..., N_j, where the last sending speed is F'_si = F_ck, has minimum energy E[i,j,k]. It consists of (a) a sub-partitioning that maps l nodes N_1, ..., N_l onto i-1 new nodes N'_1, ..., N'_(i-1), where node N'_(i-1)'s sending speed is selected as F'_s(i-1) = F_cm, with minimum energy E[i-1,l,m], and (b) the last new node N'_i, which combines nodes N_(l+1), ..., N_j, whose receiving speed is F_cm and sending speed is F_ck, with energy E_N'i(F_r = F_cm, F_s = F_ck).

Figure 12: The multi-dimensional dynamic programming approach to solve Problem 3. Each entry E[i,j,k] can be computed from the shaded entries in the previous sub-matrix, E[i-1,l,m] with l = i-1, ..., j-1 and m = 1,2,...,C. The global optimal energy is the minimum value in the last row of all sub-matrices, E_opt = min{E[i,M,k]}, i = 1,2,...,M, k = 1,2,...,C.

\[
E[i,j,k] =
\begin{cases}
0, & \text{for } i = j = 0 \\
\min_{i-1 \le l \le j-1,\ 1 \le m \le C} \left\{ E[i-1,l,m] + E_{N'_i}(F_r = F_{cm}, F_s = F_{ck}) \right\}
\ \text{if } U[i-1,l,m] + \dfrac{W_{sj}}{F_{ck} D} \le 1, & \text{for } 1 \le i \le j \le M
\end{cases}
\tag{4}
\]

\[
U[i,j,k] =
\begin{cases}
\dfrac{W_{r1}}{F_{ck} D}, & \text{for } i = j = 0 \\
U[i-1,l,m] + \dfrac{W_{sj}}{F_{ck} D} \ \text{for the } (l,m) \text{ that achieve } \min\{E[i,j,k]\} \text{ in (4)}, & \text{for } 1 \le i \le j \le M
\end{cases}
\tag{5}
\]

The algorithm is shown in Fig. 13. It combines the two previous algorithms by two-dimensional dynamic programming. The time complexity of the algorithm is O(M^3 C^2). It also applies to situations where the new partitioning has a fixed number of nodes, or where the pipeline has a fixed communication interface to other components and only the internal communication speeds can be selected.
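The combined partitioning and speed selection recurrence can be sketched in Python along the same lines. This is an illustrative sketch under assumptions rather than the paper's code: indices are 0-based, the energy of a merged node is supplied by a hypothetical callback `merged_energy(l, j, f_r, f_s)` covering original nodes l..j-1, and the demo cost model (quadratic computation energy plus a fixed per-node overhead) is invented purely to exercise the search.

```python
import math
from typing import Callable, List, Tuple


def partition_and_select(
    w_r: List[float],
    w_s: List[float],
    speeds: List[float],
    deadline: float,
    merged_energy: Callable[[int, int, float, float], float],
) -> Tuple[float, List[Tuple[int, int]], List[int]]:
    """Return (min energy, groups as half-open (start, end) ranges, link speed indices)."""
    M, C = len(w_s), len(speeds)
    INF = math.inf
    # E[i][j][k]: first j original nodes mapped onto i new nodes,
    # with the last new node sending at speeds[k].
    E = [[[INF] * C for _ in range(M + 1)] for _ in range(M + 1)]
    U = [[[INF] * C for _ in range(M + 1)] for _ in range(M + 1)]
    back = [[[None] * C for _ in range(M + 1)] for _ in range(M + 1)]

    for k in range(C):
        E[0][0][k] = 0.0
        U[0][0][k] = w_r[0] / speeds[k] / deadline

    for i in range(1, M + 1):
        for j in range(i, M + 1):
            for k in range(C):
                for l in range(i - 1, j):
                    for m in range(C):
                        e = E[i - 1][l][m] + merged_energy(l, j, speeds[m], speeds[k])
                        u = U[i - 1][l][m] + w_s[j - 1] / speeds[k] / deadline
                        if u <= 1.0 and e < E[i][j][k]:
                            E[i][j][k], U[i][j][k] = e, u
                            back[i][j][k] = (l, m)

    best, arg = INF, None
    for i in range(1, M + 1):               # last row of every sub-matrix
        for k in range(C):
            if E[i][M][k] < best:
                best, arg = E[i][M][k], (i, k)
    if arg is None:
        return INF, [], []

    (i, k), j = arg, M
    groups, links = [], [k]
    while i > 0:                            # trace back through (l, m) pairs
        l, m = back[i][j][k]
        groups.append((l, j))
        links.append(m)
        i, j, k = i - 1, l, m
    groups.reverse()
    links.reverse()
    return best, groups, links


# Hypothetical demo: three original nodes with invented workloads and costs.
ENERGY_PER_UNIT = {1.0: 3.0, 2.0: 1.0}
W_P, W_R, W_S = [2.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
OVERHEAD = 8.0


def demo_merged_energy(l: int, j: int, f_r: float, f_s: float) -> float:
    comm = ENERGY_PER_UNIT[f_r] * W_R[l] + ENERGY_PER_UNIT[f_s] * W_S[j - 1]
    return comm + sum(W_P[l:j]) ** 2 + OVERHEAD


best, groups, links = partition_and_select(W_R, W_S, [1.0, 2.0], 10.0, demo_merged_energy)
print(best, groups, links)  # prints: 45.0 [(0, 2), (2, 3)] [1, 1, 1]
```

In this invented instance neither extreme wins: one node costs 59, three nodes cost 47, and merging the first two original nodes while keeping the third separate costs 45, which illustrates how an intermediate partitioning can balance computation energy against communication and overhead.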
    partitioning-speedselection(W_r[1:M], W_s[1:M], W_p[1:M], F_c[1:C], Scale_r, Scale_s, Scale_p, D, P_ovh)
      for i := 0 to M do
        for j := 0 to M do
          for k := 1 to C do
            E[i,j,k] := ∞
            U[i,j,k] := ∞
            P[i,j,k] := 0
            S[i,j,k] := 0
      for k := 1 to C do
        E[0,0,k] := 0
        U[0,0,k] := W_r[1]/F_c[k]/D
      for i := 1 to M do
        for j := i to M do
          for k := 1 to C do
            for l := i-1 to j-1 do
              for m := 1 to C do
                e := E[i-1,l,m] + E_node(merge(N_(l+1), ..., N_j), with F_r = F_c[m], F_s = F_c[k])
                u := U[i-1,l,m] + W_s[j]/F_c[k]/D
                if u ≤ 1 and e < E[i,j,k] then
                  E[i,j,k] := e
                  U[i,j,k] := u
                  P[i,j,k] := l
                  S[i,j,k] := m
      E_opt, P_opt, S_opt := retrieve from matrices E, P, S
      return E_opt, P_opt, S_opt

Figure 13: Combined partitioning with speed selection.

7 Analytical Results

To evaluate our energy optimization technique, we experimented with mapping the ATR algorithm [14] (Fig. 3) onto two fixed partitioning schemes: (a) a single node that combines all blocks, and (b) a five-node pipeline that maps each block onto an individual node. Schemes (a) and (b) are two extremes representing serial vs. parallel schemes. For both (a) and (b) we apply optimal speed selection. We also find the optimal partitioning with speed selection as (c) and compare it with (a) and (b) under three types of performance requirements, each expressed as a deadline D: (1) high performance, (2) moderate performance, and (3) low performance.

Each node consists of an XScale processor and an LXT Ethernet interface from Intel. The Scale_p and Scale_s (same as Scale_r) functions, which indicate the power vs. performance characteristics of a node, are extracted from their data sheets [2, 3] and are shown in Fig. 14 and 15. Besides the power draw from the CPU and communication interfaces, we assume each node has a constant power draw P_ovh.

The results are presented in Fig. 16. In all cases, 1000Mbps is the optimal speed setting for communication. The low-power, 10Mbps communication speed results in the highest energy. This is because it leaves so little time for computation that the processors must run faster, with more energy, to meet the deadline, and because it has the highest energy-per-bit rating. Low-speed communication also tends to violate the schedulability conditions (Lemma 3).
Figure 16: Analytical results. Energy per frame (mJ), broken down into computation, communication, and overhead, for (a) the 1-node scheme, (b) the 5-node scheme, and (c) the optimal partitioning, under (1) high, (2) moderate, and (3) low performance requirements.

Given the properties of this particular Ethernet interface, 1000Mbps communication will always lead to the lowest energy consumption, since it requires the least amount of energy per bit and leaves the maximum time budget for reducing CPU energy. However, in cases where the energy-per-bit rating does not decrease monotonically with the communication speed, the optimal speed setting may involve some combination of low-speed and high-speed settings between different nodes. For example, node N_i may communicate with N_(i-1) at one speed and with N_(i+1) at another.

Fig. 16(1) shows the energy consumption per image frame in the three partitioning schemes. With a tight performance constraint, the single node (a) is heavily loaded with computation. Therefore it is desirable to reduce CPU energy by pipelining. As a result, the five-node pipeline (b) is more energy-efficient, at the cost of additional communication and overhead. However, the optimal partitioning is (c), with three nodes: [N1,N2], [N3,N4], [N5]. It consumes more CPU energy than (b), but overall it is optimal because it spends less energy on communication and overhead.

Figure 14: Power vs. performance of the XScale processor.

Figure 15: Power modes of the Ethernet interface (speed mode vs. power consumption).

In the case of the moderate performance constraint (Fig. 16(2)), (a) is still dominated by computation, but it is not heavily loaded due to the relaxed deadline. The reduction of CPU energy by (b) cannot compensate for the added overhead of new nodes and communication. Therefore (a) is better than (b), and pipelining seems inefficient. However, the optimal partitioning (c) is still a pipelined solution. It combines N1, N2, N3, N4 into one node and maps N5 to another node. Scheme (c) achieves minimum energy by appropriately balancing computation and communication against pipelining overhead.
In cases where performance is not critical, pipelining is not efficient and the serial solution (a) is optimal. Fig. 16(3) shows that the computation load on (a) is very light. Introducing additional nodes would save only marginal CPU energy, which would be offset by the extra communication and overhead.
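The energy-per-bit argument used in this section can be checked with a few lines of Python. The sketch below is only an illustration: the power figures are made-up placeholders, not the values of the Ethernet interface studied here, and the function names are hypothetical. It computes energy per bit as power divided by data rate and tests whether that rating decreases monotonically with speed, which is the condition under which the fastest mode is always preferred.

```python
from typing import Dict


def energy_per_bit(modes: Dict[int, float]) -> Dict[int, float]:
    """Map each rate (Mbps) to energy per bit (uJ/bit) given power in watts."""
    return {rate: power / rate for rate, power in modes.items()}


def decreases_with_speed(modes: Dict[int, float]) -> bool:
    """True if every faster mode has a strictly lower energy-per-bit rating."""
    epb = energy_per_bit(modes)
    rates = sorted(modes)
    return all(epb[slow] > epb[fast] for slow, fast in zip(rates, rates[1:]))


# Placeholder power draws in watts for three speed modes (assumed values).
monotonic_modes = {10: 0.5, 100: 1.0, 1000: 5.0}
# A contrived interface whose slowest mode is unusually frugal.
mixed_modes = {10: 0.05, 100: 1.0, 1000: 5.0}

print(decreases_with_speed(monotonic_modes))  # prints: True
print(decreases_with_speed(mixed_modes))      # prints: False
```

When the check returns False, the fastest mode is no longer guaranteed to be best on every link, which is the situation where the dynamic program may mix low-speed and high-speed settings between different nodes.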

8 Conclusion

We present a combined partitioning and speed selection technique for the energy optimization of embedded multiprocessor-on-chip architectures with high-speed on-chip networks. As communication power approaches or surpasses processor power, communication must be treated as a primary concern in system-level energy optimization. We exploit the multi-speed feature of modern high-speed communication interfaces as an effective way to complement and enhance today's CPU-centric power optimization approaches. In such systems, communication and computation compete for opportunities to operate at the most energy-efficient points. It is critical not only to balance the load among processors by functional partitioning, but also to balance the speeds between communication and computation on each node and across the whole system. Our multi-dimensional dynamic programming formulation is exact and has polynomial time complexity. It produces energy-optimal solutions, defined by a partitioning scheme and by the speed selections for all computation and communication tasks. We expect this technique to be applicable to a large class of data-dominated systems-on-chip that can be structured in a pipelined organization.

References

[1] The Alchemy Au from AMD: Internet edge processor. info/- au/index.html.
[2] INTEL Ethernet PHYs/transceivers. intel.com/design/network/products/ethernet/- linecard ept.htm.
[3] INTEL XScale microarchitecture. intel.com/design/intelxscale/.
[4] N. K. Bambha, S. S. Bhattacharyya, J. Teich, and E. Zitzler. Hybrid global/local search strategies for dynamic voltage scaling in embedded multiprocessors. In Proc. International Symposium on Hardware/Software Codesign, pages 243–248, 2001.
[5] L. Benini and G. De Micheli. Networks on chips: a new SoC paradigm. IEEE Computer, 35(1):70–78, Jan. 2002.
[6] R. Cherabuddi, M. Bayoumi, and H. Krishnamurthy. A low power based system partitioning and binding technique for multi-chip module architectures. In Proc. Great Lakes Symposium on VLSI, pages 156–162, 1997.
[7] P. Eles, A. Doboli, P. Pop, and Z. Peng. Scheduling with bus access optimization for distributed embedded systems. IEEE Transactions on VLSI Systems, 8(5):472–491, 2000.
[8] E. Hwang, F. Vahid, and Y.-C. Hsu. FSMD functional partitioning for low power. In Proc. Design, Automation and Test in Europe, pages 22–28, 1999.
[9] P. V. Knudsen and J. Madsen. Integrating communication protocol selection with hardware/software codesign. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(8):1077–1095, August 1999.
[10] K. Lahiri, A. Raghunathan, and G. Lakshminarayana. LOTTERYBUS: a new high-performance communication architecture for system-on-chip designs. In Proc. Design Automation Conference, pages 15–20, June 2001.
[11] J. Luo and N. K. Jha. Battery-aware static scheduling for distributed real-time embedded systems. In Proc. Design Automation Conference, June 2001.
[12] R. Ortega and G. Borriello. Communication synthesis for distributed embedded systems. In Proc. International Conference on Computer-Aided Design, 1998.
[13] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, and A. Sangiovanni-Vincentelli. Addressing the system-on-a-chip interconnect woes through communication-based design. In Proc. Design Automation Conference, June 2001.
[14] R. Sims. Signal to clutter measurement and ATR performance. Proc. of the SPIE - The International Society for Optical Engineering, 3371(1):13–17, April 1998.
[15] A. Wang and A. Chandrakasan. Energy efficient system partitioning for distributed wireless sensor networks. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pages 905–908, May 2001.
[16] E. F. Weglarz, K. K. Saluja, and M. H. Lipasti. Minimizing energy consumption for high-performance processing. In Proc. Asian and South Pacific Design Automation Conference, pages 199–204, 2002.
[17] W. Wolf. An architectural co-synthesis algorithm for distributed embedded computing systems. IEEE Transactions on VLSI Systems, pages 218–229, June 1997.

Combined Functional Partitioning and Communication Speed Selection for Networked Voltage-Scalable Processors

Combined Functional Partitioning and Communication Speed Selection for Networked Voltage-Scalable Processors Combned Functonal Parttonng and Communcaton Speed Selecton for Networked Voltage-Scalable Processors Jnfeng Lu, Pa H. Chou, Nader Bagherzadeh epartment of Electrcal & Computer Engneerng Unversty of Calforna,

More information

Combined Functional Partitioning and Communication Speed Selection for Networked Voltage-Scalable Processors Λ

Combined Functional Partitioning and Communication Speed Selection for Networked Voltage-Scalable Processors Λ Combned Functonal Parttonng and Communcaton Speed Selecton for Networked Voltage-Scalable Processors Λ Jnfeng Lu, Pa H. Chou, Nader Bagherzadeh epartment of Electrcal & Computer Engneerng Unversty of Calforna,

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor

Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh epartment of Electrical & Computer Engineering University

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach Dstrbuted Resource Schedulng n Grd Computng Usng Fuzzy Approach Shahram Amn, Mohammad Ahmad Computer Engneerng Department Islamc Azad Unversty branch Mahallat, Iran Islamc Azad Unversty branch khomen,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Routing in Degree-constrained FSO Mesh Networks

Routing in Degree-constrained FSO Mesh Networks Internatonal Journal of Hybrd Informaton Technology Vol., No., Aprl, 009 Routng n Degree-constraned FSO Mesh Networks Zpng Hu, Pramode Verma, and James Sluss Jr. School of Electrcal & Computer Engneerng

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

Needed Information to do Allocation

Needed Information to do Allocation Complexty n the Database Allocaton Desgn Must tae relatonshp between fragments nto account Cost of ntegrty enforcements Constrants on response-tme, storage, and processng capablty Needed Informaton to

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

Cost-efficient deployment of distributed software services

Cost-efficient deployment of distributed software services 1/30 Cost-effcent deployment of dstrbuted software servces csorba@tem.ntnu.no 2/30 Short ntroducton & contents Cost-effcent deployment of dstrbuted software servces Cost functons Bo-nspred decentralzed

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

A Frame Packing Mechanism Using PDO Communication Service within CANopen

A Frame Packing Mechanism Using PDO Communication Service within CANopen 28 A Frame Packng Mechansm Usng PDO Communcaton Servce wthn CANopen Mnkoo Kang and Kejn Park Dvson of Industral & Informaton Systems Engneerng, Ajou Unversty, Suwon, Gyeongg-do, South Korea Summary The

More information

Efficient Content Distribution in Wireless P2P Networks

Efficient Content Distribution in Wireless P2P Networks Effcent Content Dstrbuton n Wreless P2P Networs Qong Sun, Vctor O. K. L, and Ka-Cheong Leung Department of Electrcal and Electronc Engneerng The Unversty of Hong Kong Pofulam Road, Hong Kong, Chna {oansun,

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics A Hybrd Genetc Algorthm for Routng Optmzaton n IP Networks Utlzng Bandwdth and Delay Metrcs Anton Redl Insttute of Communcaton Networks, Munch Unversty of Technology, Arcsstr. 21, 80290 Munch, Germany

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information
