Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor
Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh
Department of Electrical & Computer Engineering
University of California, Irvine, CA, USA
{jinfengl, chou, nader}@ece.uci.edu

Abstract

High-speed serial network interfaces are becoming the primary way for modern embedded systems and systems-on-chip to connect with each other and with peripheral devices. Modern communication interfaces are capable of operating at multiple speeds and are opening a new dimension of trade-offs between computation and communication. Unfortunately, today's CPU-centric techniques often fail to consider multi-speed communication and the balance between communication and computation for time and energy; as a result, they yield sub-optimal if not incorrect designs. This paper presents a new technique for global energy optimization through coordinated functional partitioning and speed selection for the processors and their communication interfaces. We propose a multi-dimensional dynamic programming formulation for energy-optimal functional partitioning with CPU/communication speed selection for a class of data-regular applications under performance constraints. We demonstrate the effectiveness of our optimization techniques with an image processing application mapped onto a multi-processor architecture with a multi-speed Ethernet.

Keywords: communication speed selection, functional partitioning, on-chip networked multi-processor, low-power design

1 Introduction

Towards High-Speed Serial Busses on SoC

A key trend in systems-on-chip is towards the use of high-speed serial busses for system-level interconnect. Serial busses offer many compelling advantages, including modularity, composability, scalability, form factor, and power efficiency [5,, 3]. Modularity and composability are extremely important, because the sheer complexity of these chips forces designers to raise the level of abstraction.
Most SoC designs are done by integration of intellectual property (IP) components as a way to manage complexity while meeting time-to-market deadlines. Serial protocols are well understood and have long been used in automotive control (e.g., CAN from Bosch) and consumer electronics (e.g., I2C from Philips). More recent protocols such as FireWire (IEEE 1394) and USB are commonly used not only for peripheral devices but also for connecting multiple embedded processors. They provide a simple, standardized, efficient, and scalable way of building loosely coupled systems. High-speed serial controllers such as Ethernet are now an integral part of many embedded processors.

(This research was sponsored by DARPA grant F and Printronix Fellowship.)

Serial busses also have power and form factor advantages. From automobiles to computer peripherals, serial interconnects such as FireWire and USB are compact and low power compared to SCSI or parallel, which are bulky, high power, and limited in length. This is especially important for systems-on-chip, where gates are virtually free, but wires are the most expensive part of the chip real estate. Long, parallel, shared wires are not only high power but also suffer from clock skew and even crosstalk as the feature size shrinks. Serial controllers provide a clean abstraction by shielding components from these low-level concerns. Moreover, modern protocols also support plug-and-play and power management features such as subnet shutdown or link suspension. These features and more make high-speed serial protocols an attractive choice for rapid integration of SoC architectures.

Power/Performance Issues with Serial Networks

Of course, serial controllers come at a price. The area and IP licensing will have a cost, but this cost might be justified by time-to-market or other overriding business concerns. In fact, it might be even less of an issue for future IP, which will likely have these serial controllers integrated. For example, AMD's newly announced Au [] is a MIPS-based microcontroller with integrated 10/100-base T Ethernet, USB, and many other I/O.
However, power and performance will become the critical issues, as they directly affect the correctness of the design.

For power optimization, previous efforts focused on the processor for several reasons. The CPU was the main consumer of power, and it also offered the most options for power management, including voltage scaling. However, recent advances in both processors and communication interfaces are driving a shift in how power should be managed. CPU-centric power management has given rise to a new generation of processors with dramatically improved power efficiency, and the CPU is now drawing a smaller percentage of the overall system power. The insatiable demand for bandwidth has also resulted in high-speed communication interfaces. Even though their power efficiency (i.e., energy per bit transmitted) has also been improved, communication power now matches or surpasses the CPU, and is thus a larger fraction of the system power. For instance, the Intel XScale processor consumes .6W at full speed, while a GigaBit Ethernet interface consumes 6W.

System Power Management with Speed Selection

Many communication interfaces today support multiple data rates. However, the scaling effects tend to be opposite to those of voltage-scalable CPUs. For CPUs, slower speed generally means lower power and lower energy per instruction; but for communication, faster speed means higher power but often less energy per bit. This is highly dependent on the specific controller. Few research works to date have explored communication speed as a key parameter for power optimization.

Speed selection cannot be performed for just communication or computation in isolation, because a local decision can have a global impact. One reason is that communication now goes through a shared medium rather than point-to-point. The CPUs cannot all be run at the slowest, most power-efficient speeds, because they must compete for the available time and power with each other and with the communication interfaces. A faster communication speed, even at a higher energy-per-bit, can save energy by creating opportunities for subsystem shutdown or voltage scaling of the processors. Greedily saving communication power may actually result in higher overall energy. At the same time, functional partitioning must be an integral part of the optimization loop, because different partitioning schemes can dramatically alter the communication payload and computation workload for each node.
Approach

For a given workload on a networked architecture, our problem statement is to generate a functional partitioning scheme and to select the speeds of communication interfaces and processors, such that the total energy is minimized. In general, such a problem is extremely difficult. Fortunately, for a class of systems with pipelined tasks under an overall latency constraint, efficient, exact solutions exist. This paper presents a multi-dimensional dynamic programming solution to such a problem. It formulates the energy consumed by the processors and communication interfaces with their power/speed scaling factors within their available time budget. We demonstrate the effectiveness of this technique with an image processing algorithm mapped onto a multi-processor architecture interconnected by a GigaBit Ethernet. This technique is also applicable as a heuristic to general dataflow problems.

2 Related Work

Previous works have explored communication synthesis and optimization in distributed multi-processor systems. [7] presents communication scheduling to work with rate-monotonic tasks, while [7] assumes the more deterministic time-triggered protocol (TTP). [] distributes timing constraints on communication among segments through priority assignment on serial busses (such as control-area network) and customization of device drivers. While these assume a bus or a network protocol, LYCOS [9] integrates the ability to select among several communication protocols (with different delays, data sizes, burstiness) into the main partitioning loop. Although these and many other works can be extended to SoC architectures, they do not specifically optimize for energy minimization by exploiting the processors' voltage scaling capabilities.

Related techniques that optimize for power consumption of processors typically assume a fixed communication data rate. [4] uses simulated heating search strategies to find low-power design points for voltage-scalable embedded processors. [] performs battery-aware task post-scheduling for distributed, voltage-scalable processors by moving tasks to smooth the power profile.
[6, 5] propose partitioning the computation onto a multi-processor architecture that consumes significantly less power than a single processor. [6] reduces switching activities of both functional units and communication links by partitioning tasks onto a multi-chip architecture, while [8] maximizes the opportunity to shut down idle processors through functional partitioning. All these techniques focus on the computational aspect without exploring the speed/power scalability of the communication interfaces.

Existing techniques cannot be readily combined to explore many timing/power trade-offs between computation and communication. The quadratic voltage scaling properties of CPUs do not generalize to communication interfaces. Even if they did, these techniques have not considered the partitioning of power and timing budgets among computation/communication components across the network. Selecting communication attributes by only considering deadlines without power will lead to unexpected, often incorrect results at the system level.

3 System Model

This section defines a system-level performance/energy model for both computation and communication components in a networked on-chip multi-processor architecture. In this paper, a system consists of $M$ processing nodes $N_i$, $i = 1, 2, \ldots, M$, connected by a shared communication medium. Each processing node (or node for short) consists of a processor, a local memory, and one or more communication interfaces that send and/or receive data from other nodes.

3.1 Jobs and Tasks

A processing job assigned to a node has three tasks: RECV, PROC, and SEND, which must be executed serially in that order. RECV and SEND are communication tasks on the interfaces, and PROC is a computation task on the processor. The workload for each task is defined as follows. For communication tasks RECV and SEND, workloads $W_r$ and $W_s$ indicate the number of bits to be received and sent, respectively. For the computation task PROC, the workload $W_p$ is the number of cycles. Let $T_p, T_r, T_s$ denote the delays of tasks PROC, RECV and SEND, respectively. Let $F_p$ denote the clock frequency of the processor, and $F_r$ and $F_s$ the respective data bit rates for receiving and sending. We have

$$T_p = \frac{W_p}{F_p}; \quad T_r = \frac{W_r}{F_r}; \quad T_s = \frac{W_s}{F_s} \qquad (1)$$

(1) is reasonable for processors executing data-dominated programs, where the total cycles $W_p$ can be analyzed and bounded statically. However, it does not hold true in general if the effective data rate can be reduced by collisions and errors on the shared communication medium. We present the collision-free condition of the shared medium in Section 4. To model the non-ideal aspect of the medium, we introduce the communication efficiency terms $\rho_r$ and $\rho_s$, $0 < \rho_r, \rho_s \le 1$, such that $T_r = \frac{W_r}{\rho_r F_r}$ and $T_s = \frac{W_s}{\rho_s F_s}$. Note that $\rho_r$ and $\rho_s$ need not be constants, but may be functions of the communication speeds $F_r, F_s$. For brevity, our experimental results assume an ideal communication medium ($\rho_r = \rho_s = 1$) without loss of generality. A more practical communication model can be directly applied, since $\rho_r$ and $\rho_s$ can be very well bounded for a collision-free medium.

$D$ is a deadline on each processing job, which requires $T_r + T_p + T_s \le D$ for the three serialized tasks. If any slack time exists, then we can slow down task PROC by voltage scaling to reduce energy. Therefore, we assume the job finishes at the deadline. That is,

$$D = T_r + T_p + T_s \qquad (2)$$

3.2 Power Scaling

On each node, we assume only the processor and the communication interfaces are power-manageable by speed selection.
The power consumption by the communication medium is interpreted to be the total power consumed by all active communication interfaces. We assume a processor's voltage-scaling characteristics can be expressed by a scaling function $\mathit{Scale}_p$ that maps the CPU frequency to its power level. A communication interface also has scaling functions that characterize the power levels at different communication data rates for sending and receiving. (2) implies that $\mathit{Scale}_p$ is continuous, while communication interfaces support only a few discrete scaling points. Let $P_p$, $P_r$, and $P_s$ denote the power levels of tasks PROC, RECV and SEND, respectively. Then,

$$P_p = \mathit{Scale}_p(F_p); \quad P_r = \mathit{Scale}_r(F_r); \quad P_s = \mathit{Scale}_s(F_s) \qquad (3)$$

[Figure 1: Timing and power properties of a processing node. (a) Block diagram: RECV receives $W_r$ bits, PROC runs $W_p$ cycles on the processor, SEND sends $W_s$ bits. (b) Timing-power diagram: RECV (delay $T_r = W_r / F_r$, power $P_r$, speed $F_r$), PROC (delay $T_p = W_p / F_p$, power $P_p$, speed $F_p$), SEND (delay $T_s = W_s / F_s$, power $P_s$, speed $F_s$), and OVERHEAD (power $P_{ovh}$).]

Let $P_{ovh}$ denote the power overhead when introducing an additional node into the system. It captures the power of the memory, the minimum power of the CPU and communication interface, the CPU's power during RECV and SEND (DMA), and the communication interfaces' power during PROC. The energy consumption of a task is the power-delay product. Let $E_p, E_r, E_s$, and $E_{ovh}$ denote the energy consumption of tasks PROC, RECV, SEND, and the overhead of a node:

$$E_p = P_p T_p; \quad E_r = P_r T_r; \quad E_s = P_s T_s; \quad E_{ovh} = P_{ovh} D \qquad (4)$$

For one node $N_i$ with tasks PROC$_i$, RECV$_i$, and SEND$_i$, the total energy of node $N_i$ is

$$E_{N_i} = E_{p_i} + E_{r_i} + E_{s_i} + E_{ovh} \qquad (5)$$

Fig. 1 shows the structure of a processing node. The gray bar represents the overhead and the white bars represent tasks RECV, PROC and SEND. The area of the bars refers to the energy contribution of the tasks and overhead. Finally, the total energy of the system is the sum of the energy consumption on each node,

$$E_{sys} = \sum_{i=1}^{M} E_{N_i} \qquad (6)$$

3.3 M-Node Pipeline

This paper considers a special case called an $M$-node pipeline.
It consists of $M$ identical nodes $N_i$, $i = 1, 2, \ldots, M$, as characterized by $\mathit{Scale}_p, \mathit{Scale}_r, \mathit{Scale}_s, E_{ovh}$. Each node $N_i$ receives $W_{r_i}$ bits of data from the previous node $N_{i-1}$ (except $N_1$), processes the data in $W_{p_i}$ cycles, and sends the $W_{s_i}$-bit result to the next node $N_{i+1}$ (except $N_M$). Each SEND$_i$ $\to$ RECV$_{i+1}$ communication pair sends and receives the same amount of data at the same communication speed, with the same communication delay, and we assume they start and finish at the same time. That is, $W_{s_i} = W_{r_{i+1}}$, $F_{s_i} = F_{r_{i+1}}$, $T_{s_i} = T_{r_{i+1}}$. All nodes have the same deadline $D$, and each node acts as a pipeline stage with delay $D$. Fig. 2 shows an example of a three-node pipeline. For brevity, the overhead is not shown.

[Figure 2: A 3-node pipeline. (a) Block diagram. (b) Serialized timing-power diagram. (c) Pipelined timing-power diagram.]

An $M$-node pipeline can be partitioned and mapped onto an $M'$-node pipeline ($M' \le M$) by merging adjacent nodes $N_i, N_{i+1}, \ldots, N_{i+k}$ ($i + k \le M$) into a new node $N'_j$. The new node $N'_j$ combines all the computation workload, receives $W_{r_i}$ bits of data, and sends $W_{s_{i+k}}$ bits of data. Communication within a node becomes local data access. That is, $W'_{p_j} = \sum_{l=0}^{k} W_{p_{i+l}}$, and $W'_{r_j} = W_{r_i}$, $W'_{s_j} = W_{s_{i+k}}$. The new $M'$-node pipeline is called a partitioning of the initial $M$-node pipeline.

4 Schedulability Conditions

This section presents the schedulability conditions for the pipelined on-chip multi-processor system. In the pipelined timing diagram Fig. 2(c) of the three-node pipeline, we fold the tasks in Fig. 2(b) into a common interval with duration $D$, which is the delay of each pipeline stage. Note that there appear to be two instances of task PROC$_3$ on node $N_3$. This does not mean that task PROC$_3$ on node $N_3$ is preempted. In fact, each instance is a part of an integrated task PROC$_3$ across the boundary between pipeline stages. In other words, the boundary between pipeline stages resides in the middle of the execution of task PROC$_3$.

Fig. 2(c) shows that due to the common deadline, communication activities are shifted to different time slots, such that at any given time, there is at most one active communication instance (a SEND$_i$ $\to$ RECV$_{i+1}$ pair; e.g., SEND$_2$ $\to$ RECV$_3$ and SEND$_1$ $\to$ RECV$_2$ are serialized). This is especially meaningful if all nodes share a communication medium such as Ethernet instead of point-to-point connections.
If collision does not occur, then our estimation of both performance and energy of the whole system can be well bounded. Collision is always undesirable because retransmission costs both time and energy. Communication activities should be scheduled such that the system is collision-free.

Lemma 1 (Collision-free Condition) In an $M$-node pipeline with a deadline $D$, let $T_i$, $i = 0, 1, \ldots, M$, indicate the delays of the $M + 1$ instances of data communication:

$$T_i = \begin{cases} T_{r_1} & (i = 0) \\ T_{s_i} = T_{r_{i+1}} & (i = 1, 2, \ldots, M-1) \\ T_{s_M} & (i = M) \end{cases}$$

The system does not have collision on the shared communication medium iff the utilization of the shared communication medium is less than or equal to 1. That is,

$$U = \sum_{i=0}^{M} \frac{T_i}{D} \le 1 \qquad (7)$$

Note that for a general multi-processor, Lemma 1 expresses the overload condition and can be only a necessary condition for a collision-free schedule. However, it is also a sufficient condition for $M$-node pipelines as defined in Section 3.3, because this special case of pipelining has the property of serializing all communication instances. Lemma 1 is also the schedulability condition for the shared communication medium.

Lemma 2 (Schedulability Condition of One Node) In an $M$-node pipeline with a deadline $D$, a node $N_i$, $i = 1, 2, \ldots, M$, is able to meet the deadline iff $N_i$ is not overloaded, that is,

$$\frac{W_{p_i}}{\max(F_{p_i})} \le D - T_{r_i} - T_{s_i} \qquad (8)$$

Lemma 2 states the overload condition of one node: given the communication speeds (which determine the communication delays $T_{r_i}, T_{s_i}$), if its computation task cannot be completed within the time budget $D - T_{r_i} - T_{s_i}$ by operating at the maximum CPU clock rate, then this node will fail to meet the deadline and thus the whole pipeline will malfunction. If Lemma 2 cannot be satisfied, then the only way to meet the deadline is to select higher communication speeds to reduce $T_{r_i}, T_{s_i}$, in order to allocate additional time budget for computation. High-speed communication can also reduce communication collision to satisfy Lemma 1.
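The two conditions above can be checked mechanically. The following is a minimal sketch, not the authors' code: the function names and every number in the example (deadline, payloads, clock rate, bit rates) are hypothetical, and an ideal medium ($\rho_r = \rho_s = 1$) is assumed.

```python
def medium_utilization(comm_delays, deadline):
    """Lemma 1: U = sum of the M+1 communication delays T_i, divided by D."""
    return sum(comm_delays) / deadline


def node_meets_deadline(w_p, f_max, t_r, t_s, deadline):
    """Lemma 2: W_p / max(F_p) <= D - T_r - T_s."""
    return w_p / f_max <= deadline - t_r - t_s


def pipeline_schedulable(w_p, comm_bits, comm_rates, f_max, deadline):
    """Check the shared medium (Lemma 1) and every node (Lemma 2).

    w_p[i]        : cycles of the computation task on node i+1
    comm_bits[i]  : payload of communication instance i, i = 0..M
    comm_rates[i] : bit rate chosen for communication instance i
    """
    delays = [bits / rate for bits, rate in zip(comm_bits, comm_rates)]
    if medium_utilization(delays, deadline) > 1:
        return False
    # Node i+1 receives through instance i and sends through instance i+1.
    return all(
        node_meets_deadline(w_p[i], f_max, delays[i], delays[i + 1], deadline)
        for i in range(len(w_p))
    )


# Hypothetical 3-node pipeline: 10 ms deadline, 80 kbit payloads, 400 MHz CPUs.
W_P = [2_000_000, 2_000_000, 2_000_000]
BITS = [80_000] * 4
print(pipeline_schedulable(W_P, BITS, [100e6] * 4, 400e6, 0.01))
print(pipeline_schedulable(W_P, BITS, [10e6] * 4, 400e6, 0.01))
```

With these made-up numbers the first configuration passes both checks, while the second overloads the medium: four transfers of 8 ms each do not fit in a 10 ms stage.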
Lemma 3 (Schedulability Condition of the System) An $M$-node pipeline is schedulable to meet a deadline $D$ iff (1) every node $N_i$, $i = 1, 2, \ldots, M$, meets the deadline (Lemma 2), and (2) the shared communication medium is collision-free (Lemma 1).

Lemma 3 says that the system's schedulability is determined by the schedulability of all resources, including nodes and the communication medium. If and only if none of them is overloaded, the system can be pipelined by the deadline. Lemma 3 holds true only for this $M$-node pipeline organization; it is a necessary but not sufficient condition for a general multi-processor system.

5 Motivating Example

We use an automatic target recognition (ATR) algorithm (Fig. 3) as our motivating example. Originally it is a serial algorithm. We reconstructed a parallel version and mapped it onto pipelined multiple processors. Pipelining allows each processor to run at a much slower speed with a lower voltage level to reduce overall computation energy, while parallelism compensates for the performance. Of course, a multi-processor platform incurs energy for inter-processor communication, extra processors, memory, and other overhead.

[Figure 3: Functional blocks of the ATR algorithm: N1 Target Detection, N2 FFT, N3 Filter, N4 IFFT, N5 Compute Distance, each annotated with its computation workload $W_p$ (cycles) and its communication payloads $W_r$, $W_s$ (bits).]

[Figure 4: The impact of different partitioning schemes and communication speed settings. (a) A fine-grain partitioning scheme reduces energy on computation, at the cost of inter-processor communication and overhead of additional nodes. (b) The combined node reduces communication and overhead, but it requires more energy for computation. (c) The computation energy can be reduced by increasing communication speeds, which leaves more time for computation.]
Mapping Tasks to Nodes through Partitioning

Given the five functional blocks (tasks) of the ATR algorithm, several partitioning schemes are possible for mapping the tasks to a number of pipelined nodes. Fig. 4 shows an example by considering how to map the first two tasks onto nodes. In Fig. 4(a), they are mapped onto two nodes N1 and N2 that are both allowed to operate at a lower speed (3Hz) for computation. This scheme has lower computation energy than if they were mapped onto one node, but it requires energy for the communication tasks SEND1 $\to$ RECV2, plus overhead. Fig. 4(b) shows a mapping onto one node. It eliminates the communication SEND1 $\to$ RECV2 and the overhead of an extra node. However, the combined node has much more computation workload and must run at a faster clock rate (6Hz), a less energy-efficient level.

Zooming out, many partitioning schemes are possible, even when limited to a pipelined organization. For example, one partitioning [N1, N2], [N3, N4, N5] may be optimal for nodes N1 and N2, but it will preclude another solution [N1], [N2, N3], [N4, N5] that may lead to lower energy for the whole system.

Speed Selection for CPU and Communication

In addition to partitioning, the selection of communication speed is an equally critical issue. For example, we consider a 10/100/1000Base-T Ethernet interface. It consumes more power than the CPU at high (100/1000Mbps) speeds, but less power than the CPU at the slower, 10Mbps data rate. In Fig. 4(b), the processor must operate at a high clock rate due to the low-speed communication at 10Mbps. Because of the deadline $D$, communication and computation compete for this time budget. Low-speed communication leaves less time for computation, thereby forcing the processor to run faster to meet the deadline. Conversely, high-speed communication could free more time budget for computation, as shown in Fig. 4(c), where the CPU's clock rate is dropped to 3Hz with higher-speed communication. Although extra energy could be allocated to communication, if the energy saving on the CPU compensates for this cost, then (c) is more energy-efficient than (b).
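The competition for the time budget can be made concrete with a small calculation. This is an illustrative sketch only: the cubic CPU power model, the link power levels, the workload, and the deadline are all assumed values, not measurements from the paper, and node overhead is left out.

```python
def node_energy(w_p, bits_in, bits_out, link_rate, link_power, deadline, cpu_power):
    """Clock rate and energy of one node whose job finishes exactly at the deadline."""
    t_r = bits_in / link_rate
    t_s = bits_out / link_rate
    t_p = deadline - t_r - t_s          # time left for computation
    f_p = w_p / t_p                     # slowest clock that still meets D
    e_comm = link_power * (t_r + t_s)
    e_cpu = cpu_power(f_p) * t_p
    return f_p, e_comm + e_cpu


def cubic_cpu_power(f_hz):
    """Assumed CPU model: 1 W at 600 MHz, power proportional to f^3."""
    return (f_hz / 600e6) ** 3


# Hypothetical workload: 1M cycles, 40 kbit in and out, 10 ms deadline.
# Assumed link power: 0.1 W at 10 Mbps, 0.5 W at 100 Mbps.
slow = node_energy(1e6, 40e3, 40e3, 10e6, 0.1, 0.01, cubic_cpu_power)
fast = node_energy(1e6, 40e3, 40e3, 100e6, 0.5, 0.01, cubic_cpu_power)
for name, (f_p, energy) in (("slow link", slow), ("fast link", fast)):
    print(f"{name}: CPU at {f_p / 1e6:.0f} MHz, {energy * 1e3:.2f} mJ")
```

Under these assumptions the slow link leaves only 2 ms for computation, so the CPU must run at roughly 500 MHz, whereas the fast link draws five times the power but finishes early enough for the CPU to slow down, and the node's total energy drops.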
The communication-computation interaction becomes more intricate in a multi-processor environment. Any data dependency between different nodes must involve their communication interfaces. The communication speed of a sender will not only determine the receiver's communication speed but also influence the choice of the receiver's computation speed. The communication speed on the first node of the pipeline will have a chain effect on all other nodes in the system. A locally optimal speed for the first node will not necessarily lead to a globally optimal solution.

Combining Partitioning and Speed Selection

Partitioning and communication speed selection are mutually enabling. Given a fixed partitioning scheme, the designers can always find the corresponding optimal speed setting that minimizes energy for that scheme. However, an energy-optimal speed selection for one partitioning is not necessarily optimal over all partitionings. In this paper, we take a multi-dimensional optimization approach that considers performance requirements, schedulability, load balancing, communication-computation trade-offs, and multi-processor overhead in a system-level context.

6 Problem Formulation

Given an $M$-node pipeline, the choices of partitioning and communication speed settings will lead to different energy consumption at the system level. This section formulates the energy minimization problems by means of partitioning and communication speed selection. In both cases, the optimal solutions can be obtained by dynamic programming. Finally, the combined optimization problem with both partitioning and communication speed selection can be addressed synergistically by multi-dimensional dynamic programming.

Problem 1 (Optimal Partitioning) Given (a) $M$ pipelined nodes $N_i$ with workload $W_{p_i}, W_{r_i}, W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the constraint that the speed settings of all communication instances must match: $F_{r_1}$, $F_{s_i} = F_{r_{i+1}}$, $F_{s_M}$, for $i = 1, 2, \ldots, M-1$, find a partitioning scheme that minimizes energy $E_{sys}$.

To avoid exhaustive enumeration in the $O(2^M)$ solution space, we construct a series of sub-problems as follows. We consider a sub-problem $P[i, j]$ that maps the first $j$ original nodes $N_1, N_2, \ldots, N_j$ onto an $i$-node sub-partitioning $N'_1, N'_2, \ldots, N'_i$. The optimal solution of $P[i, j]$ has the minimum energy $E[i, j]$. It can be decomposed into two parts, shown in Fig. 5: (a) a sub-partitioning $P[i-1, l]$ that maps the first $l$ original nodes to $i-1$ new nodes, plus (b) the $i$th new node $N'_i$ that combines the original nodes $N_{l+1}, \ldots, N_j$, with its energy denoted as $E_{N'_i}$. In order to achieve the minimum energy $E[i, j]$, the energy consumption of (a) must also be an optimal sub-solution $E[i-1, l]$. Since $l$ can be any value in the range $i-1 \le l \le j-1$, $E[i, j]$ must also be the minimum value of $E[i-1, l] + E_{N'_i}$ over all these possible values of $l$. That is, $E[i, j] = \min_{i-1 \le l \le j-1} \{E[i-1, l] + E_{N'_i}\}$. Any optimal sub-solution $E[i, j]$ can be derived from other optimal sub-solutions $E[i-1, l]$. Therefore, the problem has an optimal sub-structure and a dynamic programming approach is appropriate.

It is illustrated in Fig. 6. Matrix $E[i, j]$ is initialized to $\infty$ for all entries.
We define $E[0, 0] = 0$, and it can be used to compute the first row $E[1, j]$, $j = 1, 2, \ldots, M$. For any entry $E[i, j]$, its value can be computed from entries in the previous row $E[i-1, l]$, $i-1 \le l \le j-1$. These entries are shaded in Fig. 6. Thus, a series of optimal sub-solutions $E[2, j], E[3, j], \ldots, E[M, j]$ in each row of the matrix can be computed subsequently. Finally, these sub-solutions lead to the global optimal solution $\min_{1 \le i \le M} \{E[i, M]\}$, which maps all $M$ original nodes onto a new partitioning with minimum energy. Note that the same algorithm can also solve the optimal partitioning onto a fixed number of nodes. For example, $E[k, M]$ is the optimal energy for mapping $M$ nodes onto an arbitrary $k$-node new partitioning.

[Figure 5: The optimal sub-structure of Problem 1. An $i$-node optimal sub-partitioning of the original nodes $N_1, \ldots, N_j$ with minimum energy $E[i, j]$ consists of (a) a sub-partitioning that maps $l$ nodes $N_1, \ldots, N_l$ onto $i-1$ new nodes $N'_1, \ldots, N'_{i-1}$ with minimum energy $E[i-1, l]$, and (b) the last new node $N'_i$, which combines nodes $N_{l+1}, \ldots, N_j$ with energy $E_{N'_i}$.]

[Figure 6: The dynamic programming approach to solve Problem 1. Each entry $E[i, j]$ can be computed from the shaded entries $E[i-1, l]$, $l = i-1, \ldots, j-1$, in the previous row. The global optimal energy $E_{opt} = \min \{E[i, M]\}$, $i = 1, 2, \ldots, M$, is the minimum value of the last column.]

To summarize, the optimal cost function $E$ is defined as follows:

$$E[i, j] = \begin{cases} 0 & \text{for } i = j = 0 \\ \min_{i-1 \le l \le j-1} \left\{ E[i-1, l] + E_{N'_i} \;\middle|\; U[i-1, l] + \frac{W_{s_j}}{F_{s_j} D} \le 1 \right\} & \text{for } 1 \le i \le j \le M \end{cases} \qquad (9)$$

To guarantee that each optimal sub-solution is schedulable, by Lemma 3, the communication medium must be collision-free, and any node in the new sub-partitioning must not be overloaded. We define a utilization matrix $U[i, j]$ indicating the utilization of the communication medium corresponding to the optimal solution of a sub-problem $P[i, j]$, which is guarded by $U[i, j] \le 1$ (Lemma 1). $U$ is initialized to $\infty$, while setting $U[0, 0] = \frac{W_{r_1}}{F_{r_1} D}$ ($= T_0 / D$ in (7)), indicating the bandwidth used by the first communication instance RECV$_1$. We also define the energy consumption of a node $N'_i$ as $E_{N'_i}$, which refines (5) by Lemma 2. If a node is overloaded, then its energy consumption is $\infty$, indicating an invalid solution.
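The recurrence (9), together with the utilization guard and the overload rule just described, can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: `node_energy`, `optimal_partitioning`, the explicit `f_max` parameter, and the 0-based indexing are choices made here, and the scaling functions are passed in as ordinary callables.

```python
import math


def node_energy(w_p, w_r, w_s, f_r, f_s, scale_p, scale_r, scale_s,
                f_max, deadline, p_ovh):
    """Energy of one (possibly merged) node; math.inf if it is overloaded."""
    t_r, t_s = w_r / f_r, w_s / f_s
    t_p = deadline - t_r - t_s                 # job finishes at the deadline
    if t_p <= 0 or w_p / t_p > f_max:
        return math.inf
    f_p = w_p / t_p
    return (scale_r(f_r) * t_r + scale_s(f_s) * t_s
            + scale_p(f_p) * t_p + p_ovh * deadline)


def optimal_partitioning(w_p, w_r, w_s, f_r, f_s, scale_p, scale_r, scale_s,
                         f_max, deadline, p_ovh):
    """Return (minimum energy, groups of 0-based original node indices)."""
    m = len(w_p)
    E = [[math.inf] * (m + 1) for _ in range(m + 1)]
    U = [[math.inf] * (m + 1) for _ in range(m + 1)]
    P = [[0] * (m + 1) for _ in range(m + 1)]
    E[0][0] = 0.0
    U[0][0] = w_r[0] / f_r[0] / deadline       # first RECV instance
    for i in range(1, m + 1):                  # number of new nodes
        for j in range(i, m + 1):              # original nodes covered so far
            for l in range(i - 1, j):          # nodes covered by the first i-1
                # New node i merges original nodes l+1..j (indices l..j-1).
                e = E[i - 1][l] + node_energy(
                    sum(w_p[l:j]), w_r[l], w_s[j - 1], f_r[l], f_s[j - 1],
                    scale_p, scale_r, scale_s, f_max, deadline, p_ovh)
                u = U[i - 1][l] + w_s[j - 1] / f_s[j - 1] / deadline
                if u <= 1 and e < E[i][j]:
                    E[i][j], U[i][j], P[i][j] = e, u, l
    best_i = min(range(1, m + 1), key=lambda i: E[i][m])
    if math.isinf(E[best_i][m]):
        return math.inf, []                    # no schedulable partitioning
    groups, i, j = [], best_i, m
    while i > 0:                               # walk the back-pointers in P
        l = P[i][j]
        groups.append(list(range(l, j)))
        i, j = i - 1, l
    return E[best_i][m], groups[::-1]
```

As a sanity check under assumed models (cubic CPU power, free communication), three equal tasks stay on three nodes when the per-node overhead is zero and collapse onto one node when the overhead power is large.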
$$U[i, j] = \begin{cases} \frac{W_{r_1}}{F_{r_1} D} & \text{for } i = j = 0 \\ U[i-1, l] + \frac{W_{s_j}}{F_{s_j} D} \ \text{ for the } l \text{ that achieves } \min\{E[i, j]\} \text{ in (9)} & \text{for } 1 \le i \le j \le M \end{cases} \qquad (10)$$

$$E_{N'_i} = \begin{cases} \mathit{Scale}_r(F'_r) T'_r + \mathit{Scale}_s(F'_s) T'_s + \mathit{Scale}_p(F'_p) T'_p + P_{ovh} D & \text{if } F'_p = \frac{W'_p}{D - T'_r - T'_s} \le F_{max} \quad \left(T'_r = \frac{W'_r}{F'_r},\ T'_s = \frac{W'_s}{F'_s}\right) \\ \infty & \text{otherwise} \end{cases} \qquad (11)$$

    PARTITIONING(W_r[1:M], W_s[1:M], W_p[1:M], F_r[1:M], F_s[1:M],
                 Scale_r, Scale_s, Scale_p, D, P_ovh)
      for i := 0 to M do
        for j := 0 to M do
          E[i, j] := ∞;  U[i, j] := ∞;  P[i, j] := 0
      E[0, 0] := 0;  U[0, 0] := W_r[1] / F_r[1] / D
      for i := 1 to M do
        for j := i to M do
          for l := i-1 to j-1 do
            e := E[i-1, l] + E_N'
            u := U[i-1, l] + W_s[j] / F_s[j] / D
            if u ≤ 1 and e < E[i, j] then
              E[i, j] := e;  U[i, j] := u;  P[i, j] := l
      E_opt, P_opt := retrieve from matrices E, P
      return E_opt, P_opt

Figure 7: Optimal partitioning algorithm.

Fig. 7 shows the optimal partitioning algorithm derived from (9) through (11). The partitioning matrix $P[i, j]$ records the previous optimal sub-solutions for each sub-problem. This information can be used to retrieve the optimal partitioning $P_{opt}$. The time complexity of this algorithm is $O(M^3)$, determined by the three-level nested loop.

Problem 2 (Optimal Speed Selection) Given (a) a fixed partitioning scheme with $M$ pipelined nodes $N_i$ with workload $W_{p_i}, W_{r_i}, W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the available choices for communication speed settings $F_{c_k}$, $k = 1, 2, \ldots, C$, find all processor speeds $F_{p_i}$ and communication speeds $F_{r_i}, F_{s_i}$ that minimize energy $E_{sys}$.

We also perform dynamic programming as opposed to exhaustive search in the $O(C^{M+1})$ solution space. Since communication speeds decide processor speeds, we only select communication speeds for each node. Given that the sending speed and receiving speed are equal for each communication instance, selecting only the sending speed is sufficient.

[Figure 8: The optimal sub-structure of Problem 2. The first $i$ nodes $N_1, \ldots, N_i$, where the last sending speed is $F_{s_i} = F_{c_k}$, with minimum energy $E[i, k]$, consist of (a) a sub speed selection problem where node $N_{i-1}$'s sending speed is selected as $F_{s_{i-1}} = F_{c_m}$, with minimum energy $E[i-1, m]$, and (b) the last node $N_i$, whose receiving speed is $F_{c_m}$ and sending speed is $F_{c_k}$, with energy $E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})$.]
[Figure 9: The dynamic programming approach to solve Problem 2. Each entry $E[i, k]$ can be computed from the shaded previous row $E[i-1, m]$, $m = 1, 2, \ldots, C$. The global optimal energy $E_{opt} = \min \{E[M, k]\}$, $k = 1, 2, \ldots, C$, is the minimum value of the last row.]

We define a sub-problem $S[i, k]$ that selects communication speeds for the first $i$ nodes, with the last node $N_i$'s sending speed selected to be the $k$th choice of speed settings, $F_{s_i} = F_{c_k}$. Its optimal sub-solution has minimum energy $E[i, k]$. As illustrated in Fig. 8, a sub-problem $S[i, k]$ consists of two parts: (a) another sub-problem $S[i-1, m]$ that selects speed settings for the first $i-1$ nodes with node $N_{i-1}$'s sending speed $F_{s_{i-1}} = F_{c_m}$, combined with (b) node $N_i$ with receiving speed $F_{r_i} = F_{c_m}$ and sending speed $F_{s_i} = F_{c_k}$. (a) must be an optimal sub-solution with minimum energy $E[i-1, m]$. (b) has only one node $N_i$ that receives data from (a) at speed $F_{c_m}$, and its sending speed is $F_{c_k}$. Its energy is denoted as $E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})$. Therefore, $E[i, k] = E[i-1, m] + E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})$. In the sub-problem $S[i-1, m]$, $F_{c_m}$ can be any choice among $F_{c_1}, F_{c_2}, \ldots, F_{c_C}$. In order to achieve the minimum energy $E[i, k]$, it must be the minimum value among all possible $F_{c_m}$. That is, the optimal sub-structure of this problem can be defined as

$$E[i, k] = \min_{1 \le m \le C} \{E[i-1, m] + E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k})\}$$

The dynamic programming algorithm is illustrated in Fig. 9. Since each $E[i, k]$ can be derived from the previous row $E[i-1, m]$, $m = 1, 2, \ldots, C$, the algorithm can compute all rows of matrix $E$ from $E[1, k], E[2, k], \ldots$, to $E[M, k]$, $k = 1, 2, \ldots, C$, sequentially. The global optimal energy is the minimum value in the last row, $\min_k \{E[M, k]\}$.

The energy matrix $E[i, k]$ and utilization matrix $U[i, k]$ are defined as follows. $U[i, k]$ guarantees that each optimal sub-solution $E[i, k]$ is schedulable. Both $E$ and $U$ are initialized to $\infty$, except that $E[0, k] = 0$ and $U[0, k]$ is set to the utilization of the first communication instance RECV$_1$ using communication speed $F_{c_k}$, for $k = 1, 2, \ldots, C$.

$$E[i, k] = \begin{cases} 0 & \text{for } i = 0,\ 1 \le k \le C \\ \min_{1 \le m \le C} \left\{ E[i-1, m] + E_{N_i}(F_r = F_{c_m}, F_s = F_{c_k}) \;\middle|\; U[i-1, m] + \frac{W_{s_i}}{F_{c_k} D} \le 1 \right\} & \text{for } 1 \le i \le M,\ 1 \le k \le C \end{cases} \qquad (12)$$

$$U[i, k] = \begin{cases} \frac{W_{r_1}}{F_{c_k} D} & \text{for } i = 0,\ 1 \le k \le C \\ U[i-1, m] + \frac{W_{s_i}}{F_{c_k} D} \ \text{ for the } m \text{ that achieves } \min\{E[i, k]\} \text{ in (12)} & \text{for } 1 \le i \le M,\ 1 \le k \le C \end{cases} \qquad (13)$$

    SPEEDSELECTION(W_r[1:M], W_s[1:M], W_p[1:M], F_c[1:C],
                   Scale_r, Scale_s, Scale_p, D, P_ovh)
      for i := 0 to M do
        for k := 1 to C do
          E[i, k] := ∞;  U[i, k] := ∞;  S[i, k] := 0
      for k := 1 to C do
        E[0, k] := 0;  U[0, k] := W_r[1] / F_c[k] / D
      for i := 1 to M do
        for k := 1 to C do
          for m := 1 to C do
            e := E[i-1, m] + E_N(F_r = F_c[m], F_s = F_c[k])
            u := U[i-1, m] + W_s[i] / F_c[k] / D
            if u ≤ 1 and e < E[i, k] then
              E[i, k] := e;  U[i, k] := u;  S[i, k] := m
      E_opt, S_opt := retrieve from matrices E, S
      return E_opt, S_opt

Figure 10: Optimal speed selection algorithm.

The algorithm is shown in Fig. 10. The speed matrix $S$ records the previous optimal sub-solutions. The optimal speed setting $S_{opt}$ will be retrieved from $S$. The time complexity of this algorithm is $O(MC^2)$. Note that the algorithm can be modified trivially if the first communication speed $F_{r_1}$ and the last communication speed $F_{s_M}$ are fixed. This refers to the situation where the pipelined multi-processor has a fixed communication speed setting to other components while its internal communication speeds can be selected optimally.

Problem 3 (Optimal Partitioning and Speed Selection) Given (a) $M$ pipelined nodes $N_i$ with workload $W_{p_i}, W_{r_i}, W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the available choices for communication speed settings $F_{c_k}$, $k = 1, 2, \ldots, C$, find a partitioning scheme and corresponding communication speed settings that minimize energy $E_{sys}$.

Due to the inter-dependency between speed settings and partitioning schemes, the optimal solution cannot be achieved by solving the two previous problems individually. Exhaustively enumerating over one dimension and dynamic programming over the other is quite expensive, with the time complexity being either $O(2^M M C^2)$ or $O(C^{M+1} M^3)$. We propose a multi-dimensional dynamic programming algorithm, given that the two previous problems are both characterized by optimal sub-structures.
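For reference, the speed-selection recurrence of Problem 2 (equations (12) and (13)) can be sketched in Python as follows. This is an illustrative reading, not the authors' code: the names `node_energy` and `optimal_speed_selection`, the explicit `f_max` parameter, and the 0-based speed indices are assumptions made here.

```python
import math


def node_energy(w_p, w_r, w_s, f_r, f_s, scale_p, scale_r, scale_s,
                f_max, deadline, p_ovh):
    """Energy of one node; math.inf if the node is overloaded (Lemma 2)."""
    t_r, t_s = w_r / f_r, w_s / f_s
    t_p = deadline - t_r - t_s                 # job finishes at the deadline
    if t_p <= 0 or w_p / t_p > f_max:
        return math.inf
    f_p = w_p / t_p
    return (scale_r(f_r) * t_r + scale_s(f_s) * t_s
            + scale_p(f_p) * t_p + p_ovh * deadline)


def optimal_speed_selection(w_p, w_r, w_s, f_c, scale_p, scale_r, scale_s,
                            f_max, deadline, p_ovh):
    """Return (minimum energy, bit rate of each of the M+1 communication instances)."""
    m, c = len(w_p), len(f_c)
    E = [[math.inf] * c for _ in range(m + 1)]
    U = [[math.inf] * c for _ in range(m + 1)]
    S = [[0] * c for _ in range(m + 1)]
    for k in range(c):                         # row 0: speed of the first RECV
        E[0][k] = 0.0
        U[0][k] = w_r[0] / f_c[k] / deadline
    for i in range(1, m + 1):                  # node i
        for k in range(c):                     # its sending speed
            for prev in range(c):              # its receiving speed
                e = E[i - 1][prev] + node_energy(
                    w_p[i - 1], w_r[i - 1], w_s[i - 1], f_c[prev], f_c[k],
                    scale_p, scale_r, scale_s, f_max, deadline, p_ovh)
                u = U[i - 1][prev] + w_s[i - 1] / f_c[k] / deadline
                if u <= 1 and e < E[i][k]:
                    E[i][k], U[i][k], S[i][k] = e, u, prev
    best_k = min(range(c), key=lambda k: E[m][k])
    if math.isinf(E[m][best_k]):
        return math.inf, []                    # no schedulable speed setting
    speeds, k = [], best_k
    for i in range(m, 0, -1):                  # walk the back-pointers in S
        speeds.append(f_c[k])
        k = S[i][k]
    speeds.append(f_c[k])                      # speed of the first RECV instance
    return E[m][best_k], speeds[::-1]
```

The guard compares against `E[i][k]`, the entry being filled, so the three assignments update the same cell that was tested.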
Based on the dynamic programming approaches to the previous problems, we define a sub-problem PS[i,j,k] that maps the original nodes $N_1, N_2, \dots, N_j$ onto an i-node new sub-partitioning $N'_1, N'_2, \dots, N'_i$, with the last node $N'_i$'s sending speed $F'_{s_i} = F_{c_k}$. The optimal sub-solution has minimum energy E[i,j,k].

Similar to the previous problems, a sub-problem PS[i,j,k] can be decomposed with an optimal sub-structure, shown in Fig. 11. Part (a) is a previous sub-problem PS[i-1,l,m], which maps the first l original nodes $N_1, N_2, \dots, N_l$ onto i-1 new nodes, with node $N'_{i-1}$'s sending speed selected as $F_{c_m}$. Part (b) is the new node $N'_i$ that combines the original nodes $N_{l+1}, \dots, N_j$, with receiving speed $F_{c_m}$ and sending speed $F_{c_k}$. Part (a) must be an optimal sub-solution with the minimum energy E[i-1,l,m]. Part (b) has only one node, $N'_i$, and its energy is denoted as $E_{N'_i}(F_r = F_{c_m}, F_s = F_{c_k})$.

For the sub-solution E[i-1,l,m], l can be any value in the range $i-1 \le l \le j-1$, and $F_{c_m}$ is one of the speed choices $F_{c_1}, F_{c_2}, \dots, F_{c_C}$. E[i,j,k] must be derived from all possible pairs (l,m) to achieve the minimum value. Therefore,

$$E[i,j,k] = \min_{l,m} \{E[i-1,l,m] + E_{N'_i}(F_r = F_{c_m}, F_s = F_{c_k})\}$$

The algorithm is illustrated in Fig. 12. The three-dimensional matrix E[i,j,k] is represented by a series of two-dimensional sub-matrices indexed by i = 1,2,...,M. Any E[i,j,k] can be computed from entries in the sub-matrix E[i-1,l,m], $i-1 \le l \le j-1$, $1 \le m \le C$. The algorithm constructs all optimal sub-solutions sub-matrix by sub-matrix, from E[1,j,k], E[2,j,k], ... to E[M,j,k]. The global minimum energy is $\min_{i,k} \{E[i,M,k]\}$, that is, the minimum value over the last rows of all sub-matrices. The energy matrix E[i,j,k] and the utilization matrix U[i,j,k] are defined as follows.
Figure 11: The optimal sub-structure of Problem 3. An i-node optimal sub-partitioning of the original nodes $N_1, \dots, N_j$, whose last sending speed is $F'_{s_i} = F_{c_k}$ and whose minimum energy is E[i,j,k], consists of (a) a sub-partitioning that maps l nodes $N_1, \dots, N_l$ onto i-1 new nodes $N'_1, \dots, N'_{i-1}$, where node $N'_{i-1}$'s sending speed is selected as $F'_{s_{i-1}} = F_{c_m}$, with minimum energy E[i-1,l,m]; and (b) the last new node $N'_i$, which combines nodes $N_{l+1}, \dots, N_j$, whose receiving speed is $F_{c_m}$ and sending speed is $F_{c_k}$, with energy $E_{N'_i}(F_r = F_{c_m}, F_s = F_{c_k})$.

Figure 12: The multi-dimensional dynamic programming approach to solve Problem 3. Each entry E[i,j,k] can be computed from the shaded entries E[i-1,l,m], l = i-1,...,j-1, m = 1,2,...,C, in the previous sub-matrix. The global optimal energy is the minimum value in the last row of all sub-matrices, $E_{opt} = \min_{i,k} \{E[i,M,k]\}$, i = 1,2,...,M, k = 1,2,...,C.

$$E[i,j,k] = \begin{cases} 0 & \text{for } i = j = 0 \\ \min_{l,m} \{E[i-1,l,m] + E_{N'_i}(F_r = F_{c_m}, F_s = F_{c_k})\}\ \text{ if } U[i-1,l,m] + \frac{W_{s_j}}{F_{c_k} D} \le 1 & \text{for } 1 \le i \le j \le M \end{cases} \quad (4)$$

$$U[i,j,k] = \begin{cases} \frac{W_{r_1}}{F_{c_k} D} & \text{for } i = j = 0 \\ U[i-1,l,m] + \frac{W_{s_j}}{F_{c_k} D}\ \text{ for the } (l,m) \text{ that achieve } \min\{E[i,j,k]\} \text{ in (4)} & \text{for } 1 \le i \le j \le M \end{cases} \quad (5)$$

The algorithm is shown in Fig. 13. It combines the two previous algorithms by two-dimensional dynamic programming. The time complexity of the algorithm is $O(M^3 C^2)$: there are on the order of $M^2 C$ entries, and each examines up to $MC$ pairs (l,m). The algorithm also applies to situations where the new partitioning has a fixed number of nodes, or where the pipeline has a fixed communication interface to other components and only the internal communication speeds can be selected.
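The combined recurrence can be sketched in Python in the same style. Again this is only an illustration under stated assumptions: `merged_energy` is a hypothetical callback standing in for the energy of a new node that merges a run of original nodes, indices are 0-based with half-open ranges, and the bound `u <= 1.0` is assumed.

```python
import math


def partition_and_select(w_r, w_s, speeds, deadline, merged_energy):
    """Jointly choose a pipeline partitioning and link speeds.

    merged_energy(l, j, f_r, f_s): hypothetical callback giving the energy of
    one new node that merges original nodes l..j-1 (0-based, half-open).
    Returns (energy, groups, link_speeds); groups is a list of (l, j) ranges.
    """
    n, c = len(w_s), len(speeds)
    inf = math.inf
    # E[i][j][k]: first j original nodes on i new nodes, last link at speeds[k]
    E = [[[inf] * c for _ in range(n + 1)] for _ in range(n + 1)]
    U = [[[inf] * c for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[[None] * c for _ in range(n + 1)] for _ in range(n + 1)]

    for k, f in enumerate(speeds):                # empty pipeline, first RECV only
        E[0][0][k] = 0.0
        U[0][0][k] = w_r[0] / f / deadline

    for i in range(1, n + 1):                     # number of new nodes
        for j in range(i, n + 1):                 # original nodes covered
            for k, f_s in enumerate(speeds):      # sending speed of new node i
                for l in range(i - 1, j):         # nodes covered by the first i-1
                    for m, f_r in enumerate(speeds):
                        e = E[i - 1][l][m] + merged_energy(l, j, f_r, f_s)
                        u = U[i - 1][l][m] + w_s[j - 1] / f_s / deadline
                        if u <= 1.0 and e < E[i][j][k]:   # assumed bound
                            E[i][j][k], U[i][j][k] = e, u
                            back[i][j][k] = (l, m)

    # Global optimum: best entry over the last rows of all sub-matrices.
    e_opt, i, k = min((E[i][n][k], i, k)
                      for i in range(1, n + 1) for k in range(c))
    if math.isinf(e_opt):
        return inf, None, None
    groups, links, j = [], [speeds[k]], n
    while i > 0:                                  # follow back-pointers
        l, m = back[i][j][k]
        groups.append((l, j))
        links.append(speeds[m])
        i, j, k = i - 1, l, m
    return e_opt, groups[::-1], links[::-1]
```

Fixing the number of new nodes, as mentioned above, would amount to reading the optimum from a single sub-matrix `E[i]` instead of minimizing over all `i`.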
    partitioning-speedselection(W_r[1:M], W_s[1:M], W_p[1:M], F_c[1:C], Scale_r, Scale_s, Scale_p, D, P_ovh)
        for i := 0 to M do
            for j := 0 to M do
                for k := 1 to C do
                    E[i,j,k] := ∞
                    U[i,j,k] := ∞
                    P[i,j,k] := 0
                    S[i,j,k] := 0
        for k := 1 to C do
            E[0,0,k] := 0
            U[0,0,k] := W_r[1]/F_c[k]/D
        for i := 1 to M do
            for j := i to M do
                for k := 1 to C do
                    for l := i-1 to j-1 do
                        for m := 1 to C do
                            e := E[i-1,l,m] + E_node(merge(N_{l+1},...,N_j), with F_r = F_c[m], F_s = F_c[k])
                            u := U[i-1,l,m] + W_s[j]/F_c[k]/D
                            if u ≤ 1 and e < E[i,j,k] then
                                E[i,j,k] := e
                                U[i,j,k] := u
                                P[i,j,k] := l
                                S[i,j,k] := m
        E_opt, P_opt, S_opt := retrieve from matrices E, P, S
        return E_opt, P_opt, S_opt

Figure 13: Combined partitioning with speed selection.

7 Analytical Results

To evaluate our energy optimization technique, we experimented with mapping the ATR algorithm [14] (Fig. 3) onto two fixed partitioning schemes: (a) a single node that combines all blocks, and (b) a five-node pipeline that maps each block onto an individual node. Schemes (a) and (b) are the two extremes, representing serial vs. parallel schemes. For both (a) and (b) we apply optimal speed selection. We also find the optimal partitioning with speed selection as (c) and compare it with (a) and (b) under three types of performance requirements: (1) high performance, D = ms, (2) moderate performance, D = 5ms, and (3) low performance, D = ms.

Each node consists of an XScale processor and an LXT- Ethernet interface from Intel. The Scale_p and Scale_s (same as Scale_r) functions, which indicate the power vs. performance characteristics of a node, are extracted from their data sheets [2, 3] and are shown in Fig. 14 and 15. Besides the power draw from the CPU and communication interfaces, we assume each node has a constant power draw P_ovh = mW.

The results are presented in Fig. 16. In all cases, the highest-speed setting is the optimal choice for communication. The low-power, low-speed setting results in the highest energy, for two reasons: it leaves so little time for computation that the processors must run faster, at higher energy, to meet the deadline, and it has the highest energy-per-bit rating. Low-speed communication also tends to violate the schedulability conditions (Lemma 3).
Given the properties of this particular Ethernet interface, the highest-speed communication will always lead to the lowest energy consumption, since it requires the least energy per bit and leaves the maximum time budget for reducing CPU energy. However, in cases where the energy-per-bit rating does not decrease monotonically with the communication speed, the optimal speed setting may involve some combination of low-speed and high-speed settings between different nodes. For example, node $N_i$ may communicate with $N_{i-1}$ at one speed and with $N_{i+1}$ at another.

Figure 14: Power vs. performance of the XScale processor.

Figure 15: Power modes of the Ethernet interface (columns: mode, power consumption).

Figure 16: Analytical results. Energy per frame (mJ), broken down into overhead, communication, and computation, for (a) the single-node scheme, (b) the five-node scheme, and (c) the optimal partitioning, under (1) high performance, (2) moderate performance, and (3) low performance requirements. In case (3), the single-node scheme (a) is itself optimal.

Fig. 16(1) shows the energy consumption per image frame for the three partitioning schemes. With a tight performance constraint, the single node (a) is heavily loaded with computation, so it is desirable to reduce CPU energy by pipelining. As a result, the five-node pipeline (b) is more energy-efficient, at the cost of additional communication and overhead. However, the optimal partitioning is (c), with three nodes: [N1,N2], [N3,N4], [N5]. It consumes more CPU energy than (b), but overall it is optimal because it spends less energy on communication and overhead.

In the case of the moderate performance constraint (Fig. 16(2)), (a) is still dominated by computation, but it is not heavily loaded because of the relaxed deadline. The reduction of CPU energy in (b) cannot compensate for the added overhead of new nodes and communication. Therefore (a) is better than (b), and pipelining seems inefficient. However, the optimal partitioning (c) is still a pipelined solution. It combines N1, N2, N3, N4 into one node and maps N5 to another node. Scheme (c) achieves minimum energy by appropriately balancing computation and communication against pipelining overhead.

In cases where performance is not critical, pipelining is not efficient and the serial solution (a) is optimal. Fig. 16(3) shows that the computation load on (a) is very light. Introducing additional nodes would save only marginal CPU energy, which would be offset by the extra communication and overhead.
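The remark about energy-per-bit ratings can be made concrete with a small Python check. The two interfaces below are invented for illustration only; their speeds and power figures are hypothetical and are not taken from the data sheets used in this study.

```python
def energy_per_bit(power_w, speed_bps):
    """Energy spent per transmitted bit (joules) at a given link speed."""
    return power_w / speed_bps


def decreases_monotonically(modes):
    """True if energy per bit strictly drops as the link speed rises.

    modes: dict mapping speed (bit/s) -> power draw (W); hypothetical values.
    """
    ratings = [energy_per_bit(modes[s], s) for s in sorted(modes)]
    return all(a > b for a, b in zip(ratings, ratings[1:]))


# Hypothetical interface A: every faster mode is cheaper per bit, so the
# fastest mode is both the most efficient and the quickest.
interface_a = {10e6: 1.0, 100e6: 2.0, 1000e6: 5.0}

# Hypothetical interface B: the fastest mode costs the most per bit, so a
# mix of speeds across links may minimize total energy.
interface_b = {10e6: 1.0, 100e6: 2.0, 1000e6: 200.0}

print(decreases_monotonically(interface_a))
print(decreases_monotonically(interface_b))
```

For an interface like A, picking the top speed everywhere is safe; for one like B, the dynamic programming search over per-link speeds is what finds the trade-off.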
8 Conclusion

We present a combined partitioning and speed selection technique for the energy optimization of embedded multiprocessor-on-chip architectures with high-speed on-chip networks. As communication power approaches or surpasses processor power, communication must be treated as a primary concern in system-level energy optimization. We exploit the multi-speed feature of modern high-speed communication interfaces as an effective way to complement and enhance today's CPU-centric power optimization approaches. In such systems, communication and computation compete over opportunities for operating at the most energy-efficient points. It is critical not only to balance the load among processors by functional partitioning, but also to balance the speeds between communication and computation on each node and across the whole system. Our multi-dimensional dynamic programming formulation is exact and of polynomial time complexity. It produces energy-optimal solutions as defined by a partitioning scheme and by the speed selections for all computation and communication tasks. We expect this technique to be applicable to a large class of data-dominated systems-on-chip that can be structured in a pipelined organization.

References

[1] The Alchemy Au from AMD: Internet edge processor. nfo/- au/ndex.html.
[2] INTEL ethernet PHYs/transceivers. ntel.com/desgn/networ/products/ethernet/- lnecard ept.htm.
[3] INTEL XScale microarchitecture. ntel.com/desgn/ntelxscale/.
[4] N. K. Bambha, S. S. Bhattacharyya, J. Teich, and E. Zitzler. Hybrid global/local search strategies for dynamic voltage scaling in embedded multiprocessors. In Proc. International Symposium on Hardware/Software Codesign, pages 43 48,.
[5] L. Benini and G. De Micheli. Networks on chips: a new SoC paradigm. IEEE Computer, 35():7 78, Jan.
[6] R. Cherabuddi, M. Bayoumi, and H. Krishnamurthy. A low power based system partitioning and binding technique for multi-chip module architectures. In Proc. Great Lakes Symposium on VLSI, pages 56 6, 1997.
[7] P. Eles, A. Doboli, P. Pop, and Z. Peng. Scheduling with bus access optimization for distributed embedded systems. IEEE Transactions on VLSI Systems, 8(5):47 49,.
[8] E. Hwang, F. Vahid, and Y.-C. Hsu. FSMD functional partitioning for low power. In Proc. Design, Automation and Test in Europe, pages 8, 1999.
[9] P. V. Knudsen and J. Madsen. Integrating communication protocol selection with hardware/software codesign. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 8(8):77 95, August 1999.
[10] K. Lahiri, A. Raghunathan, and G. Lakshminarayana. LOTTERYBUS: a new high-performance communication architecture for system-on-chip designs. In Proc. Design Automation Conference, pages 5, June.
[11] J. Luo and N. K. Jha. Battery-aware static scheduling for distributed real-time embedded systems. In Proc. Design Automation Conference, pages , June.
[12] R. Ortega and G. Borriello. Communication synthesis for distributed embedded systems. In Proc. International Conference on Computer-Aided Design, pages , 1998.
[13] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, and A. Sangiovanni-Vincentelli. Addressing the system-on-a-chip interconnect woes through communication-based design. In Proc. Design Automation Conference, pages , June.
[14] R. Sims. Signal to clutter measurement and ATR performance. Proc. of the SPIE - The International Society for Optical Engineering, 337():3 7, April 1998.
[15] A. Wang and A. Chandrakasan. Energy efficient system partitioning for distributed wireless sensor networks. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pages 95 98, May.
[16] E. F. Weglarz, K. K. Saluja, and M. H. Lipasti. Minimizing energy consumption for high-performance processing. In Proc. Asian and South Pacific Design Automation Conference, pages 99 4,.
[17] W. Wolf. An architectural co-synthesis algorithm for distributed embedded computing systems. IEEE Transactions on VLSI Systems, pages 8 9, June 1997.
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More informationA SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES
A SYSOLIC APPROACH O LOOP PARIIONING AND MAPPING INO FIXED SIZE DISRIBUED MEMORY ARCHIECURES Ioanns Drosts, Nektaros Kozrs, George Papakonstantnou and Panayots sanakas Natonal echncal Unversty of Athens
More informationRange images. Range image registration. Examples of sampling patterns. Range images and range surfaces
Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples
More informationAn Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed
More informationSupport Vector Machines
Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned
More information(1) The control processes are too complex to analyze by conventional quantitative techniques.
Chapter 0 Fuzzy Control and Fuzzy Expert Systems The fuzzy logc controller (FLC) s ntroduced n ths chapter. After ntroducng the archtecture of the FLC, we study ts components step by step and suggest a
More informationA MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS
Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung
More informationA GENETIC ALGORITHM FOR PROCESS SCHEDULING IN DISTRIBUTED OPERATING SYSTEMS CONSIDERING LOAD BALANCING
A GENETIC ALGORITHM FOR PROCESS SCHEDULING IN DISTRIBUTED OPERATING SYSTEMS CONSIDERING LOAD BALANCING M. Nkravan and M. H. Kashan Department of Electrcal Computer Islamc Azad Unversty, Shahrar Shahreqods
More informationActive Contours/Snakes
Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng
More informationSelf-tuning Histograms: Building Histograms Without Looking at Data
Self-tunng Hstograms: Buldng Hstograms Wthout Lookng at Data Ashraf Aboulnaga Computer Scences Department Unversty of Wsconsn - Madson ashraf@cs.wsc.edu Surajt Chaudhur Mcrosoft Research surajtc@mcrosoft.com
More informationVerification by testing
Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over
More informationSpace-Optimal, Wait-Free Real-Time Synchronization
1 Space-Optmal, Wat-Free Real-Tme Synchronzaton Hyeonjoong Cho, Bnoy Ravndran ECE Dept., Vrgna Tech Blacksburg, VA 24061, USA {hjcho,bnoy}@vt.edu E. Douglas Jensen The MITRE Corporaton Bedford, MA 01730,
More informationSmoothing Spline ANOVA for variable screening
Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory
More informationParallel Branch and Bound Algorithm - A comparison between serial, OpenMP and MPI implementations
Journal of Physcs: Conference Seres Parallel Branch and Bound Algorthm - A comparson between seral, OpenMP and MPI mplementatons To cte ths artcle: Luco Barreto and Mchael Bauer 2010 J. Phys.: Conf. Ser.
More informationRepeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits
Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.
More informationAn efficient iterative source routing algorithm
An effcent teratve source routng algorthm Gang Cheng Ye Tan Nrwan Ansar Advanced Networng Lab Department of Electrcal Computer Engneerng New Jersey Insttute of Technology Newar NJ 7 {gc yt Ansar}@ntedu
More informationCommunication-Minimal Partitioning and Data Alignment for Af"ne Nested Loops
Communcaton-Mnmal Parttonng and Data Algnment for Af"ne Nested Loops HYUK-JAE LEE 1 AND JOSÉ A. B. FORTES 2 1 Department of Computer Scence, Lousana Tech Unversty, Ruston, LA 71272, USA 2 School of Electrcal
More information3. CR parameters and Multi-Objective Fitness Function
3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft
More informationPolyhedral Compilation Foundations
Polyhedral Complaton Foundatons Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty Feb 8, 200 888., Class # Introducton: Polyhedral Complaton Foundatons
More informationVectorization in the Polyhedral Model
Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton
More informationAPPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT
3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ
More information