Combined Functional Partitioning and Communication Speed Selection for Networked Voltage-Scalable Processors*


Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh
Department of Electrical & Computer Engineering
University of California, Irvine, CA, USA

Categories and Subject Descriptors
C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYSTEMS]: Real-time and embedded systems

General Terms
Design, Performance, Algorithms

Keywords
functional partitioning, communication speed selection, communication/computation trade-offs, embedded multi-processor, low-power design

ABSTRACT
This paper presents a new technique for global energy optimization through coordinated functional partitioning and speed selection for embedded processors interconnected by a high-speed serial bus. Many such serial interfaces are capable of operating at multiple speeds and can open up a new dimension of trade-offs to complement today's CPU-centric voltage scaling techniques for processors. We propose a multi-dimensional dynamic programming formulation for energy-optimal functional partitioning with CPU/communication speed selection for a class of data-regular applications under performance constraints. We demonstrate the effectiveness of our optimization techniques with an image processing application mapped onto a multi-processor architecture with a multi-speed Ethernet.

* This research was sponsored by DARPA grant F and Printronix Fellowship.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ISSS'02, October 2-4, 2002, Kyoto, Japan. Copyright 2002 ACM.

1. INTRODUCTION
A key trend in embedded systems is towards the use of high-speed serial busses for system-level interconnect. High-speed serial controllers such as Ethernet are now an integral part of many embedded processors. Newer protocols such as FireWire (IEEE 1394) and USB are commonly used not only for peripheral devices but also for connecting embedded processors. Many have advocated high-speed, serial packet networks for systems-on-chip for their compelling advantages including modularity, composability, scalability, form factor, and power efficiency.

For power optimization, previous efforts focused on the processor for several reasons. The CPU was the main consumer of power, and it also offered the most options for power management, including voltage scaling. However, recent advances in both processors and communication interfaces are driving a shift in how power should be managed.

Low-power CPU, High-power Communication
CPU-centric power management has given rise to a new generation of processors with dramatically improved power efficiency, and the CPU is now drawing a smaller percentage of the overall system power. The insatiable demand for bandwidth has also resulted in high-speed communication interfaces. Even though their power efficiency (i.e., energy per bit transmitted) has also been improved, communication power now matches or surpasses that of the CPU, and is thus a larger fraction of the system power. For instance, the Intel XScale processor consumes 1.6W at full speed, while a Gigabit Ethernet interface consumes 6W.

Multi-speed Communication Interfaces
Many communication interfaces today support multiple data rates. However, the scaling effects tend to be the opposite of those of voltage-scalable CPUs. For CPUs, slower speed generally means lower power and lower energy per instruction; but for communication, faster speed means higher power but often less energy per bit. This is highly dependent on the specific controller. Few research works to date have explored communication speed as a key parameter for power optimization.

Speed Selection and Functional Partitioning
Speed selection cannot be performed for just communication or computation in isolation, because a local decision can have a global impact. The CPUs cannot all be run at the slowest, most power-efficient speeds, because they must compete for the available time and power with each other and with the communication interfaces. A faster communication speed, even at a higher energy-per-bit, can save energy by creating opportunities for voltage scaling the processors. Greedily saving communication power may actually result in higher overall energy. At the same time, functional partitioning must be an integral part of the optimization loop, because different partitioning schemes can dramatically alter the communication and computation workload for each node.
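As a rough illustration of the multi-speed interface behaviour described in this section (faster modes draw more power yet can cost less energy per bit), the sketch below divides power by bit rate for three Ethernet modes. The 100 Mbps and 1000 Mbps power levels follow the figures quoted in this paper; the 10 Mbps power level and the names `ETHERNET_MODES_W`, `energy_per_bit`, and `transfer_cost` are assumptions made only for this example.

```python
# Bit rate (bits/s) -> interface power (W).
# 1.5 W and 6 W follow the values quoted in the paper; 0.8 W for the
# 10 Mbps mode is an ASSUMED placeholder, not a value taken from the source.
ETHERNET_MODES_W = {10e6: 0.8, 100e6: 1.5, 1000e6: 6.0}


def energy_per_bit(power_w, bit_rate_bps):
    """Energy (J) spent per transmitted bit at a given mode."""
    return power_w / bit_rate_bps


def transfer_cost(n_bits, bit_rate_bps, power_w):
    """Return (delay in s, energy in J) for moving n_bits at one mode."""
    delay = n_bits / bit_rate_bps
    return delay, power_w * delay


for rate, power in sorted(ETHERNET_MODES_W.items()):
    nj_per_bit = energy_per_bit(power, rate) * 1e9
    print(f"{rate / 1e6:6.0f} Mbps: {power:.1f} W, {nj_per_bit:.1f} nJ/bit")
```

Under these assumed numbers the fastest mode has the highest power but the lowest energy per bit, and it also returns the most time to the processor, which is the trade-off the rest of the paper exploits.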

Approach
For a given workload on a networked architecture, our problem statement is to generate a functional partitioning scheme and to select the speeds of communication interfaces and processors, such that the total energy is minimized. In general, this problem is extremely difficult. Fortunately, for a class of systems with pipelined multiple processors under a latency constraint, efficient, exact solutions exist. We construct such a system model and formulate the energy consumed by the processors and communication interfaces with their power/speed scaling factors within their available time budget. In [8], we presented the schedulability conditions and the problem of communication speed selection and sketched solutions by exhaustive search. This paper combines communication speed selection with functional partitioning and presents an efficient multi-dimensional dynamic programming solution to minimize system energy. We demonstrate the effectiveness of this technique with an image processing algorithm mapped onto a pipelined multi-processor architecture interconnected by a Gigabit Ethernet.

2. RELATED WORK
Previous works have explored communication synthesis and optimization in distributed multi-processor systems. [13] presents communication scheduling to work with rate-monotonic tasks, while [5] assumes the more deterministic time-triggered protocol (TTP). [10] distributes timing constraints on communication among segments through priority assignment on serial busses (such as the controller-area network) and customization of device drivers. While these assume a bus or a network protocol, LYCOS [7] integrates the ability to select among several communication protocols (with different delays, data sizes, burstiness) into the main partitioning loop. These techniques do not specifically optimize for energy by exploiting the processors' voltage scaling capabilities or the characteristics of the communication interfaces' power consumption.

Related techniques that optimize for power consumption of processors typically assume a fixed communication data rate.
[3] uses simulated heating search strategies to find low-power design points for voltage-scalable embedded processors. [9] performs battery-aware task post-scheduling for distributed, voltage-scalable processors by moving tasks to smooth the power profile. [12, 11] propose partitioning the computation onto a multi-processor architecture that consumes significantly less power than a single processor. [4] reduces switching activities of both functional units and communication links by partitioning tasks onto a multi-chip architecture, while [6] maximizes the opportunity to shut down idle processors through functional partitioning. All these techniques focus on the computational aspect without exploring the speed/power scalability of the communication interfaces.

Existing techniques cannot be readily combined to explore the many timing/power trade-offs between computation and communication. The quadratic voltage scaling properties of CPUs do not generalize to communication interfaces. Even if they did, these techniques have not considered the partitioning of power and timing budgets among computation/communication components across the network. Selecting communication attributes by considering only deadlines without power will lead to unexpected, often incorrect results at the system level.

3. SYSTEM MODEL
This section defines a system-level performance/energy model for both computation and communication components in a networked, multiple-processor embedded system. In this paper, such a system consists of $M$ processing nodes $N_i$, $i = 1, 2, \ldots, M$, connected by a shared communication medium. Each processing node (or node for short) consists of a processor, a local memory, and one or more communication interfaces that send and/or receive data from other nodes.

A processing job assigned to a node is modeled in terms of three tasks: RECV, PROC, and SEND, which must be executed serially in that order. RECV and SEND are communication tasks on the interfaces, and PROC is a computation task on the processor.
For communication tasks RECV and SEND, workloads $W_r$ and $W_s$ indicate the number of bits to be received and sent, respectively. For the computation task PROC, the workload $W_p$ is the number of cycles. Let $T_p$, $T_r$, $T_s$ denote the delays of tasks PROC, RECV, and SEND, respectively. Let $F_p$ denote the clock frequency of the processor, and $F_r$ and $F_s$ the respective data bit rates for receiving and sending. We have

\[
T_p = \frac{W_p}{F_p}, \qquad T_r = \frac{W_r}{F_r}, \qquad T_s = \frac{W_s}{F_s} \tag{1}
\]

Eq. (1) is reasonable for processors executing data-dominated programs, where the total $W_p$ can be analyzed and bounded statically. To model non-ideal aspects of the medium, we introduce the communication efficiency terms $\rho_r$ and $\rho_s$, where $0 \le \rho_r, \rho_s \le 1$, such that $T_r = W_r / (\rho_r F_r)$ and $T_s = W_s / (\rho_s F_s)$. Note that $\rho_r$ and $\rho_s$ need not be constants, but may be functions of the communication speeds $F_r$, $F_s$. For brevity, our experimental results assume an ideal communication medium ($\rho_r = \rho_s = 1$) without loss of generality.

$D$ is a deadline on each processing job, which requires $T_r + T_p + T_s \le D$ for the three serialized tasks. If any slack time exists, then we assume we can always slow down task PROC by voltage scaling to reduce energy, based on the capability of modern embedded processors. Therefore, we convert the inequality into an equality in the deadline equation. That is,

\[
D = T_r + T_p + T_s \tag{2}
\]

We assume a processor's voltage-scaling characteristics can be expressed by a scaling function $Scale_p$ that maps the CPU's frequency to its power level. A communication interface also has scaling functions $Scale_s$ and $Scale_r$ for sending and receiving. Eq. (2) implies that $Scale_p$ is continuous, while communication interfaces support only a few discrete scaling points. Let $P_p$, $P_r$, and $P_s$ denote the power for the processor, receiving, and sending, respectively. Then,

\[
P_p = Scale_p(F_p), \qquad P_r = Scale_r(F_r), \qquad P_s = Scale_s(F_s) \tag{3}
\]

Let $P_{ovh}$ denote the power overhead associated with adding a node to the system.
It captures the power of the memory, the minimum power of the CPU and communication interface, the CPU's power during RECV and SEND (DMA), and the communication interfaces' power during PROC.

The energy consumption of a task is the power-delay product. Let $E_p$, $E_r$, $E_s$, and $E_{ovh}$ denote the energy consumption of tasks PROC, RECV, SEND, and the overhead of a node, respectively. Let $E_{N_i}$ denote the total energy of node $N_i$. Finally, the total energy of the system is the sum of the energy consumption on each node. To summarize,

\[
E_p = P_p T_p, \qquad E_r = P_r T_r, \qquad E_s = P_s T_s, \qquad E_{ovh} = P_{ovh} D \tag{4}
\]
\[
E_{N_i} = E_p + E_r + E_s + E_{ovh} \tag{5}
\]
\[
E_{sys} = \sum_{i=1}^{M} E_{N_i} \tag{6}
\]
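The model of Eqs. (1)-(6) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the cubic `scale_p` curve, the `f_max` limit, and all names (`Node`, `node_energy`, `system_energy`) are assumptions introduced here, and the communication scaling functions are passed in by the caller.

```python
from dataclasses import dataclass


@dataclass
class Node:
    w_r: float  # bits to receive (W_r)
    w_p: float  # cycles to compute (W_p)
    w_s: float  # bits to send (W_s)


def scale_p(f_p_hz):
    """ASSUMED CPU scaling function: power grows cubically with clock rate,
    calibrated so that 1 GHz draws 1.6 W. The real curve is device-specific."""
    return 1.6 * (f_p_hz / 1e9) ** 3


def node_energy(node, f_r, f_s, deadline, scale_r, scale_s, p_ovh, f_max=1e9):
    """Energy of one node per Eqs. (1)-(5), or None if the deadline cannot be met."""
    t_r = node.w_r / f_r                 # Eq. (1)
    t_s = node.w_s / f_s
    t_p = deadline - t_r - t_s           # Eq. (2): all slack is given to PROC
    if t_p <= 0:
        return None
    f_p = node.w_p / t_p                 # slowest clock that still meets D
    if f_p > f_max:
        return None
    e_p = scale_p(f_p) * t_p             # Eqs. (3) and (4)
    e_r = scale_r(f_r) * t_r
    e_s = scale_s(f_s) * t_s
    e_ovh = p_ovh * deadline
    return e_p + e_r + e_s + e_ovh       # Eq. (5)


def system_energy(node_energies):
    """Eq. (6): sum of per-node energies."""
    return sum(node_energies)
```

For example, `node_energy(Node(1e6, 1e6, 1e6), 100e6, 100e6, 0.03, lambda f: 1.5, lambda f: 1.5, 0.1)` evaluates one node whose interface draws a constant 1.5 W in both directions; the workload values are hypothetical.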

[Figure 1: Timing and power properties of a processing node. (a) Block diagram; (b) timing-power diagram showing, for RECV, PROC, and SEND, the delays $T_r = W_r/F_r$, $T_p = W_p/F_p$, $T_s = W_s/F_s$, the powers $P_r$, $P_p$, $P_s$, and the overhead power $P_{ovh}$.]

[Figure 2: A three-node pipeline. (a) Serialized timing diagram; (b) pipelined timing diagram.]

Fig. 1 shows the timing and power breakdown of the tasks on a node. The gray bar represents the overhead, while the white bars represent tasks RECV, PROC, and SEND. The area of a bar represents the energy consumption of the corresponding task or overhead.

This paper considers a special case called an $M$-node pipeline. It consists of identical nodes $N_i$, $i = 1, 2, \ldots, M$, as characterized by $Scale_p$, $Scale_r$, $Scale_s$, $E_{ovh}$. Each node $N_i$ receives $W_{r_i}$ bits of data from the previous node $N_{i-1}$ (except $N_1$), processes the data in $W_{p_i}$ cycles, and sends the $W_{s_i}$-bit result to the next node $N_{i+1}$ (except $N_M$). Each SEND$_i$ → RECV$_{i+1}$ communication pair sends and receives the same amount of data at the same communication speed, with the same communication delay, and we assume they start and finish at the same time. That is, $W_{s_i} = W_{r_{i+1}}$, $F_{s_i} = F_{r_{i+1}}$, $T_{s_i} = T_{r_{i+1}}$. All nodes have the same deadline $D$, and each node acts as a pipeline stage with delay $D$.

Fig. 2 shows an example of a three-node pipeline. For brevity, the overhead is not shown. Fig. 2(b) shows the pipelined timing diagram obtained by folding the tasks in Fig. 2(a) into a common interval with duration $D$, which is the delay of each pipeline stage. [8] presented the schedulability conditions for an $M$-node pipeline based on collision and utilization of the shared communication medium.

An $M$-node pipeline can be partitioned and mapped onto an $M'$-node pipeline ($M' \le M$) by merging adjacent nodes $N_i, N_{i+1}, \ldots, N_j$ ($i \le j$) into a new node $N'_k$.
The new node $N'_k$ combines all computation workload, receives $W_{r_i}$ bits of data, and sends $W_{s_j}$ bits of data. Communication within a node becomes local data accesses. That is, $W'_{p_k} = \sum_{l=i}^{j} W_{p_l}$, $W'_{r_k} = W_{r_i}$, and $W'_{s_k} = W_{s_j}$. The new $M'$-node pipeline is called a partitioning of the initial $M$-node pipeline.

4. MOTIVATING EXAMPLE
We use an automatic target recognition (ATR) algorithm (Fig. 3) as our motivating example. Originally it is a serial algorithm. We reconstructed a parallel version and mapped it onto pipelined multiple processors. Pipelining allows each processor to run at a much slower speed with a lower voltage level to reduce overall computation energy, while parallelism compensates for the performance. Of course, having extra processors costs energy overhead for inter-processor communication, memory, etc.

[Figure 3: Stages of the ATR algorithm. N1: Target Detection; N2: FFT; N3: Filter; N4: IFFT; N5: Compute Distance. Each stage $N_i$ has workload $W_{p_i}$, and adjacent stages are linked by $W_{s_i} = W_{r_{i+1}}$.]

[Figure 4: The impact of different partitioning schemes and communication speed settings. (a) A fine-grain partitioning scheme reduces energy on computation, at the cost of inter-processor communication and overhead of additional nodes. (b) The combined node reduces communication and overhead, but it requires more energy for computation. (c) The computation energy can be reduced by high-speed communication, which leaves more time for computation.]

Task to Node Mapping
Given the decomposition of the ATR algorithm into five stages, several partitioning schemes are possible for mapping them onto a number of pipelined nodes. Fig. 4 shows an example by considering how the first two stages map onto (a) two nodes and (b) one node. In Fig. 4(a), mapping onto two nodes N1 and N2 enables both processors to operate at a reduced speed (3MHz) for computation.
The two nodes together consume lower computation energy than one node at a faster speed but must pay the price of communication energy for SEND$_1$ → RECV$_2$. In Fig. 4(b), even though merging the two stages onto one node eliminates the SEND$_1$ → RECV$_2$ communication, the CPU must execute the combined computation workload at a faster clock rate (6MHz), a less energy-efficient level.

Zooming out, many partitioning schemes are possible, even when limited to a pipelined organization. For example, one partitioning [N1, N2] [N3, N4, N5] may be optimal for nodes N1 and N2, but it will preclude another solution [N1] [N2, N3] [N4, N5] that may lead to less energy for the whole system.

Speed Selection for CPU and Communication
The selection of communication speed is an equally critical issue. For example, a 10/100/1000 Base-T Ethernet interface can consume more power than a CPU at high (100/1000Mbps) speeds, but less power at the slower, 10Mbps data rate. In Fig. 4(b), the processor must operate at a high clock rate due to the low-speed communication at 10Mbps. Because of the deadline $D$, communication and computation compete for this budget. Low-speed communication leaves less time for computation, thereby forcing the processor to run faster to meet the deadline. Conversely, high-speed communication could free up more time budget for computation, as shown in Fig. 4(c), where the CPU's clock rate is dropped to 3MHz. Although extra energy must be allocated to high-speed communication, if the energy saving on the CPU compensates for this cost, then (c) is more energy-efficient than (b).
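The three situations of Fig. 4 can be reproduced numerically with the merging rule of Section 3 and the deadline equation (2). The sketch below is only illustrative: the stage workloads, the deadline, and the helper names `merge` and `required_cpu_hz` are hypothetical and are not taken from the ATR data.

```python
def merge(stages):
    """Merge adjacent stages, each given as (w_r bits, w_p cycles, w_s bits).
    The merged node keeps the first stage's input and the last stage's output,
    and sums the computation workload; internal transfers become local."""
    return (stages[0][0], sum(s[1] for s in stages), stages[-1][2])


def required_cpu_hz(stage, comm_bps, deadline_s):
    """Slowest CPU clock that meets the deadline once communication is paid for,
    assuming the same bit rate for receiving and sending."""
    w_r, w_p, w_s = stage
    budget = deadline_s - (w_r + w_s) / comm_bps   # time left for PROC
    if budget <= 0:
        return float("inf")                        # deadline cannot be met
    return w_p / budget


# Hypothetical workloads: 100 Kb in and out, 2M cycles per stage, D = 30 ms.
n1 = (100_000, 2_000_000, 100_000)
n2 = (100_000, 2_000_000, 100_000)
deadline = 0.03

split_slow = required_cpu_hz(n1, 10e6, deadline)                  # like Fig. 4(a)
merged_slow = required_cpu_hz(merge([n1, n2]), 10e6, deadline)    # like Fig. 4(b)
merged_fast = required_cpu_hz(merge([n1, n2]), 1000e6, deadline)  # like Fig. 4(c)
print(split_slow, merged_slow, merged_fast)
```

With these made-up numbers, merging doubles the required clock rate at the slow bit rate, while switching the merged node to the fast bit rate brings the clock rate back below that of the split case. Whether that is a net energy win depends on the interface power at the faster mode.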

The communication-computation interaction becomes more intricate in a multi-processor environment. Any data dependency between different nodes must involve their communication interfaces. The communication speed of a sender will not only determine the receiver's communication speed but also influence the choice of the receiver's computation speed. The communication speed on the first node of the pipeline will have a chain effect on all other nodes in the system. A locally optimal speed for the first node will not necessarily lead to a globally optimal solution.

Combining Partitioning and Speed Selection
Given a fixed partitioning scheme, the designers can always find the corresponding optimal speed setting that minimizes energy for that scheme. However, energy-optimal speed selection for a partitioning is not necessarily optimal over all partitionings. Instead, partitioning and speed selection are mutually enabling. In this paper, we take a multi-dimensional optimization approach that considers performance requirements, schedulability, load balancing, communication-computation trade-offs, and multi-processor overhead in a system-level context.

5. PROBLEM FORMULATION
Given an $M$-node pipeline, choices of partitioning and communication speed settings will lead to different levels of energy consumption at the system level. This section formulates three energy minimization problems: by partitioning, by communication speed selection, and by both. In the first two problems, the optimal solution can be obtained by dynamic programming, and the combined optimization problem can be solved by multi-dimensional dynamic programming.

Problem 1 (Optimal Partitioning)
Given (a) $M$ pipelined nodes $N_i$ with workload $W_{p_i}, W_{r_i}, W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the constraint that the speed settings of all communication instances must match: $F_{r_1}$, $F_{s_i} = F_{r_{i+1}}$, $F_{s_M}$, for $i = 1, 2, \ldots, M-1$, find a partitioning scheme that minimizes energy $E_{sys}$.

To avoid exhaustive enumeration in the $O(2^{M-1})$ solution space, we construct a series of optimal solutions to sub-problems by mapping the original $M$ nodes one by one onto new sub-partitionings.
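The size of the search space quoted above follows from the fact that each of the $M-1$ boundaries between adjacent nodes is either a cut or not. A small sketch (the function name `contiguous_partitionings` is an assumption for this example) enumerates every pipelined partitioning and confirms the count for the five-stage case:

```python
from itertools import combinations


def contiguous_partitionings(m):
    """Yield every way to group nodes 1..m into runs of adjacent nodes."""
    for n_cuts in range(m):
        # A cut at position c separates node c from node c + 1.
        for cuts in combinations(range(1, m), n_cuts):
            bounds = (0, *cuts, m)
            yield [list(range(a + 1, b + 1)) for a, b in zip(bounds, bounds[1:])]


schemes = list(contiguous_partitionings(5))
print(len(schemes))  # number of pipelined partitionings of five stages
```

Both example partitionings from Section 4, [N1, N2] [N3, N4, N5] and [N1] [N2, N3] [N4, N5], appear in this list, which is what exhaustive search would have to walk through for every speed setting.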
We compute the optimal cost function in terms of the minimum energy consumption over the sub-partitionings. Upon mapping each node, the new optimal sub-solution can be computed from past optimal sub-solutions. Therefore, a dynamic programming approach is applicable.

For dynamic programming, we use an energy matrix $E$ to store the cost function. Each entry $E[i, j]$ indicates the minimum energy of a sub-problem that maps the first $j$ original nodes $N_1, N_2, \ldots, N_j$ onto a new sub-partitioning with $i$ nodes $N'_1, N'_2, \ldots, N'_i$. Matrix $E$ is initialized to $\infty$.

\[
E[i, j] =
\begin{cases}
0 & \text{for } i = j = 0, \\
\min\limits_{i-1 \le l \le j-1} \bigl( E[i-1, l] + E_{N'_i} \bigr) & \text{for } 1 \le i \le j \le M
\end{cases}
\tag{7}
\]

Eq. (7) indicates that the optimal $i$-node sub-partitioning that maps the first $j$ original nodes must be a combination of the following: (a) a sub-partitioning that maps the first $l$ original nodes $N_1, N_2, \ldots, N_l$ to $i-1$ new nodes, and (b) the $i$-th new node $N'_i$ that combines the original nodes $N_{l+1}, \ldots, N_j$. The sub-partitioning (a) must be optimal with minimum energy $E[i-1, l]$. (b) has only one node $N'_i$. Its energy is denoted as $E_{N'_i}$. Since $E[i, j]$ is the optimal energy for the sub-problem, it must be the minimum value of (7) among all possible choices of $l$.

The dynamic programming algorithm can iterate (7) from $i = j = 0$ until $i = j = M$. Each optimal sub-solution $E[i, j]$ can be derived from the previously computed $E[i-1, l]$. Finally, the minimum energy is $\min(E[i, M])$, $i = 1, 2, \ldots, M$. We omit the algorithm for brevity. Its time complexity is $O(M^3)$.

Problem 2 (Optimal Communication Speed Selection)
Given (a) a fixed partitioning scheme with $M$ pipelined nodes $N_i$ with workload $W_{p_i}, W_{r_i}, W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the available choices for communication speed settings $F_{c_k}$, $k = 1, 2, \ldots, C$, find all processor speeds $F_{p_i}$ and communication speeds $F_{r_i}, F_{s_i}$ that minimize energy $E_{sys}$.

We again use dynamic programming rather than exhaustive search in the $O(C^{M+1})$ solution space. During step $i$, when processing node $N_i$, we only select the communication speeds $F_{r_i}, F_{s_i}$ of $N_i$, because they determine $F_{p_i}$, and the previous speed settings of the sub-problems have already been selected to be optimal.
For each choice of $F_{r_i}, F_{s_i}$, we compute the energy of node $N_i$, plus the optimal energy of a sub-problem computed by step $i-1$ with $F_{s_{i-1}} = F_{r_i}$, to find the optimal energy of the new sub-problem in step $i$.

Each element $E[i, k]$ in the energy matrix $E$ indicates the minimum energy of a sub-problem. It has $i$ nodes $N_1, N_2, \ldots, N_i$ with the last node $N_i$'s sending speed selected to be the $k$-th speed choice $F_{c_k}$. $E$ is initialized to $\infty$.

\[
E[i, k] =
\begin{cases}
0 & \text{for } i = 0, \\
\min\limits_{1 \le m \le C} \bigl( E[i-1, m] + E_{N_i}(F_{r_i} = F_{c_m}, F_{s_i} = F_{c_k}) \bigr) & \text{for } 1 \le i \le M
\end{cases}
\tag{8}
\]

Eq. (8) indicates that the optimal speed setting for the sub-problem up to node $N_i$ whose sending speed is $F_{s_i} = F_{c_k}$ is determined by: (a) a previous optimal sub-solution where node $N_{i-1}$'s sending speed is $F_{s_{i-1}} = F_{c_m}$, plus (b) node $N_i$, whose receiving speed is $F_{r_i} = F_{c_m}$ and sending speed is $F_{s_i} = F_{c_k}$. (a) includes $i-1$ nodes $N_1, N_2, \ldots, N_{i-1}$ and communicates with (b) at speed $F_{c_m}$. The optimal energy of sub-problem (a) is $E[i-1, m]$. (b) has only one node $N_i$ that receives data from (a) at speed $F_{c_m}$, and its sending speed is $F_{c_k}$. Its energy is denoted as $E_{N_i}(F_{r_i} = F_{c_m}, F_{s_i} = F_{c_k})$. Since $E[i, k]$ is optimal, it must be the minimum value among all possible speed settings $F_{c_m}$ in (8).

The algorithm is omitted for brevity. It iterates (8) until $i = M$, $k = C$. Each $E[i, k]$ can be derived from the previously computed $E[i-1, m]$. The global minimum energy is $\min(E[M, k])$, $k = 1, 2, \ldots, C$. The time complexity of the algorithm is $O(MC^2)$.

Problem 3 (Optimal Partitioning and Speed Selection)
Given (a) $M$ pipelined nodes $N_i$ with workload $W_{p_i}, W_{r_i}, W_{s_i}$, $i = 1, 2, \ldots, M$, (b) a deadline $D$ for all nodes, and (c) the available choices for communication speed settings $F_{c_k}$, $k = 1, 2, \ldots, C$, find a partitioning scheme and corresponding communication speed settings that minimize energy $E_{sys}$.

Due to the inter-dependency between speed setting and partitioning, the optimal solution cannot be achieved by solving the two previous problems individually. Exhaustively enumerating over one dimension and dynamic programming over the other is quite expensive, with a time complexity of either $O(2^{M-1} M C^2)$ or $O(C^{M+1} M^3)$. We propose a multi-dimensional dynamic programming algorithm, given the fact that the previous two problems can be solved by dynamic programming independently.

Based on the previous two dynamic programming approaches, the energy matrix $E$ for the combined problem is defined as follows: each element $E[i, j, k]$ stores the minimum energy of a sub-problem that maps the first $j$ original nodes $N_1, N_2, \ldots, N_j$ onto a new $i$-node sub-partitioning, whose last node $N'_i$ has sending speed $F_{s_i} = F_{c_k}$.

\[
E[i, j, k] =
\begin{cases}
0 & \text{for } i = j = 0, \\
\min\limits_{\substack{i-1 \le l \le j-1 \\ 1 \le m \le C}} \bigl[ E[i-1, l, m] + E_{N'_i}(F_{r_i} = F_{c_m}, F_{s_i} = F_{c_k}) \bigr] & \text{for } 1 \le i \le j \le M
\end{cases}
\tag{9}
\]

The optimal energy $E[i, j, k]$ is derived from: (a) $E[i-1, l, m]$ of a previous optimal sub-solution, which maps $l$ original nodes $N_1, \ldots, N_l$ onto $i-1$ new nodes $N'_1, \ldots, N'_{i-1}$ with the last node $N'_{i-1}$'s sending speed selected to be $F_{c_m}$, plus (b) the new node $N'_i$ that combines original nodes $N_{l+1}, \ldots, N_j$ with receiving speed $F_{c_m}$ and sending speed $F_{c_k}$. The sub-solution (a) has the optimal energy $E[i-1, l, m]$. Note that (b) has only one node $N'_i$, and its energy is denoted as $E_{N'_i}(F_{r_i} = F_{c_m}, F_{s_i} = F_{c_k})$. $E[i, j, k]$ must be derived from all possible pairs $(l, m)$ to achieve the minimum value of (9).

    partitioning-speedselection(W_r[1:M], W_s[1:M], W_p[1:M],
                                F_c[1:C], Scale_r, Scale_s, Scale_p, D, P_ovh)
      for i := 0 to M do
        for j := 0 to M do
          for k := 1 to C do
            E[i,j,k] := ∞
            U[i,j,k], P[i,j,k], S[i,j,k] := unset
      for k := 1 to C do
        E[0,0,k] := 0
        U[0,0,k] := W_r[1] / F_c[k] / D
      for i := 1 to M do
        for j := i to M do
          for k := 1 to C do
            for l := i-1 to j-1 do
              for m := 1 to C do
                e := E[i-1,l,m] + E_N'i(F_r = F_c[m], F_s = F_c[k])
                u := U[i-1,l,m] + W_s[j] / F_c[k] / D
                if u ≤ 1 and e < E[i,j,k] then
                  E[i,j,k] := e
                  U[i,j,k] := u
                  P[i,j,k] := l
                  S[i,j,k] := m
      E_opt, P_opt, S_opt := retrieve from matrices E, U, P, S
      return E_opt, P_opt, S_opt

Figure 5: Combined partitioning with speed selection.

The algorithm is shown in Fig. 5. It combines the two previous algorithms by two-dimensional dynamic programming.
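The recurrence (9) and the loop structure of Fig. 5 can be sketched in Python as follows. This is an illustrative reading of the figure, not the authors' code: indices are 0-based, the per-node energy model is supplied by the caller as a hypothetical callback `node_energy(w_r, w_p, w_s, f_r, f_s)`, and the backtracking step that the figure summarizes as "retrieve from matrices" is one possible interpretation.

```python
import math


def partition_and_select(w_r, w_s, w_p, speeds, deadline, node_energy):
    """Return (energy, groups, link_speeds) or None if nothing is schedulable.
    groups lists the 1-based original nodes merged into each new node;
    link_speeds[n] is the bit rate on the link entering new node n + 1,
    and the last entry is the output link of the final node."""
    M, C = len(w_p), len(speeds)
    INF = math.inf
    E = [[[INF] * C for _ in range(M + 1)] for _ in range(M + 1)]
    U = [[[INF] * C for _ in range(M + 1)] for _ in range(M + 1)]
    back = [[[None] * C for _ in range(M + 1)] for _ in range(M + 1)]
    for k in range(C):
        E[0][0][k] = 0.0
        U[0][0][k] = w_r[0] / speeds[k] / deadline   # bus share of the first RECV
    for i in range(1, M + 1):              # number of new nodes so far
        for j in range(i, M + 1):          # original nodes covered so far
            for k in range(C):             # sending speed of the new last node
                for l in range(i - 1, j):  # previous sub-solution covers l nodes
                    for m in range(C):     # speed on the link into the new node
                        if E[i - 1][l][m] == INF:
                            continue
                        # The new node merges original nodes l+1..j (1-based).
                        e = E[i - 1][l][m] + node_energy(
                            w_r[l], sum(w_p[l:j]), w_s[j - 1], speeds[m], speeds[k])
                        u = U[i - 1][l][m] + w_s[j - 1] / speeds[k] / deadline
                        if u <= 1 and e < E[i][j][k]:
                            E[i][j][k], U[i][j][k] = e, u
                            back[i][j][k] = (l, m)
    energy, i, k = min((E[i][M][k], i, k)
                       for i in range(1, M + 1) for k in range(C))
    if energy == INF:
        return None
    groups, links, j = [], [speeds[k]], M
    while i > 0:                           # walk the back-pointers to node 1
        l, m = back[i][j][k]
        groups.append(list(range(l + 1, j + 1)))
        links.append(speeds[m])
        i, j, k = i - 1, l, m
    return energy, groups[::-1], links[::-1]
```

The callback should return `math.inf` for a node that cannot meet the deadline at the given speeds. Passing a single candidate speed reduces the search to the partitioning-only case of Problem 1.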
There are three additional matrices. The utilization matrix $U$ tracks the schedulability condition [8] and guards each optimal sub-solution to guarantee its schedulability. The partitioning matrix $P$ and speed matrix $S$ are used to record the intermediate solutions and to retrieve the optimal partitioning $P_{opt}$ and optimal speed setting $S_{opt}$ when the algorithm terminates. The global minimum energy is $\min(E[i, M, k])$, $i = 1, 2, \ldots, M$, $k = 1, 2, \ldots, C$. The time complexity of the algorithm is $O(M^3 C^2)$.

[Figure 6: Two fixed partitioning schemes of ATR. (a) Single-node partitioning, merging N1, N2, N3, N4, N5 into one node with $W_p = W_{p_1} + W_{p_2} + W_{p_3} + W_{p_4} + W_{p_5}$; (b) five-node partitioning.]

6. EXPERIMENTAL RESULTS
To evaluate our energy optimization techniques, we experiment with mapping the ATR algorithm onto two fixed partitioning schemes: (a) a single node that combines all blocks, and (b) a five-node pipeline that maps each block onto an individual node (Fig. 6). The input data size is 12K bits, and the output is 14K bits per frame. In scheme (a), the single node combines all the workload of the five nodes in (b), and it eliminates all internal communication instances between nodes in (b). (a) and (b) are two extremes representing serial vs. parallel schemes. For both (a) and (b) we apply optimal speed selection. We also find the optimal partitioning with speed selection as (c) and compare its energy consumption per image frame with (a) and (b) under two types of performance requirements: (1) high performance, $D$ = 1ms, and (2) moderate performance, $D$ = 15ms.

Each node consists of an Intel XScale processor [2] whose power vs. performance level ranges from 5mW@15MHz to 1.6W@1GHz (Fig. 7), and an LXT-1 Ethernet interface [1] with three power levels, for 10Mbps, 100Mbps (1.5W), and 1000Mbps (6W) operation (Fig. 8). We assume each node has a constant power draw $P_{ovh}$ = 1mW. The results are presented in Fig. 9.
In all cases, 1000Mbps is the optimal speed setting for communication. The low-power, 10Mbps communication speed results in the highest energy. This is because it leaves so little time for computation that the processors must run faster, with more energy, to meet the deadline, and because it has the highest energy-per-bit rating. The low-speed communication also tends to violate the schedulability conditions [8]. Given the properties of this particular Ethernet interface, 1000Mbps communication will always lead to the lowest energy consumption, since it requires the least amount of energy per bit and leaves the maximum amount of time budget for reducing CPU energy. However, in cases where the energy-per-bit rating does not decrease monotonically with the communication speed, the optimal speed setting may involve some combination of low-speed and high-speed settings between different nodes. For example, the node $N_i$ may communicate with $N_{i-1}$ at one data rate and with $N_{i+1}$ at another.

Fig. 9(1) shows the energy consumption of all three partitioning schemes under a tight performance constraint. The single node (a) is heavily loaded with computation. Therefore it is desirable to reduce CPU energy by pipelining. As a result, the five-node pipeline (b) is more energy-efficient, at the cost of additional communication and overhead. However, the optimal partitioning is (c) with three nodes: [N1, N2] [N3, N4] [N5]. It consumes more CPU energy than (b), but overall it is optimal, with less energy spent on communication and overhead.

In the case of the moderate performance constraint (Fig. 9(2)), (a) is still dominated by computation, but it is not heavily loaded due to the relaxed deadline. The reduction of CPU energy by (b) cannot compensate for the added overhead of new nodes and communication. Therefore (a) is better than (b), and pipelining seems inefficient. However, the optimal partitioning (c) is still a pipelined solution. It combines N1, N2, N3, N4 into one node and maps N5 to another node. (c) achieves minimum energy by appropriately balancing computation and communication against pipelining overhead. If the performance constraint is further relaxed, the serial solution (a) will become optimal.

[Figure 7: Power vs. performance of the XScale processor.]

[Figure 8: Power modes of the Ethernet interface (power consumption at 10Mbps, 100Mbps, and 1000Mbps).]

[Figure 9: Energy consumption of three partitioning schemes. Energy per frame (mJ), broken down into overhead, communication, and computation, for (a) 1-node, (b) 5-node, and (c) optimal partitionings under (1) high performance and (2) moderate performance.]

7. CONCLUSION
We present an energy optimization technique for networked embedded processors and emerging system-on-chip architectures with high-speed on-chip networks. We exploit the multi-speed feature of modern high-speed communication interfaces as an effective way to complement and enhance today's CPU-centric power optimization approaches. In such systems, communication and computation compete over opportunities for operating at the most energy-efficient points. It is critical not only to balance the load among processors by functional partitioning, but also to balance the speeds between communication and computation on each node and across the whole system. Our multi-dimensional dynamic programming formulation is exact and produces the energy-optimal solution as defined by a partitioning scheme and the speed selections for all computation and communication tasks. We expect this technique to be applicable to a large class of data-dominated systems that can be structured in a pipelined organization.
8. REFERENCES
[1] INTEL Ethernet PHYs/transceivers. ethernet/linecard ept.htm.
[2] INTEL XScale microarchitecture.
[3] N. K. Bambha, S. S. Bhattacharyya, J. Teich, and E. Zitzler. Hybrid global/local search strategies for dynamic voltage scaling in embedded multiprocessors. In Proc. International Symposium on Hardware/Software Codesign, 2001.
[4] R. Cherabuddi, M. Bayoumi, and H. Krishnamurthy. A low power based system partitioning and binding technique for multi-chip module architectures. In Proc. Great Lakes Symposium on VLSI.
[5] P. Eles, A. Doboli, P. Pop, and Z. Peng. Scheduling with bus access optimization for distributed embedded systems. IEEE Transactions on VLSI Systems, (5), 2000.
[6] E. Huwang, F. Vahid, and Y.-C. Hsu. FSM functional partitioning for low power. In Proc. Design, Automation and Test in Europe.
[7] P. V. Knudsen and J. Madsen. Integrating communication protocol selection with hardware/software codesign. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, August.
[8] J. Liu, P. H. Chou, and N. Bagherzadeh. Communication speed selection for embedded systems with networked voltage-scalable processors. In Proc. International Symposium on Hardware/Software Codesign, April 2002.
[9] J. Luo and N. K. Jha. Battery-aware static scheduling for distributed real-time embedded systems. In Proc. Design Automation Conference, June 2001.
[10] R. Ortega and G. Borriello. Communication synthesis for distributed embedded systems. In Proc. International Conference on Computer-Aided Design, 1998.
[11] A. Wang and A. Chandrakasan. Energy efficient system partitioning for distributed wireless sensor networks. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, May 2001.
[12] E. F. Weglarz, K. K. Saluja, and M. H. Lipasti. Minimizing energy consumption for high-performance processing. In Proc. Asian and South Pacific Design Automation Conference, 2002.
[13] W. Wolf. An architectural co-synthesis algorithm for distributed embedded computing systems. IEEE Transactions on VLSI Systems, June.


Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

Routing in Degree-constrained FSO Mesh Networks

Routing in Degree-constrained FSO Mesh Networks Internatonal Journal of Hybrd Informaton Technology Vol., No., Aprl, 009 Routng n Degree-constraned FSO Mesh Networks Zpng Hu, Pramode Verma, and James Sluss Jr. School of Electrcal & Computer Engneerng

More information

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach

Distributed Resource Scheduling in Grid Computing Using Fuzzy Approach Dstrbuted Resource Schedulng n Grd Computng Usng Fuzzy Approach Shahram Amn, Mohammad Ahmad Computer Engneerng Department Islamc Azad Unversty branch Mahallat, Iran Islamc Azad Unversty branch khomen,

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Maintaining temporal validity of real-time data on non-continuously executing resources

Maintaining temporal validity of real-time data on non-continuously executing resources Mantanng temporal valdty of real-tme data on non-contnuously executng resources Tan Ba, Hong Lu and Juan Yang Hunan Insttute of Scence and Technology, College of Computer Scence, 44, Yueyang, Chna Wuhan

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor

Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor Communication Speed Selection and Functional Partitioning for Low-Energy On-Chip Networked Multiprocessor Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh epartment of Electrical & Computer Engineering University

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments

Comparison of Heuristics for Scheduling Independent Tasks on Heterogeneous Distributed Environments Comparson of Heurstcs for Schedulng Independent Tasks on Heterogeneous Dstrbuted Envronments Hesam Izakan¹, Ath Abraham², Senor Member, IEEE, Václav Snášel³ ¹ Islamc Azad Unversty, Ramsar Branch, Ramsar,

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

3. CR parameters and Multi-Objective Fitness Function

3. CR parameters and Multi-Objective Fitness Function 3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES

A SYSTOLIC APPROACH TO LOOP PARTITIONING AND MAPPING INTO FIXED SIZE DISTRIBUTED MEMORY ARCHITECTURES A SYSOLIC APPROACH O LOOP PARIIONING AND MAPPING INO FIXED SIZE DISRIBUED MEMORY ARCHIECURES Ioanns Drosts, Nektaros Kozrs, George Papakonstantnou and Panayots sanakas Natonal echncal Unversty of Athens

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Efficient Content Distribution in Wireless P2P Networks

Efficient Content Distribution in Wireless P2P Networks Effcent Content Dstrbuton n Wreless P2P Networs Qong Sun, Vctor O. K. L, and Ka-Cheong Leung Department of Electrcal and Electronc Engneerng The Unversty of Hong Kong Pofulam Road, Hong Kong, Chna {oansun,

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

A Frame Packing Mechanism Using PDO Communication Service within CANopen

A Frame Packing Mechanism Using PDO Communication Service within CANopen 28 A Frame Packng Mechansm Usng PDO Communcaton Servce wthn CANopen Mnkoo Kang and Kejn Park Dvson of Industral & Informaton Systems Engneerng, Ajou Unversty, Suwon, Gyeongg-do, South Korea Summary The

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Resource and Virtual Function Status Monitoring in Network Function Virtualization Environment

Resource and Virtual Function Status Monitoring in Network Function Virtualization Environment Journal of Physcs: Conference Seres PAPER OPEN ACCESS Resource and Vrtual Functon Status Montorng n Network Functon Vrtualzaton Envronment To cte ths artcle: MS Ha et al 2018 J. Phys.: Conf. Ser. 1087

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

Test-Cost Modeling and Optimal Test-Flow Selection of 3D-Stacked ICs

Test-Cost Modeling and Optimal Test-Flow Selection of 3D-Stacked ICs Test-Cost Modelng and Optmal Test-Flow Selecton of 3D-Stacked ICs Mukesh Agrawal, Student Member, IEEE, and Krshnendu Chakrabarty, Fellow, IEEE Abstract Three-dmensonal (3D) ntegraton s an attractve technology

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics

A Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics A Hybrd Genetc Algorthm for Routng Optmzaton n IP Networks Utlzng Bandwdth and Delay Metrcs Anton Redl Insttute of Communcaton Networks, Munch Unversty of Technology, Arcsstr. 21, 80290 Munch, Germany

More information

Cost-efficient deployment of distributed software services

Cost-efficient deployment of distributed software services 1/30 Cost-effcent deployment of dstrbuted software servces csorba@tem.ntnu.no 2/30 Short ntroducton & contents Cost-effcent deployment of dstrbuted software servces Cost functons Bo-nspred decentralzed

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

MOBILE Cloud Computing (MCC) extends the capabilities

MOBILE Cloud Computing (MCC) extends the capabilities 1 Resource Sharng of a Computng Access Pont for Mult-user Moble Cloud Offloadng wth Delay Constrants Meng-Hs Chen, Student Member, IEEE, Mn Dong, Senor Member, IEEE, Ben Lang, Fellow, IEEE arxv:1712.00030v2

More information

A Saturation Binary Neural Network for Crossbar Switching Problem

A Saturation Binary Neural Network for Crossbar Switching Problem A Saturaton Bnary Neural Network for Crossbar Swtchng Problem Cu Zhang 1, L-Qng Zhao 2, and Rong-Long Wang 2 1 Department of Autocontrol, Laonng Insttute of Scence and Technology, Benx, Chna bxlkyzhangcu@163.com

More information

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems Real-tme Fault-tolerant Schedulng Algorthm for Dstrbuted Computng Systems Yun Lng, Y Ouyang College of Computer Scence and Informaton Engneerng Zheang Gongshang Unversty Postal code: 310018 P.R.CHINA {ylng,

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky Improvng Low Densty Party Check Codes Over the Erasure Channel The Nelder Mead Downhll Smplex Method Scott Stransky Programmng n conjuncton wth: Bors Cukalovc 18.413 Fnal Project Sprng 2004 Page 1 Abstract

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

On the Efficiency of Swap-Based Clustering

On the Efficiency of Swap-Based Clustering On the Effcency of Swap-Based Clusterng Pas Fränt and Oll Vrmaok Department of Computer Scence, Unversty of Joensuu, Fnland {frant, ovrma}@cs.oensuu.f Abstract. Random swap-based clusterng s very smple

More information

11. APPROXIMATION ALGORITHMS

11. APPROXIMATION ALGORITHMS Copng wth NP-completeness 11. APPROXIMATION ALGORITHMS load balancng center selecton prcng method: vertex cover LP roundng: vertex cover generalzed load balancng knapsack problem Q. Suppose I need to solve

More information

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems:

RAP. Speed/RAP/CODA. Real-time Systems. Modeling the sensor networks. Real-time Systems. Modeling the sensor networks. Real-time systems: Speed/RAP/CODA Presented by Octav Chpara Real-tme Systems Many wreless sensor network applcatons requre real-tme support Survellance and trackng Border patrol Fre fghtng Real-tme systems: Hard real-tme:

More information

IP Camera Configuration Software Instruction Manual

IP Camera Configuration Software Instruction Manual IP Camera 9483 - Confguraton Software Instructon Manual VBD 612-4 (10.14) Dear Customer, Wth your purchase of ths IP Camera, you have chosen a qualty product manufactured by RADEMACHER. Thank you for the

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.

More information

Combined Rate Control and Mode Decision Optimization for MPEG-2 Transcoding with Spatial Resolution Reduction

Combined Rate Control and Mode Decision Optimization for MPEG-2 Transcoding with Spatial Resolution Reduction MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Combned Rate Control and Mode Decson Optmzaton for MPEG-2 Transcodng wth Spatal Resoluton Reducton TR2003-7 December 2003 Abstract Ths paper

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems

Reliability and Energy-aware Cache Reconfiguration for Embedded Systems Relablty and Energy-aware Cache Reconfguraton for Embedded Systems Yuanwen Huang and Prabhat Mshra Department of Computer and Informaton Scence and Engneerng Unversty of Florda, Ganesvlle FL 326-62, USA

More information

Vectorization in the Polyhedral Model

Vectorization in the Polyhedral Model Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Maximum Weight Matching Dispatching Scheme in Buffered Clos-Network Packet Switches

Maximum Weight Matching Dispatching Scheme in Buffered Clos-Network Packet Switches Maxmum Weght Matchng Dspatchng Scheme n Buffered Clos-Network Packet Swtches Roberto Roas-Cessa, Member, IEEE, E Ok, Member, IEEE, and H. Jonathan Chao, Fellow, IEEE Abstract The scalablty of Clos-network

More information

Research of Dynamic Access to Cloud Database Based on Improved Pheromone Algorithm

Research of Dynamic Access to Cloud Database Based on Improved Pheromone Algorithm , pp.197-202 http://dx.do.org/10.14257/dta.2016.9.5.20 Research of Dynamc Access to Cloud Database Based on Improved Pheromone Algorthm Yongqang L 1 and Jn Pan 2 1 (Software Technology Vocatonal College,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

A GENETIC ALGORITHM FOR PROCESS SCHEDULING IN DISTRIBUTED OPERATING SYSTEMS CONSIDERING LOAD BALANCING

A GENETIC ALGORITHM FOR PROCESS SCHEDULING IN DISTRIBUTED OPERATING SYSTEMS CONSIDERING LOAD BALANCING A GENETIC ALGORITHM FOR PROCESS SCHEDULING IN DISTRIBUTED OPERATING SYSTEMS CONSIDERING LOAD BALANCING M. Nkravan and M. H. Kashan Department of Electrcal Computer Islamc Azad Unversty, Shahrar Shahreqods

More information

Positive Semi-definite Programming Localization in Wireless Sensor Networks
