Particle Swarm Optimization for HW/SW Partitioning

M. B. Abdelhalim and S. E. D. Habib
Electronics and Communications Department, Faculty of Engineering - Cairo University, Egypt

1. Introduction

Embedded systems typically consist of application-specific hardware parts and programmable parts, e.g. processors like DSPs, core processors or ASIPs. In comparison to the hardware parts, the software parts are much easier to develop and modify. Thus, software is less expensive in terms of costs and development time. Hardware, however, provides better performance. For this reason, a system designer's goal is to design a system fulfilling all system constraints. The co-design phase, during which the system specification is partitioned onto hardware and programmable parts of the target architecture, is called Hardware/Software partitioning. This phase represents one key issue during the design process of heterogeneous systems. Some early co-design approaches [Marrec et al. 1998, Cloute et al. 1999] carried out the HW/SW partitioning task manually. This manual approach is limited to small design problems with a small number of constituent modules. Additionally, automatic Hardware/Software partitioning is of great interest because the problem itself is a very complex optimization problem.

A variety of Hardware/Software partitioning approaches are available in the literature. Following Nieman [1998], these approaches can be distinguished by the following aspects:
1. The complexity of the supported partitioning problem, e.g. whether the target architecture is fixed or optimized during partitioning.
2. The supported target architecture, e.g. single-processor or multi-processor, ASIC or FPGA-based hardware.
3. The application domain, e.g. either data-flow or control-flow dominated systems.
4. The optimization goal determined by the chosen cost function, e.g. hardware minimization under timing (performance) constraints, performance maximization under resource constraints, or low power solutions.
5. The optimization technique, including heuristic, probabilistic or exact methods, compared by computation time and the quality of results.
6. The optimization aspects, e.g. whether communication and/or hardware sharing are taken into account.
7. The granularity of the pieces for which costs are estimated for partitioning, e.g. granules at the statement, basic block, function, process or task level.
8. The estimation method itself, whether the estimations are computed by special estimation tools or by analyzing the results of synthesis tools and compilers.

9. The cost metrics used during partitioning, including cost metrics for hardware implementations (e.g. execution time, chip area, pin requirements, power consumption, testability metrics), software cost metrics (e.g. execution time, power consumption, program and data memory usage) and interface metrics (e.g. communication time or additional resource-power costs).
10. The number of these cost metrics, e.g. whether only one hardware solution is considered for each granule or a complete Area/Time curve.
11. The degree of automation.
12. The degree of user-interaction to exploit the valuable experience of the designer.
13. The ability for Design-Space-Exploration (DSE) enabling the designer to compare different partitions and to find alternative solutions for different objective functions in short computation time.

In this Chapter, we investigate the application of the Particle Swarm Optimization (PSO) technique for solving the Hardware/Software partitioning problem. PSO is attractive for the Hardware/Software partitioning problem as it offers reasonable coverage of the design space together with an O(n) execution time for its main loop, where n is the number of proposed solutions that will evolve to provide the final solution. This Chapter is an extended version of the authors' 2006 paper [Abdelhalim et al. 2006].

The organization of this chapter is as follows: In Section 2, we introduce the HW/SW partitioning problem. Section 3 introduces the Particle Swarm Optimization formulation for the HW/SW partitioning problem followed by a case study. Section 4 introduces the technique extensions, namely, hardware implementation alternatives, HW/SW communications modeling, and a fine-tuning algorithm. Finally, Section 5 gives the conclusions of our work.

2. HW/SW Partitioning

The most important challenge in embedded system design is partitioning, i.e. deciding which components (or operations) of the system should be implemented in hardware and which ones in software. The granularity of each component can be a single instruction, a short sequence of instructions, a basic block or a function (procedure). To clarify the HW/SW partitioning problem, let us represent the system by a Data Flow Graph (DFG) that defines the sequencing of the operations starting from the input capture to the output evaluation. Each node in this DFG represents a component (or operation). Implementing a given component in HW or in SW implies different delay/area/power/design-time/time-to-market/design costs. The HW/SW partitioning problem is, thus, an optimization problem where we seek to find the partition (an assignment vector of each component to HW or SW) that minimizes a user-defined global cost function (or functions) subject to given area/power/delay constraints. Finding an optimal HW/SW partition is hard because of the large number of possible solutions for a given granularity of the components and the many different alternatives for these granularities. In other words, the HW/SW partitioning problem is hard since the design (search) space is typically huge.

The following survey overviews the main algorithms used to solve the HW/SW partitioning problem. However, this survey is by no means comprehensive.

Traditionally, partitioning was carried out manually as in the work of Marrec et al. [1998] and Cloute et al. [1999]. However, because of the increasing complexity of the systems, many research efforts aimed at automating the partitioning as much as possible. The suggested partitioning approaches differ significantly according to the definition they used for the problem. One of the main differences is whether to include other tasks (such as scheduling, where the starting times of the components should be determined) as in Lopez-Vallejo et al. [2003] and in Mei et al. [2000], or just map components to hardware or software only as in the work of Vahid [2002] and Madsen et al. [1997]. Some formulations assign communication events to links between hardware and/or software units as in Jha and Dick [1998]. The system to be partitioned is generally given in the form of a task graph, whose nodes are determined by the model granularity, i.e. the semantics of a node. The node could represent a single instruction, a short sequence of instructions [Stitt et al. 2005], a basic block [Knudsen et al. 1996], or a function or procedure [Ditzel 2004, and Armstrong et al. 2002]. A flexible granularity may also be used where a node can represent any of the above [Vahid 2002; Henkel and Ernst 2001].

Regarding the suggested algorithms, one can differentiate between exact and heuristic methods. The proposed exact algorithms include, but are not limited to, branch-and-bound [Binh et al. 1996], dynamic programming [Madsen et al. 1997], and integer linear programming [Nieman 1998; Ditzel 2004]. Due to the slow performance of the exact algorithms, heuristic-based algorithms are proposed. In particular, Genetic algorithms are widely used [Nieman 1998; Mann 2004] as well as simulated annealing [Armstrong et al. 2002; Eles et al. 1997], hierarchical clustering [Eles et al. 1997], and Kernighan-Lin based algorithms such as in [Mann 2004]. Less popular heuristics are also used, such as Tabu search [Eles et al. 1997] and greedy algorithms [Chatha and Vemuri 2001]. Some researchers used custom heuristics, such as Maximum Flow-Minimum Communications (MFMC) [Mann 2004], Global Criticality/Local Phase (GCLP) [Kalavade and Lee 1994], process complexity [Adhipathi 2004], the expert system presented in [Lopez-Vallejo et al. 2003], and Balanced/Unbalanced partitioning (BUB) [Stitt 2008].

The ideal Hardware/Software partitioning tool automatically produces a set of high-quality partitions in a short, predictable computation time. Such a tool would also allow the designer to interact with the partitioning algorithm. De Souza et al. [2003] propose the concepts of quality requisites and a method based on Quality Function Deployment (QFD) as references to represent both the advantages and disadvantages of existing HW/SW partitioning methods, as well as to define a set of features for an optimized partitioning algorithm. They classified the algorithms according to the following criteria:
1. Application domain: whether they are "multi-domain" (conceived for more than one or any application domain, thus not considering particularities of these domains and being technology-independent) or "specific domain" approaches.
2. The target architecture type.
3. Consideration for the HW-SW communication costs.
4. Possibility of choosing the best implementation alternative of HW nodes.
5. Possibility of sharing HW resources among two or more nodes.
6. Exploitation of HW-SW parallelism.
7. Single-mode or multi-mode systems with respect to the clock domains.

In this Chapter, we present the use of Particle Swarm Optimization techniques to solve the HW/SW partitioning problem. The aforementioned criteria will be implicitly considered along the algorithm presentation.

3. Particle swarm optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995 [Kennedy and Eberhart 1995; Eberhart and Kennedy 1995; Eberhart and Shi 2001]. The PSO algorithm is inspired by the social behavior of bird flocking, animal herding, or fish schooling. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. PSO has been successfully applied in many areas. A good bibliography of PSO applications can be found in the work of Poli [2007].

3.1 PSO algorithm

As stated before, PSO simulates the behavior of bird flocking. Suppose the following scenario: a group of birds is randomly searching for food in an area. There is only one piece of food in the area being searched. Not all the birds know where the food is. However, during every iteration, they learn via their inter-communications how far the food is. Therefore, the best strategy to find the food is to follow the bird that is nearest to the food. PSO learned from this bird-flocking scenario, and used it to solve optimization problems. In PSO, each single solution is a "bird" in the search space. We call it a "particle". All particles have fitness values, which are evaluated by the fitness function (the cost function to be optimized), and have velocities, which direct the flying of the particles. The particles fly through the problem space by following the current optimum particles.

PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. During every iteration, each particle is updated by following two "best" values. The first one is the position vector of the best solution (fitness) this particle has achieved so far. The fitness value is also stored. This position is called pbest. Another "best" position that is tracked by the particle swarm optimizer is the best position obtained so far by any particle in the population. This best position is the current global best and is called gbest. After finding the two best values, the particle updates its velocity and position according to equations (1) and (2) respectively.

$$v_{k+1}^{i} = w\,v_{k}^{i} + c_1 r_1 \left(pbest^{i} - x_{k}^{i}\right) + c_2 r_2 \left(gbest_{k} - x_{k}^{i}\right) \qquad (1)$$

$$x_{k+1}^{i} = x_{k}^{i} + v_{k+1}^{i} \qquad (2)$$

where $v_{k}^{i}$ is the velocity of the i-th particle at the k-th iteration, and $x_{k}^{i}$ is the current solution (or position) of the i-th particle. $r_1$ and $r_2$ are random numbers generated uniformly between 0 and 1. $c_1$ is the self-confidence (cognitive) factor and $c_2$ is the swarm confidence (social) factor. Usually $c_1$ and $c_2$ are in the range from 1.5 to 2.5. Finally, $w$ is the inertia factor, which takes linearly decreasing values from 1 down to 0 over a predefined number of iterations, as recommended by Haupt and Haupt [2004]. The 1st term in equation (1) represents the effect of the inertia of the particle, the 2nd term represents the particle memory influence, and the 3rd term represents the swarm (society) influence. The flow chart of the procedure is shown in Fig. 1.

The velocities of the particles on each dimension may be clamped to a maximum velocity $V_{max}$, which is a parameter specified by the user. If the sum of accelerations causes the velocity on that dimension to exceed $V_{max}$, then this velocity is limited to $V_{max}$ [Haupt and Haupt 2004]. Another type of clamping is to clamp the position of the current solution to a certain range in which the solution has a valid value; otherwise the solution is meaningless [Haupt and Haupt 2004]. In this Chapter, position clamping is applied with no limitation on the velocity values.
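As an illustration of equations (1) and (2) together with position clamping, the following Python sketch updates a single particle. It is not the authors' MATLAB code: the function name pso_step is hypothetical, and drawing fresh r1 and r2 values for every dimension (rather than once per particle) is an assumption made here.

```python
import random

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """Apply equations (1) and (2) to one particle, then clamp the position to [0, 1].

    x, v, pbest and gbest are lists of equal length (one entry per dimension).
    """
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # uniform in [0, 1); per-dimension draw is an assumption
        vj_next = w * vj + c1 * r1 * (pj - xj) + c2 * r2 * (gj - xj)  # equation (1)
        xj_next = xj + vj_next                                       # equation (2)
        new_v.append(vj_next)                       # the velocity itself is not clamped
        new_x.append(min(1.0, max(0.0, xj_next)))   # position clamping to the valid range
    return new_x, new_v
```

When a particle already sits on both its pbest and gbest, the memory and swarm terms vanish and only the inertia term w * v moves it, which is why a vanishing w late in a run leaves such particles almost stationary.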

Figure 1. PSO Flow chart

3.2 Comparisons between GA and PSO

The Genetic Algorithm (GA) is an evolutionary optimizer (EO) that takes a sample of possible solutions (individuals) and employs mutation, crossover, and selection as the primary operators for optimization. The details of GA are beyond the scope of this chapter, but interested readers can refer to Haupt and Haupt [2004]. In general, most evolutionary techniques have the following steps:
1. Random generation of an initial population.
2. Reckoning of a fitness value for each subject. This fitness value depends directly on the distance to the optimum.
3. Reproduction of the population based on fitness values.
4. If requirements are met, then stop. Otherwise go back to step 2.

From this procedure, we can see that PSO shares many common points with GA. Both algorithms start with a randomly generated population, both use fitness values to evaluate the population, both update the population and search for the optimum with random techniques, and both finally check for the attainment of a valid solution. On the other hand, PSO does not have genetic operators like crossover and mutation. Particles update themselves with the internal velocity. They also have memory, which is important to the algorithm (even if this memory is very simple, as it stores only the pbest and gbest positions).

Also, the information sharing mechanism in PSO is significantly different: In GAs, chromosomes share information with each other, so the whole population moves like one group towards an optimal area, even if this move is slow. In PSO, only gbest gives out information to the others. It is a one-way information sharing mechanism. The evolution only looks for the best solution. Compared with GA, all the particles tend to converge to the best solution quickly in most cases, as shown by Eberhart and Shi [1998] and Hassan et al. [2004].

When comparing the run-time complexity of the two algorithms, we should exclude the similar operations (initialization, fitness evaluation, and termination) from our comparison. We also exclude the number of generations, as it depends on the optimization problem complexity and termination criteria (our experiments in Section 3.4 indicate that PSO needs a lower number of generations than GA to reach a given solution quality). Therefore, we focus our comparison on the main loop of the two algorithms. We consider the most time-consuming processes (recombination in GA as well as velocity and position update in PSO).

For GA, if the new generation replaces the older one, the recombination complexity is O(q), where q is the group size for tournament selection. In our case, q equals Selection rate * n, where n is the size of the population. However, if the replacement strategy depends on the fitness of the individual, a sorting process is needed to determine which individuals are to be replaced by which new individuals. This sorting is important to guarantee the solution quality. Another sorting process is needed anyway to update the rank of the individuals at the end of each generation. Note that the quick sort complexity ranges from O(n log2 n) to O(n^2) [Jensen 2003, Harris and Ross 2006].

On the other hand, for PSO, the complexity of the velocity and position update processes is O(n), as there is no need for pre-sorting. The algorithm operates according to equations (1) and (2) on each individual (particle) [Rodriguez et al. 2008].

From the above discussion, GA's complexity is larger than that of PSO. Therefore, PSO is simpler and faster than GA.

3.3 Algorithm Implementation

The PSO algorithm is written in the MATLAB program environment. The input to the program is a design that consists of a number of nodes. Each node is associated with cost parameters. For experimental purposes, these parameters are randomly generated. The cost parameters used are:
A Hardware implementation cost: the cost of implementing that node in hardware (e.g. number of gates, area, or number of logic elements). This hardware cost is uniformly and randomly generated in the range from 1 to 99 [Mann 2004].
A Software implementation cost: the cost of implementing that node in software (e.g. execution delay or number of clock cycles). This software cost is uniformly and randomly generated in the range from 1 to 99 [Mann 2004].
A Power implementation cost: the power consumption if the node is implemented in hardware or software. This power cost is uniformly and randomly generated in the range from 1 to 9. We use a different range for power consumption values to test the addition of other cost terms with different range characteristics.

Consider a design consisting of m nodes. A possible solution (particle) is a vector of m elements, where each element is associated with a given node. The elements assume a 0 value (if the node is implemented in software) or a 1 value (if the node is implemented in hardware). There are n initial particles; the particles (solutions) are initialized randomly.
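The design and particle representation described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' MATLAB program: the names make_design and init_particles are hypothetical, the costs are assumed to be integers, and keeping separate power costs for the hardware and the software implementation of each node is an assumption.

```python
import random

def make_design(m, rng):
    """Return randomly generated per-node cost lists for a design of m nodes."""
    return {
        "hw": [rng.randint(1, 99) for _ in range(m)],       # hardware cost, 1 to 99
        "sw": [rng.randint(1, 99) for _ in range(m)],       # software cost, 1 to 99
        "hw_power": [rng.randint(1, 9) for _ in range(m)],  # power if the node is in hardware
        "sw_power": [rng.randint(1, 9) for _ in range(m)],  # power if the node is in software
    }

def init_particles(n, m, rng):
    """Return n random particles; element j is 1 for hardware and 0 for software."""
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]

rng = random.Random(42)               # fixed seed only to make the sketch repeatable
design = make_design(8, rng)          # small illustrative design, m = 8
particles = init_particles(4, 8, rng) # small illustrative swarm, n = 4
```

Each particle is one candidate partition, so a swarm of n particles covers n points of the 2^m possible assignments at any time.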

The velocity of each node is initialized in the range from (-1) to (1), where a negative velocity means moving the particle toward 0 and a positive velocity means moving the particle toward 1. For the main loop, equations (1) and (2) are evaluated in each iteration. If the particle goes outside the permissible region (position from 0 to 1), it is kept on the nearest limit by the aforementioned clamping technique.

The cost function is called for each particle. The cost function used is a normalized weighted sum of the hardware, software, and power cost of each particle according to equation (3).

$$Cost = 100 \left( \alpha \frac{HWcost}{allHWcost} + \beta \frac{SWcost}{allSWcost} + \gamma \frac{POWERcost}{allPOWERcost} \right) \qquad (3)$$

where allHWcost (allSWcost) is the maximum hardware (software) cost when all nodes are mapped to hardware (software), and allPOWERcost is the average of the power costs of the all-hardware solution and the all-software solution. α, β, and γ are weighting factors. They are set by the user according to his/her critical design parameters. For the rest of this chapter, all the weighting factors are considered equal unless otherwise mentioned. The multiplication by 100 is for readability only. The HWcost (SWcost) term represents the cost of the partition implemented in hardware (software); it could represent the area and the delay of the hardware partition (the area and the delay of the software partition). However, the software cost has a fixed (CPU area) term that is independent of the problem size. The weighted sum of normalized metrics is a classical approach to transform multi-objective optimization problems into a single-objective optimization [Donoso and Fabregat 2007].

The PSO algorithm proceeds according to the flow chart shown in Fig. 1. For simplicity, the cost value could be considered as the inverse of the fitness, where good solutions have low cost values. According to equations (1) and (2), the particle node values could take any value between 0 and 1. However, as this is a discrete, i.e. binary, partitioning problem, the node values must take values of 1 or 0.
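A minimal Python sketch of equation (3) is given below. The function name partition_cost and its argument layout are hypothetical. Setting all three weights to 1 is an assumption (the text only states that they are equal), as is the use of separate per-node power costs for the hardware and software implementations. The fixed CPU area term of the software cost is omitted.

```python
def partition_cost(partition, hw, sw, hw_power, sw_power,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """Normalized weighted sum of equation (3) for a binary partition.

    partition[j] is 1 if node j is in hardware and 0 if it is in software.
    """
    hw_cost = sum(h for bit, h in zip(partition, hw) if bit == 1)
    sw_cost = sum(s for bit, s in zip(partition, sw) if bit == 0)
    power_cost = sum(ph if bit == 1 else ps
                     for bit, ph, ps in zip(partition, hw_power, sw_power))
    all_hw = sum(hw)                                  # every node mapped to hardware
    all_sw = sum(sw)                                  # every node mapped to software
    all_power = (sum(hw_power) + sum(sw_power)) / 2   # average of the two extremes
    return 100 * (alpha * hw_cost / all_hw
                  + beta * sw_cost / all_sw
                  + gamma * power_cost / all_power)
```

For a two-node design with hw = [10, 30], sw = [20, 20], hw_power = [2, 4] and sw_power = [6, 8], the partition [1, 0] gives 100 * (10/40 + 20/40 + 10/10) = 175.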
Since the node values must be binary, the position value is rounded to the nearest integer [Hassan et al. 2004].

The main loop is terminated when the improvement in the global best solution gbest over a given number of the last iterations is less than a predefined value (ε). The number of these iterations and the value of ε are user-controlled parameters.

For GA, the most important parameters are:
Selection rate, which is the percentage of the population members that are kept unchanged while the others go under the crossover operators.
Mutation rate, which is the percentage of the population that undergoes the gene alteration process after each generation.
The mating technique, which determines the mechanism of generating new children from the selected parents.

3.4 Results

Algorithms parameters

The following experiments are performed on a Pentium-4 PC with 3 GHz processor speed, 1 GB RAM and the Windows XP operating system. The experiments were performed using the MATLAB 7 program. The PSO results are compared with the GA. Common parameters between the two algorithms are as follows: No. of particles (population size) n = 60, design size m = 512 nodes, ε = 100 * eps, where eps is defined in MATLAB as a very small (numerical resolution) value and equals 2.2204*10^-16 [Hanselman and Littlefield 2001]. For PSO, c_1 = c_2 = 2, and w starts at 1 and decreases linearly until reaching 0 after 100 iterations. Those values are suggested in [Shi and Eberhart 1998; Shi and Eberhart 1999; Zheng et al. 2003].

To get the best results for GA, the parameter values are chosen as suggested in [Mann 2004; Haupt and Haupt 2004], where Selection rate = 0.5, Mutation rate = 0.05, and the mating is performed using randomly selected single-point crossover.

The termination criterion is the same for both PSO and GA. The algorithm stops after 50 unchanged iterations, but at least 100 iterations must be performed to avoid quick stagnation.

Algorithm results

Figures 2 and 3 show the best cost as well as the average population cost of GA and PSO respectively.

Figure 2. GA Solution (best cost and population average cost versus generation)

As shown in the figures, the initialization is the same, but at the end, the best cost of PSO is lower than that of GA. This result represents around 8% improvement in the result quality in favor of PSO. Another advantage of PSO is its performance (speed), as it terminates earlier than GA. This result represents around 38% improvement in performance in favor of PSO.

The results vary slightly from one run to another due to the random initialization. Hence, decisions based on a single run are doubtful. Therefore, we ran the two algorithms 100 times for the same input and took the average of the final costs. We found that the average best cost of GA is about 143 and it terminates after about 155 seconds, while PSO reaches a lower average best cost and terminates earlier. Thus, there is an 8% improvement in the result quality and a 29% speed improvement.
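The inertia schedule and the termination criterion listed among the algorithm parameters above can be sketched in Python as follows. The function names inertia and should_stop are hypothetical. sys.float_info.epsilon is used here as the Python counterpart of MATLAB's eps, and reading "50 unchanged iterations" as "gbest improved by less than ε over the last 50 iterations" is an interpretation of the text.

```python
import sys

EPS = 100 * sys.float_info.epsilon  # counterpart of 100 * eps in MATLAB

def inertia(k, decay_iters=100):
    """Inertia factor w at iteration k: 1 at k = 0, decreasing linearly to 0."""
    return max(0.0, 1.0 - k / decay_iters)

def should_stop(gbest_costs, eps=EPS, unchanged=50, min_iters=100):
    """Decide whether to terminate; gbest_costs holds the global best cost after each iteration."""
    done = len(gbest_costs)
    if done < min_iters or done <= unchanged:
        return False  # always run the minimum number of iterations
    improvement = gbest_costs[-unchanged - 1] - gbest_costs[-1]
    return improvement < eps
```

The minimum iteration count matters because w is still large early in a run: a swarm that happens to stall for 50 iterations before iteration 100 may still move once the memory and swarm terms begin to dominate.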

Figure 3. PSO Solution (best cost, population average cost and global best cost versus generation)

Improved Algorithms

To further enhance the quality of the results, we tried cascading two runs of the same algorithm or of different algorithms. There are four possible cascades of this type: GA followed by another GA run (GA-GA algorithm), GA followed by a PSO run (GA-PSO algorithm), PSO followed by a GA run (PSO-GA algorithm), and finally PSO followed by another PSO run (PSO-PSO algorithm). For these cascaded algorithms, we kept the parameter values the same as in Section 3.4. Only the last combination, the PSO-PSO algorithm, proved successful.

For the GA-GA algorithm, the second GA run is initialized with the final results of the first GA run. Its lack of success can be explained as follows. When the population individuals are similar, the crossover operator yields no improvements and the GA technique depends on the mutation process to escape such cases; hence, it escapes local minima slowly. Therefore, cascading several GA runs takes a very long time to yield a significant improvement in results.

The PSO-GA algorithm did not fare any better. This negative result can be explained as follows. At the end of the first PSO run, all the swarm particles converge around a certain point (solution) as shown in Fig. 3. Thus, the GA is initialized with population members of close fitness with little or no diversity. In fact, this is a poor initialization of the GA, and hence it is not expected to significantly improve the PSO results of the first step of this algorithm. Our numerical results confirmed this conclusion.

The GA-PSO algorithm was not successful either. Figures 4 and 5 depict typical results for this algorithm. PSO starts with the final solutions of the GA stage (the GA best output cost is ~143, and the final population average is ~147) and continues the optimization until it terminates with a best output cost of ~132. However, this best output cost value is achieved by PSO alone, as shown in Fig. 3. This final result can be explained by the fact that the PSO behavior is not strongly dependent on the initial particle positions obtained by GA, due to the random velocities assigned to the particles at the beginning of the PSO phase. Notice that, in Fig. 5, the cost increases at the beginning due to the random velocities that force the particles to move away from the positions obtained by the GA phase.

Figure 4. GA output of GA-PSO (best cost and population average cost versus generation)

Figure 5. PSO output of GA-PSO (best cost, population average cost and global best cost versus generation)

Re-excited PSO algorithm

As the PSO proceeds, the effect of the inertia factor (w) decreases until it reaches 0. Therefore, $v_{k+1}^{i}$ at the late iterations depends only on the particle memory influence and the swarm influence (2nd and 3rd terms in equation (1)). Hence, the algorithm may give non-global optimum results. A hill-climbing algorithm is proposed. This algorithm is based on the assumption that if we take the run's final results (particle positions) and start all over again with w = 1, re-initializing the velocity (v) with new random values while keeping the pbest and gbest vectors in the particles' memories, the results can be improved. We found that the result quality improves with each new round until it settles around a certain value. Fig. 6 plots the best cost in each round. The curve starts with a cost of ~133 and settles at round number 30 with a cost value of ~116.5, which is significantly below the results obtained in the previous two subsections (about 15% quality improvement). The program performed 100 rounds, but it could be modified to stop earlier by using a different termination criterion (i.e. if the result remains unchanged for a certain number of rounds).

Figure 6. Successive improvements in Re-excited PSO (best cost versus round)

As the new algorithm depends on re-exciting new randomized particle velocities at the beginning of each round, while keeping the particle positions obtained so far, it allows another round of domain exploration. We propose to name this successive PSO algorithm the Re-excited PSO algorithm. In nature, this algorithm looks like giving the birds a big push after they have settled in their best position. This push re-initializes the inertia and speed of the birds so they are able to explore new areas, unexplored before. Hence, if the birds find a better place, they will go there; otherwise they will return to the place from where they were pushed.

The main reason for the advantage of re-excited PSO over successive GA is as follows: The PSO algorithm is able to switch a single node from software to hardware or vice versa during a single iteration. Such single node flipping is difficult in GA as the change is done through crossover or mutation. However, crossover selects a large number of nodes in one segment as a unit of operation. Mutation toggles the value of a random number of nodes. In either case, single node switching is difficult and slow.

This re-excited PSO algorithm can be viewed as a variant of the re-start strategies for PSO published elsewhere. However, our re-excited PSO algorithm is not identical to any of these previously published re-starting PSO algorithms, as discussed below.

In Settles and Soule [2003], the restarting is done with the help of Genetic Algorithm operators. The goal is to create two new child particles whose position is between the parents' positions, but accelerated away from the current direction to increase diversity. The children's velocity vectors are exchanged at the same node and the previous best vector is set to the new position vector, effectively restarting the children's memory. Obviously, our restarting strategy is different in that it depends on pure PSO operators.

In Tillett et al. [2005], the restarting is done by spawning a new swarm when stagnation occurs, i.e. the swarm spawns a new swarm if a new global best fitness is found. When a swarm spawns a new swarm, the spawning swarm (parent) is unaffected. To form the spawned (child) swarm, half of the child particles are randomly selected from the parent swarm and the other half are randomly selected from a random member of the swarm collection (mate). Swarm creation is suppressed when there are large numbers of swarms in existence. Obviously, our restarting strategy is different in that it depends on a single swarm.

In Pasupuleti and Battiti [2006], the Gregarious PSO or G-PSO, the population is attracted by the global best position and each particle is re-initialized with a random velocity if it is stuck close to the global best position. In this manner, the algorithm proceeds by aggressively and greedily scouting the local minima, whereas Basic-PSO proceeds by trying to avoid them. Therefore, a re-initialization mechanism is needed to avoid the premature convergence of the swarm. Our algorithm differs from G-PSO in that the re-initialization strategy depends on the global best particle, not on the particles that are stuck close to the global best position, which saves a lot of the computations needed to compare each particle position with the global best one.

Finally, the re-start method of Van den Bergh [2002], the Multi-Start PSO (MPSO), is the nearest to our approach, except that when the swarm converges to a local optimum, the MPSO records the current position and re-initializes the positions of the particles. The velocities are not re-initialized, as MPSO depends on a different version of the velocity equation that guarantees that the velocity term will never reach zero. The modified algorithm is called Guaranteed Convergence PSO (GCPSO). Our algorithm differs in that we use the velocity update equation defined in equation (1), and our algorithm re-initializes the velocity and the inertia of the particles, but not the positions, at the restart.

3.5 Quality and Speed Comparison between GA, PSO, and re-excited PSO

For the sake of fair comparison, we assumed that we have different designs whose sizes range from 5 nodes to 1020 nodes. We used the same parameters as described in the previous experiments, ran the algorithms on each design size 10 times, and took the average results. Another stopping criterion is added to the re-excited PSO, where it stops when the best result is the same for the last 10 rounds. Fig. 7 represents the design quality improvement of PSO over GA, re-excited PSO over GA, and re-excited PSO over PSO. We noticed that when the design size is around 512, the improvement is about 8%, which confirms the quality improvement results obtained in Section 3.4.

Figure 7. Quality improvement (PSO over GA, re-excited PSO over GA, and re-excited PSO over PSO) versus design size in nodes
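The re-excited PSO procedure compared above can be sketched in Python as follows. This is a simplified illustration, not the authors' MATLAB implementation: the function names are hypothetical, the cost function keeps only the hardware and software terms of equation (3) with equal weights, and each round runs a fixed number of iterations instead of the ε-based termination test. Each round keeps the positions, pbest and gbest, resets w to 1 and draws new random velocities.

```python
import random

def cost(bits, hw, sw):
    """Simplified equation (3): hardware and software terms only, equal weights."""
    hw_cost = sum(h for b, h in zip(bits, hw) if b == 1)
    sw_cost = sum(s for b, s in zip(bits, sw) if b == 0)
    return 100.0 * (hw_cost / sum(hw) + sw_cost / sum(sw))

def pso_round(x, pbest, pbest_cost, gbest, gbest_cost, hw, sw, rng,
              iters, c1=2.0, c2=2.0):
    """One round: new random velocities, w restarted at 1, positions and memories kept."""
    m = len(hw)
    v = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in x]  # re-excitation
    for k in range(iters):
        w = 1.0 - k / iters  # inertia decreases linearly from 1 toward 0
        for i, xi in enumerate(x):
            for j in range(m):
                r1, r2 = rng.random(), rng.random()
                v[i][j] = (w * v[i][j] + c1 * r1 * (pbest[i][j] - xi[j])
                           + c2 * r2 * (gbest[j] - xi[j]))       # equation (1)
                clamped = min(1.0, max(0.0, xi[j] + v[i][j]))    # equation (2) + clamping
                xi[j] = 1 if clamped >= 0.5 else 0               # round to a binary value
            c = cost(xi, hw, sw)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = xi[:], c
            if c < gbest_cost:
                gbest, gbest_cost = xi[:], c
    return gbest, gbest_cost

def re_excited_pso(hw, sw, n=60, iters=100, max_rounds=100, patience=10, seed=0):
    """Repeat PSO rounds until the best cost is unchanged for `patience` rounds."""
    rng = random.Random(seed)
    m = len(hw)
    x = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
    pbest = [xi[:] for xi in x]
    pbest_cost = [cost(p, hw, sw) for p in pbest]
    best = min(range(n), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[best][:], pbest_cost[best]
    history = []  # best cost after each round
    for _ in range(max_rounds):
        gbest, gbest_cost = pso_round(x, pbest, pbest_cost, gbest, gbest_cost,
                                      hw, sw, rng, iters)
        history.append(gbest_cost)
        if len(history) >= patience and history[-patience] == history[-1]:
            break
    return gbest, history
```

Because gbest is carried across rounds, the recorded best cost can never increase from one round to the next; a round either finds a better partition or leaves the result unchanged.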

13 Partcle Swarm Optmzaton for HW/SW Parttonng Speed Improvement Desgn Sze n Nodes Fgure 8. Speed mprovement Fg. 8 represents the performance (speed) mprovement of PSO over GA (orgnal and ftted curve, the curve fttng s done usng MATLAB Basc Fttng tool). Re-excted PSO s not ncluded as t depends on mult-round scheme where t starts a new round nternally when the prevous round termnates, whle GA and PSO runs once and produces ther outputs when a termnaton crteron s met. It s notced that n a few number of ponts n Fg. 8, the speed mprovement s negatve whch means that GA fnshes before PSO, but the desgn qualty n Fg. 7 does not show any negatve values. Fg. 7 also shows that, on the average, PSO outperforms GA by a rato of 7.8% mprovements n the result qualty and Fg. 8 shows that, on the average, PSO outperforms GA by a rato 29.3% mprovement n speed. On the other hand, re-excted PSO outperforms GA by an average rato of 17.4% n desgn qualty, and outperforms normal PSO by an average rato of 10.5% n desgn qualty. Moreover, Fg. 8 could be dvded nto three regons. The frst regon s the small sze desgns regon (lower than 400 nodes) where the speed mprovement s large (from 40% to 60%). The medum sze desgn regon (from 400 to 600 nodes) depcts an almost lnear decrease n the speed mprovement from 40% to 10%. The large sze desgn regon (bgger than 600 nodes) shows an almost constant (around 10%) speed mprovement, wth some cases where GA s faster than PSO. Note that most of the practcal real lfe HW/SW parttonng problems belong to the frst regon where the number of nodes < Constraned Problem Formulaton Constrants defnton and volaton handlng In embedded systems, the constrants play an mportant role n the success of a desgn, where hard constrants mean hgher desgn effort and therefore a hgh need for automated tools to gude the desgner n crtcal desgn decsons. 
In most cases, the constraints are mainly the software deadline times (for real-time systems) and the maximum available area for hardware. For simplicity, we will refer to them as the software constraint and the hardware constraint, respectively. Mann [2004] divided the HW/SW partitioning problem into 5 sub-problems (P1 to P5). The unconstrained problem (P5) is discussed in Section 3.3. The P1 problem involves both

hardware and software constraints. The P2 (P3) problem deals with hardware (software) constrained designs. Finally, the P4 problem minimizes HW/SW communications cost while satisfying hardware and software constraints. The constraints directly affect the cost function. Hence, equation (3) should be modified to account for constraint violations. In Lopez-Vallejo et al. [2003], three different techniques are suggested for the cost function correction and evaluation:

Mean Square Error minimization: This technique is useful for forcing the solution to meet certain equality, rather than inequality, constraints. The general expression for a Mean Square Error based cost function is:

$$\mathrm{MSE\_cost}_i = k_i \cdot \frac{(cost_i - constraint_i)^2}{constraint_i^2} \qquad (4)$$

where $constraint_i$ is the constraint on parameter $i$ and $k_i$ is a weighting factor. $cost_i$ is the parameter cost function; it is calculated using the associated term (i.e. area or delay) of the general cost function (3).

Penalty Methods: These methods punish the solutions that produce medium or large constraint violations, but allow invalid solutions close to the boundaries defined by the constraints to be considered as good solutions [Lopez-Vallejo et al. 2003]. The cost function in this case is formulated as:

$$Cost(x) = \sum_i k_i \cdot \frac{cost_i(x)}{Totalcost_i} + \sum_c k_c \cdot viol(c, x) \qquad (5)$$

where $x$ is the solution vector to be evaluated, and $k_i$ and $k_c$ are weighting factors (100 in our case). $i$ denotes the design parameters such as area, delay, power consumption, etc., $c$ denotes a constrained parameter, and $viol(c, x)$ is the correction function of the constrained parameters. $viol(c, x)$ could be expressed in terms of the percentage of violation defined by:

$$viol(c, x) = \begin{cases} 0 & cost_c(x) < constraint(c) \\ \dfrac{cost_c(x) - constraint(c)}{constraint(c)} & cost_c(x) > constraint(c) \end{cases} \qquad (6)$$

Lopez-Vallejo and Lopez [2003] proposed to use the squared value of $viol(c, x)$. The penalty methods have an important characteristic: there might be invalid solutions with a better overall cost than valid ones. In other words, the invalid solutions are penalized but could be ranked better than valid ones.

Barrier Techniques: These methods forbid the exploration of solutions outside the allowed design space. The barrier techniques rank the invalid solutions worse than the valid ones. There are two common forms of the barrier techniques. The first form assigns a constant high cost to all invalid solutions (for example, infinity). This form is unable to differentiate between near-barrier and far-barrier invalid solutions. It also needs to be initialized with at least one valid solution; otherwise all the costs are the same (i.e. ∞) and the algorithm fails. The other form, suggested in Mann [2004], assigns a constant-base barrier to all invalid solutions. This base barrier could be a constant larger than the maximum cost produced by any valid solution. In our case, for example, from equation (3), each cost term is normalized such that its maximum value is one. Therefore, a good choice of the constant-base penalty is "one" for each violation ("one" for hardware violation, "one" for software violation, and so on).
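As an illustration, the three correction techniques can be written as small functions. This is a minimal sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, the equality case cost = constraint (which equation (6) leaves undefined) is treated as no violation, and the default barrier value of one follows the normalized cost terms of equation (3).

```python
import math


def mse_cost(cost, constraint, k=1.0):
    """Mean-square-error term of equation (4) for one parameter."""
    return k * (cost - constraint) ** 2 / constraint ** 2


def viol(cost, constraint):
    """Fractional violation of equation (6).

    Assumption: cost == constraint counts as no violation.
    """
    if cost <= constraint:
        return 0.0
    return (cost - constraint) / constraint


def penalty(cost, constraint, order=2):
    """First-order (order=1) or squared (order=2) penalty term."""
    return viol(cost, constraint) ** order


def infinite_barrier(cost, constraint):
    """First barrier form: every invalid solution costs infinity."""
    return math.inf if cost > constraint else 0.0


def constant_barrier(cost, constraint, base=1.0):
    """Second barrier form: a constant base added per violated constraint."""
    return base if cost > constraint else 0.0
```

With a constraint of 100, a cost of 120 gives a violation of 0.2 and a squared penalty of 0.04, so a slightly invalid solution is only mildly penalized. Both barrier forms, by contrast, return the same value no matter how far the solution lies from the boundary.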

3.6.2 Constraints modeling

In order to determine the best method to be adopted, a comparison between the penalty methods (first-order or second-order percentage violation term) and the barrier methods (infinity vs. constant-base barrier) is performed. The details of the experiments are not shown here for the sake of brevity. Our experiments showed that combining the constant-base barrier method with any penalty method (first-order or second-order error term) gives higher quality solutions and guarantees that no invalid solutions beat valid ones. Hence, in the following experiments, equation (7) will be used as the cost function form:

$$Cost(x) = \sum_i k_i \cdot \frac{cost_i(x)}{Totalcost_i} + \sum_c k_c \left( Penalty\_viol(c, x) + Barrier\_viol(c) \right) \qquad (7)$$

Our experiments further indicate that the second-order error penalty method gives a slight improvement over the first-order one. For the double-constraints problem (P1), generating valid initial solutions is hard and time consuming; hence, the barrier methods should be ruled out for such problems. When dealing with single-constraint problems (P2 and P3), one can use the Fast Greedy Algorithm (FGA) proposed by Mann [2004] to generate valid initial solutions. FGA starts by assigning all nodes to the unconstrained side. It then proceeds by randomly moving nodes to the constrained side until the constraint is violated.

3.6.3 Single constraint experiments

As P2 and P3 are treated the same in our formulation, we consider the software-constrained problem (P3) only. Two experiments were performed: the first with a relaxed constraint, where the deadline (maximum delay) is 40% of the all-software solution delay, and the second with a hard real-time system, where the deadline is 15% of the all-software solution delay. The parameters used are the same as in Section 3.4. The Fast Greedy Algorithm is used to generate the initial solutions, and re-excited PSO is performed for 10 rounds. For GA and normal PSO only, all results are based on averaging the results of 100 runs.
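The Fast Greedy Algorithm initialization used here can be sketched as follows for the software-constrained case. This is only an illustrative sketch: the function name, the additive software delay model, and the choice to stop just before the move that would break the deadline (so that the returned solution is valid) are assumptions, not details taken from Mann [2004].

```python
import random


def fast_greedy_init(sw_delays, deadline, rng=None):
    """Build one valid initial partition for a deadline-constrained problem.

    Encoding: 1 = hardware (the unconstrained side), 0 = software.
    Assumption: the software delay is the sum of the delays of all
    software-mapped nodes.
    """
    rng = rng or random.Random()
    partition = [1] * len(sw_delays)      # start with everything in hardware
    order = list(range(len(sw_delays)))
    rng.shuffle(order)                    # visit nodes in random order
    total_delay = 0.0
    for node in order:
        if total_delay + sw_delays[node] > deadline:
            break                         # this move would violate the deadline
        partition[node] = 0               # move the node to software
        total_delay += sw_delays[node]
    return partition
```

Calling the function once per particle with differently seeded generators yields a population of distinct valid starting points, which is what the barrier-based cost function needs.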
For the first experiment, the average quality of the GA solution is ~ [...], while for PSO it is ~ 131.3 and for re-excited PSO it is ~ 120. All final solutions are valid due to the initialization scheme used (Fast Greedy Algorithm). For the second experiment, the average quality of the GA solution is ~ 147, while for PSO it is ~ 137 and for re-excited PSO it is ~ 129. The results confirm our earlier conclusion that re-excited PSO again outperforms normal PSO and GA, and that normal PSO again outperforms GA.

3.6.4 Double constraints experiments

When testing P1 problems, the same parameters as in the single-constraint case are used, except that FGA is not used for initialization. Two experiments were performed. The first is a balanced-constraints problem where the maximum allowable hardware area is 45% of the area of the all-hardware solution and the maximum allowable software delay is 45% of the delay of the all-software solution. The second is an unbalanced-constraints problem where the maximum allowable hardware area is 60% of the area of the all-hardware solution and the maximum allowable software delay is 20% of the delay of the all-software solution. Note that these constraints are chosen to guarantee that at least one valid solution exists.

For the first experiment, the average quality of the GA solution is ~ 158, and invalid solutions are obtained during the first 22 runs out of xx total runs. The best valid solution cost was 137. For PSO the average quality is ~ 131, with valid solutions during all the runs. The best valid solution cost was [...]. Finally, for the re-excited PSO, the final solution quality is [...]. It is clear that re-excited PSO again outperforms both PSO and GA.

For the second experiment, the average quality of the GA solution is ~ 287 and no valid solution is obtained during the runs. Note that a constant penalty barrier of value 100 is added to the cost function in the case of a violation. For PSO the average quality is ~ 251 and no valid solution is obtained during the runs. Finally, for the re-excited PSO, the final solution quality is 125 (a valid solution is found in the seventh round). This shows the performance improvement of re-excited PSO over both PSO and GA. Hence, for the rest of this chapter, we will use the terms PSO and re-excited PSO interchangeably to refer to the re-excited algorithm.

3.7 Real-Life Case Study

Figure 9. CDFG for JPEG encoding system [Lee et al. 2007c]

To further validate the potential of the PSO algorithm for the HW/SW partitioning problem, we need to test it on a real-life case study with a realistic cost function terrain. We also wanted to verify our PSO-generated solutions against a published benchmark design. The HW/SW cost matrix for all the modules of such a real-life case study should be known. We carried out a comprehensive literature search for such a case study. Lee et al. [2007c] provided such details for a case study of the well-known Joint Photographic Experts Group (JPEG) encoder system. The hardware implementation is written in the Verilog hardware description language, while the software is written in C. The Control-Data Flow Graph (CDFG) for this implementation is shown in Fig. 9. The authors pre-assumed that the RGB to YUV converter is implemented in SW and is not subjected to the partitioning process. For more details regarding JPEG systems, interested readers can refer to Jonsson [2005].

Table 1 shows measured data for the considered cost metrics of the system components. Including such a table in Lee et al. [2007c] allows us to compare our PSO search algorithm directly with the published ones without re-estimating the HW or SW costs of the design modules on our platform. Also, armed with this data, there is no need to re-implement the published algorithms or to try to obtain them from their authors.

Table 1. Measured data for JPEG system
Columns: Component | Execution Time: HW (ns), SW (us) | Cost Percentage: HW (10^-3), SW (10^-3) | Power Consumption: HW (mW), SW (mW)
Components (rows): Level Offset (FE_a), DCT (FE_b), DCT (FE_c), DCT (FE_d), Quant (FE_e), Quant (FE_f), Quant (FE_g), DPCM (FE_h), ZigZag (FE_i), DPCM (FE_j), ZigZag (FE_k), DPCM (FE_l), ZigZag (FE_m), VLC (FE_n), RLE (FE_o), VLC (FE_p), RLE (FE_q), VLC (FE_r), RLE (FE_s), VLC (FE_t), VLC (FE_u), VLC (FE_v)

The data is obtained by implementing the hardware components targeting the ML310 board using the Xilinx ISE 7.1 design platform. The Xilinx Embedded Development Kit (EDK 7.1) is used to measure the software implementation costs. The target board (ML310) contains a Virtex2-Pro XC2VP30FF896 FPGA device that contains programmable logic slices, 2448 Kbytes of memory, and two embedded IBM PowerPC (PPC) processor cores. In general, one slice approximately represents two 4-input Look-Up Tables (LUTs) and two Flip-Flops [Xilinx 2007].

The first column in the table shows the component name (cf. Fig. 9) along with a character unique to each component. The second and third columns show the power consumption in mW for the hardware and software implementations, respectively. The fourth column shows the software cost in terms of memory usage percentage, while the fifth column shows the hardware cost in terms of slices percentage. The last two columns show the execution time of the hardware and software implementations.

Lee et al. [2007c] also provided a detailed comparison of their methodology with four other approaches. The main problem is that the target architecture in Lee et al. [2007c] has two processors and allows multi-processor partitioning, while our target architecture is based on a single processor. A slight modification of our cost function is therefore performed that allows up to two processors to run on the software part concurrently. Equation (3) is used to model the cost function after adding the memory cost term, as shown in Equation (8):

$$Cost = 100 \cdot \left[ \alpha \frac{HWcost}{allHWcost} + \beta \frac{SWcost}{allSWcost} + \gamma \frac{POWERcost}{allPOWERcost} + \eta \frac{MEMcost}{allMEMcost} \right] \qquad (8)$$

The added memory cost term (MEMcost) and its weight factor (η) account for the memory size (in bits). allMEMcost is the maximum size (upper bound) of memory bits, i.e., the memory size of the all-software solution. Another modification to the cost function of Equation (8) is needed if the number of processors is limited. Consider that we have only two processors.
Thus, only two modules can be assigned to the SW side at any control step. For example, in step 3 of Fig. 9, no more than two DCT modules can be assigned to the SW side. A solution that assigns the three DCT modules of this step to the SW side is penalized by a barrier violation term of value "one". Finally, as more than one hardware component could run in parallel, the hardware delay is not additive. Hence, we calculate the hardware delay by accumulating the maximum delay of each control step, as shown in Fig. 9. In other words, we calculate the critical-path delay.

In Lee et al. [2007c], the results of four different algorithms were presented. However, for the sake of brevity, details of such algorithms are beyond the scope of this chapter. We used these results and compared them with our algorithm in Table 2. In our experiments, the PSO parameters are as follows: the population size is fixed to 50 particles, a round terminates after 50 unimproved runs, and 100 runs must be executed at the beginning to avoid trapping in a local minimum. The number of re-excited PSO rounds is selected by the user. The power is constrained to 600 mW, area and memory are constrained to the maximum available FPGA resources, i.e. 100%, and the maximum number of concurrent software tasks is two.
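The cost-function changes described above (the weighted sum of Equation (8), the critical-path hardware delay, and the barrier for exceeding two concurrent software tasks) can be sketched as below. This is an assumption-laden illustration, not the authors' code: the function names are hypothetical, control steps are given as lists of node indices, and one barrier unit is charged per overloaded control step, which the text does not spell out.

```python
def weighted_cost(costs, totals, weights):
    """Equation (8): 100 * sum of weight * (cost / all-one-side cost)."""
    return 100.0 * sum(weights[k] * costs[k] / totals[k] for k in weights)


def hw_critical_path_delay(control_steps, partition, hw_delay):
    """Sum over control steps of the slowest hardware-mapped node.

    partition[n] == 1 means node n is in hardware, 0 means software.
    """
    total = 0.0
    for step in control_steps:
        total += max(
            (hw_delay[n] for n in step if partition[n] == 1), default=0.0
        )
    return total


def processor_barrier(control_steps, partition, max_processors=2, base=1.0):
    """Constant-base barrier for steps with too many software nodes.

    Assumption: each overloaded control step adds one barrier unit.
    """
    overloaded = sum(
        1
        for step in control_steps
        if sum(1 for n in step if partition[n] == 0) > max_processors
    )
    return base * overloaded
```

For a two-step graph `[[0], [1, 2, 3]]` with hardware delays `[2, 5, 7, 4]`, mapping nodes 0 to 2 to hardware gives a critical-path delay of 2 + 7 = 9, while mapping all three nodes of the second step to software triggers the barrier.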

Table 2. Comparison of partitioning results
Columns: Method | Partition result (Lev / DCT / Q / DPCM-Zig / VLC-RLE / VLC) | Execution time (us) | Memory (KB) | Slice use rate (%) | Power (mW)
FBP [Lee et al. 2007c] | 1/001/111/101111/111101/
GHO [Lee et al. 2007b] | 1/010/111/111110/111111/
GA [Lin et al. 2006] | 0/010/010/101110/110111/
HOP [Lee et al. 2007a] | 0/100/010/101110/110111/
PSO-delay | 1/010/111/111110/111111/
PSO-area | 0/100/001/111010/110101/
PSO-power | 0/100/001/111010/110101/
PSO-memory | 1/010/111/111110/111111/
PSO-NoProc | 0/000/111/000000/111111/
PSO-Norm | 0/010/111/101110/111111/

Different configurations of the cost function are tested for different optimization goals. PSO-delay, PSO-area, PSO-power, and PSO-memory represent the cases where the cost function includes only one term, i.e. delay, area, power, and memory, respectively. PSO-NoProc is the normal PSO-based algorithm with the cost function shown in equation (7) but with an unconstrained number of processors. Finally, PSO-Norm is the normal PSO with all constraints being considered, i.e. the same as PSO-NoProc with a maximum of two processors.

The second column in Table 2 shows the resulting partition, where '0' represents software and '1' represents hardware. The vector is divided into sets; each set represents a control step as shown in Fig. 9. The third to sixth columns of this table list the execution time, memory size, percentage of slices used, and power consumption, respectively, of the optimum solutions obtained by the algorithms identified in the first column. As shown in the table, the bold results are the best results obtained for each design metric.

Regarding PSO performance, all the PSO-based results are found within two or three rounds of the re-excited PSO. Moreover, for each individual optimization objective, PSO obtains the best result for that specific objective. For example, PSO-delay obtains the same results as the GHO algorithm [Lee et al. 2007b]
does, and it outperforms the other solutions in execution time and memory utilization while producing good quality results that meet the constraints. Hence, our cost function formulation enables us to easily select the optimization criterion that suits our design goals. In addition, PSO-area and PSO-power give the same results, as they try to move nodes to software while meeting the power and number-of-processors constraints. On the other hand, PSO-delay and PSO-memory try to move nodes to hardware to reduce the memory usage and the delay, so their results are similar. PSO-NoProc is used as a what-if analysis tool, as its results answer the question of what is the optimum number of parallel processors that could be used to find the optimum design.

In our case, obtaining six processors would yield the results shown in the table, even if three of them will be used only for one task, namely, the DCT.

4. Extensions

4.1 Modeling Hardware Implementation alternatives

As shown previously, HW/SW partitioning depends on the HW area, delay, and power costs of the individual nodes. Each node represents a grain (from an instruction up to a procedure), and the grain level is selected by the designer. The initial design is usually mapped into a sequencing graph that describes the flow dependences of the individual nodes. These dependences limit the maximum degree of parallelism possible between these nodes. Whereas a sequencing graph denotes the partial order of the operations to be performed, the scheduling of a sequencing graph determines the detailed starting time for each operation. Hence, the scheduling task sets the actual degree of concurrency of the operations, with the attendant delay and area costs [De Micheli 1994]. In short, the delay and area costs needed for the HW/SW partitioning task are only known accurately after the scheduling task. Obviously, this situation calls for time-wasteful iterations. The other solution is to prepare a library of many implementations for each node and select one of them during the HW/SW partitioning task, as in the work done by Kalavade and Lee [2002]. Again, such an approach implies a high design time cost.

Our approach to solve this chicken-and-egg coupling between the partitioning and scheduling tasks is as follows: represent the hardware solution of each node by two limiting solutions, HW1 and HW2, which are automatically generated from the functional specifications. These two limiting solutions bound the range of all other possible schedules. The partitioning algorithm is then called on to select the best implementation for the individual nodes: SW, HW1 or HW2. These two limiting solutions are:

1. Minimum-Latency solution: where the As-Soon-As-Possible (ASAP) scheduling algorithm is applied to find the fastest implementation by allowing unconstrained concurrency.
This solution allows for two alternative implementations. In the first, maximum resource-sharing is allowed: similar operations are assigned to the same operational unit whenever data precedence constraints allow. The other, the non-shared parallel solution, forbids resource-sharing altogether by instantiating a new operational unit for each operation. Which of these two parallel solutions yields a lower area is difficult to predict, as the multiplexer cost of the shared parallel solution, added to control the access to the shared instances, can offset the extra area cost of the non-shared solution. Our modeling technique selects the solution with the lower area. This solution is, henceforth, referred to as the parallel hardware solution.

2. Maximum-Latency solution: where no concurrency is allowed, i.e. all operations are simply serialized. This solution results in the maximum hardware latency and the instantiation of only one operational instance for each operation unit. This solution is, henceforth, referred to as the serial hardware solution.

To illustrate our idea, consider a node that represents the operation y = (a*b) + (c*d). Fig. 10.a (10.b) shows the parallel (serial) hardware implementation. From Fig. 10, and assuming that each operation takes only one clock cycle, the first implementation finishes in 2 clock cycles but needs 2 multiplier units and one adder unit. The second implementation ends in 3 clock cycles but needs only one unit for each operation

(one adder unit and one multiplier unit). The bold horizontal lines drawn in Fig. 10 represent the clock boundaries.

Figure 10. Two extreme implementations of y = (a*b) + (c*d): (a) parallel, (b) serial

In general, the parallel and serial HW solutions have different area and delay costs. For special nodes, these two solutions may have the same area cost, the same delay cost, or the same delay and area costs. The reader is referred to Abdelhalim and Habib [2007] for more details on such special nodes. The use of two alternative HW solutions converts the HW/SW optimization problem from a binary form to a tri-state form. The effectiveness of the PSO algorithm for handling this extended HW/SW partitioning problem is detailed in Section 4.3.

4.2 Communications Cost Modeling

The communications cost term in the context of HW/SW partitioning represents the cost incurred due to the data and control passing from one node to another in the graph representation of the design. Earlier co-design approaches tend to ignore the effect of HW/SW communications. However, many recent embedded systems are communications oriented due to the heavy amount of data to be transferred between system components. The communications cost should be considered at the early design stages to provide high quality as well as feasible solutions. The communication cost can be ignored if it is between two nodes on the same side (i.e., two hardware nodes or two software nodes). However, if the two nodes lie on different sides, the communication cost cannot be ignored, as it affects the partitioning decisions. As communications are based on physical channels, the nature of the channel determines the communication type (class). In general, HW/SW communications can be classified into four classes [Ernest 1997]:

1. Point-to-point communications
2. Bus-based communications
3. Shared memory communications
4. Network-based communications

To model the communications cost, a communication class must be selected according to the target architecture.
In general, the model should include one or more of the following cost terms [Luthra et al. 2003]:

1. Hardware cost: The area needed to implement the HW/SW interface and the associated data transfer delay on the hardware side.

2. Software cost: The delay of the software interface driver on the software side.
3. Memory size: The size of the dedicated memory and registers for control and data transfers, as well as the shared memory size.

These terms could be easily modeled within the overall delay, hardware area and memory costs of the system, as shown in equation (8).

4.3 Extended algorithm experiments

As described in Section 3.3, the input to the algorithm is a graph that consists of a number of nodes and a number of edges. Each node (edge) is associated with cost parameters. The cost parameters used are:

Serial hardware implementation cost: the cost of implementing the node in serialized hardware. The cost includes the HW area as well as the associated latency (in clock cycles).
Parallel hardware implementation cost: the cost of implementing the node in parallel hardware. The cost includes the HW area as well as the associated latency (in clock cycles).
Software implementation cost: the cost of implementing the node in software (e.g. execution clock cycles and the CPU area).
Communication cost: the cost of the edge if it crosses the boundary between the HW and the SW sides (interface area and delay, SW driver delay and shared memory size).

For experimental purposes, these parameters are randomly generated after considering the characteristics of each parameter, i.e. Serial HW area ≤ Parallel HW area, and SW delay ≥ Serial HW delay ≥ Parallel HW delay. The needed modification is to allow each node in the PSO solution vector to have three values: 0 for software, 1 for serial hardware and 2 for parallel hardware. The parameters used in the implementation are: number of particles (population size) n = 50, design size m = 100 nodes, number of communication edges e = 200, and number of re-excited PSO rounds set to a predefined value of 50. All other parameters are taken from Section 3.4.
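One simple way to generate random node costs that respect these orderings, and to evaluate a tri-state solution vector, is sketched below. The value ranges, the dictionary layout, and the function names are hypothetical choices made for illustration; the text only fixes the ordering of the costs and the 0/1/2 encoding. The sketch also assumes that a software-mapped node contributes no hardware area.

```python
import random

SW, SERIAL_HW, PARALLEL_HW = 0, 1, 2   # tri-state encoding from the text


def random_node(rng):
    """Draw one node with serial area <= parallel area and
    SW delay >= serial HW delay >= parallel HW delay.

    Sorting the random draws enforces the orderings (ranges are arbitrary).
    """
    serial_area, parallel_area = sorted(rng.uniform(1.0, 100.0) for _ in range(2))
    parallel_delay, serial_delay, sw_delay = sorted(
        rng.randint(1, 50) for _ in range(3)
    )
    return {
        "area": {SW: 0.0, SERIAL_HW: serial_area, PARALLEL_HW: parallel_area},
        "delay": {SW: sw_delay, SERIAL_HW: serial_delay, PARALLEL_HW: parallel_delay},
    }


def hardware_area(solution, nodes):
    """Total hardware area of a tri-state solution vector."""
    return sum(node["area"][choice] for choice, node in zip(solution, nodes))
```

Because ties are allowed by the non-strict orderings, the generator can also produce the special nodes whose serial and parallel implementations share the same area or delay.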
The constraints are: the maximum hardware area is 65% of the all-hardware solution area, and the maximum delay is 25% of the all-software solution delay.

4.3.1 Results

Three experiments were performed. The first (second) experiment uses the normal PSO with only the serial (parallel) hardware implementation. The third experiment examines the proposed tri-state formulation, where the hardware is represented by two solutions (serial and parallel). The results are shown in Table 3.

Table 3. Cost results of different hardware alternative schemes
Scheme | Area cost | Delay cost | Comm. cost | Serial HW nodes | Parallel HW nodes | SW nodes
Serial HW | 34.9% | 30.52% | 1.43% | 99 | N/A | 1
Parallel HW | 57.8% | 29.16% | 32.88% | N/A | [...] | [...]
Tri-state formulation | [...] | 23.65% | 18.7% | [...] | [...] | [...]

As shown in this table, the serial hardware solution pushes approximately all nodes to hardware (99 out of 100) but fails to meet the deadline constraint due to the relatively large

delay of the serial HW implementations. On the other hand, the parallel HW solution fails to meet the delay constraint due to the relatively large area of the parallel HW. Moreover, it has a large communications cost. Finally, the tri-state formulation meets the constraints and results in a relatively low communication cost.

4.4 Tuning Algorithm

As shown in Table 3, the third solution, with two limiting HW alternatives, has a 23.65% delay. The algorithm could be tuned to push the delay to the constrained value (25%) by moving some hardware-mapped nodes from the parallel HW solution to the serial HW solution. This node switching reduces the hardware area at the expense of increasing the delay cost within the acceptable limits, while the communication cost is unaffected because all the moves are between HW implementations.

1) Find all nodes with parallel HW implementation (min_delay_set)
2) Calculate Delay_margin = Delay deadline - PSO achieved delay
3) Calculate Hardware_range = Node's max. area - Node's min. area
4) Calculate Delay_range = Node's max. delay - Node's min. delay
5) Create (dedicated_nodes_list) with nodes in (min_delay_set) sorted in ascending order according to Hardware_range such that Delay_range < Delay_margin
6) While (dedicated_nodes_list) is not empty
7)    Move the node with the maximum Hardware_range to the serial HW region
8)    For many nodes with the same Hardware_range, choose the one with minimum Delay_range
9)    Re-calculate Delay_margin
10)   Update (dedicated_nodes_list)
11) End While
12) Update (min_delay_set)
13) Calculate Hardware Sensitivity = Hardware_range / Delay_range

Outputs:
1. HW/SW partition
2. The remaining delay range in clock cycles
3. Remaining parallel hardware nodes and their Hardware Sensitivity

Nodes with high Hardware Sensitivity could be used along with the delay range to obtain refined implementations (Time-Constrained Scheduling Problem).

Figure 11. Tuning heuristic for reducing the hardware area

The heuristic used to reduce the hardware area is shown in Fig. 11.
It shares many similarities with the greedy approaches presented by Gupta et al. [1992]. First, the heuristic calculates the extra delay that the system could tolerate while still achieving the deadline constraint (the delay margin). It then finds all nodes in the parallel HW region with a delay range less than the delay margin and selects the node with the maximum reduction in HW area cost (hardware range) to be moved to the serial hardware region. Such a selection is carried out to obtain the maximum hardware reduction while still meeting the deadline.
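The steps of the tuning heuristic in Fig. 11 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the node dictionary keys and the function name are hypothetical, and moving a node from parallel to serial HW is assumed to consume exactly its delay range from the delay margin (that is, every moved node is treated as lying on the critical path).

```python
SERIAL_HW, PARALLEL_HW = 1, 2


def tune_area(nodes, assignment, delay_margin):
    """Greedily move parallel-HW nodes to serial HW within the delay margin.

    nodes[i] holds 'ser_area', 'par_area', 'ser_delay', 'par_delay'.
    Returns (new assignment, remaining margin, sensitivity of the
    nodes that stay in parallel HW).
    """
    assignment = list(assignment)

    def hw_range(i):                       # area saved by the move
        return nodes[i]["par_area"] - nodes[i]["ser_area"]

    def delay_range(i):                    # delay added by the move
        return nodes[i]["ser_delay"] - nodes[i]["par_delay"]

    while True:
        movable = [
            i for i, a in enumerate(assignment)
            if a == PARALLEL_HW and delay_range(i) < delay_margin
        ]
        if not movable:
            break
        # Largest area saving first; ties broken by the smallest delay range.
        best = max(movable, key=lambda i: (hw_range(i), -delay_range(i)))
        assignment[best] = SERIAL_HW
        delay_margin -= delay_range(best)

    sensitivity = {
        i: hw_range(i) / delay_range(i) if delay_range(i) else float("inf")
        for i, a in enumerate(assignment) if a == PARALLEL_HW
    }
    return assignment, delay_margin, sensitivity
```

The strict comparison `delay_range(i) < delay_margin` mirrors step 5 of the figure; a real implementation would recompute the critical-path delay after each move instead of subtracting the delay range directly.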

The delay margin is then re-calculated to find the nodes that are movable after the last movement. After moving all allowable nodes, the remaining parallel HW nodes cannot move to the serial HW region without violating the deadline. Therefore, the algorithm reports to the designer all the remaining parallel HW nodes, their Hardware Sensitivity (the average hardware decrease due to an increase in the latency by one clock cycle), and the remaining delay margin. The user can then select one or more parallel hardware nodes and make a refined HW implementation within the allowable delay (Time-Constrained Scheduling problem [De Micheli 1994]). The algorithm can be easily modified for the opposite goal, i.e. to reduce the delay while still meeting the hardware constraint.

The above algorithm cannot start if the PSO terminates with an invalid solution. Therefore, we implemented a similar algorithm as a pre-tuning phase but with the opposite goal: moving nodes from the serial HW region to the parallel HW region to reduce the delay, and hence meet the deadline constraint if possible, while minimizing the increase in the hardware area.

4.4.1 Results after using the Tuning Algorithm

Two experiments were done. The first is the tuning of the results shown in Table 3. The tuning algorithm starts from where PSO ends. The delay margin was 1.35% (about 72 clock cycles). At the end of the algorithm, the delay margin reaches 1 clock cycle, the area decreases to 40.09% and the delay reaches 24.98%. 12 parallel HW nodes were moved to the serial HW implementation. The results show that the area decreases by 10.55% for a very small delay increase (1.33%).

In the second experiment, the constraints are modified such that the deadline constraint is reduced to 22% and the maximum area constraint is reduced to 55% to test the pre-tuning phase. PSO terminates with 23.33% delay, 47.68% area, and 25.58% communications cost. The deadline constraint is violated by 1.33% (about 71 clock cycles).
The pre-tuning phase moves nodes from the serial HW region into the parallel HW region until the deadline constraint is satisfied (the delay is reduced to 21.95%). It moves 32 nodes and the area increases to 59.13%. The delay margin becomes 2 clock cycles. Then the normal tuning heuristic starts with that delay margin and moves two nodes back to the serial HW region. The final area is 59% and the final delay is 21.99%. Notice that the delay constraint is met while the area constraint becomes violated.

5. Conclusions

In this chapter, the recent introduction of the Particle Swarm Optimization technique to solve the HW/SW partitioning problem is reviewed, along with its re-excited PSO modification. The re-excited PSO algorithm is a recently-introduced restarting technique for PSO. The re-excited PSO proved to be highly effective for solving the HW/SW partitioning problem. Efficient cost function formulation is of paramount importance for an efficient optimization algorithm. Each component in the design must have hardware as well as software implementation costs that guide the optimization algorithm. The hardware cost in our platform is modeled using two extreme implementations that bound all other schedule-dependent implementations. Communications costs between the hardware and software domains are then proposed, in contrast to other approaches that completely ignore such

term. Finally, a tuning algorithm is proposed to fine tune the results and/or try to meet the constraints if PSO provides violating solutions. Finally, a JPEG encoder system is used as a real-life case study to test the viability of PSO for solving HW/SW partitioning problems. This case study compares our results with other published results from the literature. The comparison focuses on the PSO technique only. The results show that our algorithm provides better or equal results relative to the cited results.

The following conclusions can be made:
PSO is effective for solving the HW/SW partitioning problem. PSO yields better quality and faster performance relative to the well-known Genetic Algorithm.
A newly-proposed re-excited PSO restarting technique is effective in escaping local minima.
Formulating the HW/SW partitioning problem using the recently proposed two extreme hardware alternatives is effective for solving tightly constrained problems. The introduction of two limiting hardware alternatives provides an extra degree of freedom for the designer without penalizing the designer with excessive computational cost.
Greedy-like tuning algorithms are useful for refining the PSO results. Such algorithms move hardware-mapped nodes between their two extreme implementations to refine the solution or even to meet the constraints.
A JPEG encoder system is used as a real-life case study to verify the potential of our methodology for partitioning large HW/SW co-design problems.

6. References

Abdelhalim, M. B., Salama, A. E., and Habib, S. E.-D. Hardware Software Partitioning using Particle Swarm Optimization Technique. In The 6th International Workshop on System-on-Chip for Real-Time Applications (Cairo, Egypt).
Abdelhalim, M. B., and Habib, S. E.-D. Modeling communication cost and hardware alternatives in PSO based HW/SW partitioning. In the 19th International Conference on Microelectronics (Cairo, Egypt).
Adhipathi, P. Model based approach to Hardware/Software Partitioning of SOC Designs. MSc Thesis, Virginia Polytechnic Institute and State University, USA.
Armstrong, J. R., Adhipathi, P., and Baker, J. M., Jr. Model and synthesis directed task assignment for systems on a chip. 15th International Conference on Parallel and Distributed Computing Systems (Cambridge, MA, USA).
Binh, N. N., Imai, M., Shiomi, A., and Hikichi, N. A hardware/software partitioning algorithm for designing pipelined ASIPs with least gate counts. Proceedings of 33rd Design Automation Conference (Las Vegas, NV, USA).
Chatha, K. S., and Vemuri, R. MAGELLAN: multiway hardware-software partitioning and scheduling for latency minimization of hierarchical control-dataflow task graphs. In Proceedings of the 9th International Symposium on Hardware/Software Codesign (Copenhagen, Denmark).
Cloute, F., Contensou, J.-N., Esteve, D., Pampagnin, P., Pons, P., and Favard, Y. Hardware/software co-design of an avionics communication protocol interface system: an industrial case study. In Proceedings of the 7th International Symposium on Hardware/Software Codesign (Rome, Italy).

De Micheli, G. Synthesis and Optimization of Digital Circuits. McGraw Hill.
De Souza, D. C., De Barros, M. A., Naviner, L. A. B., and Neto, B. G. A. On relevant quality criteria for optimized partitioning methods. In Proceedings of 45th Midwest Symposium on Circuits and Systems (Cairo, Egypt).
Ditzel, M. Power-aware architecting for data-dominated applications. PhD thesis, Delft University of Technology, The Netherlands.
Donoso, Y., and Fabregat, R. Multi-objective optimization in computer networks using metaheuristics. Auerbach Publications.
Eberhart, R. C., and Shi, Y. Particle swarm optimization: developments, applications and resources. In Proceedings of 2001 Congress on Evolutionary Computation (Seoul, Korea).
Eberhart, R. C., and Shi, Y. Comparison between genetic algorithms and particle swarm optimization. In Proceedings of the 7th Annual Conference on Evolutionary Programming (San Diego, CA, USA).
Eberhart, R. C., and Kennedy, J. A new optimizer using particle swarm theory. Proceedings of the 6th International Symposium on Micro-Machine and Human Science (Nagoya, Japan).
Eles, P., Peng, Z., Kuchcinski, K., and Doboli, A. System level HW/SW partitioning based on simulated annealing and tabu search. Design Automation for Embedded Systems, Vol. 2, No.
Ernest, R. L. Target architectures in Hardware/Software Co-Design: principles and practice, Staunstrup, J. and Wolf, W. (eds.). Kluwer Academic Publishers.
Gupta, R. K., and De Micheli, G. System-level synthesis using re-programmable components. In Proceedings of the 3rd European Conference on Design Automation (Brussels, Belgium).
Hanselman, D., and Littlefield, B. Mastering MATLAB 6. Prentice Hall.
Hassan, R., Cohanim, B., de Weck, O., and Venter, G. A comparison of particle swarm optimization and the genetic algorithm. 1st AIAA Multidisciplinary Design Optimization Specialist Conference (Austin, Texas).
Haupt, R. L., and Haupt, S. E. Practical Genetic Algorithms. Second Edition, Wiley Interscience.
Henkel, J., and Ernst, R An approach to automated hardware/software parttonng usng a flexble granularty that s drven by hgh-level estmaton technques. IEEE Transactons on Very Large Scale Integraton Systems, Vol. 9, No. 2, Jensen, M. T Reducng the run-tme complexty of multobjectve EAs: The NSGA-II and other algorthms. IEEE Transactons on Evolutonary Computaton, Vol 7, No. 5, Jha, N. K. and Dck, R. P MOGAC: a multobjectve genetc algorthm for hardwaresoftware co-synthess of dstrbuted embedded systems. IEEE Transactons on Computer-Aded Desgn of Integrated Crcuts and Systems, Vol. 17, No Jonsson, B A JPEG encoder n SystemC, MSc thess, Lulea Unversty of Technology, Sweden. Kalavade, A. and Lee, E. A A global crtcalty/local phase drven algorthm for the constraned hardware/software parttonng problem. In Proceedngs of Thrd Internatonal Workshop on Hardware/Software Codesgn (Grenoble, France)

Kalavade, A., and Lee, E. A. The Extended Partitioning Problem: Hardware-software Mapping and Implementation-Bin Selection. In Readings in Hardware/Software Codesign, De Micheli, G., Ernest, R. L., and Wolf, W. (eds.), Morgan Kaufmann.
Kennedy, J., and Eberhart, R. C. Particle swarm optimization. In Proceedings of IEEE International Conference on Neural Networks (Perth, Australia).
Knudsen, P. V., and Madsen, J. PACE: a dynamic programming algorithm for hardware/software partitioning. Fourth International Workshop on Hardware/Software Co-Design (Pittsburgh, PA, USA).
Lee, T. Y., Fan, Y. H., Cheng, Y. M., Tsai, C. C., and Hsiao, R. S. 2007a. Hardware-Oriented Partition for Embedded Multiprocessor FPGA systems. In Proceedings of the Second International Conference on Innovative Computing, Information and Control (Kumamoto, Japan).
Lee, T. Y., Fan, Y. H., Cheng, Y. M., Tsai, C. C., and Hsiao, R. S. 2007b. An efficiently hardware-software partitioning for embedded Multiprocessor FPGA system. In Proceedings of International Multiconference of Engineers and Computer Scientists (Hong Kong).
Lee, T. Y., Fan, Y. H., Cheng, Y. M., Tsai, C. C., and Hsiao, R. S. 2007c. Enhancement of Hardware-Software Partition for Embedded Multiprocessor FPGA Systems. In Proceedings of the 3rd International Conference on International Information Hiding and Multimedia Signal Processing (Kaohsiung, Taiwan).
Lin, T. Y., Hung, Y. T., and Chang, R. G. Efficient hardware/software partitioning approach for embedded multiprocessor systems. In Proceedings of International Symposium on VLSI Design, Automation and Test (Hsinchu, Taiwan).
Lopez-Vallejo, M., and Lopez, J. C. On the hardware-software partitioning problem: system modeling and partitioning techniques. ACM Transactions on Design Automation of Electronic Systems, Vol. 8.
Luthra, M., Gupta, S., Dutt, N., Gupta, R., and Nicolau, A. Interface synthesis using memory mapping for an FPGA platform. In Proceedings of the 21st International Conference on Computer Design (San Jose, CA, USA).
Madsen, J., Gorde, J., Knudsen, P. V., Petersen, M. E., and Haxthausen, A. LYCOS: The Lyngby co-synthesis system. Design Automation of Embedded Systems, Vol. 2.
Mann, Z. A. Partitioning algorithms for Hardware/Software Co-design. PhD thesis, Budapest University of Technology and Economics, Hungary.
Marrec, P. L., Valderrama, C. A., Hessel, F., Jerraya, A. A., Attia, M., and Cayrol, O. Hardware, software and mechanical cosimulation for automotive applications. Proceedings of 9th International Workshop on Rapid System Prototyping (Leuven, Belgium).
Mei, B., Schaumont, P., and Vernalde, S. A hardware/software partitioning and scheduling algorithm for dynamically reconfigurable embedded systems. In Proceedings of 11th ProRISC (Veldhoven, Netherlands).
Niemann, R. Hardware/Software co-design for data flow dominated embedded systems. Kluwer Academic Publishers.
Pasupuleti, S., and Battiti, R. The Gregarious Particle Swarm Optimizer (GPSO). Proceedings of the Genetic and Evolutionary Computation Conference (Seattle, WA, USA).

Poli, R. Analysis of the publications on the applications of particle swarm optimization applications. Tech. Rep. CSM-469, Department of Computing and Electronic Systems, University of Essex, Colchester, Essex, UK.
Rodriguez, M. A., and Bollen, J. Simulating Network Influence Algorithms Using Particle-Swarms: PageRank and PageRank-Priors.
Settles, M., and Soule, T. 2003. A hybrid GA/PSO to evolve artificial recurrent neural networks. In Intelligent Engineering Systems through Artificial NN (St. Louis, MO, USA).
Shi, Y., and Eberhart, R. C. Empirical study of particle swarm optimization. In Proceedings of the 1999 Congress on Evolutionary Computation (Washington, DC, USA).
Shi, Y., and Eberhart, R. C. Parameter selection in particle swarm optimization. In Proceedings of 7th Annual Conference on Evolutionary Computation (New York, NY, USA).
Stitt, G. Hardware/Software Partitioning with Multi-Version Implementation Exploration. In Proceedings of Great Lakes Symposium in VLSI (Orlando, FL, USA).
Stitt, G., Vahid, F., McGregor, G., and Einloth, B. Hardware/Software Partitioning of Software Binaries: A Case Study of H.264 Decoder. IEEE/ACM CODES+ISSS'05 (New York, NY, USA).
Tillett, J., Rao, T. M., Sahin, F., and Rao, R. 2005. Darwinian particle swarm optimization. Proceedings of the 2nd Indian Intl. Conference on Artificial Intelligence (Pune, India).
Vahid, F. Partitioning Sequential Programs for CAD using a Three-Step Approach. ACM Transactions on Design Automation of Electronic Systems, Vol. 7.
Van den Bergh, F. An Analysis of Particle Swarm Optimizer. PhD thesis, Department of Computer Science, University of Pretoria, South Africa.
Xilinx Inc. Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet.
Zheng, Y. L., Ma, L. H., Zhang, L. Y., and Qian, J. X. On the convergence analysis and parameter selection in particle swarm optimization. In Proceedings of the 2nd International Conference on Machine Learning and Cybernetics (Xi-an, China).
Zou, Y., Zhuang, Z., and Cheng, H. HW-SW partitioning based on genetic algorithm. In Proceedings of Congress on Evolutionary Computation (Anhui, China).

Particle Swarm Optimization
Edited by Aleksandar Lazinica
Hard cover, 476 pages
Publisher InTech
Published online 01, January, 2009
Published in print edition January, 2009

Particle swarm optimization (PSO) is a population based stochastic optimization technique influenced by the social behavior of bird flocking or fish schooling. PSO shares many similarities with evolutionary computation techniques such as Genetic Algorithms (GA). The system is initialized with a population of random solutions and searches for optima by updating generations. However, unlike GA, PSO has no evolution operators such as crossover and mutation. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. This book represents the contributions of the top researchers in this field and will serve as a valuable tool for professionals in this interdisciplinary field.

How to reference
In order to correctly reference this scholarly work, feel free to copy and paste the following:
M. B. Abdelhalim and S. E. D. Habib (2009). Particle Swarm Optimization for HW/SW Partitioning, Particle Swarm Optimization, Aleksandar Lazinica (Ed.), InTech.


More information

Clustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b

Clustering Algorithm Combining CPSO with K-Means Chunqin Gu 1, a, Qian Tao 2, b Internatonal Conference on Advances n Mechancal Engneerng and Industral Informatcs (AMEII 05) Clusterng Algorthm Combnng CPSO wth K-Means Chunqn Gu, a, Qan Tao, b Department of Informaton Scence, Zhongka

More information

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

A Two-Stage Algorithm for Data Clustering

A Two-Stage Algorithm for Data Clustering A Two-Stage Algorthm for Data Clusterng Abdolreza Hatamlou 1 and Salwan Abdullah 2 1 Islamc Azad Unversty, Khoy Branch, Iran 2 Data Mnng and Optmsaton Research Group, Center for Artfcal Intellgence Technology,

More information

Network Intrusion Detection Based on PSO-SVM

Network Intrusion Detection Based on PSO-SVM TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*

More information

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT

DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT DESIGNING TRANSMISSION SCHEDULES FOR WIRELESS AD HOC NETWORKS TO MAXIMIZE NETWORK THROUGHPUT Bran J. Wolf, Joseph L. Hammond, and Harlan B. Russell Dept. of Electrcal and Computer Engneerng, Clemson Unversty,

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Verification by testing

Verification by testing Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Maximum Variance Combined with Adaptive Genetic Algorithm for Infrared Image Segmentation

Maximum Variance Combined with Adaptive Genetic Algorithm for Infrared Image Segmentation Internatonal Conference on Logstcs Engneerng, Management and Computer Scence (LEMCS 5) Maxmum Varance Combned wth Adaptve Genetc Algorthm for Infrared Image Segmentaton Huxuan Fu College of Automaton Harbn

More information

An Efficient Genetic Algorithm Based Approach for the Minimum Graph Bisection Problem

An Efficient Genetic Algorithm Based Approach for the Minimum Graph Bisection Problem 118 An Effcent Genetc Algorthm Based Approach for the Mnmum Graph Bsecton Problem Zh-Qang Chen, Rong-Long WAG and Kozo OKAZAKI Faculty of Engneerng, Unversty of Fuku, Bunkyo 3-9-1,Fuku-sh, Japan 910-8507

More information

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming

Optimization Methods: Integer Programming Integer Linear Programming 1. Module 7 Lecture Notes 1. Integer Linear Programming Optzaton Methods: Integer Prograng Integer Lnear Prograng Module Lecture Notes Integer Lnear Prograng Introducton In all the prevous lectures n lnear prograng dscussed so far, the desgn varables consdered

More information

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky Improvng Low Densty Party Check Codes Over the Erasure Channel The Nelder Mead Downhll Smplex Method Scott Stransky Programmng n conjuncton wth: Bors Cukalovc 18.413 Fnal Project Sprng 2004 Page 1 Abstract

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES UbCC 2011, Volume 6, 5002981-x manuscrpts OPEN ACCES UbCC Journal ISSN 1992-8424 www.ubcc.org VISUAL SELECTION OF SURFACE FEATURES DURING THEIR GEOMETRIC SIMULATION WITH THE HELP OF COMPUTER TECHNOLOGIES

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

3. CR parameters and Multi-Objective Fitness Function

3. CR parameters and Multi-Objective Fitness Function 3 CR parameters and Mult-objectve Ftness Functon 41 3. CR parameters and Mult-Objectve Ftness Functon 3.1. Introducton Cogntve rados dynamcally confgure the wreless communcaton system, whch takes beneft

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

A Saturation Binary Neural Network for Crossbar Switching Problem

A Saturation Binary Neural Network for Crossbar Switching Problem A Saturaton Bnary Neural Network for Crossbar Swtchng Problem Cu Zhang 1, L-Qng Zhao 2, and Rong-Long Wang 2 1 Department of Autocontrol, Laonng Insttute of Scence and Technology, Benx, Chna bxlkyzhangcu@163.com

More information

SHAPE OPTIMIZATION OF STRUCTURES BY MODIFIED HARMONY SEARCH

SHAPE OPTIMIZATION OF STRUCTURES BY MODIFIED HARMONY SEARCH INTERNATIONAL JOURNAL OF OPTIMIZATION IN CIVIL ENGINEERING Int. J. Optm. Cvl Eng., 2011; 3:485-494 SHAPE OPTIMIZATION OF STRUCTURES BY MODIFIED HARMONY SEARCH S. Gholzadeh *,, A. Barzegar and Ch. Gheyratmand

More information