Parametric Micro-level Performance Models for Parallel Computing

Computer Science Technical Reports, Computer Science, Iowa State University

Parametric Micro-level Performance Models for Parallel Computing
Youngtae Kim, Iowa State University; Mark Fienup, Iowa State University; Jeffrey S. Clary, Iowa State University; Suresh C. Kothari, Iowa State University

Recommended Citation: Kim, Youngtae; Fienup, Mark; Clary, Jeffrey S.; and Kothari, Suresh C., "Parametric Micro-level Performance Models for Parallel Computing" (1994). Computer Science Technical Reports. This article is brought to you for free and open access by Computer Science at the Iowa State University Digital Repository. It has been accepted for inclusion in Computer Science Technical Reports by an authorized administrator of the Iowa State University Digital Repository.


Parametric Micro-level Performance Models for Parallel Computing
TR94-23
Youngtae Kim, Mark Fienup, Jeffrey C. Clary & Suresh C. Kothari
December 5, 1994
Iowa State University of Science and Technology, Department of Computer Science, 226 Atanasoff, Ames, IA 50011

Parametric Micro-level Performance Models for Parallel Computing
Youngtae Kim, Mark Fienup, Jeffrey S. Clary, Suresh C. Kothari
Department of Computer Science, Iowa State University, Ames, Iowa

Abstract

Parametric micro-level (PM) performance models are introduced to address the important issue of how to realistically model parallel performance. These models can be used to predict execution time, identify performance bottlenecks, and compare machines. Accurate prediction and analysis of execution time are achieved by incorporating precise details of interprocessor communication, memory operations, auxiliary instructions, and the effects of communication and computation schedules. Parameters provide the flexibility to study various algorithmic and architectural issues. The development and verification process, the parameters, and the scope of applicability of these models are discussed. A coherent view of performance is obtained from the execution profiles generated by PM models. The models are targeted at a large class of numerical algorithms commonly implemented on both SIMD and MIMD machines. Specific models are presented for matrix multiplication, LU decomposition, and FFT on a 2-D processor array with distributed memory. A case study is done on MasPar MP-1 and MP-2 machines to validate PM models and demonstrate their utility.

Keywords: Performance Models, Parallel Computing, Numerical Algorithms, Memory Access Optimization

1 Introduction

How to model parallel computation has been an important topic of research in high-performance computing. Performance models have been extensively investigated through theoretical and empirical studies. One important issue is how to make models realistic. The papers [19, 3] discuss shortcomings of earlier theoretical research and propose new models, called BSP and LogP, for parallel computation. An important aspect of both models is the incorporation of communication parameters, which were ignored in earlier theoretical research. The studies [2, 7, 8, 18] address several pragmatic issues and provide insight into important attributes of parallel performance. A good introduction to the performance and scalability of parallel systems is provided in recent books [10, 12].

This paper is about parametric micro-level (PM) performance models for parallel computation. While the BSP and LogP models [19, 3] focus on what is a realistic abstraction for modeling parallel performance, our emphasis is on pragmatic models that accurately predict and analyze execution time. Our goal is to develop performance models that can actually be used to predict performance on existing and future-generation machines, compare machines, and facilitate efficient implementation of algorithms by identifying performance bottlenecks. To develop such models, we adopt a micro-level approach which incorporates precise details of interprocessor communication, memory operations, miscellaneous overhead due to auxiliary instructions, and the effects of communication and computation schedules.

Execution time can be predicted by fitting timing curves to experimental data, as discussed in [8]. The basic approach is to determine an algebraic expression for the fitting formula by analysis of the algorithm and then determine its coefficients by experiment. This approach is closely aligned with our goal; it can accurately predict execution time. However, a fitting formula expresses execution time only as a function of problem size and number of processors. It does not describe how architectural parameters affect performance, and it cannot be used to identify performance bottlenecks. We address these shortcomings with PM models.
First, instead of predicting execution time as a scalar quantity, PM models predict a vector that represents the significant components of execution time. This is useful for performance analysis. Second, the formulas are parametric: architectural and algorithmic parameters are incorporated as variables. The parameters provide the flexibility to study a variety of architectural and algorithmic issues. For example, the impact of changing processor speed, communication speed, or memory access speed can be studied by varying the corresponding parameters of the model.

A tradeoff is to be expected between realistic modeling and its applicability in the absence of specific information about the parallel algorithm or the architecture. It is desirable that performance models are not unnecessarily specific with respect to algorithm and architecture; models need to be designed with a set of parameters applicable to a wide class of parallel algorithms and architectures. Specifics enter the picture when parameter values have to be determined. There is an example in [3] where two implementations of FFT are considered, and the experimental results show a dramatic difference in the communication cost of those two implementations. If a model is to predict that difference, details of the implementation of the algorithm must inevitably be considered. To accommodate these conflicting requirements, our approach is to design the parameters and the process of model development with general applicability in mind, and to follow it up with complete examples of models which get into specifics.

We consider execution time as the principal measure of performance. The models generate execution profiles that show how computation, memory operations, communication, and miscellaneous overhead together account for the total execution time. The execution profile can be used to view performance in different ways. Other metrics such as speedup, efficiency, and MFLOPS are defined on the basis of execution profiles. It is well known that performance metrics can provide different and sometimes misleading views of performance [8, 9]. We correlate various performance metrics to provide a coherent view of parallel performance.
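The idea of predicting a vector rather than a scalar, and of studying design changes by varying parameters, can be sketched in a few lines. This is a deliberately simplified illustration, not the paper's model: each component is assumed to be an operation count times a per-operation time, whereas the actual PM formulas (Appendix A of the paper) are more detailed. The names `MachineParams`, `OpCounts`, `predict`, and `total`, and all numbers, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MachineParams:
    """Hypothetical per-operation costs, in arbitrary time units (e.g. cycles)."""
    t_flop: float  # one normalized floating-point operation
    t_msg: float   # one interprocessor message
    t_mem: float   # one LOAD or STORE
    t_misc: float  # one auxiliary instruction (loop control, register move)


@dataclass(frozen=True)
class OpCounts:
    """Hypothetical per-processor operation counts for one algorithm and problem size."""
    flops: int
    msgs: int
    mem_ops: int
    misc_ops: int


def predict(counts: OpCounts, m: MachineParams) -> dict[str, float]:
    """Return predicted execution time as a vector of components, not a scalar."""
    return {
        "comp": counts.flops * m.t_flop,
        "comm": counts.msgs * m.t_msg,
        "misc": counts.misc_ops * m.t_misc,
        "mem": counts.mem_ops * m.t_mem,
    }


def total(vec: dict[str, float]) -> float:
    """Scalar execution time: the sum of the vector components."""
    return sum(vec.values())


if __name__ == "__main__":
    base = MachineParams(t_flop=2.0, t_msg=5.0, t_mem=3.0, t_misc=1.0)
    counts = OpCounts(flops=100, msgs=10, mem_ops=50, misc_ops=20)
    # Hypothetical design study: memory accesses twice as fast.
    faster_mem = replace(base, t_mem=base.t_mem / 2)
    for name, params in [("base", base), ("faster memory", faster_mem)]:
        vec = predict(counts, params)
        print(name, vec, total(vec))
```

Because only `t_mem` changes between the two runs, the vector shows directly which component the design change affects, which a single fitted scalar cannot do.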
PM models are appropriate for a large class of data-parallel numerical algorithms, described later in the paper. This class is of interest since it includes a large number of algorithms encompassing many scientific and engineering applications. These algorithms are typically implemented on MIMD machines, but many of them can also be implemented quite efficiently on SIMD machines. Separate PM models are needed for different algorithms. Each model includes a complete representation of the parallel algorithm, determined by key parts of the algorithm. As a concrete illustration, we present models for matrix multiplication, LU decomposition, and the fast Fourier transform (FFT), all implemented on a 2-D processor array. These algorithms are of considerable interest in practice; individually, they have been used as examples in many empirical and theoretical studies [19, 3, 1, 16, 6]. Together, they represent varying degrees of computation, communication, and memory requirements, and serve well as test cases.

PM models are validated, and their utility is demonstrated, in a case study on the MasPar MP-1 and MP-2. Two implementations of each algorithm are studied to illustrate the analysis and the impact of memory operations. The case study provides interesting examples of how architectural differences affect performance. For example, the choice between a small number of powerful processors and a large number of less powerful processors is often a point of debate in parallel computing. To study this issue, we present a concrete performance comparison of a 16K-processor MP-1 and a 4K-processor MP-2 using three algorithms with different computational characteristics.

The models are discussed in Section 2, the performance analysis is described in Section 3, a case study is presented in Section 4, and conclusions are given in Section 5.

2 Parametric Micro-level Models

In this section, the model development and verification process is described using the example of three parallel algorithms on a 2-D processor array. The parameters of the models and their applicability are also discussed.

2.1 Model Development

Each PM model is based on a precise analytical formula that captures the essential operations of a given parallel algorithm. The formula has four components, so that execution time is predicted as a vector: computation time, communication time, memory access time, and the time for auxiliary instructions. Architectural parameters of the model are determined by experimental measurement. In hypothetical cases, such as the study of a futuristic machine, the parameters are extrapolated. We first provide an overview of model development and follow it with details.

Overview

The development process can be described as follows:

Step 1: Derive analytical formulas $f_{comp}$, $f_{comm}$, and $f_{mem}$ for the parts of the execution time spent on computation, communication, and memory operations, respectively.
Step 2: Make experimental measurements on sample cases to determine the model parameters, as well as the time for computation, communication, memory access, and the miscellaneous time for auxiliary instructions.
Step 3: Select the template for regression analysis to estimate the miscellaneous overhead time, and determine the regression coefficients from the experimentally measured values. The regression formula for miscellaneous overhead time is denoted by $f_{misc}$.
Step 4: Based on the experimental measurements, modify the analytical expressions $f_{mem}$ and $f_{comm}$ so that the predictions match the experimental timings determined in Step 2. The modifications to $f_{mem}$ take into account cache effects and the overlap of memory accesses with other operations. The modifications to $f_{comm}$ take into account the overlap of communication with computation.
Step 5: Finally, the following formula is obtained to predict the execution time:

$$f_{comp} + f_{comm} + f_{misc} + f_{mem}$$

Details

The analytical formulas for the three parallel algorithms are given in Appendix A. These formulas are carefully derived by examining the parallel algorithm to capture all its essential details. In practical scenarios for parallel machines, the lower-order terms can be significant, so they are retained. The formulas are complex, but the advantage is that the performance predictions are very accurate.

The three algorithms used in the study are well known. The LU decomposition is described in [5]. The details of the FFT algorithm can be found in [4]. Cannon's parallel algorithm for matrix multiplication is described in [12]. The LU decomposition uses a 2-D scattered data layout for the coefficient matrix, and it includes partial pivoting.

Different communication patterns are used by the three algorithms. The matrix multiplication uses nearest-neighbor communication, where elements are shifted from one processor to the next along either a row or a column, with wrap-around at the end. In the case of the LU decomposition, communication is needed for pivoting and for broadcasting a pivot row and a multiplier column; a one-to-all broadcast is used along either a row or a column of processors. To implement butterfly operations, the FFT algorithm requires communication between processors in a row or a column where the distance between the communicating processors is a power of two.

The cost of a communication operation varies depending on whether the routing is pipelined or non-pipelined. Table 1 summarizes different communication schemes and their costs, and it also lists the costs on MasPar machines. The Xnet[d] primitive on MasPar is a version of non-pipelined routing, and the Xnetp[d] and Xnetc[d] primitives are for pipelined routing, where d is the distance. Typically, sending a large message from one processor to another may require multiple individual messages, and there may be a limit on the number of messages that can be pipelined together. On MasPar, each message has to be one, four, or eight bytes, and it has to be loaded in a register first. The pipelining is done at the bit level for each message. In our case study, single-precision arithmetic is used, and the messages are four bytes each. The communication cost formulas in Table 1 are simplified in accordance with [15] to show the cost on MasPar when the message size is four bytes.

Table 1: Communication Cost
Routing scheme | General communication cost | MasPar primitive | MP-1 cost (32-bit message) | MP-2 cost (32-bit message)
Pipelined | $T_X + d\,T_{Xp} + k\,T_{Xt}$ | Xnetp[d] | 58 + d | 48 + d
Pipelined | $T_X + d\,T_{Xp} + k\,T_{Xt}$ | Xnetc[d] (Copy) | 84 + d | 48 + d
Non-pipelined | $T_X + d\,k\,T_{Xt}$ | Xnet[d], d = … | … | …
Non-pipelined | $T_X + d\,k\,T_{Xt}$ | Xnet[d], d > … | … | …
$T_X$: startup time; $T_{Xp}$: time to fill the pipeline; $T_{Xt}$: transmission time; d: distance; k: number of messages.

Examples are cited in [8] to point out that simple overhead-type operations should not be neglected, no matter how trivial they may seem. PM models consider the miscellaneous overhead arising from auxiliary instructions that implement loops in machine language, register moves, etc. A regression formula is used to predict the miscellaneous overhead time. The template for the regression formula is determined by examining the loop structure of the parallel program. The templates for the three algorithms are listed in Table 2. A simple algebraic manipulation of the templates shows that the miscellaneous overhead is a function of two variables: the local problem size and the number of processors. The coefficients $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ in Table 2 are determined from experimental measurements of sample cases with different local problem sizes, using 1K and 4K processors.

Table 2: Regression Templates for Miscellaneous Overhead
Algorithm | Regression template | Regression coefficients (MP-1, MP-2)
Matrix multiplication | $f^{MM}_{misc} = P(\ldots)$ | …
LU decomposition | $f^{LU}_{misc} = P(\ldots)$ | …
FFT | $f^{FFT}_{misc} = \beta_0 M + \beta_1 \log_2(P^2)\,M + \beta_2 M \log_2 M$ | …
For $f^{MM}_{misc}$ and $f^{LU}_{misc}$: P × P: processor array size; N × N: matrix size; M × M: local problem size per processor ($M = N/P$).
For $f^{FFT}_{misc}$: P × P: processor array size; N: number of elements; M: local problem size per processor ($M = N/P^2$).

The architecture parameters include individual timings for floating-point instructions, communication primitives, and LOAD and STORE operations. It is assumed that memory is accessed only through LOAD and STORE instructions. The architecture parameters can be obtained from the machine manual, but it is a good idea to actually measure these timings. The architecture parameters are listed in Table 3 along with their values for the MasPar MP-1 and MP-2 machines. The other parameters include the problem size, the PE array size, and the timings for algorithm-specific primitives, such as computing the twiddle factors for FFT.

Table 3: Architecture Parameters
Parameter | Operation | MP-1 cycles | MP-2 cycles
$T_{load}$ | Load | … | …
$T_{store}$ | Store | … | …
$T_{mult}$ | Floating-point multiply | … | …
$T_{div}$ | Floating-point division | … | …
$T_{add}$ | Floating-point addition | … | …
$T_{neg}$ | Floating-point negation | … | …
$T_{cmp}$ | Floating-point comparison | … | …
$T_{twiddle}$ | Twiddle factor calculation for FFT | … | …

2.2 Verification of Models

A number of features are built into the models to ensure that execution times are predicted accurately. First, the precise details of computation, communication, memory operations, and miscellaneous overhead are included in the models. Second, the model parameters are carefully determined by experiment. However, PM models are complex, and it is important to verify each model systematically. The procedure for such a verification is described here; it was used in our case study to verify the models on the MasPar MP-1 and MP-2 machines.

We describe the experimental measurements to be obtained by running the parallel program for sample problem sizes. They include: (i) total execution time ($T_{exe}$), (ii) computation time ($T_{comp}$), (iii) communication time ($T_{comm}$), (iv) miscellaneous overhead time ($T_{misc}$), and (v) the time for memory operations ($T_{mem}$). The measurements for (ii), (iii), and (iv) were made after deleting the appropriate instructions from the compiler-generated assembly code. First, $T_{misc}$ is measured by deleting all the computation and communication instructions plus the associated LOAD and STORE instructions. Next, only the communication and memory instructions are omitted, and the computation time $T_{comp}$ is determined by subtracting $T_{misc}$ from the resulting execution time. Finally, only the memory instructions are omitted, and the communication time $T_{comm}$ is determined by subtracting $T_{comp} + T_{misc}$ from the resulting execution time. The time for memory operations is then obtained from the previous measurements using the equation

$$T_{mem} = T_{exe} - T_{comp} - T_{comm} - T_{misc}.$$

The accuracy of the models rests on the following checks. First, the computation and communication timings predicted by the analytical formulas $f_{comp}$ and $f_{comm}$ are checked individually against the experimental values $T_{comp}$ and $T_{comm}$. Second, only part of the experimental data is used to determine the regression coefficients, and the remaining data is used as test data to verify the regression formula. Third, the memory model is checked separately.

Experimental measurements are sometimes tricky, especially because overlaps have to be taken into account. In some cases, we had to modify the assembly code to get the experimental data, since the compiler introduced major transformations into the code and making changes in the high-level language did not produce the effect we wanted. This was the case, for example, when we wanted to selectively omit certain instructions to measure their effect. Problems may also arise from data dependencies, where omitting certain instructions can have side effects. For example, omitting a LOAD can cause a subsequent division instruction to raise a division-by-zero exception. These issues have to be addressed in the experimental procedure. Our experience is that LOAD-STORE architectures make experimental procedures simpler; at least they avoid the complications of complex addressing modes, where it is not possible to separate memory accesses. The systematic development of experimental procedures is an important and complex topic by itself; for example, a timing procedure suitable for programs that use message passing is described in [11]. Doing complete justice to it is beyond the scope of this paper.

2.3 Scope and Applicability

PM models are applicable to a class of numerical algorithms described as follows. First, the work done by the algorithm can be characterized as a set of floating-point operations. Second, the parallel execution proceeds as a succession of steps with synchronization points in between. Each step consists of computation followed by communication. The same program is executed by all processors, but different data is processed. Within each step, some processors in a MIMD machine may finish their computation earlier and remain partly idle until the next synchronization point. The concept of tight synchronization is inherent in the BSP model [19], which considers an algorithm as a sequence of supersteps, each combining computation and communication.

Many of the numerical algorithms from scientific and engineering applications fall in the category to which PM models can be applied. There are also important exceptions, for example sorting algorithms, where it is data movement rather than floating-point operations that characterizes the work. The parallel algorithms considered in this paper are used on both SIMD and MIMD machines. We have implemented these algorithms on MasPar, a SIMD architecture, and on nCUBE, a MIMD architecture. With some changes, PM models can be applied to different machines. Experimental measurements may pose a problem on some machines. For example, in some cases it may not be possible to arrive at a cycle time for an individual instruction because it varies depending on the adjacent instructions; this was observed to be the case on nCUBE. We have found it easier to make experimental measurements on machines whose processors have a LOAD-STORE architecture, where the only instructions that access memory are LOAD and STORE. Fortunately, this is the case with several recent parallel machines, including the MasPar MP-1 and MP-2, the Intel Paragon, and the IBM SP-1 and SP-2.

A PM model is useful in many ways. The case study in later sections illustrates how it can be used to identify performance bottlenecks, analyze performance, and compare machines. Constant factors are important in practice. For example, a better design that increases performance by 50% is not something that a computer manufacturer can afford to ignore. In such situations, PM models provide a viable tool to accurately analyze the performance of different designs. For a new generation of machines, an important consideration is cost-effective improvement in performance. The alternatives could be faster processors, faster communication hardware, or faster memory.
Such alternatives can be evaluated with PM models by varying the corresponding parameters.
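The measurement-by-subtraction procedure of Section 2.2 reduces to a few lines of arithmetic once the four program variants have been timed. The sketch below assumes the timings are already available (the instruction deletion itself is done by hand on the assembly code); the function names, the timing values, and the sign convention of `percent_difference` are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Breakdown(NamedTuple):
    """Measured execution-time components, in the units of the input timings."""
    comp: float
    comm: float
    misc: float
    mem: float


def breakdown_by_subtraction(t_exe: float, t_misc_only: float,
                             t_no_comm_no_mem: float, t_no_mem: float) -> Breakdown:
    """Recover T_comp, T_comm, T_misc and T_mem from four timed program variants.

    t_exe:            unmodified program
    t_misc_only:      computation, communication and their LOAD/STOREs deleted
    t_no_comm_no_mem: communication and memory instructions deleted
    t_no_mem:         only memory instructions deleted
    """
    t_misc = t_misc_only
    t_comp = t_no_comm_no_mem - t_misc
    t_comm = t_no_mem - (t_comp + t_misc)
    t_mem = t_exe - t_comp - t_comm - t_misc
    return Breakdown(t_comp, t_comm, t_misc, t_mem)


def percent_difference(model: float, measured: float) -> float:
    """Relative model error in percent, with the measurement as reference (assumed convention)."""
    return 100.0 * (model - measured) / measured


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    b = breakdown_by_subtraction(t_exe=10.0, t_misc_only=1.0,
                                 t_no_comm_no_mem=5.0, t_no_mem=7.0)
    print(b)  # Breakdown(comp=4.0, comm=2.0, misc=1.0, mem=3.0)
    print(percent_difference(model=10.5, measured=10.0))  # 5.0
```

Each measured component can then be compared with its model counterpart ($f_{comp}$ with $T_{comp}$, and so on), which is how the individual checks in Section 2.2 are carried out.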

3 Performance Analysis

The execution profiles generated by the models are used as the basis for performance analysis. We derive quantitative relationships that are useful for the class of algorithms discussed in Section 2.3.

3.1 Execution Profile

PM models predict the execution time as a sum of four components corresponding to computation, communication, miscellaneous overhead, and memory operations. The models can be used to predict the total execution time and each of its components separately. The execution profile for an algorithm is presented as a table showing the percentage of execution time attributed to each component for a range of problem sizes. The computation component represents the useful work, and the other three components should be as small as possible. The execution profile makes it clear how significant communication, memory operations, or miscellaneous overhead are as performance bottlenecks.

Performance can be viewed in different ways using various metrics. Execution profiles provide a basis for correlating these views into a coherent picture of parallel performance. Speedup, efficiency, and MFLOPS are defined on the basis of execution profiles in ways that reveal precisely the role of key factors such as load balance.

3.2 Load Balance

Load balance is an important attribute of performance in parallel computing. For the class of algorithms considered here, load balance can be thought of as the degree of processor utilization averaged over all "compute only" steps, after the memory and miscellaneous overheads are factored out. The following definition of the Load Balance Factor ($LB_f$) is such that $LB_f$ ranges between zero and one, with one corresponding to the best utilization of processors.

$$LB_f(N) = \frac{nflop(N)\, t_{flop}}{P^2\, f_{comp}(N)}$$

where
$nflop(N)$: number of normalized floating-point operations for sequential computation ($P = 1$)
$t_{flop}$: time for a single normalized floating-point operation
$f_{comp}(N)$: total time for the floating-point operations done in parallel
$N$: problem size parameter
$P \times P$: processor array size

To deal with the mixture of fast and slow floating-point operations, normalized floating-point operations are used in this paper. For example, on the MasPar MP-1, where the ADD operation takes 127 cycles and the MULT operation takes 225 cycles, the normalized FLOP counts for these operations are 1 and 1.77 (225/127), respectively.

3.3 Efficiency Based on Work

Traditionally, efficiency is calculated from the work done. In parallel computing, however, efficiency is commonly defined as the speedup divided by the number of processors, and the isoefficiency analysis [12, 13] is based on this definition. It has been argued in [2] that, instead of relying on time as a measure of work, efficiency should be defined using unit counts based on the size of an indivisible task as the measure of work. The ratio of work accomplished (wa) to work expended (we) is proposed in [2] as the alternative definition of efficiency. Following these ideas, consider a normalized FLOP as the unit of work. There are some objections to using FLOPs as a unit of work in general [8]. In our case, however, we are considering numerical algorithms and accounting for memory and other operations separately. Another objection is that an operation count is an imperfect measure of computational work since it does not standardize across computers [8]. We agree and address this point later in the context of comparing two machines. With a normalized FLOP as the unit of work, wa is proportional to MFLOPS and we is proportional to peak MFLOPS. Assuming a normalization is used, the ratio of MFLOPS to peak MFLOPS can be considered as the alternate definition of efficiency, $Eff(N)$. As shown below, the inefficiency resulting from communication and other overheads is captured by the fractional term, and the inefficiency due to idle processors is represented by the load balance factor:

$$Eff(N) = \frac{f_{comp}(N)}{f_{comp}(N) + f_{comm}(N) + f_{misc}(N) + f_{mem}(N)}\; LB_f(N)$$

Interestingly, for the examples provided in [2], the commonly used definition and the alternate definition of efficiency both led to the same result. The following observation may explain why. Speedup is traditionally the ratio of the sequential execution time to the parallel execution time:

$$Speedup(N) = \frac{nflop(N)\, t_{flop}}{f_{comp}(N) + f_{comm}(N) + f_{misc}(N) + f_{mem}(N)}$$

Substituting the definition of $LB_f(N)$ into $Eff(N)$ cancels $f_{comp}(N)$ and gives $Eff(N) = Speedup(N)/P^2$, which is exactly the traditional definition for $P^2$ processors; hence both definitions of efficiency lead to the same formula.

The overheads due to memory operations and miscellaneous operations are also present in sequential processing. We have not factored them out, and are in effect measuring the overall efficiency by accounting for all sources of inefficiency.

3.4 MFLOPS, Efficiency, and Execution Time

First, consider the MFLOPS measure. The normalized MFLOPS are given by

$$MFLOPS(N) = \frac{nflop(N) \times 10^{-6}}{T_{exe}}$$

where $T_{exe}$ is the experimentally measured parallel computation time for size $N$. Based on our earlier discussion, MFLOPS can also be calculated as

$$MFLOPS(N) = Peak\,MFLOPS \times Eff(N), \qquad Peak\,MFLOPS = \frac{\text{clock rate}}{\text{number of cycles per normalized flop}} \times P^2$$
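The definitions in Sections 3.2 to 3.4 can be checked numerically. The sketch below implements $LB_f$, $Eff$, $Speedup$, MFLOPS, and peak MFLOPS directly from the formulas above and confirms, for made-up profile values, that $Eff(N) = Speedup(N)/P^2$ and that peak MFLOPS times efficiency equals measured MFLOPS when $T_{exe}$ equals the predicted total. The class and function names and all numbers are hypothetical; times are in microseconds so that a clock rate in MHz yields MFLOPS.

```python
import math
from dataclasses import dataclass


@dataclass
class Profile:
    """Predicted parallel time components f_comp, f_comm, f_misc, f_mem (same units)."""
    comp: float
    comm: float
    misc: float
    mem: float

    @property
    def total(self) -> float:
        return self.comp + self.comm + self.misc + self.mem


def load_balance(nflop: float, t_flop: float, p: int, f_comp: float) -> float:
    """LB_f = nflop * t_flop / (P^2 * f_comp) for a P x P processor array."""
    return nflop * t_flop / (p ** 2 * f_comp)


def efficiency(nflop: float, t_flop: float, p: int, prof: Profile) -> float:
    """Eff = f_comp / (f_comp + f_comm + f_misc + f_mem) * LB_f."""
    return prof.comp / prof.total * load_balance(nflop, t_flop, p, prof.comp)


def speedup(nflop: float, t_flop: float, prof: Profile) -> float:
    """Sequential time nflop * t_flop divided by predicted parallel time."""
    return nflop * t_flop / prof.total


def mflops(nflop: float, t_exe_seconds: float) -> float:
    """Normalized MFLOPS = nflop * 1e-6 / T_exe."""
    return nflop * 1e-6 / t_exe_seconds


def peak_mflops(clock_mhz: float, cycles_per_nflop: float, p: int) -> float:
    """Peak MFLOPS = clock rate / cycles per normalized flop * P^2."""
    return clock_mhz / cycles_per_nflop * p ** 2


if __name__ == "__main__":
    p = 2
    clock_mhz, cycles = 12.5, 127           # illustrative: one normalized flop = 127 cycles
    t_flop_us = cycles / clock_mhz          # time per normalized flop, microseconds
    nflop = 800.0
    prof = Profile(comp=2500.0, comm=300.0, misc=100.0, mem=100.0)  # made-up, microseconds
    eff = efficiency(nflop, t_flop_us, p, prof)
    print(eff, speedup(nflop, t_flop_us, prof) / p ** 2)   # equal up to rounding
    print(mflops(nflop, prof.total * 1e-6), peak_mflops(clock_mhz, cycles, p) * eff)
```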

The question is what constitutes a good measure of performance for comparing different machines on a given algorithm. A reasonable approach is to interpret higher performance as accomplishing more useful work in the same amount of time. Intuitively, one may think that efficiency could serve this purpose. However, efficiency can be a misleading measure for comparing different machines: a machine may be less efficient but still perform more work because it is faster than the other machine. This suggests that one should really consider the product of the efficiency and the rate of work of a machine. If a normalized FLOP is taken as the unit of work, then MFLOPS is such a measure.

MFLOPS also has a problem. The difficulty lies in using a normalized FLOP as a unit of work across different machines. In spite of normalization, the same work (for example, the multiplication of two matrices of a given size) can translate into different numbers of normalized FLOPs on different machines. This problem can be addressed in a couple of ways. One solution is to use a unit of work that depends on the application, not on the machine. For example, an addition plus a multiplication is a viable unit of work for comparing the performance of matrix multiplication on different machines. Another solution is to use a conversion rate to convert a FLOP on one machine to FLOPs on another machine, chosen so that the number of FLOPs corresponding to the same work is unchanged in going from one machine to the other. Both the definition of work and the conversion rate depend on the algorithm.

Thus, in comparing different machines with respect to a given algorithm, there are really three issues: the efficiency, the rate of work, and the unit of work. The bottom line is always the execution time, assuming the accuracy of the calculation is satisfactory. With an appropriate conversion rate, higher MFLOPS numbers do indeed mean lower execution time. As a concrete example, for matrix multiplication one normalized FLOP on the MasPar MP-1 should be converted to 2.58/2.77 (about 0.93) normalized FLOPs on the MP-2. This is because the number of normalized FLOPs for an addition plus a multiplication is 2.58 on the MP-2 and 2.77 on the MP-1. The LU decomposition kernel uses the same floating-point operations as matrix multiplication, so the same conversion rate applies to both. An analysis of the FFT kernel shows that the conversion rate is one for that algorithm. Incidentally, without the conversion, the MFLOPS numbers on the MP-1 are inflated when used for measuring the performance of matrix multiplication and LU decomposition.

4 Case Study

This study was done on a 16K-processor MasPar MP-1 with 16K bytes of memory per processor and a 4K-processor MP-2 with 64K bytes of memory per processor. PM models of matrix multiplication, LU decomposition, and FFT are considered. Two implementations of each algorithm were studied to illustrate the analysis and the impact of memory operations. The second implementation included software pipelining to reduce the time for memory operations. The highest level of compiler optimization was used with both implementations. First, a pre-analysis was done assuming the memory overlap ratio in the model to be zero. Second, a post-analysis was done by including a non-zero overlap ratio based on the experimental data from the second implementation, which introduced significant memory overlap as a result of software pipelining.

4.1 Parallel Machines

The MasPar MP-1 and MP-2 machines are based on a single-instruction stream, multiple-data stream (SIMD) architecture, with processors arranged in a two-dimensional toroidal grid. A parallel program runs on the array control unit (ACU), which broadcasts instructions to the processors. The communication operations on MasPar and their costs were discussed in Section 2.1. The MP-1 and MP-2 machines have a clock rate of 12.5 MHz and identical instruction sets. However, the MP-1 uses 4-bit processors while the MP-2 uses 32-bit processors, and the MP-2 processor can perform floating-point operations four to five times faster than the MP-1 processor. Measured cycle times for several instructions are shown in Table 3. There is no cache memory on either machine, and each processor has forty 32-bit registers. Memory is accessed only through LOAD and STORE instructions. All other instructions, including interprocessor communication, are register based.

4.2 Validation of Models

To validate PM models, their predictions are compared with experimental results on the MasPar MP-1 and MP-2 machines. We compared the model and the experimental results for each of the four parts of the execution time separately. Instead of presenting the individual comparison for each part, the comparison of the total execution time is presented in Table 4. The results show that the models are very accurate in all cases.

Table 4: Accuracy of Execution Time Prediction
Algorithm | N | 16K MP-1 model (sec) | 16K MP-1 experimental (sec) | 16K MP-1 diff. (%) | 4K MP-2 model (sec) | 4K MP-2 experimental (sec) | 4K MP-2 diff. (%)
Matrix Multiplication | … | … | … | … | … | … | …
LU Decomposition | … | … | … | … | … | … | …
Fast Fourier Transform | … | … | … | … | … | … | …

4.3 Pre-Analysis: Identifying Performance Bottlenecks

PM models yield execution profiles that can provide clues for improving performance. The profiles for the three algorithms are shown in Tables 5, 6, and 7. The execution profiles include the total execution time and its breakdown into computation, communication, miscellaneous overhead, and memory operations. Note that the components other than computation should be as small as possible for high performance.

The pre-analysis in Tables 5 and 6 shows that memory operations account for a significant portion of the execution time. For matrix multiplication, miscellaneous overhead and interprocessor communication together constitute only a small part (10% or less) of the execution time, but memory operations account for as much as 37% on the MP-1 and 52% on the MP-2. It gets worse with LU decomposition, since it is a more memory-access-intensive algorithm. The miscellaneous overheads for LU decomposition are significant for smaller problem sizes, but they decrease for larger problems. The performance profile for FFT (Table 7) is quite different: memory operations are clearly not the problem, and the performance loss is mainly due to interprocessor communication. The pre-analysis suggests that the performance of matrix multiplication and LU decomposition could be significantly improved by techniques that minimize the time for memory operations.

4.4 Memory Access Optimization

The performance loss due to memory operations can be minimized by exploiting the organization of the memory and how it works. We used blocking and software pipelining. Since there is no cache memory on MasPar, blocking was implemented using the registers. Software pipelining was found to be more critical for performance improvement on MasPar machines.
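The pre-analysis above amounts to turning a predicted component vector into percentages and picking the largest non-computation component. A minimal sketch, assuming the components are given as a dictionary keyed by the hypothetical names `comp`, `comm`, `misc`, and `mem`; the numbers are made up and are not data from Tables 5 to 7.

```python
from __future__ import annotations


def execution_profile(components: dict[str, float]) -> dict[str, float]:
    """Percentage of total execution time attributed to each component."""
    total = sum(components.values())
    return {name: 100.0 * t / total for name, t in components.items()}


def main_bottleneck(components: dict[str, float], useful: str = "comp") -> str:
    """Largest non-computation component, i.e. the first candidate for optimization."""
    overheads = {name: t for name, t in components.items() if name != useful}
    return max(overheads, key=overheads.get)


if __name__ == "__main__":
    # Made-up predicted times (seconds) for one problem size.
    predicted = {"comp": 6.0, "comm": 0.5, "misc": 0.5, "mem": 3.0}
    print(execution_profile(predicted))  # {'comp': 60.0, 'comm': 5.0, 'misc': 5.0, 'mem': 30.0}
    print(main_bottleneck(predicted))    # mem
```

A profile dominated by `mem`, as in this toy input, points toward memory access optimization; one dominated by `comm` would point toward the communication scheme instead.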

Table 5: Pre-Analysis by Model: Matrix Multiplication
Columns: N; MP-1 comp %, comm %, mem access %, misc %; MP-2 comp %, comm %, mem access %, misc %

Table 6: Pre-Analysis by Model: LU Decomposition
Columns: N; MP-1 comp %, comm %, mem access %, misc %; MP-2 comp %, comm %, mem access %, misc %

Table 7: Pre-Analysis by Model: Fast Fourier Transform
Columns: N; MP-1 comp %, comm %, mem access %, misc %; MP-2 comp %, comm %, mem access %, misc %
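Tables 5 to 7 express each component as a percentage of the total execution time. A minimal sketch of that bookkeeping follows, assuming (as the four-way breakdown in Section 4.3 implies) that the total is the sum of the computation, communication, memory, and miscellaneous times. The function name and the input values in the usage note are hypothetical, not taken from the tables.

```python
def execution_profile(comp, comm, mem, misc):
    """Return (total, percentages, dominant_overhead) for one problem size.

    Hypothetical helper; assumes execution time is the sum of the four
    components: computation, communication, memory operations, misc.
    """
    parts = {"comp": comp, "comm": comm, "mem": mem, "misc": misc}
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("total execution time must be positive")
    percent = {name: 100.0 * t / total for name, t in parts.items()}
    # Every component except computation is overhead; the largest one is
    # the first candidate for optimization.
    overheads = {name: p for name, p in percent.items() if name != "comp"}
    dominant = max(overheads, key=overheads.get)
    return total, percent, dominant
```

With made-up times of 6.0, 0.5, 3.0, and 0.5 seconds, the call returns a total of 10.0, percentages of 60, 5, 30, and 5, and "mem" as the dominant overhead: a profile shaped like the matrix multiplication case described above, where memory dominates and communication plus miscellaneous overhead stay at 10% or less.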

register a, b, c;
for i = 0 to M-1 begin
    for j = 0 to M-1 begin
        c = C(i,j);
        for k = 0 to M-1 begin
            a = A(i,k);
            b = B(k,j);
            c += a * b;
        end
        C(i,j) = c;
    end
end
(basic version)

register a0, a1, b0, b1, c;
for i = 0 to M-1 begin
    for j = 0 to M-1 begin
        c = 0.0;
        a0 = A(i,0);
        b0 = B(0,j);
        for k = 0 to M-2 begin
(1)         a1 = A(i,k+1);
(2)         b1 = B(k+1,j);
(3)         c += a0 * b0;
            a0 = a1;
            b0 = b1;
        end
        c += a0 * b0;
        C(i,j) += c;
    end
end
(software pipelined version)

Figure 1: An example of software pipelining applied to matrix multiplication

Software Pipelining Technique
Software pipelining is used to reduce the overhead of accessing the memory. This technique has been previously studied [14, 17] for VLIW and other architectures, and it is commonly used on RISC workstations. On MasPar, we had to apply the technique by hand to source-level programs, changing the order of operations in successive iterations of a loop so that data could be prefetched. Software pipelining helps if the hardware can overlap prefetching of data with computation and communication. We applied software pipelining to computation loops with floating point operations and also to communication loops that move a block of data from the local memory of one

processor to another processor.

The software pipelining technique is illustrated in Figure 1 by the example of matrix multiplication. In the basic version of the loop, elements of the A and B arrays are used for floating point operations immediately after they are accessed. As a result, floating point operations cannot start until the memory accesses are complete. In the pipelined loop, on the other hand, the array elements are prefetched in lines (1) and (2), and this prefetching is overlapped with the floating point computation done in line (3). Software pipelining can be combined with loop unrolling for further improvement in performance.

Measurement of Memory Overlap
This section illustrates how memory access optimization is accounted for, and how significant its impact on performance is. The impact of overlapping memory operations is measured by the overlap ratio ($O_r$), defined by the equation $f_{mem}(1 - O_r) = T_{mem}$. As defined originally, $f_{mem}$ gives the time for memory operations in the absence of overlap, and $T_{mem}$ is the experimentally measured time for memory operations in the presence of software pipelining. The overlap ratio plays a role similar to that of the hit ratio in analyzing cache memory performance: like the cache hit ratio, it has a value between 0 and 1, and the closer it is to 1, the higher the performance.

The overlap resulting from software pipelining is expected to increase, up to a point, with the number of pipelined iterations of the for loop. In a pipelined operation, the efficiency increases with the number of jobs until it levels off at a maximum value, and the same trend is observed for the overlap ratio. The overlap ratio depends on the algorithm, the architecture of the machine, and the problem size. It increases with the local problem size until it levels off, as shown in Figures 2 and 3. Note that each figure refers to the total problem size and not the local size at each processor. The corresponding local problem sizes are larger on MP-2, as it has only 4K processors compared to 16K processors on MP-1.
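The definition $f_{mem}(1 - O_r) = T_{mem}$ can be solved directly for the overlap ratio, and the same relation gives the overlapped memory time that replaces $f_{mem}$ in the post-analysis (Section 4.5). A small sketch with hypothetical function names; the range check mirrors the statement that $O_r$ lies between 0 and 1, which holds when $0 \le T_{mem} \le f_{mem}$.

```python
def overlap_ratio(f_mem, t_mem):
    """Solve f_mem * (1 - O_r) = T_mem for O_r (hypothetical helper).

    f_mem: modeled memory time with no overlap.
    t_mem: measured memory time with software pipelining.
    """
    if f_mem <= 0:
        raise ValueError("f_mem must be positive")
    if not 0 <= t_mem <= f_mem:
        raise ValueError("expected 0 <= T_mem <= f_mem so that 0 <= O_r <= 1")
    return 1.0 - t_mem / f_mem


def overlapped_memory_time(f_mem, o_r):
    """Memory time once overlap is accounted for: f_mem * (1 - O_r)."""
    return f_mem * (1.0 - o_r)
```

With illustrative values $f_{mem} = 4.0$ and $T_{mem} = 1.0$, `overlap_ratio` gives 0.75, and `overlapped_memory_time(4.0, 0.75)` recovers 1.0; a ratio of 0 means pipelining hid no memory time at all.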
LU decomposition shows higher overlap than matrix multiplication on MP-1, and

Figure 2: Memory Overlap Ratio on 16K MP-1 (overlap ratio vs. problem size N for Matrix Multiplication, LU Decomposition, and Fast Fourier Transform)
Figure 3: Memory Overlap Ratio on 4K MP-2 (same axes and curves)

it is the other way around on MP-2. The LU decomposition kernel is more memory intensive; it requires an additional STORE operation compared to the matrix multiplication kernel. We verified that if an additional STORE operation is included (redundantly) in the matrix multiplication kernel, its overlap ratio closely matches that of LU decomposition.

Memory accesses need to be factored into realistic models of parallel computing, because even there they can have a significant impact on performance. For two of the three algorithms in our study, the memory access cost in fact turns out to be substantially higher than the interprocessor communication cost. Memory access time can vary significantly due to the memory hierarchy and the overlap of memory accesses with other operations. We have addressed memory overlap, which is the relevant issue on MasPar machines: there is no cache memory, but overlap is a significant factor. As future research, it will be worthwhile to do case studies on other machines with cache memory. There is an extensive literature on performance analysis of cache memory, which needs to be explored in the context of realistic modeling for parallel machines with distributed memory.

4.5 Post-Analysis
A "post-analysis" was done to study performance after it was improved by software pipelining. To account for the memory overlap, $f_{mem}$ is replaced by $f_{mem}(1 - O_r)$ in the post-analysis. A comparison of execution times between the pre-analysis and the post-analysis (Table 8) shows that a significant improvement in performance is possible on MasPar machines by overlapping memory operations with other operations.

The post-analysis Tables 9, 10, and 11 provide a quantitative picture of how different overheads impact performance. The following trends are observed for the three algorithms when overheads are considered as a percentage of the total execution time. For matrix multiplication, memory is the dominant overhead. For LU decomposition, miscellaneous and communication overheads are also high, but only for smaller problems. As the problem size increases, the other two overheads diminish and memory

becomes the dominant overhead. Software pipelining helps substantially, more so in the case of MP-2, but the cost of memory operations still remains relatively high. For FFT, the communication overhead is the most significant, followed by the miscellaneous overhead; memory overhead is very low.

Table 8: Improvement by Overlapping of Memory Operations
Columns: N; 16K MP-1 pre-anal. (sec), post-anal. (sec), improvement (%); 4K MP-2 pre-anal. (sec), post-anal. (sec), improvement (%)
Row groups: Matrix Multiplication, LU Decomposition, Fast Fourier Transform

Next, we analyze efficiency, which is affected by overheads and load balance. The efficiency curves on MP-1 and MP-2 are shown in Figures 4 and 5. Matrix multiplication has the least overhead plus the best possible load balance, so it achieves the highest efficiency among the three algorithms. After software pipelining, the overall

Table 9: Post-Analysis by Model (a): Matrix Multiplication
Columns: N; MP-1 comp %, comm %, mem access %, misc %; MP-2 comp %, comm %, mem access %, misc %

Table 10: Post-Analysis by Model (a): LU Decomposition
Columns: N; MP-1 comp %, comm %, mem access %, misc %; MP-2 comp %, comm %, mem access %, misc %

Table 11: Post-Analysis by Model (a): Fast Fourier Transform
Columns: N; MP-1 comp %, comm %, mem access %, misc %; MP-2 comp %, comm %, mem access %, misc %

(a) Post-analysis shows performance after memory access optimization is done.

Figure 4: Efficiency on 16K MP-1 (efficiency vs. problem size N for Matrix Multiplication, LU Decomposition, and Fast Fourier Transform)
Figure 5: Efficiency on 4K MP-2 (same axes and curves)

overhead for LU decomposition becomes smaller than that of FFT, especially for large problems. The load balance for LU decomposition is low for small problems, but it improves as the problem size increases, owing to the 2-D scattered decomposition. Using the formula from Section 3.2, we checked that the load balance factor ($LB_f$) for LU decomposition changed from 0.64 to 0.94 on MP-1 and from 0.76 to 0.96 on MP-2. The net result is that the efficiency curve for LU decomposition eventually takes off and is much higher than the FFT curve. For matrix multiplication and FFT, it is easy to see from the parallel algorithms themselves that $LB_f = 1$, i.e., processors are fully utilized when the problem size is a multiple of the PE array size.

4.6 Comparison of Two Machines
We will use PM models to compare two machines. This comparison provides a concrete example for studying an important issue in parallel computing, namely, "Which choice is better: a small number of powerful processors or a large number of less powerful processors?" MP-1 has 16K simple 4-bit processors, whereas MP-2 has 4K 32-bit processors. Each MP-2 processor is four to five times faster than an MP-1 processor in terms of floating point computation. The peak ratings of the 16K-processor MP-1 and the 4K-processor MP-2 are 1613 and 1969 normalized MFLOPS, respectively. The two machines have the same amount of total memory, so it is possible to compare problems of the same size on both machines. The three algorithms used for the comparison are useful for getting different perspectives. We will compare the two machines in terms of overheads, efficiency, MFLOPS, and execution time.

The impact of overheads turns out to be significantly different on MP-1 and MP-2. For each algorithm, we compare the data for the same problem size on MP-1 and MP-2. As seen from the post-analysis Tables 9, 10, and 11, all overheads, including memory, communication, and miscellaneous, are significantly higher on MP-2. This can be understood on the basis of two factors related to differences in architectural parameters. First, only certain operations are faster on MP-2, and those are not all faster in the same proportion.
For example, floating point operations are four to five times faster, but

Figure 6: Performance Comparison based on MFLOPS (difference in percent vs. problem size N for Matrix Multiplication, LU Decomposition, and Fast Fourier Transform; regions labeled MP-1 > MP-2 and MP-2 > MP-1)
Figure 7: Performance Comparison based on Execution Time (same axes, curves, and region labels)

memory operations are only twice as fast as on MP-1, and the communication operations and auxiliary instructions leading to miscellaneous overhead are not faster at all. Second, MP-1 is a larger machine in which more processors are connected to each other, so its communication bandwidth is higher.

Next, we compare the two machines in terms of efficiency and MFLOPS. In all cases, the efficiency on MP-2 is lower (compare Figures 4 and 5). A smaller machine can achieve higher load balance, which helps efficiency; in this case, however, the main factor is the overheads, which are significantly higher on MP-2. Although MP-2 is less efficient, it is faster and has a higher peak MFLOPS rating than MP-1, so we also compare the two machines in terms of MFLOPS. The 4K-processor MP-2 achieves lower MFLOPS than the 16K-processor MP-1 in all instances (see Figure 6).

The comparison based on overheads, efficiency, and MFLOPS implies that MP-2 is a worse machine than MP-1. Before we accept that conclusion, let us compare execution times, shown in Figure 7. MP-2 is better than MP-1 for matrix multiplication in all cases, and it is also better for LU decomposition with smaller problems. These results are not surprising given the earlier discussion in Section 3.4, where it was pointed out that the MFLOPS numbers on MP-1 are inflated for matrix multiplication and LU decomposition. To get a coherent picture of performance, we need to convert MFLOPS from one machine to another; the conversion rates are given in Section 3.4. If the MFLOPS comparison is redone using the proper conversion, it turns out to be exactly the same as the execution time comparison. In fact, for FFT the conversion rate is one, which is consistent with the observation that the MFLOPS and execution time comparison curves are almost identical for that algorithm (compare Figures 6 and 7).

4.7 Prediction for a Future Machine
We illustrate how PM models can be used to make performance predictions for a future generation machine. For a new machine, many different alternatives may be of interest. For example, it may be necessary to consider the impact of increasing processor speed,

improving memory access time, enhancing communication hardware, or increasing the number of processors. We use PM models to predict performance when the number of processors is increased from 4K to 16K in a future MP-2 machine. The speedup predictions, i.e., the speedups obtained by increasing the number of processors from 4K to 16K on MP-2, are given in Table 12.

Table 12: Speedup Prediction for 16K MP-2 over 4K MP-2
Columns: N; 4K MP-2 time (sec); 16K MP-2 time (sec); relative speedup
Row groups: Matrix Multiplication, LU Decomposition, Fast Fourier Transform

Execution profiles are provided in Table 13 to give an idea of how the overheads due to interprocessor communication, memory accesses, and auxiliary instructions are expected to change. Comparing Tables 9, 10, and 11 with Table 13 shows that the overheads increase; the increase is most significant in the case of FFT.
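The relative speedup in Table 12 compares predicted times for the same problem size on the two configurations. A minimal sketch, assuming that relative speedup is the 4K time divided by the 16K time and that the ideal value is the ratio of processor counts (16K / 4K = 4); the function names and timings are hypothetical.

```python
def relative_speedup(t_base, t_scaled):
    """Speedup of the larger configuration over the base one for the same N
    (hypothetical helper)."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("times must be positive")
    return t_base / t_scaled


def fraction_of_ideal(speedup, pes_base, pes_scaled):
    """Speedup relative to the ideal, taken here as the ratio of PE counts."""
    return speedup / (pes_scaled / pes_base)
```

For illustrative times of 8.0 s on 4K PEs and 2.5 s on 16K PEs, the speedup is 3.2 and `fraction_of_ideal(3.2, 4096, 16384)` is 0.8. A value below 1 is one way of seeing the growth in overhead percentages noted above when moving from 4K to 16K processors.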

Table 13: Prediction of Execution Profile on 16K MP-2
Columns: N; comp %; comm %; memory %; misc %
Row groups: Matrix Multiplication, LU Decomposition, Fast Fourier Transform

It is difficult to check the validity of future predictions, but it may be possible to check the validity of the approach. To do so, hypothetical predictions were made for a 16K-processor MP-1, and they were checked using the real machine. A couple of things are worth mentioning about this validation. The memory overlap ratio depends on the local problem size, and we verified that extrapolating the overlap ratio on that basis is fairly accurate. The regression formulas for predicting miscellaneous overhead were developed using test cases on 1K and 4K processors of MP-1, and their validity was checked on 16K processors.
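The validation step above, reusing overlap ratios measured at a given local problem size and checking the resulting predictions against measurements, can be sketched as follows. The assumptions are ours, not the report's: an N x N matrix spread evenly over P processors gives a local size of N^2/P, overlap ratios between measured local sizes are linearly interpolated and held at the nearest measured value outside that range (the upper end matching the leveling off in Figures 2 and 3), and the percentage difference is taken relative to the measured time. All names and sample values are hypothetical.

```python
from bisect import bisect_left


def local_size(n, num_pes):
    """Elements of an n x n matrix held by each PE, assuming an even 2-D
    distribution (hypothetical helper; 1-D FFT data would use n / num_pes)."""
    return n * n / num_pes


def interpolate_overlap(samples, size):
    """Estimate O_r at a local problem size from (local_size, O_r) samples
    measured on smaller configurations.

    Linear interpolation between measured points; outside the measured
    range the nearest measured value is returned.
    """
    if not samples:
        raise ValueError("need at least one measured sample")
    pts = sorted(samples)
    xs = [x for x, _ in pts]
    if size <= xs[0]:
        return pts[0][1]
    if size >= xs[-1]:
        return pts[-1][1]
    i = bisect_left(xs, size)          # first index with xs[i] >= size
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def percent_difference(predicted, measured):
    """Model error as a percentage of the measured time."""
    return 100.0 * (predicted - measured) / measured
```

Under these assumptions, N = 256 on 4096 PEs and N = 512 on 16384 PEs both give a local size of 16, so an overlap ratio measured for the first case would be reused for the second; a prediction of 10.5 s against a measured 10.0 s gives a 5% difference.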


More information

Operational Semantics Class notes for a lecture given by Mooly Sagiv Tel Aviv University 24/5/2007 By Roy Ganor and Uri Juhasz

Operational Semantics Class notes for a lecture given by Mooly Sagiv Tel Aviv University 24/5/2007 By Roy Ganor and Uri Juhasz Operational emantic Page Operational emantic Cla note for a lecture given by Mooly agiv Tel Aviv Univerity 4/5/7 By Roy Ganor and Uri Juhaz Reference emantic with Application, H. Nielon and F. Nielon,

More information

Evaluation of Benchmark Performance Estimation for Parallel. Fortran Programs on Massively Parallel SIMD and MIMD. Computers.

Evaluation of Benchmark Performance Estimation for Parallel. Fortran Programs on Massively Parallel SIMD and MIMD. Computers. Evaluation of Benhmark Performane Estimation for Parallel Fortran Programs on Massively Parallel SIMD and MIMD Computers Thomas Fahringer Dept of Software Tehnology and Parallel Systems University of Vienna

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each circuit will be decribed in VHL and implemented

More information

Quadrilaterals. Learning Objectives. Pre-Activity

Quadrilaterals. Learning Objectives. Pre-Activity Section 3.4 Pre-Activity Preparation Quadrilateral Intereting geometric hape and pattern are all around u when we tart looking for them. Examine a row of fencing or the tiling deign at the wimming pool.

More information

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded

Folding. Hardware Mapped vs. Time multiplexed. Folding by N (N=folding factor) Node A. Unfolding by J A 1 A J-1. Time multiplexed/microcoded Folding is verse of Unfolding Node A A Folding by N (N=folding fator) Folding A Unfolding by J A A J- Hardware Mapped vs. Time multiplexed l Hardware Mapped vs. Time multiplexed/mirooded FI : y x(n) h

More information

CleanUp: Improving Quadrilateral Finite Element Meshes

CleanUp: Improving Quadrilateral Finite Element Meshes CleanUp: Improving Quadrilateral Finite Element Meshes Paul Kinney MD-10 ECC P.O. Box 203 Ford Motor Company Dearborn, MI. 8121 (313) 28-1228 pkinney@ford.om Abstrat: Unless an all quadrilateral (quad)

More information

KS3 Maths Assessment Objectives

KS3 Maths Assessment Objectives KS3 Math Aement Objective Tranition Stage 9 Ratio & Proportion Probabilit y & Statitic Appreciate the infinite nature of the et of integer, real and rational number Can interpret fraction and percentage

More information

A Boyer-Moore Approach for. Two-Dimensional Matching. Jorma Tarhio. University of California. Berkeley, CA Abstract

A Boyer-Moore Approach for. Two-Dimensional Matching. Jorma Tarhio. University of California. Berkeley, CA Abstract A Boyer-Moore Approach for Two-Dimenional Matching Jorma Tarhio Computer Science Diviion Univerity of California Berkeley, CA 94720 Abtract An imple ublinear algorithm i preented for two-dimenional tring

More information

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values The norm Package November 15, 2003 Verion 1.0-9 Date 2002/05/06 Title Analyi of multivariate normal dataet with miing value Author Ported to R by Alvaro A. Novo . Original by Joeph

More information

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR

A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Malaysian Journal of Computer Siene, Vol 10 No 1, June 1997, pp 36-41 A DYNAMIC ACCESS CONTROL WITH BINARY KEY-PAIR Md Rafiqul Islam, Harihodin Selamat and Mohd Noor Md Sap Faulty of Computer Siene and

More information

Lecture 8: More Pipelining

Lecture 8: More Pipelining Overview Lecture 8: More Pipelining David Black-Schaffer davidbb@tanford.edu EE8 Spring 00 Getting Started with Lab Jut get a ingle pixel calculating at one time Then look into filling your pipeline Multiplier

More information

Announcements. CSE332: Data Abstractions Lecture 19: Parallel Prefix and Sorting. The prefix-sum problem. Outline. Parallel prefix-sum

Announcements. CSE332: Data Abstractions Lecture 19: Parallel Prefix and Sorting. The prefix-sum problem. Outline. Parallel prefix-sum Announcement Homework 6 due Friday Feb 25 th at the BEGINNING o lecture CSE332: Data Abtraction Lecture 19: Parallel Preix and Sorting Project 3 the lat programming project! Verion 1 & 2 - Tue March 1,

More information

Outline: Software Design

Outline: Software Design Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling

More information

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder Computer Arithmetic Homework 3 2016 2017 Solution 1 An adder for graphic In a normal ripple carry addition of two poitive number, the carry i the ignal for a reult exceeding the maximum. We ue thi ignal

More information

Routing Definition 4.1

Routing Definition 4.1 4 Routing So far, we have only looked at network without dealing with the iue of how to end information in them from one node to another The problem of ending information in a network i known a routing

More information

How to Select Measurement Points in Access Point Localization

How to Select Measurement Points in Access Point Localization Proceeding of the International MultiConference of Engineer and Computer Scientit 205 Vol II, IMECS 205, March 8-20, 205, Hong Kong How to Select Meaurement Point in Acce Point Localization Xiaoling Yang,

More information

else end while End References

else end while End References 621-630. [RM89] [SK76] Roenfeld, A. and Melter, R. A., Digital geometry, The Mathematical Intelligencer, vol. 11, No. 3, 1989, pp. 69-72. Sklanky, J. and Kibler, D. F., A theory of nonuniformly digitized

More information

/06/$ IEEE 364

/06/$ IEEE 364 006 IEEE International ympoium on ignal Proceing and Information Technology oie Variance Etimation In ignal Proceing David Makovoz IPAC, California Intitute of Technology, MC-0, Paadena, CA, 95 davidm@ipac.caltech.edu;

More information

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1

(12) Patent Application Publication (10) Pub. No.: US 2003/ A1 US 2003O196031A1 (19) United State (12) Patent Application Publication (10) Pub. No.: US 2003/0196031 A1 Chen (43) Pub. Date: Oct. 16, 2003 (54) STORAGE CONTROLLER WITH THE DISK Related U.S. Application

More information

Performance Evaluation of an Advanced Local Search Evolutionary Algorithm

Performance Evaluation of an Advanced Local Search Evolutionary Algorithm Anne Auger and Nikolau Hanen Performance Evaluation of an Advanced Local Search Evolutionary Algorithm Proceeding of the IEEE Congre on Evolutionary Computation, CEC 2005 c IEEE Performance Evaluation

More information

Uninformed Search Complexity. Informed Search. Search Revisited. Day 2/3 of Search

Uninformed Search Complexity. Informed Search. Search Revisited. Day 2/3 of Search Informed Search ay 2/3 of Search hap. 4, Ruel & Norvig FS IFS US PFS MEM FS IS Uninformed Search omplexity N = Total number of tate = verage number of ucceor (branching factor) L = Length for tart to goal

More information

ANALYSIS OF THE FIRST LAYER IN WEIGHTLESS NEURAL NETWORKS FOR 3_DIMENSIONAL PATTERN RECOGNITION

ANALYSIS OF THE FIRST LAYER IN WEIGHTLESS NEURAL NETWORKS FOR 3_DIMENSIONAL PATTERN RECOGNITION ANALYSIS OF THE FIRST LAYER IN WEIGHTLESS NEURAL NETWORKS FOR 3_DIMENSIONAL PATTERN RECOGNITION A. Váque-Nava * Ecuela de Ingeniería. CENTRO UNIVERSITARIO MEXICO. DIVISION DE ESTUDIOS SUPERIORES J. Figueroa

More information

Floating Point CORDIC Based Power Operation

Floating Point CORDIC Based Power Operation Floating Point CORDIC Baed Power Operation Kazumi Malhan, Padmaja AVL Electrical and Computer Engineering Department School of Engineering and Computer Science Oakland Univerity, Rocheter, MI e-mail: kmalhan@oakland.edu,

More information

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract

Learning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each circuit will be decribed in Verilog and implemented

More information

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman

NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION. Ken Sauer and Charles A. Bouman NONLINEAR BACK PROJECTION FOR TOMOGRAPHIC IMAGE RECONSTRUCTION Ken Sauer and Charles A. Bouman Department of Eletrial Engineering, University of Notre Dame Notre Dame, IN 46556, (219) 631-6999 Shool of

More information

Topics. FPGA Design EECE 277. Number Representation and Adders. Class Exercise. Laboratory Assignment #2

Topics. FPGA Design EECE 277. Number Representation and Adders. Class Exercise. Laboratory Assignment #2 FPGA Deign EECE 277 Number Repreentation and Adder Dr. William H. Robinon Februar 2, 25 Topi There are kind of people in the world, thoe that undertand binar and thoe that don't. Unknown Adminitrative

More information

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem

Calculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem Calulation of typial running time of a branh-and-bound algorithm for the vertex-over problem Joni Pajarinen, Joni.Pajarinen@iki.fi Otober 21, 2007 1 Introdution The vertex-over problem is one of a olletion

More information

A Specification for Rijndael, the AES Algorithm

A Specification for Rijndael, the AES Algorithm A Speifiation for Rijndael, the AES Algorithm. Notation and Convention. Rijndael Input and Output The input, the output and the ipher key for Rijndael are eah it equene ontaining 28, 92 or 256 it with

More information

Compressed Sensing Image Processing Based on Stagewise Orthogonal Matching Pursuit

Compressed Sensing Image Processing Based on Stagewise Orthogonal Matching Pursuit Senor & randucer, Vol. 8, Iue 0, October 204, pp. 34-40 Senor & randucer 204 by IFSA Publihing, S. L. http://www.enorportal.com Compreed Sening Image Proceing Baed on Stagewie Orthogonal Matching Puruit

More information

A PROBABILISTIC NOTION OF CAMERA GEOMETRY: CALIBRATED VS. UNCALIBRATED

A PROBABILISTIC NOTION OF CAMERA GEOMETRY: CALIBRATED VS. UNCALIBRATED A PROBABILISTIC NOTION OF CAMERA GEOMETRY: CALIBRATED VS. UNCALIBRATED Jutin Domke and Yianni Aloimono Computational Viion Laboratory, Center for Automation Reearch Univerity of Maryland College Park,

More information

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)

Automatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425) Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University

More information

Aalborg Universitet. Published in: Proceedings of the Working Conference on Advanced Visual Interfaces

Aalborg Universitet. Published in: Proceedings of the Working Conference on Advanced Visual Interfaces Aalborg Univeritet Software-Baed Adjutment of Mobile Autotereocopic Graphic Uing Static Parallax Barrier Paprocki, Martin Marko; Krog, Kim Srirat; Kritofferen, Morten Bak; Krau, Martin Publihed in: Proceeding

More information

Comparison of Methods for Horizon Line Detection in Sea Images

Comparison of Methods for Horizon Line Detection in Sea Images Comparion of Method for Horizon Line Detection in Sea Image Tzvika Libe Evgeny Gerhikov and Samuel Koolapov Department of Electrical Engineering Braude Academic College of Engineering Karmiel 2982 Irael

More information

13.1 Numerical Evaluation of Integrals Over One Dimension

13.1 Numerical Evaluation of Integrals Over One Dimension 13.1 Numerial Evaluation of Integrals Over One Dimension A. Purpose This olletion of subprograms estimates the value of the integral b a f(x) dx where the integrand f(x) and the limits a and b are supplied

More information

Delaunay Triangulation: Incremental Construction

Delaunay Triangulation: Incremental Construction Chapter 6 Delaunay Triangulation: Incremental Contruction In the lat lecture, we have learned about the Lawon ip algorithm that compute a Delaunay triangulation of a given n-point et P R 2 with O(n 2 )

More information

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial SIMIT 7 Component Type Editor (CTE) Uer manual Siemen Indutrial Edition January 2013 Siemen offer imulation oftware to plan, imulate and optimize plant and machine. The imulation- and optimizationreult

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier a a The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each b c circuit will be decribed in Verilog

More information

Generic Traverse. CS 362, Lecture 19. DFS and BFS. Today s Outline

Generic Traverse. CS 362, Lecture 19. DFS and BFS. Today s Outline Generic Travere CS 62, Lecture 9 Jared Saia Univerity of New Mexico Travere(){ put (nil,) in bag; while (the bag i not empty){ take ome edge (p,v) from the bag if (v i unmarked) mark v; parent(v) = p;

More information

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications DAROS: Ditributed Uer-Server Aignment And Replication For Online Social Networking Application Thuan Duong-Ba School of EECS Oregon State Univerity Corvalli, OR 97330, USA Email: duongba@eec.oregontate.edu

More information

Analyzing Hydra Historical Statistics Part 2

Analyzing Hydra Historical Statistics Part 2 Analyzing Hydra Hitorical Statitic Part Fabio Maimo Ottaviani EPV Technologie White paper 5 hnode HSM Hitorical Record The hnode i the hierarchical data torage management node and ha to perform all the

More information

Cutting Stock by Iterated Matching. Andreas Fritsch, Oliver Vornberger. University of Osnabruck. D Osnabruck.

Cutting Stock by Iterated Matching. Andreas Fritsch, Oliver Vornberger. University of Osnabruck. D Osnabruck. Cutting Stock by Iterated Matching Andrea Fritch, Oliver Vornberger Univerity of Onabruck Dept of Math/Computer Science D-4909 Onabruck andy@informatikuni-onabrueckde Abtract The combinatorial optimization

More information

Stress-Blended Eddy Simulation (SBES) - A new Paradigm in hybrid RANS-LES Modeling

Stress-Blended Eddy Simulation (SBES) - A new Paradigm in hybrid RANS-LES Modeling Stre-Blended Eddy Simulation (SBES) - A new Paradigm in hybrid RANS-LES Modeling Menter F.R. ANSYS Germany GmbH Introduction It i oberved in many CFD imulation that RANS model how inherent technology limitation

More information