Formal Datapath Representation and Manipulation for Implementing DSP Transforms

Peter A. Milder, Franz Franchetti, James C. Hoe, and Markus Püschel
Electrical and Computer Engineering Department
Carnegie Mellon University
Pittsburgh, PA, U.S.A.

ABSTRACT

We present a domain-specific approach to representing datapaths for hardware implementations of linear signal transform algorithms. We extend the tensor structure for describing linear transform algorithms, adding the ability to explicitly characterize two important dimensions of datapath architecture. This representation allows both algorithm and datapath to be specified within a single formula and gives the designer the ability to easily consider a wide space of possible datapaths at a high level of abstraction. We have constructed a formula manipulation system based on this representation and have written a compiler that can translate a formula into a hardware implementation. This enables an automatic push-button compilation flow that produces a register transfer level hardware description from high-level datapath directives and an algorithm (written as a formula). In our experimental results, we demonstrate that this approach yields efficient designs over a large tradeoff space.

Categories and Subject Descriptors: B.6.3 [Hardware]: Design Aids: Automatic synthesis
General Terms: Algorithms, Design
Keywords: linear transform, discrete Fourier transform, high-level synthesis, streaming

1. INTRODUCTION

Linear signal transforms such as the discrete Fourier transform are ubiquitous in digital signal processing (DSP) and scientific computing. Algorithms for computing these transforms are often highly structured and regular, which makes them well suited for hardware implementation. This regularity allows a wide space of potential datapath structures, each giving a different set of tradeoffs between performance and cost. It is very difficult for a designer to determine the structure that will yield the most efficient datapath for given cost or performance constraints.

Contribution. In this paper, we take a domain-specific mathematical representation for describing linear DSP algorithms and extend it to include datapath concepts such as parallelism and explicit datapath reuse. The result is a mathematical language that we compile directly into hardware. Using this language, a designer specifies datapath options at the formula level. This leads to easier exploration of the design space by enabling algorithm restructuring through formula manipulation, which is performed automatically based on high-level directives. We have constructed a push-button synthesis system that takes as input an algorithm (written as a formula) and high-level datapath directives (indicating desired qualities of the resulting design); it outputs a design in register-transfer level (RTL) Verilog.

Organization. We begin by introducing the tensor (or Kronecker) representation for transform algorithms in Section 2. Then, Section 3 discusses the datapath constructs we consider, how we are able to include them within the existing mathematical representation, and the associated performance and cost metrics. Additionally, we give a high-level view of our synthesis system. In Section 4, we evaluate our generated designs. We present experiments that demonstrate: (a) that the cost/performance tradeoffs obtained are competitive with good hand-designed implementations, (b) that this system produces designs across a wide tradeoff space, and (c) that real benefits are obtained by considering a variety of datapath structures. Lastly, we discuss related work in Section 5 and conclude in Section 6.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2008, June 8-13, 2008, Anaheim, California, USA. Copyright 2008 ACM ... $5.00.

2. BACKGROUND

Transforms as matrices. A linear transform may be viewed as a dense matrix; applying the transform is then a matrix-vector multiplication. For example, an $n$ point transform characterized by matrix $A_n$ is given by $y = A_n x$, where $x$ and $y$ are the $n$ point input and output vectors (respectively), and $A_n$ is an $n \times n$ matrix. Direct evaluation of the matrix-vector product requires $O(n^2)$ arithmetic operations.

Algorithms as formulas. Fast algorithms exist for many transforms that reduce the arithmetic cost to $O(n \log n)$. We view an algorithm as a decomposition of the dense matrix into a product of structured sparse matrices. The tensor (or Kronecker) formulation has been shown to be a compact and efficient way to represent fast transform algorithms [5, 11]. Recently, others have shown that a framework based on this formulation can be used to generate optimized software for today's high-performance computer systems [10].

[Figure 1: Examples of translating from formula to combinational datapath. Panels: (a) a product of two matrices, (b) $I_2 \otimes A_2$, (c) $D_4$, (d) an algorithm for $\mathrm{DFT}_4$.]

Formula language. This algorithmic representation is captured in a formal language that represents algorithms using formulas, with each term in the formula having a corresponding combinational datapath representation. In Backus-Naur form, the language is defined as follows ($\mathbf{matrix}$ and $\mathbf{base}$ are non-terminals):

$$
\begin{aligned}
\mathbf{matrix}_n ::=\;& \mathbf{matrix}_n \cdots \mathbf{matrix}_n \\
\mid\;& \textstyle\prod_i \mathbf{matrix}_n \\
\mid\;& I_k \otimes \mathbf{matrix}_m \quad \text{where } n = km \\
\mid\;& \mathbf{base}_n \\
\mathbf{base}_n ::=\;& D_n = \mathrm{diag}(d_0, \ldots, d_{n-1}) \;\mid\; P_n \;\mid\; I_n \;\mid\; A_n
\end{aligned}
$$

A matrix formula can be decomposed into a product or iterative product of matrix formulas (lines 1 and 2, illustrated in Figure 1(a)). Matrix $I_k$ is the $k \times k$ identity matrix, and $I_k \otimes \mathbf{matrix}_m$ is a tensor (or Kronecker) product, where $k$ parallel instances of $\mathbf{matrix}_m$ are applied to the data vector of size $n = km$ (Figure 1(b)). We use $P_n$ to denote a permutation on $n$ points and $D_n$ to represent a diagonal matrix, which has non-zero values along the main diagonal only, causing each value of the input vector to be scaled by a constant (Figure 1(c)). Lastly, we use $A_n$ to denote a generic $n \times n$ dense matrix, which corresponds to a computational basic block.

This language is a subset of the signal processing language (SPL) used in Spiral, a program generator for software implementations of linear transforms [10]. An algorithm written in this language can be mapped directly to a combinational datapath (Figure 1(d)), but the resulting datapath is infeasibly large for all but the smallest problem sizes.

3. DATAPATH REPRESENTATION

The tensor language described above can represent a wide range of algorithms, but it does not have the capability of representing sequential reuse of datapath components, where one computational block is used many times while solving a single problem. Sequential reuse is necessary for efficient and reasonably sized hardware designs.
In this section, we describe extensions to our formula language to represent two types of sequential reuse that are relevant for hardware designs. We show how these extensions enable explicit datapath description at the formula level and discuss how formulas are automatically translated into register-transfer level datapath descriptions.

[Figure 2: Examples of streaming reuse. Panels: (a) No streaming reuse: $I_m \otimes A_n$ (one vector of size $mn$). (b) Full streaming reuse: $I_m \otimes_{sr} A_n$ ($n$ words per cycle; one streamed vector of size $mn$ over $m$ cycles). (c) Partial streaming reuse: $I_{mn/w} \otimes_{sr} (I_{w/n} \otimes A_n)$ ($w$ words per cycle; $mn/w$ cycles; $w/n$ blocks).]

3.1 Streaming Reuse

As we saw in Section 2, the tensor product $I_m \otimes A_n$ indicates $m$ data-parallel instantiations of the block $A_n$ (Figure 2(a)). However, the same computation can be performed by other structures. For example, the tensor product can be interpreted as reuse in time (rather than parallelism in space). Then, we build a single instance of block $A_n$ and reuse it over $m$ consecutive cycles (Figure 2(b)). Rather than all $mn$ input points entering the system concurrently, they now stream in and out at a rate of $n$ words per cycle. We call this streaming reuse and represent it $I_m \otimes_{sr} A_n$. We define streaming width as the number of inputs (or outputs) that enter (or exit) a section of datapath during each cycle. Here, the streaming width is $n$.

We can nest the two interpretations of $\otimes$ in order to build a partially parallel datapath that is reused over multiple cycles (Figure 2(c)). In general, $I_m \otimes A_n$ can be written as $I_{mn/w} \otimes_{sr} (I_{w/n} \otimes A_n)$, which results in a datapath with a streaming width of $w$, consisting of $w/n$ parallel instances of $A_n$, reused over $mn/w$ cycles ($w$ is a multiple of $n$; $w \le mn$). Increasing the streaming width increases the datapath's cost and throughput proportionally.

3.2 Iterative Reuse

The product of $m$ identical blocks $A_n$ can be written as $\prod_m A_n$. A straightforward interpretation of this is a series of $m$ blocks (Figure 3(a)). We can also perform the same computation by reusing the $A_n$ block $m$ times (Figure 3(b)). Now, the datapath must have a feedback mechanism to allow the data to cycle through the proper number of times. We call this iterative reuse and represent it by adding the letters $ir$ to the product term: $\prod^{ir}_m A_n$.

By nesting both kinds of product terms, we specify a number of blocks in sequence to be reused a number of times (Figure 3(c)). In general, $\prod_m A_n$ can be restructured into $\prod^{ir}_{m/d} \left(\prod_d A_n\right)$, resulting in $d$ cascaded instances of $A_n$, iterated over $m/d$ times ($m/d$ is an integer). We define depth as the number of stages built (here, $d$).
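The two restructurings above can be checked with a small functional model. The following is only a sketch under stated assumptions: it models what is computed and how many cycles or feedback passes are needed, not the hardware itself, and the names `streamed_tensor` and `iterated_product` are hypothetical helpers introduced for illustration.

```python
def apply_matrix(a, x):
    """Dense matrix-vector product y = A x, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def streamed_tensor(a, x, w):
    """Compute (I_m (x) A_n) x with streaming width w.

    Models I_{mn/w} (x)_sr (I_{w/n} (x) A_n): w/n parallel instances of A_n
    are reused over mn/w cycles. Returns (output vector, cycle count).
    """
    n = len(a)
    if w % n or len(x) % w:
        raise ValueError("w must be a multiple of n and must divide len(x)")
    y, cycles = [], 0
    for start in range(0, len(x), w):        # one loop iteration = one cycle
        words = x[start:start + w]           # w words enter in this cycle
        for p in range(w // n):              # w/n parallel instances of A_n
            y.extend(apply_matrix(a, words[p * n:(p + 1) * n]))
        cycles += 1
    return y, cycles


def iterated_product(a, x, m, d):
    """Compute (prod_m A_n) x with depth d.

    Models prod^ir_{m/d} (prod_d A_n): d cascaded stages, fed back m/d times.
    Returns (output vector, number of passes through the cascade).
    """
    if m % d:
        raise ValueError("d must divide m")
    passes = 0
    for _ in range(m // d):                  # feedback passes
        for _ in range(d):                   # d cascaded stages per pass
            x = apply_matrix(a, x)
        passes += 1
    return x, passes


A2 = [[1, 1], [1, -1]]                       # an example 2 x 2 block
x = list(range(8))                           # m = 4, n = 2

for w in (2, 4, 8):                          # same result, fewer cycles
    print(w, streamed_tensor(A2, x, w))

for d in (1, 2, 4):                          # same result, fewer passes
    print(d, iterated_product(A2, [1, 2], 4, d))
```

Every choice of `w` and `d` yields the same output vector; only the cycle count (`mn/w`) and the number of feedback passes (`m/d`) change, which is the tradeoff the two reuse parameters expose.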

[Figure 3: Examples of iterative reuse. Panels: (a) No iterative reuse: $\prod_m A_n$ ($m$ blocks). (b) Full iterative reuse: $\prod^{ir}_m A_n$ (1 block, reused $m$ times). (c) Partial iterative reuse: $\prod^{ir}_{m/d} \left(\prod_d A_n\right)$ ($d$ blocks, reused $m/d$ times).]

When an iterative reuse datapath is built, it is important that the reused portion of the datapath buffer the entire vector, so the head of the data maintains sufficient distance from its own tail. This is equivalent to requiring that the latency (in cycles) be at least 1/(its throughput in transforms per cycle). If the datapath does not naturally have this property, it is necessary to add buffers to increase its latency. We will see an example of this in the following section.

3.3 Combining Streaming and Iterative Reuse

Often, transform algorithms contain the form $\prod_k (I_m \otimes A_n)$. This structure can utilize both iterative reuse (due to the $\prod$) and streaming reuse (due to $I_m \otimes A_n$), allowing a wide range of hybrid implementations that exhibit flexibility across two dimensions. We can restructure this formula to have streaming and iterative reuse of parameterized amounts:

$$\prod\nolimits^{ir}_{k/d} \left( \prod\nolimits_d \left( I_{mn/w} \otimes_{sr} (I_{w/n} \otimes A_n) \right) \right),$$

where $d$ is the depth of the cascaded stages (ranging from 1 to $k$; $k/d$ must be an integer). Parameter $w$ is the streaming width, a multiple of $n$.

This parameterized datapath is illustrated in Figure 4. Each stage consists of $w/n$ parallel instances of $A_n$; $d$ stages are built in series. Let $B_{mn}$ represent this array of $dw/n$ many $A_n$ blocks. Data are loaded into $B_{mn}$ at a rate of $w$ per cycle over $mn/w$ cycles. The vector feeds back through $B_{mn}$ a total of $k/d$ times.

[Figure 4: Combining iterative and streaming reuse: $\prod^{ir}_{k/d} \left( \prod_d \left( I_{mn/w} \otimes_{sr} (I_{w/n} \otimes A_n) \right) \right)$. Labels: $w$ words per cycle; $mn/w$ cycles; $d$ stages, reused $k/d$ times; $w/n$ blocks per column.]

Latency and throughput. Given this combined reuse example, we can analyze the effect of parameters $d$ and $w$ on the datapath. In Table 1, we first describe several general rules for deriving the latency, throughput, and an approximate area cost of basic formula constructs. Below, we present calculations that correspond to evaluating the general rules from Table 1 for the specific parameters of this combined reuse example (Figure 4). In these calculations, we assume that $B_{mn}$ (the collective block of $A_n$ blocks) is fully pipelined, i.e., its throughput is dictated by the problem size and streaming width only: $T(B_{mn}) = w/mn$. The analysis of latency and throughput for this combined reuse example includes the following two cases:

Case 1: Iterative reuse. This case occurs when $d < k$, meaning the data will iterate over the internal block at least 2 times. As discussed in Section 3.2, the internal block's minimum latency is determined by its throughput. So, if $d \cdot L(A_n) < mn/w$, buffers are added until they are equal. Thus, internal block $B_{mn}$ has latency $L(B_{mn}) = \max(mn/w,\; d \cdot L(A_n))$. The latency of the whole system is $k/d$ times this, giving

$$\text{latency} = \max\left(\frac{mnk}{wd},\; k \cdot L(A_n)\right).$$

Because we are utilizing iterative reuse, a new vector cannot enter until the previous vector begins exiting the datapath, so the throughput (in transforms per cycle) is the inverse of the latency,

$$\text{throughput} = \min\left(\frac{wd}{mnk},\; \frac{1}{k \cdot L(A_n)}\right).$$

Case 2: No iterative reuse. This case occurs when $d = k$. Now, no iterative reuse is performed; the data only passes through the inner block once. The datapath consists of $d = k$ stages, giving latency $= k \cdot L(A_n)$. Because the data never feeds back, the throughput is limited only by the streaming width, giving throughput $= w/mn$ transforms per cycle.

From these equations, we see that increasing $w$ and $d$ will lead to lower latency and higher throughput in equal weights, until either the data flows so quickly that the latency of the computation dominates ($d \cdot L(A_n) > mn/w$), or $d$ increases until no iterative reuse is performed ($d = k$).

Flexibility. Additionally, there is one important distinction that must be made between parameters $d$ and $w$: as $w$ grows, the datapath requires greater bandwidth at its ports, and the cost of interconnect and multiplexers increases. For this reason, it is preferable to increase $d$ instead of $w$. However, we also note that $d$ must divide $k$ evenly ($k$ is typically the $\log_2$ of the transform size). In many cases, this becomes an all-or-nothing situation, where the only options are $d = 1$ and $d = k$. In those cases, the added flexibility provided by $w$ is important. Lastly, we note that when the datapath does not employ iterative reuse (i.e., when $d = k$), the designer typically has a wider choice of algorithms because the internal stages are not required to be uniform.

Datapath efficiency and vector interleaving. Assume we have an iterative reuse datapath that reuses block $B_n$. Here, $B_n$ can represent any datapath we consider in this paper, including those with further iterative reuse internally. $B_n$ has an inherent latency $L(B_n)$ and throughput $T(B_n)$ (determined by the inverse of the minimum initiation interval of input vectors). With a single vector recirculating through $B_n$, the effective throughput of $B_n$ may be further limited to $1/L(B_n)$ if $L(B_n)$ is greater than the minimum initiation interval. In this case the head of the vector is still inside $B_n$ when $B_n$'s input is ready to accept a new iteration.
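The Case 1 and Case 2 expressions of Section 3.3 can be evaluated directly for a candidate pair of parameters. The sketch below is a minimal calculator under the same assumption the text makes (the inner block is fully pipelined); the function name `combined_reuse_metrics` and the example values, including the block latency of 4 cycles, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def combined_reuse_metrics(k, m, n, w, d, latency_a):
    """Latency (cycles) and throughput (transforms per cycle) for
    prod^ir_{k/d}( prod_d( I_{mn/w} (x)_sr (I_{w/n} (x) A_n) ) ).

    latency_a is L(A_n), the latency of one basic block in cycles.
    """
    if k % d:
        raise ValueError("d must divide k")
    if w % n or (m * n) % w:
        raise ValueError("w must be a multiple of n and must divide m*n")

    stream_cycles = Fraction(m * n, w)       # cycles to stream one vector
    if d < k:
        # Case 1: iterative reuse. The inner block is padded with buffers
        # so that its latency is at least one full vector time.
        block_latency = max(stream_cycles, d * latency_a)
        latency = Fraction(k, d) * block_latency
        throughput = 1 / latency             # next vector waits for this one
    else:
        # Case 2: fully unrolled, no feedback.
        latency = Fraction(k * latency_a)
        throughput = Fraction(w, m * n)      # limited by streaming width only
    return latency, throughput


# Hypothetical example: k = 10 stages on a vector of m*n = 1024 points.
for d in (1, 2, 5, 10):
    lat, thr = combined_reuse_metrics(k=10, m=512, n=2, w=2, d=d, latency_a=4)
    print(f"d={d:2d}  latency={lat} cycles  throughput={thr} transforms/cycle")
```

With these example values, doubling `d` from 1 to 2 halves the latency (5120 to 2560 cycles) because the streaming term `mn/w` dominates; once `d = k`, the throughput jumps to `w/mn`.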

| Formula $F$ | Latency $L(F)$ | Throughput $T(F)$ | Area cost $C(F)$ |
| $F_n = A_n^{(0)} A_n^{(1)} \cdots A_n^{(m-1)}$ | $\sum_i L(A_n^{(i)})$ | $\min_i T(A_n^{(i)})$ | $\sum_i C(A_n^{(i)})$ |
| $F_n = \prod^{ir}_k A_n$ | $\max\left(\frac{k}{T(A_n)},\; k \cdot L(A_n)\right)$ | $\min\left(\frac{T(A_n)}{k},\; \frac{1}{k \cdot L(A_n)}\right)$ | $C(A_n) + C(\text{mux})$ |
| $F_{mn} = I_m \otimes A_n$ | $L(A_n)$ | $T(A_n)$ | $m \cdot C(A_n)$ |
| $F_{mn} = I_m \otimes_{sr} A_n$ | $L(A_n)$ | $T(A_n)/m$ | $C(A_n)$ |

Table 1: Given a matrix formula $F$, formulas for latency $L(F)$ (in cycles), throughput $T(F)$ (in transforms per cycle) and approximate area cost $C(F)$ (relative to the area cost of sub-modules).

We can define a utilization ratio $R$ of the effective throughput to the inherent throughput of $B_n$; this quantifies the portion of $B_n$'s potential throughput that is utilized in the system. For a single vector, $R = (1/L(B_n))/T(B_n)$. When the utilization by a single vector is sufficiently low, we can interleave multiple vectors to make use of the full throughput capacity of $B_n$. Formally, if $R \le 1/V$ (where $V$ is an integer), we may interleave $V$ computations through the datapath, increasing the effective throughput and thus increasing the utilization ratio to $R = (V/L(B_n))/T(B_n)$.

In some cases, a designer may want to increase $L(B_n)$ artificially for better efficiency. For example, if $R = 0.55$, the designer could insert delay buffers in the datapath (increase $L(B_n)$) until $R$ is reduced to $0.5$ and then interleave two vectors. This increased utilization yields higher throughput at the expense of added latency, so the designer's particular application requirements will determine the suitability of this approach. Our compilation framework, discussed next, can utilize either strategy.

3.4 Compilation: From Math to RTL

We have built a compilation framework that takes an algorithm written as a formula, automatically manipulates it to describe a datapath, and translates the resulting design into register-transfer level (RTL) Verilog. A full explanation of this compilation framework is outside of the scope of this paper, but we present a high-level description here.

First, the algorithm is expanded into a formula in the language defined in Section 2. This formula corresponds to the computation that will be performed but does not specify the structure of the design that will perform it. Next, datapath directives are added that give desired characteristics of the final implementation (e.g., streaming width). A formula rewriting system then propagates these directives into the formula and restructures each term to match the desired characteristics. A hardware formula is produced that explicitly specifies the datapath architecture. Lastly, the hardware formula is translated to an RTL netlist that has the desired reuse characteristics. Computational blocks are implemented according to the base matrices and streaming permutations are built with memory and interconnection networks.

4. EVALUATION

In this section, we evaluate designs produced using the proposed method. First, we explain our methodology. Then, we give several examples of transform algorithms that utilize $I \otimes A$ and $\prod$ and demonstrate that streaming and iterative reuse lead to a wide tradeoff space in the resulting designs. We compare our generated designs with existing benchmarks in order to demonstrate the quality of our cores. We also evaluate a transform with a wider tradeoff space and examine how high-level design decisions affect the resulting design. Lastly, we discuss the generality of this approach and show other transform algorithms that utilize $I \otimes A$ and $\prod$, i.e., algorithms that can be implemented with streaming and iterative reuse.

4.1 Methodology

We have implemented the compilation framework that is described in Section 3.4 as a new backend to the Spiral formula generation framework [10]. Spiral is used to generate the starting formula for a given transform, and we have modified the tool to perform the formula manipulation associated with our extensions to the tensor formula language. Lastly, we have written a standalone compiler that translates a hardware formula into a register-transfer level (RTL) description. The tools are integrated, resulting in a completely automated flow from problem description to RTL Verilog.
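The directive-driven restructuring step described in Section 3.4 can be illustrated with a toy rewriter. This is a sketch only and is not the Spiral rewriting system: the classes (`Block`, `Tensor`, `StreamTensor`, `Product`, `IterProduct`) and the function `apply_directives` are hypothetical names, and the rewriter handles just the two rules of Sections 3.1 and 3.2, applying one depth directive to every product term it meets.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Block:                 # A_n: a dense n x n basic block
    n: int


@dataclass(frozen=True)
class Tensor:                # I_count (x) body, fully parallel
    count: int
    body: "Formula"


@dataclass(frozen=True)
class StreamTensor:          # I_count (x)_sr body, reused over count cycles
    count: int
    body: "Formula"


@dataclass(frozen=True)
class Product:               # prod_count body, cascaded stages
    count: int
    body: "Formula"


@dataclass(frozen=True)
class IterProduct:           # prod^ir_count body, fed back count times
    count: int
    body: "Formula"


Formula = Union[Block, Tensor, StreamTensor, Product, IterProduct]


def size(f):
    """Number of points the formula operates on."""
    if isinstance(f, Block):
        return f.n
    if isinstance(f, (Tensor, StreamTensor)):
        return f.count * size(f.body)
    return size(f.body)      # products keep the vector size


def apply_directives(f, width, depth):
    """Rewrite an algorithm formula into a hardware formula."""
    if isinstance(f, Product):
        if f.count % depth:
            raise ValueError("depth must divide the number of product terms")
        unrolled = Product(depth, apply_directives(f.body, width, depth))
        if depth == f.count:
            return unrolled                       # no iterative reuse
        return IterProduct(f.count // depth, unrolled)
    if isinstance(f, Tensor):
        n, total = size(f.body), size(f)
        if width % n or total % width:
            raise ValueError("width must be a multiple of n and divide m*n")
        parallel = Tensor(width // n, f.body)
        if width == total:
            return parallel                       # no streaming reuse
        return StreamTensor(total // width, parallel)
    return f


algorithm = Product(10, Tensor(512, Block(2)))    # prod_10 (I_512 (x) A_2)
print(apply_directives(algorithm, width=4, depth=2))
```

For the example input, the result corresponds to $\prod^{ir}_{5} (\prod_2 (I_{256} \otimes_{sr} (I_2 \otimes A_2)))$, i.e., the combined-reuse form of Section 3.3 with $w = 4$ and $d = 2$.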
In this section, we evaluate various designs produced with our framework. Here, we target the Xilinx Virtex-5 LX 330 FPGA and generate designs that use a 16 bit fixed point data type.^1 We use Xilinx ISE 9.1i to synthesize and place and route the designs. When memory is required, we use an on-chip block RAM (BRAM) if we can utilize 5% of its storage capacity. Otherwise, we use distributed RAM, i.e., memory distributed across the FPGA's logic cells. Although an FPGA platform provides a convenient target for evaluation, the designs we generate are not limited to FPGAs.

^1 Data type and bit width are parameters of our generation framework. Currently, our tool supports fixed point data types of any bitwidth and single precision floating point.

4.2 Quality of Generated Designs

In this section, we demonstrate that the designs produced by our framework are competitive with cores that are commercially available or found in recent literature. We choose the discrete Fourier transform (DFT) of size 1024; we evaluate our designs relative to cores from the commercially available Xilinx LogiCore FFT version 4.1 and the designs from our previous work [8]. The algorithm and architectures we considered in [8] are subsets of the space we consider here.

We generate cores based upon two algorithms: the Pease FFT [9] and an iterative version of the Cooley-Tukey FFT [3]. Although they are different algorithms, at the high level, both are of the form:

$$\mathrm{DFT}_n = \left( \prod_{i=0}^{\log_r(n)-1} P_n \,(I_{n/r} \otimes \mathrm{DFT}_r)\, D_n \right) P_n,$$

where $r$ is a power of two. (See Section 2 for an explanation of the terms in this formula.) We generate a variety of designs using these algorithms with various depths, streaming widths, and radices (values of $r$ in the formula above). Here we consider steady state throughput (given in million samples per second) as our performance metric and area (in terms of FPGA slices) as our cost metric.

[Figure 5: Throughput for varying implementations of DFT 1024 (16 bit fixed point) on Xilinx Virtex-5 FPGA. Axes: throughput [million samples per second] versus area [slices]. Series: generated, with iterative reuse; generated, without iterative reuse; Nordin, DAC'05; Xilinx LogiCore FFT v4.1.]

Figure 5 shows throughput for varying implementations of DFT 1024. From our data we plot only the Pareto optimal points, i.e., those that are not eclipsed by another design that is both smaller and faster. From these results, we see that the cost and performance values of the LogiCore designs are similar to those of our smallest cores. Furthermore, we see that our larger cores provide a commensurate increase in performance for the extra resources they consume.

Our previous work [8] covers a small subset of the datapath and algorithmic options we consider in this paper. In Figure 5, we see that the added flexibility of our current method leads to significant improvements over [8]; the designs in our Pareto optimal set all provide higher performance at lower cost. Similar trends are obtained if we choose a different value for $n$ and/or measure latency instead of throughput.

4.3 Automatic Design Space Exploration

In this section, we consider an algorithm for the two-dimensional discrete Fourier transform (2D DFT). This algorithm utilizes $I \otimes A$ and two $\prod$ terms, giving a very wide space of possible datapaths. This algorithm operates on $n^2$ points and has the following form:

$$\mathrm{DFT}_{n \times n} = \prod_{k=0}^{1} \left( \prod_{l=0}^{t-1} P_{n^2} \,(I_{n^2/r} \otimes \mathrm{DFT}_r)\, D_{n^2} \right) P_{n^2},$$

where $t = \log_r(n)$. This gives two iterative product terms, as seen above. Each may utilize iterative reuse, which we characterize with depth parameters $d_1$ and $d_2$ (see Section 3.2). This is illustrated in Figure 6.

[Figure 6: Illustrations of $\mathrm{DFT}_{n \times n}$ with outer product term parameterized by $d_1$, inner product term parameterized by $d_2$, and streaming width $w$. Panels: (a) $d_1 = 1$: the outer product term is iteratively reused. (b) $d_1 = 2$: the outer product term is fully unrolled, i.e., not iteratively reused. Each shaded block is labeled with a permutation and $d_2$ stages of $(I \otimes \mathrm{DFT}_r)$.]

We define $d_2$ to be the depth of the inner product; $d_2$ can be between 1 and $t$, provided that $t/d_2$ is an integer. Each internal block (a shaded region in Figure 6) consists of $d_2$ stages of $P\,(I \otimes \mathrm{DFT}_r)\,D$, streamed with $w$ ports. We define $d_1$ to be the depth of the outer product; $d_1$ can be either 1 or 2. When $d_1 = 1$, the outer product term is iteratively reused (Figure 6(a)). When $d_1 = 2$, the outer term is unrolled, giving two cascaded stages as seen in Figure 6(b).

Exploration. Now, we present results of a datapath exploration for 2D DFT 64 × 64. We generate cores across all possible values of $d_1$ and $d_2$ with the streaming width $w$ ranging between $r$ and 16. Parameter $r$, the radix, is 2, 4, or 8. Summing these possibilities for $d_1$, $d_2$, $w$, and $r$, we have a total of 52 different architectures in this design space. We generate each hardware core, synthesize it, and place and route it.

In Figure 7, we show the throughput (in million samples per second) versus area (in FPGA slices) for all 52 data points, with different markers for each value of $w$. The black line passes through the Pareto optimal points. In this data set, the smallest Pareto optimal point is the maximally folded design: $w = 2$, $d_1 = 1$, $d_2 = 1$. From there, we continue along the Pareto optimal set by first increasing the $d$ parameters while keeping $w = 2$ (white diamonds in Figure 7). Increasing $w$ yields designs in the Pareto optimal set only after several values of $d_1$ and $d_2$ have been included. This observation that it is preferable to first increase $d_1$ and $d_2$ before increasing $w$ is supported by our theoretical understanding of streaming and iterative reuse as outlined in Section 3.3.
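The size of this design space and the Pareto selection can be sketched in a few lines. Two assumptions are made here and are not stated in the text: the streaming width takes power-of-two values from $r$ up to 16 (the widths 2, 4, 8, and 16 are the ones shown in Figure 7), and dominance is the usual "no larger and no slower, and strictly better in at least one metric" rule. The function names and the area/throughput pairs in the example are hypothetical and are not measurements from the paper.

```python
def design_space(n=64, radices=(2, 4, 8), max_width=16):
    """Enumerate (r, d1, d2, w) for the 2D DFT of size n x n."""
    points = []
    for r in radices:
        t, size = 0, 1
        while size < n:                  # t = log_r(n)
            size *= r
            t += 1
        if size != n:
            raise ValueError("n must be a power of each radix")
        for d1 in (1, 2):                # outer product depth
            for d2 in range(1, t + 1):   # inner product depth
                if t % d2:
                    continue             # t/d2 must be an integer
                w = r
                while w <= max_width:    # assumed: power-of-two widths
                    points.append((r, d1, d2, w))
                    w *= 2
    return points


def pareto_front(designs):
    """Keep (area, throughput) pairs not dominated by any other design."""
    front = []
    for i, (area, thr) in enumerate(designs):
        dominated = any(
            a <= area and t >= thr and (a < area or t > thr)
            for j, (a, t) in enumerate(designs)
            if j != i
        )
        if not dominated:
            front.append((area, thr))
    return sorted(front)


print(len(design_space()))               # 52 under the assumptions above

# Hypothetical (area in slices, throughput in Msamples/s) pairs.
example = [(1000, 100), (2000, 150), (1500, 300), (4000, 250), (5000, 600)]
print(pareto_front(example))
```

Under these assumptions the enumeration reproduces the count of 52 architectures reported above (32 for radix 2, 12 for radix 4, and 8 for radix 8).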
However, it is not obvious which parameter combinations will yield designs in the Pareto optimal set, and it is difficult to determine the crossover points where one design parameter becomes more important than another. This highlights the importance of an automatic generation system; it would be exceedingly difficult to complete such a design exploration by hand.

[Figure 7: Throughput versus area for 2D DFT 64 × 64 (16 bit fixed point) on Xilinx Virtex-5 FPGA. The streaming width is indicated by the data marker. Axes: throughput [million samples per second] versus area [slices]. Markers: streaming width 2, 4, 8, and 16.]

4.4 Generality

The datapath concepts considered in this paper (streaming reuse of $I_m \otimes A_n$ and iterative reuse of $\prod_k A_n$) apply to algorithms for transforms other than those already discussed. In this section, we present several problems that fit within these structures. For example, the Walsh-Hadamard transform (WHT) can be computed with an algorithm of the form

$$\mathrm{WHT}_{r^t} = \prod_{k=0}^{t-1} \left( (I_{r^{t-1}} \otimes \mathrm{WHT}_r)\, P_{r^t} \right),$$

and the real discrete Fourier transform (RDFT) can be computed using an algorithm of the form

$$\mathrm{RDFT}_{4m} = \left( \prod_{k=0}^{\log_2(m)} P\,(I_m \otimes_l \mathrm{RDFT}_4(l, k))\, P_{4m} \right) P_{4m}.$$

The WHT algorithm is completely expressible in the language we consider, and the RDFT requires only a small addition. Both algorithms contain iterative product $\prod$ and tensor product $I \otimes A$, which means that iterative and streaming reuse can be applied to each. We have implemented both of these algorithms in our framework and have generated and evaluated datapaths for both.

Other fast linear transform algorithms can be written using $\prod$ and $I \otimes A$, meaning that streaming and iterative reuse naturally apply. For example, [1] shows algorithms of this sort for discrete sine and cosine transforms (DST and DCT).

Lastly, we point out that streaming and iterative reuse can apply to other numerical problems outside of the domain of linear transforms. For example, Viterbi decoding is performed using a dataflow quite similar to the discrete Fourier transform, and many matrix-matrix multiplication algorithms exhibit parallelism which can be expressed by the tensor product. By extending our framework beyond linear transforms, we may be able to efficiently describe datapaths for these types of problems.

5. RELATED WORK

Although we do not know of any other instances of the tensor formula language being extended to support a general class of hardware implementations in this manner, it has been used in the process of designing special purpose hardware (e.g., an FFT processor in [7] and FFT cores in our previous work [8]). The important distinctions are that neither approach extends the formula language to describe datapath structure and that neither compiles from the formula to hardware; the formula is used to describe the algorithm only.
Many methods have been proposed to compile hardware from a software-like level of abstraction. This work differs from ours in the level of representation (typically C or Matlab code) and in scope.

Lastly, many special purpose FFT implementations have been proposed in the literature that have features that correspond to the datapath structures we are interested in. To name just a few, [6] is an example of a design with streaming reuse, and cores with streaming and iterative reuse are developed in [2, 4, 8].

6. CONCLUSIONS

Linear DSP transforms and their algorithms are well understood and can be formally described in a compact manner with the tensor product formulation. In this work, we extended this framework to allow the representation of the datapath concepts of streaming and iterative reuse. This enables a domain-specific, formula-level view of hardware design and allows datapath manipulation to take place automatically at the mathematical level. We have implemented these ideas in an automatic design flow that manipulates a formula based upon high-level directives and produces a design in RTL Verilog. Lastly, we have presented results that demonstrate the breadth of these techniques and have established the quality of the generated designs.

7. ACKNOWLEDGMENTS

This work was supported by NSF through awards and by DARPA through Department of Interior grant NBCH159 and ARO grant W911NF.

8. REFERENCES

[1] J. Astola and D. Akopian. Architecture-oriented regular algorithms for discrete sine and cosine transforms. IEEE Transactions on Signal Processing, 47(4).
[2] D. Cohen. Simplified control of FFT hardware. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(6).
[3] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90).
[4] N. Dave, M. Pellauer, S. Gerding, and Arvind. 802.11a transmitter: a case study in microarchitectural exploration. In MEMOCODE, 2006.
[5] J. Granata, M. Conner, and R. Tolimieri. The tensor product: a mathematical programming language for FFTs and other fast DSP operations. IEEE Signal Processing Magazine, 9(1):40-48.
[6] S. He and M. Torkelson. A new approach to pipeline FFT processor. In Proc. International Parallel Processing Symposium.
[7] P. Kumhom, J. Johnson, and P. Nagvajara. Design, optimization, and implementation of a universal FFT processor. In Proc. 13th IEEE ASIC/SOC Conference, 2000.
[8] G. Nordin, P. A. Milder, J. C. Hoe, and M. Püschel. Automatic generation of customized discrete Fourier transform IPs. In Design Automation Conference (DAC), 2005.
[9] M. C. Pease. An adaptation of the fast Fourier transform for parallel processing. Journal of the ACM, 15(2), April.
[10] M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proc. of the IEEE, 93(2), 2005.
[11] C. Van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM, 1992.


More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information
