Lifting a butterfly A component-based FFT

Size: px
Start display at page:

Download "Lifting a butterfly A component-based FFT"

Transcription

1 Scietific Programmig 11 (2003) IOS Press Liftig a butterfly A compoet-based FFT Sibylle Schupp Departmet of Computer Sciece, Resselaer Polytechic Istitute, 110 8th Street, Troy, NY 12180, USA Tel.: ; Fax: ; schupp@cs.rpi.edu Abstract. While moder software egieerig, with good reaso, tries to establish the idea of reusability ad the priciples of parameterizatio ad loosely coupled compoets eve for the desig of performace-critical software, Fast Fourier Trasforms (FFTs) ted to be moolithic ad of a very low degree of parameterizatio. The data structures to hold the iput ad output data, the elemet type of these data, the algorithm for computig the so-called twiddle factors, the storage model for a give set of twiddle factors, all are uchageably defied i the so-called butterfly, restrictig its reuse almost etirely. This paper shows a way to a compoet-based FFT by desigig a parameterized butterfly. Based o the techique of liftig, this parameterizatio icludes algorithmic ad implemetatio issues without violatig the complexity guaratees of a FFT. The paper demostrates the liftig process for the Getlema-Sade butterfly, i.e., the butterfly that uderlies the large class of decimatio-i-frequecy (DIF) FFTs, shows the resultig compoets ad summarizes the implemetatio of a compoet-based, geeric DIF library i C Itroductio Whe the Fast Fourier Trasform (FFT) became widely kow i 1965 [3], it revolutioized the use of the Discrete Fourier Trasform (DFT) because it reduced the computatioal complexity of the DFT to a poit where the methods of the DFT aalysis became applicable i practice. Sice the, the DFT has grow ito a importat tool for may disciplies i scietific ad egieerig computig, icludig quatum physics, spectral aalysis, acoustics, speech processig, optics, image processig, to ame but a few. As a result, the FFT is ow oe of most widely used algorithms. It is therefore surprisig how little this algorithm (family) has bee oticed by research i software egieerig. While software desig has dramatically chaged over the last forty years, FFT programs have chaged oly little, thus beefitig oly little from the advaces i software egieerig. Oe of the cetral ideas i software egieerig today is the idea of reusability. Moder software egieerig ow propagates the desig priciples of parameterizatio ad modularizatio via compoets [30], so that software ca be adjusted at the iterface level solely, with o eed to chage the implemetatio. I cotrast, FFT implemetatios ted to be moolithic ad of a very low degree of parameterizatio. They may allow users to customize the size ad scalig factor of the trasform, but exclude from parameterizatio quite crucial parts of the computatios. Most otably the computatioal core, the so-called butterfly, hardwires all, or most, of its parts. The data structures to hold the iput ad output data, the elemet type of these data, the algorithm for computig the so-called twiddle factors, ad the storage model for a give set of twiddle factors, all are typically uchageable at the user s level. If, however, a butterfly would be desiged i a more flexible way, be composed of loosely coupled compoets, a FFT could beefit i several ways: Users could use their ow types, e.g., for the iput data or the cotaier holdig them. These types could be itroduced specifically for the FFT or could come from a previous computatio step; they could modify the mathematical sematics of the FFT, e.g., if fixed-poit arithmetic or a complex type of high precisio were itroduced, or provide additioal fuctioality such as tracig or bookkeepig. FFT implemetors could exted or replace parts of their implemetatio without touchig the code outside the respective compoet. There are, for example, well over 30 algorithms for bit-reversal oly [14], oe of which fits all eeds equally. ISSN /03/$ IOS Press. All rights reserved

2 292 S. Schupp / Liftig a butterfly A compoet-based FFT While curretly implemetors are forced to decide o oe method, a compoet-based approach, similar to the idea of poly-algorithms [15], would allow them to offer choices ad would eable users to select the implemetatio that is most appropriate for them. Moreover, the maiteace costs of the program would decrease sice, by costructio, replacig or upgradig a compoet does ot affect the stability of ay other compoet. Simulatios ad experimetal cofiguratios could be ru easily. It is for example ofte hard to compare the differet computatioal effects of two fuctioally equivalet implemetatios without ruig each alog with the rest of the FFT, for the iteractio with other parts of the FFT ca eable or hider compiler optimizatios, improve or worse the cache behavior, ad, i geeral, have computatioal effects that are hard to predict. A compoet-based framework allows simply swappig compoets i ad out ad esures at the same time that all but the factor of iterest is set fixed. While the advatages of a compoet-based FFT usually are ot questioed directly, a commo cocer is the potetial loss of efficiecy. For a compoetbased FFT to work practically, it is therefore of utmost importace to keep the right balace betwee geericity ad efficiecy. I other words, it is ecessary to parameterize the butterfly i a way that allows ot oly for differet istatiatios but also esures that its geericity does ot violate the complexity guaratees of a FFT. How to fid the right level of abstractio, is the mai topic of this paper. Our approach is based o the techique of liftig [17,25], which, i short, starts with a o-geeric butterfly ad stepwise removes all, mostly implicit, assumptios that do ot compromise the correctess or efficiecy of the computatio. The importat idea thereby is to use the origial, o-geeric butterfly as a protectio agaist over-geeralizatio as log as it ca be regaied by specializatio, the abstractio has lifted the butterfly without compromisig its efficiecy the least. Liftig has bee the uderlyig techique for oe of the most successful recet software libraries, the Stadard Template Library (STL) [29,18]. STL became kow for icludig geericity ito highly efficiet data structures ad algorithms to a extet o other library had doe before. Sice the, STL has a tremedous impact o the desig of libraries i C++ ad other laguages. From early successors such as MTL ad POOMA [27,23] to the curret, most comprehesive iitiative for C++ libraries, Boost [7], the umber of libraries i the spirit of STL is growig. I the area of FFTs, however, a compoet-based approach has bee lackig yet. We started fillig this gap by desigig ad implemetig a compoet-based C++ library, at first for a subset of FFT algorithms: the decimatio-ifrequecy (DIF) FFT. The DIF library curretly supports four differet DIF algorithms, divided ito about 120 compoets (3000 lies of codes). I this paper, we focus o the core of a compoetbased FFT, the parameterized butterfly, show how to lift a butterfly ad preset the compoets that aturally result from the process of liftig. First, i Sectio 2, we recapitulate the mathematics of the FFT ad the DIF ad summarize the algorithmic issues of the DIF that are relevat for the desig of compoets. Sice our purpose merely is to keep the presetatio selfcotaied we refer the iterested reader to the various textbooks ad surveys, e.g. [2,4,8,16,34], for a more thorough discussio. Sectios 3 ad 4 form the two core sectios of the paper. Step by step we parameterize a moomorphic butterfly by iput types, data represetatios, algorithms, ad eve storage models ad give the a overview of the resultig DIF library. I the last part of the paper, Sectios 5 ad 6, we preset experimetal results ad discuss related ad future work. Throughout the paper we will keep the discussio mostly laguage-idepedet but illustrate the liftig process with sippets i C++. Readers eed ot have ay deep kowledge of C++ but should be able to map the preseted abstractio mechaisms to class types ad other correspodig features i objectorieted or geeric programmig laguages. Furthermore helpful is a geeral kowledge of laguages that have polymorphic type system with type parameters, as, for example, the template system i C++, the proposed extesio of the Java type system, or geerics i Ada, i which the compositio of compoets happes i a efficiet, i.e., statically resolvable, way. 2. Fourier Trasforms The term Fast Fourier Trasform deotes ot just oe but a etire class of algorithms: ay implemetatio of the Discrete Fourier Trasform (DFT) [10,13] with complexity O( log ) for iput of legth is a FFT. There are several ways, hece several FFT algorithms, to achieve logarithmic behavior. For the most frequetly used FFTs, the radix-2 FFTs for iput of size 2 k, the key idea is to arrage the computatio i a form

3 S. Schupp / Liftig a butterfly A compoet-based FFT 293 that, due to its flow graph shape, is called butterfly. Butterflies come i differet forms for differet radix-2 algorithms. The oe we focus o is the Getlema- Sade butterfly [11], which uderlies DIF FFTs. I this sectio we recall the mathematical backgroud of the DFT, the DIF, ad the Getlema-Sade butterfly ad prepare the liftig process i the ext sectio with a discussio of the relevat algorithmic issues The Discrete Fourier Trasform Let be a power of 2 ad deote by ω the followig complex root of uity ω = e 2π j/ =cos( 2π ) j si(2π ) (j 2 = 1). The (oe-dimesioal) DFT defies a mappig C C, 1 X k = x l ω kl, x l,x k C, (1) 0 k 1, where the sequece X k represets cosecutive samples i the frequecy domai. The umber is called the legth of the trasformatio. The DFT ca be represeted as a system of liear equatios (x, X C ): X 0 x 0 X 1 1 ω ω 2... ω 1 x X 2 1 ω 2 ω 4... ω 2( 1) 1 x X 3 = 1 ω... 3 ω 6... ω 3( 1) 2 x X 1 1 ω 1 ω 2( 1)...ω ( 1)2 x 1 The iverse or backward DFT ca be obtaied from the forward DFT by cojugatig the umber ω ad scalig the summatio with the iverse of the legth of the trasformatio: x l = 1 1 X k ω kl, 0 k 1. (2) k=0 Note that i some texts the meaig of forward ad backward DFT is switched ad that i a implemetatio the scale factor could be applied to either oe of them or split i two factors or eve etirely omitted. I either case, calculatig a complex DFT of legth with the stadard methods of liear algebra requires a quadratic umber of (complex) multiplicatios Decimatio-i-Frequecy FFTs The radix-2 DIF FFT is based o the observatio that a DFT of legth ca be recursively defied by two trasformatios of legth /2 ad that its iput ca be grouped i /2 butterflies, each of which performs a costat umber of floatig poit computatios. The radix-2 DIF is derived from a DFT by the followig sequece of three rewrite steps. First, the summatio i Eq. (1) is rewritte as the sum of two summatios: X k = = 1 x l ω kl + x l ω kl + = (x l + x l+ 0 k 1. l= 2 x l ω kl x l+ 2 ωk(l+ 2 ωk 2 ) ω kl, 2 ) (3) Secod, Eq. (3) is divided ito a eve-idexed ad a odd-idexed subsequece. For eve-idexed umbers, usig the idetities ω m =1ad ω 2 = ω, oe 2 obtais (0 m 2 1): X 2m = (x l + x l+ ωm 2 = (x l + x ) l+ 2 ωml 2. ) ω2ml I a similar way, oe obtais for odd-idexed umbers: X 2m+1 = (x l + x l+ Settig ow ω (2m+1)l ω(2m+1) 2 ) 2 = ((x l x ) l+ 2 ωl ) ωml, 0 m 2 1. y l =(x l + x l+ 2 ) ad Y m = X 2m i Eq. (4), yields the first sub-fft of dimesio /2: 2 (4) (5)

4 294 S. Schupp / Liftig a butterfly A compoet-based FFT Y m = Similarly, reamig y l ω ml, 0 m (6) x l y l z l =(x l x l+ 2 ) ωl ad Z m = X 2m+1 i Eq. (5), yields the secod sub-fft of dimesio /2: Z m = z l ω ml, 0 m (7) With Eqs (6) ad (7), the DFT of legth has bee replaced by two DFTs of legth /2, plus the computatio of y l ad z l, the so-called Getlema-Sade butterfly. Figure 1 depicts the aotated butterfly symbol with the three arithmetic operatios ivolved: a complex additio, a complex subtractio, ad the multiplicatio with a corrective factor, the twiddle factor ω l. Sice the legth is assumed to be a power of 2, /2 is a power of 2 agai. The same sequece of rewrite steps just described ca therefore be applied to the two sub-dfts of legth /2. Geerally speakig, at each stage, 2 i DFTs of legth 2 i ca be decomposed ito 2 i+1 DFTs of legth 2 i 1, at the cost of additios ad /2 multiplicatio. Thus, the total umber of operatios of a DIF amouts to /2 log 2 multiplicatios ad log 2 additios. Figure 2 illustrates the recursive decompositio ito smaller DFTs for the case =8, reducig a 8-poit DFT first to a 4-poit DFT ad fially to the computatio of a DFT of legth 2. O a historical ote, the DIF FFT goes back to the decimatio-i-time (DIT) algorithm, the historically first FFT [3]. DITs ad DIFs are dual to each other i the sese that a DIT ca be obtaied from a DIF by performig its computatio ad data movemets i reverse order, ad vice versa. Together, they costitute the two predomiatly used FFTs DIF algorithms As the mathematical derivatio i the last subsectio suggests, a DIF algorithm is a divide-ad-coquer algorithm. Figure 3 lists its skeleto, a est of three loops. Oe of the iterestig properties of a DIF is that it ca be performed i-place. The price for such storage efficiecy, however, is a permutatio of either the iput or the output data. As Fig. 2 shows, the repeated reorderig preseted i Eq. (3) results i the bit-reversal x l + 2 Fig. 1. Getlema-Sade butterfly. w l of the output, that is, the elemet o positio i has bee swapped with the elemet at positio j, where i is obtaied from j by writig j i biary form ad reversig its bits. However, the realizatio of the DIF that the flow graph i Fig. 2 suggests, is ot the oly possible oe. Bit-reversig the output ca be avoided if the iput is already give i bit-reversed form or if butterflies ad permutatios are itertwied. I the otatio of Chu ad George [2] we therefore distiguish the followig three kids of DIF algorithms. I a DIF NR (Fig. 2), the iput is give i atural order ad the output is bit-reversed. The iermost loop of the DIF, the butterfly loop (see Fig. 3), iitially has legth /2 ad decreases by a factor of 2 i each step. A disadvatage of the DIF NR is that the twiddle vector has to be loaded ito the iermost loop. I a DIF RN, the iput is bit-reversed while the output is i atural order. The DIF RN requires a iitial permutatio of the iput but has the advatage that the twiddle factor i the iermost loop is costat, so that the load of the twiddle cotaier ca be hoisted out. I cotrast to a DIF NR, the ier loop iitially has legth 1 ad is icreased by a factor of 2. Figure 4 shows its flow graph. Some hardware supports bit-reversed arithmetic directly. I a DIF NN, both the iput ad the output are i atural order sice the DIF NN performs the permutatio durig the computatio, without a extra permutatio step. A DIF NN eeds, however, a auxiliary cotaier, thus is the most spaceexpesive DIF. The DIF NN is sometimes called ordered DIF, its flow graph give i Fig. 4. Our DIF library cotais a algorithm, DIF NRN, which is a DIF NR with subsequet bit-reversal ad ca thus be thought of as a i-place ordered DIF. Despite the quite differet computatioal characteristics of these 4 algorithms, we will see that the compoetbased approach allows for a great amout of sharig. z l

5 S. Schupp / Liftig a butterfly A compoet-based FFT 295 x [0] X [0] x [1] w 0 X [4] x [2] w 0 X [2] x [3] w 2 w 0 X [6] x [4] w 0 X [1] x [5] w 1 w 0 X [5] x [6] x [7] w 2 w 3 w 0 w 2 w 0 X [3] X [7] Fig. 2. The flow graph of a complete DIF decompositio of a 8-poit DFT. Figure redraw from Oppeheim & Schafer [21], Fig Fig. 3. The cotrol flow of a radix-2 DIF algorithm Twiddle factors The secod computatioally itese part of a DIF, besides the loop est just discussed, is the computatio of the twiddle factors. Obviously, the trigoometric fuctios ivolved are expesive, but there are several ways to take advatage of the symmetry of complex roots of uities ad other idetities of the sie ad cosie fuctios. For FFTs of legth =2 k a widely used algorithm is Sigleto s algorithm [28], which itroduces two auxiliary costats, C =1 2si 2 (π/) ad S = si(2π/), to compute the value of cos((k+1) 2π/) ad si((k+1) 2π/), respectively, from the values of cos(k 2π/) ad si(k 2π/). For the cache behavior of a DIF, ad a FFT i geeral, it it also importat to choose the right storage model for a twiddle set. Give a twiddle cotaier of size 2 y, the recursive ature of a DIF implies that each step accesses oly half of the twiddle factors the previous step did. If the twiddle factors are stored so that they are accessed with a power-of-2-stride, the ru time of the DIF ca suffer from well-kow memory bak coflicts, a bak busy-wait situatio that occurs whe the umber of baks is a power of 2 [12]. Istead of reusig a twiddle factor across several recursio steps, a alterative storage model therefore is to store multiple copies of twiddle factors ad to arrage them so that the twiddle cotaier always is traversed with a stride of 1. Obviously, the disadvatage of this storage model is that the cotaier doubles i size. 3. Liftig the DIF At first glace, it might seem as if the differet DIF algorithms ad storage models discussed i the previous sectio required implemetatios etirely disjoit from each other. The compoet-based approach, however, allows for a great amout of code sharig. I this

6 296 S. Schupp / Liftig a butterfly A compoet-based FFT x [0] X [0] x [0] X [0] x [4] x [2] x [6] w 0 w 2 0 w 0 w X [1] X [2] X [3] x [1] x [2] x [3] w 0 w 1 w 0 w 0 X [1] X [2] X [3] x [1] w 0 X [4] x [4] w 0 X [4] x [5] x [3] x [7] w 1 w 3 2 w 2 w w 0 w 0 w 0 X [5] X [6] X [7] x [5] x [6] x [7] w 2 w 3 w 2 w 2 w 0 w 0 w 0 X [5] X [6] X [7] Fig. 4. The flow graph of a DIF RN (left) ad a DIF NN (right) 8-poit FFT. Figures redraw from Oppeheim ad Schafer [21], Figs 9.22, sectio we lift the Getlema-Sade butterfly ad the twiddle data structure ad itroduce the compoets resultig from this liftig process. To briefly reiterate what we said i the itroductio, the idea behid a parameterized butterfly is to be able to obtai differet implemetatios merely by specializig oe or more of the butterfly s parameters. As we will see, five quite differet kids of abstractios are ivolved i its parameterizatio. Startig with the o-geeric butterfly i Fig. 5, we remove all uecessary restrictios, step by step ad, to avoid over-geeralizatio, i a way that always allows regaiig the origial code simply through appropriate parameter bidigs. Although liftig itself by o meas depeds o C++, we illustrate the resultig compoets with the correspodig iterfaces i our C++ implemetatio Parameterizatio by elemet types First, we claim the hard-wirig of the type complex be a uecessary restrictio. Especially i a object-based programmig style users ofte defie their ow complex data type. Istead of usig the complex type that is provided by the implemetatio laguage, users might wat to vary, e.g., the uderlyig real type, the precisio, or the storage model; ad for those complex types, the butterfly should work i the same way as for the built-i oe. As the first step towards a geeric butterfly, therefore, we lift the restrictio to oe particular complex type ad replace the type complex by a type parameter. The declaratios i lie 1 ad 3 of Fig. 5 thus become parameterized declaratios ad the two arrays w ad ibecome parameterized arrays. Obviously, the origial implemetatio ca be regaied by bidig the ew type parameter to the origial type complex Parameterizatio by data represetatios Next, we look at the arrays themselves. Figure 5 shows that altogether three kids of operatios are performed o them: they hold data, allow for radom access, ad allow for traversal. These three operatios, by o meas, are specific to the data structure of a array but characterize radom access cotaiers i geeral. Data types like vector ad deque, for example, meet the very specificatio, thus should be usable as well. But, how do we add a level of idirectio without performace pealty? The Stadard Template Library (STL) itroduces iterators for this purpose. Operatioally speakig, iterators defie protocols for cotaier traversal ad elemet retrieval. Oe distiguishes 5 iterator categories, characterized by the directio, step-size, read or write access as well as the asymptotic costs of each operatio. It is easy to see that these 5 categories form a partial order. What makes iterators importat i the cotext of compoets is that they serve as iterfaces to cotaiers ad thereby cotrol which types of cotaiers to admit as actual parameters. If a algorithm is expressed i terms of iterator movemets rather tha operatig directly o the cotaier, this algorithm works for all cotaiers that support the respective iterator category. At the same time, it excludes all cotaiers with weaker iterators. The term weaker thereby refers to sematic as well as performace requiremets. Sice each iterator operatio icludes a upper bouds for its cost, which i tur deped o the uderlyig cotaier, a more specialized iterator category implicitly requires a more specialized, faster uderlyig cotaier, ad thus restricts the set of cotaiers a iterator ca be associated with.

7 S. Schupp / Liftig a butterfly A compoet-based FFT 297 Fig. 5. A moomorphic Getlema-Sade butterfly i pseudo-c. For example, if, i the case of the DIF, we lifted arrays to a geeric cotaier type or eve itroduced a type parameter as i the previous step, we would fail to keep the efficiecy promise of the DIF. It would be possible, for example, to use a list cotaier, which is a data structure too slow for a DIF. However, if we express the geeric DIF i terms of the radom access iterator category [1], which requires radom access i costat time, it must ot be ru with a list data type or other cotaiers that are too geeral to defie this iterator category. As the secod liftig step of the butterfly we therefore specify operator[] as a operatio of a radom access iterator istead of a cotaier operatio. The declaratios i lie 1 ad 3 of Fig. 5 are replaced by declaratios of iterator variables, the iteger additio i lie 9 the becomes a adjustmet of the stride of the twiddle iterator. Figure 6 lists the iterface of the fuctio call operator that ecapsulates the arithmetic operatios of the Getlema-Sade butterfly; the associated cotaiers are defied outside the butterfly class. Decouplig algorithms ad cotaiers via iterators also offers possibilities for code reuse at a algorithmic level. The same butterfly ad the same data cotaiers, for example, ca be used for both backward ad forward FFTs. All oe eeds to do is to refie the coectig iterator from a plai radom access iterator, which retrieves a elemet as-is, to oe that performs a combied retrieval-cojugate operatio Parameterizatio by scalig factors The third abstractio we wat to itroduce is the abstractio from the scalig factor. As discussed i Sectio 2, three scalig factors are commo; the moomorphic butterfly i Fig. 5 implicitly uses the scalig factor 1. To equally support all scalig factors we ecapsulate each scalig operatio i a fuctio object, that is, a object that ca be called as a fuctio [1], ad itroduce a type parameter, say ScaleFactor, that ca bid ay of the scalig fuctio objects. The butterfly ca the be expressed i terms of a istace of the ScaleFactor parameter ad, agai without ay code chages, specialized to the origial, moomorphic butterfly; see Fig. 7 for the iterface. To avoid the overhead of a fuctio call thereby, it is importat to use fuctio objects istead of fuctios sice the former oes ca be ilied, at least i C The lifted butterfly Puttig together the results so far,fig. 8 lists the body of the Getlema-Sade butterfly i its geeric versio. To summarize, from a moomorphic versio for oe particular costellatio of types we have obtaied a versio with a three-dimesioal parameter space: Complex Type Iput Cotaier Scalig. It is worth emphasizig that each dimesio of the parameter space is coceptually ulimited ay complex type, iput cotaier, or scalig factor that meets the requiremets specified ca serve as compoet, eve those that might ot exist at FFT desig time. I cotrast, macros, the alterative to templates i C, usually require the FFT desiger to determie the set of all possible choices up-frot. May C implemetatios ca therefore offer a choice, e.g., betwee the types double ad float for the represetatio of the complex type, but oe of them permits complex umbers that are user-defied ad oe of them permits user-defied cotaiers Parameterizatio by algorithms ad storage models Applyig the abstractios of the previous subsectio we ca lift a traditioal twiddle array to a parameterized twiddle cotaier. However, the computatio of the twiddle factors itself ca be parameterized two powerful abstractios are outstadig yet. The first is the abstractio from the algorithm for twiddle computatios, the secod the abstractio from the storage

8 298 S. Schupp / Liftig a butterfly A compoet-based FFT Fig. 6. Parameterizatio by array-like data structures through decouplig ad iterfacig via radom access iterators. Fig. 7. Parameterizatio of the Getlema-Sade computatio. Fig. 8. A geeric i-place Getlema-Sade butterfly i pseudo-c++. model of the twiddle cotaier. Both abstractios are more fudametal tha ay of the previous oes sice they refer to implemetatio features, ad they are typically ot available i traditioal object-orieted programmig eviromets. For the reusability ad flexibility of a system, however, it is importat that users ca replace implemetatio modules as easily as others. STL suggests usig cotaier adaptors, that is, cotaiers that are parameterized by other cotaiers, ecapsulate their specifics ad thereby provide a uiform iterface across differet cotaiers. Therefore, if we further lift a parameterized twiddle cotaier to a adaptor to a twiddle set, which exposes access fuctios to the twiddle set but hides all data ad fuctios specific for the uderlyig algorithm, it becomes trasparet to other parts of the DIF how the twiddle set has bee computed. At the same time, it becomes possible to support differet twiddle computatios by istatiatig the adaptor i differet ways. The logic of a commo iterface further exteds to the twiddle set ad the access its adaptor provides. As discussed i Sectio 2.4, twiddle sets ca be stored i differet ways, e.g., as a sigle copy that has to be traversed several times or i multiple copies that have to be traversed oly liearly. If we would ot ecapsulate their storage model, the code i the butterfly would have to take each differet model ito accout ad for example reset a iterator for oe storage model, but ot reset it for aother. It would the be impossible to use the same butterfly for twiddle sets that are stored differetly. I illustratio of the two adaptors eeded, Fig. 9 shows the sigleto adaptor, which ecapsulates the computatio of the twiddle set, ad the adaptor twiddle cotaier, which hides the twiddle set as well as its storage layout. Obviously, the sigleto adaptor ca bid twiddle cotaier s TwiddleSet parameter. Together, the two adaptors yield the required trasparecy Multiple butterflies Up to this poit we implicitly assumed to deal with i-place DIFs. We have show that the geeric butterfly i Fig. 8 ca be specialized to the o-geeric DIF NR we started with, ad we could show the very same for the DIF RN. For a ordered DIF, however, this geericity goes too far. Keepig two cotaiers ad usig them alterately to store itermediate results, a DIF NN works i fact so differetly from a DIF NR that we are ot able to regai its moomorphic butterfly by just specializig the geeric oe i Fig. 8. A

9 S. Schupp / Liftig a butterfly A compoet-based FFT 299 Fig. 9. Parameterizatio by twiddle sets ad their storage models (class twiddle cotaier); various parameterizatios of the twiddle computatio (class sigleto). Fig. 10. Parameterizatio by differet DIF computatios. T deque T vector bit-reversed scale by sqrt(n) scale by N scale by 1 radom acc. iterator Getlema- Getlema- Sade butterfly Sade butterfly butterfly loop block loop stages cojugate fixed poit ifiite prec. radom acc. iterator std::complex twiddle cotaier C 'textbook' C sigleto sigle multiple Fig. 11. The architecture of the DIF library. secod geeric butterfly is eeded, therefore. Startig this time with a DIF NN, we have to repeat the various steps of liftig ad, with argumets similar to the oes preseted, determie the right level of abstractio. We skip the details here, but refer to the documetatio o the project web page [24]. To provide a uiform iterface we add aother layer of abstractio ad implemet the two differet algorithms as differet specializatio of the class template stages (see Fig. 10). 4. Compoet architecture Durig the liftig process we idetified five categories of compoets that costitute the DIF framework: algorithms, cotaiers, iterators, fuctio objects, ad adaptors. We ow give a overview of the architecture of the DIF library ad its three levels: the implemetatio level, the library extesio level, ad the user s level. These three levels are disjoit so that, for example, o user ever is required to touch a implemetatio compoet or to eve kow about it. I the followig we briefly discuss the three levels; for a full specificatio of each of the curretly 120 compoets we refer to the project home page [24]. Figure 11 illustrates the four major fuctioal uits of the DIF library: the iput ad output data to the left of the figure, the computatio ad storage of the twiddle set to its right, the butterfly computatio o top, ad o bottom a cotrol uit that defies the skeleto of the differet DIF algorithms as sketched i Fig. 10. Orgaizig the majority of tasks at the implemetatio level, this cotrol uit is further broke dow ito a set of 3 classes (see Fig. 12), which closely col-

10 300 S. Schupp / Liftig a butterfly A compoet-based FFT Fig. 12. Core compoets at the implemetatio level. Fig. 13. Parameterizatio of the DIF NR class, with default bidigs to a traits class for type cofiguratios. laborate. I short, the class stages costructs oe block loop object per stage, which i tur costructs oe butterfly loop object per block, which, fially, for each butterfly i the block, ivokes the fuctio object that ecapsulates the Getlema-Sade computatio. A characteristic of our implemetatio is that, based o the techiques of meta-programmig ad compiletime polymorphism [6,32], the recursive divisio ito sub-ffts takes place at compile time. I particular, the top-level dispatcher class stages ca call itself recursively at compile time by statically icremetig the template parameter that represets the curret stage. As a cosequece, the code of the whole computatio ca be laid out at compile time. Dividig a block ito 2 smaller oes, determiig the right iput for each butterfly ad all the orgaizatioal steps that cotribute as costat factors to the ru time of other implemetatios, are doe at compile time i our implemetatio, therefore do ot icur ay ru-time costs. The overall performace of the DIF library, thus, correlates closely to the mathematically idetified FFT cost. At the library extesio level, most compoets are of coceptual ature. Cocepts, oe of the key iovatios of STL, are specificatios of the fuctioality of a compoet, its parameters, ad its complexity; they are ot types themselves but models of types. Separatig the specificatio from a implemetatio, cocepts allow library desigers to program i a type-idepedet way as well as to exted a library by differet implemetatios: every type that models a cocept ca serve as its realizatio. For example if a library desiger wats to add a ew algorithm for the computatio of a twiddle set, s/he cosults the coceptual descriptio of the Twiddle Set ad the desigs the ew twiddle set type so that it complies with the iterface ad the performace guaratees required by the cocept. Core cocepts of the DIF framework iclude Butterfly, Twiddle Set, Twiddle Cotaier, ad Bitreversal Cotaier. At the user s level,fially, there are two kids of compoets: classes that specialize a parameter of a DIF algorithm ad auxiliary types that help orgaizig the parameter space. The first kid of types icludes twiddle set computatios, storage models of twiddle cotai-

11 S. Schupp / Liftig a butterfly A compoet-based FFT 301 Fig. 14. A example mai program of a DIF NR computatio. Table 1 Iput/output order variats DIF algorithm Iput order Output order dif r atural reversed dif r reversed atural dif atural atural dif r atural atural ers, scale factors, butterflies, ad toggles that determie the ormalizatio, directio, ad meaig of forward ad backward FFT. The secod kid of types mostly cosists of so-called traits [19], or iterface classes, a stadard idiom i library desig that allows users to package type cofiguratios ad to redefie them trasparetly to the caller s code. I illustratio of the traits techique, Fig. 13 lists the trait fft computatio, which wraps up some of the implemetatio data types of a DIF. Because of this wrapper layer, the code of the DIF eed ot operate directly o ay of the implemetatio types, but ca be expressed i terms of the iterface type defiitios the traits class provides. As a cosequece, types ca be redefied without affectig the DIF code, just by chagig their defiitio i the trait class. Figure 13 also shows how DIF algorithms rely o traits classes to establish default bidigs. I the compoet-based settig just preseted, the resposibility of a user might seem to grow with each additioal parameter. With traits, however, ad, at least i C++, default parameters, the parameter space stays quite maageable. Figure 14 list a example mai program i C++, where the legth of the FFT is the oly parameter users have to supply. Of course, users ca overwrite ay default parameter wheever they wish, as we have just explaied. 5. Empirical results For each of the four DIFs i our library we measured its ru time, the size of its executable, ad the compilatio time eeded. The first two measuremets, ru time ad biary size, typically are of iterest to the ed user of the library, while the third kid of measuremets, compilatio times, matter for developers. Sice there are famous examples of C++ template programs that take hours to compile, it is importat to show that software developmet with the DIF library i fact is practical. To summarize the mai test results, dif r, as expected, is the algorithm with the best ru-time performace. Iterestigly o the other had is that dif r outperforms dif, which implies that it is faster to perform a ordered DIF as a compositio of DIF NR

12 302 S. Schupp / Liftig a butterfly A compoet-based FFT Table 2 Test platforms Thikpad 600e Origi 2000 Eterprise 450 No. of Processors Processor Celero MIPS R10000 UltraSPARC-II 366 Mhz 195 MHz 480 MHz Cache L1: 32Kb L1: 64Kb L1: 32Kb L2: 128Kb L2: 4Mb L2: 8Mb (130 MHz) Memory 160Mb 1.5Gb 4Gb Operatig System Redhat Liux 8.0 IRIX SuOS 5.8 Table 3 Compilers ad compile flags Compiler Compile flags Thikpad Itel C O3 -Ob2 -mp -tpp6 600e -mcpu=petiumpro -march=petiumii -ip -ipo -asi Origi 2000 MIPSpro m -O3 -IPA -mips4 -Ofast -r10000 Eterprise GCC 3.2 -O3 -fomit-frame-poiter 450 -mcpu=ultrasparc -mtue=ultrasparc -filie-fuctios Table 4 Sizes of executables (Kb) Iput size = 2 3 Iput size = 2 21 dif-r dif-r dif- dif-r dif-r dif-r dif- dif-r MIPS GCC Itel s with subsequet bit-reversal tha to directly perform the ordered DIF NN. Also as expected, the sizes of the biaries grow sub-liearly, very slowly, with the iput size. Fially, the compilatio times are mostly below 30 secods eve for large iput sizes, thus quite acceptable. Oe iterestig observatio is that dif, the algorithm with the slowest ru time, is also the most expesive algorithm to compile. The followig subsectios describe the test setup ad preset the test results i more detail Test setup I the FFT literature, it is commo to measure performace i MFLOP uits. MFLOPs scale the executio time by the theoretical umber of floatig poit operatios for a radix-2 FFT of legth. They are defied as follows: 5 log 2 /( time for oe FFT i µs). I the graphs of the performace measuremets (see Sectio 5.2), the ru time is expressed i MFLOPs. For the iterpretatio of the performace graphs it is importat to keep i mid that higher MFLOP umbers are better tha lower oes. We tested each algorithm with 3 compilers, each o a differet platform. Table 2 summarizes the characteristics of each platform, icludig the operatig system, while Table 3 lists the compilers ad compilatio flags used. We varied the iput size i steps of powers of 2: the smallest iput size is 2 3, which is the smallest possible iput size for the Sigleto algorithm (see Sectio 2.4); the largest iput size depeds o the memory available, thus varies platformdepedetly betwee 2 21 (o a machie with 160 Mb memory) ad 2 26 (o a machie with 4 Gb memory). We tested each iput size multiple times. Sice the impact of possible oise is higher for small iputs with short executio times we repeated the tests for small iput size more ofte tha the oes for larger sizes: for a iput size i the rage x 2 10, we repeated each test 1000 times. For a iput size betwee x 2 14, the umber of repetitios was 100, for a iput size betwee x 2 17 it was 30 times; for 2 18 we repeated the tests 10 times, for the followig pairs of iput sizes 5 times ad 3 times, respectively, ad for all iput sizes above 2 22 two times.

13 S. Schupp / Liftig a butterfly A compoet-based FFT dif-r dif-r dif- dif-r Test results for the MIPSpro compiler 120 Executio time (MFLOPS) log() Fig. 15. Ru time of 4 differet DIFs, compiled with MIPSpro/MIPS. Idepedetly, we tested differet bidigs of the parameters for the DIF compoets. For the tests preseted here, the cotaier for the iput data as well as the cotaier for the twiddle umbers was of type std::vector, its elemet type std::complex<float>; the twiddle computatio was doe via the fftl:sigleto class. We furthermore used the sigle storage model for twiddle umbers ad the scalig factor 1. The Celero machie ra i a cotrolled eviromet ad was dedicated to our tests. The other two machies, Origi 2000 ad Eterprise 450, are multiuser machies with multiple processors. Sice we use oly oe processor whe ruig the tests, it seems safe to assume that the impact of ay other processes is miimal Test results We first discuss the results of the ru-time tests, the the results of the compile-time tests. The sizes of the biaries are listed i Table Ru times The results of the ru-time tests are summarized i the Figs 15 17, which plot the umber of MFLOPs agaist the size of the iput vector. As said i the itroductio of this sectio, the graphs cofirm, at least i the MIPS ad GCC/Sparc tests, that the fastest algorithm is dif r. The same is true i the Itel/Celero test, however, oly for iput size 2 16, ad the, dif r is faster oly by less tha 1 MFLOP. Not surprisigly, all tests agree that dif performs worst. This algorithm, as oe might recall, computes a out-of-place DIF. It requires a secod cotaier ad performs alteratig updates of the two cotaiers, thus ca be expected to be slower tha the i-place DIFs. As we emphasized already earlier, it is almost always faster to perform a ordered DIF as a compositio of DIF NR with subsequet bit-reversal tha to directly perform the ordered DIF NN. I the MIPS ad Itel/Celero tests, the trade-off betwee the two outof-place DIFs, dif ad dif r, is at iput size O GCC/Sparc, we ca observe the trade-off poit for the iput size 2 5. The relative performace of the 4 algorithms is the same for MIPS ad GCC/Sparc. For the Itel/Celero tests, o the other had, the ru times are for the most

14 304 S. Schupp / Liftig a butterfly A compoet-based FFT Test results for the GCC compiler dif-r dif-r dif- dif-r Executio time (MFLOPS) log() Fig. 16. Ru time of 4 differet DIFs, compiled with GCC/Ultra Sparc. part too close to each other (mostly less tha 1 MFLOP apart) to allow for a meaigful rakig. The overall shapes of the graphs for the 4 algorithms are similar to each other as well as across the three tests, with a peek performace aroud the iput size of 2 11 (for MIPS, 2 9 for GCC/Sparc, 2 12 for Itel/Celero). The overall speed,fially, is the best i the MIPS tests where we reach up to 150 MFLOPS. Roughly speakig, the performace i the MIPS tests is betwee 2 3 times faster tha i the GCC/Sparc tests ad betwee 15 (for the peek values) to 6 (for larger values) tha i the Itel/Celero tests. Sice the MIPS processor is much slower tha the Sparc, it seems to be the MIPS compiler that is resposible for the better performace. O the other had, ad surprisigly, the Itel/Celero times are sigificatly better tha the times o the MIPS platform for small iput sizes Compilatio times The results of the compilatio-time tests are summarized i Fig. 18. Perhaps the most importat result is the overall compilatio time. As we oted already i the itroductio, the compilatio times do ot exceed 30 secods eve for large iput sizes; the oly exceptio is dif o GCC/Sparc. We ca therefore claim that software developmet with the DIF library is feasible. Aother positive result is the overall sub-liear behavior of the compilatio time. I cotrast to several template programs elsewhere that face expoetial compilatio times, our implemetatios scale i this regard. O all machies, the algorithm dif takes the logest time to compile. As i the case of the ru-time measuremets, this behavior ca be expected for a out-of-place DIF, which requires extra data structures. Sice oe ca observe that its compilatio times, relative to the oes for the other algorithms, are best o the MIPS, a little worse o GCC/Sparc, ad may times slower o Celero/Itel, cache size agai seems to be a importat factor. The graphs i Fig. 18 also show two uexpected results. For oe, we expected the compilatio times for dif r ad dif r to be almost idetical, sice these algorithms are very similar i structure. However, this is the case oly i the Celero/Itel tests. I the MIPS ad GCC/Sparc tests, the compilatio times ot oly differ oticeably but also are separated by the measuremets for dif r, which is a algorithm of a very differet structure. Secod, the MIPS compiler exhibits a strage but cosistet irregularity for small values. While the first pheomeo might be just coicidece, we have curretly o good explaatio for the secod oe.

15 S. Schupp / Liftig a butterfly A compoet-based FFT dif-r dif-r dif- dif-r Test results for the Itel compiler Executio time (MFLOPS) log() Fig. 17. Ru time of 4 differet DIFs, compiled with Itel/Celero. 6. Related ad future work With its differet kids of parameterizatio, icludig the parameterizatio with implemetatio features, our DIF library is strogly iflueced by STL ad therefore closest to other scietific libraries i the spirit of STL, most otably Blitz++ ad POOMA [31,23]. I Blitz++, however, the FFT is merely a exercise i meta-programmig without much emphasis o compoets, while POOMA meawhile seems to have ceased its support for a FFT. Although the POOMA user s guide describes a class template for multi-dimesioal, parallel FFT, its distributio (pooma-2.2.0) lacks the respective code. The MTL [26,27], fially, was the first library we are aware of that, for matrices, made differet storage types first-class compoets. I a more geeral sese our work is related to the curret tred we see i scietific programmig, to move from lower programmig laguages such as Fortra ad C to higher oes, which support classes ad other mechaisms for abstractio. A importat step i this directio is the Vector, Sigal, ad Image Processig Library (VSIPL) [33], a ope idustry stadard that achieves portability across differet memory ad processor architectures through a object-based style, which eables the ecapsulatio ad abstractio from differet memory models ad array represetatios. Versio 1.1 of VSIPL is writte i C but the stadardizatio forum is curretly discussig bidigs to C++. I fact, the umber of both ope-source ad commercial libraries that ow provide implemetatios i C++ or Java, or at least iterface higher laguages, log has exceeded a size that would allow givig a complete referece. We therefore oly poit out the recet iterest i the field of digital sigal processig, the major applicatio of FFTs, to develop compilers for DSP processors i C++ rather tha i special-purpose laguages, ad these compilers rely o FFTs that are writte i C++ (see, e.g. [22,5]). For the further developmet of our ow DFT project we idetified the followig three ext steps. First, we wat to complemet the curret DIF algorithms by the correspodig decimatio-i-time (DIT) oes. Sice DITs ad DIFs are dual to each other, we expect their implemetatios to parallel each other. Whether after liftig ad at a yet higher level of abstractio, DIFs ad DITs could be further uified, is aother iter-

16 306 S. Schupp / Liftig a butterfly A compoet-based FFT Test results for the MIPSpro compiler dif-r dif-r dif- dif-r log() Test results for the GCC compiler dif-r dif-r dif- dif-r log() Test results for the Itel compiler dif-r dif-r dif- dif-r log() Compilatio time (secods) Compilatio time (secods) Compilatio time (secods) Fig. 18. Compile times i secods for MIPSpro/MIPS (left), GCC/Ultra Sparc (middle), ad Celero/Itel (right); figures rotated by estig topic, about which, however, we curretly ca speculate oly. Secod, we eed to separately implemet small-size DFTs, both for efficiecy reasos ad to be able to hadle iput of arbitrary size. As a cosequece, we also eed to exted our curret metaprogram so that it ca statically select amog the differet decompositios ito sub-ffts that will the be possible. Third, we eed to work o optimizatios of the library. We pla o applyig techiques of program trasformatio, i particular for arithmetic operatios with costat values. Program trasformatio allows for optimizatios that are decoupled from the actual code, thus extesible for example by optimizatios demaded by ew user-defied types ad their arithmetic. Our goal is to reach executio times that are o a par with the FFTW package ( Fastest Fourier Trasform i the West ), a ope source implemetatio that has proved to be competitive with vedor-tued implemetatios [9]. 7. Ackowledgmets We thak a aoymous reviewer, whose suggestios greatly helped improvig the presetatio of the paper. Marci Zalewski helped with the Itel tests. Refereces [1] M. Auster, Geeric Programmig ad the STL: Usig ad Extedig the C++ Stadard Template Library, Addiso- Wesley, [2] E. Chu ad A. George, Iside the FFT Black Box, CRC, [3] J.W. Cooley ad J.W. Tukey, A algorithm for the machie calculatio of complex Fourier series, Math. Comp. (1965), [4] T.H. Corme, C.E. Leiserso, R.R. Rivest ad C. Stei, Itroductio to Algorithms, (2d ed.), MIT Press, McGraw-Hill Book Compay, [5] CyApps, Cylib, [6] K. Czarecki ad U.K. Eiseecker, Geerative Programmig Towards a New Paradigm of Software Egieerig, Addiso Wesley Logma, [7] B. Dawes ad D. Abrahams, The Boost library iitiative, [8] P.M. Embree ad D. Daieli, C++ Algorithms for Digital Sigal Processig, PTR PH, [9] M. Frigo, A Fast Fourier Trasform Compiler, i Proc. of the 1999 ACM SIGPLAN Cof. o Prog. Laguage Desig ad Implemetatio, May 1999, pp [10] C.F. Gauss, Gauss Collected Works, (Vol. 3), chapter Theoria iterpolatiois methodo ova tractata, 1886, pp [11] W.M. Getlema ad G. Sade, Fast Fourier trasforms for fu ad profit, (Vol. 29), i Proc. Joit Computer Cof., 1966, pp [12] J. Guiliai ad D. Robertso, Performace Tuig for the Cray SV1, i: Cray User Meetig, 2000, [13] M.T. Heidema, D.H. Johso ad D.S. Burrus, Gauss ad the history of the FFT, IEEE Tras. o Acoustics, Speech, ad Sigal Processig, 1(4) (1984), [14] A.H. Karp, Bit reversal o uiprocessors, SIAM Review 38 (1966), [15] J. Li, A. Skjellum ad R.D. Falgout, A poly-algorithm for parallel dese matrix multiplicatio o two-dimesioal process grid topologies, Cocurrecy: Practice ad Experiece 9(5) (1997), [16] D. Masle ad D. Rockmore, Geeralized FFTs A Survey of some Recet Results, i Proc DIMACS Workshop i Groups ad Computatio, [17] D. Musser ad A. Stepaov, Algorithm-orieted Geeric

17 S. Schupp / Liftig a butterfly A compoet-based FFT 307 Libraries, Software-Practice ad Experiece 27(7) (July 1994), [18] D.R. Musser, G.J. Derge ad A. Saii, STL Tutorial ad Referece Guide. C++ Programmig with the Stadard Template Library, (2d ed.), Addiso Wesley, [19] N. Myers, A ew ad useful template techique, i C++ Gems, B. Lippma, ed., Cambridge Uiversity Press, December [20] H.J. Nussbaumer, Fast Fourier Trasforms ad Covolutio Algorithms, Spriger Verlag, [21] A.V. Oppeheim ad R.W. Schafer, Discrete-Time Sigal Processig, (2d ed.), Pretice Hall Sigal Processig Series, March [22] Ope SystemC Iitiative (OSCI), SystemC, [23] J.V. Reyders, P.J. Hiker, J.C. Cummigs, S.R. Atlas, S. Baerjee, W.F. Humphrey, S.R. Karmesi, K. Keahey, M. Srikat ad M. Tholbur, POOMA: A framework for scietific simulatios o parallel architectures, i: Parallel Programmig usig C++, G.V. Wilso ad P. Lu, eds, MIT Press, 1996, pp [24] S. Schupp, The FFT Template Library (FF T L), cs.rpi.edu/ schupp/etries/software/fftl. [25] S. Schupp, How to lift a library, Techical Report WSI-7-96 Fakultät für Iformatik, Uiversität Tübige, [26] J.G. Siek, A moder framework for portable high performace umerical liear algebra, Master s thesis, Notre Dame, [27] J.G. Siek ad A. Lumsdaie, The Matrix Template Library: A Geeric Programmig Approach to High Performace Numerical Liear Algebra, i: Iterat. Symp. o Comp. i Object- Orieted Parallel Eviromets, [28] R.C. Sigleto, A method for computig the fast Fourier trasform with auxiliary memory ad limited high-speed storage, IEEE Tras. Audio ad Electroacustics 15 (1969), [29] A.A. Stepaov ad M. Lee, The Stadard Template Library, Techical Report HP-94-93, Hewlett-Packard, [30] C. Szyperski, Compoet Software. Beyod Object-Orieted Programmig, ACM Press, [31] T. Veldhuize, Blitz++, [32] T. Veldhuize, Techiques for Scietific C++, 0.3 editio, August 1999, tveldhui/papers/techiques. [33] VSIPL, Vector, Sigal, ad Image Processig Library. The Ope Idustry Stadard for Sigal Processig, vsipl.org. [34] K.R. Wadleigh ad I.L. Crawford, Software Optimizatio for High Performace Computig, Hewlett-Packard Professioal Books, 2000.

18 Joural of Advaces i Idustrial Egieerig Multimedia Hidawi Publishig Corporatio The Scietific World Joural Hidawi Publishig Corporatio Applied Computatioal Itelligece ad Soft Computig Iteratioal Joural of Distributed Sesor Networks Hidawi Publishig Corporatio Hidawi Publishig Corporatio Hidawi Publishig Corporatio Advaces i Fuzzy Systems Modellig & Simulatio i Egieerig Hidawi Publishig Corporatio Hidawi Publishig Corporatio Submit your mauscripts at Joural of Computer Networks ad Commuicatios Advaces i Artificial Itelligece Hidawi Publishig Corporatio Hidawi Publishig Corporatio Iteratioal Joural of Biomedical Imagig Advaces i Artificial Neural Systems Iteratioal Joural of Computer Egieerig Computer Games Techology Hidawi Publishig Corporatio Hidawi Publishig Corporatio Advaces i Advaces i Software Egieerig Hidawi Publishig Corporatio Hidawi Publishig Corporatio Hidawi Publishig Corporatio Iteratioal Joural of Recofigurable Computig Robotics Hidawi Publishig Corporatio Computatioal Itelligece ad Neurosciece Advaces i Huma-Computer Iteractio Joural of Hidawi Publishig Corporatio Hidawi Publishig Corporatio Joural of Electrical ad Computer Egieerig Hidawi Publishig Corporatio Hidawi Publishig Corporatio

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

Fast Fourier Transform (FFT) Algorithms

Fast Fourier Transform (FFT) Algorithms Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform

More information

EE123 Digital Signal Processing

EE123 Digital Signal Processing Last Time EE Digital Sigal Processig Lecture 7 Block Covolutio, Overlap ad Add, FFT Discrete Fourier Trasform Properties of the Liear covolutio through circular Today Liear covolutio with Overlap ad add

More information

Data Structures and Algorithms. Analysis of Algorithms

Data Structures and Algorithms. Analysis of Algorithms Data Structures ad Algorithms Aalysis of Algorithms Outlie Ruig time Pseudo-code Big-oh otatio Big-theta otatio Big-omega otatio Asymptotic algorithm aalysis Aalysis of Algorithms Iput Algorithm Output

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8)

CIS 121 Data Structures and Algorithms with Java Fall Big-Oh Notation Tuesday, September 5 (Make-up Friday, September 8) CIS 11 Data Structures ad Algorithms with Java Fall 017 Big-Oh Notatio Tuesday, September 5 (Make-up Friday, September 8) Learig Goals Review Big-Oh ad lear big/small omega/theta otatios Practice solvig

More information

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method A ew Morphological 3D Shape Decompositio: Grayscale Iterframe Iterpolatio Method D.. Vizireau Politehica Uiversity Bucharest, Romaia ae@comm.pub.ro R. M. Udrea Politehica Uiversity Bucharest, Romaia mihea@comm.pub.ro

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

BOOLEAN MATHEMATICS: GENERAL THEORY

BOOLEAN MATHEMATICS: GENERAL THEORY CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 1 Itroductio to Computers ad C++ Programmig Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 1.1 Computer Systems 1.2 Programmig ad Problem Solvig 1.3 Itroductio to C++ 1.4 Testig

More information

Outline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis

Outline and Reading. Analysis of Algorithms. Running Time. Experimental Studies. Limitations of Experiments. Theoretical Analysis Outlie ad Readig Aalysis of Algorithms Iput Algorithm Output Ruig time ( 3.) Pseudo-code ( 3.2) Coutig primitive operatios ( 3.3-3.) Asymptotic otatio ( 3.6) Asymptotic aalysis ( 3.7) Case study Aalysis

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis

Analysis Metrics. Intro to Algorithm Analysis. Slides. 12. Alg Analysis. 12. Alg Analysis Itro to Algorithm Aalysis Aalysis Metrics Slides. Table of Cotets. Aalysis Metrics 3. Exact Aalysis Rules 4. Simple Summatio 5. Summatio Formulas 6. Order of Magitude 7. Big-O otatio 8. Big-O Theorems

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

1 Enterprise Modeler

1 Enterprise Modeler 1 Eterprise Modeler Itroductio I BaaERP, a Busiess Cotrol Model ad a Eterprise Structure Model for multi-site cofiguratios are itroduced. Eterprise Structure Model Busiess Cotrol Models Busiess Fuctio

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

Lecture 5. Counting Sort / Radix Sort

Lecture 5. Counting Sort / Radix Sort Lecture 5. Coutig Sort / Radix Sort T. H. Corme, C. E. Leiserso ad R. L. Rivest Itroductio to Algorithms, 3rd Editio, MIT Press, 2009 Sugkyukwa Uiversity Hyuseug Choo choo@skku.edu Copyright 2000-2018

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Structuring Redundancy for Fault Tolerance. CSE 598D: Fault Tolerant Software

Structuring Redundancy for Fault Tolerance. CSE 598D: Fault Tolerant Software Structurig Redudacy for Fault Tolerace CSE 598D: Fault Tolerat Software What do we wat to achieve? Versios Damage Assessmet Versio 1 Error Detectio Iputs Versio 2 Voter Outputs State Restoratio Cotiued

More information

Service Oriented Enterprise Architecture and Service Oriented Enterprise

Service Oriented Enterprise Architecture and Service Oriented Enterprise Approved for Public Release Distributio Ulimited Case Number: 09-2786 The 23 rd Ope Group Eterprise Practitioers Coferece Service Orieted Eterprise ad Service Orieted Eterprise Ya Zhao, PhD Pricipal, MITRE

More information

Counting Regions in the Plane and More 1

Counting Regions in the Plane and More 1 Coutig Regios i the Plae ad More 1 by Zvezdelia Stakova Berkeley Math Circle Itermediate I Group September 016 1. Overarchig Problem Problem 1 Regios i a Circle. The vertices of a polygos are arraged o

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 6 Defiig Fuctios Pytho Programmig, 2/e 1 Objectives To uderstad why programmers divide programs up ito sets of cooperatig fuctios. To be able to

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite

More information

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods.

Software development of components for complex signal analysis on the example of adaptive recursive estimation methods. Software developmet of compoets for complex sigal aalysis o the example of adaptive recursive estimatio methods. SIMON BOYMANN, RALPH MASCHOTTA, SILKE LEHMANN, DUNJA STEUER Istitute of Biomedical Egieerig

More information

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0 Polyomial Fuctios ad Models 1 Learig Objectives 1. Idetify polyomial fuctios ad their degree 2. Graph polyomial fuctios usig trasformatios 3. Idetify the real zeros of a polyomial fuctio ad their multiplicity

More information

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only Edited: Yeh-Liag Hsu (998--; recommeded: Yeh-Liag Hsu (--9; last updated: Yeh-Liag Hsu (9--7. Note: This is the course material for ME55 Geometric modelig ad computer graphics, Yua Ze Uiversity. art of

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits Egieerig Letters, :, EL Reversible Realizatio of Quaterary Decoder, Multiplexer, ad Demultiplexer Circuits Mozammel H.. Kha, Member, ENG bstract quaterary reversible circuit is more compact tha the correspodig

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved.

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved. Chapter 11 Frieds, Overloaded Operators, ad Arrays i Classes Copyright 2014 Pearso Addiso-Wesley. All rights reserved. Overview 11.1 Fried Fuctios 11.2 Overloadig Operators 11.3 Arrays ad Classes 11.4

More information

Computers and Scientific Thinking

Computers and Scientific Thinking Computers ad Scietific Thikig David Reed, Creighto Uiversity Chapter 15 JavaScript Strigs 1 Strigs as Objects so far, your iteractive Web pages have maipulated strigs i simple ways use text box to iput

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Τεχνολογία Λογισμικού

Τεχνολογία Λογισμικού ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών Τεχνολογία Λογισμικού, 7ο/9ο εξάμηνο 2018-2019 Τεχνολογία Λογισμικού Ν.Παπασπύρου, Αν.Καθ. ΣΗΜΜΥ, ickie@softlab.tua,gr

More information

n Explore virtualization concepts n Become familiar with cloud concepts

n Explore virtualization concepts n Become familiar with cloud concepts Chapter Objectives Explore virtualizatio cocepts Become familiar with cloud cocepts Chapter #15: Architecture ad Desig 2 Hypervisor Virtualizatio ad cloud services are becomig commo eterprise tools to

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

Octahedral Graph Scaling

Octahedral Graph Scaling Octahedral Graph Scalig Peter Russell Jauary 1, 2015 Abstract There is presetly o strog iterpretatio for the otio of -vertex graph scalig. This paper presets a ew defiitio for the term i the cotext of

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

LU Decomposition Method

LU Decomposition Method SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS LU Decompositio Method Jamie Traha, Autar Kaw, Kevi Marti Uiversity of South Florida Uited States of America kaw@eg.usf.edu http://umericalmethods.eg.usf.edu Itroductio

More information

Computer Systems - HS

Computer Systems - HS What have we leared so far? Computer Systems High Level ENGG1203 2d Semester, 2017-18 Applicatios Sigals Systems & Cotrol Systems Computer & Embedded Systems Digital Logic Combiatioal Logic Sequetial Logic

More information

Lower Bounds for Sorting

Lower Bounds for Sorting Liear Sortig Topics Covered: Lower Bouds for Sortig Coutig Sort Radix Sort Bucket Sort Lower Bouds for Sortig Compariso vs. o-compariso sortig Decisio tree model Worst case lower boud Compariso Sortig

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Protected points in ordered trees

Protected points in ordered trees Applied Mathematics Letters 008 56 50 www.elsevier.com/locate/aml Protected poits i ordered trees Gi-Sag Cheo a, Louis W. Shapiro b, a Departmet of Mathematics, Sugkyukwa Uiversity, Suwo 440-746, Republic

More information

One advantage that SONAR has over any other music-sequencing product I ve worked

One advantage that SONAR has over any other music-sequencing product I ve worked *gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig

More information

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS)

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS) CSC165H1, Witer 018 Learig Objectives By the ed of this worksheet, you will: Aalyse the ruig time of fuctios cotaiig ested loops. 1. Nested loop variatios. Each of the followig fuctios takes as iput a

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists

More information

1 Graph Sparsfication

1 Graph Sparsfication CME 305: Discrete Mathematics ad Algorithms 1 Graph Sparsficatio I this sectio we discuss the approximatio of a graph G(V, E) by a sparse graph H(V, F ) o the same vertex set. I particular, we cosider

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015. Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs

What are we going to learn? CSC Data Structures Analysis of Algorithms. Overview. Algorithm, and Inputs What are we goig to lear? CSC316-003 Data Structures Aalysis of Algorithms Computer Sciece North Carolia State Uiversity Need to say that some algorithms are better tha others Criteria for evaluatio Structure

More information

Lecture 2: Spectra of Graphs

Lecture 2: Spectra of Graphs Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Filter design. 1 Design considerations: a framework. 2 Finite impulse response (FIR) filter design

Filter design. 1 Design considerations: a framework. 2 Finite impulse response (FIR) filter design Filter desig Desig cosideratios: a framework C ı p ı p H(f) Aalysis of fiite wordlegth effects: I practice oe should check that the quatisatio used i the implemetatio does ot degrade the performace of

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures

COMP Parallel Computing. PRAM (1): The PRAM model and complexity measures COMP 633 - Parallel Computig Lecture 2 August 24, 2017 : The PRAM model ad complexity measures 1 First class summary This course is about parallel computig to achieve high-er performace o idividual problems

More information

Numerical Methods Lecture 6 - Curve Fitting Techniques

Numerical Methods Lecture 6 - Curve Fitting Techniques Numerical Methods Lecture 6 - Curve Fittig Techiques Topics motivatio iterpolatio liear regressio higher order polyomial form expoetial form Curve fittig - motivatio For root fidig, we used a give fuctio

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

Data Warehousing. Paper

Data Warehousing. Paper Data Warehousig Paper 28-25 Implemetig a fiacial balace scorecard o top of SAP R/3, usig CFO Visio as iterface. Ida Carapelle & Sophie De Baets, SOLID Parters, Brussels, Belgium (EUROPE) ABSTRACT Fiacial

More information

A Very Simple Approach for 3-D to 2-D Mapping

A Very Simple Approach for 3-D to 2-D Mapping A Very Simple Approach for -D to -D appig Sadipa Dey (1 Ajith Abraham ( Sugata Sayal ( Sadipa Dey (1 Ashi Software Private Limited INFINITY Tower II 10 th Floor Plot No. - 4. Block GP Salt Lake Electroics

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings

Chapter 4 Threads. Operating Systems: Internals and Design Principles. Ninth Edition By William Stallings Operatig Systems: Iterals ad Desig Priciples Chapter 4 Threads Nith Editio By William Stalligs Processes ad Threads Resource Owership Process icludes a virtual address space to hold the process image The

More information

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1

CS200: Hash Tables. Prichard Ch CS200 - Hash Tables 1 CS200: Hash Tables Prichard Ch. 13.2 CS200 - Hash Tables 1 Table Implemetatios: average cases Search Add Remove Sorted array-based Usorted array-based Balaced Search Trees O(log ) O() O() O() O(1) O()

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

Math 10C Long Range Plans

Math 10C Long Range Plans Math 10C Log Rage Plas Uits: Evaluatio: Homework, projects ad assigmets 10% Uit Tests. 70% Fial Examiatio.. 20% Ay Uit Test may be rewritte for a higher mark. If the retest mark is higher, that mark will

More information

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions

A Generalized Set Theoretic Approach for Time and Space Complexity Analysis of Algorithms and Functions Proceedigs of the 10th WSEAS Iteratioal Coferece o APPLIED MATHEMATICS, Dallas, Texas, USA, November 1-3, 2006 316 A Geeralized Set Theoretic Approach for Time ad Space Complexity Aalysis of Algorithms

More information

1.2 Binomial Coefficients and Subsets

1.2 Binomial Coefficients and Subsets 1.2. BINOMIAL COEFFICIENTS AND SUBSETS 13 1.2 Biomial Coefficiets ad Subsets 1.2-1 The loop below is part of a program to determie the umber of triagles formed by poits i the plae. for i =1 to for j =

More information

Last class. n Scheme. n Equality testing. n eq? vs. equal? n Higher-order functions. n map, foldr, foldl. n Tail recursion

Last class. n Scheme. n Equality testing. n eq? vs. equal? n Higher-order functions. n map, foldr, foldl. n Tail recursion Aoucemets HW6 due today HW7 is out A team assigmet Submitty page will be up toight Fuctioal correctess: 75%, Commets : 25% Last class Equality testig eq? vs. equal? Higher-order fuctios map, foldr, foldl

More information

Analysis of Algorithms

Analysis of Algorithms Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms

More information

COP4020 Programming Languages. Compilers and Interpreters Prof. Robert van Engelen

COP4020 Programming Languages. Compilers and Interpreters Prof. Robert van Engelen COP4020 mig Laguages Compilers ad Iterpreters Prof. Robert va Egele Overview Commo compiler ad iterpreter cofiguratios Virtual machies Itegrated developmet eviromets Compiler phases Lexical aalysis Sytax

More information

Overview. Common tasks. Observation. Chapter 20 The STL (containers, iterators, and algorithms) 8/13/18. Bjarne Stroustrup

Overview. Common tasks. Observation. Chapter 20 The STL (containers, iterators, and algorithms) 8/13/18. Bjarne Stroustrup Overview Chapter 20 The STL (cotaiers, iterators, ad algorithms) Bjare Stroustrup www.stroustrup.com/programmig Commo tasks ad ideals Geeric programmig Cotaiers, algorithms, ad iterators The simplest algorithm:

More information

Outline. Applications of FFT in Communications. Fundamental FFT Algorithms. FFT Circuit Design Architectures. Conclusions

Outline. Applications of FFT in Communications. Fundamental FFT Algorithms. FFT Circuit Design Architectures. Conclusions FFT Circuit Desig Outlie Applicatios of FFT i Commuicatios Fudametal FFT Algorithms FFT Circuit Desig Architectures Coclusios DAB Receiver Tuer OFDM Demodulator Chael Decoder Mpeg Audio Decoder 56/5/ 4/48

More information

Spectral leakage and windowing

Spectral leakage and windowing EEL33: Discrete-Time Sigals ad Systems Spectral leakage ad widowig. Itroductio Spectral leakage ad widowig I these otes, we itroduce the idea of widowig for reducig the effects of spectral leakage, ad

More information

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Bayesian approach to reliability modelling for a probability of failure on demand parameter Bayesia approach to reliability modellig for a probability of failure o demad parameter BÖRCSÖK J., SCHAEFER S. Departmet of Computer Architecture ad System Programmig Uiversity Kassel, Wilhelmshöher Allee

More information

2. ALGORITHM ANALYSIS

2. ALGORITHM ANALYSIS 2. ALGORITHM ANALYSIS computatioal tractability survey of commo ruig times 2. ALGORITHM ANALYSIS computatioal tractability survey of commo ruig times Lecture slides by Kevi Waye Copyright 2005 Pearso-Addiso

More information

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea

FPGA IMPLEMENTATION OF BASE-N LOGARITHM. Salvador E. Tropea FPGA IMPLEMENTATION OF BASE-N LOGARITHM Salvador E. Tropea Electróica e Iformática Istituto Nacioal de Tecología Idustrial Bueos Aires, Argetia email: salvador@iti.gov.ar ABSTRACT I this work, we preset

More information

Operating System Concepts. Operating System Concepts

Operating System Concepts. Operating System Concepts Chapter 4: Mass-Storage Systems Logical Disk Structure Logical Disk Structure Disk Schedulig Disk Maagemet RAID Structure Disk drives are addressed as large -dimesioal arrays of logical blocks, where the

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

Chapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Chapter 4. Procedural Abstraction and Functions That Return a Value. Copyright 2015 Pearson Education, Ltd.. All rights reserved. Chapter 4 Procedural Abstractio ad Fuctios That Retur a Value Copyright 2015 Pearso Educatio, Ltd.. All rights reserved. Overview 4.1 Top-Dow Desig 4.2 Predefied Fuctios 4.3 Programmer-Defied Fuctios 4.4

More information

Evaluation scheme for Tracking in AMI

Evaluation scheme for Tracking in AMI A M I C o m m u i c a t i o A U G M E N T E D M U L T I - P A R T Y I N T E R A C T I O N http://www.amiproject.org/ Evaluatio scheme for Trackig i AMI S. Schreiber a D. Gatica-Perez b AMI WP4 Trackig:

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

Data Structures and Algorithms Part 1.4

Data Structures and Algorithms Part 1.4 1 Data Structures ad Algorithms Part 1.4 Werer Nutt 2 DSA, Part 1: Itroductio, syllabus, orgaisatio Algorithms Recursio (priciple, trace, factorial, Fiboacci) Sortig (bubble, isertio, selectio) 3 Sortig

More information

GE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III

GE FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III GE2112 - FUNDAMENTALS OF COMPUTING AND PROGRAMMING UNIT III PROBLEM SOLVING AND OFFICE APPLICATION SOFTWARE Plaig the Computer Program Purpose Algorithm Flow Charts Pseudocode -Applicatio Software Packages-

More information