Asymmetrical Load-Balancing for Incremental Fast Fourier Transform on Multi-Core Processors


By Todor Pandeliev

A thesis submitted to The Faculty of Graduate Studies and Research in partial fulfilment of the degree requirements of Master of Science in Information and Systems Science, Computer Science

Ottawa-Carleton Institute of Computer Science
Department of Computer Science
Carleton University
Ottawa, Ontario, Canada
September, 2009

© copyright 2009, Todor Pandeliev

Abstract

The Fast Fourier Transform (FFT) is a powerful method in contemporary computing, with many practical applications from signal processing to cryptography. On multicore platforms, symmetrical load distribution prevails but has efficiency issues with locality, creating a gap between theoretical arithmetic complexity and actual performance. Some proposed re-evaluations of Amdahl's law for parallel speed-up favour asymmetric multi-processing. The approach taken in this work is therefore to let the individual processors/cores specialise in parts of the FFT, such as the butterfly operation, permuting/transposing, and calculating complex roots of unity; thus computation is asymmetrical even on SMP/CMP. The inter-thread synchronisation employed is the spin-lock. This research contributes: the notion of incrementality inherent in the application, innovative usage of a shared heap to hold twiddle-factors, and the best known arithmetic complexity of computing these. The solution suits hard-to-predict problem sizes, on the 2 to 9 cores that the multi-core industry delivers nowadays (9 in IBM's CellBE).

TO THE EUROPEAN COUNTRY THAT RAISED ME ACADEMICALLY, BULGARIA, AND TO CANADA FOR ALLOWING ME TO EXCEL.

Acknowledgements

I would specifically like to express appreciation to my co-supervisors Dr. Michiel Smid and Dr. Richard Dansereau, for their participation in the initial discussion of the topic, as well as for logistical support. Dr. Smid contributed a lot to shaping up this research through valuable concise remarks, to the point, all along throughout the work.

My family, comprising my wife Valeria Pandelieva, son Velian and daughter Antonia, deserve thanks for their patience, understanding and encouragement during my studies at Carleton University.

The overall academic atmosphere at the School of Computer Science, and the excellent level of the seminars in particular, set the context that made this possible. Many thanks go to the School's graduate director (until recently) Dr. J.-P. Corriveau for his support in my transition into the department, and to Claire Ryan for her assistance with admin matters. Dr. Kranakis contributed to improving the clarity of the narrative. Dr. Panario from the Department of Mathematics helped to fix and improve the mathematics in this work.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Acronyms

Chapter 1: Introduction
    Terminology Used
    State of the Art in FFT Efficiency on Multi-core
    Amdahl's law
    Gustafson's law
    Input/Output Complexity
    Cache Obliviousness
    Reconciling Space and Time Locality with FFT
    Motivation for This Research
    Incrementality of Important Applications
    Multi-core specifics
    Goals of This Research
    Approach and Methodology
    Specific Solution for Computation of Twiddle-factors
    Specific Solutions for Incrementality

Chapter 2: Introducing the Fast Fourier Transform
    Basic definitions
    Polynomials
    Alternative Representations of Polynomials
    Signals and Frequencies
    Discrete Fourier Transform (DFT)
    Properties of the Complex Roots of Unity
    Fast Fourier Transform (FFT) and the Cooley-Tukey Algorithm
    Recursive Algorithm for Radix-2 DIT FFT
    Towards Iterative Algorithm for Radix-2 DIT FFT
    Applications of the FFT
    FFT on Multi-core Platforms
    Theoretical Arithmetic Complexity of the FFT
    Decimation, Parallelism and Multi-core
    Pipelines and Single Instruction/Multiple Data (SIMD)
    Stockham Autosort and Pease's Algorithm

Chapter 3: Generic Design
    The Solutions
    Introducing the Middle-Heap
    Incremental Computation of Twiddle Factors: the Mean-middle Method
    Load-Balancing for Incrementality: Core-Monotonic Strategy
    The Algorithms
    Middle-heap Algorithms
    Threading and Synchronisation
    Design of the FFT Plan
    Shuffle for Bit-reversed Input
    DIF Algorithm for Ordered Output
    Correctness of the DIF Algorithm for Ordered Output
    Efficiency of the DIF Algorithms
    Introducing Input Insertion
    Optimise DIF Algorithm by Unrolling Recursion Leaves

Chapter 4: Analysis of the Generic Design
    Properties of the Middle-heap
    Properties Derived from Min-Heap
    Properties Derived from Complex Roots of Unity
    Computational Complexity of Traversal
    Arithmetic Complexity of Computing Twiddle-factors
    Time Locality and Pipelining
    Twiddle-factor Computation
    FFT Plan
    Space Locality
    Cache Misses with Twiddle-factor Computation
    Cache Misses with the FFT Plan
    Avoiding false sharing
    Final Optimisation of Load-balancing

Chapter 5: Conclusions and Future Work

Bibliography

List of Figures

Figure 2.1. Amplitude Modulation
Figure 3.1. Max-Heap
Figure 3.2. Middle-Heap
Figure 3.3. Means of Adjacent Twiddle Factors
Figure 3.4. Sequence of Multi-core Processing
Figure 3.5. DIF Natural vs. Bit-Reversed Input
Figure 4.1. Error Propagation of Mean-middle Method

List of Acronyms

ADC     Analog-to-Digital Converter
AM      Amplitude Modulation
ALU     Arithmetical-Logical Unit
AMP     Asymmetrical Multi-Processing
API     Application Programming Interface
BCE     Base Core Entity
CMP     Chip Multi-Processing (SMP on a single chip)
CPU     Central Processing Unit
DFT     Discrete Fourier Transform
DIF     Decimation In Frequency
DIT     Decimation In Time
DMA     Direct Memory Access, copying without processor participation
DSP     Digital Signal Processing / Digital Signal Processor
FFT     Fast Fourier Transform
FFTW    Fastest Fourier Transform in the West (a planner/executor tool)
FLOP    FLoating-point OPeration
FM      Frequency Modulation
FPGA    Field-Programmable Gate Array
IDCT    Inverse Discrete Cosine Transform
IFFT    Inverse Fast Fourier Transform
MAC     Multiply-and-ACcumulate
MKL     Math Kernel Library by Intel
OFDM    Orthogonal Frequency Division Multiplexing
PPU     PowerPC core on the IBM CellBE processor
PVR     Point-Value Representation (of polynomials)
SIMD    Single Instruction Multiple Data
SMP     Symmetrical Multi-Processing
SPU     Synergistic Processing Unit core on the IBM CellBE processor
WHT     Walsh-Hadamard Transform
WLOG    Without Loss Of Generality

Chapter 1: Introduction

1.1 Terminology Used

Multi-core processor technology is a modern version of parallel multi-processing, in which several but not numerous cores, usually a single-digit number, are laid out by industry on the same processor. One of the aims is to distribute the generated heat more evenly across the integrated circuit, so that higher performance is achieved without increasing the processor's clock rate, which inevitably creates heat dissipation issues.

In the title of this work, asymmetrical load-balancing is used in a sense similar to the well-known meaning in AMP (Asymmetrical Multi-Processing). AMP refers to processor designs with cores consisting of different/specialised architectures, as well as to running different applications on each core, possibly optimised for different hardware features, and to each core being assigned its own thread. Symmetrical on the other hand, when referring to hardware, means that the processor cores are identical. When used in conjunction with Operating Systems/Software, this usually means that the different cores do not run the same code (on different data) in parallel, which would be a multi-core variant of vectorization and SIMD that some compiler code-generation optimisers and libraries like Intel's MKL are capable of achieving transparently for the programmer.

The title uses the word incremental to indicate that not all input data may be available at once at the start of the computational process. Indeed most signal-processing applications, FFT being part of these, rely on sampling voltage at regular intervals.

Except for a couple of newly introduced terms, e.g. middle-heap, the rest of the wording in this work is intended to conform completely to the widely-accepted terms and their meanings in contemporary literature on technology. The only peculiarity worth mentioning is that sometimes terms from the area of algorithms and computer science are less known to engineering and electronics people, e.g. heap, and vice-versa.

Among the possible stumbling blocks for people outside electrical and electronics engineering could be DMA (Direct Memory Access): copying of values in memory without the participation of the processor; and DSP (Digital Signal Processing/Digital Signal Processor): denoting a serially manufactured but specialised processor for computationally-intensive, usually embedded, applications. Along the same lines are FPGA (Field-Programmable Gate Array): an integrated circuit whose logic is programmable rather than hard-wired; and MAC (Multiply-and-Accumulate): a computational pattern in DSP technology.

There are also terms that are known to any professional in the industry of software development, but may not be very commonly used by academic researchers. Among these is API (Application Programming Interface), meaning a function/functions to be called, or spin-lock: computationally efficient but crude inter-thread synchronisation where one core is in an unproductive loop while waiting to be released by another.

Key to this work is also the notion of space locality, meaning the aim to achieve few cache misses (see section 1.2.4), and time locality, explained in detail in section 2.3.1.

1.2 State of the Art in FFT Efficiency on Multi-core

1.2.1 Amdahl's law

Almost forty years ago in [3] Gene Amdahl argued in favour of a single-processor approach for achieving large-scale computing capabilities, as opposed to multiprocessing. In the process, he defined his law for the case of using n processors (cores) in parallel. Assuming that fraction p of a program's execution time is parallelisable (ignoring scheduling overhead), while 1 − p is strictly sequential, the speedup on n processors is:

$$S_{\text{parallel}} = \frac{1}{(1 - p) + \frac{p}{n}}$$

Amdahl's law has a few corollaries, one of which, namely that when p is small optimisations have little effect, was in support of his argument that high performance computing should rely on single processors. The most important corollary is this: as n approaches infinity, speedup is bound by 1/(1 − p).

1.2.2 Gustafson's law

In 1988 [11] was published, and later became known as the Gustafson(-Barsis) Law. Here is how he himself summarises it:

"The model is not a contradiction of Amdahl's law as some have stated, but an observation that Amdahl's assumptions don't match the way people use parallel processors. People scale their problems to match the power available, in contrast to Amdahl's assumption that the problem is always the same no matter how capable the computer."

For N processors, the parallel part p will scale to pN:

$$S_{\text{scaled}} = \frac{s + pN}{s + p} = s + pN = N + (1 - N)s = N - (N - 1)s, \quad \text{where } s = 1 - p.$$
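The two laws can be compared numerically. The following is a minimal Python sketch, assuming only the formulas above; the function names are illustrative and do not come from the cited papers.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl: fixed problem size, fraction p parallelisable, n cores."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Gustafson: problem scaled with the machine, serial fraction s = 1 - p."""
    s = 1.0 - p
    return n - (n - 1) * s


def amdahl_limit(p: float) -> float:
    """Upper bound on Amdahl speedup as the number of cores grows without limit."""
    return 1.0 / (1.0 - p)
```

For example, with p = 0.9 on 8 cores, `amdahl_speedup(0.9, 8)` stays below the limit `amdahl_limit(0.9)` of 10, whereas `gustafson_speedup(0.9, 8)` grows linearly with the core count.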

As a result, few today claim that parallel processing is not viable. In [12] Hill and Marty argue two important results (among others), quoted exactly:

Result 1. Amdahl's law (still) applies to multicore chips because achieving the best speedup S requires p to be close to 1. Thus, finding parallelism is still critical.

Result 2. Asymmetric multicore chips can offer potential speedups that are much greater than symmetric multicore chips (and never worse).

Since by asymmetric they mean fitting a varying number of Base Core Elements (BCEs) of the same, not different, architecture in each core, in this work we adopt the idea of asymmetric loads for CMP. A spin-lock is an efficient idle loop in one core until another core is ready and releases it. While the ad-hoc load-balancing will not be perfect, the simplicity of spin-locks will minimise the inherently serial synchronisation overhead.

This section further focuses on literature covering the particular task of optimising the FFT and/or similar computational problems, with respect to locality and parallelism.

1.2.3 Input/Output Complexity

Efficient algorithms have to consider the use of slower memory and external storage, along with the counts of operations. While today memory has even more layers, taking into account cache at various levels, some earlier results on efficiency of algorithms still apply: whether cache versus main memory is considered, or main memory versus disk, can be immaterial. In [1] Aggarwal and Vitter consider a model with these parameters:

N = # records to handle;
M = # records that can fit into internal memory;
B = # records that can be transferred in a single block;
P = # blocks that can be transferred concurrently;

where $1 \le P \le M < N$ and $1 \le P \le \lfloor M/B \rfloor$. The parameters N, M, and B are the file size, memory size, and block size, respectively. For FFT, the asymptotically-optimal algorithm based on Radix-2 DIF recursion is shown to require

$$\Theta\left(\frac{N}{PB} \cdot \frac{\log(1 + N/B)}{\log(1 + M/B)}\right)$$

I/O operations (no tight lower bound is known). This is achieved through so-called pebbling, and bringing the records into memory in transposition permutations, M at a time in log N / log M stages (assume log M divides log N).

1.2.4 Cache Obliviousness

The ideal cache model is defined as follows: the CPU only uses words that are in cache; if the referenced word is already in cache, a cache hit occurs, and the word is used; else a cache miss causes a fetch from memory, possibly optimally evicting from the cache. Cache normally consists of cache lines, each containing L consecutive words copied together to and from main memory, L > 1, counting on data space locality for efficiency.

Algorithms are cache oblivious when no parameters dependent on the hardware platform, such as cache size or cache-line length, need tuning to achieve asymptotical optimality. In their landmark paper [9] Frigo et al. prove the following for FFT: with cache of size Z and cache-line length L, when $Z = \Omega(L^2)$, defined as the tall cache assumption, their 6-step version (see section 2.3) of the n-point FFT with a factorisation of n is cache-oblivious (if transpose is cache-oblivious), and incurs

$$O\left(1 + \frac{n}{L}\left(1 + \log_Z n\right)\right)$$

cache misses. They also prove that cache-obliviousness is preserved with multiple cache levels.

1.2.5 Reconciling Space and Time Locality with FFT

From among the planner-oriented solutions, the most unique one is in [16], and uses a learning strategy and heuristics. The authors Singer and Veloso describe a space of different decimation trees for a given FFT, using Kronecker products with permutation matrices and twiddle-factor/Vandermonde matrices. They maintain that the complexity of modern processors makes it difficult to predict analytically, or to model by hand, the performance of a formula on a particular architecture. Also, that the differences between current processors lead to very different optimal formulae from machine to machine. They employ a black-box approach by running the planner software tool described in [7] on different platforms, gathering performance statistics. Their research reveals clear clusters in the histograms of cache miss counts vs. runtimes for each platform, as well as interesting patterns common across the board. Also, they apply clever heuristics to limit the combinatorial explosion of the solution space, and use particular decomposition trees. Their approach seems to focus on, and work slightly better for, the Walsh-Hadamard Transform (WHT), which is similar to the DFT but real-valued, and is outside the scope of this work.

Frigo and Johnson have addressed similar goals in the FFTW software (1998): it uses binary dynamic programming to search for the optimal FFT implementation (see [8]).

In [2] Ali et al. address scheduling on p cores, by factoring an n-sized FFT. Earlier, [2] co-author Johnsson contributed to the development of the popular planner tool UHFFT. Using it, the parallel FFT schedules in OpenMP and PThreads are compared to that of the best sequential FFT plan, and the speedup for various numbers of processors is reported. Reasonable speedup is achieved for sizes between $2^{12}$ and $2^{14}$, on 2 to 8 cores. However, outside those sizes the speedup seems to follow Amdahl's law.

In [7] Franchetti, Voronenko and Püschel discuss parallelising the FFT under the assumption of shared memory among the cores. They maintain: "The major problem with using the standard Cooley-Tukey FFT algorithm on shared memory machines is its memory access pattern: large strides, and consecutive loop iterations touch the same cache lines, which leads to false sharing." Their effort is thus aimed at fighting false sharing. They describe their existing Spiral planner tool, then propose extensions to it that allow for embarrassingly parallel (i.e. no mutual data dependencies exist between the threads) computations of the FFT that also avoid false sharing. Their implementation appears to be the best fit for CMP: while on SMP platforms the performances are comparable, the break-even point of parallelised FFT for Spiral on CMP is at size $2^7$, while the competition (FFTW and Intel MKL) achieves it earliest at $2^{14}$.

An exotic algorithmic path is presented in [13]: van der Hoeven argues that the stride from $2^n$ to $2^{n+1}$ is too large, so he truncates the FFT to obtain fewer than $2^{n+1}$ entries in the result vector.

In [4] there is a good summary and bibliography of techniques for efficient twiddle-factor computation. The main approaches are CORDIC algorithms, polynomial approximation of trigonometry, and the recursive sine-function generator technique. CORDIC implements fixed-point arithmetic for butterfly rotation (which is what a multiplication by a twiddle-factor is) in fast embedded/FPGA systems, virtually eliminating the need to compute and store twiddle-factors separately; polynomial approximation is common in direct digital frequency synthesis (DDFS); recursive sine-function generation has accuracy issues, which the authors of [4] attempt to counter. The best result quoted is with the recursive sine-function generator: 2 adds and 2 multiplies, 4 FLOPs per entry.

Finally, attempts are made to design new computing platforms so that they are also optimised for applications similar to the FFT. In [10] Guo et al. elaborate on desirable features of a universal multi-core processor with respect to its memory interface. Although their requirements are not explicitly stated to favour the FFT, 2 of the 7 tests carried out on their simulation are FFT and the essentially computationally equivalent Inverse Discrete Cosine Transform (IDCT). Another two are a Finite Impulse Response filter and an Adaptive Differential Pulse-Code Modulation coder, both signal processing applications too, making these more than half of the tested ones. The most revolutionary idea proposed is the usage of cache for instructions only, while data goes into local memory for each core, similar to the CellBE processor (except the latter uses local memory for instructions too).

1.3 Motivation for This Research

1.3.1 Incrementality of Important Applications

Most signal processing applications involve sampling at regular intervals. Prevailing research so far has concentrated on quickly and otherwise efficiently computing the FFT, once all input data is in main memory (all samples have been taken).

Interestingly, the twiddle factors, constant for any given problem size, are almost never kept in storage or a database (except in some embedded/FPGA applications), even when the algorithm only deals with power-of-two sizes. Instead, an API is provided to calculate them, the idea being that the programmer calls this once, then uses the values in repeated Fourier transforms of the same size.

Herein we assume that there exist signal processing applications where the problem size is not known before the arrival of some samples. While this may not always be the case, it seems like a terrible waste anyway to stay idle computationally while sampling, and until the last sample arrives. If the latter were to contain a sentinel tag identifying it as the last (a natural assumption), even the twiddle factors will not yet have been computed, which could have happened if the problem size had been known in advance.

We present a parallelisable method to work incrementally, as the samples arrive.

1.3.2 Multi-core specifics

What remains to be addressed is parallelisation. Earlier research has exclusively addressed it via decimation of FFT into smaller sizes, each assigned to a separate core. That may be the only reasonable approach when many parallel processor units are available. However, recently multi-cores of 2 to maximum 8 units have become popular. On them, the prevailing approach is to search experimentally a space of (usually six-step) solutions, looking for the best execution time of a specific size FFT on a particular platform.

It would be interesting to explore other load-balancing strategies (but not excluding decimation, if enough cores are available). Pease's algorithm ([15]) is a candidate, with the perfect shuffles running in parallel on a separate core, as well as parts of the butterfly operation. Cores could progress in parallel performing asymmetrical computation. This allows for the simplest and most efficient synchronisation mechanism, i.e. the spin-lock.

1.4 Goals of This Research

1.4.1 Approach and Methodology

It is only natural to explore computing of the FFT efficiently, in the following sense:

- the overall work complexity is asymptotically optimal, i.e. O(N log N) (research exists that argues in favour of Horner's rule, $O(N^2)$, for practical low N);
- the arithmetic complexity is at least as good as that of the original Cooley-Tukey algorithm, preferably the more efficient Split-radix (no lower bound is known);
- the I/O complexity in cache misses is close to optimal (no lower bound is known), asymptotically; assume the size fits in memory but not cache (not hard up to $2^{32}$);
- the algorithm is cache-oblivious;
- SIMD, MAC and other pipelining features of modern CPUs are used effectively;
- the work is parallelisable and reasonably load-balanced on available CMP multicore, for the customary numbers of cores, 2 up to the 9 present in IBM's CellBE.

At the current stage of the research area it is unclear whether the shopping list above is achievable in full, and under what assumptions, even if a dedicated chip is designed. Actually Pease ([15]) originally suggested his algorithm for the design of dedicated hardware.

The top two bullets are easiest to achieve: a variety of decimation strategies are known, for highly composite sizes or powers of two, as well as for prime sizes and co-prime factors of the size. The next three have been researched mostly separately. The last bullet looks deceptively easy, but is not trivial if the rest are kept in mind.

A sizable proportion of the recent papers is dedicated to planners that find an optimal solution, for a given problem size, on a particular platform, from a space of possible ones. For this to be practical, the application is assumed to be of known size, and once optimised will be run multiple times on the same platform. Although these assumptions are fair, other applications are also conceivable in signal processing and measurement.

We take an integrated approach: consider all adjacent areas as well as the expected timing of events around the FFT computation. We also suggest asymmetrical execution of the parts, e.g. shuffles, butterflies and twiddle-factor multiplication, on separate cores. Thus each core will run a simpler algorithm, which is a prerequisite for better pipelining and compiler optimisations.

1.4.2 Specific Solution for Computation of Twiddle-factors

We address the problem of finding even more efficient ways to compute the twiddle factors than the ones already known, specifically for power-of-two sizes. We show a high-accuracy method (data structure and algorithm) to improve on the adjacent area of computing the twiddle factors; our new method also perfectly agrees with incrementality.

1.4.3 Specific Solutions for Incrementality

In FFT every input value of the algorithm affects every output value. Can we still save time under the assumption that the application is incremental?

Our answer is yes: if we choose to re-order data to improve space locality (and to simplify the algorithm), that (re-ordering) work is already half done at every power-of-two boundary.

Then we come up with the notion of input insertion, whereby an input value arriving after a truncated FFT has already been computed is propagated into the solution, instead of doing the same power-of-two size FFT all over again.

Chapter 2: Introducing the Fast Fourier Transform

2.1 Basic definitions

2.1.1 Polynomials

A polynomial in the variable x over an algebraic field F is

$$A(x) = \sum_{j=0}^{n-1} a_j x^j, \quad a_j \in F.$$

The values $a_0, a_1, \ldots, a_{n-1}$ are called the coefficients of the polynomial, typically drawn from the field C of the complex numbers. Any integer that is strictly greater than the degree of a polynomial is a degree-bound of that polynomial. The degree of a polynomial of degree-bound n may be any integer between 0 and n − 1, inclusive.

If A(x) and B(x) are polynomials, their sum is defined as a polynomial C(x) of the same degree-bound, such that C(x) = A(x) + B(x), for any x. The coefficients of matching degrees are added together; the computational complexity is O(n) for degree-bound n.

Similarly, if A(x) and B(x) are polynomials of degree-bound n, their product C(x) is a polynomial of degree-bound 2n − 1 such that C(x) = A(x)·B(x) for any x. This means multiplying each term in A(x) by each term in B(x) and adding those with equal powers; this is called the convolution of the input vectors a and b, denoted $c = a \otimes b$. The above process is $O(n^2)$, counting all arithmetic operations on the terms' coefficients.
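The sum and the convolution product in coefficient representation can be sketched in a few lines of Python. This is an illustration under the assumption that a polynomial is stored as a list of coefficients, lowest degree first; the function names are hypothetical.

```python
def poly_add(a, b):
    """Add two coefficient vectors (lowest degree first): O(n)."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))  # pad the shorter vector with zeros
    b = list(b) + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_mul(a, b):
    """Schoolbook convolution of coefficient vectors: O(n^2)."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)  # degree-bound 2n - 1 for inputs of bound n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj      # terms with equal powers are accumulated
    return c
```

For instance, (1 + 2x)(3 + 4x) = 3 + 10x + 8x², which is what `poly_mul([1, 2], [3, 4])` computes.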

2.1.2 Alternative Representations of Polynomials

The representation from the definition is called the coefficient representation. It is convenient for some operations on polynomials, e.g. addition as above. Also, the operation of evaluating the polynomial A(x) at a given point $x_0$ consists of computing the value of $A(x_0)$. Evaluation takes time Θ(n) using Horner's rule:

$$A(x_0) = a_0 + x_0(a_1 + x_0(a_2 + \cdots + x_0(a_{n-2} + x_0 a_{n-1}) \cdots )).$$

A point-value representation of a polynomial A(x) of degree-bound n is a set of n point-value pairs $\{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}$ such that all of the $x_k$ are distinct and $y_k = A(x_k)$, for k = 0, 1, ..., n − 1. A polynomial has many different point-value representations; evaluation means finding one of these, of size at least the degree-bound.

The point-value representation is as convenient for multiplying polynomials as for adding them. If C(x) = A(x)B(x), then $C(x_k) = A(x_k) B(x_k)$ for any point $x_k$, and we can point-wise multiply a point-value representation of A by a point-value representation of B to obtain a point-value representation of C.

The inverse of evaluation, the determining of the coefficient form of a polynomial from a point-value representation, is called interpolation.

Theorem (Uniqueness of interpolating polynomial): For any set $\{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}$ of n point-value pairs such that all the $x_k$ values are distinct, there is a unique polynomial A(x) of degree-bound n such that $y_k = A(x_k)$ for k = 0, 1, ..., n − 1.
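Horner's rule and the construction of a point-value representation can be illustrated as follows. This is a minimal Python sketch, assuming coefficients are stored lowest degree first; `horner` and `point_value` are illustrative names.

```python
def horner(a, x0):
    """Evaluate A(x0) with n - 1 multiplications and n - 1 additions."""
    result = 0
    for coeff in reversed(a):        # start from the highest-order coefficient
        result = result * x0 + coeff
    return result


def point_value(a, xs):
    """Point-value representation of A at the distinct points xs."""
    return [(x, horner(a, x)) for x in xs]
```

Multiplying the y-values of two such representations taken at the same points gives a point-value representation of the product, provided enough points are used to cover the degree-bound of the product.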

The proof is based on the existence of the inverse of the Vandermonde matrix

$$V(x_0, x_1, \ldots, x_{n-1}) = \begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\ 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \end{pmatrix},$$

which is non-singular as the $x_k$ are distinct by definition.

A fast algorithm for n-point interpolation is based on Lagrange's formula:

$$A(x) = \sum_{k=0}^{n-1} y_k \frac{\prod_{j \ne k} (x - x_j)}{\prod_{j \ne k} (x_k - x_j)}.$$

It is possible to compute the coefficients of A using Lagrange's formula in time $\Theta(n^2)$: compute the coefficient representation of the product $\prod_{j} (x - x_j)$, then, for each k = 0, 1, ..., n − 1, divide it by $(x - x_k)$ to obtain the numerator $\prod_{j \ne k} (x - x_j)$, and scale the result by $y_k / \prod_{j \ne k} (x_k - x_j)$.

Thus, n-point evaluation and interpolation are well-defined inverse operations that transform between the coefficient representation of a polynomial and a point-value representation. The methods described for these are $\Theta(n^2)$ classical arithmetic operations, where n is the degree-bound of the polynomial.

2.1.3 Signals and Frequencies

Electronics mixes signals by adding them or multiplying/dividing them. Common practical problems in signal processing involve analysis of the spectrum: if frequency f is present, how strong is 2f, 3f, etc. (called harmonics). Note: a perfect square wave of a digital signal, alternating between some voltage to represent a binary 1, and ~0V to represent a binary 0, contains an infinite series of 2f, 3f, 4f, ...!
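The quadratic-time interpolation procedure can be sketched as below. This is an illustrative Python version using exact rational arithmetic from the standard `fractions` module; the function name and the choice of synthetic division for the per-point division step are assumptions, not taken from the thesis.

```python
from fractions import Fraction


def interpolate(points):
    """Coefficients (lowest degree first) of the unique polynomial of
    degree-bound n through n points with distinct x-values, in O(n^2)."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    n = len(points)

    # Coefficients of P(x) = prod_j (x - x_j), built one factor at a time.
    P = [Fraction(1)]
    for xj in xs:
        nxt = [Fraction(0)] * (len(P) + 1)
        for i, c in enumerate(P):
            nxt[i + 1] += c       # c * x
            nxt[i] -= xj * c      # c * (-x_j)
        P = nxt

    coeffs = [Fraction(0)] * n
    for k in range(n):
        # Synthetic division of P by (x - x_k) gives prod_{j != k} (x - x_j).
        q = [Fraction(0)] * n
        carry = Fraction(0)
        for i in range(n, 0, -1):
            carry = P[i] + carry * xs[k]
            q[i - 1] = carry
        # Denominator prod_{j != k} (x_k - x_j) is the quotient evaluated at x_k.
        denom = Fraction(0)
        for c in reversed(q):
            denom = denom * xs[k] + c
        scale = ys[k] / denom
        for i in range(n):
            coeffs[i] += scale * q[i]
    return coeffs
```

Interpolating the three points (0, 3), (1, 21), (2, 55) recovers the coefficients of 3 + 10x + 8x².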

This means finding the coefficients of terms of a polynomial (interpolation), since 2f is represented by the square of a complex number, 3f by the cube, etc. This is based on Euler's identity: $\cos x + i \sin x = e^{ix}$, as the equality $e^{ky} = (e^{y})^k$ holds for any complex y.

2.1.4 Discrete Fourier Transform (DFT)

The inverse of the particular interpolation when f = 1/n is to evaluate the polynomial

$$A(x) = \sum_{j=0}^{n-1} a_j x^j$$

of degree-bound n at the complex nth roots of unity $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$, where $\omega_n = e^{2\pi i/n}$.

Without loss of generality (WLOG), assume that n is a power of 2, since a given degree can always be raised: high-order zero coefficients can always be added as necessary. Let A be given in coefficient form $a = (a_0, a_1, \ldots, a_{n-1})$. Define the results $y_k$, for k = 0, 1, ..., n − 1, by

$$y_k = A(\omega_n^k) = \sum_{j=0}^{n-1} a_j \omega_n^{kj}.$$

Definition 1: The vector $y = (y_0, y_1, \ldots, y_{n-1})$ is the Discrete Fourier Transform (DFT) of the coefficient vector $a = (a_0, a_1, \ldots, a_{n-1})$. We also write $y = \mathrm{DFT}_n(a)$.

The DFT is equivalent to the continuous Fourier transform

$$F(s) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i s t}\, dt,$$

for periodic f, in a band-limited setting made discrete via sampling. F(s) would yield the strength of frequency s in the mix, i.e. its amplitude.

A different way to express the DFT is $y = V_n a$, where $V_n$ is the Vandermonde matrix with the powers of the n-th complex root of unity. To carry out the inverse, e.g. to interpolate for frequency analysis, $V_n^{-1}$ is needed. In [6] there is the following theorem:
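The defining formula of the DFT translates directly into a quadratic-time computation. The sketch below is illustrative Python using the standard `cmath` module and the positive-exponent convention $\omega_n = e^{2\pi i/n}$ from the definition above; the function name is an assumption.

```python
import cmath


def dft(a):
    """Direct evaluation of y_k = sum_j a_j * w^(k*j): Theta(n^2) operations."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)   # principal n-th root of unity
    return [sum(a[j] * w ** (k * j) for j in range(n)) for k in range(n)]
```

Each output entry is one row of the Vandermonde matrix $V_n$ multiplied by the coefficient vector, so the whole transform is the matrix-vector product $y = V_n a$.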

Theorem: $(V_n^{-1})_{j,k} = \omega_n^{-kj}/n$.

The proof idea is, by using the properties of the complex roots of unity, to show that $V_n^{-1} V_n$ is the identity matrix $I_n$.

The above shows the inverse to be similar to the DFT. The DFT has been proven to be its own inverse, reordered and scaled with a factor of 1/n.

2.1.5 Properties of the Complex Roots of Unity

The description of these properties follows the presentation in [6]. From the definition of complex roots of unity directly follows the following lemma:

Lemma 1 (Cancellation lemma): For any integers n ≥ 0, k ≥ 0, and d > 0, $\omega_{dn}^{dk} = \omega_n^k$.

Proof: $\omega_{dn}^{dk} = e^{2\pi i \cdot dk/(dn)} = e^{2\pi i \cdot k/n} = \omega_n^k$.

Corollary: For any even integer n > 0, $\omega_n^{n/2} = \omega_2 = -1$.

Lemma 2 (Halving lemma): If n > 0 is even, then the squares of the n complex nth roots of unity are the n/2 complex (n/2)th roots of unity (each occurring twice).

Proof: $\omega_n^{n/2} = -1$ implies $\omega_n^{k+n/2} = -\omega_n^k$, hence $(\omega_n^{k+n/2})^2 = (\omega_n^k)^2$. However, per the Cancellation lemma, $(\omega_n^k)^2 = \omega_{n/2}^k$.

Lemma 3 (Summation lemma): For any integer n ≥ 1 and nonnegative integer k not divisible by n,

$$\sum_{j=0}^{n-1} (\omega_n^k)^j = 0.$$

Proof:

$$\sum_{j=0}^{n-1} (\omega_n^k)^j = \frac{(\omega_n^k)^n - 1}{\omega_n^k - 1} = \frac{(\omega_n^n)^k - 1}{\omega_n^k - 1} = \frac{(1)^k - 1}{\omega_n^k - 1} = 0.$$
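The three lemmas and the corollary can be checked numerically for a given even n. The following Python sketch is only a floating-point sanity check, not a proof; the helper names, the tolerance and the default multiplier d are assumptions made for illustration.

```python
import cmath


def omega(n, k=1):
    """k-th power of the principal n-th root of unity."""
    return cmath.exp(2j * cmath.pi * k / n)


def close(u, v, eps=1e-9):
    return abs(u - v) < eps


def check_lemmas(n, d=3):
    """Return (cancellation, corollary, halving, summation) for even n."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    cancellation = all(close(omega(d * n, d * k), omega(n, k)) for k in range(n))
    corollary = close(omega(n, n // 2), -1)
    halving = all(close(omega(n, k) ** 2, omega(n // 2, k)) for k in range(n))
    summation = all(
        close(sum(omega(n, k) ** j for j in range(n)), 0)
        for k in range(1, n)          # k = 1 .. n-1 are not divisible by n
    )
    return cancellation, corollary, halving, summation
```

Calling `check_lemmas(8)` evaluates all four properties for the 8th roots of unity.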

2.1.6 Fast Fourier Transform (FFT) and the Cooley-Tukey Algorithm

By taking advantage of the properties of the complex roots of unity, $\mathrm{DFT}_n(a)$ can be computed in time Θ(n log n), as opposed to $\Theta(n^2)$ for the definition formula. This method is attributed to Gauss, but was rediscovered as the Cooley-Tukey Algorithm in [5]. In the two cases below we use $(\omega_n^k)^2 = \omega_{n/2}^k$ (remember WLOG n is a power of 2).

Idea 1: Split the even-index from the odd-index coefficients of polynomial A(x). Assuming n even:

$$A^{[0]}(x) = a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2-1},$$
$$A^{[1]}(x) = a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2-1},$$

and $A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2)$.

Thus we need to evaluate two degree-bound n/2 polynomials at $(\omega_n^0)^2, (\omega_n^1)^2, \ldots, (\omega_n^{n-1})^2$, the complex (n/2)th roots of unity each occurring twice, then to combine the results. The problem decomposes into two of half its size. This method is referred to in literature as Radix-2 decimation in time (DIT). The inverse of this odd/even split is an operation known as perfect shuffle (like with two half-decks of cards).

Idea 2: Split the low-index half of the coefficients of A(x) from the high-index half:

$$A(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n/2-1} x^{n/2-1} + x^{n/2}\left(a_{n/2} + a_{n/2+1} x + a_{n/2+2} x^2 + \cdots + a_{n-1} x^{n/2-1}\right),$$

i.e.

$$A(x) = \sum_{j=0}^{n/2-1} \left(a_j + x^{n/2} a_{j+n/2}\right) x^j; \quad \text{thus for } r = 0, 1, \ldots, n-1, \quad y_r = \sum_{j=0}^{n/2-1} \left(a_j + \omega_n^{rn/2} a_{j+n/2}\right) \omega_n^{jr}.$$

Consider the even r = 2k separately from the odd r = 2l + 1; taking into account $\omega_n^{n/2} = -1$:

$$z_k = y_{2k} = \sum_{j=0}^{n/2-1} (a_j + a_{j+n/2})\, \omega_n^{2kj} = \sum_{j=0}^{n/2-1} (a_j + a_{j+n/2})\, \omega_{n/2}^{kj}, \quad \text{for } k = 0, 1, \ldots, n/2 - 1,$$

and

$$t_l = y_{2l+1} = \sum_{j=0}^{n/2-1} (a_j - a_{j+n/2})\, \omega_n^{(2l+1)j} = \sum_{j=0}^{n/2-1} (a_j - a_{j+n/2})\, \omega_n^{j}\, \omega_{n/2}^{lj}, \quad \text{for } l = 0, 1, \ldots, n/2 - 1.$$
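The low/high split of Idea 2 can be sketched as a recursive Python function. This is an illustration only, under the assumption that n is a power of 2; it interleaves the even-indexed results z and the odd-indexed results t to return the output in natural order, and it is not the DIF algorithm designed later in the thesis. A direct DFT is included for comparison; all names are hypothetical.

```python
import cmath


def dft(a):
    """Direct Theta(n^2) DFT, used here only as a reference."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(a[j] * w ** (k * j) for j in range(n)) for k in range(n)]


def fft_dif(a):
    """Radix-2 decimation in frequency; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    half = n // 2
    w_n = cmath.exp(2j * cmath.pi / n)
    sums = [a[j] + a[j + half] for j in range(half)]
    # The factor w_n ** j is the twiddle-factor applied to the differences.
    diffs = [(a[j] - a[j + half]) * w_n ** j for j in range(half)]
    z = fft_dif(sums)    # z[k] = y[2k]
    t = fft_dif(diffs)   # t[l] = y[2l + 1]
    y = [0] * n
    y[0::2] = z
    y[1::2] = t
    return y
```

Comparing `fft_dif(a)` with `dft(a)` on a small input is a quick way to confirm that the two half-size sub-problems reproduce the full transform.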

Again, the DFT problem decomposes into two of half its original size. This method is referred to as Radix-2 decimation in frequency (DIF). Radix-2 DIT and DIF are particular cases of the general Cooley-Tukey algorithm, which allows any radix that divides n.

2.1.7 Recursive Algorithm for Radix-2 DIT FFT

The pseudo-code below follows DIT literally, hence its correctness is inherent: it begins with a check for the end of the recursion; then an inverse perfect shuffle on the input is performed and the result is assigned to new 0-based vectors a0[] and a1[]. Recursive_FFT() then calls itself for these. Finally, a for-loop iterates, incrementally calculating the powers of the root of unity and at the same time combining y0[] and y1[].

    Recursive_FFT(a[0:n-1], n)             /* a[] is a vector, n is a power of 2 */
        if n = 1 then return a[]; endif;
        a0[0:n/2-1] = {a[0],a[2],...,a[n-2]};  /* vector assignment, an */
        a1[0:n/2-1] = {a[1],a[3],...,a[n-1]};  /* inverse perfect shuffle */
        y0[0:n/2-1] = Recursive_FFT(a0, n/2);
        y1[0:n/2-1] = Recursive_FFT(a1, n/2);
        ωn = e^(2πi/n);                    /* primitive complex root of unity: twiddle-factor */
        ω = 1;
        for k = 0 to n/2-1 do
            y[k]     = y0[k] + ω*y1[k];
            y[k+n/2] = y0[k] - ω*y1[k];
            ω = ω * ωn;                    /* computation of twiddle-factors */
        endfor;
        return y;
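For readers who want to run the algorithm, a direct Python transcription of the pseudo-code might look as follows. It is a sketch that mirrors the structure above (including the repeated product, left as in the pseudo-code), assuming the input length is a power of 2; the snake_case name is an adaptation.

```python
import cmath


def recursive_fft(a):
    """Radix-2 DIT FFT following the pseudo-code; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    if n % 2:
        raise ValueError("length must be a power of 2")
    a0 = a[0::2]                       # even-index coefficients
    a1 = a[1::2]                       # odd-index coefficients
    y0 = recursive_fft(a0)
    y1 = recursive_fft(a1)
    w_n = cmath.exp(2j * cmath.pi / n)  # principal n-th root of unity
    w = 1
    y = [0] * n
    for k in range(n // 2):
        y[k] = y0[k] + w * y1[k]
        y[k + n // 2] = y0[k] - w * y1[k]
        w *= w_n                        # next twiddle-factor
    return y
```

The twiddle-factors are generated by repeated multiplication, exactly as in the pseudo-code, so rounding error accumulates slowly with n.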

In the computational algorithms, the complex roots of unity have become known as twiddle-factors, since they are viewed as coefficient corrections, e.g. in $t_l$ above. The computational complexity of Recursive_FFT() is $O(n \log n)$. Indeed, there are $\log n$ recursion levels of iteration loops, each $2 \times n/2$, then $4 \times n/4$, etc. We show in section 3.2.7 that the computational complexity of the DIF versions of Cooley-Tukey is also $O(n \log n)$.

2.1.8 Towards Iterative Algorithm for Radix-2 DIT FFT

The code above is recursive, with overheads for the calls/returns and local vectors. Also, the value ω*y1[k] is computed twice, when added and when subtracted. It could be assigned to a variable and then reused with the sign reversed; this is known as a butterfly operation. Follow the exact order of the recursive evaluation, level by level:

    a[0] a[2] a[4] a[6] a[1] a[3] a[5] a[7]
    a[0] a[4] a[2] a[6] a[1] a[5] a[3] a[7]

The above are a case of bit-reversal permutation: the indices, written in binary, are sorted by the least-significant (LS) bit first. They are easy to compute in less than $O(n \log n)$ time, and thus do not affect the complexity. It is then straightforward to write an iterative version of the above algorithm; refer to [6] for a suggested implementation.
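The two observations above (the butterfly and the bit-reversal ordering) can be combined in a short iterative sketch. This is not the implementation from [6], only an illustration under the same assumptions as before ($\omega_n = e^{2\pi i/n}$, size a power of 2); the names `bit_reverse_indices` and `iterative_fft` are hypothetical.

```python
import cmath

def bit_reverse_indices(n):
    """Return the bit-reversal permutation of 0..n-1 (n a power of 2)."""
    bits = n.bit_length() - 1
    rev = [0] * n
    for i in range(n):
        r, x = 0, i
        for _ in range(bits):       # move the bits of i into r in reverse order
            r = (r << 1) | (x & 1)
            x >>= 1
        rev[i] = r
    return rev

def iterative_fft(a):
    """Iterative radix-2 DIT FFT built from butterfly operations."""
    n = len(a)
    y = [a[r] for r in bit_reverse_indices(n)]
    size = 2
    while size <= n:
        w_m = cmath.exp(2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1
            for k in range(size // 2):
                t = w * y[start + k + size // 2]   # computed once ...
                u = y[start + k]
                y[start + k] = u + t               # ... added here
                y[start + k + size // 2] = u - t   # ... and reused with the sign reversed
                w *= w_m
        size *= 2
    return y

order = bit_reverse_indices(8)      # [0, 4, 2, 6, 1, 5, 3, 7]
```

For n = 8 the permutation reproduces the second row of indices shown above.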

Applications of the FFT

One possible application is the digital implementation of AM or FM radio. Presented in Figure 2.1 (Arrow Electronics) is Amplitude Modulation (AM) of carrier 14kHz by 1kHz. The 1kHz amplitude over time can be found (demodulated) via spectrum analysis using FFT.

Figure 2.1. Amplitude Modulation

At a DSP training course we were presented with a software traffic radar detector too.

Coming back to polynomials, straightforward multiplication (convolution of coefficient vectors) was shown to be $O(n^2)$. Alternatively (per Schönhage & Strassen):

Evaluate both polynomials at complex roots of unity, using FFT: $O(n \log n)$;
Point-wise multiplication of the two PVRs: $O(n)$;
Interpolate the result using inverse FFT: $O(n \log n)$;
Overall running time is therefore $O(n \log n)$.

One important application is the efficient multiplication of large prime numbers needed in cryptography: the decimal representation of an integer can be viewed as a polynomial with its digits as the coefficients.

In modern wireless communication systems, both for voice and wideband data transmission, OFDM (Orthogonal Frequency Division Multiplexing) is used. It also plays an important role in wire-line communication systems. Examples of widely popular standards relying upon it are 802.11a/g, 802.16, DVB, DAB, VDSL, and so on.
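The evaluate / multiply point-wise / interpolate scheme for polynomials can be sketched as follows. This is a minimal illustration, assuming integer coefficients (so that rounding recovers exact results) and a simple recursive FFT; the helper names `fft` and `poly_multiply` are ours.

```python
import cmath

def fft(a, sign=1):
    """Recursive radix-2 FFT; sign=+1 is the forward transform, sign=-1 its unscaled inverse."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def poly_multiply(p, q):
    """Multiply two integer-coefficient polynomials (lowest degree first)."""
    result_len = len(p) + len(q) - 1
    size = 1
    while size < result_len:        # pad to a power of 2 that holds the product
        size *= 2
    fp = fft(list(p) + [0] * (size - len(p)))          # evaluate: O(n log n)
    fq = fft(list(q) + [0] * (size - len(q)))
    pointwise = [x * y for x, y in zip(fp, fq)]        # point-wise product: O(n)
    coeffs = fft(pointwise, sign=-1)                   # interpolate: O(n log n)
    return [round((c / size).real) for c in coeffs[:result_len]]

product = poly_multiply([1, 2, 3], [4, 5])   # (1 + 2x + 3x^2)(4 + 5x)
```

Here `product` is [4, 13, 22, 15]. Applying the same routine to digit vectors (least-significant digit first) and then propagating carries gives integer multiplication, as described above.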

In these systems, the DFT/inverse DFT, implemented as FFT/inverse FFT both in software and hardware, is the core component in OFDM transmission and reception. Those systems require FFT/IFFT of lengths ranging from 64 to 8192. Thus the FFT needs to be evaluated on the widest possible variety of computing platforms.

FFT on Multi-core Platforms

The majority of today's computers are multi-core, with more than one processor/arithmetical-logical unit (ALU) on a single chip. Many use truly shared memory that can be accessed from any core; some, like IBM's CellBE, do not, and their data needs to be copied by Direct Memory Access (DMA). Some use private cache with shared memory, hence the false sharing problem: data may be present in the wrong core's cache, where it is not needed but takes the space of data that is needed for the computation.

2.2 Theoretical Arithmetic Complexity of the FFT

Asymptotically, the FFT for size $N$ takes $O(N \log N)$ operations. Extensive research has been carried out on the actual operation count, i.e. on the constant factor before $N \log N$ (known to be 5 for Cooley-Tukey), as well as on the remaining (non-dominating) terms of the complexity equality. The arithmetic complexity is expressed as the number of floating point operations (FLOPs), additions and multiplications, as a function of the problem size $N$. No tight lower limit is known. Until recently (2007), the best known result had been achieved by Yavne in 1968, his split-radix algorithm running in $4N \lg N - 6N + 8$, where $\lg$ means $\log_2$. In [14] Johnson and Frigo give an explanation of that result, and publish an improved count of

$\frac{34}{9} N \lg N - \frac{124}{27} N - 2 \lg N - \frac{2}{9} (-1)^{\lg N} \lg N + \frac{16}{27} (-1)^{\lg N} + 8$,

also based on the split-radix method: decimation into 3 sub-problems, one of which consists of the even-indexed terms, the other two respectively indexed 1 and 3 modulo 4.

There is also research available that focuses on minimising the floating-point multiplications, but this is achieved with a lot more additions. Such results could be useful on some platforms, particularly dedicated signal-processing FPGAs or processors (including some DSPs) that do not support floating point arithmetic in hardware. For normal processors, especially more recent ones, the arithmetic complexity in the original sense (multiplications and additions together) is more relevant, since a multiplication takes the same number of processor cycles as an addition. However, experiments have shown that performance for the same theoretical FLOP count can differ dramatically depending on data space locality, pipelining, and other features of modern processors. This is the topic of the next section.

The twiddle-factors are mostly assumed to be (efficiently) pre-computed and then reused. Efficient computation of these has focused mainly on avoiding repeated calculations and on using some properties of the complex roots of unity, like their periodicity: e.g. it is sufficient to calculate the ones in the 1st Cartesian quadrant; the others are obtained via multiplying by $i$, $-1$ or $-i$. These multiplications do not involve FLOPs, as they can be expressed as sign inversions and swapping of the real part with the imaginary part. It is convenient to calculate the twiddle factors once, store the values and reuse them with new FFTs of the same size. Almost all existing software libraries provide an API to populate an array of twiddle-factors given the FFT size, instead of pre-computed tables.
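The quadrant symmetry mentioned above can be sketched as follows. This is an illustration only, assuming $n$ is a multiple of 4 and $\omega_n = e^{2\pi i/n}$; the function name `twiddles_by_symmetry` is hypothetical and does not correspond to any particular library's API.

```python
import math

def twiddles_by_symmetry(n):
    """Return omega_n^k for k = 0..n-1, calling cos/sin only for the first quadrant.

    Assumes n is a multiple of 4. The other three quadrants are obtained by
    multiplying by i, -1 and -i, i.e. by swapping parts and flipping signs.
    """
    quarter = n // 4
    first = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
             for k in range(quarter)]
    table = [complex(re, im) for re, im in first]        # quadrant 1: times 1
    table += [complex(-im, re) for re, im in first]      # quadrant 2: times i
    table += [complex(-re, -im) for re, im in first]     # quadrant 3: times -1
    table += [complex(im, -re) for re, im in first]      # quadrant 4: times -i
    return table

table8 = twiddles_by_symmetry(8)
```

Only n/4 cosine/sine pairs are evaluated; the remaining 3n/4 entries cost no floating-point multiplications or additions.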

2.3 Decimation, Parallelism and Multi-core

Both DIF and DIT allow divide-and-conquer parallelisation and also apply to size factors; algorithms exist for prime sizes (Rader's) or co-prime size factors (Good-Thomas). For large numbers of processors, e.g. on computing arrays or hyper-cubes, strategies have been developed to distribute work equitably (load-balancing). On smaller CMP multicores, efforts have been applied mostly to improve the use of caches and pipelines instead: the bit-reversal permute and the stride from $a_0$ to $a_{n/2}$ create a problem in the presence of cache, as they dramatically decrease speed for FFT sizes that do not fit in it.

One solution is the Six-step approach ([9], [7]): decimate size $n = pq$ into a $p \times q$ matrix, each row with coefficients of a size-$q$ FFT, and compute the original FFT in these steps:

1) Transpose: more cache-efficient due to stride smaller than $n/2$;
2) Perform FFT by rows: more cache-efficient due to size much smaller than $n$;
3) Combine results with twiddle-factors of much lower degree;
4) Transpose;
5) Perform FFT by rows; this combines the parts of the decimation;
6) Transpose.

Most known modern solutions involve techniques to address both parallelism and locality. Some achieve it through a dedicated planner run to find one efficient solution from among a space of many, for a particular problem size and (multi-core) platform.

2.3.1 Pipelines and Single Instruction/Multiple Data (SIMD)

Space locality is not the only feature to be considered on modern computing platforms. Many, especially the ones dedicated to efficient computations such as DSPs, have been optimised for certain predictable patterns of instructions and data that occur in time.
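The six steps listed in section 2.3 can be illustrated with a small sketch. It assumes a row-major $p \times q$ layout of the input, the sign convention $\omega_n = e^{2\pi i/n}$, and, for brevity, a naive DFT for the row transforms where a real implementation would call a fast routine. The names `dft`, `transpose` and `six_step_fft` are ours, and the exact layout used in [9] and [7] may differ.

```python
import cmath

def dft(a):
    """Naive O(n^2) DFT, standing in for the per-row FFT."""
    n = len(a)
    return [sum(a[m] * cmath.exp(2j * cmath.pi * m * r / n) for m in range(n))
            for r in range(n)]

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def six_step_fft(a, p, q):
    """Size n = p*q transform via transposes, row transforms and a twiddle scaling."""
    n = p * q
    matrix = [a[row * q:(row + 1) * q] for row in range(p)]    # p x q, row-major
    m1 = transpose(matrix)                                     # 1) transpose -> q x p
    m2 = [dft(row) for row in m1]                              # 2) q row transforms of size p
    m3 = [[m2[j2][r1] * cmath.exp(2j * cmath.pi * j2 * r1 / n) # 3) twiddle factors
           for r1 in range(p)] for j2 in range(q)]
    m4 = transpose(m3)                                         # 4) transpose -> p x q
    m5 = [dft(row) for row in m4]                              # 5) p row transforms of size q
    m6 = transpose(m5)                                         # 6) transpose -> q x p
    return [value for row in m6 for value in row]

data = [1, 2, 3, 4, 5, 6, 7, 8]
result = six_step_fft(data, 2, 4)
```

The result agrees with a direct size-8 transform of the same data; each row transform touches only $p$ or $q$ contiguous elements, which is the source of the cache benefit described above.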

For example, a common characteristic is the efficient carrying out of a sequence of multiply-and-accumulate (MAC) operations into a sum $s$: $s \leftarrow s + a_i \cdot b_i$, for a sequence of values $i$, or the ability to perform with higher efficiency the same operation on a relatively small array/vector of values: vectorization, a.k.a. Single Instruction Multiple Data (SIMD). Another one is instruction pipelining: phase 2 of instruction #i runs in parallel to phase 1 of instruction #i+1, so circuitry parts don't wait idle, similar to a conveyor belt. Sometimes in research the term time locality is used to denote any of the above.

2.3.2 Stockham Autosort and Pease's Algorithm

Certain algorithms have been proven to fit well with space or time locality. One favoured method with SIMD is the Stockham Autosort (1966). As an alternative to the 6-step, it embeds the reverse-bit permute into the butterfly: when computing an FFT decimation stage, the results go into locations of an intermediate working array, determined by transposing (flipping) bit positions in the index's binary representation. Stockham is called auto-sort since no separate re-ordering/transpose step is required.

Pease's algorithm (1968) promises even greater SIMD advantages but requires a separate perfect-shuffle permute stage and $O(N \log N)$ auxiliary storage. In [15] he introduces Kronecker product notation to express any permute via matrix transpositions.

Stockham efficiency deteriorates dramatically when the size exceeds 1/3rd of the cache; indeed, apart from the input array, for each phase it needs two alternating working storage arrays of the same size as the input array, plus more for temporary variables etc.
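To make the autosort idea concrete, here is a minimal radix-2 sketch in a decimation-in-frequency formulation. It assumes $\omega_n = e^{2\pi i/n}$ and a power-of-2 size; the name `stockham_fft` is ours, and this is one common way to write the algorithm rather than the specific variant evaluated in the thesis. Note the two alternating buffers `x` and `y` and the absence of any separate bit-reversal pass.

```python
import cmath

def stockham_fft(a):
    """Radix-2 Stockham autosort FFT: results land in natural order without a permute step."""
    n = len(a)
    x = list(a)          # current stage input
    y = [0] * n          # current stage output (alternates with x)
    length, stride = n, 1
    while length > 1:
        half = length // 2
        for p in range(half):
            w = cmath.exp(2j * cmath.pi * p / length)
            for q in range(stride):
                u = x[q + stride * p]
                v = x[q + stride * (p + half)]
                y[q + stride * (2 * p)] = u + v            # the write index does the re-ordering
                y[q + stride * (2 * p + 1)] = (u - v) * w
        x, y = y, x      # swap the working arrays for the next stage
        length, stride = half, stride * 2
    return x

impulse_spectrum = stockham_fft([0, 1, 0, 0, 0, 0, 0, 0])
```

For a unit impulse at index 1 the output is the sequence of powers $\omega_8^0, \omega_8^1, \dots, \omega_8^7$, already in natural order.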

Chapter 3: Generic Design

This chapter introduces our approach to the solutions, and describes these in detail.

3.1 The Solutions

Introducing the Middle-Heap

Coming back to twiddle factors, we accept that it is unwise to store them permanently, for all possible problem sizes. We argue however that we can compute them incrementally for the most widely used sizes, the powers of two. First define array $H_3$: $H_3 = \{-1;\ i, -i\}$; at step $n > 1$, using the array of size $2^n - 1$, compute the entries of the array of size $2^{n+1} - 1$ with indices $2^n$ to $2^{n+1} - 1$ (the odd powers of $\omega_{2^{n+1}}$); this is possible to do in parallel to sampling. So $H_7 = \{-1;\ i, -i;\ \omega_8, \omega_8^3, \omega_8^5, \omega_8^7\}$, etc.

The above requires a matching data structure: consider a tree that grows by doubling its size with each new level added. It is most appropriate to use a heap (see [6]) since the underlying storage construct is an array. The heap is a binary tree completely filled with elements, except possibly at the level of the leaves, stored in an array. The root is stored at index 1 of the array; if a parent has index $i$, its left child has index $2i$, and its right child has index $2i+1$. Each element is, or

has, a key value from a partially or totally ordered set. When all parents are larger than their children, the heap is a max-heap. Here is an example from [6]:

A (max-) heap viewed as (a) a binary tree and (b) an array. The number within the circle at each node in the tree is the value stored at that node. The number above a node is the corresponding index in the array. Above and below the array are lines showing parent-child relationships; parents are always to the left of their children. The tree has height three; the node at index 4 (with value 8) has height one.

Figure 3.1. Max-Heap

Similarly, a min-heap is the same structure but parents are smaller than their children.

Definition 2: An array $H$ of size $N$ in which for any natural $i \le N$ the following holds: if $2i \le N$ then $H[i] \le H[2i]$, and if $2i < N$ then $H[i] \le H[2i+1]$, is called a min-heap. When either of them exists within the array $H$, $H[2i]$ is defined as the left child of $H[i]$, and $H[2i+1]$ is defined as the right child of $H[i]$ in the tree of min-heap $H$; $H[i]$ is by definition the parent of $H[2i]$ and $H[2i+1]$. $H[1]$ is the tree root of min-heap $H$. A min-heap $H$ is complete when its size $N$ is of the form $N = 2^n - 1$ for some $n > 1$.

Given the index $i$, the indices of its left child LEFT(i), of its right child RIGHT(i) and, when the node is not the root, of its parent PARENT(i), are computed by pseudo-code as follows:

    PARENT(i)  return ⌊i/2⌋
    LEFT(i)    return 2i
    RIGHT(i)   return 2i + 1
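As a small illustration of Definition 2 and the index arithmetic, the following sketch stores the heap in a Python list whose element 0 is an unused placeholder, so that the root sits at index 1 as in the text. The names `is_min_heap` and `is_complete` are hypothetical helpers, not part of the thesis.

```python
def parent(i):
    return i // 2          # floor(i / 2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

def is_min_heap(h):
    """Check the min-heap property; h[0] is an unused placeholder, the root is h[1]."""
    size = len(h) - 1
    return all(h[parent(i)] <= h[i] for i in range(2, size + 1))

def is_complete(h):
    """A heap is complete when its size is 2**n - 1 for some n > 1."""
    size = len(h) - 1
    return size >= 3 and (size + 1) & size == 0

example = [None, 1, 3, 2, 7, 4, 5, 6]
```

Here `example` has 7 elements, satisfies the min-heap property, and is complete; the children of index 2 are found at indices 4 and 5.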

For the complex roots of unity $\omega_n^p$, let the partial order be ascending by degree $n$, and let the total order be, for two roots with equal degree $n$, ascending by $p$.

Definition 3: A complete min-heap of size $N = 2^n - 1$ containing all the twiddle-factors smaller than or equal to $\omega_{N+1}^{N}$ is called a middle-heap of size $N$, denoted $H_N$.

We use only middle-heaps sorted per the total order defined above, as in Figure 3.2:

[Figure 3.2 shows the tree level by level: the root $-1$; then $i$, $-i$ (completing $H_3$); then $\omega_8, \omega_8^3, \omega_8^5, \omega_8^7$ (completing $H_7$); then the odd powers of $\omega_{16}$; a label $H_{31}$ also appears.]

Figure 3.2. Middle-Heap

3.1.2 Incremental Computation of Twiddle Factors: the Mean-middle Method

When the $H_N$ middle-heap is traversed in-order, the twiddle factors are returned in ascending powers of the $(N+1)$-th primitive root of unity, except the trivial one $\omega_{N+1}^{N+1} = 1$. For example in $H_7$ (Figure 3.2), we start from the left-most leaf $\omega_8 = e^{2\pi i/8}$, then visit its parent $\omega_8^2 = e^{2 \cdot 2\pi i/8} = i$, followed by its sibling $\omega_8^3 = e^{3 \cdot 2\pi i/8}$, then go up the tree to the root $\omega_8^4 = e^{4 \cdot 2\pi i/8} = -1$, then down the left of the right sub-tree of the root to the leaf $\omega_8^5 = e^{5 \cdot 2\pi i/8}$, etc.

Note that in $H_7$ the leaves have this property: each leaf except the left-most is the mean average of two adjacent traversed values from $H_3$, scaled with coefficient $c_3$:

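$\omega_8^3 = c_3 \frac{i + (-1)}{2}$, $\omega_8^5 = c_3 \frac{(-1) + (-i)}{2}$, $\omega_8^7 = c_3 \frac{(-i) + 1}{2}$. The left-most leaf is $\omega_8 = c_3 \frac{1 + i}{2}$.

Thus we can work cyclically: start from 1 and the first in-order traversed value, then drop the 1 and combine the first with the second traversed value, then continue in the same way, until in the last step 1 is used again; proof follows. Consider Figure 3.3; $m_8$ and $m_{16}$ are the midpoints of the respective segments.

Figure 3.3. Means of Adjacent Twiddle Factors

The following equalities hold:

$\omega_8 = c_3 m_8 = c_3 \frac{1 + i}{2}$, where $\omega_8 = e^{2\pi i/8}$, $c_3 = 1/|m_8| = \sqrt{2}$, $i = \omega_4 = e^{2\pi i/4}$;

$\omega_{16} = c_4 m_{16} = c_4 \frac{1 + \omega_8}{2}$ for $c_4 = 1/|m_{16}|$, where $|m_{16}| = \frac{1}{2}\sqrt{(1 + \cos\frac{\pi}{4})^2 + \sin^2\frac{\pi}{4}}$.

Similarly, $\omega_{16}^3 = c_4 \frac{\omega_8 + i}{2}$ (note that $\omega_{16}^2 = \omega_8$), etc.

Lemma 4: For any natural $k$ and $n$ there exists a real constant $c_{n+1}$ such that $\omega_{2^{n+1}}^{2k+1} = c_{n+1} \frac{\omega_{2^n}^{k} + \omega_{2^n}^{k+1}}{2}$.

Proof: Since $|1| = |\omega_{2^n}| = 1$, the sum $1 + \omega_{2^n}$ is collinear with the bisector of the angle given by the argument of $\omega_{2^n}$. So is $\omega_{2^{n+1}}$, since per the Halving lemma $\omega_{2^{n+1}}^2 = \omega_{2^n}$, and multiplication by $\omega_{2^{n+1}}$ is rotation by its argument. Denote $c_{n+1} = 2/|1 + \omega_{2^n}|$. Consider $c_{n+1} \frac{1 + \omega_{2^n}}{2}$: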

its modulus is 1, i.e. equal to that of $\omega_{2^{n+1}}$. Since it is also collinear with $\omega_{2^{n+1}}$, and both are in the first quadrant, they are equal: $\omega_{2^{n+1}} = c_{n+1} \frac{1 + \omega_{2^n}}{2}$. Now multiply both sides by $\omega_{2^{n+1}}^{2k} = \omega_{2^n}^{k}$, which finally gives $\omega_{2^{n+1}}^{2k+1} = c_{n+1} \frac{\omega_{2^n}^{k} + \omega_{2^n}^{k+1}}{2}$.

We can compute the leaves for the next middle-heap degree (the odd powers of the primitive root of unity for the next power of two) by finding mean averages of consecutive traversed values from the middle-heap of the previous degree, and multiplying by a real-valued constant $c$. The constant can be calculated once for each size. This allows us to work out each twiddle factor with roughly one complex addition, i.e. two real-valued additions, and two (real-valued) multiplications; add the $O(1)$ time of working out a constant, for each next power-of-two count. We ignore the divide by two in computing the mean average: it is a cheap right shift of the mantissa, or (even better for loss of precision) a decrement of the floating point value's (binary) exponent. The factor 1/2 can be hidden in the constant, but below we will note a benefit if $c$ is close to 1.

Note that this calculation can be carried out independently of the middle-heap data structure; in a later section we show that the middle-heap traversal takes linear time with a small constant.

Call this the Mean-middle method. Its arithmetic complexity is better than that of any other known method of computing twiddle factors. This is a major contribution of this work. We also show that, due to its binary-chop pattern, the precision of calculating the twiddle factors in this way is also superior to other methods known so far.
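A minimal sketch of the Mean-middle method on top of an array-backed middle-heap follows. It assumes the heap is a Python list with an unused element 0 (root at index 1) and starts from $H_3 = \{-1;\ i, -i\}$; the function names `in_order` and `grow` are ours, and for simplicity the constant is obtained from the first pair of values rather than by any optimised route.

```python
import cmath

def in_order(heap, i=1):
    """Yield heap values in in-order (left subtree, node, right subtree)."""
    if i < len(heap):
        yield from in_order(heap, 2 * i)
        yield heap[i]
        yield from in_order(heap, 2 * i + 1)

def grow(heap):
    """Append the next level of leaves: the odd powers of the next primitive root."""
    # Work cyclically: 1, then the traversed values, then 1 again.
    seq = [1] + list(in_order(heap)) + [1]
    c = 2 / abs(seq[0] + seq[1])           # real constant, computed once per size
    leaves = [c * (u + v) / 2 for u, v in zip(seq, seq[1:])]   # scaled means
    return heap + leaves

h3 = [None, -1, 1j, -1j]
h7 = grow(h3)
h15 = grow(h7)
traversed = list(in_order(h15))            # ascending powers of omega_16
```

Each new leaf costs one complex addition and a scaling by a real constant, matching the operation count discussed above; traversing `h15` in-order returns $\omega_{16}^1, \dots, \omega_{16}^{15}$.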

Load-Balancing for Incrementality

Our approach is that cores specialise in parts of the FFT computation. This is intuitively expected to improve time locality/SIMD overall. There is work for four cores at least:

C1. Sampling and Data Conversion;
C2. Twiddle Factor Computation;
C3. Shuffle, or Matrix Transpose (for $p \times q$ decimation, if six-step is used, section 2.3);
C4. Butterfly Operation and any other FLOPs.

Under the premise that the computation is incremental, and progresses in parallel with the sampling process, from among the variants of the classical Cooley-Tukey algorithm, Decimation in Frequency (DIF) is a better candidate than Decimation in Time (DIT): the former allows the computation to proceed with the lower half of the samples while the upper half is not yet available.

All four cores progress in parallel, and any core except the first may have to wait for its previous one to complete a part of its work, in order to make available some data needed for its computation. It appears wise for C3 to wait on C2 before signalling the release of C4, rather than for C4 to wait on both C2 and C3; cascading synchronisation is simpler. The synchronisation mechanism suggested herein is spin-lock: the overhead is minimal, since there is no context-switching. That said, platform features are welcome, like the register files for fast switching of two threads in the PPE core of the CellBE processor.

Note that the work of C1 is not trivial: samples are normally taken in fixed-point arithmetic from Analog-to-Digital Converters (ADCs) and each needs to be converted to floating-point according to some function, in most cases but not always a linear one.
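The cascading synchronisation can be illustrated with a toy pipeline in which each stage spins until its predecessor has published the item it needs. This is only a sketch under strong assumptions: the stages are arbitrary per-item functions rather than the actual C1 to C4 workloads, the name `run_pipeline` is hypothetical, and Python threads share a global interpreter lock, so the spin loop yields with `time.sleep(0)` where a native spin-lock would busy-wait on a dedicated core.

```python
import threading
import time

def run_pipeline(samples, stages):
    """Run per-item stage functions in separate threads with cascading spin-waits."""
    n = len(samples)
    buffers = [list(samples)] + [[None] * n for _ in stages]
    progress = [n] + [0] * len(stages)    # progress[s]: items published by stage s

    def worker(s, fn):
        for k in range(n):
            while progress[s - 1] <= k:   # spin until the previous stage published item k
                time.sleep(0)             # yield; stands in for a hardware-level spin
            buffers[s][k] = fn(buffers[s - 1][k])
            progress[s] = k + 1           # publish item k, releasing the next stage

    threads = [threading.Thread(target=worker, args=(s, fn))
               for s, fn in enumerate(stages, start=1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers[-1]

# Hypothetical stages: a linear fixed-point to floating-point conversion, then an offset.
converted = run_pipeline([0, 16384, -32768], [lambda x: x / 32768.0, lambda x: x + 1.0])
```

Each stage waits only on the stage immediately before it, so the last stage is released transitively, which mirrors the cascading arrangement of C2, C3 and C4 described above.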


More information

BOOLEAN MATHEMATICS: GENERAL THEORY

BOOLEAN MATHEMATICS: GENERAL THEORY CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Computational Geometry

Computational Geometry Computatioal Geometry Chapter 4 Liear programmig Duality Smallest eclosig disk O the Ageda Liear Programmig Slides courtesy of Craig Gotsma 4. 4. Liear Programmig - Example Defie: (amout amout cosumed

More information

Counting Regions in the Plane and More 1

Counting Regions in the Plane and More 1 Coutig Regios i the Plae ad More 1 by Zvezdelia Stakova Berkeley Math Circle Itermediate I Group September 016 1. Overarchig Problem Problem 1 Regios i a Circle. The vertices of a polygos are arraged o

More information

n n B. How many subsets of C are there of cardinality n. We are selecting elements for such a

n n B. How many subsets of C are there of cardinality n. We are selecting elements for such a 4. [10] Usig a combiatorial argumet, prove that for 1: = 0 = Let A ad B be disjoit sets of cardiality each ad C = A B. How may subsets of C are there of cardiality. We are selectig elemets for such a subset

More information

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS)

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS) CSC165H1, Witer 018 Learig Objectives By the ed of this worksheet, you will: Aalyse the ruig time of fuctios cotaiig ested loops. 1. Nested loop variatios. Each of the followig fuctios takes as iput a

More information

Intermediate Statistics

Intermediate Statistics Gait Learig Guides Itermediate Statistics Data processig & display, Cetral tedecy Author: Raghu M.D. STATISTICS DATA PROCESSING AND DISPLAY Statistics is the study of data or umerical facts of differet

More information

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control

EE 459/500 HDL Based Digital Design with Programmable Logic. Lecture 13 Control and Sequencing: Hardwired and Microprogrammed Control EE 459/500 HDL Based Digital Desig with Programmable Logic Lecture 13 Cotrol ad Sequecig: Hardwired ad Microprogrammed Cotrol Refereces: Chapter s 4,5 from textbook Chapter 7 of M.M. Mao ad C.R. Kime,

More information

Counting the Number of Minimum Roman Dominating Functions of a Graph

Counting the Number of Minimum Roman Dominating Functions of a Graph Coutig the Number of Miimum Roma Domiatig Fuctios of a Graph SHI ZHENG ad KOH KHEE MENG, Natioal Uiversity of Sigapore We provide two algorithms coutig the umber of miimum Roma domiatig fuctios of a graph

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

3D Model Retrieval Method Based on Sample Prediction

3D Model Retrieval Method Based on Sample Prediction 20 Iteratioal Coferece o Computer Commuicatio ad Maagemet Proc.of CSIT vol.5 (20) (20) IACSIT Press, Sigapore 3D Model Retrieval Method Based o Sample Predictio Qigche Zhag, Ya Tag* School of Computer

More information

Wavelet Transform. CSE 490 G Introduction to Data Compression Winter Wavelet Transformed Barbara (Enhanced) Wavelet Transformed Barbara (Actual)

Wavelet Transform. CSE 490 G Introduction to Data Compression Winter Wavelet Transformed Barbara (Enhanced) Wavelet Transformed Barbara (Actual) Wavelet Trasform CSE 49 G Itroductio to Data Compressio Witer 6 Wavelet Trasform Codig PACW Wavelet Trasform A family of atios that filters the data ito low resolutio data plus detail data high pass filter

More information

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

Heaps. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 201 Heaps 201 Goodrich ad Tamassia xkcd. http://xkcd.com/83/. Tree. Used with permissio uder

More information

CHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs

CHAPTER IV: GRAPH THEORY. Section 1: Introduction to Graphs CHAPTER IV: GRAPH THEORY Sectio : Itroductio to Graphs Sice this class is called Number-Theoretic ad Discrete Structures, it would be a crime to oly focus o umber theory regardless how woderful those topics

More information

LU Decomposition Method

LU Decomposition Method SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS LU Decompositio Method Jamie Traha, Autar Kaw, Kevi Marti Uiversity of South Florida Uited States of America kaw@eg.usf.edu http://umericalmethods.eg.usf.edu Itroductio

More information

Algorithm. Counting Sort Analysis of Algorithms

Algorithm. Counting Sort Analysis of Algorithms Algorithm Coutig Sort Aalysis of Algorithms Assumptios: records Coutig sort Each record cotais keys ad data All keys are i the rage of 1 to k Space The usorted list is stored i A, the sorted list will

More information

Big-O Analysis. Asymptotics

Big-O Analysis. Asymptotics Big-O Aalysis 1 Defiitio: Suppose that f() ad g() are oegative fuctios of. The we say that f() is O(g()) provided that there are costats C > 0 ad N > 0 such that for all > N, f() Cg(). Big-O expresses

More information

Lecture 2: Spectra of Graphs

Lecture 2: Spectra of Graphs Spectral Graph Theory ad Applicatios WS 20/202 Lecture 2: Spectra of Graphs Lecturer: Thomas Sauerwald & He Su Our goal is to use the properties of the adjacecy/laplacia matrix of graphs to first uderstad

More information

Lecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions

Lecturers: Sanjam Garg and Prasad Raghavendra Feb 21, Midterm 1 Solutions U.C. Berkeley CS170 : Algorithms Midterm 1 Solutios Lecturers: Sajam Garg ad Prasad Raghavedra Feb 1, 017 Midterm 1 Solutios 1. (4 poits) For the directed graph below, fid all the strogly coected compoets

More information

Analysis of Algorithms

Analysis of Algorithms Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Aalysis of Algorithms Iput 2015 Goodrich ad Tamassia Algorithm Aalysis of Algorithms

More information

Module 8-7: Pascal s Triangle and the Binomial Theorem

Module 8-7: Pascal s Triangle and the Binomial Theorem Module 8-7: Pascal s Triagle ad the Biomial Theorem Gregory V. Bard April 5, 017 A Note about Notatio Just to recall, all of the followig mea the same thig: ( 7 7C 4 C4 7 7C4 5 4 ad they are (all proouced

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.

Hash Tables. Presentation for use with the textbook Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015. Presetatio for use with the textbook Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Hash Tables xkcd. http://xkcd.com/221/. Radom Number. Used with permissio uder Creative

More information

Performance Plus Software Parameter Definitions

Performance Plus Software Parameter Definitions Performace Plus+ Software Parameter Defiitios/ Performace Plus Software Parameter Defiitios Chapma Techical Note-TG-5 paramete.doc ev-0-03 Performace Plus+ Software Parameter Defiitios/2 Backgroud ad Defiitios

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation

Improvement of the Orthogonal Code Convolution Capabilities Using FPGA Implementation Improvemet of the Orthogoal Code Covolutio Capabilities Usig FPGA Implemetatio Naima Kaabouch, Member, IEEE, Apara Dhirde, Member, IEEE, Saleh Faruque, Member, IEEE Departmet of Electrical Egieerig, Uiversity

More information

Ch 9.3 Geometric Sequences and Series Lessons

Ch 9.3 Geometric Sequences and Series Lessons Ch 9.3 Geometric Sequeces ad Series Lessos SKILLS OBJECTIVES Recogize a geometric sequece. Fid the geeral, th term of a geometric sequece. Evaluate a fiite geometric series. Evaluate a ifiite geometric

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

6.851: Advanced Data Structures Spring Lecture 17 April 24

6.851: Advanced Data Structures Spring Lecture 17 April 24 6.851: Advaced Data Structures Sprig 2012 Prof. Erik Demaie Lecture 17 April 24 Scribes: David Bejami(2012), Li Fei(2012), Yuzhi Zheg(2012),Morteza Zadimoghaddam(2010), Aaro Berstei(2007) 1 Overview Up

More information

SPIRAL DSP Transform Compiler:

SPIRAL DSP Transform Compiler: SPIRAL DSP Trasform Compiler: Applicatio Specific Hardware Sythesis Peter A. Milder (peter.milder@stoybroo.edu) Fraz Frachetti, James C. Hoe, ad Marus Pueschel Departmet of ECE Caregie Mello Uiversity

More information

Chapter 3. Floating Point Arithmetic

Chapter 3. Floating Point Arithmetic COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 3 Floatig Poit Arithmetic Review - Multiplicatio 0 1 1 0 = 6 multiplicad 32-bit ALU shift product right multiplier add

More information

On (K t e)-saturated Graphs

On (K t e)-saturated Graphs Noame mauscript No. (will be iserted by the editor O (K t e-saturated Graphs Jessica Fuller Roald J. Gould the date of receipt ad acceptace should be iserted later Abstract Give a graph H, we say a graph

More information

The isoperimetric problem on the hypercube

The isoperimetric problem on the hypercube The isoperimetric problem o the hypercube Prepared by: Steve Butler November 2, 2005 1 The isoperimetric problem We will cosider the -dimesioal hypercube Q Recall that the hypercube Q is a graph whose

More information

arxiv: v2 [cs.ds] 24 Mar 2018

arxiv: v2 [cs.ds] 24 Mar 2018 Similar Elemets ad Metric Labelig o Complete Graphs arxiv:1803.08037v [cs.ds] 4 Mar 018 Pedro F. Felzeszwalb Brow Uiversity Providece, RI, USA pff@brow.edu March 8, 018 We cosider a problem that ivolves

More information

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao

More information

CSE 2320 Notes 8: Sorting. (Last updated 10/3/18 7:16 PM) Idea: Take an unsorted (sub)array and partition into two subarrays such that.

CSE 2320 Notes 8: Sorting. (Last updated 10/3/18 7:16 PM) Idea: Take an unsorted (sub)array and partition into two subarrays such that. CSE Notes 8: Sortig (Last updated //8 7:6 PM) CLRS 7.-7., 9., 8.-8. 8.A. QUICKSORT Cocepts Idea: Take a usorted (sub)array ad partitio ito two subarrays such that p q r x y z x y y z Pivot Customarily,

More information

CS 683: Advanced Design and Analysis of Algorithms

CS 683: Advanced Design and Analysis of Algorithms CS 683: Advaced Desig ad Aalysis of Algorithms Lecture 6, February 1, 2008 Lecturer: Joh Hopcroft Scribes: Shaomei Wu, Etha Feldma February 7, 2008 1 Threshold for k CNF Satisfiability I the previous lecture,

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

The following algorithms have been tested as a method of converting an I.F. from 16 to 512 MHz to 31 real 16 MHz USB channels:

The following algorithms have been tested as a method of converting an I.F. from 16 to 512 MHz to 31 real 16 MHz USB channels: DBE Memo#1 MARK 5 MEMO #18 MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS 1886 November 19, 24 Telephoe: 978-692-4764 Fax: 781-981-59 To: From: Mark 5 Developmet Group

More information

Algorithms Chapter 3 Growth of Functions

Algorithms Chapter 3 Growth of Functions Algorithms Chapter 3 Growth of Fuctios Istructor: Chig Chi Li 林清池助理教授 chigchi.li@gmail.com Departmet of Computer Sciece ad Egieerig Natioal Taiwa Ocea Uiversity Outlie Asymptotic otatio Stadard otatios

More information

University of Waterloo Department of Electrical and Computer Engineering ECE 250 Algorithms and Data Structures

University of Waterloo Department of Electrical and Computer Engineering ECE 250 Algorithms and Data Structures Uiversity of Waterloo Departmet of Electrical ad Computer Egieerig ECE 250 Algorithms ad Data Structures Midterm Examiatio ( pages) Istructor: Douglas Harder February 7, 2004 7:30-9:00 Name (last, first)

More information

Recursive Procedures. How can you model the relationship between consecutive terms of a sequence?

Recursive Procedures. How can you model the relationship between consecutive terms of a sequence? 6. Recursive Procedures I Sectio 6.1, you used fuctio otatio to write a explicit formula to determie the value of ay term i a Sometimes it is easier to calculate oe term i a sequece usig the previous terms.

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

Major CSL Write your name and entry no on every sheet of the answer script. Time 2 Hrs Max Marks 70

Major CSL Write your name and entry no on every sheet of the answer script. Time 2 Hrs Max Marks 70 NOTE:. Attempt all seve questios. Major CSL 02 2. Write your ame ad etry o o every sheet of the aswer script. Time 2 Hrs Max Marks 70 Q No Q Q 2 Q 3 Q 4 Q 5 Q 6 Q 7 Total MM 6 2 4 0 8 4 6 70 Q. Write a

More information

1 Graph Sparsfication

1 Graph Sparsfication CME 305: Discrete Mathematics ad Algorithms 1 Graph Sparsficatio I this sectio we discuss the approximatio of a graph G(V, E) by a sparse graph H(V, F ) o the same vertex set. I particular, we cosider

More information

top() Applications of Stacks

top() Applications of Stacks CS22 Algorithms ad Data Structures MW :00 am - 2: pm, MSEC 0 Istructor: Xiao Qi Lecture 6: Stacks ad Queues Aoucemets Quiz results Homework 2 is available Due o September 29 th, 2004 www.cs.mt.edu~xqicoursescs22

More information

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1 COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,

More information