
An edited version of this work was published in IEEE TRANS. ON CIRCUITS AND SYSTEMS II, VOL. 62, NO. 9, SEPT. 2015. DOI: 10.1109/TCSII

High-Throughput FPGA Implementation of QR Decomposition

Sergio D. Muñoz and Javier Hormigo

Abstract

This brief presents a hardware design to achieve high-throughput QR decomposition, using the Givens Rotation Method. It utilizes a new two-dimensional systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point FPGA architecture for 4 × 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared to other previous proposals for FPGA, our design achieves at least 50% more throughput and much less resource utilization.

Index Terms: QR Decomposition, systolic array, pipelined, FPGA, high-throughput, CORDIC.

I. INTRODUCTION

MOST of the advanced signal processing algorithms are based on algebraic matrix operations. Many examples of this are found in wireless communication, such as multiple-input multiple-output (MIMO), beam-forming, multi-user detection and cancellation, etc. [1]. One useful operator for these matrix operations is QR factorization, especially for MIMO technologies [2], [3] and adaptive filtering [4]. Some of these applications require high-throughput QR decomposition, but only for small matrix sizes. Thus, many works have addressed the parallel hardware implementation of this operation for either ASIC or FPGA technologies. In this work, we focus on high-throughput computation for small matrices on FPGAs.

The Givens Rotation Method (and its variations) is probably the most widely used to implement QR decomposition in hardware, due to its robust numerical properties and its easy parallelization [5]. In the literature, there are several papers in which QR factorization has been implemented on FPGA by using this method.
Although serial approaches or linear systolic arrays may be used [6], to achieve high throughput the most common hardware implementation is through two-dimensional (2-D) systolic arrays, such as in [7], [8], [], [9], [10], []. A 2-D systolic array is a parallel grid structure where processing elements (PEs) work in parallel and are locally interconnected. This systolic architecture allows the exploitation of the different degrees of parallelism inherent to the Givens Rotation algorithm. Thus, these approaches have high throughput and relatively low latency, at the cost of considerable area consumption.

In this work, by combining several ideas, we have designed a new architecture which improves on previous high-throughput FPGA implementations. It is based on the CORDIC algorithm to simplify hardware, pipelining the PEs to obtain better throughput, along with a different schedule for performing the Givens Rotations to reduce latency. As a result, the proposed architecture has very high throughput and low latency, with a relatively reduced area consumption. It also has very simple control and communication logic.

The next sections of this brief are organized as follows: Section II reviews some important aspects of the QR decomposition using Givens Rotations, along with a brief review of some previous works proposed in the literature. Section III presents the proposed architecture to achieve high throughput. In Section IV the results of the FPGA implementation are studied and compared with other previous works. Finally, Section V provides the conclusions of this work.

This work was supported in part by the Ministry of Education and Science of Spain and Junta of Andalucía under contracts TIN03-53-P and P07-TIC-0630, respectively. The authors are with the Department of Computer Architecture, Universidad de Málaga, Málaga E-907 Spain (smuoz@uma.es; fjhormigo@uma.es). Copyright © 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

II. GIVENS ALGORITHM AND PREVIOUS FPGA IMPLEMENTATIONS

Given a matrix A of size m × n, it can be expressed as the product of two factors, i.e., A = Q R, in which the m × m matrix Q is orthogonal and the m × n matrix R is upper triangular [5]. The computation of these two factors is called QR decomposition or factorization. The Givens Method achieves a QR factorization through unitary transformations, called Givens Rotations, which selectively introduce a zero element [5]. A Givens rotation matrix is a rank-two correction to the identity matrix, in which the entries associated with rows i and j are replaced by orthogonal values based on sines and cosines.

\[ \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1' \\ 0 \end{bmatrix} \tag{1} \]

As an example, a Givens rotation is represented in Eq. (1) for a two-element column, where the resultant matrix has a newly inserted zero; this can be extrapolated to any other matrix size. The rotation angle θ must be computed beforehand by the formula θ = arctan(a_2 / a_1). Alternatively, these values can also be calculated by Eq. (2) and Eq. (3).

\[ \cos(\theta) = \frac{a_{i,k}}{\sqrt{a_{i,k}^2 + a_{j,k}^2}} \tag{2} \]

\[ \sin(\theta) = \frac{a_{j,k}}{\sqrt{a_{i,k}^2 + a_{j,k}^2}} \tag{3} \]

Accordingly, the Givens Method algorithm starts zeroing the lower elements, from the first column to the last one, and, on each column, starting from the bottommost element up to the diagonal element. The upper triangular matrix R is achieved by accumulating the Givens Rotations on the initial matrix. Similarly, Q is obtained when the same rotations are applied to the identity matrix.

As an example, Fig. 1 illustrates the application of the Givens Method on a 4 × 4 matrix to achieve an upper triangular matrix R in 6 stages. Each arrow represents a Givens Rotation, where G_k(i, j) specifies the involved rows (i, j) and the column k where a zero will be inserted. The circular areas indicate the elements selected to calculate the rotation angle, whereas the squared areas delimit those elements that will be rotated using said angle. It is clearly seen that this algorithm has different levels of parallelism that could be exploited depending on the selected architecture.

Fig. 1. Usual Givens rotation schedule for 4 × 4 matrices. *G_k(i,j) = Givens Rotation of rows i and j; k is the column where a zero is inserted.

Fig. 2. Column-based 2-D systolic array for 4 × 4 matrices.

Fig. 3. Row-based 2-D systolic array for 4 × 4 matrices.

Several works, such as [10] and [7], have proposed a 2-D systolic array similar to the one shown in Fig. 2. In this architecture, each PE always works on elements of the same column. On each row of the architecture, the Givens rotations may be performed in parallel using as many PEs as there are non-zero elements in the corresponding row of the matrix. Besides this parallel computation, this configuration has the advantage of using two different types of PEs: one (V) to compute the rotation angle, which in principle requires much more complicated operations, and another (R) to perform the actual rotations, which is much simpler. Each row of PEs only needs one PE of type V, and the rest are of type R. Thus, although more PEs are needed, the number of PEs of type V (much more complex) is reduced and, therefore, the overall area may also be reduced.
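The classic Givens procedure described in this section can be sketched in software as follows. This is only a floating-point reference written for illustration, not the hardware design of this brief: the function name `givens_qr` and the use of nested Python lists are assumptions made here. It follows Eqs. (2) and (3), zeroes each column from the bottommost element up to the diagonal using adjacent rows, and applies the same rotations to the identity matrix.

```python
import math

def givens_qr(a):
    """Reference QR by Givens rotations (classic schedule, adjacent rows).

    Returns (q, r) where q is the product of all rotations applied to the
    identity matrix, so that the input satisfies A = q^T r.
    Hypothetical helper for illustration only.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if p == c else 0.0 for c in range(m)] for p in range(m)]
    for k in range(min(m - 1, n)):          # columns, first to last
        for j in range(m - 1, k, -1):       # rows, bottommost up to the diagonal
            i = j - 1                       # rotate adjacent rows (i, j)
            h = math.hypot(r[i][k], r[j][k])
            if h == 0.0:
                continue                    # element is already zero
            c, s = r[i][k] / h, r[j][k] / h  # Eq. (2) and Eq. (3)
            for mat in (r, q):              # same rotation on R and on Q
                new_i = [c * u + s * v for u, v in zip(mat[i], mat[j])]
                new_j = [-s * u + c * v for u, v in zip(mat[i], mat[j])]
                mat[i], mat[j] = new_i, new_j
    return q, r
```

For a 4 × 4 input this performs the six rotations of the usual schedule one after another; the product q^T r can then be compared with the original matrix to check the result.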
This architecture is used in [7], where standard arithmetic operations are utilized to implement the PEs. In [10], iterative CORDIC circuits are used instead, which reduces area consumption. However, in these approaches, due to data dependencies between consecutive rotations, the PEs of the last rows are idle most of the time, which means an important waste of resources. Besides this, the same data dependency prevents the use of pipelining internally in the PEs, which limits the achievable throughput.

A different and older approach is the one used in [8] and [12], which is shown in Fig. 3. In this scheme, a PE completely performs a Givens rotation for all elements of the two rows. Thus, the two operations involved in a Givens rotation have to be combined in one PE, making it more complex, although far fewer PEs are required. The main advantage of this approach is the fact that the only data dependency which prevents pipelining within the PEs is the one between the computation of the rotation angle and the rotation itself. In [12], the authors propose to interleave columns of different input matrices to overcome this dependency, but this is impractical for many applications, especially for deep pipelines. On the other hand, Square Root and Division Free Givens Rotations (SDFG) [13] are utilized in [8], where this dependency is eliminated by means of pre- and post-processing, which allows pipelining of the PEs. Thus, this architecture achieves high throughput, but the complexity of the operations involved also requires a high utilization of resources.

III. PROPOSED ARCHITECTURE

Similarly to the work in [8], we propose to use a 2-D array architecture where each PE works with all the elements of the same row and these PEs are pipelined to achieve high throughput. Yet, at the same time, we propose different connections for the PEs within the 2-D array to reduce latency. Moreover, the PEs are designed based on the CORDIC algorithm to implement this pipeline in a simpler way, which produces a system with lower area and higher throughput. Next, we present some details of this architecture.

A. Givens Rotations Schedule

The classic schedule to implement the Givens algorithm, as previously described in Fig. 1, starts by zeroing the bottommost element of the first column, and serially continues up in the same column until this column is finished. Then, the same procedure is performed over the next column and so on, until the matrix is triangular. The 2-D systolic array improves this schedule by performing several rotations at the same time. Concretely, as indicated in Fig. 3, all PEs in the same diagonal work in parallel, significantly reducing the number of steps required for one matrix computation.

However, this schedule can be performed with a higher level of parallelism if as many rows as possible are rotated simultaneously. Thereby the algorithm decreases latency by reducing its number of steps. We should note that the total number of Givens rotations remains the same, but the number of Givens rotations per step is increased. To reduce the number of steps as much as possible, on each step all suitable pairs of rows (i.e., two unselected rows that contain the same number of zeros to the left) are selected and rotated in parallel. This is repeated until an upper triangular matrix is achieved.

Fig. 4 illustrates how a 4 × 4 matrix is factorized by using this schedule. In this figure, two types of lines are used, dotted and continuous; each one represents one of the Givens rotations performed simultaneously. In the first stage, two Givens rotations are performed concurrently on the adjacent rows (1, 2) and (3, 4). This inserts two zeros, in rows 2 and 4, as shown in the second stage. Next, rotations G_1(1, 3) and G_2(2, 4) are calculated, finishing the computation on the first column. Then, the first row will not be used again, restricting the algorithm to only one rotation per stage for the last two stages. Therefore, only four stages are required using this schedule.

Fig. 4. Givens rotations scheduled for increasing parallelism. *G_k(i,j) = Givens Rotation of rows i and j; k is the column where a zero is inserted.

B. CORDIC-based processing elements

CORDIC (COordinate Rotation DIgital Computer) is an iterative algorithm based on shifts and additions which allows calculating many elementary functions with very simple hardware [14].
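As a brief aside before the processing element is detailed, the pairing rule of Section III-A (rotate together any two unselected rows with the same number of zeros to the left) can be sketched in a few lines. The function name `parallel_givens_schedule`, the greedy top-to-bottom pairing inside each group, and the restriction to square matrices are assumptions made for this illustration; the text only fixes the pairing criterion.

```python
def parallel_givens_schedule(n):
    """Greedy parallel Givens schedule for an n x n matrix.

    Returns a list of stages; each stage is a list of (k, i, j) tuples
    meaning "rotate rows i and j, inserting a zero in column k of row j"
    (all indices 1-based). Hypothetical helper for illustration only.
    """
    zeros = [0] * n                 # leading zeros currently held by each row
    target = list(range(n))         # row r needs r leading zeros when triangular
    stages = []
    while zeros != target:
        groups = {}
        for row, z in enumerate(zeros):
            groups.setdefault(z, []).append(row)
        stage = []
        for z, rows in sorted(groups.items()):
            # pair rows with equal leading zeros; an odd row out waits a stage
            for upper, lower in zip(rows[0::2], rows[1::2]):
                stage.append((z + 1, upper + 1, lower + 1))
        for _, _, lower in stage:
            zeros[lower - 1] += 1   # the lower row of each pair gains a zero
        stages.append(stage)
    return stages

print(parallel_givens_schedule(4))
# [[(1, 1, 2), (1, 3, 4)], [(1, 1, 3), (2, 2, 4)], [(2, 2, 3)], [(3, 3, 4)]]
```

For n = 4 this yields four stages containing six rotations in total, the same rotation count as the classic six-stage schedule but with two rotations in each of the first two stages.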
The same CORDIC circuit may operate in two modes, vectoring or rotation. The former rotates an input vector (X, Y) until its Y coordinate reaches zero, returning the angle θ and the rotated X coordinate. The latter rotates an input vector (X, Y) by a given angle θ. Therefore, a CORDIC unit could be used to compute the angle for a Givens Rotation (vectoring mode) and then perform the rotation through the rest of the row using said angle (rotation mode).

Many different circuits based on CORDIC have been proposed in the literature to perform Givens Rotations. To achieve our goal, we have selected the one used to implement a linear systolic array in [15]. It is a pipelined architecture which performs both vectoring and rotation modes. Due to the data dependency of the linear array, the circuit presented in [15] needs matrix interleaving to take advantage of the pipelined design. However, within our row-based 2-D systolic array, the pipeline is used in a natural way. This approach replaces the computation of the rotation angle θ by the direction of each micro-rotation. This direction is indicated by the sign of Y on each stage and it is stored in a register (σ) to be used in subsequent rotations. Thus, this circuit allows overlapping the computation of the angle with the rotation of the corresponding rows.

Fig. 5. CORDIC-based processing element.

Fig. 5 shows this circuit divided into two sections. The right section is the operation section, which contains the typical CORDIC x-y data-path. The left section represents the control hardware which, in vectoring mode, selects the rotation direction and updates the σ registers. These registers configure the adders/subtractors in rotation mode. An active control signal indicates a new angle computation (vectoring mode) on this stage. Then, sign(Y) is used to control the adders/subtractors and it is stored in the σ register.
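The idea of storing micro-rotation directions instead of an explicit angle can be illustrated with a small software model. This is a floating-point sketch under stated assumptions, not the fixed-point pipeline of Fig. 5: the function names, the 180-degree pre-rotation used when the pivot X is negative, and the convention that each row is passed starting at its pivot column are all choices made here for illustration.

```python
import math

def cordic_vectoring(x, y, n_iter):
    """Drive y towards zero; return scaled x, residual y, and the directions."""
    flip = x < 0                      # assumed pre-rotation so that x >= 0
    if flip:
        x, y = -x, -y
    sigmas = []
    for i in range(n_iter):
        sigma = 1 if y < 0 else -1    # sign(Y) selects the micro-rotation
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        sigmas.append(sigma)          # plays the role of the sigma registers
    return x, y, (flip, sigmas)

def cordic_rotation(x, y, directions):
    """Replay the stored directions on another pair of elements."""
    flip, sigmas = directions
    if flip:
        x, y = -x, -y
    for i, sigma in enumerate(sigmas):
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
    return x, y

def cordic_gain(n_iter):
    """Constant scale factor accumulated by n_iter micro-rotations."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n_iter))

def cordic_givens(row_i, row_j, n_iter=16):
    """Givens rotation of two rows; the first elements form the pivot pair."""
    k_inv = 1.0 / cordic_gain(n_iter)          # scale-factor compensation
    x, y, directions = cordic_vectoring(row_i[0], row_j[0], n_iter)
    out_i, out_j = [x * k_inv], [y * k_inv]
    for u, v in zip(row_i[1:], row_j[1:]):
        x, y = cordic_rotation(u, v, directions)
        out_i.append(x * k_inv)
        out_j.append(y * k_inv)
    return out_i, out_j
```

Calling `cordic_givens([3.0, 1.0, 2.0], [4.0, -1.0, 0.5])` returns rows close to (5, -0.2, 1.6) and (0, -1.4, -1.3), up to the angle resolution of the chosen number of iterations. In hardware the directions would be available stage by stage, which is what lets the remaining row elements follow the pivot pair through the pipeline without waiting for a complete angle.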
While the active signal goes through the pipeline, the rest of the elements of the corresponding rows are introduced into the circuit to be rotated using said stored directions (rotation mode). Therefore, both computations are overlapped and, furthermore, a new Givens rotation may be introduced into the pipeline before the current one has completely finished. The constant scale factor introduced by the CORDIC algorithm [14] is compensated by multiplying the output values by its inverse. As we will see in the next section, constant multipliers or embedded multipliers could be utilized for this operation.

C. Proposed circuit

Using the schedule described in Section III-A, the proposed architecture is derived by assigning a PE to each Givens rotation. Fig. 6 illustrates the architecture for 4 × 4 matrices. There are four stages, the first two with two PEs each, since two Givens rotations are performed in parallel, and only one PE in each of the other two. Input and output buses are connected directly from one stage to the next. Only a FIFO register is required on stage 3, since one of the rows computed in stage 2 is used in stage 4.

Fig. 6. CORDIC-based architecture implemented to factorize 4 × 4 matrices.

Not much logic is required for the synchronization of this architecture, due to its pipelined structure. The control signals of the PEs on the first stage may be set externally, or by using a counter if the flow of input matrices is regular. In the next stages, these inputs are connected to the corresponding outputs of the previous stage. In some PEs, the signal has to be delayed one extra cycle to compensate for the zero elements. In the first stage, all rows are introduced simultaneously, element by element (input matrix followed by identity matrix). Furthermore, thanks to its fully pipelined architecture, a new matrix computation could start right after the last element is introduced. Therefore, a very high throughput is achieved (for this example, one matrix computation every 8 cycles).

IV. PERFORMANCE ANALYSIS AND COMPARISON

Using the proposed architecture, a VHDL fixed-point QR decomposition core for 4 × 4 matrices has been designed. Said core allows us to configure both the bit-width and the number of CORDIC iterations.
This core has been simulated and synthesized using Xilinx ISE.3 software, and implemented and evaluated using a Virtex-6 XC6VLX240T (speed -) FPGA hardware platform. To confirm the correctness of the proposed core, first, it has been tested with a wide range of random matrices and the results have been checked using Matlab. Secondly, to improve the area and the latency of the proposed circuit, we have experimentally studied how the number of CORDIC iterations influences the error of its results. To do this, different circuits, using three word-lengths (16, 24 and 32 bits), have been implemented on our hardware platform for several numbers of iterations. Using each one of these circuits, the QR decomposition has been calculated for 50,000 random matrices, whose results have been checked by computing Q^t R and comparing it with the original matrix A.

In Table I, the maximum error detected in these comparisons is presented for each tested configuration. It is clearly observed that, at first, the maximum error decreases when the number of iterations increases, due to the better approximation achieved for the rotation angle. However, at a certain point, the error starts to increase slightly due to the accumulated rounding error. Thus, to obtain minimum error while reducing the area and the latency, the best configurations for 16, 24 and 32 bit-widths are 0, 8, 6 iterations, respectively.

TABLE I
MAXIMUM ERROR OF QR FACTORIZATION FOR 16, 24 AND 32 BIT WORD-LENGTHS DEPENDING ON THE NUMBER OF CORDIC ITERATIONS.

Word-Length 16 bits
  CORDIC-Iter:
  Latency (cycles):
  Max. Error: .e-3  5.8e-  6.9e-  7.3e-  8.9e-
Word-Length 24 bits
  CORDIC-Iter:
  Latency (cycles):
  Max. Error: 8.06e-5  6.0e-6  3.5e-6  3.7e-6  .7e-6
Word-Length 32 bits
  CORDIC-Iter:
  Latency (cycles):
  Max. Error: 3.3e-7  .e-8  9.e-9  .6e-8  .3e-8

Table II shows the implementation results for the three analyzed word-lengths, each one with three different approaches for the scale factor compensation required by the CORDIC algorithm (see Section III-B). All designs use the optimum number of iterations previously computed.
Two approaches use the embedded multipliers (DSP48E1) which typically exist in FPGAs, either non-pipelined or pipelined (Multiplier A and Multiplier B, respectively), while the third approach uses pipelined constant-coefficient multipliers (Multiplier C) designed with Xilinx Core Generator. There are no great area differences between the approaches using DSP48, but the pipelined one allows a much higher clock frequency and, consequently, much better throughput at the cost of a moderate latency increase. On the other hand, the number of slices used to implement the constant-coefficient multipliers is relatively high compared to the rest of the circuit. Therefore, although this approach achieves the same throughput as the pipelined-DSP48 one, and even less latency, it may only be worth selecting if the DSP48 blocks must be saved for other computations.

TABLE II
FPGA IMPLEMENTATION RESULTS FOR 16, 24 AND 32 BITS WORD-LENGTHS.

Device: Xilinx Virtex 6 XC6VLX240T -
Word-Length: 16 bits | 24 bits | 32 bits
CORDIC-Iterations:
Cycles/Matrix:
Multiplier A: Dedicated Non-Pipelined DSP48E1
  Latency (cycles):
  DSP48E1: (%) | (3%) | 8(6%)
  Max. Freq. (MHz):
  Slice Registers: ,0(%) | 5,888(%) | 0,8(3%)
  Slice LUTs: ,98(%) | 6,09(%) | ,337(7%)
Multiplier B: Dedicated Pipelined DSP48E1
  Latency (cycles):
  DSP48E1: (%) | (3%) | 8(6%)
  Stages Pipe. Mult.: 6
  Max. Freq. (MHz):
  Slice Registers: ,57(%) | 6,06(%) | ,50(3%)
  Slice LUTs: ,587(%) | 6,0(%) | ,5(7%)
Multiplier C: Const. Coef. Pipelined (without DSP48E1)
  Latency (cycles):
  Stages Pipe. Mult.: 3
  Max. Freq. (MHz):
  Slice Registers: 3,6(%) | 8,00(%) | 6,59(5%)
  Slice LUTs: 3,6(%) | 8,596(5%) | 6,00(0%)

Finally, to study the effectiveness of our proposal, we have selected from the literature some representative works which provide enough data to perform a reasonable comparison. Table III shows the main results of these works along with the ones for our 16-bit circuit using Multiplier A. To provide a fair comparison, we have synthesized our architecture on FPGAs equivalent to those used in those works, concretely Virtex-4 (XC4VFX60-) and Virtex-5 (XC5VTX150T-).

TABLE III
COMPARISON WITH OTHER FPGA IMPLEMENTATIONS

Work: [7] | [8] | [10] | This Work (Multiplier A)
Device: Virtex5 | Virtex | Virtex5 | Virtex, Virtex5
W-Length: 6 bits | 6 bits | 8 bits | 6 bits
Latency (cl):
Latency (µs):
Max. Freq. (MHz): 6 5
Throughput (MMatrices/sec):
Slice Reg.: 6,99 5,89 7,8,3,085
Slice LUTs: 0,899 9,80,609,085,67
DSP48: 8
Max. Error: .e-3* 6.3e-3 5.8e-
*Note. Maximum error for [7] is obtained from a factored matrix sample; a complete error study has not been done.

Regarding performance, the only design with a throughput relatively close to ours is the one in [8]. Similarly to our proposal, it uses pipelined PEs and has practically the same latency in clock cycles. However, because of its lower maximum frequency, our proposal presents about 35% less latency (in seconds) and 50% more throughput than the design in [8]. The better critical path of our design is explained mainly by the simplicity of the CORDIC architecture. On the other hand, the circuits presented in [10] and [7] have a throughput which is one order of magnitude lower than ours, since their PEs are iterative.

Regarding area, our design clearly requires several times fewer resources than the others, considering all kinds of resources. The closest one is the circuit in [10], which also uses a CORDIC-based architecture. Although it requires practically the same number of LUTs and no multipliers, its number of registers is more than three times greater. This is mainly explained by the much greater number of PEs present in that architecture and the use of carry-save arithmetic.

Taking all these results into account, we can conclude that the architecture proposed herein presents much better throughput, and much lower resource utilization, than previously proposed works. Moreover, this throughput could be doubled by using the pipelined version of DSP48 at the cost of a moderate latency increase.

V. CONCLUSION

This brief presents a fixed-point systolic architecture to achieve high-throughput QR decomposition for small matrices. This is achieved by performing as many Givens rotations as possible in parallel in an unrolled architecture, and by using a pipelined CORDIC circuit which allows completely overlapping the angle computation and the row rotations. Thus, this highly pipelined circuit completes one matrix decomposition every 8 clock cycles in the 4 × 4 case studied. The FPGA implementation of this architecture for 4 × 4 matrices has been optimized for different word-lengths by selecting the appropriate number of CORDIC iterations. Compared with previous FPGA approaches, our proposal greatly improves both performance and resource utilization.

REFERENCES

[1] K. Sarrigeorgidis and J. Rabaey, "A scalable configurable architecture for advanced wireless communication algorithms," Journal of VLSI signal processing systems for signal, image and video technology, vol. 5, no. 3, pp. 7 5, 2006.
[2] Z.-Y. Huang and P.-Y. Tsai, "Efficient implementation of QR decomposition for gigabit MIMO-OFDM systems," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 58, no. 0, pp. 53 5, Oct 0.
[3] Y. Wu, J. McAllister, and P. Wang, "High performance real-time preprocessing for fixed-complexity sphere decoder," in Global Conference on Signal and Information Processing (GlobalSIP), 03 IEEE, Dec 03, pp.
[4] S. Chan and X. Yang, "Improved approximate QR-LS algorithms for adaptive filtering," Circuits and Systems II: Express Briefs, IEEE Transactions on, vol. 5, no., pp. 9 39, Jan 00.
[5] G. H. Golub and C. F. Van Loan, Matrix Computations (3rd Ed.). Baltimore, MD, USA: Johns Hopkins University Press, 1996.
[6] K. Booyi, J. Tagapaij, and A. Boopooga, "FPGA-based hardware/software implementation for MIMO wireless communications," in Electrical Engineering Congress (ieecon), 0 International, March 0, pp..
[7] S. Aslan, S. Niu, and J. Saniie, "FPGA implementation of fast QR decomposition based on Givens rotation," in Circuits and Systems, 0 IEEE 55th International Midwest Symposium on, 0, pp.
[8] M. Abels, T. Wiegand, and S. Paul, "Efficient FPGA implementation of a high throughput systolic array QR-decomposition algorithm," in Signals, Systems and Computers (ASILOMAR), 0 Conference Record of the Forty Fifth Asilomar Conference on, Nov 0, pp.
[9] R.-H. Chang, C.-H. Lin, K.-H. Lin, C.-L. Huang, and F.-C. Chen, "Iterative QR decomposition architecture using the modified Gram-Schmidt algorithm for MIMO systems," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 57, no. 5, pp., May 00.
[10] D. Chen and M. Sima, "Fixed-point CORDIC-based QR decomposition by Givens rotations on FPGA," in Reconfigurable Computing and FPGAs (ReConFig), 0 International Conference on, Nov 0, pp.
[11] G. Prabhu, B. Johnson, and J. Rai, "FPGA based scalable fixed point QRD core using dynamic partial reconfiguration," in VLSI Design (VLSID), 05 8th International Conference on, Jan 05, pp.
[12] A. El-Amawy and K. Dharmarajan, "Parallel VLSI algorithm for stable inversion of dense matrices," Computers and Digital Techniques, IEE Proceedings E, vol. 36, no. 6, pp., Nov 1989.
[13] J. Gotze and U. Schwiegelshohn, "A square root and division free Givens rotation for solving least squares problems on systolic arrays," SIAM Journal on Scientific and Statistical Computing, vol., no., pp., Jul 99.
[14] M. D. Ercegovac and T. Lang, Digital arithmetic. Elsevier, 2003.
[15] J. Luo and C. Jong, "Scalable linear array architectures for matrix inversion using Bi-z CORDIC," Microelectronics Journal, vol. 3, no., pp. 53, 0.


More information

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits

Reversible Realization of Quaternary Decoder, Multiplexer, and Demultiplexer Circuits Egieerig Letters, :, EL Reversible Realizatio of Quaterary Decoder, Multiplexer, ad Demultiplexer Circuits Mozammel H.. Kha, Member, ENG bstract quaterary reversible circuit is more compact tha the correspodig

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

Automatic Generation of Polynomial-Basis Multipliers in GF (2 n ) using Recursive VHDL

Automatic Generation of Polynomial-Basis Multipliers in GF (2 n ) using Recursive VHDL Automatic Geeratio of Polyomial-Basis Multipliers i GF (2 ) usig Recursive VHDL J. Nelso, G. Lai, A. Teca Abstract Multiplicatio i GF (2 ) is very commoly used i the fields of cryptography ad error correctig

More information

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON Roberto Lopez ad Eugeio Oñate Iteratioal Ceter for Numerical Methods i Egieerig (CIMNE) Edificio C1, Gra Capitá s/, 08034 Barceloa, Spai ABSTRACT I this work

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations Applied Mathematical Scieces, Vol. 1, 2007, o. 25, 1203-1215 A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045, Oe

More information

Accuracy Improvement in Camera Calibration

Accuracy Improvement in Camera Calibration Accuracy Improvemet i Camera Calibratio FaJie L Qi Zag ad Reihard Klette CITR, Computer Sciece Departmet The Uiversity of Aucklad Tamaki Campus, Aucklad, New Zealad fli006, qza001@ec.aucklad.ac.z r.klette@aucklad.ac.z

More information

Pattern Recognition Systems Lab 1 Least Mean Squares

Pattern Recognition Systems Lab 1 Least Mean Squares Patter Recogitio Systems Lab 1 Least Mea Squares 1. Objectives This laboratory work itroduces the OpeCV-based framework used throughout the course. I this assigmet a lie is fitted to a set of poits usig

More information

Dynamic Programming and Curve Fitting Based Road Boundary Detection

Dynamic Programming and Curve Fitting Based Road Boundary Detection Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk

More information

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov

Sorting in Linear Time. Data Structures and Algorithms Andrei Bulatov Sortig i Liear Time Data Structures ad Algorithms Adrei Bulatov Algorithms Sortig i Liear Time 7-2 Compariso Sorts The oly test that all the algorithms we have cosidered so far is compariso The oly iformatio

More information

Cubic Polynomial Curves with a Shape Parameter

Cubic Polynomial Curves with a Shape Parameter roceedigs of the th WSEAS Iteratioal Coferece o Robotics Cotrol ad Maufacturig Techology Hagzhou Chia April -8 00 (pp5-70) Cubic olyomial Curves with a Shape arameter MO GUOLIANG ZHAO YANAN Iformatio ad

More information

Research Article Kinematics Analysis and Modeling of 6 Degree of Freedom Robotic Arm from DFROBOT on Labview

Research Article Kinematics Analysis and Modeling of 6 Degree of Freedom Robotic Arm from DFROBOT on Labview Research Joural of Applied Scieces, Egieerig ad Techology 13(7): 569-575, 2016 DOI:10.19026/rjaset.13.3016 ISSN: 2040-7459; e-issn: 2040-7467 2016 Maxwell Scietific Publicatio Corp. Submitted: May 5, 2016

More information

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis

Redundancy Allocation for Series Parallel Systems with Multiple Constraints and Sensitivity Analysis IOSR Joural of Egieerig Redudacy Allocatio for Series Parallel Systems with Multiple Costraits ad Sesitivity Aalysis S. V. Suresh Babu, D.Maheswar 2, G. Ragaath 3 Y.Viaya Kumar d G.Sakaraiah e (Mechaical

More information

The Nature of Light. Chapter 22. Geometric Optics Using a Ray Approximation. Ray Approximation

The Nature of Light. Chapter 22. Geometric Optics Using a Ray Approximation. Ray Approximation The Nature of Light Chapter Reflectio ad Refractio of Light Sectios: 5, 8 Problems: 6, 7, 4, 30, 34, 38 Particles of light are called photos Each photo has a particular eergy E = h ƒ h is Plack s costat

More information

Efficient Hardware Design for Implementation of Matrix Multiplication by using PPI-SO

Efficient Hardware Design for Implementation of Matrix Multiplication by using PPI-SO Efficiet Hardware Desig for Implemetatio of Matrix Multiplicatio by usig PPI-SO Shivagi Tiwari, Niti Meea Dept. of EC, IES College of Techology, Bhopal, Idia Assistat Professor, Dept. of EC, IES College

More information

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems Iteratioal Mathematical Forum, Vol. 8, 013, o. 7, 311-317 Secod-Order Domai Decompositio Method for Three-Dimesioal Hyperbolic Problems Youbae Ju Departmet of Applied Mathematics Kumoh Natioal Istitute

More information

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution

Multi-Threading. Hyper-, Multi-, and Simultaneous Thread Execution Multi-Threadig Hyper-, Multi-, ad Simultaeous Thread Executio 1 Performace To Date Icreasig processor performace Pipeliig. Brach predictio. Super-scalar executio. Out-of-order executio. Caches. Hyper-Threadig

More information

The Simeck Family of Lightweight Block Ciphers

The Simeck Family of Lightweight Block Ciphers The Simeck Family of Lightweight Block Ciphers Gagqiag Yag, Bo Zhu, Valeti Suder, Mark D. Aagaard, ad Guag Gog Electrical ad Computer Egieerig, Uiversity of Waterloo Sept 5, 205 Yag, Zhu, Suder, Aagaard,

More information

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb Chapter 3 Descriptive Measures Measures of Ceter (Cetral Tedecy) These measures will tell us where is the ceter of our data or where most typical value of a data set lies Mode the value that occurs most

More information

Appendix A. Use of Operators in ARPS

Appendix A. Use of Operators in ARPS A Appedix A. Use of Operators i ARPS The methodology for solvig the equatios of hydrodyamics i either differetial or itegral form usig grid-poit techiques (fiite differece, fiite volume, fiite elemet)

More information

Lower Bounds for Sorting

Lower Bounds for Sorting Liear Sortig Topics Covered: Lower Bouds for Sortig Coutig Sort Radix Sort Bucket Sort Lower Bouds for Sortig Compariso vs. o-compariso sortig Decisio tree model Worst case lower boud Compariso Sortig

More information

Math Section 2.2 Polynomial Functions

Math Section 2.2 Polynomial Functions Math 1330 - Sectio. Polyomial Fuctios Our objectives i workig with polyomial fuctios will be, first, to gather iformatio about the graph of the fuctio ad, secod, to use that iformatio to geerate a reasoably

More information

ON THE QUALITY OF AUTOMATIC RELATIVE ORIENTATION PROCEDURES

ON THE QUALITY OF AUTOMATIC RELATIVE ORIENTATION PROCEDURES ON THE QUALITY OF AUTOMATIC RELATIVE ORIENTATION PROCEDURES Thomas Läbe, Timo Dickscheid ad Wolfgag Förster Istitute of Geodesy ad Geoiformatio, Departmet of Photogrammetry, Uiversity of Bo laebe@ipb.ui-bo.de,

More information

Recursive Procedures. How can you model the relationship between consecutive terms of a sequence?

Recursive Procedures. How can you model the relationship between consecutive terms of a sequence? 6. Recursive Procedures I Sectio 6.1, you used fuctio otatio to write a explicit formula to determie the value of ay term i a Sometimes it is easier to calculate oe term i a sequece usig the previous terms.

More information

Improving Template Based Spike Detection

Improving Template Based Spike Detection Improvig Template Based Spike Detectio Kirk Smith, Member - IEEE Portlad State Uiversity petra@ee.pdx.edu Abstract Template matchig algorithms like SSE, Covolutio ad Maximum Likelihood are well kow for

More information

BASED ON ITERATIVE ERROR-CORRECTION

BASED ON ITERATIVE ERROR-CORRECTION A COHPARISO OF CRYPTAALYTIC PRICIPLES BASED O ITERATIVE ERROR-CORRECTIO Miodrag J. MihaljeviC ad Jova Dj. GoliC Istitute of Applied Mathematics ad Electroics. Belgrade School of Electrical Egieerig. Uiversity

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

A Parallel Reconfigurable Architecture for Real-Time Stereo Vision

A Parallel Reconfigurable Architecture for Real-Time Stereo Vision 2009 Iteratioal Cofereces o Embedded Software ad Systems A Parallel Recofigurable Architecture for Real-Time Stereo Visio Lei Che Yude Jia Beijig Laboratory of Itelliget Iformatio Techology, School of

More information

DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO

DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO DESIGN AND ANALYSIS OF LDPC DECODERS FOR SOFTWARE DEFINED RADIO Sagwo Seo, Trevor Mudge Advaced Computer Architecture Laboratory Uiversity of Michiga at A Arbor {swseo, tm}@umich.edu Yumig Zhu, Chaitali

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

Intro to Scientific Computing: Solutions

Intro to Scientific Computing: Solutions Itro to Scietific Computig: Solutios Dr. David M. Goulet. How may steps does it take to separate 3 objects ito groups of 4? We start with 5 objects ad apply 3 steps of the algorithm to reduce the pile

More information

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0 Polyomial Fuctios ad Models 1 Learig Objectives 1. Idetify polyomial fuctios ad their degree 2. Graph polyomial fuctios usig trasformatios 3. Idetify the real zeros of a polyomial fuctio ad their multiplicity

More information

New Fuzzy Color Clustering Algorithm Based on hsl Similarity

New Fuzzy Color Clustering Algorithm Based on hsl Similarity IFSA-EUSFLAT 009 New Fuzzy Color Clusterig Algorithm Based o hsl Similarity Vasile Ptracu Departmet of Iformatics Techology Tarom Compay Bucharest Romaia Email: patrascu.v@gmail.com Abstract I this paper

More information

A Very Simple Approach for 3-D to 2-D Mapping

A Very Simple Approach for 3-D to 2-D Mapping A Very Simple Approach for -D to -D appig Sadipa Dey (1 Ajith Abraham ( Sugata Sayal ( Sadipa Dey (1 Ashi Software Private Limited INFINITY Tower II 10 th Floor Plot No. - 4. Block GP Salt Lake Electroics

More information

The following algorithms have been tested as a method of converting an I.F. from 16 to 512 MHz to 31 real 16 MHz USB channels:

The following algorithms have been tested as a method of converting an I.F. from 16 to 512 MHz to 31 real 16 MHz USB channels: DBE Memo#1 MARK 5 MEMO #18 MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS 1886 November 19, 24 Telephoe: 978-692-4764 Fax: 781-981-59 To: From: Mark 5 Developmet Group

More information

BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM

BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM MATEC Web of Cofereces 79, 01014 (016) DOI: 10.1051/ mateccof/0167901014 T 016 BOOLEAN DIFFERENTIATION EQUATIONS APPLICABLE IN RECONFIGURABLE COMPUTATIONAL MEDIUM Staislav Shidlovskiy 1, 1 Natioal Research

More information

The Magma Database file formats

The Magma Database file formats The Magma Database file formats Adrew Gaylard, Bret Pikey, ad Mart-Mari Breedt Johaesburg, South Africa 15th May 2006 1 Summary Magma is a ope-source object database created by Chris Muller, of Kasas City,

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpeCourseWare http://ocw.mit.edu 6.854J / 18.415J Advaced Algorithms Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 18.415/6.854 Advaced Algorithms

More information

k (check node degree) and j (variable node degree)

k (check node degree) and j (variable node degree) A Parallel Turbo Decodig Message Passig Architecture for Array LDPC Codes Kira Guam, Pakaj Bhagawat, Weihuag Wag, Gwa Choi, Mark Yeary * Dept. of Electrical Egieerig, Texas A&M Uiversity, College Statio,

More information

The Counterchanged Crossed Cube Interconnection Network and Its Topology Properties

The Counterchanged Crossed Cube Interconnection Network and Its Topology Properties WSEAS TRANSACTIONS o COMMUNICATIONS Wag Xiyag The Couterchaged Crossed Cube Itercoectio Network ad Its Topology Properties WANG XINYANG School of Computer Sciece ad Egieerig South Chia Uiversity of Techology

More information

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS)

CSC165H1 Worksheet: Tutorial 8 Algorithm analysis (SOLUTIONS) CSC165H1, Witer 018 Learig Objectives By the ed of this worksheet, you will: Aalyse the ruig time of fuctios cotaiig ested loops. 1. Nested loop variatios. Each of the followig fuctios takes as iput a

More information

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana The Closest Lie to a Data Set i the Plae David Gurey Southeaster Louisiaa Uiversity Hammod, Louisiaa ABSTRACT This paper looks at three differet measures of distace betwee a lie ad a data set i the plae:

More information

Primitive polynomials selection method for pseudo-random number generator

Primitive polynomials selection method for pseudo-random number generator Joural of hysics: Coferece Series AER OEN ACCESS rimitive polyomials selectio method for pseudo-radom umber geerator To cite this article: I V Aiki ad Kh Alajjar 08 J. hys.: Cof. Ser. 944 0003 View the

More information

Lecture 18. Optimization in n dimensions

Lecture 18. Optimization in n dimensions Lecture 8 Optimizatio i dimesios Itroductio We ow cosider the problem of miimizig a sigle scalar fuctio of variables, f x, where x=[ x, x,, x ]T. The D case ca be visualized as fidig the lowest poit of

More information

Spectral leakage and windowing

Spectral leakage and windowing EEL33: Discrete-Time Sigals ad Systems Spectral leakage ad widowig. Itroductio Spectral leakage ad widowig I these otes, we itroduce the idea of widowig for reducig the effects of spectral leakage, ad

More information

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui 2d Iteratioal Coferece o Electrical, Computer Egieerig ad Electroics (ICECEE 2015) Optimizatio for framework desig of ew product itroductio maagemet system Ma Yig, Wu Hogcui Tiaji Electroic Iformatio Vocatioal

More information

Civil Engineering Computation

Civil Engineering Computation Civil Egieerig Computatio Fidig Roots of No-Liear Equatios March 14, 1945 World War II The R.A.F. first operatioal use of the Grad Slam bomb, Bielefeld, Germay. Cotets 2 Root basics Excel solver Newto-Raphso

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must

More information

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19

CIS 121 Data Structures and Algorithms with Java Spring Stacks, Queues, and Heaps Monday, February 18 / Tuesday, February 19 CIS Data Structures ad Algorithms with Java Sprig 09 Stacks, Queues, ad Heaps Moday, February 8 / Tuesday, February 9 Stacks ad Queues Recall the stack ad queue ADTs (abstract data types from lecture.

More information

Project 2.5 Improved Euler Implementation

Project 2.5 Improved Euler Implementation Project 2.5 Improved Euler Implemetatio Figure 2.5.10 i the text lists TI-85 ad BASIC programs implemetig the improved Euler method to approximate the solutio of the iitial value problem dy dx = x+ y,

More information

New HSL Distance Based Colour Clustering Algorithm

New HSL Distance Based Colour Clustering Algorithm The 4th Midwest Artificial Itelligece ad Cogitive Scieces Coferece (MAICS 03 pp 85-9 New Albay Idiaa USA April 3-4 03 New HSL Distace Based Colour Clusterig Algorithm Vasile Patrascu Departemet of Iformatics

More information

How do we evaluate algorithms?

How do we evaluate algorithms? F2 Readig referece: chapter 2 + slides Algorithm complexity Big O ad big Ω To calculate ruig time Aalysis of recursive Algorithms Next time: Litterature: slides mostly The first Algorithm desig methods:

More information

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns

K-NET bus. When several turrets are connected to the K-Bus, the structure of the system is as showns K-NET bus The K-Net bus is based o the SPI bus but it allows to addressig may differet turrets like the I 2 C bus. The K-Net is 6 a wires bus (4 for SPI wires ad 2 additioal wires for request ad ackowledge

More information

Enhancing Efficiency of Software Fault Tolerance Techniques in Satellite Motion System

Enhancing Efficiency of Software Fault Tolerance Techniques in Satellite Motion System Joural of Iformatio Systems ad Telecommuicatio, Vol. 2, No. 3, July-September 2014 173 Ehacig Efficiecy of Software Fault Tolerace Techiques i Satellite Motio System Hoda Baki Departmet of Electrical ad

More information

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 )

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 ) EE26: Digital Desig, Sprig 28 3/6/8 EE 26: Itroductio to Digital Desig Combiatioal Datapath Yao Zheg Departmet of Electrical Egieerig Uiversity of Hawaiʻi at Māoa Combiatioal Logic Blocks Multiplexer Ecoders/Decoders

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Ruig Time Most algorithms trasform iput objects ito output objects. The

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time ( 3.1) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step- by- step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c Iteratioal Coferece o Computatioal Sciece ad Egieerig (ICCSE 015) Harris Corer Detectio Algorithm at Sub-pixel Level ad Its Applicatio Yuafeg Ha a, Peijiag Che b * ad Tia Meg c School of Automobile, Liyi

More information

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram

Lecture 2. RTL Design Methodology. Transition from Pseudocode & Interface to a Corresponding Block Diagram Lecture 2 RTL Desig Methodology Trasitio from Pseudocode & Iterface to a Correspodig Block Diagram Structure of a Typical Digital Data Iputs Datapath (Executio Uit) Data Outputs System Cotrol Sigals Status

More information

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve Advaces i Computer, Sigals ad Systems (2018) 2: 19-25 Clausius Scietific Press, Caada Aalysis of Server Resource Cosumptio of Meteorological Satellite Applicatio System Based o Cotour Curve Xiagag Zhao

More information

The Penta-S: A Scalable Crossbar Network for Distributed Shared Memory Multiprocessor Systems

The Penta-S: A Scalable Crossbar Network for Distributed Shared Memory Multiprocessor Systems The Peta-S: A Scalable Crossbar Network for Distributed Shared Memory Multiprocessor Systems Abdulkarim Ayyad Departmet of Computer Egieerig, Al-Quds Uiversity, Jerusalem, P.O. Box 20002 Tel: 02-2797024,

More information

Optimal Mapped Mesh on the Circle

Optimal Mapped Mesh on the Circle Koferece ANSYS 009 Optimal Mapped Mesh o the Circle doc. Ig. Jaroslav Štigler, Ph.D. Bro Uiversity of Techology, aculty of Mechaical gieerig, ergy Istitut, Abstract: This paper brigs out some ideas ad

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only

Bezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only Edited: Yeh-Liag Hsu (998--; recommeded: Yeh-Liag Hsu (--9; last updated: Yeh-Liag Hsu (9--7. Note: This is the course material for ME55 Geometric modelig ad computer graphics, Yua Ze Uiversity. art of

More information

Analysis of Algorithms

Analysis of Algorithms Aalysis of Algorithms Ruig Time of a algorithm Ruig Time Upper Bouds Lower Bouds Examples Mathematical facts Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite

More information

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved.

Chapter 11. Friends, Overloaded Operators, and Arrays in Classes. Copyright 2014 Pearson Addison-Wesley. All rights reserved. Chapter 11 Frieds, Overloaded Operators, ad Arrays i Classes Copyright 2014 Pearso Addiso-Wesley. All rights reserved. Overview 11.1 Fried Fuctios 11.2 Overloadig Operators 11.3 Arrays ad Classes 11.4

More information

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme Improvig Iformatio Retrieval System Security via a Optimal Maximal Codig Scheme Dogyag Log Departmet of Computer Sciece, City Uiversity of Hog Kog, 8 Tat Chee Aveue Kowloo, Hog Kog SAR, PRC dylog@cs.cityu.edu.hk

More information

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance

Pseudocode ( 1.1) Analysis of Algorithms. Primitive Operations. Pseudocode Details. Running Time ( 1.1) Estimating performance Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Pseudocode ( 1.1) High-level descriptio of a algorithm More structured

More information

DETECTION OF LANDSLIDE BLOCK BOUNDARIES BY MEANS OF AN AFFINE COORDINATE TRANSFORMATION

DETECTION OF LANDSLIDE BLOCK BOUNDARIES BY MEANS OF AN AFFINE COORDINATE TRANSFORMATION Proceedigs, 11 th FIG Symposium o Deformatio Measuremets, Satorii, Greece, 2003. DETECTION OF LANDSLIDE BLOCK BOUNDARIES BY MEANS OF AN AFFINE COORDINATE TRANSFORMATION Michaela Haberler, Heribert Kahme

More information

T Shaped Fractal Geomerty Based Micro Strip Patch Antenna

T Shaped Fractal Geomerty Based Micro Strip Patch Antenna ISSN 2395-1621 T Shaped Fractal Geomerty Based Micro Strip Patch Atea #1 Dr. Swapil Lahudkar, #2 Satosh Sigh, #3 Akit Yadav, #4 Pooja Kad 2 sk.sigh260@gmail.com #1 Prof. Departmet of Electroics ad Telecommuicatio

More information

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem A Improved Shuffled Frog-Leapig Algorithm for Kapsack Problem Zhoufag Li, Ya Zhou, ad Peg Cheg School of Iformatio Sciece ad Egieerig Hea Uiversity of Techology ZhegZhou, Chia lzhf1978@126.com Abstract.

More information

Lip Contour Extraction Based on Support Vector Machine

Lip Contour Extraction Based on Support Vector Machine Lip Cotour Extractio Based o Support Vector Machie Author Pa, Xiaosheg, Kog, Jiagpig, Liew, Ala Wee-Chug Published 008 Coferece Title CISP 008 : Proceedigs, First Iteratioal Cogress o Image ad Sigal Processig

More information

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments

Running Time. Analysis of Algorithms. Experimental Studies. Limitations of Experiments Ruig Time Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects. The

More information

ANN WHICH COVERS MLP AND RBF

ANN WHICH COVERS MLP AND RBF ANN WHICH COVERS MLP AND RBF Josef Boští, Jaromír Kual Faculty of Nuclear Scieces ad Physical Egieerig, CTU i Prague Departmet of Software Egieerig Abstract Two basic types of artificial eural etwors Multi

More information

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming

Lecture Notes 6 Introduction to algorithm analysis CSS 501 Data Structures and Object-Oriented Programming Lecture Notes 6 Itroductio to algorithm aalysis CSS 501 Data Structures ad Object-Orieted Programmig Readig for this lecture: Carrao, Chapter 10 To be covered i this lecture: Itroductio to algorithm aalysis

More information