Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier


Michael J. Schulte, University of Wisconsin, Schulte@engr.wisc.edu
Dimitri Tan, Advanced Micro Devices, Dimitri.Tan@amd.com
Carl E. Lemonds, Advanced Micro Devices, Carl.Lemonds@amd.com

Abstract

Floating-point division is an important operation in scientific computing and multimedia applications. This paper presents and compares two division algorithms for an x86 microprocessor, which utilizes a rectangular multiplier that is optimized for multimedia applications. The proposed division algorithms are based on Goldschmidt's division algorithm and provide correctly rounded results for IEEE 754 single, double, and extended precision floating-point numbers. Compared to a previous Goldschmidt division algorithm, the fastest proposed algorithm requires 25% to 37% fewer cycles, while utilizing a multiplier that is roughly 2.5 times smaller.

1. Introduction

In an x86 microprocessor, the floating-point unit (FPU) has undergone considerable change in recent years. Much of this change is due to the advent of Streaming SIMD Extensions (SSE) [1]. These extensions, mainly driven by multimedia applications (3D graphics, video, etc.), have added complexity to recent FPU designs. Prior to the addition of SSE, the FPU in x86 microprocessors only had to support x87 scientific floating-point instructions. In x87 mode, the FPU performs arithmetic operations on 80-bit extended-precision floating-point numbers, and then rounds the results to 32-bit single, 64-bit double, or 80-bit extended precision floating-point numbers [2]. Floating-point arithmetic in x86 microprocessors complies with the specifications given in the IEEE-754 Standard for Binary Floating-Point Arithmetic [3].

With the growing importance of multimedia applications, the FPU is now required to support both x87 instructions and SSE instructions. In 1999, Intel introduced SSE instructions that perform multiple floating-point arithmetic operations on single-precision floating-point data types [1]. For example, a single SSE instruction, DIVPS, performs four single-precision floating-point divide operations. A few years later, SSE2 introduced new instructions for parallel double-precision operations. Recently, SSE3 added horizontal arithmetic and asymmetric arithmetic operations, but no new data formats. Multimedia applications are placing a greater emphasis on SSE performance over x87. Hence, the FPU workload is shifting from engineering and scientific computing to multimedia applications.

We are designing an FPU that utilizes a 27-bit by 76-bit rectangular multiplier, in which the length of the multiplier operand is less than the length of the multiplicand operand. This reduces the area of the multiplier, but requires multiple passes through the multiplier to produce a full-precision result. Our multiplier is optimized for single-precision SSE instructions, which are widely used in multimedia applications [1, 4]. The multiplier can perform two parallel single-precision multiplies each cycle with a latency of two cycles. It can perform one double-precision multiply every other cycle with a latency of three cycles or one extended-precision multiply every three cycles with a latency of four cycles. Compared to a fully-pipelined multiplier, the rectangular multiplier improves the latency of single precision multiplies and reduces the area of the FPU. It also has the potential to reduce power dissipation for multimedia applications. In addition to performing multiplication, the rectangular multiplier is used to perform division, square root, and elementary function computations.

Due to its importance in scientific computing and multimedia applications, several algorithms for floating-point division have been developed [5]. These algorithms can be divided into three main categories: digit recurrence, very high-radix, and functional iteration. Digit recurrence algorithms, such as restoring division, non-restoring division, and SRT division, compute a fixed number of quotient bits each iteration [6]. Very high-radix division algorithms, including accurate quotient approximations [7], the short reciprocal algorithm [8, 9, 10], and prescaling and selection by rounding algorithms [11, 12], are digit recurrence algorithms that compute a large number of quotient bits (e.g., 8 or more) each iteration. Functional iteration algorithms, such as Goldschmidt's algorithm [13] and Newton-Raphson iteration [14], typically obtain an estimate of the divisor's reciprocal, and then use multiplication and subtraction to double the number of accurate quotient bits each iteration.

In this paper, we present and compare two division algorithms for an x86 microprocessor with a rectangular multiplier. These algorithms are based on Goldschmidt's division algorithm and provide support for single, double, and extended precision floating-point numbers. The algorithms are also compared to the algorithm and implementation used on the AMD-K7 FPU [15], which employs Goldschmidt's algorithm to perform division, but uses a fully pipelined multiplier. Some of our goals in developing these algorithms include (1) the algorithms should have a small impact on the architecture and performance of the multiplier, (2) they should be able to efficiently utilize the rectangular multiplier and high-speed reciprocal approximations, (3) they should have low latencies and not require unnecessary passes through the rectangular multiplier, (4) they should be optimized for single-precision numbers, but also be able to efficiently support double and extended-precision numbers, and (5) they should produce correctly rounded results, as specified in the IEEE 754 Standard for Binary Floating-Point Arithmetic.

The main contribution of this paper is the presentation of two new division algorithms that are designed to be implemented with a rectangular multiplier and provide support for x87 and SSE datatypes. The algorithms presented in this paper are based on Goldschmidt's division algorithm and are able to utilize the rectangular multiplier and high-speed reciprocal approximations. Our algorithms have low latencies, especially for single-precision numbers. Compared to very high-radix algorithms, our algorithms require fewer modifications to the multiplier architecture.
They have lower latencies than equivalent Newton-Raphson-based division algorithms, since there are fewer dependencies between multiplications.

The remainder of this paper is organized as follows: Section 2 gives an overview of Goldschmidt's division algorithm. Section 3 presents the design of a 27-bit by 76-bit rectangular multiplier that provides high-performance single-precision multiplications and is extended to implement the proposed division algorithms. Section 4 discusses a previous implementation of Goldschmidt's division algorithm on the AMD-K7 FPU, and describes our proposed division algorithms. Section 5 compares the division algorithms, and Section 6 gives our conclusions.

In the following sections, upper-case variables denote operands and lower-case variables denote bits within those operands. Individual bits are indexed by their bit position with the more significant bits having lower indices. For example, $X = x_0.x_1 x_2 \ldots x_{n-1}$ has the value:

$$V = \sum_{i=0}^{n-1} x_i \cdot 2^{-i}$$

When bits $i$ through $j$ of $X$ are accessed, we use the notation $X[i:j]$, where $X[i:j] = x_i x_{i+1} \ldots x_{j-1} x_j$ for $i < j$.

2. Goldschmidt's division algorithm

Goldschmidt's division algorithm is also known as division by multiplicative normalization, division by convergence, and division by series expansion. It has been implemented in the IBM 360/91 [16], the TMS390C602A [17], the IBM S/390 G4 [18], and the AMD-K7 microprocessor [15]. Various publications describe Goldschmidt's division algorithm [19, 20, 21], its error analysis [22], and its implementation using pipelined multipliers [23, 24].

Goldschmidt's division algorithm computes the quotient $Q = A/B$ by starting with an initial approximation to the divisor's reciprocal, $X_0 \approx 1/B$. It then multiplies $X_0$ by the dividend, $A$, and divisor, $B$, to obtain:

$$N_0 = A \cdot X_0 \tag{1}$$
$$D_0 = B \cdot X_0 \tag{2}$$
$$R_0 = 2 - D_0 \tag{3}$$

After this, $m$ iterations are performed, where:

$$N_{i+1} = R_i \cdot N_i \tag{4}$$
$$D_{i+1} = R_i \cdot D_i \tag{5}$$
$$R_{i+1} = 2 - D_{i+1} \tag{6}$$

Finally, $N_m$ is multiplied by $R_m$ to obtain $Q$. Each iteration requires two multiplications and one subtraction (or complement operation) and approximately doubles the number of accurate bits.
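The recurrence in Equations (1) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses exact rational arithmetic, so no rounding or truncation is modeled, and the function name `goldschmidt_divide`, the operand values, and the initial estimate `x0` are hypothetical choices rather than anything taken from the hardware described here.

```python
from fractions import Fraction


def goldschmidt_divide(a, b, x0, iterations):
    """Approximate a / b with Equations (1) to (6).

    x0 is an initial approximation of 1 / b. Exact rationals are used,
    so the only error modeled is the error of the initial estimate.
    """
    n = a * x0           # Equation (1): N0 = A * X0
    d = b * x0           # Equation (2): D0 = B * X0
    r = 2 - d            # Equation (3): R0 = 2 - D0
    for _ in range(iterations):
        n = r * n        # Equation (4): N(i+1) = R(i) * N(i)
        d = r * d        # Equation (5): D(i+1) = R(i) * D(i)
        r = 2 - d        # Equation (6): R(i+1) = 2 - D(i+1)
    return n * r         # final multiplication: Q = N(m) * R(m)


# Hypothetical operands: 1/b = 0.8, estimated as 13/16 = 0.8125.
a, b = Fraction(3, 2), Fraction(5, 4)
x0 = Fraction(13, 16)
for m in range(3):
    q = goldschmidt_divide(a, b, x0, m)
    relative_error = abs(q - a / b) / (a / b)
    print(m, float(relative_error))
```

With these values the product B times the estimate error is exactly 2^-6, so the relative error of the result is 2^-12 with no iterations, 2^-24 with one, and 2^-48 with two, which illustrates the doubling of accurate bits per iteration.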
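If the initial approximation, $X_0$, has an absolute error of $\varepsilon_0$ (that is, $X_0 = 1/B + \varepsilon_0$) and computations are performed without rounding error, then:

$$N_0 = A \cdot X_0 = A \left( \frac{1}{B} + \varepsilon_0 \right) = \frac{A}{B} + A\varepsilon_0 \tag{7}$$
$$D_0 = B \cdot X_0 = B \left( \frac{1}{B} + \varepsilon_0 \right) = 1 + B\varepsilon_0 \tag{8}$$
$$R_0 = 2 - D_0 = 2 - (1 + B\varepsilon_0) = 1 - B\varepsilon_0 \tag{9}$$

In the next iteration: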

$$N_1 = R_0 \cdot N_0 = (1 - B\varepsilon_0) \left( \frac{A}{B} + A\varepsilon_0 \right) = \frac{A}{B} - AB\varepsilon_0^2 \tag{10}$$
$$D_1 = R_0 \cdot D_0 = (1 - B\varepsilon_0)(1 + B\varepsilon_0) = 1 - B^2\varepsilon_0^2 \tag{11}$$
$$R_1 = 2 - D_1 = 2 - (1 - B^2\varepsilon_0^2) = 1 + B^2\varepsilon_0^2 \tag{12}$$

In general, when $N_i$ is close to $A/B$, $D_{i+1}$ and $R_{i+1}$ converge towards 1.0 and $N_{i+1}$ converges towards $A/B$. Each iteration roughly doubles the number of accurate bits in the quotient approximation, $N_i$.

Since $R_i$ is close to 1.0, not all of the bits of $R_i$ are needed to compute $N_{i+1}$ and $D_{i+1}$. If $0 \le R_i - 1 < 2^{-k}$, $R_i$ has the form $1.00\ldots0\,r_{k+1} r_{k+2} \ldots r_{n-1}$. If $-2^{-k} \le R_i - 1 < 0$, $R_i$ has the form $0.11\ldots1\,r_{k+1} r_{k+2} \ldots r_{n-1}$. Consequently, the $k$ most significant bits of $R_i$ are not needed when computing $N_{i+1}$ and $D_{i+1}$. Using the substitution $R_i' = R_i - 1$, Equations (4) to (6) can be rewritten as:

$$N_{i+1} = N_i + R_i' \cdot N_i \tag{13}$$
$$D_{i+1} = D_i + R_i' \cdot D_i \tag{14}$$
$$R_{i+1}' = 1 - D_{i+1} \tag{15}$$

Although this approach requires extra additions to implement Equations (13) and (14), it has the advantage that $R_i'$ is close to zero, which lets $R_i' \cdot N_i$ and $R_i' \cdot D_i$ be computed with less precision. Instead of computing $R_{i+1}' = 1 - D_{i+1}$ directly, hardware computes $R_{i+1}$ as the one's complement of $D_{i+1}$ and then computes:

$$N_{i+1} = N_i + N_i \cdot R_i[k:n-1] \cdot 2^{-k} = N_i + \{0^k, N_i\} \cdot R_i[k:n-1] \tag{16}$$
$$D_{i+1} = D_i + D_i \cdot R_i[k:n-1] \cdot 2^{-k} = D_i + \{0^k, D_i\} \cdot R_i[k:n-1] \tag{17}$$

Here $\{0^k, N_i\}$ denotes $N_i$ right shifted by $k$ bits. These computations multiply the appropriate bits from $R_i$ by $N_i$ or $D_i$ right shifted by $k$ bits and then add this product to $N_i$ or $D_i$, respectively.

3. Rectangular multiplier

The rectangular floating-point multiplier used to implement our proposed division algorithms has two pipeline stages, as shown in Figure 1. The first stage, E1, consists of a 27-bit by 76-bit tree multiplier that accepts the two numbers to be multiplied, along with a 76-bit feedback term in carry-save format, and produces a 103-bit product in carry-save format. The second stage, E2, consists of combined addition and rounding, result multiplexing, and forwarding to the register file and bypass networks.

The multiplier supports a range of precisions with wider precision multiplies achieved by multiple passes through the first stage, E1. It supports operations on single precision numbers with 24-bit significands, double precision numbers with 53-bit significands, and extended precision numbers with 64-bit significands. Similar to the AMD-K7 multiplier design [15], our multiplier also provides a variety of other multiplication sizes to facilitate accurate division, square root, and elementary function computations. The multiplication sizes supported include 24x24, 25x24, 27x76, 53x53, 54x53, 54x76, 64x64, 68x68 and 76x76. The multiplier also performs two single precision (dual 24x24) multiplications in parallel, which is frequently used in multimedia applications.

Figure 1. 27-bit by 76-bit multiplier

For each pass through the multiplier, the appropriate 27 bits of the multiplier operand are selected by the Unpack/Align Multiplexers. Two sets of radix-4 Booth encoders are required to support the dual 24x24 multiply. The Booth multiplexers produce fourteen partial products, which are reduced, along with the two 76-bit feedback terms, using a partial product reduction tree implemented using three levels of 4-2 compressors. For the first pass, the feedback terms are all zeros. For subsequent passes, the feedback terms are obtained from the upper 76 bits of the carry-save product from the previous pass.

The rounding scheme implemented in the second stage, E2, involves adding rounding constants to the carry-save product using 3-2 carry-save adders (CSAs) prior to the final addition [15]. The rounding is performed prior to normalization using two additions, with one addition assuming rounding overflow occurs and one addition assuming rounding overflow does not occur. A third addition computes the un-rounded significand [15]. An appropriate rounding constant is provided for each of the first two additions and is omitted for the un-rounded significand. Since the product generation for wider precision multiplies is split over multiple cycles, the lower 27 bits are processed after each pass to compute the sticky bit and the carry-in for the next pass. Table 1 shows the multiplier passes, latencies, and throughputs for supported multiplication sizes.

Table 1. Multiplier passes, latencies, and throughput for supported multiplication sizes

Multiplication sizes      Multiplier passes   Latency (cycles)   Throughput (mults/cycle)
Dual 24x24                1                   2                  2
24x24, 25x24, 27x76       1                   2                  1
53x53, 54x53, 54x76       2                   3                  1/2
64x64, 68x68, 76x76       3                   4                  1/3

4. Floating-point division algorithms

The division algorithms presented in this paper are derived from the AMD-K7 Goldschmidt division algorithm [15], which was designed for a fully-pipelined 76-bit by 76-bit multiplier. This section gives an overview of the AMD-K7 division algorithm [15]. It then presents our variations of Goldschmidt's division algorithm that are designed for an x86 microprocessor with the 76-bit by 27-bit rectangular multiplier presented in Section 3. The algorithms can be modified for other multiplier sizes.

Figure 2 shows the version of Goldschmidt's division algorithm implemented on the AMD-K7 and presented in [15]. This division algorithm only supports extended precision input operands with results rounded to single, double, extended, or internal precision. In Figure 2, A and B are the input operands. PC is the significand precision control, where PC is 24 for single precision, 53 for double precision, 64 for extended precision, and 68 for internal precision.
Division with an internal precision of 68 bits is used to compute certain elementary functions. RC is the rounding control, which indicates if the final result is rounded to nearest even, toward zero, toward minus infinity, or toward plus infinity. Qi is the initial quotient approximation and Qf is the final correctly rounded quotient. REM is a 2-bit variable that indicates the sign of the remainder and if the remainder is zero. The cycles shown on the right assume that the initial reciprocal estimate takes three cycles and each multiplication takes four cycles [15]. The division algorithm takes 16 cycles for single precision (PC = 24), 20 cycles for double precision (PC = 53), and 24 cycles for extended and internal precision (PC = 64 and 68, respectively).

Program: Goldschmidt's Division Algorithm in the AMD-K7 with a 76 by 76 Multiplier [15]
Input = (A, B, PC, RC), Output = (Qf)

    Operations                                                     Cycles
    X0 = recip_estimate(B)                                         1-3
    D0 = itermul_76x76(X0, B), R0 = comp(D0)                       4-7
    N0 = itermul_76x76(X0, A)                                      5-8
    if (PC == 24) { Nf = N0, Rf = R0, goto END DIVISION }
    D1 = itermul_76x76(D0, R0), R1 = comp(D1)                      8-11
    N1 = itermul_76x76(N0, R0)                                     9-12
    if (PC == 53) { Nf = N1, Rf = R1, goto END DIVISION }
    D2 = itermul_76x76(D1, R1), R2 = comp(D2)                      12-15
    N2 = itermul_76x76(N1, R1)                                     13-16
    Nf = N2, Rf = R2
    END DIVISION:
    Qi = lastmul_76x76(Nf, Rf, PC+1)                               See +
    REM = backmul_76x76(Qi, B, A), Qf = round(Qi, REM, PC, RC)     See *

    + 9-12 (PC = 24), 13-16 (PC = 53), 17-20 (PC = 64/68)
    * 13-16 (PC = 24), 17-20 (PC = 53), 21-24 (PC = 64/68)

Figure 2: Goldschmidt's algorithm in the AMD-K7

The algorithm shown in Figure 2 includes several operations, which are discussed in detail by Oberman [15]. The recip_estimate operation uses 2^10-entry by 16-bit and 2^10-entry by 7-bit bipartite tables to provide a reciprocal estimate that is accurate to at least 14.94 bits [15, 25]. The itermul_76x76 operation corresponds to a 76-bit by 76-bit multiply in which the result is rounded to 76 bits using round-to-nearest-even. The comp operation produces the one's complement of D, which is a 76-bit value.
The lastmul_76x76 operation is a 76-bit by 76-bit multiply, which rounds its result to PC+1 bits of precision using round-to-nearest-even. PC+1 bits of precision are required in order to implement the AMD-K7 rounding technique [15]. The backmul_76x76 operation performs a 76-bit by 76-bit multiplication of Qi and B and subtracts A to determine the sign of the remainder and if the remainder is equal to zero. The round operation produces the correctly rounded quotient using the AMD-K7 rounding technique [15].

To more efficiently implement Goldschmidt's division algorithm with a rectangular multiplier, our first version of Goldschmidt's algorithm (GS-1) uses a truncated version of R, in which the required precision of R is determined from a detailed error analysis. This analysis indicates that correctly rounded results are still produced when R0 is truncated to a little over 27 bits and R1 is truncated to a little over 54 bits. Since the truncated R0 must be longer than 27 bits, it needs two passes through the 27-bit by 76-bit multiplier, so R0 is instead truncated to 54 bits. Similarly, since the truncated R1 is longer than 54 bits, it needs three passes through the multiplier, so all 76 bits are used.

Program: Goldschmidt's Division Algorithm with Truncated R on a 27 x 76 Multiplier (GS-1)
Input = (A, B, OT, PC, RC), Output = (Qf)

    Operations                                                     Cycles
    X0 = recip_estimate(B)                                         1-3
    D0 = itermul_27x76(X0, B), R0 = comp(D0)                       4-5
    N0 = itermul_27x76(X0, A)                                      5-6
    if (OT == SINGLE) {
        Qi = lastmul_54x76(R0[0:53], N0, 25)                       7-9
        REM = backmul_25x24(Qi, B, A), Qf = round(Qi, REM, 24, RC) 10-11
        goto END DIVISION }
    if (OT == x87 and PC == 24) goto x87 DIV
    D1 = itermul_54x76(R0[0:53], D0), R1 = comp(D1)                6-8
    N1 = itermul_54x76(R0[0:53], N0)                               8-10
    if (OT == DOUBLE) {
        Qi = lastmul_76x76(R1, N1, 54)                             11-14
        REM = backmul_54x53(Qi, B, A), Qf = round(Qi, REM, 53, RC) 15-17
        goto END DIVISION }
    x87 DIV:
    if (PC == 24)
        Qi = lastmul_54x76(R0[0:53], N0, 25)                       7-9
    else if (PC == 53)
        Qi = lastmul_76x76(R1, N1, 54)                             11-14
    else {
        D2 = itermul_76x76(R1, D1), R2 = comp(D2)                  11-14
        N2 = itermul_76x76(R1, N1)                                 14-17
        Qi = lastmul_76x76(R2, N2, PC+1) }                         18-21
    REM = backmul_76x76(Qi, B, A), Qf = round(Qi, REM, PC, RC)     See *
    END DIVISION:

    * 10-13 (PC = 24), 15-18 (PC = 53), 22-25 (PC = 64/68)

Figure 3: Goldschmidt's algorithm with truncated R on a 27 x 76 multiplier (GS-1)

Utilizing a truncated version of R allows some of the multiplications to be performed with fewer passes through the rectangular multiplier.
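The link between the width of the truncated R operand and the number of passes can be made concrete with a small sketch. It assumes that the pass count is simply the multiplier-operand width divided by 27 and rounded up, which is consistent with the sizes listed in Table 1 but is not stated as a formula in the text; the names `MULTIPLIER_WIDTH` and `multiplier_passes` are hypothetical.

```python
import math

# Width of the shorter (multiplier) operand of the 27-bit by 76-bit array.
MULTIPLIER_WIDTH = 27


def multiplier_passes(multiplier_bits):
    """Passes through the first stage for a multiplier operand of this width.

    Assumption: each pass consumes up to 27 bits of the multiplier operand.
    """
    return math.ceil(multiplier_bits / MULTIPLIER_WIDTH)


for bits in (24, 27, 28, 54, 55, 76):
    print(bits, multiplier_passes(bits))
```

Under this assumption, any operand wider than 27 bits already costs two passes, so widening it to the full 54 bits adds no passes. The same holds for operands wider than 54 bits and the full 76 bits, which matches the choice of 54-bit and 76-bit operands in GS-1.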
The GS-1 algorithm also examines the operand type, OT, since SSE requires support for single and double precision input operands and operations on these types of operands require fewer passes through the rectangular multiplier than extended precision operands.

Figure 3 shows the GS-1 algorithm. In this figure, the size of each multiplication is specified by the numbers after the underscore. All of the itermul_ operations truncate their results to 76 bits, and the lastmul_ operations round their results to the precision specified in the last argument using round-to-nearest. The rest of the operations have the same functionality as the corresponding operations in Figure 2, except for the size of the input operands. For example, Qi = lastmul_54x76(R0[0:53], N0, 25) indicates that the 54 most significant bits of R0 are multiplied by all 76 bits of N0. The result is rounded to 25 bits using round-to-nearest. Since R0[0:53] is 54 bits, this multiplication is performed with two passes through the rectangular multiplier.

For single precision operands (OT = SINGLE), all of the multiplications, except for lastmul_54x76, require only a single pass through the multiplier tree and the division has a latency of 11 cycles. For double precision operands, the multiplications require one to three passes through the multiplier tree and the division has a latency of 17 cycles. For x87 operands, the latency depends on the required precision of the final result and is 13 cycles for single precision, 18 cycles for double precision, and 25 cycles for extended or internal precision.

Our second version of Goldschmidt's algorithm (GS-2), shown in Figure 4, uses a truncated version of R and takes advantage of the fact that R is close to 1.0 to reduce the number of bits in R used for the iterative multiplications and reduce the number of passes through the multiplier. For example, since |R0 - 1| < 2^-13, the thirteen most significant bits of R0 are not needed.
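The statement that the leading bits of R carry no information can be checked numerically. The sketch below uses exact rationals and two hypothetical values of R within 2^-13 of 1.0; the helper `leading_bits` is an illustrative name, and it follows the bit-indexing convention of this paper (bit 0 has weight 2^0). It also checks the identity behind Equation (13), namely that multiplying by R equals adding the product with R - 1.

```python
from fractions import Fraction


def leading_bits(value, count):
    """Return bits x0 x1 ... x(count-1) of a value in [0, 2) as a string."""
    bits = []
    for _ in range(count):
        bit = int(value >= 1)        # current bit has weight 2^0
        bits.append(str(bit))
        value = (value - bit) * 2    # shift the next bit into position
    return "".join(bits)


# Hypothetical R values with |R - 1| < 2^-13, one above and one below 1.0.
r_above = 1 + Fraction(1, 2**14) + Fraction(1, 2**20)
r_below = 1 - Fraction(1, 2**14)

print(leading_bits(r_above, 13))  # 1000000000000
print(leading_bits(r_below, 13))  # 0111111111111

# Equation (13): R * N equals N + (R - 1) * N for any N.
n = Fraction(6, 5)
assert r_below * n == n + (r_below - 1) * n
```

In both cases the thirteen leading bits are a fixed pattern, so only the lower-order bits of R need to enter the multiplier, provided the un-shifted N or D is added back to the product.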
Based on Equation (17), this allows the computation

    D1 = itermul_54x76(R0[0:53], D0)                               (18)

which requires two passes through the multiplier tree in GS-1, to be replaced by the computation

    D1 = itermuladd_27x76(R0[13:39], D0, 13)                       (19)

which corresponds to

$$D_1 = D_0 + D_0 \cdot R_0[13:39] \cdot 2^{-13} = D_0 + \{0^{13}, D_0\} \cdot R_0[13:39] \tag{20}$$

This operation requires only a single pass through the multiplier with D0 right shifted by 13 bits, the lower 13 bits of D0 truncated, and the un-shifted value of D0 added to the product. This operation compensates for the fact that R0 - 1 is used instead of R0, as described in Section 2. Similar optimizations are used throughout the algorithm to reduce the number of passes through the multiplier and the latency of the division algorithm. The operations that use these types of optimizations are itermuladd_ and lastmuladd_. They implement operand shifting, multiplication, and addition by using a modified version of the multiplier described in Section 3. The lastmuladd operation is similar to the itermuladd operation, except that the result is rounded to the number of bits specified by its last argument using round-to-nearest.

Program: Goldschmidt's Division Algorithm with Reduced R on a 27 x 76 Multiplier (GS-2)
Input = (A, B, OT, PC, RC), Output = (Qf)

    Operations                                                     Cycles
    X0 = recip_estimate(B)                                         1-3
    D0 = itermul_27x76(X0, B), R0 = comp(D0)                       4-5
    N0 = itermul_27x76(X0, A)                                      5-6
    if (OT == SINGLE) {
        Qi = lastmuladd_27x76(R0[13:39], N0, 13, 25)               7-8
        REM = backmul_25x24(Qi, B, A), Qf = round(Qi, REM, 24, RC) 9-10
        goto END DIVISION }
    if (OT == x87 and PC == 24) goto x87 DIV
    D1 = itermuladd_27x76(R0[13:39], D0, 13), R1 = comp(D1)        6-7
    N1 = itermuladd_27x76(R0[13:39], N0, 13)                       7-8
    if (OT == DOUBLE) {
        Qi = lastmuladd_54x76(R1[26:75], N1, 26, 54)               9-11
        REM = backmul_54x53(Qi, B, A), Qf = round(Qi, REM, 53, RC) 12-14
        goto END DIVISION }
    x87 DIV:
    if (PC == 24)
        Qi = lastmuladd_27x76(R0[13:39], N0, 13, 25)               7-8
    else if (PC == 53)
        Qi = lastmuladd_54x76(R1[26:75], N1, 26, 54)               9-11
    else {
        D2 = itermuladd_27x76(R1[26:52], D1, 26), R2 = comp(D2)    8-9
        N2 = itermuladd_27x76(R1[26:52], N1, 26)                   9-10
        Qi = lastmuladd_27x76(R2[52:75], N2, 52, PC+1) }           11-12
    REM = backmul_65x64(Qi, B, A), Qf = round(Qi, REM, PC, RC)     See *
    END DIVISION:

    * 9-12 (PC = 24), 12-15 (PC = 53), 13-16 (PC = 64/68)

Figure 4: Goldschmidt's algorithm with reduced R on a 27 x 76 multiplier (GS-2)

5. Algorithm comparison

Table 2 compares the latency in cycles for each division algorithm, based on the multiplication latencies given in Table 1. In Table 2, (S), (D), and (E) indicate results are rounded to single, double, or extended precision, respectively. For completeness, the latency of the original division algorithm [15] on the AMD-K7 microprocessor with a 76x76 multiplier is also given, and denoted as K7 (76x76). The 76x76 multiplier is roughly 2.5 times larger than our 27x76 multiplier.
Table 2 also shows the latency for the K7 division algorithm [15] when it has minor modifications to work with our rectangular multiplier. This modified algorithm is denoted as K7 (27x76).

As shown in Table 2, the two proposed algorithms have better latency than the AMD-K7 (27x76) algorithm for all operand types and precisions. The GS-2 (27x76) algorithm has the lowest overall latency for all operand types and precisions. Compared to the GS-1 (27x76) algorithm, the GS-2 (27x76) algorithm reduces the latency by one cycle for single precision, three cycles for double precision, and nine cycles for extended precision, when the input and output operands have the same precision.

Table 3 shows the number of passes through the multiplier for each division algorithm, based on the number of multiplier passes for the various multiplication sizes given in Table 1. For example, a 27x76 multiplication only requires a single pass through the multiplier and a 76x76 multiplication requires 3 passes through the multiplier. As shown in Table 3, the GS-2 algorithm has the fewest passes through the multiplier for all operand types and precisions. The number of passes through the multiplier is important since it impacts the power dissipated by the division algorithm and also indicates how available the multiplier is for implementing other operations.

Table 2: Latency of division algorithms (cycles)

Algorithm       Single   Double   x87 (S)   x87 (D)   x87 (E)
K7 (76x76)      16       20       16        20        24
K7 (27x76)      [values missing in source]
GS-1 (27x76)    11       17       13        18        25
GS-2 (27x76)    10       14       12        15        16

Table 3: Multiplier passes of division algorithms

Algorithm       Single   Double   x87 (S)   x87 (D)   x87 (E)
K7 (76x76)      [values missing in source]
K7 (27x76)      [values missing in source]
GS-1 (27x76)    [values missing in source]
GS-2 (27x76)    [values missing in source]
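The relative savings of GS-2 over the original K7 algorithm follow from simple arithmetic on the latencies. The sketch below is an illustration only: the cycle counts are the final cycle numbers read from the cycle columns of Figures 2 and 4, and pairing the SSE single and double cases with the K7 latencies for results rounded to single and double is an assumption. The names `percent_fewer_cycles` and `LATENCIES` are hypothetical.

```python
def percent_fewer_cycles(baseline, proposed):
    """Percentage reduction in latency relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# case: (K7 76x76 latency, GS-2 27x76 latency), both in cycles.
# Values are read from Figures 2 and 4; the SSE pairing is an assumption.
LATENCIES = {
    "SSE single": (16, 10),
    "SSE double": (20, 14),
    "x87 (S)": (16, 12),
    "x87 (D)": (20, 15),
    "x87 (E)": (24, 16),
}

reductions = {
    case: percent_fewer_cycles(k7, gs2) for case, (k7, gs2) in LATENCIES.items()
}
for case, pct in reductions.items():
    print(f"{case}: {pct:.1f}% fewer cycles")
print(min(reductions.values()), max(reductions.values()))
```

Under these assumptions the reductions span 25.0% to 37.5%, in line with the range quoted in the abstract.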

Compared to the K7 (27x76) algorithm, the GS-1 (27x76) algorithm has roughly the same hardware complexity, but more complex control logic to handle the different multiplication sizes. The GS-2 algorithm has the most complexity, since it has additional multiplexers to shift R, N, and D and it has modifications to the multiplier tree to perform multiply-add operations. For our implementation, the relatively small increase in hardware complexity of the GS-2 algorithm is less important than the reduced latency and passes through the rectangular multiplier.

6. Conclusions

This paper presents and compares variations of Goldschmidt's division algorithm for an x86 microprocessor that utilizes a rectangular multiplier. Of the algorithms presented in this paper, the GS-2 algorithm has the lowest latency and requires the fewest passes through the rectangular multiplier. All of the algorithms presented in this paper have been verified through extensive error analysis. The GS-2 algorithm has been modeled in Verilog and simulated using more than a million test vectors for the supported operand types and result precisions.

References

[1] S. K. Raman, V. Pentkovski, and J. Keshava, "Implementing Streaming SIMD Extensions on the Pentium III Processor," IEEE Micro, vol. 20, no. 4, July 2000.
[2] Advanced Micro Devices, AMD64 Architecture Programmer's Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions, Revision 3.07, September 2006.
[3] ANSI and IEEE, IEEE Standard for Binary Floating-Point Arithmetic, 1985.
[4] W.-C. Ma and C.-L. Yang, "Using Intel Streaming SIMD Extensions for 3D Geometry Processing," Proceedings of the 3rd IEEE Pacific-Rim Conference on Multimedia, December 2002.
[5] S. F. Oberman and M. J. Flynn, "Division Algorithms and Implementations," IEEE Transactions on Computers, vol. 46, no. 8, August 1997.
[6] M. D. Ercegovac and T. Lang, Division and Square Root: Digit-Recurrence Algorithms and Implementations, Kluwer Academic Publishers, 1994.
[7] D. Wong and M. Flynn, "Fast Division Using Accurate Quotient Approximations to Reduce the Number of Iterations," IEEE Transactions on Computers, vol. 41, no. 8, August 1992.
[8] W. S. Briggs and D. W. Matula, "A 17 x 69 Bit Multiply and Add Unit with Redundant Binary Feedback and Single Cycle Latency," Proceedings of the 11th IEEE Symposium on Computer Arithmetic, pp. 163-170, July 1993.
[9] W. S. Briggs and D. W. Matula, "Method and Apparatus for Performing Division Using a Rectangular Aspect Ratio Multiplier," U.S. Patent No. 5,046,038, 1989.
[10] W. S. Briggs and D. W. Matula, "Method and Apparatus for Performing Prescaled Division," U.S. Patent No. 5,475,630, 1995.
[11] M. D. Ercegovac, T. Lang, and P. Montuschi, "Very High Radix Division with Prescaling and Selection by Rounding," IEEE Transactions on Computers, vol. 43, no. 8, August 1994.
[12] T. Lang and P. Montuschi, "Boosting Very High Radix Division with Prescaling and Selection by Rounding," IEEE Transactions on Computers, vol. 50, no. 1, pp. 13-27, January 2001.
[13] R. E. Goldschmidt, "Applications of Division by Convergence," M.S. thesis, Dept. of Electrical Engineering, MIT, Cambridge, MA, June 1964.
[14] M. Flynn, "On Division by Functional Iteration," IEEE Transactions on Computers, vol. 19, no. 8, August 1970.
[15] S. F. Oberman, "Floating-Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor," Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pp. 106-115, 1999.
[16] S. F. Anderson, J. G. Earle, R. E. Goldschmidt, and D. M. Powers, "The IBM System/360 Model 91: Floating-Point Execution Unit," IBM Journal of Research and Development, vol. 11, January 1967.
[17] H. Darley, M. Gill, D. Earl, D. Ngo, P. Wang, M. Hipona, and J. Dodrill, "Floating Point/Integer Processor with Divide and Square Root Functions," U.S. Patent No. 4,878,190, 1989.
[18] E. M. Schwarz, L. Sigal, and T. J. McPherson, "CMOS Floating-Point Unit for the S/390 Parallel Enterprise Server G4," IBM Journal of Research and Development, vol. 41, no. 4/5, July/September 1997.
[19] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
[20] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2000.
[21] I. Koren, Computer Arithmetic Algorithms, A. K. Peters, 2002.
[22] G. Even, P.-M. Seidel, and W. E. Ferguson, "A Parametric Error Analysis of Goldschmidt's Division Algorithm," 16th IEEE Symposium on Computer Arithmetic, pp. 165-171, June 2003.
[23] G. Even and P.-M. Seidel, "Pipelined Multiplicative Division with IEEE Rounding," IEEE International Conference on Computer Design, 2003.
[24] G. Even and P.-M. Seidel, "Pipelined Multiplicative Division with IEEE Rounding," U.S. Patent No. 2004/0128338, July 2004.
[25] S. F. Oberman, "Bipartite Look-up Table with Output Values Having Minimized Absolute Error," U.S. Patent No. 6,223,192, April 2001.


Area Efficient Self Timed Adders For Low Power Applications in VLSI

Area Efficient Self Timed Adders For Low Power Applications in VLSI ISSN(Onlne): 2319-8753 ISSN (Prnt) :2347-6710 Internatonal Journal of Innovatve Research n Scence, Engneerng and Technology (An ISO 3297: 2007 Certfed Organzaton) Area Effcent Self Tmed Adders For Low

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

FPGA Based Fixed Width 4 4, 6 6, 8 8 and Bit Multipliers using Spartan-3AN

FPGA Based Fixed Width 4 4, 6 6, 8 8 and Bit Multipliers using Spartan-3AN IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.11 No.2, February 211 61 FPGA Based Fxed Wdth 4 4, 6 6, 8 8 and 12 12-Bt Multplers usng Spartan-3AN Muhammad H. Ras and Mohamed H.

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Lecture - Data Encryption Standard 4

Lecture - Data Encryption Standard 4 The Data Encrypton Standard For an encrypton algorthm we requre: secrecy of the key and not of the algorthm tself s the only thng that s needed to ensure the prvacy of the data the best cryptographc algorthms

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

CHAPTER 4 PARALLEL PREFIX ADDER

CHAPTER 4 PARALLEL PREFIX ADDER 93 CHAPTER 4 PARALLEL PREFIX ADDER 4.1 INTRODUCTION VLSI Integer adders fnd applcatons n Arthmetc and Logc Unts (ALUs), mcroprocessors and memory addressng unts. Speed of the adder often decdes the mnmum

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data

Type-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization What s a Computer Program? Descrpton of algorthms and data structures to acheve a specfc ojectve Could e done n any language, even a natural language lke Englsh Programmng language: A Standard notaton

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Specifications in 2001

Specifications in 2001 Specfcatons n 200 MISTY (updated : May 3, 2002) September 27, 200 Mtsubsh Electrc Corporaton Block Cpher Algorthm MISTY Ths document shows a complete descrpton of encrypton algorthm MISTY, whch are secret-key

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Resource Efficient Design and Implementation of Standard and Truncated Multipliers using FPGAs

Resource Efficient Design and Implementation of Standard and Truncated Multipliers using FPGAs Proceedngs of the World Congress on Engneerng 2011 Vol II, July 6-8, 2011, London, U.K. Resource Effcent Desgn and Implementaton of Standard and Truncated Multplers usng FPGAs Muhammad H. Ras, Member,

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

FPGA Implementation of CORDIC Algorithms for Sine and Cosine Generator

FPGA Implementation of CORDIC Algorithms for Sine and Cosine Generator The 5th Internatonal Conference on Electrcal Engneerng and Informatcs 25 August -, 25, Bal, Indonesa FPGA Implementaton of CORDIC Algorthms for Sne and Cosne Generator Antonus P. Renardy, Nur Ahmad, Ashbr

More information

Decomposition of Grey-Scale Morphological Structuring Elements in Hardware

Decomposition of Grey-Scale Morphological Structuring Elements in Hardware Decomposton of Grey-Scale Morpholocal Structurn Elements n Hardware I Andreads, C Fyrndes, A Gasteratos and Y Boutals Secton of Electroncs & Informaton Systems Technoloy Department of Electrcal & Computer

More information

CS1100 Introduction to Programming

CS1100 Introduction to Programming Factoral (n) Recursve Program fact(n) = n*fact(n-) CS00 Introducton to Programmng Recurson and Sortng Madhu Mutyam Department of Computer Scence and Engneerng Indan Insttute of Technology Madras nt fact

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

THE PULL-PUSH ALGORITHM REVISITED

THE PULL-PUSH ALGORITHM REVISITED THE PULL-PUSH ALGORITHM REVISITED Improvements, Computaton of Pont Denstes, and GPU Implementaton Martn Kraus Computer Graphcs & Vsualzaton Group, Technsche Unverstät München, Boltzmannstraße 3, 85748

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

THE low-density parity-check (LDPC) code is getting

THE low-density parity-check (LDPC) code is getting Implementng the NASA Deep Space LDPC Codes for Defense Applcatons Wley H. Zhao, Jeffrey P. Long 1 Abstract Selected codes from, and extended from, the NASA s deep space low-densty party-check (LDPC) codes

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Memory Modeling in ESL-RTL Equivalence Checking

Memory Modeling in ESL-RTL Equivalence Checking 11.4 Memory Modelng n ESL-RTL Equvalence Checkng Alfred Koelbl 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 koelbl@synopsys.com Jerry R. Burch 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 burch@synopsys.com

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm

Vectorization of Image Outlines Using Rational Spline and Genetic Algorithm 01 Internatonal Conference on Image, Vson and Computng (ICIVC 01) IPCSIT vol. 50 (01) (01) IACSIT Press, Sngapore DOI: 10.776/IPCSIT.01.V50.4 Vectorzaton of Image Outlnes Usng Ratonal Splne and Genetc

More information

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning

Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Solving two-person zero-sum game by Matlab

Solving two-person zero-sum game by Matlab Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

The stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0

The stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0 The stream cpher MICKEY-128 (verson 1 Algorthm specfcaton ssue 1. Steve Babbage Vodafone Group R&D, Newbury, UK steve.babbage@vodafone.com Matthew Dodd Independent consultant matthew@mdodd.net www.mdodd.net

More information

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits

Repeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Load-Balanced Anycast Routing

Load-Balanced Anycast Routing Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance

More information

APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET

APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET Jae-young Lee, Shahram Payandeh, and Ljljana Trajovć School of Engneerng Scence Smon Fraser Unversty 8888 Unversty

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Hybrid Non-Blind Color Image Watermarking

Hybrid Non-Blind Color Image Watermarking Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,

More information

APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET

APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET Jae-young Lee, Shahram Payandeh, and Ljljana Trajovć School of Engneerng Scence Smon Fraser Unversty 8888 Unversty

More information

FPGA-based implementation of circular interpolation

FPGA-based implementation of circular interpolation Avalable onlne www.jocpr.com Journal of Chemcal and Pharmaceutcal Research, 04, 6(7):585-593 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 FPGA-based mplementaton of crcular nterpolaton Mngyu Gao,

More information

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility

Evaluation of an Enhanced Scheme for High-level Nested Network Mobility IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.

More information

A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING. James Moscola, Young H. Cho, John W. Lockwood

A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING. James Moscola, Young H. Cho, John W. Lockwood A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING James Moscola, Young H. Cho, John W. Lockwood Dept. of Computer Scence and Engneerng Washngton Unversty, St. Lous, MO {jmm5,

More information

LS-TaSC Version 2.1. Willem Roux Livermore Software Technology Corporation, Livermore, CA, USA. Abstract

LS-TaSC Version 2.1. Willem Roux Livermore Software Technology Corporation, Livermore, CA, USA. Abstract 12 th Internatonal LS-DYNA Users Conference Optmzaton(1) LS-TaSC Verson 2.1 Wllem Roux Lvermore Software Technology Corporaton, Lvermore, CA, USA Abstract Ths paper gves an overvew of LS-TaSC verson 2.1,

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

CPE 628 Chapter 2 Design for Testability. Dr. Rhonda Kay Gaede UAH. UAH Chapter Introduction

CPE 628 Chapter 2 Design for Testability. Dr. Rhonda Kay Gaede UAH. UAH Chapter Introduction Chapter 2 Desgn for Testablty Dr Rhonda Kay Gaede UAH 2 Introducton Dffcultes n and the states of sequental crcuts led to provdng drect access for storage elements, whereby selected storage elements are

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Fast exponentiation via prime finite field isomorphism

Fast exponentiation via prime finite field isomorphism Alexander Rostovtsev, St Petersburg State Polytechnc Unversty rostovtsev@sslstunevaru Fast exponentaton va prme fnte feld somorphsm Rasng of the fxed element of prme order group to arbtrary degree s the

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Analysis of Min Sum Iterative Decoder using Buffer Insertion

Analysis of Min Sum Iterative Decoder using Buffer Insertion Analyss of Mn Sum Iteratve ecoder usng Buffer Inserton Saravanan Swapna M.E II year, ept of ECE SSN College of Engneerng M. Anbuselv Assstant Professor, ept of ECE SSN College of Engneerng S.Salvahanan

More information

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*

Configuration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations* Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Rapid Development of High Performance Floating-Point Pipelines for Scientific Simulation 1

Rapid Development of High Performance Floating-Point Pipelines for Scientific Simulation 1 Rapd Development of Hgh Performance Floatng-Pont Ppelnes for Scentfc Smulaton 1 G. Lenhart, A. Kugel and R. Männer Dept. for Computer Scence V, Unversty of Mannhem, B6-26B, D-68131 Mannhem, Germany {lenhart,kugel,maenner}@t.un-mannhem.de

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

A High-Quality, Energy Optimized, Real-Time Sampling Rate Conversion Library for the StrongARM Microprocessor

A High-Quality, Energy Optimized, Real-Time Sampling Rate Conversion Library for the StrongARM Microprocessor A Hgh-Qualty, Energy Optmzed, Real-Tme Samplng Rate Converson Lbrary for the StrongARM Mcroprocessor Chung-Tse Mar 1, Mat C. Hans, Mark T. Smth, Tajana Smunc, Ronald W. Schafer 1 Moble& Meda Systems Laboratory

More information

Improving The Test Quality for Scan-based BIST Using A General Test Application Scheme

Improving The Test Quality for Scan-based BIST Using A General Test Application Scheme _ Improvng The Test Qualty for can-based BIT Usng A General Test Applcaton cheme Huan-Chh Tsa Kwang-Tng Cheng udpta Bhawmk Department of ECE Bell Laboratores Unversty of Calforna Lucent Technologes anta

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information