Floating-Point Division Algorithms for an x86 Microprocessor with a Rectangular Multiplier
Michael J. Schulte, University of Wisconsin, Schulte@engr.wisc.edu
Dimitri Tan, Advanced Micro Devices, Dimitri.Tan@amd.com
Carl E. Lemonds, Advanced Micro Devices, Carl.Lemonds@amd.com

Abstract

Floating-point division is an important operation in scientific computing and multimedia applications. This paper presents and compares two division algorithms for an x86 microprocessor, which utilizes a rectangular multiplier that is optimized for multimedia applications. The proposed division algorithms are based on Goldschmidt's division algorithm and provide correctly rounded results for IEEE 754 single, double, and extended precision floating-point numbers. Compared to a previous Goldschmidt division algorithm, the fastest proposed algorithm requires 25% to 37% fewer cycles, while utilizing a multiplier that is roughly 2.5 times smaller.

1. Introduction

In an x86 microprocessor, the floating-point unit (FPU) has undergone considerable change in recent years. Much of this change is due to the advent of Streaming SIMD Extensions (SSE) [1]. These extensions, mainly driven by multimedia applications (3D graphics, video, etc.), have added complexity to recent FPU designs. Prior to the addition of SSE, the FPU in x86 microprocessors only had to support x87 scientific floating-point instructions. In x87 mode, the FPU performs arithmetic operations on 80-bit extended-precision floating-point numbers, and then rounds the results to 32-bit single, 64-bit double, or 80-bit extended precision floating-point numbers [2]. Floating-point arithmetic in x86 microprocessors complies with the specifications given in the IEEE-754 Standard for Binary Floating-Point Arithmetic [3].

With the growing importance of multimedia applications, the FPU is now required to support both x87 instructions and SSE instructions. In 1999, Intel introduced SSE instructions that perform multiple floating-point arithmetic operations on single-precision floating-point data types [1]. For example, a single SSE instruction, DIVPS, performs four single-precision floating-point divide operations.
A few years later, SSE2 introduced new instructions for parallel double-precision operations. Recently, SSE3 added horizontal arithmetic and asymmetric arithmetic operations, but no new data formats. Multimedia applications are placing a greater emphasis on SSE performance over x87. Hence, the FPU workload is shifting from engineering and scientific computing to multimedia applications.

We are designing an FPU that utilizes a 27-bit by 76-bit rectangular multiplier, in which the length of the multiplier operand is less than the length of the multiplicand operand. This reduces the area of the multiplier, but requires multiple passes through the multiplier to produce a full-precision result. Our multiplier is optimized for single-precision SSE instructions, which are widely used in multimedia applications [1, 4]. The multiplier can perform two parallel single-precision multiplies each cycle with a latency of two cycles. It can perform one double-precision multiply every other cycle with a latency of three cycles, or one extended-precision multiply every three cycles with a latency of four cycles. Compared to a fully-pipelined multiplier, the rectangular multiplier improves the latency of single precision multiplies and reduces the area of the FPU. It also has the potential to reduce power dissipation for multimedia applications. In addition to performing multiplication, the rectangular multiplier is used to perform division, square root, and elementary function computations.

Due to its importance in scientific computing and multimedia applications, several algorithms for floating-point division have been developed [5]. These algorithms can be divided into three main categories: digit recurrence, very high-radix, and functional iteration. Digit recurrence algorithms, such as restoring division, non-restoring division, and SRT division, compute a fixed number of quotient bits each iteration [6]. Very high-radix division algorithms, including accurate quotient approximations [7], the short reciprocal algorithm [8, 9, 10], and prescaling
and selection by rounding algorithms [11, 12], are digit recurrence algorithms that compute a large number of quotient bits (e.g., 8 or more) each iteration. Functional iteration algorithms, such as Goldschmidt's algorithm [13] and Newton-Raphson iteration [14], typically obtain an estimate of the divisor's reciprocal, and then use multiplication and subtraction to double the number of accurate quotient bits each iteration.

In this paper, we present and compare two division algorithms for an x86 microprocessor with a rectangular multiplier. These algorithms are based on Goldschmidt's division algorithm and provide support for single, double, and extended precision floating-point numbers. The algorithms are also compared to the algorithm and implementation used on the AMD-K7 FPU [15], which employs Goldschmidt's algorithm to perform division, but uses a fully pipelined multiplier. Some of our goals in developing these algorithms include (1) the algorithms should have a small impact on the architecture and performance of the multiplier, (2) they should be able to efficiently utilize the rectangular multiplier and high-speed reciprocal approximations, (3) they should have low latencies and not require unnecessary passes through the rectangular multiplier, (4) they should be optimized for single-precision numbers, but also be able to efficiently support double and extended-precision numbers, and (5) they should produce correctly rounded results, as specified in the IEEE 754 Standard for Binary Floating-Point Arithmetic.

The main contribution of this paper is the presentation of two new division algorithms that are designed to be implemented with a rectangular multiplier and provide support for x87 and SSE datatypes. The algorithms presented in this paper are based on Goldschmidt's division algorithm and are able to utilize the rectangular multiplier and high-speed reciprocal approximations. Our algorithms have low latencies, especially for single-precision numbers. Compared to very high-radix algorithms, our algorithms require fewer modifications to the multiplier architecture.
They have lower latencies than equivalent Newton-Raphson-based division algorithms, since there are fewer dependencies between multiplications.

The remainder of this paper is organized as follows: Section 2 gives an overview of Goldschmidt's division algorithm. Section 3 presents the design of a 27-bit by 76-bit rectangular multiplier that provides high-performance single-precision multiplications and is extended to implement the proposed division algorithms. Section 4 discusses a previous implementation of Goldschmidt's division algorithm on the AMD-K7 FPU, and describes our proposed division algorithms. Section 5 compares the division algorithms, and Section 6 gives our conclusions.

In the following sections, upper case variables denote operands and lower-case variables denote bits within those operands. Individual bits are indexed by their bit position, with the more significant bits having lower indices. For example, $X = x_0.x_1 x_2 \ldots x_{n-1}$ has the value:

$$X = \sum_{i=0}^{n-1} x_i \cdot 2^{-i}$$

When bits $i$ through $j$ of $X$ are accessed, we use the notation $X[i:j]$, where $X[i:j] = x_i x_{i+1} \ldots x_{j-1} x_j$ for $i < j$.

2. Goldschmidt's division algorithm

Goldschmidt's division algorithm is also known as division by multiplicative normalization, division by convergence, and division by series expansion. It has been implemented in the IBM 360/91 [16], the TMS390C602A [17], the IBM S/390 G4 [18], and the AMD-K7 microprocessor [15]. Various publications describe Goldschmidt's division algorithm [19, 20, 21], its error analysis [22], and its implementation using pipelined multipliers [23, 24].

Goldschmidt's division algorithm computes the quotient $Q = A/B$ by starting with an initial approximation, $X_0$, to the divisor's reciprocal, $1/B$. It then multiplies $X_0$ by the dividend, $A$, and the divisor, $B$, to obtain:

$$N_0 = A \times X_0 \tag{1}$$
$$D_0 = B \times X_0 \tag{2}$$
$$R_0 = 2 - D_0 \tag{3}$$

After this, $m$ iterations are performed, where:

$$N_{i+1} = R_i \times N_i \tag{4}$$
$$D_{i+1} = R_i \times D_i \tag{5}$$
$$R_{i+1} = 2 - D_{i+1} \tag{6}$$

Finally, $N_m$ is multiplied by $R_m$ to obtain $Q$. Each iteration requires two multiplications and one subtraction (or complement operation) and approximately doubles the number of accurate bits.
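Equations (1) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses exact rational arithmetic (`fractions.Fraction`) rather than the truncated fixed-point hardware datapath, and the function name `goldschmidt_divide` and the sample operands are hypothetical, chosen here for the example.

```python
from fractions import Fraction


def goldschmidt_divide(a, b, x0, iterations):
    """Approximate a / b with Goldschmidt's iteration, Equations (1) to (6).

    x0 is an initial approximation of 1 / b. Exact rational arithmetic is
    assumed, so no rounding error enters the computation.
    """
    n = a * x0            # (1) N_0 = A * X_0
    d = b * x0            # (2) D_0 = B * X_0
    r = 2 - d             # (3) R_0 = 2 - D_0
    for _ in range(iterations):
        n = r * n         # (4) N_{i+1} = R_i * N_i
        d = r * d         # (5) D_{i+1} = R_i * D_i
        r = 2 - d         # (6) R_{i+1} = 2 - D_{i+1}
    return n * r          # final multiplication: Q = N_m * R_m


# Hypothetical operands: A = 3, B = 1.5, with a crude reciprocal estimate 0.625.
a, b = Fraction(3), Fraction(3, 2)
x0 = Fraction(5, 8)
for m in range(3):
    q = goldschmidt_divide(a, b, x0, m)
    print(m, float(abs(a / b - q)))
```

With these operands $D_0 = 15/16$, so the relative error of the result is squared by each additional multiplication by $R_i$, which is the "doubling of accurate bits" described above.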
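If $X_0$ has an absolute error of $\varepsilon_0$ and computations are performed without rounding error, then:

$$N_0 = A \times X_0 = A \left( \frac{1}{B} + \varepsilon_0 \right) = \frac{A}{B} + A \varepsilon_0 \tag{7}$$
$$D_0 = B \times X_0 = B \left( \frac{1}{B} + \varepsilon_0 \right) = 1 + B \varepsilon_0 \tag{8}$$
$$R_0 = 2 - D_0 = 2 - (1 + B \varepsilon_0) = 1 - B \varepsilon_0 \tag{9}$$

In the next iteration: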
$$N_1 = R_0 \times N_0 = (1 - B \varepsilon_0) \left( \frac{A}{B} + A \varepsilon_0 \right) = \frac{A}{B} - A B \varepsilon_0^2 \tag{10}$$
$$D_1 = R_0 \times D_0 = (1 - B \varepsilon_0)(1 + B \varepsilon_0) = 1 - B^2 \varepsilon_0^2 \tag{11}$$
$$R_1 = 2 - D_1 = 2 - (1 - B^2 \varepsilon_0^2) = 1 + B^2 \varepsilon_0^2 \tag{12}$$

In general, when $N_i$ is close to $A/B$, $D_{i+1}$ and $R_{i+1}$ converge towards 1.0 and $N_{i+1}$ converges towards $A/B$. Each iteration roughly doubles the number of accurate bits in the quotient approximation, $N_i$.

Since $R_i$ is close to 1.0, not all of the bits of $R_i$ are needed to compute $N_{i+1}$ and $D_{i+1}$. If $0 \le R_i - 1 < 2^{-k}$, $R_i$ has the form $1.00\ldots0\,r_{k+1} r_{k+2} \ldots r_{n-1}$. If $-2^{-k} \le R_i - 1 < 0$, $R_i$ has the form $0.11\ldots1\,r_{k+1} r_{k+2} \ldots r_{n-1}$. Consequently, the $k$ most significant bits of $R_i$ are not needed when computing $N_{i+1}$ and $D_{i+1}$. Using the substitution $R'_i = R_i - 1$, Equations (4) to (6) can be rewritten as:

$$N_{i+1} = N_i + R'_i \times N_i \tag{13}$$
$$D_{i+1} = D_i + R'_i \times D_i \tag{14}$$
$$R'_{i+1} = 1 - D_{i+1} \tag{15}$$

Although this approach requires extra additions to implement Equations (13) and (14), it has the advantage that $R'_i$ is close to zero, which lets $R'_i \times N_i$ and $R'_i \times D_i$ be computed with less precision. Instead of computing $R'_{i+1} = 1 - D_{i+1}$ directly, hardware computes $R_{i+1}$ as the one's complement of $D_{i+1}$ and then computes:

$$N_{i+1} = N_i + N_i \times R_i[k:n-1] \times 2^{-k} = N_i + \{0^k, N_i\} \times R_i[k:n-1] \tag{16}$$
$$D_{i+1} = D_i + D_i \times R_i[k:n-1] \times 2^{-k} = D_i + \{0^k, D_i\} \times R_i[k:n-1] \tag{17}$$

Here $\{0^k, N_i\}$ denotes $N_i$ with $k$ zero bits prepended, that is, $N_i$ right shifted by $k$ bits. These computations multiply the appropriate bits from $R_i$ by $N_i$ or $D_i$ right shifted by $k$ bits and then add this product to $N_i$ or $D_i$, respectively.

3. Rectangular multiplier

The rectangular floating-point multiplier used to implement our proposed division algorithms has two pipeline stages, as shown in Figure 1. The first stage, E1, consists of a 27-bit by 76-bit tree multiplier that accepts the two numbers to be multiplied, along with a 76-bit feedback term in carry-save format, and produces a 103-bit product in carry-save format. The second stage, E2, consists of combined addition and rounding, result multiplexing, and forwarding to the register file and bypass networks.

The multiplier supports a range of precisions, with wider precision multiplies achieved by multiple passes through the first stage, E1. It supports operations on single precision numbers with 24-bit significands, double precision numbers with 53-bit significands, and extended precision numbers with 64-bit significands. Similar to the AMD-K7 multiplier design [15], our multiplier also provides a variety of other multiplication sizes to facilitate accurate division, square root, and elementary function computations. The multiplication sizes supported include 24x24, 25x24, 27x76, 53x53, 54x53, 54x76, 64x64, 68x68 and 76x76. The multiplier also performs two single precision (dual 24x24) multiplications in parallel, which is frequently used in multimedia applications.

Figure 1. 27-bit by 76-bit multiplier

For each pass through the multiplier, the appropriate 27 bits of the multiplier operand are selected by the Unpack/Align Multiplexers. Two sets of radix-4 Booth encoders are required to support the dual 24x24 multiply. The Booth multiplexers produce fourteen 8-bit partial products, which are reduced, along with the two 76-bit feedback terms, using a partial product reduction tree implemented using three levels of 4-2 compressors. For the first pass, the feedback terms are all zeros. For subsequent passes, the feedback terms
are obtained from the upper 76 bits of the carry-save product from the previous pass.

The rounding scheme implemented in the second stage, E2, involves adding rounding constants to the carry-save product using 3-2 carry-save adders (CSAs) prior to the final addition [15]. The rounding is performed prior to normalization using two additions, with one addition assuming rounding overflow occurs and one addition assuming rounding overflow does not occur. A third addition computes the un-rounded significand [15]. An appropriate rounding constant is provided for each of the first two additions and is omitted for the un-rounded significand. Since the product generation for wider precision multiplies is split over multiple cycles, the lower 27 bits are processed after each pass to compute the sticky bit and the carry-in for the next pass. Table 1 shows the multiplier passes, latencies, and throughputs for supported multiplication sizes.

Table 1. Multiplier passes, latencies, and throughputs for supported multiplication sizes

Multiplication sizes     Multiplier passes   Latency (cycles)   Throughput (mults/cycle)
Dual 24x24               1                   2                  2
24x24, 25x24, 27x76      1                   2                  1
53x53, 54x53, 54x76      2                   3                  1/2
64x64, 68x68, 76x76      3                   4                  1/3

4. Floating-point division algorithms

The division algorithms presented in this paper are derived from the AMD-K7 Goldschmidt division algorithm [15], which was designed for a fully-pipelined 76-bit by 76-bit multiplier. This section gives an overview of the AMD-K7 division algorithm [15]. It then presents our variations of Goldschmidt's division algorithm that are designed for an x86 microprocessor with the 27-bit by 76-bit rectangular multiplier presented in Section 3. The algorithms can be modified for other multiplier sizes.

Figure 2 shows the version of Goldschmidt's division algorithm implemented on the AMD-K7 and presented in [15]. This division algorithm only supports extended precision input operands with results rounded to single, double, extended, or internal precision. In Figure 2, A and B are the input operands. PC is the significand precision control, where PC is 24 for single precision, 53 for double precision, 64 for extended precision, and 68 for internal precision.
Division with an internal precision of 68 bits is used to compute certain elementary functions. RC is the rounding control, which indicates if the final result is rounded to nearest even, toward zero, toward minus infinity, or toward plus infinity. Q is the initial quotient approximation and Qf is the final correctly rounded quotient. REM is a 2-bit variable that indicates the sign of the remainder and if the remainder is zero. The cycles shown on the right assume that the initial reciprocal estimate takes three cycles and each multiplication takes four cycles [15]. The division algorithm takes 16 cycles for single precision (PC = 24), 20 cycles for double precision (PC = 53), and 24 cycles for extended and internal precision (PC = 64 and 68, respectively).

    Program: Goldschmidt's Division Algorithm in the AMD-K7 with a 76 by 76 Multiplier [15]
    Input = (A, B, PC, RC), Output = (Qf)
    Operations                                                    Cycles
    X0 = recip_estimate(B)                                        1-3
    D0 = itermul_76x76(X0, B), R0 = comp(D0)                      4-7
    N0 = itermul_76x76(X0, A)                                     5-8
    if (PC == 24) {Nf = N0, Rf = R0, goto END DIVISION}
    D1 = itermul_76x76(D0, R0), R1 = comp(D1)                     8-11
    N1 = itermul_76x76(N0, R0)                                    9-12
    if (PC == 53) {Nf = N1, Rf = R1, goto END DIVISION}
    D2 = itermul_76x76(D1, R1), R2 = comp(D2)                     12-15
    N2 = itermul_76x76(N1, R1)                                    13-16
    Nf = N2, Rf = R2
    END DIVISION:
    Q = lastmul_76x76(Nf, Rf, PC+1)                               See +
    REM = backmul_76x76(Q, B, A), Qf = round(Q, REM, PC, RC)      See *
    + 9-12 (PC = 24), 13-16 (PC = 53), 17-20 (PC = 64/68)
    * 13-16 (PC = 24), 17-20 (PC = 53), 21-24 (PC = 64/68)

Figure 2: Goldschmidt's algorithm in the AMD-K7

The algorithm shown in Figure 2 includes several operations, which are discussed in detail by Oberman [15]. The recip_estimate operation uses $2^{10}$-entry by 16-bit and $2^{10}$-entry by 7-bit bipartite tables to provide a reciprocal estimate that is accurate to at least 14.94 bits [15, 25]. The itermul_76x76 operation corresponds to a 76-bit by 76-bit multiply in which the result is rounded to 76 bits using round-to-nearest-even. The comp operation produces the one's complement of Di, which is a 76-bit value.
The lastmul_76x76 operation is a 76-bit by 76-bit multiply, which rounds its result to PC+1 bits of precision using round-to-nearest-even. PC+1 bits of precision are required in order to implement the AMD-K7 rounding technique [15]. The backmul_76x76 operation performs a 76-bit by 76-bit multiplication of Q × B and subtracts A to determine the sign of the remainder and if the remainder is equal to zero. The round operation produces the correctly
rounded quotient using the AMD-K7 rounding technique [15].

To more efficiently implement Goldschmidt's division algorithm with a rectangular multiplier, our first version of Goldschmidt's algorithm (GS-1) uses a truncated version of Ri, in which the required precision of Ri is determined from a detailed error analysis. This analysis indicates correctly rounded results are still produced when R0 is truncated to 30 bits and R1 is truncated to 60 bits. Since R0 must be longer than 27 bits, it needs two passes through the 27-bit by 76-bit multiplier, so R0 is instead truncated to 54 bits. Similarly, since R1 is longer than 54 bits, it needs three passes through the multiplier, so all 76 bits are used.

    Program: Goldschmidt's Division Algorithm with Truncated R on a 27 x 76 Multiplier (GS-1)
    Input = (A, B, OT, PC, RC), Output = (Qf)
    Operations                                                        Cycles
    X0 = recip_estimate(B)                                            1-3
    D0 = itermul_27x76(X0, B), R0 = comp(D0)                          4-5
    N0 = itermul_27x76(X0, A)                                         5-6
    if (OT == SINGLE) {
      Q = lastmul_54x76(R0[0:53], N0, 25)                             7-9
      REM = backmul_25x24(Q, B, A), Qf = round(Q, REM, 24, RC) }      10-11
    if (OT == X87 and PC == 24) goto X87 DIV
    D1 = itermul_54x76(R0[0:53], D0), R1 = comp(D1)                   6-8
    N1 = itermul_54x76(R0[0:53], N0)                                  8-10
    if (OT == DOUBLE) {
      Q = lastmul_76x76(R1, N1, 54)                                   11-14
      REM = backmul_54x53(Q, B, A), Qf = round(Q, REM, 53, RC) }      15-17
    X87 DIV:
    if (PC == 24) Q = lastmul_54x76(R0[0:53], N0, 25)                 7-9
    else if (PC == 53) Q = lastmul_76x76(R1, N1, 54)                  11-14
    else {
      D2 = itermul_76x76(R1, D1), R2 = comp(D2)                       11-14
      N2 = itermul_76x76(R1, N1)                                      14-17
      Q = lastmul_76x76(R2, N2, PC+1) }                               18-21
    REM = backmul_76x76(Q, B, A), Qf = round(Q, REM, PC, RC)          See *
    END DIVISION:
    * 10-13 (PC = 24), 15-18 (PC = 53), 22-25 (PC = 64/68)

Figure 3: Goldschmidt's algorithm with truncated R on a 27 x 76 multiplier (GS-1)

Utilizing a truncated version of R allows some of the multiplications to be performed with fewer passes through the rectangular multiplier.
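The relationship between the width of the truncated R operand and the cost of a multiplication can be sketched as follows. This is an illustrative model only: it assumes each pass consumes 27 bits of the multiplier operand and that the latency is one cycle more than the number of passes, which matches the values listed in Table 1. The names `passes` and `latency` are hypothetical helpers, not part of the design.

```python
import math

MULTIPLIER_OPERAND_BITS = 27  # bits of the multiplier operand consumed per pass


def passes(multiplier_bits: int) -> int:
    """Passes through the 27x76 tree for a multiplier operand of the given width."""
    return math.ceil(multiplier_bits / MULTIPLIER_OPERAND_BITS)


def latency(multiplier_bits: int) -> int:
    """Latency in cycles, assuming one extra cycle for the add/round stage (E2)."""
    return passes(multiplier_bits) + 1


# Any operand wider than 27 bits already costs two passes, so widening it to
# 54 bits is free; likewise anything wider than 54 bits costs three passes.
for bits in (27, 28, 54, 55, 76):
    print(f"{bits}-bit multiplier operand: {passes(bits)} pass(es), {latency(bits)} cycles")
```

Under this model a truncated operand is only worth having if it drops below a multiple of 27 bits, which is why the truncation widths are rounded up to 54 and 76 bits.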
The GS-1 algorithm also examines the operand type, OT, since SSE requires support for single and double precision input operands, and operations on these types of operands require fewer passes through the rectangular multiplier than extended precision operands.

Figure 3 shows the GS-1 algorithm. In this figure, the size of each multiplication is specified by the numbers after the underscore. All of the itermul_ operations truncate their results to 76 bits, and the lastmul_ operations round their results to the precision specified in the last argument using round-to-nearest. The rest of the operations have the same functionality as the corresponding operations in Figure 2, except for the size of the input operands. For example, Q = lastmul_54x76(R0[0:53], N0, 25) indicates that the 54 most significant bits of R0 are multiplied by all 76 bits of N0. The result is rounded to 25 bits using round-to-nearest. Since R0[0:53] is 54 bits, this multiplication is performed with two passes through the rectangular multiplier.

For single precision operands (OT = SINGLE), all of the multiplications, except for lastmul_54x76, require only a single pass through the multiplier tree and the division has a latency of 11 cycles. For double precision operands, the multiplications require one to three passes through the multiplier tree and the division has a latency of 17 cycles. For x87 operands, the latency depends on the required precision of the final result and is 13 cycles for single precision, 18 cycles for double precision, and 25 cycles for extended or internal precision.

Our second version of Goldschmidt's algorithm (GS-2), shown in Figure 4, uses a truncated version of R and takes advantage of the fact that R is close to 1.0 to reduce the number of bits in R used for the iterative multiplications and reduce the number of passes through the multiplier. For example, since $|R'_0| < 2^{-13}$, the thirteen most significant bits of R0 are not needed.
Based on Equation (17), this allows the computation

$$D_1 = \mathrm{itermul\_54x76}(R_0[0:53], D_0) \tag{18}$$

which requires two passes through the multiplier tree in GS-1, to be replaced by the computation

$$D_1 = \mathrm{itermuladd\_27x76}(R_0[13:39], D_0, 13) \tag{19}$$

which corresponds to

$$D_1 = D_0 + D_0 \times R_0[13:39] \times 2^{-13} = D_0 + \{0^{13}, D_0\} \times R_0[13:39] \tag{20}$$

This operation requires only a single pass through the multiplier with D0 right shifted by 13 bits, the lower 13 bits of D0 truncated, and the un-shifted value of D0 added to the product. This operation compensates for the fact that $R'_0$ is used instead of $R_0$, as described in Section 2. Similar optimizations are used throughout the algorithm to reduce the number of passes through
the multiplier and the latency of the division algorithm. The operations that use these types of optimizations are itermuladd_ and lastmuladd_. They implement operand shifting, multiplication, and addition by using a modified version of the multiplier described in Section 3. The lastmuladd operation is similar to the itermuladd operation, except that the result is rounded to the number of bits specified by its last argument using round-to-nearest.

    Program: Goldschmidt's Division Algorithm with Reduced R on a 27 x 76 Multiplier (GS-2)
    Input = (A, B, OT, PC, RC), Output = (Qf)
    Operations                                                          Cycles
    X0 = recip_estimate(B)                                              1-3
    D0 = itermul_27x76(X0, B), R0 = comp(D0)                            4-5
    N0 = itermul_27x76(X0, A)                                           5-6
    if (OT == SINGLE) {
      Q = lastmuladd_27x76(R0[13:39], N0, 13, 25)                       7-8
      REM = backmul_25x24(Q, B, A), Qf = round(Q, REM, 24, RC) }        9-10
    if (OT == X87 and PC == 24) goto X87 DIV
    D1 = itermuladd_27x76(R0[13:39], D0, 13), R1 = comp(D1)             6-7
    N1 = itermuladd_27x76(R0[13:39], N0, 13)                            7-8
    if (OT == DOUBLE) {
      Q = lastmuladd_54x76(R1[26:75], N1, 26, 54)                       9-11
      REM = backmul_54x53(Q, B, A), Qf = round(Q, REM, 53, RC) }        12-14
    X87 DIV:
    if (PC == 24) Q = lastmuladd_27x76(R0[13:39], N0, 13, 25)           7-8
    else if (PC == 53) Q = lastmuladd_54x76(R1[26:75], N1, 26, 54)      9-11
    else {
      D2 = itermuladd_27x76(R1[26:52], D1, 26), R2 = comp(D2)           8-9
      N2 = itermuladd_27x76(R1[26:52], N1, 26)                          9-10
      Q = lastmuladd_27x76(R2[52:75], N2, 52, PC+1) }                   11-12
    REM = backmul_65x64(Q, B, A), Qf = round(Q, REM, PC, RC)            See *
    END DIVISION:
    * 9-12 (PC = 24), 12-15 (PC = 53), 13-16 (PC = 64/68)

Figure 4: Goldschmidt's algorithm with reduced R on a 27 x 76 multiplier (GS-2)

5. Algorithm comparison

Table 2 compares the latency in cycles for each division algorithm, based on the multiplication latencies given in Table 1. In Table 2, (S), (D), and (E) indicate results are rounded to single, double, or extended precision, respectively. For completeness, the latency of the original division algorithm [15] on the AMD-K7 microprocessor with a 76x76 multiplier is also given, and denoted as K7 (76x76). The 76x76 multiplier is roughly 2.5 times larger than our 27x76 multiplier.
Table 2 also shows the latency for the K7 division algorithm [15] when it has minor modifications to work with our rectangular multiplier. This modified algorithm is denoted as K7 (27x76). As shown in Table 2, the two proposed algorithms have better latency than the K7 (27x76) algorithm for all operand types and precisions. The GS-2 (27x76) algorithm has the lowest overall latency for all operand types and precisions. Compared to the GS-1 (27x76) algorithm, the GS-2 (27x76) algorithm reduces the latency by one cycle for single precision, three cycles for double precision, and nine cycles for extended precision, when the input and output operands have the same precision.

Table 3 shows the number of passes through the multiplier for each division algorithm, based on the number of multiplier passes for the various multiplication sizes given in Table 1. For example, a 27x76 multiplication only requires a single pass through the multiplier and a 76x76 multiplication requires 3 passes through the multiplier. As shown in Table 3, the GS-2 algorithm has the fewest passes through the multiplier for all operand types and precisions. The number of passes through the multiplier is important since it impacts the power dissipated by the division algorithm and also indicates how available the multiplier is for implementing other operations.

Table 2: Latency of division algorithms (cycles). Rows: K7 (76x76), K7 (27x76), GS-1 (27x76), GS-2 (27x76). Surviving column headings: Single, Double, (E). The cycle counts are missing from the source.

Table 3: Multiplier passes of division algorithms. Rows: K7 (76x76), K7 (27x76), GS-1 (27x76), GS-2 (27x76). Surviving column headings: Single, Double, (E). The pass counts are missing from the source.
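The latency reduction of GS-2 relative to the original K7 algorithm can be checked with a short calculation. The sketch below is an assumption-laden illustration: the cycle counts are the totals read from Figures 2 and 4 (not from Table 2), the dictionary names and the `reduction` helper are hypothetical, and comparing the SSE single-precision case against the K7 single-precision result is a pairing chosen here for the example.

```python
# Total latencies in cycles, as read from Figures 2 and 4 (assumed here).
K7_LATENCY = {"single": 16, "double": 20, "extended": 24}       # K7 (76x76)
GS2_X87_LATENCY = {"single": 12, "double": 15, "extended": 16}  # GS-2, x87 operands
GS2_SSE_LATENCY = {"single": 10, "double": 14}                  # GS-2, SSE operands


def reduction(old_cycles: int, new_cycles: int) -> float:
    """Percentage of cycles saved when going from old_cycles to new_cycles."""
    return 100.0 * (old_cycles - new_cycles) / old_cycles


for precision, k7 in K7_LATENCY.items():
    saved = reduction(k7, GS2_X87_LATENCY[precision])
    print(f"x87 {precision}: {saved:.1f}% fewer cycles")

# Assumed pairing: SSE single-precision GS-2 against the K7 single-precision result.
print(f"SSE single: {reduction(K7_LATENCY['single'], GS2_SSE_LATENCY['single']):.1f}% fewer cycles")
```

Under these assumptions the savings fall between one quarter and a little over one third of the cycles, consistent with the range quoted in the abstract.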
Compared to the K7 (27x76) algorithm, the GS-1 (27x76) algorithm has roughly the same hardware complexity, but more complex control logic to handle the different multiplication sizes. The GS-2 algorithm has the most complexity, since it has additional multiplexers to shift R, N, and D and it has modifications to the multiplier tree to perform multiply-add operations. For our implementation, the relatively small increase in hardware complexity of the GS-2 algorithm is less important than the reduced latency and passes through the rectangular multiplier.

6. Conclusions

This paper presents and compares variations of Goldschmidt's division algorithm for an x86 microprocessor that utilizes a rectangular multiplier. Of the algorithms presented in this paper, the GS-2 algorithm has the lowest latency and requires the fewest passes through the rectangular multiplier. All of the algorithms presented in this paper have been verified through extensive error analysis. The GS-2 algorithm has been modeled in Verilog and simulated using over a million test vectors for the supported operand types and result precisions.

References

[1] S. K. Raman, V. Pentkovski, and J. Keshava, "Implementing Streaming SIMD Extensions on the Pentium III Processor," IEEE Micro, vol. 20, no. 4, July 2000.
[2] Advanced Micro Devices, AMD64 Architecture Programmer's Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions, Revision 3.07, September 2006.
[3] ANSI and IEEE, IEEE Standard for Binary Floating-Point Arithmetic, 1985.
[4] W.-C. Ma and C.-L. Yang, "Using Intel Streaming SIMD Extensions for 3D Geometry Processing," Proceedings of the 3rd IEEE Pacific-Rim Conference on Multimedia, December 2002.
[5] S. F. Oberman and M. J. Flynn, "Division Algorithms and Implementations," IEEE Transactions on Computers, vol. 46, no. 8, August 1997.
[6] M. D. Ercegovac and T. Lang, Division and Square Root: Digit-Recurrence Algorithms and Implementations, Kluwer Academic Publishers, 1994.
[7] D. Wong and M. Flynn, "Fast Division Using Accurate Quotient Approximations to Reduce the Number of Iterations," IEEE Transactions on Computers, vol. 41, no. 8, August 1992.
[8] W. S. Briggs and D. W. Matula, "A 17 x 69 Bit Multiply and Add Unit with Redundant Binary Feedback and Single Cycle Latency," Proceedings of the 11th IEEE Symposium on Computer Arithmetic, pp. 163-170, July 1993.
[9] W. S. Briggs and D. W. Matula, "Method and Apparatus for Performing Division Using a Rectangular Aspect Ratio Multiplier," U.S. Patent No. 5,046,038, 1989.
[10] W. S. Briggs and D. W. Matula, "Method and Apparatus for Performing Prescaled Division," U.S. Patent No. 5,475,630, 1995.
[11] M. D. Ercegovac, T. Lang, and P. Montuschi, "Very High Radix Division with Prescaling and Selection by Rounding," IEEE Transactions on Computers, vol. 43, no. 8, August 1994.
[12] T. Lang and P. Montuschi, "Boosting Very High Radix Division with Prescaling and Selection by Rounding," IEEE Transactions on Computers, vol. 50, no. 1, pp. 13-27, January 2001.
[13] R. E. Goldschmidt, "Applications of Division by Convergence," M.S. thesis, Dept. of Electrical Engineering, MIT, Cambridge, MA, June 1964.
[14] M. Flynn, "On Division by Functional Iteration," IEEE Transactions on Computers, vol. 19, no. 8, August 1970.
[15] S. F. Oberman, "Floating-Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor," Proceedings of the 14th IEEE Symposium on Computer Arithmetic, pp. 106-115, 1999.
[16] S. F. Anderson, J. G. Earle, R. E. Goldschmidt, and D. M. Powers, "The IBM System/360 Model 91: Floating-Point Execution Unit," IBM Journal of Research and Development, vol. 11, January 1967.
[17] H. Darley, M. Gill, D. Earl, D. Ngo, P. Wang, M. Hipona, and J. Dodrill, "Floating Point/Integer Processor with Divide and Square Root Functions," U.S. Patent No. 4,878,190, 1989.
[18] E. M. Schwarz, L. Sigal, and T. J. McPherson, "CMOS Floating-Point Unit for the S/390 Parallel Enterprise Server G4," IBM Journal of Research and Development, vol. 41, no. 4/5, July/September 1997.
[19] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
[20] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2000.
[21] I. Koren, Computer Arithmetic Algorithms, A. K. Peters, 2002.
[22] G. Even, P.-M. Seidel, and W. E. Ferguson, "A Parametric Error Analysis of Goldschmidt's Division Algorithm," 16th IEEE Symposium on Computer Arithmetic, pp. 165-171, June 2003.
[23] G. Even and P.-M. Seidel, "Pipelined Multiplicative Division with IEEE Rounding," IEEE International Conference on Computer Design, 2003.
[24] G. Even and P.-M. Seidel, "Pipelined Multiplicative Division with IEEE Rounding," U.S. Patent No. 2004/0128338, July 1, 2004.
[25] S. F. Oberman, "Bipartite Look-up Table with Output Values Having Minimized Absolute Error," U.S. Patent No. 6,223,192, April 2001.
Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto
More informationLearning the Kernel Parameters in Kernel Minimum Distance Classifier
Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department
More informationComplex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.
Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal
More informationAnalysis of Continuous Beams in General
Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,
More informationCHAPTER 4 PARALLEL PREFIX ADDER
93 CHAPTER 4 PARALLEL PREFIX ADDER 4.1 INTRODUCTION VLSI Integer adders fnd applcatons n Arthmetc and Logc Unts (ALUs), mcroprocessors and memory addressng unts. Speed of the adder often decdes the mnmum
More informationAn Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationArray transposition in CUDA shared memory
Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some
More informationType-2 Fuzzy Non-uniform Rational B-spline Model with Type-2 Fuzzy Data
Malaysan Journal of Mathematcal Scences 11(S) Aprl : 35 46 (2017) Specal Issue: The 2nd Internatonal Conference and Workshop on Mathematcal Analyss (ICWOMA 2016) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES
More informationRandom Kernel Perceptron on ATTiny2313 Microcontroller
Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department
More informationProblem Definitions and Evaluation Criteria for Computational Expensive Optimization
Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty
More informationA Fast Visual Tracking Algorithm Based on Circle Pixels Matching
A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng
More informationFor instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)
Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A
More informationWishing you all a Total Quality New Year!
Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma
More informationHigh level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization
What s a Computer Program? Descrpton of algorthms and data structures to acheve a specfc ojectve Could e done n any language, even a natural language lke Englsh Programmng language: A Standard notaton
More informationProgramming in Fortran 90 : 2017/2018
Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values
More informationSpecifications in 2001
Specfcatons n 200 MISTY (updated : May 3, 2002) September 27, 200 Mtsubsh Electrc Corporaton Block Cpher Algorthm MISTY Ths document shows a complete descrpton of encrypton algorthm MISTY, whch are secret-key
More informationContent Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers
IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth
More informationSupport Vector Machines
/9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.
More informationA New Approach For the Ranking of Fuzzy Sets With Different Heights
New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays
More informationResource Efficient Design and Implementation of Standard and Truncated Multipliers using FPGAs
Proceedngs of the World Congress on Engneerng 2011 Vol II, July 6-8, 2011, London, U.K. Resource Effcent Desgn and Implementaton of Standard and Truncated Multplers usng FPGAs Muhammad H. Ras, Member,
More informationSimulation Based Analysis of FAST TCP using OMNET++
Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months
More informationFPGA Implementation of CORDIC Algorithms for Sine and Cosine Generator
The 5th Internatonal Conference on Electrcal Engneerng and Informatcs 25 August -, 25, Bal, Indonesa FPGA Implementaton of CORDIC Algorthms for Sne and Cosne Generator Antonus P. Renardy, Nur Ahmad, Ashbr
More informationDecomposition of Grey-Scale Morphological Structuring Elements in Hardware
Decomposton of Grey-Scale Morpholocal Structurn Elements n Hardware I Andreads, C Fyrndes, A Gasteratos and Y Boutals Secton of Electroncs & Informaton Systems Technoloy Department of Electrcal & Computer
More informationCS1100 Introduction to Programming
Factoral (n) Recursve Program fact(n) = n*fact(n-) CS00 Introducton to Programmng Recurson and Sortng Madhu Mutyam Department of Computer Scence and Engneerng Indan Insttute of Technology Madras nt fact
More informationSkew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach
Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research
More informationExercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005
Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed
More informationCluster Analysis of Electrical Behavior
Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School
More informationCMPS 10 Introduction to Computer Science Lecture Notes
CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not
More informationCSCI 104 Sorting Algorithms. Mark Redekopp David Kempe
CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal
More informationTHE PULL-PUSH ALGORITHM REVISITED
THE PULL-PUSH ALGORITHM REVISITED Improvements, Computaton of Pont Denstes, and GPU Implementaton Martn Kraus Computer Graphcs & Vsualzaton Group, Technsche Unverstät München, Boltzmannstraße 3, 85748
More informationNUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS
ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana
More informationModule Management Tool in Software Development Organizations
Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,
More informationR s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes
SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges
More informationTHE low-density parity-check (LDPC) code is getting
Implementng the NASA Deep Space LDPC Codes for Defense Applcatons Wley H. Zhao, Jeffrey P. Long 1 Abstract Selected codes from, and extended from, the NASA s deep space low-densty party-check (LDPC) codes
More informationCourse Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms
Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques
More informationAn Entropy-Based Approach to Integrated Information Needs Assessment
Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology
More informationMemory Modeling in ESL-RTL Equivalence Checking
11.4 Memory Modelng n ESL-RTL Equvalence Checkng Alfred Koelbl 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 koelbl@synopsys.com Jerry R. Burch 2025 NW Cornelus Pass Rd. Hllsboro, OR 97124 burch@synopsys.com
More informationHermite Splines in Lie Groups as Products of Geodesics
Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the
More informationCACHE MEMORY DESIGN FOR INTERNET PROCESSORS
CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET
More informationUB at GeoCLEF Department of Geography Abstract
UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department
More informationVectorization of Image Outlines Using Rational Spline and Genetic Algorithm
01 Internatonal Conference on Image, Vson and Computng (ICIVC 01) IPCSIT vol. 50 (01) (01) IACSIT Press, Sngapore DOI: 10.776/IPCSIT.01.V50.4 Vectorzaton of Image Outlnes Usng Ratonal Splne and Genetc
More informationParallel Inverse Halftoning by Look-Up Table (LUT) Partitioning
Parallel Inverse Halftonng by Look-Up Table (LUT) Parttonng Umar F. Sddq and Sadq M. Sat umar@ccse.kfupm.edu.sa, sadq@kfupm.edu.sa KFUPM Box: Department of Computer Engneerng, Kng Fahd Unversty of Petroleum
More informationCompiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz
Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster
More informationSolving two-person zero-sum game by Matlab
Appled Mechancs and Materals Onlne: 2011-02-02 ISSN: 1662-7482, Vols. 50-51, pp 262-265 do:10.4028/www.scentfc.net/amm.50-51.262 2011 Trans Tech Publcatons, Swtzerland Solvng two-person zero-sum game by
More information2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements
Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.
More informationAn Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices
Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal
More informationThe stream cipher MICKEY-128 (version 1) Algorithm specification issue 1.0
The stream cpher MICKEY-128 (verson 1 Algorthm specfcaton ssue 1. Steve Babbage Vodafone Group R&D, Newbury, UK steve.babbage@vodafone.com Matthew Dodd Independent consultant matthew@mdodd.net www.mdodd.net
More informationRepeater Insertion for Two-Terminal Nets in Three-Dimensional Integrated Circuits
Repeater Inserton for Two-Termnal Nets n Three-Dmensonal Integrated Crcuts Hu Xu, Vasls F. Pavlds, and Govann De Mchel LSI - EPFL, CH-5, Swtzerland, {hu.xu,vasleos.pavlds,govann.demchel}@epfl.ch Abstract.
More informationVirtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory
Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process
More informationSteps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices
Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between
More informationRelated-Mode Attacks on CTR Encryption Mode
Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory
More informationLoad-Balanced Anycast Routing
Load-Balanced Anycast Routng Chng-Yu Ln, Jung-Hua Lo, and Sy-Yen Kuo Department of Electrcal Engneerng atonal Tawan Unversty, Tape, Tawan sykuo@cc.ee.ntu.edu.tw Abstract For fault-tolerance and load-balance
More informationAPPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET
APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET Jae-young Lee, Shahram Payandeh, and Ljljana Trajovć School of Engneerng Scence Smon Fraser Unversty 8888 Unversty
More informationELEC 377 Operating Systems. Week 6 Class 3
ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems
More informationBrave New World Pseudocode Reference
Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be
More informationHybrid Non-Blind Color Image Watermarking
Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,
More informationAPPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET
APPLICATION OF PREDICTION-BASED PARTICLE FILTERS FOR TELEOPERATIONS OVER THE INTERNET Jae-young Lee, Shahram Payandeh, and Ljljana Trajovć School of Engneerng Scence Smon Fraser Unversty 8888 Unversty
More informationFPGA-based implementation of circular interpolation
Avalable onlne www.jocpr.com Journal of Chemcal and Pharmaceutcal Research, 04, 6(7):585-593 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 FPGA-based mplementaton of crcular nterpolaton Mngyu Gao,
More informationEvaluation of an Enhanced Scheme for High-level Nested Network Mobility
IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.15 No.10, October 2015 1 Evaluaton of an Enhanced Scheme for Hgh-level Nested Network Moblty Mohammed Babker Al Mohammed, Asha Hassan.
More informationA RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING. James Moscola, Young H. Cho, John W. Lockwood
A RECONFIGURABLE ARCHITECTURE FOR MULTI-GIGABIT SPEED CONTENT-BASED ROUTING James Moscola, Young H. Cho, John W. Lockwood Dept. of Computer Scence and Engneerng Washngton Unversty, St. Lous, MO {jmm5,
More informationLS-TaSC Version 2.1. Willem Roux Livermore Software Technology Corporation, Livermore, CA, USA. Abstract
12 th Internatonal LS-DYNA Users Conference Optmzaton(1) LS-TaSC Verson 2.1 Wllem Roux Lvermore Software Technology Corporaton, Lvermore, CA, USA Abstract Ths paper gves an overvew of LS-TaSC verson 2.1,
More informationA Fast Content-Based Multimedia Retrieval Technique Using Compressed Data
A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,
More informationCPE 628 Chapter 2 Design for Testability. Dr. Rhonda Kay Gaede UAH. UAH Chapter Introduction
Chapter 2 Desgn for Testablty Dr Rhonda Kay Gaede UAH 2 Introducton Dffcultes n and the states of sequental crcuts led to provdng drect access for storage elements, whereby selected storage elements are
More informationSequential search. Building Java Programs Chapter 13. Sequential search. Sequential search
Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to
More informationFast exponentiation via prime finite field isomorphism
Alexander Rostovtsev, St Petersburg State Polytechnc Unversty rostovtsev@sslstunevaru Fast exponentaton va prme fnte feld somorphsm Rasng of the fxed element of prme order group to arbtrary degree s the
More informationTsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance
Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for
More informationAnalysis of Min Sum Iterative Decoder using Buffer Insertion
Analyss of Mn Sum Iteratve ecoder usng Buffer Inserton Saravanan Swapna M.E II year, ept of ECE SSN College of Engneerng M. Anbuselv Assstant Professor, ept of ECE SSN College of Engneerng S.Salvahanan
More informationConfiguration Management in Multi-Context Reconfigurable Systems for Simultaneous Performance and Power Optimizations*
Confguraton Management n Mult-Context Reconfgurable Systems for Smultaneous Performance and Power Optmzatons* Rafael Maestre, Mlagros Fernandez Departamento de Arqutectura de Computadores y Automátca Unversdad
More informationEfficient Distributed File System (EDFS)
Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate
More informationRapid Development of High Performance Floating-Point Pipelines for Scientific Simulation 1
Rapd Development of Hgh Performance Floatng-Pont Ppelnes for Scentfc Smulaton 1 G. Lenhart, A. Kugel and R. Männer Dept. for Computer Scence V, Unversty of Mannhem, B6-26B, D-68131 Mannhem, Germany {lenhart,kugel,maenner}@t.un-mannhem.de
More informationA MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS
Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung
More informationA High-Quality, Energy Optimized, Real-Time Sampling Rate Conversion Library for the StrongARM Microprocessor
A Hgh-Qualty, Energy Optmzed, Real-Tme Samplng Rate Converson Lbrary for the StrongARM Mcroprocessor Chung-Tse Mar 1, Mat C. Hans, Mark T. Smth, Tajana Smunc, Ronald W. Schafer 1 Moble& Meda Systems Laboratory
More informationImproving The Test Quality for Scan-based BIST Using A General Test Application Scheme
_ Improvng The Test Qualty for can-based BIT Usng A General Test Applcaton cheme Huan-Chh Tsa Kwang-Tng Cheng udpta Bhawmk Department of ECE Bell Laboratores Unversty of Calforna Lucent Technologes anta
More informationSLAM Summer School 2006 Practical 2: SLAM using Monocular Vision
SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,
More informationA Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems
A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty
More informationReducing Frame Rate for Object Tracking
Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg
More information