High-Performance Floating Point Divide


Albert A. Liddicoat and Michael J. Flynn
Computer Systems Laboratory, Stanford University, Stanford, CA 94305
liddicoat@stanford.edu and flynn@umunhum.stanford.edu

Abstract

In modern processors floating point divide operations often take 20 to 25 clock cycles, five times that of multiplication. Typically multiplicative algorithms with quadratic convergence are used for high-performance divide. A divide unit based on the multiplicative Newton-Raphson iteration is proposed. This divide unit utilizes the higher-order Newton-Raphson reciprocal approximation to compute the quotient fast, efficiently and with high throughput. The divide unit achieves fast execution by computing the square, cube and higher powers of the approximation directly and much faster than the traditional approach with serial multiplications. Additionally, the second, third, and higher-order terms are computed simultaneously, further reducing the divide latency. Significant hardware reductions have been identified that reduce the overall computation significantly and therefore reduce the area required for implementation and the power consumed by the computation. The proposed hardware unit is designed to achieve the desired quotient precision in a single iteration, allowing the unit to be fully pipelined for maximum throughput.

1 Introduction

Division can be expressed as the product of the dividend and the reciprocal of the divisor, $q = a(1/b)$. Multiplicative techniques such as Newton-Raphson and series expansion algorithms are often used to compute the reciprocal for high-performance division [10]. The IBM PowerPC™ and Power2™ processors use Newton-Raphson algorithms to implement divide and square root. The AMD K7™ [9] and IBM Power3™ [2] processors use algorithms based on series expansion for both divide and square root.

Typically, the first-order Newton-Raphson iteration with quadratic convergence is used. The first-order Newton-Raphson iteration requires two dependent multiplications per iteration. Rabinowitz [11] extended the Newton-Raphson reciprocal recurrence to include higher-order polynomials.
The convergence of the higher-order iteration is $\epsilon_{i+1} = \epsilon_i^{k+1}$, where $\epsilon_i$ is the error of the reciprocal approximation for iteration $i$ and $k$ is the order of the recurrence [5].

Series expansion algorithms are also used to compute the reciprocal using multiplicative iterations. The binomial series expansion technique, often called Goldschmidt's algorithm [6] [4], is based on the familiar Taylor series expansion of a function at a point. The binomial expansion algorithm requires two independent multiplications per iteration and provides quadratic convergence.

Recent work in the area of high-performance division has shown that higher-order iterations improve performance. Wong and Flynn [12] proposed a very-high radix division scheme that is based on look-up tables and Taylor series approximations for the reciprocal. Higher-order terms of the Taylor series are computed to increase the precision of successive quotient approximations. This approach offers linear convergence while retiring [...] or more bits per iteration. Ito, Takagi, and Yajima [7] developed an accelerated higher-order Newton-Raphson division and square root algorithm suitable for implementation using a multiply-accumulate unit. This implementation accelerates the convergence of the higher-order iteration by using a lookup table to estimate the cube of an intermediate value. Ercegovac, Lang, Muller, and Tisserand [3] proposed a method to compute the reciprocal and other functions based on argument reduction and series expansion. This method uses tables and small multipliers to compute the terms of a series expansion. Small serial multipliers are used to compute the square and cube of an intermediate value.

A multiplicative divide unit based on the higher-order Newton-Raphson reciprocal approximation is proposed and analyzed. A parallel cubing unit proposed by Liddicoat and Flynn [8] exposes additional computational parallelism. The parallel cube computation is extendable to compute higher powers and thus further accelerate the convergence of the approximation. The proposed divide unit exploits the computation parallelism exposed by the parallel powering units. Furthermore, by using higher-order approximations the desired precision may be obtained in a single iteration, allowing a fully pipelined implementation.

The various divide algorithms and implementations differ on several accounts. The first is the inherent computational parallelism that allows latency reduction. The second is the subunit precision and the effect of the subunit precision on the latency, area, and power consumption required for the divide computation. Finally, the error convergence of the approximation determines whether it is feasible to compute the quotient to the desired precision in a single iteration.

This paper is organized as follows. Sections 2 and 3 present the Newton-Raphson and binomial series expansion divide algorithms. In section 4 the higher-order Newton-Raphson divider and subunits are proposed. In section 5 significant hardware reductions applicable to the proposed architecture are presented and the final hardware configuration is discussed. In the remaining sections, the proposed divide unit is compared to alternate techniques and brief conclusions are presented.

Figure 1. NR divide: (a) 1st order, (b) 3rd order.

2 Newton-Raphson Divide Unit

The first-order Newton-Raphson reciprocal approximation with quadratic convergence is expressed as $X_{i+1} = X_i(2 - bX_i)$. The initial approximation, $X_0$, for the reciprocal $1/b$ is generally determined using a ROM lookup table before the first iteration begins. A fused multiply-subtract subunit may be used within the iteration to compute $(2 - bX_i)$ in a single operation. Therefore, each iteration requires two dependent multiplication operations. After the final iteration has completed, the quotient is determined by multiplying the dividend with the reciprocal of the divisor. Figure 1(a) illustrates the first-order Newton-Raphson divide unit. Each multiplication within the iteration is dependent on the result produced by the previous multiplication and must be computed serially. If $L$ iterations are required to achieve the desired quotient precision, then the latency of the divide unit implemented with a single multiplier is $t_{div} = t_{lookuptable} + 2L\,t_{mult} + t_{mult}$.
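The first-order recurrence and its single-multiplier latency can be sketched in a few lines of Python. This is only an illustration using exact rational arithmetic: the function names and the initial estimate 5/8 are assumptions made for the example, not values from the paper, and a hardware unit would work on fixed-width operands instead.

```python
from fractions import Fraction


def nr_first_order(b, x0, iterations):
    """Return the iterates X_0, X_1, ... of X_{i+1} = X_i * (2 - b * X_i)."""
    b = Fraction(b)
    xs = [Fraction(x0)]
    for _ in range(iterations):
        x = xs[-1]
        t = 2 - b * x       # multiplication 1 (fused multiply-subtract)
        xs.append(x * t)    # multiplication 2, which must wait for t
    return xs


def latency_single_multiplier(t_lookup, t_mult, iterations):
    """t_div = t_lookuptable + 2*L*t_mult + t_mult, in arbitrary time units."""
    return t_lookup + 2 * iterations * t_mult + t_mult


b = Fraction(3, 2)
xs = nr_first_order(b, Fraction(5, 8), 3)   # 5/8 is a made-up estimate of 2/3
errors = [1 - b * x for x in xs]            # relative error 1 - b * X_i
print(errors)   # each entry is the square of the previous one
```

Because each new error is the square of the previous one, the number of correct bits doubles per iteration, which is the quadratic convergence referred to in the text.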
Using two multipliers, the total latency may be reduced by one multiplication if the final multiplication with the dividend is overlapped in the last iteration, since $q = (aX_i)(2 - bX_i)$. The latency for the first-order Newton-Raphson divide unit using two multipliers is $t_{div} = t_{lookuptable} + 2L\,t_{mult}$.

The generalized Newton-Raphson reciprocal iteration may be expressed as the following $k^{th}$ order iteration, $X_{i+1} = X_i(1 + (1 - bX_i) + (1 - bX_i)^2 + \cdots + (1 - bX_i)^k)$. Here $X_i$ is the $i^{th}$ approximation of the reciprocal of the divisor, $b$. Figure 1(b) shows a third-order Newton-Raphson divide unit designed using standard multiplication, addition, and subtraction units. The subtraction and additions may be fused with the multiplications as described previously without significantly increasing the multiplication latency. If a single multiplier is used and $L$ iterations are required for the desired quotient precision, then the latency of the third-order Newton-Raphson divide unit is $t_{div} = t_{lookuptable} + 4L\,t_{mult} + t_{mult}$. Due to the faster convergence, one iteration of the third-order divide unit reduces the reciprocal error by the same amount as two iterations of the first-order divide unit. The latency for one iteration of the third-order divide unit is also equivalent to the latency for two iterations of the first-order divide unit.

As was the case with the first-order divide unit, the final multiplication with the dividend may be overlapped when two multipliers are available, reducing the latency by one multiplication. The latency of the divide unit using two multipliers is $t_{div} = t_{lookuptable} + 4L\,t_{mult}$. Again the latency for one iteration of the third-order divide unit is equivalent to the latency of two iterations of the first-order Newton-Raphson iteration. There is no benefit in using a higher-order iteration if full precision serial multiplications are used to compute the powers of $(1 - bX_i)$.

3 Division by Series Expansion

The disadvantage with the standard form of the Newton-Raphson divide is that the multiplications in the iteration are dependent and must be performed serially. Therefore, each iteration requires two or more serial multiplications. Binomial series expansion is another multiplicative division technique with quadratic convergence.
The typical form of the series expansion recurrence is based on the Maclaurin series where $b = 1 + X$, as shown in equation 1.

$$\frac{1}{b} = \frac{1}{1 + X} = 1 - X + X^2 - X^3 + X^4 - \cdots \qquad (1)$$

After factoring equation 1 and multiplying by the dividend, the quotient, $q$, can be expressed by the multiplicative series shown in equation 2.

$$q = a\,(1 - X)(1 + X^2)(1 + X^4)(1 + X^8)(1 + X^{16})\cdots \qquad (2)$$

Each multiplication in equation 2 quadratically reduces the error in the quotient $q$ and is considered an iteration towards the final quotient. Here, $q_0 = a$, $q_1 = a(1 - X)$, and $q_{i+1} = q_i(1 + X^{2^i}) = q_i r_i$ for $i \ge 1$. Let $r_0 = (1 - X)$, $d_0 = (1 + X)$ and $d_i = d_{i-1} r_{i-1}$; then $r_i = (2 - d_i)$ for $i \ge 1$. A multiplication and a subtraction must be performed to obtain the next factor $r_i$. Within each iteration both $d_i$ and $r_i$ must be computed, and a fused multiply-subtract cannot be used.

Figure 2 shows a divide unit based on the iterative form of the binomial series expansion. The right side of the divide unit computes the next factor $r_{i+1}$ while the left side computes the quotient approximation $q_{i+1}$. The factor $r_{i+1}$ is independent of the quotient, $q_{i+1}$, computation and therefore the two multiplications may occur simultaneously.

Similarly to the Newton-Raphson division, a lookup table can be used to reduce the number of iterations required to obtain the desired precision. The first term $(1 - X)$, or the product of the first few terms $(1 - X)(1 + X^2)(1 + X^4)\cdots(1 + X^{2^m})$, is found in a ROM lookup table. The $d_{m+1}$ is computed by multiplying the result returned from the lookup table by $(1 + X)$. The initial quotient approximation $q_{m+1}$ is computed by multiplying the dividend, $a$, by the result returned from the lookup table such that $q_{m+1} = a(1 - X)(1 + X^2)(1 + X^4)\cdots(1 + X^{2^m})$. These two multiplications are also independent and may occur in parallel. Then the iterations continue as before.

The latency of the binomial expansion divide algorithm depends on how many multipliers are used and the number of iterations, $L$, required to obtain the desired quotient precision. If one multiplier is used, then the divide unit latency is $t_{div} = t_{lookuptable} + 2L\,t_{mult} + t_{mult}$. Here, the subtract must be overlapped with the quotient multiplications. If two multipliers are used then the latency reduces to $t_{div} = t_{lookuptable} + t_{mult} + L(t_{2's\,comp} + t_{mult})$. Interestingly, if a single multiplier is used the binomial expansion divide unit latency is equivalent to that of the Newton-Raphson divide unit.
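The iterative form of the binomial series expansion can be sketched as follows. This is a minimal illustration in exact rational arithmetic that starts from $r_0 = 1 - X$ without any lookup table; the function name and the sample operands are assumptions made for the example.

```python
from fractions import Fraction


def series_expansion_divide(a, b, iterations):
    """Iterate q <- q * r, d <- d * r, r <- 2 - d, starting from r_0 = 1 - X."""
    a, b = Fraction(a), Fraction(b)
    x = b - 1                  # b = 1 + X
    q, d, r = a, b, 1 - x      # q_0 = a, d_0 = 1 + X, r_0 = 1 - X
    steps = []
    for _ in range(iterations):
        q_next = q * r   # these two products share only r,
        d_next = d * r   # so hardware can form them in parallel
        r = 2 - d_next   # two's complement step giving the next factor
        q, d = q_next, d_next
        steps.append((q, d))
    return steps


a, b = Fraction(3), Fraction(5, 4)        # so X = 1/4
steps = series_expansion_divide(a, b, 3)
for q, d in steps:
    print(float(q), float(d))   # d approaches 1 while q approaches a / b
```

After $i$ passes the denominator is $d = 1 - X^{2^i}$, so the quotient error is squared on every pass while the two multiplications inside a pass remain independent of each other.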
If two multipliers are used, however, the latency of the binomial expansion divider is lower than that of the Newton-Raphson divide unit by approximately $L(t_{mult} - t_{2's\,comp}) - t_{mult}$.

It has been shown that the first-order Newton-Raphson algorithm and the binomial expansion algorithm are equivalent when $X_0 = 1$. In fact, they are two different ways of expressing the same computation. Both algorithms require the same number and type of operations. However, due to the way each algorithm is expressed, the multiplications in the Newton-Raphson iteration are dependent and must be performed serially, while the two multiplications in the binomial expansion are independent and can be performed in parallel.

Figure 2. Binomial expansion divide unit.

4 Proposed Divide Architecture

The divide unit may compute the quotient directly and need not iterate to a solution. The error in the initial approximation and the convergence of the computation must be sufficient to guarantee that the desired quotient precision will be achieved with one computation. This joint constraint implies that there is a tradeoff between the lookup table size and the computational complexity of the algorithm.

Next, we express the quotient directly as the product of the dividend and the $k^{th}$-order Newton-Raphson reciprocal approximation, $q = aX_0(1 + (1 - bX_0) + (1 - bX_0)^2 + \cdots + (1 - bX_0)^k)$. Here $X_0$ is the initial estimation of $1/b$, generally found in a lookup table with error $\epsilon_{lookuptable}$. The quotient error is expressed as $\epsilon_q = \epsilon_{lookuptable}^{k+1}$. The $k^{th}$ order approximation increases the number of bits of precision of $X_0$ by a factor of $k + 1$. Therefore, in order to compute an $n$-bit reciprocal in a single iteration, the precision of the initial approximation must be $n/(k+1)$ bits. The lookup table size must be $2^{n/(k+1)} \times n/(k+1)$ bits. For example, the third-order Newton-Raphson approximation would require a $2^{n/4} \times (n/4)$ bit lookup table.
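A short sketch can make the single-pass formula and the table-size estimate concrete. The code below is illustrative only: the function names and the initial estimate 5/8 are assumptions, the arithmetic is exact rather than fixed-width, and the table-size helper rounds $n/(k+1)$ up to a whole number of bits.

```python
from fractions import Fraction


def single_pass_quotient(a, b, x0, k):
    """q = a*X0 * (1 + e + e^2 + ... + e^k) with e = 1 - b*X0."""
    a, b, x0 = Fraction(a), Fraction(b), Fraction(x0)
    e = 1 - b * x0                          # small multiply-subtract
    s = sum(e ** j for j in range(k + 1))   # hardware forms these powers in parallel
    return (a * x0) * s                     # the one full-width multiplication


def ideal_table_bits(n, k):
    """Table size 2^(n/(k+1)) * n/(k+1), with n/(k+1) rounded up to whole bits."""
    p = -(-n // (k + 1))
    return (2 ** p) * p


a, b, x0 = Fraction(1), Fraction(3, 2), Fraction(5, 8)   # x0: made-up estimate of 2/3
q = single_pass_quotient(a, b, x0, k=3)
relative_error = 1 - q * b / a   # equals (1 - b*x0) ** (k + 1)
print(relative_error, ideal_table_bits(24, 3), ideal_table_bits(24, 1))
```

The relative error of the result is the initial error raised to the power $k + 1$, which is why a third-order unit needs only about a quarter of the target precision from the table.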

Figure 3 illustrates the hardware structure required to implement the proposed higher-order divide unit. The latency of the divide unit is $t_{div} = t_{lookuptable} + 3\,t_{mult}$. Here, it is assumed that the powers of $(1 - bX_0)$ may be computed directly, in parallel, and faster than a full precision multiplication. Liddicoat and Flynn [8] propose a parallel cubing unit and describe a parallel squaring unit suitable for the proposed higher-order divide architecture. These powering units are described in more detail in subsection 4.3.

The proposed divide unit may be fully pipelined if two small multipliers, the powering units, and one full multiplier are used. Small multipliers are used to compute $(1 - bX_0)$ and $(aX_0)$ since $X_0$ is approximately $n/(k+1)$ bits in length. A third-order divide unit would be constructed out of two 1/4 size multipliers, one squaring unit, one cubing unit, and one full multiplier. A 24-bit implementation of the proposed third-order divide unit is presented and discussed in detail in the following subsections.

Figure 3. Proposed higher-order NR divide.

4.1 Lookup table

An initial approximation for the reciprocal of the divisor is determined by table lookup. The $k + 1$ most significant bits of the divisor are used to index the lookup table and the $m + 1$ most significant bits of the reciprocal approximation are returned from the lookup table. A $2^k \times m$ bit ROM with $k$ address bits and $m$-bit word size is used for the lookup table since the most significant bit is constant for normalized IEEE floating point operands [1].

In order to eliminate the need to represent negative numbers in the computation, the lookup table must be programmed such that the $(1 - bX_0)$ fused multiply-subtract always produces a positive result. Furthermore, if the result of the $(1 - bX_0)$ computation is positive then the computed reciprocal will be equal to or less than the true reciprocal. This can be demonstrated by realizing that the exact reciprocal can be computed using an infinite-order Newton-Raphson approximation.
If $(1 - bX_0)$ is positive then the exact reciprocal is proportional to an infinite sum of decreasing positive terms, and a finite-order approximation is the truncation of this infinite series.

To guarantee that $(1 - bX_0) \ge 0$, the lookup table must return a value that when multiplied by $b$ is always less than or equal to one. This can be accomplished by programming the table entries such that the value stored in each lookup table address is less than the reciprocal for all possible values of $b$ that map to that particular address location. Let $b_{trunc}$ be equivalent to the divisor truncated to $k + 1$ bits. To determine the value to store in each address of the lookup table we first add $2^{-k}$ to $b_{trunc}$ so that $b_{trunc} + 2^{-k} > b$ for any possible $b$. The reciprocal of $b_{trunc} + 2^{-k}$ is then computed, noting that $\frac{1}{b_{trunc} + 2^{-k}} < \frac{1}{b}$. Finally the result of $\frac{1}{b_{trunc} + 2^{-k}}$ is truncated to $m + 1$ bits. All of the table entries are of the form $0.1xxx \ldots xxx$. The first digit to the right of the radix point is always one and therefore does not need to be stored in the lookup table. If the lookup table is programmed according to this procedure, then it will always return an approximation less than $1/b$ and the result of the $(1 - bX_0)$ fused multiply-subtract operation is always positive.

The values to be programmed in each address of the lookup table are a function of the address, the number of bits $k$ used to address the lookup table, and the number of output bits $m$ from the table. To determine the best lookup table size we exhaustively simulated several table sizes using full precision computations. The results from these simulations are shown in table 1. The table size of $2^{[...]} \times [...]$ bits was selected for the reciprocal approximation since the maximum error was less than 0.5 ulp (unit in the last place). Furthermore, the number of leading zeros to the right of the radix point in the $(1 - bX_0)$ computation is guaranteed to be six or more when using a lookup table of this size.

Table 1. Lookup table sizes for 24-bit operand (columns: address bits, word bits, table size, leading 0s, error in ulps).
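The table programming procedure of section 4.1 can be sketched directly. The sizes K = 7 and M = 8 below are hypothetical and are not claimed to be the configuration selected in the paper; the function names are likewise assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def build_reciprocal_table(k, m):
    """Entries for a 2^k x m ROM; each stores the m bits after the leading 0.1."""
    table = []
    for addr in range(2 ** k):
        b_trunc = 1 + Fraction(addr, 2 ** k)       # divisor truncated to k+1 bits
        upper = b_trunc + Fraction(1, 2 ** k)      # exceeds every b mapping to addr
        recip = floor((1 / upper) * 2 ** (m + 1))  # truncate 1/upper to m+1 bits
        table.append(recip - 2 ** m)               # leading one is implied, not stored
    return table


def lookup(table, k, m, b):
    """Return X0 for a normalized divisor 1 <= b < 2."""
    addr = floor((Fraction(b) - 1) * 2 ** k)
    return Fraction(table[addr] + 2 ** m, 2 ** (m + 1))


K, M = 7, 8   # hypothetical sizes, not the configuration chosen in the paper
table = build_reciprocal_table(K, M)
residuals = [1 - b * lookup(table, K, M, b)
             for b in (1 + Fraction(j, 4096) for j in range(4096))]
print(float(min(residuals)), float(max(residuals)))
```

For every sampled divisor the residual $1 - bX_0$ stays strictly positive, as the procedure requires, and with these particular sizes it also stays below $2^{-6}$.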

4.2 Computing $(1 - bX_0)$

The first arithmetic operation that follows the lookup table access is the $(1 - bX_0)$ fused multiply-subtract. Since $X_0$ is approximately $n/4$ bits, a small multiplier is used. The divisor $b$ is in the normalized IEEE single precision format. Therefore, $b_{23} = 1$ and bits $b_{22}$ through $b_0$ depend on the value stored in $b$. In order to compute the result of $(1 - bX_0)$, $b$ is sign extended with eight leading zeros and then the two's complement of $b$ is taken to produce $-b$. The sign extended $-b$ is represented by 8 leading 1s, then one zero, followed by the complement of bits $b_{22}$ to $b_0$, and an additional one is added to bit $b_0$ to complete the two's complement. The bits from $X_0$ are used to select partial products of $-b$. Lastly the constant 1.0 is added to the partial product array (PPA) to account for the 1 that $bX_0$ is subtracted from to form $(1 - bX_0)$. Figure 4 illustrates the hardware unit required to calculate $(1 - bX_0)$.

Recall that exhaustive simulation indicates that the leading six bits to the right of the radix point will always be zero. These bits do not have to be computed, further reducing the hardware needed for the $(1 - bX_0)$ computation. The boxed area in figure 4 indicates the columns in the PPA that need to be summed to compute the 24 most significant bits of $(1 - bX_0)$. The PPA can be reduced using a Wallace tree structure in four CSA delays. The area required to implement the PPA for the $(1 - bX_0)$ unit is approximately [...]% of the size of a 24-bit direct multiplier partial product array.

Figure 4. $(1 - bX_0)$ fused multiply-subtract unit.

4.3 Computing the powers of $(1 - bX_0)$

A squaring circuit that computes the square of a 24-bit operand 5% faster with slightly less than half the area required to perform a 24-bit direct multiplication is used to compute $(1 - bX_0)^2$. Figure 5 illustrates the squaring unit partial product array reduction. Similarly the cube may be computed directly and concurrently with the square using the cubing unit proposed by Liddicoat and Flynn [8].

Figure 5. Squaring unit PPA reduction.

Figure 6 shows the partial product array required to compute the parallel cube for a 4-bit operand. Figure 6 also identifies three reductions that may be applied to reduce the size of the PPA. The terms from the 1st reduction have a weight of one while terms from the 2nd and 3rd reductions have a weight of three. The terms with a weight of three are summed in carry save fashion using a Wallace tree. Then the three times multiple is computed and summed with the 1X terms using a carry free addition stage.

The precise cube of the 24-bit input produces a 72-bit result. The exact cube is not needed to achieve the desired quotient precision for the divide unit. In section 5, truncations of the reduced cube PPA are studied. These reductions not only decrease the area requirement but also the latency of the cube operation. The reduced parallel cube is approximately 6% faster than computing the cube using direct multipliers and requires only about [...]% of the area that is required by a single $n$-bit direct multiplier.

The parallel cubing unit is easily extendable to compute higher-order powers of $(1 - bX_0)$. Using higher-order powers of $(1 - bX_0)$ will accelerate the convergence of the Newton-Raphson approximation and reduce the precision needed by the initial reciprocal estimate.

4.4 Computing the Final Multiplications

The multiplication of $aX_0$ is a small multiplication since $X_0$ is approximately $n/4$ bits. The multiplication area is about [...]% that of a full 24-bit direct multiplication. A slow, area efficient multiplier may be used since the result of this computation is not needed until after the powers of $(1 - bX_0)$ have been computed.

The final multiplier computes the product of $(aX_0)$ with the sum of powers of $(1 - bX_0)$ to produce the final quotient. This is the only full multiplication that is required by the proposed divide unit.
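The reason a dedicated squaring unit needs fewer partial product bits than a general multiplier can be illustrated with the usual symmetry argument: the products $a_i a_j$ and $a_j a_i$ are identical, so each pair folds into a single bit one column to the left, and each diagonal term $a_i a_i$ is just $a_i$. The sketch below is a software model of that folding only; the exact reduction shown in Figure 5 is not recoverable from this transcription, and the function name is an assumption.

```python
def square_folded(x, n):
    """Square an n-bit unsigned integer from a folded partial product array.

    Returns (square, number_of_partial_product_bits_summed).
    """
    bits = [(x >> i) & 1 for i in range(n)]
    total = 0
    count = 0
    for i in range(n):
        total += bits[i] << (2 * i)   # diagonal term: a_i * a_i = a_i
        count += 1
        for j in range(i + 1, n):
            # a_i*a_j and a_j*a_i fold into one bit shifted left by one column
            total += (bits[i] & bits[j]) << (i + j + 1)
            count += 1
    return total, count


value, terms = square_folded(0xABCDEF, 24)
print(value == 0xABCDEF ** 2, terms, 24 * 24)
```

For a 24-bit operand the folded array sums 300 bits where a direct multiplier array has 576, before any of the further reductions or truncations discussed in the text.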

Figure 6. Parallel cubing unit PPA reductions (a), (b), and (c).

5 The Final Hardware Configuration

The number of bits in a multiplier PPA increases by the square of the operand length, $n^2$, while the squaring unit area grows by $\frac{1}{2}n^2$ and the cubing unit grows by $\frac{1}{6}n^3$ [8]. Therefore, effort must be made to minimize the intermediate operand length and the required output precision from each subunit. Significant reductions in the hardware area required to implement the subunits are presented in this section.

The divide unit has been exhaustively simulated to determine the maximum error in the reciprocal computation for various truncations of the cubing unit PPA. Figure 7 indicates the reciprocal maximum error in terms of ulps for varying truncations of the cubing unit PPA. A sharp knee exists in the curve when 6[...] or more of the least significant columns have been truncated. Only the eight most significant columns of the cubing unit PPA are required to achieve an error of less than 0.5 ulps.

Figure 7. The reciprocal error versus the cubing unit PPA column truncation (24-bit). Axes: cube PPA columns truncated versus reciprocal maximum error (ulps).

Additionally, the divide unit has been exhaustively simulated to determine the maximum error in the reciprocal for various truncations of the squaring unit given that 59, 6[...] and 6[...] columns have been truncated from the cubing unit PPA. Similarly, a sharp knee exists in the curve plotting the reciprocal error versus the number of least significant columns truncated from the squaring unit. Truncating up to 9 columns from the squaring unit PPA does not significantly increase the reciprocal error.

The reciprocal accuracy is less than 0.5 ulps for the design points listed in Table 2. Design [...] was selected since the maximum PPA height and number of bits in the PPA are minimized. Furthermore, Design [...] achieves the smallest maximum error. Therefore, the least significant [...] columns of the squaring unit PPA and the least significant 6[...] columns in the cubing unit PPA may be truncated. The PPA area for the squaring unit is less than 5% of the size of a 24-bit direct multiplier while the PPA area for the cubing unit is less than [...]% of the size of a 24-bit direct multiplier.
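The kind of truncation study described in this section can be approximated in software. The sketch below is a simplified model under several assumptions: it truncates the values of the squared and cubed terms to a given number of fractional bits instead of deleting PPA columns (so carries lost in real hardware are not modeled), it samples divisors rather than simulating exhaustively, it takes one ulp of the reciprocal to be $2^{-n}$, and the table sizes k = 7 and m = 8 are hypothetical.

```python
from fractions import Fraction
from math import floor


def truncate(value, frac_bits):
    """Discard everything below 2**-frac_bits from a non-negative value."""
    return Fraction(floor(value * 2 ** frac_bits), 2 ** frac_bits)


def initial_estimate(b, k, m):
    """Table-style estimate: 1/(b_trunc + 2^-k) truncated to m+1 bits."""
    b_trunc = truncate(b, k)
    return truncate(1 / (b_trunc + Fraction(1, 2 ** k)), m + 1)


def max_reciprocal_error_ulps(n, k, m, square_bits, cube_bits, samples):
    """Largest 1/b - X0*S over sampled divisors, in units of 2**-n."""
    worst = Fraction(0)
    for j in range(samples):
        b = 1 + Fraction(j, samples)
        x0 = initial_estimate(b, k, m)
        e = 1 - b * x0
        s = 1 + e + truncate(e ** 2, square_bits) + truncate(e ** 3, cube_bits)
        worst = max(worst, (1 / b - x0 * s) * 2 ** n)
    return worst


for cube_bits in (40, 28, 24, 20):
    err = max_reciprocal_error_ulps(24, 7, 8, 40, cube_bits, 512)
    print(cube_bits, float(err))
```

Sweeping `cube_bits` (or `square_bits`) downward shows the error staying flat until the discarded bits start to matter, which is the behavior the knee in the error curve reflects.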
Since the squaring and cubing units have been significantly reduced, the latency of these units is less than that of a single multiplier. In fact the cube computation is 6% faster than can be computed using serial multipliers.

Finally, the divide unit has been exhaustively simulated to determine the maximum error in the reciprocal for various truncations of the $(1 - bX_0)$ multiply unit given that the cubing unit PPA has been truncated by 6[...] columns and the squaring unit by [...] columns. The least significant three columns of the $(1 - bX_0)$ multiply-subtract unit may be truncated while maintaining an error of less than 0.5 ulps. A maximum reciprocal error of 0.496 ulps is achieved with the cubing unit truncated by 6[...] columns, the squaring unit truncated by [...] columns, and the $(1 - bX_0)$ multiply-subtract unit truncated by [...] columns.

Table 2. Reciprocal error for truncated units (columns: cube truncation, square truncation, error in ulps, PPA height, PPA bits).

Let's re-examine the third-order Newton-Raphson reciprocal approximation $\frac{1}{b} \approx X_0(1 + (1 - bX_0) + (1 - bX_0)^2 + (1 - bX_0)^3) = X_0 S$, where $S = 1 + (1 - bX_0) + (1 - bX_0)^2 + (1 - bX_0)^3$. In figure 8 the bit fields for each of the four components of $S$ have been aligned. In this figure the X's represent bits that will be computed by columns in the PPA for each unit and the T's indicate columns that may be truncated from the PPA for each unit. From this diagram it is clear that most of the bits from the $(1 - bX_0)$ multiply-subtract unit will contribute to the 24 most significant bits of $S$, while only about 1/[...] of the columns from the squaring unit and approximately 1/[...] of the columns from the cubing unit contribute to the 24 most significant bits of $S$. Progressively less computation is required to achieve the higher-order terms of the Newton-Raphson iteration.

The design that was presented in the preceding discussion was selected to minimize latency and hardware area under the constraint of computing the reciprocal to less than 0.5 ulps error. By slightly increasing the number of columns in the square and cube computations, the worst-case reciprocal error will be improved. The lookup table precision required depends on the order of the Newton-Raphson iteration and the precision of the sub-unit computation. Increasing the number of columns in the sub-unit PPAs will decrease the lookup table precision needed to achieve a given worst-case error. The lookup table area may be reduced by 50% for each bit of precision that it is reduced. Therefore, the designer may trade off computational complexity for area or vice versa.

Figure 8. 3rd order Newton-Raphson approximation sub-unit precision.

6 Summary

Division by functional iteration utilizes multiplication as the fundamental operation.
We presented the standard Newton-Raphson reciprocal iteration and the binomial series expansion reciprocal iteration. The computational parallelism in these approaches is limited to two parallel multiplications.

We proposed a divide unit architecture based on the higher-order Newton-Raphson reciprocal iteration. The divide unit uses truncated squaring, cubing and powering units. A 24-bit third-order implementation was used as an example to describe the divide unit in detail. We found that the first $(1 - bX_0)^1$, second $(1 - bX_0)^2$, third $(1 - bX_0)^3$ and fourth $(1 - bX_0)^4$ order computations require progressively less precision. Reducing the precision of the higher-order computations will maximize the efficiency, reduce the latency and minimize the power consumption of the overall computation. The reduction in the precision of the higher-order computations in the proposed architecture differs from the typical Newton-Raphson or series expansion approach. The latter algorithms require full precision computation after the first iteration.

Furthermore, truncating the subunits significantly reduces the area required to implement the divider. The $(1 - bX_0)$ fused multiply-subtract unit, squaring unit, cubing unit, and the $(aX_0)$ multiply unit require respectively [...]%, 5%, [...]%, and [...]% of the area that is required by a direct multiplier. If the divider is designed with separate units then the entire implementation would be less than the size of two full precision multipliers. The final multiplication may be performed on a shared multiplier, further reducing the dedicated hardware requirement of the divide unit.

A 53-bit IEEE double precision divide unit was also designed and tested. The same design techniques were applied to the double precision unit. To reduce the lookup table size the highest-order 5 columns of the $(1 - bX_0)^4$ parallel computation were included. Adding a few additional columns for the $(1 - bX_0)^4$ terms only increased the 1X PPA by a total of 8 bits and the 3X PPA by a total of two bits. Using a lookup table of $2^{[...]} \times [...]$ bits, the 53-bit reciprocal can be computed in one iteration with the squaring unit and cubing unit truncated to approximately 4% and [...]% of the size of a 53-bit direct multiplier.
A second 53-bit design was studied using a lookup table of $2^{[...]} \times [...]$ bits, half the size of the previous 53-bit design point. The truncated squaring unit was approximately 4% of the size of a 53-bit direct multiplier and the truncated cubing unit was approximately 5% of the size of a 53-bit direct multiplier. The 53-bit designs were proportionally very similar to the results for the 24-bit designs, indicating that the proposed architecture scales well over the studied range.

Table 3 summarizes the area requirements of the multiplicative divide techniques for IEEE double precision operands. The lookup table requires most of the dedicated area needed to implement the divide unit. Table 4 summarizes the latencies for the multiplicative division algorithms discussed. In the table the following abbreviations are used: SM = small multiply, SNM = small narrow multiply, FM = full multiply ($n \times n$), SUB = subtract, FMAC = full multiply accumulate, SDA = [...]-operand signed digit adder. The proposed divide unit has the lowest latency and area requirements. Additionally, the number of iterations required by each algorithm to achieve an error reduction of [...] is listed.

Table 3. Divide area comparison (IEEE DP)

Algorithm                Lookup Table Size   HW Area
N-R 1st-order            8,67B               Mult.
Series Expansion         8,67B               Mult.
N-R 3rd-order            8,67B               Mult.
Ito et al. [7]           6,44B               Mult-Acc.
Ercegovac et al. [3]     65,56B              Mult.
Proposed Arch.           4,6B                Mult.

Table 4. Divide latency comparison

Algorithm                Iter.   Comp. Latency
N-R 1st-order                    SM+FM
Series Expansion                 SM+Sub+FM
N-R 3rd-order                    SM+FM
Ito et al. [7]           4       4FMAC
Ercegovac et al. [3]             SM+SNM+SDA+FM
Proposed Arch.                   SM+FM

The implementations listed in Tables 3 and 4 were selected for comparison since each one requires the area of approximately two multipliers or less.

The proposed architecture is easily amenable to a fully pipelined implementation. Since the quotient is computed in a single pass through the subunits, a new divide operation can be dispatched each cycle. This differs from the first-order Newton-Raphson iteration and binomial series expansion technique, which require multiple iterations to achieve the same convergence as the proposed architecture. Our algorithm and the Ercegovac, Lang, Muller, Tisserand approach are fully pipeline-able without a significant increase in hardware.

7 Conclusions

A fast, efficient, and high-throughput divide unit is proposed. This unit utilizes the higher-order Newton-Raphson reciprocal approximation. Parallel squaring, cubing and powering units perform low latency concurrent computation and reduce the overall latency of the divide unit. It has been demonstrated that progressively less computation is required to compute the second, third and higher-order terms.
Therefore, significant hardware reductions are achievable by truncating the powering unit partial product arrays. The proposed architecture achieves the desired precision in a single iteration and is amenable to a fully pipelined implementation that dispatches one divide instruction per cycle. Designing an optimal divide unit for a specific operand length requires balancing the subunit precision and lookup table size.

References

[1] ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, 1985.
[2] R. C. Agarwal, F. G. Gustavson, and M. S. Schmookler. Series Approximation Methods for Divide and Square Root in the Power3™ Processor. In Proc. 14th IEEE Symp. on Computer Arithmetic, pages 116–123, April 1999.
[3] M. D. Ercegovac, T. Lang, J.-M. Muller, and A. Tisserand. Reciprocation, Square Root, Inverse Square Root, and Some Elementary Functions Using Small Multipliers. IEEE Transactions on Computers, 49(7):628–637, July 2000.
[4] M. D. Ercegovac, D. W. Matula, J.-M. Muller, and G. Wei. Improving Goldschmidt Division, Square Root, and Square Root Reciprocal. IEEE Transactions on Computers, 49(7):759–763, July 2000.
[5] M. Flynn. On Division by Functional Iteration. IEEE Transactions on Computers, C-19(8):702–706, August 1970.
[6] R. E. Goldschmidt. Applications of Division by Convergence. Master's thesis, Dept. of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, Mass., June 1964.
[7] M. Ito, N. Takagi, and S. Yajima. Efficient Initial Approximations and Fast Converging Methods for Division and Square Root. In Proc. 12th IEEE Symp. on Computer Arithmetic, pages 2–9, July 1995.
[8] A. Liddicoat and M. Flynn. The Parallel Square and Cube Computation. In IEEE 34th Asilomar Conference on Signals, Systems and Computers, October 2000.
[9] S. F. Oberman. Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor. In Proc. 14th IEEE Symp. on Computer Arithmetic, pages 106–115, April 1999.
[10] S. F. Oberman and M. Flynn. Division Algorithms and Implementations. IEEE Transactions on Computers, 46(8):833–854, August 1997.
[11] P. Rabinowitz. Multiple-Precision Division. In Communications of the ACM, volume 4, page 98, February 1961.
[12] D. Wong and M. Flynn. Fast Division Using Accurate Quotient Approximations to Reduce the Number of Iterations. In IEEE Transactions on Computers, pages [...], August 1992.


More information

Two-Dimensional Motion Estimation (Part II: Advanced Techniques) Yao Wang Polytechnic University, Brooklyn, NY

Two-Dimensional Motion Estimation (Part II: Advanced Techniques) Yao Wang Polytechnic University, Brooklyn, NY Two-Dimesiol Motio Estimtio (Prt II: Advced Techiques) Yo Wg Polytechic Uiversity, Brookly, NY11201 http://eeweb.poly.edu/~yo Outlie Problems with EBMA Multiresolutio motio estimtio Hierrchicl block mtchig

More information

3n

3n Prctice Set 6 Sequeces d Series Clcultor Required Objectives Alyze ptters i sequeces to determie subsequet terms. Fid the first four terms of sequece give equtio for. Fid expressio for give sequece. Expd

More information

EE 477 Digital Signal Processing. 8a IIR Systems

EE 477 Digital Signal Processing. 8a IIR Systems EE 477 Digitl Sigl Processig 8 IIR Systems Geerl Differece Equtio FIR: output depeds o curret d pst iputs oly IIR: output depeds o curret d pst iputs d pst outputs y M [] b x[ ] + y[ l] l N l EE 477 DSP

More information

Two-Dimensional Motion Estimation (Part III: Advanced Techniques) Yao Wang Polytechnic University, Brooklyn, NY11201

Two-Dimensional Motion Estimation (Part III: Advanced Techniques) Yao Wang Polytechnic University, Brooklyn, NY11201 Two-Dimesiol Motio Estimtio (Prt III: Advced Techiques) Yo Wg Polytechic Uiversity, Brookly, NY11201 yo@visio.poly.edu Yo Wg, 2002 2-D Motio Estimtio Outlie Deformble block mtchig lgorithm (DBMA) Node-bsed

More information

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers Wht do ll those bits men now? bits (...) Number Systems nd Arithmetic or Computers go to elementry school instruction R-formt I-formt... integer dt number text chrs... floting point signed unsigned single

More information

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 2: Custom single-purpose processors.

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 2: Custom single-purpose processors. Hrdwre/Softwre Itroductio Chpter Custom sigle-purpose processors Itroductio Combitiol logic Sequetil logic Outlie Custom sigle-purpose processor desig RT-level custom sigle-purpose processor desig Hrdwre/Softwre

More information

Questions About Numbers. Number Systems and Arithmetic. Introduction to Binary Numbers. Negative Numbers?

Questions About Numbers. Number Systems and Arithmetic. Introduction to Binary Numbers. Negative Numbers? Questions About Numbers Number Systems nd Arithmetic or Computers go to elementry school How do you represent negtive numbers? frctions? relly lrge numbers? relly smll numbers? How do you do rithmetic?

More information

PCJ-BLAST massively parallel sequence alignment using NCBI Blast and PCJ Java library

PCJ-BLAST massively parallel sequence alignment using NCBI Blast and PCJ Java library PCJ-BLAST mssively prllel sequece ligmet usig NCBI Blst d PCJ Jv librry Piotr Bł Mrek Nowicki Dvit Bzhlv bl@icm.edu.pl ICM Uiversity of Wrsw frmir@mt.umk.pl ICM Uiversity of Wrsw N. Copericus Uiversity

More information

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers

What do all those bits mean now? Number Systems and Arithmetic. Introduction to Binary Numbers. Questions About Numbers Wht do ll those bits men now? bits (...) Number Systems nd Arithmetic or Computers go to elementry school instruction R-formt I-formt... integer dt number text chrs... floting point signed unsigned single

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Lecture 1: Introduction and Strassen s Algorithm

Lecture 1: Introduction and Strassen s Algorithm 5-750: Graduate Algorithms Jauary 7, 08 Lecture : Itroductio ad Strasse s Algorithm Lecturer: Gary Miller Scribe: Robert Parker Itroductio Machie models I this class, we will primarily use the Radom Access

More information

Logic Spring Final Review

Logic Spring Final Review Idirect Argumet: Cotrdictios d Cotrpositio. Prove the followig by cotrdictio d by cotrpositio. Give two seprte proofs. The egtive of y irrtiol umber is irrtiol. b. For ll iteger, if ² is odd the is odd.

More information

Gauss-Siedel Method. Major: All Engineering Majors. Authors: Autar Kaw

Gauss-Siedel Method. Major: All Engineering Majors. Authors: Autar Kaw Guss-Siedel Method Mjor: All Egieerig Mjors Authors: Autr Kw http://umericlmethods.eg.usf.edu Trsformig Numericl Methods Eductio for STEM Udergrdutes 4//06 http://umericlmethods.eg.usf.edu Guss-Seidel

More information

Optimization of a precise integration method for seismic modeling based on graphic processing unit

Optimization of a precise integration method for seismic modeling based on graphic processing unit Erthq Sci ()3: 387 393 387 Doi:.7/s589--736- Optimiztio of precise itegrtio method for seismic modelig bsed o grphic processig uit Jigyu Li Geyg Tg d Tiyue Hu, School of Erth d Spce Scieces, Pekig Uiversity,

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

Raytracing: Quality. Ray Genealogy. Ray Genealogy. Describe what you see! Describe what you see! Describe what you see!

Raytracing: Quality. Ray Genealogy. Ray Genealogy. Describe what you see! Describe what you see! Describe what you see! Rytrcig: Qulity COSC 4328/5327 Scott A. Kig Ry Geelogy Primry rys spw off 3 rys. Two of those c spw of 3 more, etc. Whe do you stop? Whe ry leves the scee. Whe the cotriutio is smll eough. After ech ouce

More information

Symbolic Algebra and Timing Driven Data-flow Synthesis

Symbolic Algebra and Timing Driven Data-flow Synthesis Symolic Alger d Timig Drive Dt-flow Sythesis Armit Peymdoust Giovi De Micheli Computer Systems Lortory, Stford Uiversity Stford, CA 94305 {rmit, i}@stford.edu Astrct The growig mrket of multi-medi pplictios

More information

Multilayer Perceptrons

Multilayer Perceptrons Multilyer Perceptros The Essece of Neurl Networks XOR Problem Multilyer Perceptros Seerl lyers of itercoected euros: Sesory iput) lyer Output lyer Oe or more hidde lyers Geerliztio of sigle lyer perceptros

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

ON THE STABILITY OF LU-FACTORIZATIONS

ON THE STABILITY OF LU-FACTORIZATIONS Idi Jourl of Fudetl d pplied Life Scieces ISSN: 3 6345 (Olie) Ope ccess Olie Itertiol Jourl vilble t wwwcibtechorg/sped/jls/4/4/jlsht 4 Vol 4 (S4) pp 4-49/Sfdr Reserch rticle BSTRCT ON THE STBILITY OF

More information

14 Randomized Minimum Cut

14 Randomized Minimum Cut Jques: But, for the seveth cuse; how did you fid the qurrel o the seveth cuse? Touchstoe: Upo lie seve times removed: ber your body more seemig, Audrey: s thus, sir. I did dislike the cut of certi courtier

More information

d i e j example:

d i e j example: Expoet Rules Mth II Foruls to study for NC Fil Ex Fll 017 exple: 0 1 exple: 0 = 1 c h exple: c h 6 1 1 1 exple: 9 bbg b exple: bbg 1 b d 9 F F exple: exple: 7 7 4 16 HG I K J F H G I K J HG I K J F H G

More information

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence _9.qxd // : AM Page Chapter 9 Sequeces, Series, ad Probability 9. Sequeces ad Series What you should lear Use sequece otatio to write the terms of sequeces. Use factorial otatio. Use summatio otatio to

More information

Chapter 3. Floating Point Arithmetic

Chapter 3. Floating Point Arithmetic COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 3 Floatig Poit Arithmetic Review - Multiplicatio 0 1 1 0 = 6 multiplicad 32-bit ALU shift product right multiplier add

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter The Processor Part A path Desig Itroductio CPU performace factors Istructio cout Determied by ISA ad compiler. CPI ad

More information

Randomized Algorithms

Randomized Algorithms // Copyright he McGrw-Hill Compies, Ic. Permissio required for reproductio or disply. Rdomized Algorithms Copyright he McGrw-Hill Compies, Ic. Permissio required for reproductio or disply. Copyright he

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Accelerating 3D convolution using streaming architectures on FPGAs

Accelerating 3D convolution using streaming architectures on FPGAs Accelerting 3D convolution using streming rchitectures on FPGAs Hohun Fu, Robert G. Clpp, Oskr Mencer, nd Oliver Pell ABSTRACT We investigte FPGA rchitectures for ccelerting pplictions whose dominnt cost

More information

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 )

EE260: Digital Design, Spring /16/18. n Example: m 0 (=x 1 x 2 ) is adjacent to m 1 (=x 1 x 2 ) and m 2 (=x 1 x 2 ) but NOT m 3 (=x 1 x 2 ) EE26: Digital Desig, Sprig 28 3/6/8 EE 26: Itroductio to Digital Desig Combiatioal Datapath Yao Zheg Departmet of Electrical Egieerig Uiversity of Hawaiʻi at Māoa Combiatioal Logic Blocks Multiplexer Ecoders/Decoders

More information

LU Decomposition Method

LU Decomposition Method SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS LU Decompositio Method Jamie Traha, Autar Kaw, Kevi Marti Uiversity of South Florida Uited States of America kaw@eg.usf.edu http://umericalmethods.eg.usf.edu Itroductio

More information

UNIVERSITY OF MORATUWA

UNIVERSITY OF MORATUWA UNIVERSITY OF MORATUWA FACULTY OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING B.Sc. Egieerig 2014 Itake Semester 2 Examiatio CS2052 COMPUTER ARCHITECTURE Time allowed: 2 Hours Jauary 2016

More information

Appendix D. Controller Implementation

Appendix D. Controller Implementation COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Appedix D Cotroller Implemetatio Cotroller Implemetatios Combiatioal logic (sigle-cycle); Fiite state machie (multi-cycle, pipelied);

More information

Engineer-to-Engineer Note

Engineer-to-Engineer Note Engineer-to-Engineer Note EE-295 Technicl notes on using Anlog Devices DSPs, processors nd development tools Visit our Web resources http://www.nlog.com/ee-notes nd http://www.nlog.com/processors or e-mil

More information

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012

Dynamic Programming. Andreas Klappenecker. [partially based on slides by Prof. Welch] Monday, September 24, 2012 Dynmic Progrmming Andres Klppenecker [prtilly bsed on slides by Prof. Welch] 1 Dynmic Progrmming Optiml substructure An optiml solution to the problem contins within it optiml solutions to subproblems.

More information

ECEN 468 Advanced Logic Design Lecture 36: RTL Optimization

ECEN 468 Advanced Logic Design Lecture 36: RTL Optimization ECEN 468 Advnced Logic Design Lecture 36: RTL Optimiztion ECEN 468 Lecture 36 RTL Design Optimiztions nd Trdeoffs 6.5 While creting dtpth during RTL design, there re severl optimiztions nd trdeoffs, involving

More information

Modeling Reusable Concurrent Passive Entity Objects in Colored Petri Nets

Modeling Reusable Concurrent Passive Entity Objects in Colored Petri Nets Modelig Reusble Cocurret Pssive Etity Objects i Colored Petri Nets Rowld Pitts d Hss Gom George Mso Uiversity, Firfx, Virgii, USA {rpitts,hgom}@gmu.edu Abstrct. Cocurret softwre systems re growig icresigly

More information

Type-Constrained Direct Fitting of Quadric Surfaces

Type-Constrained Direct Fitting of Quadric Surfaces 1 ype-costried Direct Fittig of Qudric Surfces Jmes Adrews 1 d Crlo H. Séqui 1 Uiversity of Clifori, Berkeley, jim@eecs.erkeley.edu Uiversity of Clifori, Berkeley, sequi@cs.erkeley.edu ABSRAC We preset

More information

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5 Morga Kaufma Publishers 26 February, 28 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Set-Associative Cache Architecture Performace Summary Whe CPU performace icreases:

More information

Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff

Computer Architecture. Microcomputer Architecture and Interfacing Colorado School of Mines Professor William Hoff Computer rchitecture Microcomputer rchitecture ad Iterfacig Colorado School of Mies Professor William Hoff Computer Hardware Orgaizatio Processor Performs all computatios; coordiates data trasfer Iput

More information

If f(x, y) is a surface that lies above r(t), we can think about the area between the surface and the curve.

If f(x, y) is a surface that lies above r(t), we can think about the area between the surface and the curve. Line Integrls The ide of line integrl is very similr to tht of single integrls. If the function f(x) is bove the x-xis on the intervl [, b], then the integrl of f(x) over [, b] is the re under f over the

More information

x )Scales are the reciprocal of each other. e

x )Scales are the reciprocal of each other. e 9. Reciprocls A Complete Slide Rule Mnul - eville W Young Chpter 9 Further Applictions of the LL scles The LL (e x ) scles nd the corresponding LL 0 (e -x or Exmple : 0.244 4.. Set the hir line over 4.

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

Math 142, Exam 1 Information.

Math 142, Exam 1 Information. Mth 14, Exm 1 Informtion. 9/14/10, LC 41, 9:30-10:45. Exm 1 will be bsed on: Sections 7.1-7.5. The corresponding ssigned homework problems (see http://www.mth.sc.edu/ boyln/sccourses/14f10/14.html) At

More information

Hamiltonian-T*- Laceability in Jump Graphs Of Diameter Two

Hamiltonian-T*- Laceability in Jump Graphs Of Diameter Two IOSR Jourl of Mthemtics IOSR-JM e-issn 78-78 p-issn9-76x. Volume Issue Ver. III My-Ju. PP -6 www.iosrjourls.org Hmiltoi-T*- Lcebility i Jump Grphs Of Dimeter Two Mjuth.G Murli.R Deprtmet of MthemticsGopl

More information

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Virtual Memory. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Virtual Memory Prof. Yajig Li Uiversity of Chicago A System with Physical Memory Oly Examples: most Cray machies early PCs Memory early all embedded systems

More information

Comparing Offset Curve Approximation Methods

Comparing Offset Curve Approximation Methods Feture Article Comprig Offset Curve Approximtio Methos Offset curves hve iverse egieerig pplictios, spurrig extesive reserch o vrious offset techiques. Reserch i the erly 0s focuse o pproximtio techiques

More information

The Chromatic Covering of a Graph: Ratios, Domination, Areas and Farey Sequences.

The Chromatic Covering of a Graph: Ratios, Domination, Areas and Farey Sequences. The Chromtic Coverig of Grph: Rtios Domitio res d Frey eueces. Pul ugust Witer mthemtics UKZN Durb outh fric emil: witerp@ukz.c.z bstrct The study of the chromtic umber d vertex coverigs of grphs hs opeed

More information

Data diverse software fault tolerance techniques

Data diverse software fault tolerance techniques Data diverse software fault tolerace techiques Complemets desig diversity by compesatig for desig diversity s s limitatios Ivolves obtaiig a related set of poits i the program data space, executig the

More information

Explicit Decoupled Group Iterative Method for the Triangle Element Solution of 2D Helmholtz Equations

Explicit Decoupled Group Iterative Method for the Triangle Element Solution of 2D Helmholtz Equations Interntionl Mthemticl Forum, Vol. 12, 2017, no. 16, 771-779 HIKARI Ltd, www.m-hikri.com https://doi.org/10.12988/imf.2017.7654 Explicit Decoupled Group Itertive Method for the Tringle Element Solution

More information

EE123 Digital Signal Processing

EE123 Digital Signal Processing Last Time EE Digital Sigal Processig Lecture 7 Block Covolutio, Overlap ad Add, FFT Discrete Fourier Trasform Properties of the Liear covolutio through circular Today Liear covolutio with Overlap ad add

More information

6.2 Volumes of Revolution: The Disk Method

6.2 Volumes of Revolution: The Disk Method mth ppliction: volumes by disks: volume prt ii 6 6 Volumes of Revolution: The Disk Method One of the simplest pplictions of integrtion (Theorem 6) nd the ccumultion process is to determine so-clled volumes

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Single-Cycle Disadvantages & Advantages COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 4 The Processor Pipeliig Sigle-Cycle Disadvatages & Advatages Clk Uses the clock cycle iefficietly the clock cycle must

More information

CMSC 430, Practice Problems 1 (Solutions)

CMSC 430, Practice Problems 1 (Solutions) CMC 430, Prtie Problems 1 olutios) 1. Cosider the followig grmmr: d or ) true flse. Compute First sets for eh produtio d otermil FIRTtrue) = { true } FIRTflse) = { flse } FIRT ) ) = { } FIRT d ) = FIRT

More information

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering

EE University of Minnesota. Midterm Exam #1. Prof. Matthew O'Keefe TA: Eric Seppanen. Department of Electrical and Computer Engineering EE 4363 1 Uiversity of Miesota Midterm Exam #1 Prof. Matthew O'Keefe TA: Eric Seppae Departmet of Electrical ad Computer Egieerig Uiversity of Miesota Twi Cities Campus EE 4363 Itroductio to Microprocessors

More information

Lecture 28: Data Link Layer

Lecture 28: Data Link Layer Automatic Repeat Request (ARQ) 2. Go ack N ARQ Although the Stop ad Wait ARQ is very simple, you ca easily show that it has very the low efficiecy. The low efficiecy comes from the fact that the trasmittig

More information

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA

Creating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by

More information

Unit 5 Vocabulary. A function is a special relationship where each input has a single output.

Unit 5 Vocabulary. A function is a special relationship where each input has a single output. MODULE 3 Terms Definition Picture/Exmple/Nottion 1 Function Nottion Function nottion is n efficient nd effective wy to write functions of ll types. This nottion llows you to identify the input vlue with

More information

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1

COSC 1P03. Ch 7 Recursion. Introduction to Data Structures 8.1 COSC 1P03 Ch 7 Recursio Itroductio to Data Structures 8.1 COSC 1P03 Recursio Recursio I Mathematics factorial Fiboacci umbers defie ifiite set with fiite defiitio I Computer Sciece sytax rules fiite defiitio,

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

. Written in factored form it is easy to see that the roots are 2, 2, i,

. Written in factored form it is easy to see that the roots are 2, 2, i, CMPS A Itroductio to Programmig Programmig Assigmet 4 I this assigmet you will write a java program that determies the real roots of a polyomial that lie withi a specified rage. Recall that the roots (or

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

Ones Assignment Method for Solving Traveling Salesman Problem

Ones Assignment Method for Solving Traveling Salesman Problem Joural of mathematics ad computer sciece 0 (0), 58-65 Oes Assigmet Method for Solvig Travelig Salesma Problem Hadi Basirzadeh Departmet of Mathematics, Shahid Chamra Uiversity, Ahvaz, Ira Article history:

More information

Stained Glass Design. Teaching Goals:

Stained Glass Design. Teaching Goals: Stined Glss Design Time required 45-90 minutes Teching Gols: 1. Students pply grphic methods to design vrious shpes on the plne.. Students pply geometric trnsformtions of grphs of functions in order to

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS

APPLICATION NOTE PACE1750AE BUILT-IN FUNCTIONS APPLICATION NOTE PACE175AE BUILT-IN UNCTIONS About This Note This applicatio brief is iteded to explai ad demostrate the use of the special fuctios that are built ito the PACE175AE processor. These powerful

More information

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0

Polynomial Functions and Models. Learning Objectives. Polynomials. P (x) = a n x n + a n 1 x n a 1 x + a 0, a n 0 Polyomial Fuctios ad Models 1 Learig Objectives 1. Idetify polyomial fuctios ad their degree 2. Graph polyomial fuctios usig trasformatios 3. Idetify the real zeros of a polyomial fuctio ad their multiplicity

More information

External Memory. External Memory. Computational Models. Computational Models

External Memory. External Memory. Computational Models. Computational Models Exterl emory Exterl emory Computtiol model Shortet pth i implicit grid grph lgorithm I/O lgorithm Cche-oliviou lgorithm Computtiol model Shortet pth i implicit grid grph lgorithm I/O lgorithm Cche-oliviou

More information

Study Guide for Exam 3

Study Guide for Exam 3 Mth 05 Elementry Algebr Fll 00 Study Guide for Em Em is scheduled for Thursdy, November 8 th nd ill cover chpters 5 nd. You my use "5" note crd (both sides) nd scientific clcultor. You re epected to no

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 22 Database Recovery Techiques Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Recovery algorithms Recovery cocepts Write-ahead

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

Optimal Mapped Mesh on the Circle

Optimal Mapped Mesh on the Circle Koferece ANSYS 009 Optimal Mapped Mesh o the Circle doc. Ig. Jaroslav Štigler, Ph.D. Bro Uiversity of Techology, aculty of Mechaical gieerig, ergy Istitut, Abstract: This paper brigs out some ideas ad

More information

IMP: Superposer Integrated Morphometrics Package Superposition Tool

IMP: Superposer Integrated Morphometrics Package Superposition Tool IMP: Superposer Itegrated Morphometrics Package Superpositio Tool Programmig by: David Lieber ( 03) Caisius College 200 Mai St. Buffalo, NY 4208 Cocept by: H. David Sheets, Dept. of Physics, Caisius College

More information

GPUMP: a Multiple-Precision Integer Library for GPUs

GPUMP: a Multiple-Precision Integer Library for GPUs GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract

More information

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design

CSC 220: Computer Organization Unit 11 Basic Computer Organization and Design College of Computer ad Iformatio Scieces Departmet of Computer Sciece CSC 220: Computer Orgaizatio Uit 11 Basic Computer Orgaizatio ad Desig 1 For the rest of the semester, we ll focus o computer architecture:

More information

5 Regular 4-Sided Composition

5 Regular 4-Sided Composition Xilinx-Lv User Guide 5 Regulr 4-Sided Composition This tutoril shows how regulr circuits with 4-sided elements cn be described in Lv. The type of regulr circuits tht re discussed in this tutoril re those

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 18 Strategies for Query Processig Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio DBMS techiques to process a query Scaer idetifies

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-169 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

Design Optimization of Extrusion Blow Molded Parts Using Prediction Reliability Guided Search of Evolving Network Modeling

Design Optimization of Extrusion Blow Molded Parts Using Prediction Reliability Guided Search of Evolving Network Modeling Desig Optimiztio of Extrusio Blow Molded Prts Usig Predictio Relibility Guided Serch of Evolvig Network Modelig Jyh-Cheg Yu, 1 Jyh-Yeog Jug 1Deprtmet of Mechicl d Automtio Egieerig, Ntiol Kohsiug First

More information

Rational Numbers---Adding Fractions With Like Denominators.

Rational Numbers---Adding Fractions With Like Denominators. Rtionl Numbers---Adding Frctions With Like Denomintors. A. In Words: To dd frctions with like denomintors, dd the numertors nd write the sum over the sme denomintor. B. In Symbols: For frctions c nd b

More information

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1

Master Informatics Eng. 2017/18. A.J.Proença. Memory Hierarchy. (most slides are borrowed) AJProença, Advanced Architectures, MiEI, UMinho, 2017/18 1 Advaced Architectures Master Iformatics Eg. 2017/18 A.J.Proeça Memory Hierarchy (most slides are borrowed) AJProeça, Advaced Architectures, MiEI, UMiho, 2017/18 1 Itroductio Programmers wat ulimited amouts

More information

Geometric transformations

Geometric transformations Geometric trnsformtions Computer Grphics Some slides re bsed on Shy Shlom slides from TAU mn n n m m T A,,,,,, 2 1 2 22 12 1 21 11 Rows become columns nd columns become rows nm n n m m A,,,,,, 1 1 2 22

More information

On Computation and Resource Management in Networked Embedded Systems

On Computation and Resource Management in Networked Embedded Systems On Computtion nd Resource Mngement in Networed Embedded Systems Soheil Ghisi Krlene Nguyen Elheh Bozorgzdeh Mjid Srrfzdeh Computer Science Deprtment University of Cliforni, Los Angeles, CA 90095 soheil,

More information

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5.

Morgan Kaufmann Publishers 26 February, COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 5. Morga Kaufma Publishers 26 February, 208 COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Iterface 5 th Editio Chapter 5 Virtual Memory Review: The Memory Hierarchy Take advatage of the priciple

More information

Real-time Vehicle Detection and Tracking Algorithm for Forward Vehicle Collision Warning

Real-time Vehicle Detection and Tracking Algorithm for Forward Vehicle Collision Warning JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.18, NO.5, OCTOBER, 2018 ISSN(Prit 1598-1657 https://doi.org/10.5573/jsts.2018.18.5.547 ISSN(Olie 2233-4866 Rel-time Vehicle Detectio d Trckig Algorithm

More information

Numerical Methods Lecture 6 - Curve Fitting Techniques

Numerical Methods Lecture 6 - Curve Fitting Techniques Numerical Methods Lecture 6 - Curve Fittig Techiques Topics motivatio iterpolatio liear regressio higher order polyomial form expoetial form Curve fittig - motivatio For root fidig, we used a give fuctio

More information