Reducing SDRAM Energy Consumption in Embedded Systems*


Jelena Trajkovic, Alexander Veidenbaum
University of California, Irvine
{jelenat, alexv}@ics.uci.edu

Abstract
DRAM energy consumption in embedded systems can be very high, exceeding that of the data cache or even of the entire processor. This paper presents a scheme for reducing the energy consumption of SDRAM memory access by a combination of techniques that take advantage of SDRAM energy efficiencies in bank and row access. This is achieved by using small, cache-like structures in the memory controller to prefetch additional cache block(s) on reads and to combine block writes to the same DRAM row. The results quantify the DRAM energy consumption of MiBench applications and demonstrate significant savings in both the DRAM energy consumption, an average of 23%, and the energy-delay product, an average of 44%. The approach also improves performance: the CPI is reduced by 26% on average.

1. Introduction
Many embedded applications are memory intensive, especially multimedia applications. Memory access constitutes a significant portion of overall energy consumption in such applications [2, 10]. This research investigates an architectural approach to reducing the memory system energy without any performance loss. In fact, it will be shown to result in a performance gain.

Much of prior research to reduce energy consumption has focused on caches. But main memory consumes orders of magnitude more energy per access than cache. As will be shown below, in some applications the total energy of main memory accesses can be an order of magnitude higher than the total data cache energy consumption. Thus it is very important to optimize DRAM access for energy consumption. Some of the techniques proposed for cache optimization can be extended for this purpose.

SDRAM memory is one of the major types of DRAM used in embedded systems. The energy of an SDRAM access can be divided into two main components: the energy of a bank/row activation (activate-precharge pair) and the energy of a read or write access to an active bank/row.
The activate-precharge pair consumes 65% of the total access energy (per the manufacturer's data). The SDRAM organization allows the bank/row to be left open after an access, which permits additional read/write accesses to that bank/row without incurring the activate/precharge cost on each access. Reading or writing twice the amount of data within the same activate-precharge cycle does not double the energy consumption but only increases it by approximately 35%.

Embedded systems use a data cache and only access memory on cache misses or write-backs. Thus the memory is read/written in (cache) blocks (lines). Therefore this paper proposes reading/writing multiple lines at a time, i.e. within a single activate-precharge pair. This will be shown to lead to significant energy savings. Multiple block reads are accomplished via hardware prefetching and writes via write combining [16] at the memory interface.

Accessing multiple lines requires intermediate storage. This research proposes to use a small amount of such storage in the memory controller. Additional lines will be prefetched into this storage on each read. Writes to the same SDRAM row will be buffered first and combined into a multiple-line SDRAM write whenever possible. The main contributions of this research are adapting both prefetching and write combining to SDRAM energy reduction, in particular combining writes to the same SDRAM row.

* This research was supported in part by the National Science Foundation under Grant No. NSF CCR
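A first-order Python model makes the 65%/35% arithmetic concrete. It is an illustration, not the authors' energy model: it assumes one single-line access costs 1.0 unit, 0.65 of it for the activate-precharge pair and 0.35 for the data transfer, and it ignores every other energy term. For k = 2 it gives 0.65 + 2 x 0.35 = 1.35 units, the roughly 35% increase quoted above, i.e. a 32.5% saving over two separate accesses. The savings measured with Micron's calculator and reported for Figure 2 in Section 3 are smaller (24%, 34% and 38%), so the model is only useful for the trend: savings grow with the number of combined accesses, with diminishing returns.

```python
# First-order SDRAM access-energy model (illustrative sketch only).
# ASSUMPTION: one single-line access costs 1.0 energy unit, of which
# ACT_PRE_FRACTION goes to the activate-precharge pair and the rest to the
# read/write burst. Combining k line accesses pays activate-precharge once.

ACT_PRE_FRACTION = 0.65  # share of one access spent on activate + precharge


def separate_energy(k: int) -> float:
    """k line accesses, each paying its own activate-precharge pair."""
    return float(k)


def combined_energy(k: int, act_pre: float = ACT_PRE_FRACTION) -> float:
    """k line accesses sharing a single activate-precharge pair."""
    burst = 1.0 - act_pre  # per-line read/write component
    return act_pre + k * burst


def combining_savings(k: int, act_pre: float = ACT_PRE_FRACTION) -> float:
    """Fraction of energy saved by combining k accesses instead of issuing them separately."""
    return 1.0 - combined_energy(k, act_pre) / separate_energy(k)


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"k={k}: combined={combined_energy(k):.2f} units, "
              f"saving={combining_savings(k):.1%}")
```

Passing a different `act_pre` lets the same functions explore a device with a different activate-precharge share.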

Figure 1. Write current profile

A second goal of this research is to avoid affecting the execution time, or even to improve it, while saving energy. The small buffers used for prefetching/combining act as a memory cache and can significantly improve read performance. Prefetching can sometimes degrade execution time by interfering with regular memory access without supplying useful data. Write buffering enables the processor to continue execution instead of waiting for the write access to finish.¹

The rest of the paper is organized as follows. Section 2 presents related work. Section 3 describes SDRAM energy components for read and write access. Section 4 presents architectural modifications and describes the energy saving techniques of our approach. Experimental results demonstrating the benefits of the approach are presented in Section 5. Conclusions are presented in Section 6.

2. Related Work
Many architectural approaches for reducing energy consumption in embedded processors have been proposed. We are unaware of architectural solutions for SDRAM energy reduction. For a software-based solution see [10], for instance. There is a large body of prior work on prefetching, write buffers, and write combining buffers that is briefly summarized below. Techniques for reducing energy consumption in main memory are also described.

Numerous prefetching algorithms have been proposed [16, 8, 4, 20, 17] based on history-based prediction of future memory addresses. They differ in how they predict which block to fetch, how many blocks to fetch, and what triggers a prefetch. Many of these schemes require a complex prediction mechanism, although the so-called one-block-lookahead schemes do not. For instance, a stream buffer [8] prefetches N consecutive lines triggered by a cache miss. None of the prefetching proposals targets energy reduction.

Write buffers [16, 9] have been used in many processors to avoid waiting for data to be written to memory and to avoid delaying cache miss fetches. Merging was introduced to improve the performance of write buffers for write-through caches.
This approach combines an incoming write request within a cache line with requests already residing in the write buffer, resulting in more efficient memory usage. The technique has been implemented in many architectures: Digital's VAX 8800 [6], StrongARM [14], MIPS R4300i [13], and Intel XScale [3].

Low power mode is present in most state-of-the-art DRAMs. A significant amount of energy can be saved by putting as many memory chips as possible into sleep mode [12]. Efforts have also been made to reduce memory energy consumption using different compression techniques. For instance, [1] describes and evaluates a computer system that supports hardware main memory compression. As the compression ratio of applications dynamically changes, so does the real memory size that is managed by the operating system (OS). OS changes are necessary to support main memory compression. These changes include the management of the free page pool as a function of the physical memory utilization and the effective compression ratio, coupled with zeroing pages at free time rather than at allocation time. Kim et al. [10] identify successive memory accesses to different rows and banks as a source of increased latency and energy consumption in memory. They use block-based layouts instead of a traditional one and determine the best block size such that the number of requests to a different row or bank is minimized.

Figure 2. Energy per memory access (energy in J, split into activate-precharge energy E(ACT-PRE) and write energy E(WR)) for: write one; two times write one; write two; three times write one; write three; four times write one; write four.

¹ It is assumed that there is no post-L1 cache write buffer in a CPU with a write-back cache.

3. SDRAM Energy Components
Let us identify potential sources of energy saving in SDRAM memories used in embedded devices. In order to access data in an SDRAM, a row in a particular bank has to be activated. After activation and a specified delay, a read or a write is performed. When the access is completed, a row precharge operation is performed. Also, if data in a different row has to be accessed, the current row needs to be precharged before the new row is activated. The total energy of an access consists of two main components: the energy consumed by the activate-precharge pair and the read or write access energy.

Micron describes the current profile for the write (or read) operation in such memory [19], as shown in Figure 1² (reproduced from [19]), in Micron's Technical Note. The first large peak in the graph corresponds to the activation command. The middle plateau corresponds to writing four words of data. Finally, a small peak can be noticed for the precharge command.
This shows that the activate-precharge pair constitutes a significant portion of the overall current and thus of the energy consumption.

Figure 2 quantifies the energy components of a 64Mb SDRAM memory [18]. These were obtained using Micron's System Power Calculator [19]. Each bar on the graph shows the activate-precharge and write components of the total energy. The figure shows the total energy for writing 16 bytes (B) of data (one cache line), two separate 16B writes, and two 16B writes combined in one activate-precharge. It also presents data for 3 and 4 accesses, performed separately or combined. The figure shows the activate-precharge pair to be the dominant component: it contributes 65% of the total energy consumption for this SDRAM. The energy savings from combined read or write accesses are 24%, 34% and 38% for two, three and four combined accesses, respectively, compared to the same number of accesses performed in sequence.

² © 2001 Micron Technology, Inc. All Rights Reserved. Used with permission.

4. Proposed Approach
The memory subsystem of a typical embedded system is shown in Figure 3a. The baseline configuration consists of a CPU with a single level of cache and a main memory. The cache is write-back. Memory latency is high in terms of processor cycles, and the processor has to stall for many cycles waiting for the data to arrive from memory.

As discussed above, it would be good to fetch more than one line on each cache miss and thus save energy. This has to be done under the usual constraints of embedded systems: reduced cost, energy, and complexity. Because of that, the prefetching has to be precise, cannot use very complex logic due to potential time and energy overheads, and cannot use a large buffer memory, again due to overheads. Thus, the approach chosen for this work is to use simple one-block-lookahead [?] or stream-buffer-like [?] prefetching.

Combining of multiple line writes is used in this paper with the goal of energy reduction. Thus a different type of combining or coalescing write buffer (WCB) is proposed for writes. The difference is that it should be able to combine any two writes to the same SDRAM row. It also needs to be small and simple, for the same reasons as discussed above for prefetching.

Let us initially discuss write and read combining separately to understand the benefits and requirements of each. Next, the combined approach will be investigated and the best mechanism selected. In all cases, the buffer sizes are studied as part of this research. For reasons that will become clearer after the architecture of each separate buffer is studied, the combined approach will actually use separate buffers as opposed to a single mini-L2 cache in the memory interface. While conceptually the same, the implementations are quite different, with separate buffers having an advantage.

Figure 3. Memory subsystem architectures: (a) CPU, cache, and memory; (b) with a WCB controller; (c) with a read fetch buffer controller; (d) with a controller containing both a WCB and a fetch buffer.

Figure 4. Architecture of the write-combine buffer: each entry holds tag_row, tag_col, a valid bit (v), and data; the incoming address is compared against the entries for a read check and a write-combine check, each producing a hit signal.

4.1 Write Combining
The memory controller for write combining is shown in Figure 3b. Figure 4 shows the write-combining buffer (WCB) architecture for combining 2 write requests. Each entry consists of a split tag, a valid bit, and data storage for one cache line. Tag bits are divided into two groups: bits that determine the row address in memory (tag_row) and the remaining tag bits, which are part of the column address (tag_col). The buffer is expected to be very small, so full associativity is easily implementable.
The address of an incoming cache line write request is checked against all tag_row entries. LRU or pseudo-LRU replacement is used. A hit in the WCB means that the incoming write request can be combined and performed together with a valid WCB entry. Once the memory write is performed, the WCB entry is freed. Notice that the incoming write is not written into the WCB in this case.
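A minimal Python sketch of the two-write WCB follows. It covers the tag_row hit path just described, plus miss handling, eviction of a single-line entry, and the full-line read check used for coherence. Addresses are byte addresses; the 1 KB row size, the class and method names, and the choice to simply overwrite a second write-back of the same line are our assumptions, not details from the paper. Because a combining hit frees its entry, least-recently-used order here reduces to insertion order.

```python
from collections import OrderedDict
from typing import Optional

LINE_BYTES = 16    # cache line size used in the evaluation
ROW_BYTES = 1024   # ASSUMPTION: 1 KB SDRAM row; bank and row bits folded into one id


def row_of(addr: int) -> int:
    """tag_row: the address bits that select the SDRAM row."""
    return addr // ROW_BYTES


def line_of(addr: int) -> int:
    """Full line address (tag_row and tag_col together)."""
    return addr // LINE_BYTES


class WriteCombineBuffer:
    """Hypothetical two-write WCB: each entry holds one line waiting for a
    second write to the same SDRAM row."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.buf = OrderedDict()  # row -> (line, data), oldest entry first
        self.dram_writes = []     # line addresses written per activate-precharge pair

    def write(self, addr: int, data: bytes) -> None:
        row, line = row_of(addr), line_of(addr)
        if row in self.buf:
            old_line, _ = self.buf[row]
            if old_line == line:
                # Same line written back again: keep the newest data (our assumption).
                self.buf[row] = (line, data)
                self.buf.move_to_end(row)
                return
            # tag_row hit: write both lines in one activate-precharge pair.
            # The incoming line never enters the buffer, and the entry is freed.
            del self.buf[row]
            self.dram_writes.append([old_line, line])
            return
        # Miss: if full, evict the oldest entry as a single-line DRAM write,
        # then buffer the incoming line to wait for a partner write.
        if len(self.buf) == self.entries:
            _, (victim_line, _) = self.buf.popitem(last=False)
            self.dram_writes.append([victim_line])
        self.buf[row] = (line, data)

    def read_check(self, addr: int) -> Optional[bytes]:
        """Coherence check on a read miss: match the full line address."""
        entry = self.buf.get(row_of(addr))
        if entry is not None and entry[0] == line_of(addr):
            return entry[1]  # the WCB supplies the line to the CPU cache
        return None


if __name__ == "__main__":
    wcb = WriteCombineBuffer(entries=2)
    wcb.write(0x000, b"a")  # row 0: buffered
    wcb.write(0x010, b"b")  # row 0 again: combined two-line write
    print(wcb.dram_writes)
```

Extending the sketch to (N+1)-way combining would replace the single (line, data) pair per row with up to N sub-entries and a counter, writing the row to DRAM once the counter reaches N.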

A WCB write miss causes the incoming request to be stored in the WCB, where it can potentially be combined with a future write. A write to the WCB may cause a replacement and write-back of a single-block entry from the WCB.

This architecture can be extended to combine more than 2 accesses. To combine N+1 writes, N tag_col sub-tags, N valid bits, and N data blocks are stored with each tag_row entry. The entry contains a counter showing how many writes to this SDRAM row are already present. An incoming write causes a write to memory on a hit if the entry counter has a value of N. On replacement, fewer than N entries may be written to memory, as specified by the counter value.

To summarize, the WCB differs from a traditional write buffer, or even a coalescing write buffer, because it can merge data located anywhere in a given SDRAM row. As a result, it writes data to memory in units of N+1 cache blocks or fewer. The goal is thus to write N+1 entries as often as possible. A traditional write buffer, on the other hand, can only coalesce individual words (or sub-words) within a cache line, and writes data to memory when the memory is not busy.

A major advantage of this new form of write combining is that it cannot incur any energy losses. The total number of writes is the same as in the baseline case, but those writes are potentially grouped in a different way. In addition, the WCB reduces the processor CPI by allowing the CPU to continue execution as soon as data are written to the WCB (as opposed to waiting for the SDRAM write to complete).

The presence of the WCB creates a coherence problem on reads. It is solved as follows: every read address is checked against the full line address (i.e. both tag_row and tag_col bits) of every line in the WCB. A read hit implies that the needed data is in the WCB, and the matched line is sent to the CPU cache. This also reduces miss latency.

4.2 Read Prefetching
The goal of read combining is to perform multi-line DRAM reads.
However, since there is only one read miss at any given time in an embedded system (with an in-order CPU), there is nothing to combine it with. Thus the only way to read-combine is to generate an additional address speculatively via prediction. This is what the sequential prefetching mechanisms mentioned above do. The difference is that our prefetching is aimed at main memory energy reduction. It is possible to prefetch non-adjacent lines within the same row, in a way similar to write combining. This would, however, require a very sophisticated address predictor that would be both large and complex (see [11]). This is why only simple, sequential prefetching is considered here.

The memory controller for read combining is shown in Figure 3c. It fetches N additional cache lines on a read miss. The lines are stored in a fetch buffer (FB): a small, cache-like structure with a tag on each entry. Each cache read miss is checked against the FB. On a hit, the line from the FB is sent to the CPU cache. On an FB miss, the line is read from the DRAM together with N additional lines, which are stored in the FB. The missed line is read first and sent to the CPU cache. All N+1 read accesses are performed in the same activate-precharge cycle. The rest of this paper deals with N = 1, 2, and 3. As will be shown, a small, fully associative FB is sufficient to achieve significant energy reduction. In addition, performance also improves due to the FB's caching effect.

4.3 Read and Write Combining
Write combining and read combining each have their own advantages. They are largely independent of each other and thus can be deployed together for an additive energy reduction as well as performance improvement. The question is what the best architecture is for performing read and write combining at the same time. The architecture advocated by this paper is shown in Fig. 3d. This solution basically integrates the separate fetch buffer (FB) and write-combine buffer (WCB). While a single, cache-like structure could be designed, it would have two major disadvantages.
First, it would likely require that N, the number of cache lines to combine, be the same for reads and writes. As will be shown in this paper, this is not desirable. Second, and more importantly, it would make it more difficult to perform both sequential read combining and within-SDRAM-row write combining, which is very important. In addition, there would be interference, with write lines replaced by read prefetches and vice versa. Also, the split tag is not required for read combining.

Merging the WCB and FB designs is not very difficult, since each continues to operate independently and has its own control. Thus the write-combining operation in the WCB remains the same, and the read combining (prefetching) operation in the FB remains the same. Recall that the WCB is already checked on each read miss and can supply data to the CPU cache.

There is one change required for the merged organization: additional coherency checks have to be performed between writes and prefetches. First, prefetched data can become stale due to an incoming write from the CPU cache and must then be invalidated. Second, there is no point in prefetching lines that are already in the write combining buffer.
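The two coherence rules above, together with the sequential fetch-buffer policy of Section 4.2, can be sketched in Python as follows. This is a hypothetical model, not the authors' controller: addresses are line numbers, the WCB is reduced to the set of lines it currently holds (row combining is omitted), the 1 KB row size is assumed, and a WCB hit is taken to satisfy the miss without any DRAM access. On an FB miss, the missed line and up to N sequential lines from the same row are read in one activate-precharge pair; prefetch candidates already in the WCB are skipped, and a write-back invalidates any prefetched copy.

```python
from collections import OrderedDict

LINE_BYTES = 16
ROW_BYTES = 1024                         # ASSUMPTION: 1 KB SDRAM row
LINES_PER_ROW = ROW_BYTES // LINE_BYTES  # 64 lines per row under that assumption


class CombiningController:
    """Hypothetical sketch of the merged FB + WCB coherence rules."""

    def __init__(self, fb_entries: int, n_prefetch: int) -> None:
        self.fb_entries = fb_entries
        self.n = n_prefetch       # N extra sequential lines per DRAM read
        self.fb = OrderedDict()   # prefetched line -> None, least recently used first
        self.wcb_lines = set()    # lines currently waiting in the WCB
        self.dram_reads = []      # lines read in each activate-precharge pair

    def write_back(self, line: int) -> None:
        # Rule 1: an incoming write makes any prefetched copy stale.
        self.fb.pop(line, None)
        self.wcb_lines.add(line)  # row-combining logic omitted in this sketch

    def read_miss(self, line: int) -> str:
        # The WCB and FB are checked in parallel; a hit occurs in at most one.
        if line in self.wcb_lines:
            return "wcb"
        if line in self.fb:
            self.fb.move_to_end(line)
            return "fb"
        row = line // LINES_PER_ROW
        extra = [p for p in range(line + 1, line + self.n + 1)
                 if p // LINES_PER_ROW == row    # never cross the row boundary
                 and p not in self.wcb_lines]    # Rule 2: skip lines in the WCB
        self.dram_reads.append([line] + extra)   # missed line first, one activation
        for p in extra:
            self.fb[p] = None
            self.fb.move_to_end(p)
            if len(self.fb) > self.fb_entries:
                self.fb.popitem(last=False)      # evict the LRU prefetched line
        return "dram"


if __name__ == "__main__":
    ctl = CombiningController(fb_entries=16, n_prefetch=1)
    print(ctl.read_miss(0), ctl.read_miss(1), ctl.dram_reads)
```

Keeping the WCB check ahead of the DRAM access mirrors the text's point that the WCB is small, so the extra fully associative lookup is cheap.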

Briefly, the solution is two-fold. First, any incoming data cache write request is checked against both the FB and the WCB (in parallel), and a matching FB entry is invalidated. Second, every prefetch address is checked against the WCB first, then sent to the DRAM only if there was no match. The small size of the WCB guarantees that this additional fully associative search has low energy overhead and does not cause a slowdown. The algorithm that the controller implements to keep coherency between the buffers is:

On an outgoing cache request:
    if replacement then
        check write in WCB
        invalidate FB entry if it exists
    else
        check read in WCB and FB in parallel (a hit is possible in only one of the buffers)
        if hit in WCB then
            supply data to the CPU
        end if
        if hit in FB then
            supply data to the CPU
        else
            generate N prefetch addresses
            check for the same row/bank (drop ones that exceed the row/bank boundary); N1 <= N remain
            check for existence in WCB (drop matched); N2 <= N1 remain
            fetch N2 + 1 addresses
        end if
    end if

The only potential drawback of the combined operation is over-utilization of the limited memory bandwidth. A combination of write combining and read prefetching can use up all of the available bandwidth. A read miss may thus be delayed and cause a slowdown. The evaluation of access combining is presented in the next section. It will show that energy and/or performance loss can be avoided in almost all cases, and that when it does happen, it can be minimized by a proper choice of architectural parameters.

Table 1. Memory energy relative to the energy of the data cache (Emem relative to Ecache, in %); benchmarks: d FFT, i FFT, d rijndael, e rijndael, e susan, c jpeg, d jpeg, d blowfish, e blowfish.

5. Evaluation Methodology
The system modeled in this paper consists of an in-order processor and a single, large SDRAM memory chip. A mobile phone is one example of such a system. The processor is a single-issue, 32b embedded processor resembling Intel's XScale. It has 8KB, 4-way set-associative instruction and data caches with a 16-byte line and a 2-cycle latency.
The data cache implements a write-allocate, write-back policy. The CPU operating frequency is 400MHz. The CPU-memory bus is a 100MHz, 32b bus. The baseline cache miss latency is 36 processor cycles for the first word to arrive, and an additional 4 processor cycles for each consecutive word. The main memory with the modified controller has a latency of 40 processor cycles for the delivery of the first word, and 4 cycles for each additional consecutive word. Both the baseline and the modified architectures use the same SDRAM (see data sheet [18]). The extra 4 cycles (10ns) in the access time to the modified memory are due to FB and WCB access delays. The SDRAM clock rate is 100MHz (speed grade -6).

The evaluation is performed using the SimpleScalar 3.0 simulator [5] executing PISA binaries. SimpleScalar's bus and memory model are modified to match this architecture. Both the FB and the WCB are fully associative, with 16-byte lines and a latency of 12 processor cycles. The WCB can be configured to store N lines per entry. The FB and WCB sizes were limited to avoid overhead and reduce cost. As a result, the FB consumes 0.75% and the WCB consumes 4% of the data cache energy when both buffers are at full capacity, assuming a 0.18 micron process technology.

Dynamic energy consumption of the cache, WCB, and FB is modeled using a modified CACTI 3.2 [15] for 0.18 micron technology. One of the main changes is in the sense amplifier energy model, which was overestimated in the original model. Main memory energy is modeled using Micron's System Power Calculator [19]. Benchmarks from MiBench [7] are used in this study. All benchmarks are simulated using large input sets.

Figure 5. Memory energy reduction for read combining for different buffer sizes (E improvement [%]; Fetch 2 with Fetch2_16, Fetch2_32, Fetch2_64, Fetch2_128).

Figure 6. Memory ED product reduction for read combining for different buffer sizes (ED improvement [%]; Fetch2_16, Fetch2_32, Fetch2_64, Fetch2_128).

5.1 Results
The impact of the proposed architecture is evaluated by comparing the memory energy consumption, energy-delay product, and CPI relative to the baseline configuration.
The following legend is used:
• FetchN_M for read fetch of N lines with a buffer of M lines (N-1 lines are prefetched);
• WCB_P for write combining of 2 accesses with a buffer of P lines;
• WCB_PxQ for write combining of (Q+1) accesses with a buffer of P entries x Q lines;
• FetchN_M+WCB_PxQ for a hybrid configuration.

Figure 7. CPI improvement for read combining for different buffer sizes (CPI reduction [%]; Fetch2_16, Fetch2_32, Fetch2_64, Fetch2_128).

Figure 8. Memory energy reduction for read combining for different fetch and buffer sizes (E improvement [%]; Fetch = 2, 3, 4 with buffers of 32, 64 and 128 entries: Fetch2_32 through Fetch4_128).

Table 1 shows memory energy per benchmark relative to the data cache energy consumption for the baseline model, with an average cache miss rate of 3.5%. Montanaro et al. [14] showed the data cache consuming 16% of the overall processor energy. For MiBench benchmarks the main memory consumes, on average, 2.6 times the energy of the data cache. The worst-case difference is 15x.

Read Prefetch: the Effect of Fetch and Buffer Size
First, let us evaluate read combining and its effect on memory energy consumption. Figure 5 shows the energy reduction for different fetch buffer sizes relative to the baseline configuration. Buffer sizes of 16, 32, 64 and 128 entries are used, fetching two 16B blocks. The average memory energy savings are 12% to 17%. The smallest buffer already obtains a significant reduction, with each doubling of the size producing a small (1% to 2%) increase. Two benchmarks, d rijndael and e rijndael, show a noticeable increase in memory energy consumption with a 16-entry FB. With 32 entries there are basically no energy increases, making it a good choice.

The energy-delay (ED) product is shown in Figure 6. It is reduced by as much as 68%, and by 38% on average, with buffer size having almost no impact. Both benchmarks with an energy increase still obtain significant ED product savings. The energy-delay reduction is large due to an improved average memory latency. The effect of the latency reduction can be seen in the CPI improvement shown in Figure 7. CPI is also insensitive to the buffer size. The read combining technique reduces CPI by as much as 59%, and by 27% on average.
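Why can CPI improve even though every DRAM access through the modified controller costs 4 extra cycles? A rough Python calculation helps. It uses the timing parameters of Section 5 (36 cycles to the first word in the baseline, 40 with the modified controller, i.e. 36 plus the extra 4 cycles of FB/WCB access delay, then 4 cycles per further word, with 16-byte lines on a 32b bus). It also assumes, which the text does not state, that a miss completes when the last word arrives, that an FB hit delivers the whole line in the 12-cycle buffer latency, and that bus contention and WCB hits can be ignored. Under those assumptions the extra delay is recovered once about 10% of misses hit in the FB, since (52 - 48) / (52 - 12) = 0.1.

```python
# Back-of-envelope cache-miss latency using the Section 5 timing parameters.
# ASSUMPTIONS: a miss completes when the last word of the 16-byte line arrives,
# an FB hit returns the whole line in the 12-cycle buffer latency, and bus
# contention and WCB hits are ignored.

WORD_BYTES = 4                     # 32-bit bus
WORDS_PER_LINE = 16 // WORD_BYTES  # 16-byte lines -> 4 words


def line_fill_cycles(first_word: int, per_word: int = 4) -> int:
    """Processor cycles until the last word of a line arrives."""
    return first_word + (WORDS_PER_LINE - 1) * per_word


BASELINE_MISS = line_fill_cycles(36)  # 36 + 3 * 4 = 48 cycles
DRAM_MISS = line_fill_cycles(40)      # 52 cycles: +4 cycles of FB/WCB lookup
FB_HIT = 12                           # assumed cost of a fetch-buffer hit


def average_miss_cycles(fb_hit_rate: float) -> float:
    """Average miss latency with the modified controller."""
    return fb_hit_rate * FB_HIT + (1.0 - fb_hit_rate) * DRAM_MISS


def break_even_hit_rate() -> float:
    """FB hit rate at which the modified controller matches the baseline."""
    return (DRAM_MISS - BASELINE_MISS) / (DRAM_MISS - FB_HIT)


if __name__ == "__main__":
    print("break-even FB hit rate:", break_even_hit_rate())
    for h in (0.0, 0.25, 0.5):
        print(f"FB hit rate {h:.2f}: {average_miss_cycles(h):.1f} cycles "
              f"(baseline {BASELINE_MISS})")
```

The model deliberately leaves out the write-buffering effect of the WCB, so it only speaks to the read-combining CPI results.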

Figure 9. Memory energy-delay product for read combining for different fetch and buffer sizes (ED improvement [%]; Fetch = 2, 3, 4; Fetch2_32 through Fetch4_128).

Figure 10. CPI reduction for read combining for different fetch and buffer sizes (CPI reduction [%]; Fetch = 2, 3, 4; Fetch2_32 through Fetch4_128).

If we consider energy as the main factor, the smallest buffer that provides savings with no overhead is the one with 32 entries. On the other hand, if we consider the ED product, the 16-entry buffer brings the same savings as the largest buffer in the majority of cases. The only two benchmarks with a significant difference in ED product savings (15%) are d rijndael and e rijndael. Therefore, for the combined technique, we will explore read-combine buffers of 16 and 32 entries.

Let us now explore the use of different fetch sizes. Figure 8 shows the energy reduction relative to the baseline configuration. We have considered fetching 2, 3 and 4 lines, with buffer sizes of 32, 64 and 128 entries. A fetch size larger than 2 is not beneficial. Generally, a larger fetch size increases energy consumption because many prefetched lines are not used. Fetch size 3 reduces energy consumption in some cases, but not as much as fetch size 2, while fetch size 4 always has too much overhead. On average, a fetch size of 2 saves 13% to 16% of energy, fetching 3 lines saves from -1% to 8%, and fetching 4 lines increases energy consumption by 16% to 37%.

Figures 9 and 10 present ED product and CPI savings. For ED product savings, fetch size 2 uniformly outperforms the other fetch sizes. The only exception is for fetch 3, where the difference is just 1%. The largest saving is 68%, and the average improvement ranges from 5% to 39%. CPI saving is not significantly affected by any parameter change except for , , , and , where the difference is less than 5%.
In the best case, a 65% improvement is obtained, and on average the improvement ranges from 27% to 29%.

Write Combining: the Effect of Combining and Buffer Size
Figure 11 shows the relative memory energy for write combining. Buffer sizes of 2, 4, and 8 entries are used, with 2-, 3-, and 4-line combining. The buffer configurations are chosen to have approximately the same size in all cases. On average, the improvement ranges from 8% to 11%, which is smaller than for read combining. Buffer size has little impact (the left three bars), but additional energy savings are obtained when combining 3 or 4 lines. Write combining achieves up to an 80% reduction of the ED product (see Figure 12), with a 40% average. CPI savings (Figure 13) are also not affected by size or configuration. Write combining achieves up to a 76% CPI improvement, with a 33% average.

Figure 11. Memory energy reduction for different write-combining and buffer sizes (E improvement [%]; WCB_2, WCB_4, WCB_8, WCB_4x2, WCB_2x3, i.e. combining 2, 3 and 4 lines).

Figure 12. Memory ED product reduction for different write-combining and buffer sizes (ED improvement [%]; WCB_2, WCB_4, WCB_8, WCB_4x2, WCB_2x3).

Hybrid Configurations
Figure 14 shows the effect of combining both writes and reads. A fetch buffer with 16 entries is used with write combining buffers of size 8, configured to combine either 2, 3 or 4 writes. The results show that combining 3 lines is the best configuration. On average, 21.5% to 23.5% energy savings are obtained. As seen in Figures 15 and 16, the difference in ED product and CPI savings between configurations is at most 2%. The ED product is reduced by 71% in the best case and by 44% on average. CPI is improved by up to 56%, with a 26% average.

Figure 17 shows the effect of combining both writes and reads when a fetch buffer with 32 entries is used with write combining buffers of size 8, configured to combine either 2, 3 or 4 writes. Again, combining 3 lines gives the best configuration. On average, 22% to 24% energy savings are obtained. The difference in ED product and CPI savings between configurations is negligible (2%), as seen in Figures 18 and 19, respectively. The ED product and CPI reductions are the same as for the configuration with the 16-entry read-combining buffer.

6. Conclusions
This research developed a technique for reducing the energy consumption of SDRAM memory access in embedded systems.
We itroduced architectural additios to the memory cotroller of a fully parameterizable uit that cosists of a small high speed fetch buffer ad a write-combie buffer. This allowed read prefetchig ad combied write access to the mai memory. Sice prefetched data resides i a fast ad small cache-like memory, a access to it is sigificatly cheaper, both i terms of time ad eergy cosumptio. Combiig write accesses leads to gais without ay pealty. The techique was evaluated usig the SimpleScalar simulator of a Xscale-like embedded processor. The results demostrate that a sigificat reductio i memory eergy cosumptio ad delays ca be achieved by read prefetchig ad write combiig. Eve with small size buffers, 256B/512B for prefetchig ad 128B for write combig, a

11 Write combie cofiguratios: combie 2, 3, 4 8 WCB_2 WCB_4 WCB_8 WCB_4x2 WCB_2x CPI reductio [%] Figure 13. CPI improvemet for differet write-combiig ad buffer sizes Fetch 2 (16) & WCB cofiguratios (2,3,4) Fetch2_16+WCB_8 Fetch2_16+WCb_4x2 Fetch2_16+WCB_2x3 3 E improvemet [%] 2 - Figure 14. Memory eergy reductio for differet combied cofiguratios, for 16 etry FB average 23% eergy reductio is achieved. The eergy-delay product is improved, o average, by over 4%. The CPI is reduced by 26%, o average. Prefetchig or write combiig ca be powered dow idividually to better tue them to a give applicatio. The proposed approach requires simple hardware suitable to embedded systems. I a resource costraied eviromet of embedded systems ruig multimedia applicatios these eergy savigs provide a sigificat beefit. Refereces [1] B. Abali ad H. Frake. Operatig system support for fast hardware compressio of mai memory cotets. I Memory Wall Workshop, the 27th A. It. Sym. O Computer Architecture, 2. [2] K. Barr ad K. Asaovic. Eergy aware lossless data compressio. I Proceedigs of the First Iteratioal Coferece o Mobile Systems, Applicatios, ad Services (MobiSys 23), Sa Fracisco, CA, 23. [3] L. T. Clark ad et al. A embedded 32b microprocessor core for low-power ad high-performace applicatios. IEEE JSSC, 36(11): , Nov. 21. [4] F. Dahlgre, M. Dubois, ad P. Stestrom. Fixed ad adaptive sequetial prefetchig i red-memory multiprocessors. I I Proceedigs of the 1993 Iteratioal Coferece o Parallel Processig,, pages 56 63, [5] D.Burger ad T. Austi. The simplescalar tool set, versio 2.. Techical report, Techical Report TR , Uiversity of Wiscosi-Madiso, [6] J. Fu, J. Keller, ad K. Haduch. Aspects of the vax 88 c box desig. Digital Techical Joural, Number 4, February [7] M. R. Guthaus, J. S. Rigeberg, D. Erst, T. M. Austi, T. Mudge, ad R. B. Brow. Mibech: A free, commercially represetative embedded bechmark suite. I IEEE 4th Aual Workshop o Workload Characterizatio, pages 83 94, 21. [8] N. P. Jouppi. 
Improvig direct-mapped cache performace by the additio of a small fully-associative cache ad prefetch buffers. I Proceedigs of the 17th aual iteratioal symposium o Computer Architecture, pages ACM Press, 199. [9] R. Kessler, E. McLella, ad D. Webb. The alpha microprocessor architecture. I ACM SIGPLAN Notices, [] H. S. Kim, N. Vijaykrisha, M. Kademir, E. Brockmeyer, F. Catthoor, ad M. J. Irwi. Estimatig ifluece of data layout optimizatios o sdram eergy cosumptio. I Proceedigs of the 23 iteratioal symposium o Low power electroics ad desig, pages ACM Press, 23. [11] S. Kumar ad C. Wilkerso. Exploitig spatial locality i data cache usig spatial footprit. I Iteratioal symposium o Computer Architecture,

12 Fetch 2 (32) & WCB cofiguratios (2,3,4) 8 Fetch2_32+WCB_8 Fetch2_32+WC_4x2 Fetch_2_32+WC_2x3 7 ED improvemet [%] Figure 15. Memory ED product reductio for differet combied cofiguratios, for 16 etry FB CPI reductio [%] Fetch 2 (16) & WCB cofiguratios (2,3,4) Fetch2_16+WC_8 Fetch2_16+WC_4x2 Fetch2_16+WC_2x3 Figure 16. CPI improvemet for differet combied cofiguratios, for 16 etry FB [12] A. R. Lebeck, X. Fa, H. Zeg, ad C. Ellis. Power aware page allocatio. I I Proceedigs of the 9th Iteratioal Coferece o Architectural Support for Programmig Laguages ad Operatig Systems (ASPLOS IX), November 2, 2. [13] MIPS Techologies, Ic.: R Series Documets [14] J. Motagaro ad et al. A 16 mhz, 32 b,.5 w cmos risc microprocessor. IEEE JSSC, 31(11): , Nov [15] P.Shivakumar ad N. Jouppi. Cacti 3.: A itegrated cache timig, power, ad area model. Techical report, Digital Equipmet Corporatio, COMPAQ Wester Research Lab, 199. [16] A. J. Smith. Cache memories. ACM Comput. Surv., 14(3):473 53, [17] Y. Solihi, J. Torrellas, ad J. Lee. Usig a user-level memory thread for correlatio prefetchig. I I Proceedigs of 29th Aual Iteratioal Symposium o Computer Architecture, May 22., 22. [18] The Micro: Sychroous DRAM 64Mb x32 Part umber: MT48LC2M32B2. [19] The Micro System-Power Calculator [2] A. Veidebaum, W. Tag, R. Gupta, A. Nicolau, ad X. Ji. Adaptig cache lie size to applicatio behavior. I It l Cof. Supercomputig,

[Figure 17. Memory energy reduction for different combined configurations, for 32-entry FB. Y-axis: energy improvement (%).]
[Figure 18. Memory ED product reduction for different combined configurations, for 32-entry FB. Y-axis: ED improvement (%).]
[Figure 19. CPI improvement for different combined configurations, for 32-entry FB. Y-axis: CPI reduction (%).]
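Figures 15 to 19 report percentage reductions in memory energy, in the memory energy-delay (ED) product, and in CPI. As a minimal sketch of how these metrics relate, the Python below assumes that ED is energy multiplied by delay (taken here as execution time) and that every reduction is measured against the same baseline run. The helper names (`reduction_pct`, `ed_product`, `combined_ed_reduction`) are hypothetical, and the numbers are illustrative values, not data read off the figures.

```python
# Sketch: relating energy, delay, and energy-delay (ED) product reductions.
# Assumptions (not from the paper): ED = energy * delay, and all reductions
# are percentages relative to one common baseline run.


def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def ed_product(energy: float, delay: float) -> float:
    """Energy-delay product (for example, joules times seconds)."""
    return energy * delay


def combined_ed_reduction(energy_red_pct: float, delay_red_pct: float) -> float:
    """ED reduction implied by separate energy and delay reductions.

    ED_new / ED_base = (1 - e) * (1 - d), so the two savings compound.
    """
    e = energy_red_pct / 100.0
    d = delay_red_pct / 100.0
    return 100.0 * (1.0 - (1.0 - e) * (1.0 - d))


# Hypothetical normalized values (baseline = 1.0), NOT taken from the figures.
base_e, base_d = 1.0, 1.0
new_e, new_d = 0.80, 0.75

e_red = reduction_pct(base_e, new_e)
d_red = reduction_pct(base_d, new_d)
ed_red = reduction_pct(ed_product(base_e, base_d), ed_product(new_e, new_d))

print(f"energy: {e_red:.1f}%  delay: {d_red:.1f}%  ED: {ed_red:.1f}%")
# prints: energy: 20.0%  delay: 25.0%  ED: 40.0%
print(f"closed form: {combined_ed_reduction(e_red, d_red):.1f}%")
# prints: closed form: 40.0%
```

Because the two savings multiply, the ED reduction exceeds either component on its own. Reading a CPI reduction as a delay reduction assumes an unchanged instruction count and clock frequency, and averaging per-benchmark reductions does not in general commute with this formula, so averaged results need not combine exactly.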


More information

Automatic Generation of Polynomial-Basis Multipliers in GF (2 n ) using Recursive VHDL

Automatic Generation of Polynomial-Basis Multipliers in GF (2 n ) using Recursive VHDL Automatic Geeratio of Polyomial-Basis Multipliers i GF (2 ) usig Recursive VHDL J. Nelso, G. Lai, A. Teca Abstract Multiplicatio i GF (2 ) is very commoly used i the fields of cryptography ad error correctig

More information

ALU Augmentation for MPEG-4 Repetitive Padding

ALU Augmentation for MPEG-4 Repetitive Padding ALU Augmetatio for MPEG-4 Repetitive Paddig Georgi Kuzmaov Stamatis Vassiliadis Computer Egieerig Lab, Electrical Egieerig Departmet, Faculty of formatio Techology ad Systems, Delft Uiversity of Techology,

More information

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems

Second-Order Domain Decomposition Method for Three-Dimensional Hyperbolic Problems Iteratioal Mathematical Forum, Vol. 8, 013, o. 7, 311-317 Secod-Order Domai Decompositio Method for Three-Dimesioal Hyperbolic Problems Youbae Ju Departmet of Applied Mathematics Kumoh Natioal Istitute

More information

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS

FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS SIAM J. SCI. COMPUT. Vol. 22, No. 6, pp. 2113 2134 c 21 Society for Idustrial ad Applied Mathematics FAST BIT-REVERSALS ON UNIPROCESSORS AND SHARED-MEMORY MULTIPROCESSORS ZHAO ZHANG AND XIAODONG ZHANG

More information

Isn t It Time You Got Faster, Quicker?

Isn t It Time You Got Faster, Quicker? Is t It Time You Got Faster, Quicker? AltiVec Techology At-a-Glace OVERVIEW Motorola s advaced AltiVec techology is desiged to eable host processors compatible with the PowerPC istructio-set architecture

More information

Chapter 3 Classification of FFT Processor Algorithms

Chapter 3 Classification of FFT Processor Algorithms Chapter Classificatio of FFT Processor Algorithms The computatioal complexity of the Discrete Fourier trasform (DFT) is very high. It requires () 2 complex multiplicatios ad () complex additios [5]. As

More information

condition w i B i S maximum u i

condition w i B i S maximum u i ecture 10 Dyamic Programmig 10.1 Kapsack Problem November 1, 2004 ecturer: Kamal Jai Notes: Tobias Holgers We are give a set of items U = {a 1, a 2,..., a }. Each item has a weight w i Z + ad a utility

More information