Direct-Mapped Caches


A Case for Direct-Mapped Caches
Mark D. Hill, University of Wisconsin
Computer, December 1988

Despite having worse miss ratios, large direct-mapped caches often handle processor references faster than more-expensive set-associative caches.

A cache is a small, fast buffer in which a system keeps those parts of the contents of a larger, slower memory that are likely to be used soon. The purpose of a cache is to improve system cost-performance by providing the capacity of the large, slow memory with an average access time close to that of the small, fast cache. This is possible only if most memory references can be serviced rapidly by the cache without the intervention of the slower memory.

Usually caches are successful due to temporal and spatial locality, two properties of most real reference streams. Temporal locality means future references are likely to be made to the same locations as recent references, while spatial locality suggests that future references are also likely to be made to locations near recent references. Caches take advantage of temporal locality by retaining recently referenced information, while they exploit spatial locality by loading and retaining blocks of information surrounding recent references.

A CPU cache is a cache of main memory. Like caches in general, CPU caches are faster and smaller than the memory they buffer. They are usually five to 20 times faster and 50 to 1,000 times smaller than main memory. Because CPU caches must be extremely fast, they are managed entirely by hardware, and for this reason, CPU-cache access and management policies must be relatively simple.

CPU caches have been studied extensively because they have proven effective at increasing system performance, lowering system cost, or both. CPU caches continue to be worth studying because their importance to system cost-performance is increasing and technological improvements are altering their characteristics. (Since this article examines only CPU caches, the term cache is often used instead of CPU cache.)

The important cache design parameter examined here is associativity, which is also called degree of associativity or set size. The associativity of a cache is the number of block frames in which a given block may reside. Reducing associativity allows fewer block frames to be searched on a reference, a potential implementation advantage. However, this further constrains which blocks can be simultaneously resident, a potential performance disadvantage.

The terms fully-associative, set-associative, and direct-mapped express the relationship between a cache's associativity and capacity. A cache of c block frames is called fully associative if a block can reside in any block frame (associativity c), n-way set-associative if a block can reside only in one of n block frames where 1 < n < c (associativity n), and direct-mapped if a block can reside in only one block frame (associativity 1). Figure 1 illustrates set-associative mapping.

It is worthwhile studying associativity because technological trends toward large, fast static RAMs are facilitating larger cache sizes and architectural trends toward reduced instruction set computers (RISCs) are requiring faster hit times. The trend to larger caches is illustrated by the VAX-11 family. The recently introduced VAX 8800 uses a 64-Kbyte direct-mapped cache, while older VAX-11 implementations like the VAX-11/780 and VAX-11/785 use set-associative caches of 8 and 16 Kbytes. RISCs accentuate the need for caches with fast hit times by having simple pipelines that facilitate shorter cycle times, and by referencing memory so frequently (once per cycle) that CPU cycle times are often determined by cache hit times. Commercial RISC processors have been introduced by AMD, Hewlett-Packard, IBM, Intel, MIPS, Motorola, Sun, and others. This article will show that trends toward larger cache sizes and faster hit times favor direct-mapped caches.

The arguments in the main body of this article are restricted to single-level caches in uniprocessors. A single-level cache services processor references and obtains data for misses directly from main memory. Most past and present computers use single-level caches, as will many future computers. I expect some future computers, however, to use two-level (or more) cache hierarchies, where a level-one cache services processor references and obtains data for misses from a level-two cache, which in turn services level-one-cache misses and obtains data for its misses from memory. Later in the article, I discuss how my arguments regarding single-level caches extend to two-level cache hierarchies.

I restrict my arguments to uniprocessors for two reasons. First, uniprocessors are and will continue to be important, especially for computers less costly than mainframes, such as engineering workstations. Second, a thorough analysis of caches in multiprocessors requires coverage of many degrees of freedom, which would dilute the thrust of this article. These include interconnection topology, whether control is single-instruction-multiple-data or multiple-instruction-multiple-data, synchronization and cache coherency mechanisms, and number of processors (hence granularity of sharing). I will, however, discuss how and when these arguments for caches in uniprocessors apply to caches in multiprocessors.

Performance metrics

To examine cache performance, I use miss ratio and an extended model of effective access time.

Miss ratio. Miss ratio is the most commonly used cache performance metric. The miss ratio for a cache C is

$$m(C) = \frac{\text{number of misses with cache } C}{\text{number of processor references}}$$

I use m(C), rather than m, to emphasize that the miss ratio is a function of a cache organization. "C" represents all attributes of cache C. Miss ratio is used because it is easy to define, interpret, and compute, and perhaps most important, because it is implementation independent. This independence facilitates cache performance comparisons between caches not yet implemented and those implemented with different technologies and in different kinds of systems.

Unfortunately, some comparisons of dissimilar caches can lead to misleading results. A miss ratio comparison, for example, between the Cray-1 instruction buffers and the Motorola on-chip instruction cache is meaningless because the technologies and workloads have little in common. Since miss-ratio comparisons contrast the number of misses, they can also be misleading if the penalty for a miss varies. For instance, increasing cache block size often reduces the number of misses and hence the miss ratio, but it often also increases the number of cycles needed to load a block.

Figure 1. Set-associative mapping. A set-associative cache uses a set-mapping function f to partition all main-memory blocks into equivalence classes. Some cache block frames are assigned to hold recently referenced blocks from each class. Each group of block frames is called a set. The number of groups, called the number of sets (s), equals the number of classes. The number of block frames in each set is called the associativity (degree of associativity, set size, n). The number of block frames in the cache (c) always equals the associativity times the number of sets (c = n*s). A cache is fully-associative if it contains only one set (n = c, s = 1), is direct-mapped if each set contains one block frame (n = 1, s = c), and is n-way set-associative otherwise (where n is the associativity, s = c/n). On a reference to block x, set-mapping function f feeds the set decoder with f(x) to select one set (one row); each block frame in that set is searched until x is found (a cache hit) or the set is exhausted (a cache miss). On a cache miss, one block in set f(x) is replaced with the block x obtained from memory. Finally, the word requested from block x is returned to the processor. For conceptual simplicity, the figure shows the word selected last (in the box labeled "Compare block number with tags and select data word"). To reduce the number of bits that must be read, many implementations select the word while selecting the set. The most commonly used set-mapping function is the block number modulo the number of sets, where the number of sets is a power of two. This function is called bit selection since it equals several low-order bits of the block number. For 256 sets, for example, f(x) = x mod 256 or f(x) = x AND 0xff, where mod is remainder and AND is bitwise-and.
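The bit-selection mapping in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `num_sets` and `set_index` are hypothetical, and the number of sets is assumed to be a power of two, as the caption requires for bit selection.

```python
def num_sets(c, n):
    """Number of sets s for a cache of c block frames and associativity n (c = n*s)."""
    if n <= 0 or c % n != 0:
        raise ValueError("associativity must divide the number of block frames")
    return c // n


def set_index(block_number, sets):
    """Bit selection: f(x) = x mod sets, computed by masking low-order bits."""
    if sets <= 0 or sets & (sets - 1):
        raise ValueError("bit selection assumes a power-of-two number of sets")
    return block_number & (sets - 1)


# The caption's example of 256 sets: x mod 256 equals x AND 0xff.
example = set_index(0x1234, 256)
```

With 256 sets the mask is 0xff, so `example` holds the low-order eight bits of the block number. A direct-mapped cache is the case `num_sets(c, 1) == c`, and a fully-associative cache is the case `num_sets(c, c) == 1`.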
The actual change in cache performance will depend on how much the number of misses decreases and how much the time to service a miss increases. The penalty for a miss can also vary because of delays indirectly affected by changes in miss ratio, such as memory contention in a multiprocessor.

Effective access time. Another commonly used cache performance metric is effective access time, $t_{eff}(C)$ (average access time). Effective access time is the average latency, as seen by the processor, required by the memory system to service a memory reference. In this article, I model it as

$$t_{eff}(C) = t_{cache}(C) + m(C) \cdot t_{memory}(C)$$

where $m(C)$, $t_{cache}(C)$, and $t_{memory}(C)$ are the miss ratio, cache hit time, and average miss penalty (delay beyond a cache access to access memory) for cache C. Strictly speaking, cache hit time should be called cache access time, since this delay occurs on all accesses, not just hits. I choose not to use cache access time, because it is too easily confused with effective access time.

Using effective access time rather than miss ratio allows caches with different hit and miss times to be more accurately compared. One can, for example, determine whether increasing cache block size improves performance as well as miss ratio. The disadvantage, however, is that implementation details must be examined and assumptions must be made for the values of $t_{cache}(C)$ and $t_{memory}(C)$. Performance estimates with any implementation assumptions are less general, and those with incorrect assumptions are misleading.

Unlike many other cache memory analyses, my analysis does not assume that $t_{cache}(C)$ is the same for all caches studied. The disadvantage of including changes in $t_{cache}(C)$ is that more implementation-dependent parameters must be estimated, further limiting the generality of results. However, variability in cache hit time must be considered, since ignoring it can lead to incorrect conclusions when comparing large caches of varying associativities. On the other hand, like many other cache memory analyses, mine assumes that cache changes do not affect the average miss penalty, $t_{memory}(C)$. This assumption simplifies analysis, but it can bias results for multiprocessors where delays due to contention or updating memory are large and variable.

This article does not evaluate caches with system performance metrics like benchmark execution time or effective number of processors, since these metrics require many system-dependent assumptions that limit their usefulness to comparing similar alternative caches within the context of an existing system. Furthermore, system metrics rarely produce conclusions that generalize to cache designs in other systems, because of the difficulty of isolating cache effects from other system effects.

Implementing caches

This section examines the implementation of direct-mapped and set-associative caches. I concentrate on direct-mapped cache hit (access) logic and set-associativity logic, because the delay through this logic determines cache hit time and directly affects effective access time. Set-associativity logic is the additional logic required by a set-associative cache over a direct-mapped cache.

Figure 2. A direct-mapped cache. The access logic (hit, not miss logic) for a direct-mapped cache using bit selection to select the set (block frame) of the reference has three components. The first component, the data memory, holds all cached data and instructions. The second component, tag memory, holds the state bits and address tag associated with a cached block. The last component, the match logic, produces a single bit indicating whether the referenced block is present. (Field widths shown in the figure: b = $\log_2$(number of words per block frame) bits; i = $\log_2$(number of block frames) bits; t = 32 - i - b - 2 bits.)

For this discussion, I assume a generic memory system with a single four-gigabyte address space of aligned four-byte words addressed with 32-bit byte addresses.
I also assume address translation is done in a way that does not affect the cache hit time.

Direct-mapped cache. A direct-mapped cache is simpler to build than a set-associative cache because the cache location of a referenced word is a function of the address of a reference only and the replacement algorithm is trivial. The address of a reference to a direct-mapped cache using bit selection is divided into several fields. From least-significant to most-significant, they are (1) two bits that are ignored, assuming a byte address and aligned word references; (2) b = $\log_2$(number of words per block frame) bits of the block (offset); (3) i = $\log_2$(number of block frames) bits of the index; and (4) t = 32 - i - b - 2 bits of the address tag.

Direct-mapped access logic, illustrated in Figure 2, has three components: data memory, tag memory, and match logic. Data memory holds all the cached data and instructions. Its size is, by definition, the cache size. Conceptually, it can be organized as if it were one word wide and accessed with an address formed by concatenating the index (i bits) and block (b bits) fields of the address. If it is implemented as a wider memory, some or all of the bits in the block field will be used to select a word after the data memory access. A block-wide data memory is often preferred when unaligned memory references are permitted.

The second component, the tag memory, which holds the state bits (s bits) and address tag (t bits) associated with a cached block, has one entry per block frame and is addressed by the index field. The state bits for a block, usually one or two bits, indicate the block's status regarding memory update or in a cache coherency protocol. Cache hit logic is only concerned with whether a block is valid.

The last component, the match logic, produces a single bit indicating whether the referenced block is present. This bit is asserted only if the tag read from the tag memory is equal to the tag field of the address and the state read from the tag memory is valid.

A direct-mapped cache lookup requires two parallel actions. One action, called read-data, consists of accessing the data memory and passing the word read to DataOut. The second action, called match-found, requires two steps: first, accessing the tag memory to read the state and address tag for a block frame; second, asserting MatchOut if the state is valid and the tag matches the reference's tag. Thus, a direct-mapped cache lookup is simpler than a set-associative lookup (described below) because actions read-data and match-found can proceed independently. In set-associative caches, the results of match-found influence the data selected.

Figure 3. A set-associative cache. Cache access (hit) logic for an n-way set-associative cache of c blocks consists of n banks and the logic to combine bank results. Each bank can be thought of as a direct-mapped cache of c/n blocks and can be implemented using the logic in the dashed box of Figure 2.

Set-associative cache.
An n-way set-associative cache (n = 2, 4, 8, or 16) is a commonly used cache organization. An n-way set-associative cache allows any one of the n blocks in a reference's set to be replaced on a miss. While this flexibility usually yields lower miss ratios, it requires checking n blocks on each reference. To keep a set-associative cache hit time similar to that of a direct-mapped cache, each of the n tags in a set must be read and compared to the tag of the reference in parallel. This associative lookup and comparison adds significant cost, as measured in chip count and board area.

Figure 3 shows the basic structure of an n-way set-associative cache. Each bank has the same structure as an n-times-smaller direct-mapped cache (see Figure 2). Thus, the index field for each bank requires i = $\log_2$(number of block frames/n) bits, making the tag field, t, $\log_2 n$ bits larger than for a direct-mapped cache of the same size. In addition, some logic, called the set-associativity logic, is needed to select the result from one of the n banks.

On a reference, the address is passed to all the direct-mapped banks. In parallel, each bank selects a block, sends 32 bits of data to Data[i], and computes Match[i], which is asserted on valid tag matches. The set of a reference consists of the n blocks selected by the n banks.

After the n direct-mapped banks compute the Match[i]'s and Data[i]'s, the set-associativity logic, shown in the dashed box in Figure 3, produces a single MatchOut signal and DataOut word. MatchOut, asserted on a cache hit, is the logical OR of the n Match[i] signals. DataOut, the data to be returned, must be driven with the Data[i] of the bank that matched and can be any value if none matched.

One way to implement set-associativity logic is illustrated in Figure 3. Here, MatchOut is computed with a single n-input OR gate and DataOut with a 32-bit-wide n-to-1 multiplexer. The multiplexer Select input is driven with the number of the bank that matched and can be any value if none matched. Select can be computed with an n-bit encoder or with a single level of $\log_2 n$ OR gates, each with n/2 inputs.

Figure 4. An alternative set-associative cache. This figure shows cache hit logic for an n-way set-associative cache with a different set-associativity logic implementation from that of Figure 3. First, it uses wired-OR logic instead of an OR gate to compute MatchOut. Second, the 32-bit-wide n-to-1 multiplexer and select logic have been replaced with n 32-bit-wide tri-state buffers.

Alternate ways of computing MatchOut and DataOut are illustrated in Figure 4. MatchOut is computed by wire-ORing all Match[i]'s together, as is possible using open collector (oc) gates in TTL or any ECL gates. This approach requires computing two copies of each Match[i] so that the wire-ORing does not affect which data is selected. This duplication does not cause additional delay if the final AND-gate in the bank match logic (not shown) is duplicated.
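Whichever circuit is used, the logical behavior of the banks and the set-associativity logic can be modeled in software. The sketch below is a functional model only, not a description of any particular hardware: the names `split_address`, `Bank`, and `cache_lookup` are hypothetical, sizes are assumed to be powers of two, and addresses follow the generic 32-bit byte-addressed system with aligned four-byte words assumed earlier.

```python
def split_address(addr, words_per_block, frames_per_bank):
    """Split a 32-bit byte address into (tag, index, block offset) fields."""
    b = words_per_block.bit_length() - 1   # log2(words per block frame)
    i = frames_per_bank.bit_length() - 1   # log2(block frames per bank)
    word = addr >> 2                       # the two low-order bits are ignored
    offset = word & (words_per_block - 1)
    index = (word >> b) & (frames_per_bank - 1)
    tag = word >> (b + i)                  # remaining t = 32 - i - b - 2 bits
    return tag, index, offset


class Bank:
    """One direct-mapped bank: valid bits, tag memory, and data memory."""

    def __init__(self, frames, words_per_block):
        self.valid = [False] * frames
        self.tags = [0] * frames
        self.data = [[0] * words_per_block for _ in range(frames)]

    def lookup(self, tag, index, offset):
        # read-data and match-found both depend only on the address,
        # so within a bank they can proceed independently.
        data = self.data[index][offset]
        match = self.valid[index] and self.tags[index] == tag
        return match, data


def cache_lookup(banks, addr, words_per_block, frames_per_bank):
    """Return (MatchOut, DataOut) for n banks; a single bank is direct-mapped."""
    tag, index, offset = split_address(addr, words_per_block, frames_per_bank)
    results = [bank.lookup(tag, index, offset) for bank in banks]
    match_out = any(match for match, _ in results)      # OR of the Match[i]
    # DataOut is Data[i] of the bank that matched; None stands for "any value".
    data_out = next((data for match, data in results if match), None)
    return match_out, data_out
```

For example, with two banks of four frames and four words per block, address 0x1A4 splits into tag 6, index 2, offset 1, and a lookup hits only if some bank holds a valid entry with tag 6 at index 2. With a single bank, DataOut no longer depends on any Match[i], which is the direct-mapped case.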
The alternative implementation for DataOut uses tri-state buffers. Here, each Data[i] is connected to the input of a tri-state buffer, whose enable is controlled by Match[i]. All n tri-state buffer outputs are connected together and to DataOut. At most, one tri-state buffer is enabled since, at most, one bank can match. If no banks match, DataOut is undefined.

The distinction between the logic within the n banks and the set-associativity logic is not as clear in many implementations as it is in Figures 3 and 4. For example, the n comparators and the encoding logic can be combined into a single n-way comparator that directly controls the multiplexer. Nevertheless, a set-associative cache always requires more circuitry than a direct-mapped cache.

The delay through a set-associative cache is determined by one of three timing paths, illustrated in Figure 5: (1) match-found, which signals a cache hit or miss; (2) select-data, which selects the data word that corresponds to the tag that matched; and (3) read-data, which provides data on a cache hit. A direct-mapped cache has timing paths read-data and match-found, but it does not have path select-data since the location of cached data in a direct-mapped cache does not depend on which comparator matched.

Figure 5. Timing paths in a set-associative cache. The three timing paths in the cache hit logic for an n-way set-associative cache are (1) match-found, which signals a cache hit or miss (Address to Match[i] to MatchOut); (2) select-data, which selects the data word that corresponds to the tag that matched (Address to Match[i] to Select to DataOut); and (3) read-data, which provides the data on a cache hit (Address to Data[i] to DataOut). Path select-data is not needed in a direct-mapped cache.

Arguments against direct-mapped caches

The arguments against direct-mapped caches are that they (1) have worse miss ratios than set-associative caches of the same size, (2) have terrible worst-case behavior, and (3) preclude doing address translation in parallel with the first part of the cache lookup. In the following section, I show that as single-level caches in uniprocessors get larger, the effects of the first two arguments are diminished and the third argument becomes moot.

Larger miss ratios. It is well-known that direct-mapped caches have larger miss ratios than set-associative caches. Consider the likelihood of prematurely replacing an active block (one that is being referenced) when multiple active blocks map to the same set.
A direct-mapped cache allows only one of the multiple active blocks to reside in the cache at any time, while an n-way set-associative cache allows n blocks to be cached.

Data from simulation and measurement show, however, that the size of the miss ratio difference that results from changing associativity is less than one might expect (Figure 6). The intuition that associativity makes a tremendous difference is wrong, because it fails to consider that references are not made to random locations. Rather, references are usually made to locations in recently referenced blocks. The tendency to re-reference blocks makes the miss ratios of all caches much less than one, thereby diminishing all potential miss-ratio differences.

Figure 6. Miss ratio differences for unified caches. Panels: (a) two-way to direct-mapped, 16-byte blocks; (b) two-way to direct-mapped, 32-byte blocks. The x-axis of each panel is cache size in bytes. This figure shows the changes in miss ratio, $\Delta m$, that result when associativity is reduced from two-way to direct-mapped for unified (data and instructions cached together) caches with 16-byte (a) or 32-byte (b) blocks. The data show that miss ratio differences diminish as caches get larger. In comparing 16-byte and 32-byte miss ratios, ignore the dashed lines since this data comes from different traces. The set-associative caches use LRU replacement. (Sources: The miss ratio data in both figures (solid lines) is derived from Tables 2 and 3 in Alexander (ref. 6) and Table 3-4 in Hill (ref. 9). Additional data (dashed lines) for 16-byte blocks (a) comes from Figures 5.10a and 5.10b in Agarwal. Additional data (dashed lines) for 32-byte blocks (b) comes from figures in Smith.)

A trend in the data shown in Figure 6, not heretofore emphasized, is that the miss ratio differences diminish as the caches get larger. For 8-Kbyte unified (data and instructions cached together) caches with 32-byte blocks, for example, the data show that reducing associativity from two-way to direct-mapped causes an absolute miss ratio change of about 0.013, while at 32 Kbytes the change is [...]. Miss ratio differences for further associativity increases (from two-way to four-way, from four-way to eight-way), not shown, are much smaller and diminish further as the caches get larger.
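The mechanism behind these differences, several active blocks mapping to one set, can be illustrated with a toy simulation. The sketch below is not the simulator behind Figure 6 and its traces are synthetic. It assumes bit selection and LRU replacement, and the function name `count_misses` and the trace contents are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, num_frames, assoc):
    """Count misses for a trace of block numbers; assoc=1 is direct-mapped."""
    num_sets = num_frames // assoc
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        frames = sets[block % num_sets]      # bit-selection set mapping
        if block in frames:
            frames.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(frames) == assoc:
                frames.popitem(last=False)   # evict the least recently used block
            frames[block] = True
    return misses


# Synthetic traces for an 8-frame cache. Blocks 0, 8, and 16 map to the
# same set both when direct-mapped (8 sets) and when two-way (4 sets).
two_active = [0, 8] * 50
three_active = [0, 8, 16] * 50

dm_two = count_misses(two_active, 8, 1)       # every reference misses
sa_two = count_misses(two_active, 8, 2)       # only the two cold misses
dm_three = count_misses(three_active, 8, 1)   # every reference misses
sa_three = count_misses(three_active, 8, 2)   # LRU also misses every time
```

On the two-block trace the direct-mapped cache misses on all 100 references while the two-way cache misses only twice. On the three-block trace both organizations miss on all 150 references. Real reference streams rarely look like either trace, which is consistent with the small average differences reported above.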
Miss ratio differences diminish as caches get larger for two reasons. First, the active blocks are less likely to map to the same set in larger caches, since larger caches have more sets. For fixed associativity and block size, the number of sets is proportional to cache size. Second, the miss ratios of all cache organizations get smaller with increasing cache size, diminishing potential miss-ratio differences.

The data from many sources conclusively show that the miss ratio difference between a direct-mapped cache and a set-associative cache of the same size diminishes as cache size increases. Consequently, the disadvantage to direct-mapped caches becomes less important for larger caches.

Terrible worst-case behavior. Another argument against direct-mapped caches is that their worst-case behavior, when multiple blocks collide in a set, is terrible. While this is true, one must ask whether an analysis of worst-case behavior should include how likely this behavior is. If not, then I submit that the worst-case behavior of direct-mapped caches is no worse than that of set-associative caches. If too many blocks map to a given set, both organizations will thrash. That fewer active blocks can cause direct-mapped caches to thrash does not change the severity of the worst-case behavior, only its likelihood, which we just chose to ignore.

On the other hand, if one wishes to include the probability that worst-case behavior occurs in one's analysis, then one must observe that (1) worst-case behavior does not occur very often, as is indicated by the small differences in average miss ratios, and (2) it occurs less often in larger caches, as is indicated by the diminishing average-miss-ratio differences. In summary, the worst-case behavior of all caches, including large caches, is bad, but while worst-case behavior is more likely in large direct-mapped caches than in large set-associative caches, it is still unlikely.

Parallel address translation difficult. Almost all high-end computers in the last two decades used paged virtual memory and organized their caches with physical addresses. In these systems, address translation (the translation of virtual addresses to physical addresses) occurs logically before the cache is accessed. For some of these cache configurations, however, it is possible to do the address translation in parallel with part of the cache access. An important disadvantage of reasonably sized direct-mapped caches is that this technique, called parallel address translation, is impractical, since straightforward implementations require that a cache's size not exceed its associativity times the page size. The IBM 3033, for example, uses parallel address translation and has a 16-way set-associative, 64-Kbyte, physically-tagged cache and 4-Kbyte pages. With the same page size, a direct-mapped cache using this technique could be no larger than 4 Kbytes, which would not be adequate.

As caches get larger, parallel address translation will become impractical in architectures with fixed page sizes. Eventually the increased hit time and implementation costs of wider associativity will overwhelm the benefits of parallel address translation. Designers will be forced to choose between doing address translation before or after the cache lookup. Address translation is done before the cache lookup on all DEC VAX-11 implementations, for example, since reasonable cache sizes are much larger than the VAX-11's 512-byte page size.
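The size constraint for parallel address translation (cache size no greater than associativity times page size) is simple enough to check directly. The sketch below uses hypothetical function names and plugs in only the figures quoted in the text.

```python
KBYTE = 1024


def max_parallel_cache_size(associativity, page_size):
    """Largest cache size (bytes) that permits straightforward parallel translation."""
    return associativity * page_size


def allows_parallel_translation(cache_size, associativity, page_size):
    """True if cache_size does not exceed associativity times page size."""
    return cache_size <= max_parallel_cache_size(associativity, page_size)


# IBM 3033 figures from the text: 16-way, 64-Kbyte cache, 4-Kbyte pages.
ibm_3033_ok = allows_parallel_translation(64 * KBYTE, 16, 4 * KBYTE)

# A direct-mapped cache (associativity 1) with 4-Kbyte pages is capped at 4 Kbytes.
direct_mapped_cap = max_parallel_cache_size(1, 4 * KBYTE)

# A 64-Kbyte direct-mapped cache with 512-byte pages is far over the limit.
large_dm_ok = allows_parallel_translation(64 * KBYTE, 1, 512)
```

The IBM 3033 sits exactly at the limit (16 times 4 Kbytes is 64 Kbytes), which shows why high associativity was the price of parallel translation at that cache size.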
Doing address translation after the cache lookup implies that caches are organized with virtual addresses and address translation is necessary only on cache misses. Some researchers argue that the advantage of this approach, namely, a faster hit time, will justify the additional complexity required to implement a cache organized with virtual addresses. In either case, if address translation is not done in parallel with the cache lookup, it will no longer affect whether a cache should be direct-mapped or set-associative.

Arguments for direct-mapped caches

The arguments for direct-mapped caches are (1) they can be implemented at less cost than set-associative caches, (2) their cache hit (access) times are smaller than those of comparable set-associative caches, and (3) they have smaller effective (average) access times than set-associative caches for sufficiently large cache sizes. Below, I support the above arguments for single-level caches in uniprocessors and show why I expect the direct-mapped organization to become commonly used.

Lower cost. A direct-mapped cache never costs more than a set-associative cache, because there is a way to convert from a set-associative to a direct-mapped design at no cost. (The cost of a cache can be measured in many dimensions, such as number of chips, chip area, power consumption, dollars, and design time.) An n-way set-associative cache, like the one shown in Figure 3, can be converted to one that is direct-mapped simply by changing the replacement algorithm. On a cache miss, an n-way set-associative cache selects a victim, or block to be replaced, using some algorithm, perhaps LRU or random. A direct-mapped cache is created if the victim is selected with the lower $\log_2 n$ bits of the address tag of the new reference. Since this replacement algorithm requires less hardware than the original replacement algorithm, a direct-mapped cache will cost less than one that is set-associative.

In practice, direct-mapped caches cost significantly less, since less parallelism is required if parallel address translation is not done.
An n-way set-associative cache must read n tags in parallel and compare each of them with the high-order bits of the reference's address. A direct-mapped cache need only read and compare one tag. Thus, direct-mapped caches need fewer comparators, require fewer connections, and can use fewer, larger (deeper) memory chips. Similarly, the data memory (and connections to it) in an n-way set-associative cache must be n times as wide as that for a direct-mapped cache, enabling the direct-mapped cache to use fewer, larger memory chips.

Faster hit time. The hit (access) time of a direct-mapped cache is less than or equal to that of a comparable set-associative cache. It is at most equal, because the transformation described above creates a direct-mapped cache with exactly the same hit time as a set-associative cache. In practice, the hit time of a direct-mapped cache is less than that of a comparable set-associative cache because the critical timing path can be made shorter (unless the set-associative cache was small enough to allow parallel address translation).

The delay paths, displayed in Figure 5, are match-found, select-data, and read-data. The hit time of a direct-mapped cache can be less than that of a set-associative cache, because the select-data path can be eliminated in a direct-mapped cache. Instead of letting the results of tag comparisons determine the data returned to the CPU, the data can be selected with several bits from a reference's address. These bits can directly control a multiplexer or be decoded to control tri-state buffers. In either case, this timing path is so much faster than the others that it is effectively eliminated. Figure 7 illustrates this improvement.

An important effect of eliminating the select-data timing path is that the match-found and read-data paths are now independent. This makes it possible for a direct-mapped cache to return the correct data and for the CPU to resume execution even before the system knows whether a hit will occur, so long as the CPU can back out of execution begun with incorrect data.
This optimistic use of cache data is being used in a research machine at DEC WRL, where it enables the cache hit time and the machine cycle time to be reduced by approximately one-third. Optimistic use of cache data is possible in a set-associative cache if one always returns the most-recently-used (MRU) block in the selected set. I found, however, that the performance of a simple direct-mapped cache is similar to that of a more complex MRU cache.

It is also possible to improve the read-data path, since it is no longer necessary to read from n data blocks in parallel. Instead only one block need be read. This flexibility allows designers to organize data memory chips differently and to use larger, deeper chips. It is possible, for example, to completely eliminate the multiplexer or tri-state buffers previously used to select data from different blocks.

Finally, improvement in the match-found path is also possible, since it is no longer necessary to read and compare n tags in parallel and then OR the results for the cache hit/miss signal. Rather, one need only read and compare one tag. This flexibility allows the tag memory to be implemented with fewer, deeper chips and eliminates the final OR stage.

The exact magnitude of the improvement possible depends on many implementation factors. I examined caches implemented in three technologies: (1) TTL logic and MOS SRAM memory chips, (2) ECL logic and memory chips, and (3) custom CMOS. I found that moving from a direct-mapped to a two-way set-associative cache increases cache hit time in (1) from 100 to 109 ns (nine percent), in (2) from 30.0 to 33.5 ns (12 percent), and in (3) from 50.0 to 51.0 ns (two percent). The difference is about 10 percent for board-level TTL and ECL caches and much smaller for custom CMOS caches. I do not regard the difference between the TTL and ECL times as significant, since both numbers are sensitive to the propagation delays through a few parts. Since custom CMOS assumptions are radically different from those for MSI, comparing CMOS results with TTL or ECL results is subject to more error. However, one may expect the penalty for adding a multiplexer to be larger in MSI, where it adds logic delay and two chip crossings, than on a custom chip, where it adds just the logic delay.

Figure 7. Converting to a direct-mapped cache. An n-way set-associative cache can be converted to a direct-mapped cache by changing the replacement algorithm to replace the block in bank r, where r is the reference's tag modulo n. Since this function can be done with bit selection (at trivial cost) and off the critical path for a cache hit, the resulting direct-mapped cache has the same cost and hit time as the original set-associative cache. Thus, moving to a direct-mapped cache never increases and, as explained in the text, can decrease cost and hit time.
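The conversion in Figure 7 can be checked with a short sketch. It assumes bit selection and power-of-two sizes, and the function names are hypothetical. The point it illustrates is that choosing the victim bank as the tag modulo n gives every block exactly one possible frame, the same frame a direct-mapped cache with the same number of frames would use.

```python
def converted_frame(block_number, num_frames, n):
    """Frame (bank, set) used by an n-way cache whose victim bank is tag mod n."""
    num_sets = num_frames // n
    set_index = block_number % num_sets   # bit selection on the block number
    tag = block_number // num_sets        # remaining high-order bits
    bank = tag % n                        # lower log2(n) bits of the tag
    return bank, set_index


def direct_mapped_index(block_number, num_frames):
    """Frame index in a direct-mapped cache with the same number of frames."""
    return block_number % num_frames


# For a 256-frame cache built from 4 banks of 64 sets, the (bank, set) pair
# corresponds one-to-one with the direct-mapped index bank * 64 + set.
bank, set_index = converted_frame(1000, 256, 4)
same_frame = bank * 64 + set_index == direct_mapped_index(1000, 256)
```

Because each block can now reside in only one bank, the bank number can be taken from address bits on a hit instead of from the tag comparison, which is what removes the select-data path.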
In summary, the hit time of a direct-mapped cache will be less than that of a comparable set-associative cache, since block selection can be done before the tag comparison completes, and the tag and data memories do not need to read information from n blocks in parallel.

Superior effective access times. A direct-mapped cache has a smaller effective (average) access time than that of a set-associative cache of the same size if (1) the direct-mapped cache has a smaller hit time and (2) both caches are sufficiently large that the miss ratio difference between them is small. Recall that effective access time, $t_{eff}(C)$, is the average latency, as seen by the processor, required by the memory system to service a memory reference. I model it as

$$t_{eff}(C) = t_{cache}(C) + m(C) \cdot t_{memory}(C)$$

where $m(C)$, $t_{cache}(C)$, and $t_{memory}(C)$ are the miss ratio, hit time (cache access time), and average miss penalty (delay beyond a cache access to access memory) for cache $C$. If two caches have the same miss penalty, the change in effective access time moving from a cache $C_1$ to a cache $C_2$ is

$$\Delta t_{eff} = t_{eff}(C_2) - t_{eff}(C_1) = \Delta t_{cache} + \Delta m \cdot t_{memory}$$

If cache $C_1$ is direct-mapped and cache $C_2$ set-associative, then $\Delta t_{cache} \ge 0$ and $\Delta m \cdot t_{memory} \le 0$, since set-associative caches typically have a slower hit time and smaller miss ratio than direct-mapped caches of the same size. Figure 8 illustrates $\Delta t_{eff} = \Delta t_{cache} + \Delta m \cdot t_{memory}$ for hypothetical direct-mapped and set-associative caches. It shows that $\Delta t_{eff}$ can be either positive or negative.

If, on the other hand, implementation considerations are ignored, then

$$\Delta t_{eff} = \Delta m \cdot t_{memory} \le 0$$

which implies increasing associativity always improves effective access time ($\Delta t_{eff}$ is negative). Thus, the effect of including implementation considerations is to diminish or reverse the miss ratio benefit of increasing associativity.

Figure 8. Change in effective access time. (Panels: (a) 10-cycle cache miss time, (b) 20-cycle cache miss time; x-axis: cache hit time change $\Delta t_{cache}$, 0% to 50%; y-axis: change in effective access time $\Delta t_{eff}$.) This figure shows the change in effective access time ($\Delta t_{eff} = \Delta t_{cache} + \Delta m \cdot t_{memory}$) that results when moving from a cache with a relatively fast hit time and a relatively large miss ratio (e.g., a direct-mapped cache) to another cache with a slower hit time but smaller miss ratio (a set-associative cache). The graphs assume 10-cycle (a) and 20-cycle (b) miss penalties, where a cycle is defined to be equal to the hit time of the faster cache. The x-axis displays values of $\Delta t_{cache}$, the hit time difference. An x value of 20 percent implies that the slower cache's hit time is 1.2 cycles, 1.2 times the hit time of the faster cache. The y-axis gives values of $\Delta t_{eff}$, the change in effective access time. A y value of $-0.10$ implies that the effective access time improves by 0.10 cycles. Since most effective access times are slightly larger than 1 cycle, an absolute improvement of 0.10 cycles translates into slightly less than a 10 percent relative improvement. The various lines show miss ratio changes, $\Delta m$, up to 0.0. All $\Delta m$'s are nonpositive, since we assume the second cache has a smaller miss ratio. Points on the y-axis represent the effective access time change that results when $\Delta t_{cache}$ is zero or ignored. Here, all points are below the x-axis, since the latter cache, with the smaller miss ratio, always has a better effective access time ($\Delta t_{eff} < 0$). If $\Delta t_{cache} > 0$, the benefit of the lower miss ratio is diminished. For all points above the x-axis, the drawback of the slower hit time exceeds the benefit of the lower miss ratio, making the former cache preferred ($\Delta t_{eff} > 0$).

To see whether implementation considerations matter in practice, typical values must be determined for $t_{memory}$, $\Delta m$, and $\Delta t_{cache}$. Reasonable values for $t_{memory}$ are 10 or 20 cycles, where a cycle is equal to the hit time of the faster cache. Smaller values are possible, especially in systems where cache misses are serviced by larger level-two caches instead of main memory. Larger values are possible in a system where the mismatch between the technologies used to implement the cache and memory is larger than normal.

Typical values for $\Delta m$, the absolute difference in miss ratio, can be derived from trace-driven simulation. Figure 9 shows miss ratio differences between some direct-mapped and two-way set-associative caches with 32-byte blocks. The data show that $\Delta m$'s generally get smaller as cache size is increased, and that the absolute values of the $\Delta m$'s are small for larger caches, for example those larger than 16 Kbytes.

Figure 9. Miss ratio differences. (Legend: unified, data, instruction; x-axis: cache size in bytes, 1K to 1M.) This figure displays the miss ratios from direct-mapped caches less the miss ratio of two-way set-associative caches of the same size for unified, instruction, and data caches with 32-byte blocks using operating system and multiprogramming traces from IBM/370 and VAX 11 architectures. Results show miss ratio differences ($\Delta m$) generally diminish with increasing cache size, and are smaller for instruction caches than for unified or data caches.

Figure 10 shows effective access time changes with actual miss-ratio differences for unified caches from Figure 9. Lines are labeled with cache sizes and positioned according to the miss ratio difference for that cache size. Figures 11 and 12 show similar results for instruction and data caches. These figures illustrate three points: (1) Moving from a direct-mapped to a two-way set-associative cache has little potential for improving effective access time as caches get larger. At 64 Kbytes (see lines labeled 64K) and with 10-cycle misses, the maximum improvement possible is 5.2, 3.6, and 4.5 percent for unified, instruction, and data caches, respectively.
With 20-cycle misses, the maximum possible improvement is twice as large. (2) Moving from a direct-mapped to a two-way set-associative cache can cause a worse effective access time if cache hit time increases by even a small amount. The improvement is offset if the cache hit time increase is equal to the maximum improvement possible from the smaller miss ratio (for example, 5.2, 3.6, and 4.5 percent for unified, instruction, and data caches of 64 Kbytes, having 10-cycle miss penalties). (3) Moving from a direct-mapped to a two-way set-associative cache offers less to instruction caches than it does to unified or data caches. The potential benefit from increasing associativity in instruction caches with a 10-cycle miss time is less than 6.4 percent for sizes as small as 2 Kbytes. The actual benefit will be less if the miss penalty is less than 10 cycles or increasing associativity impacts cache hit time.

Furthermore, increasing block size or increasing associativity beyond two-way does not hurt the case for large direct-mapped caches. Increasing block size in large caches to 64 bytes improves the performance of direct-mapped caches relative to set-associative ones by decreasing all miss ratios and miss ratio differences. Further increases will exhibit similar behavior until the number of blocks in the cache becomes limited. Miss ratio improvements resulting from increasing associativity beyond two-way are much smaller than the improvements between direct-mapped and two-way set-associativity, implying that further increases in associativity will not improve effective access time unless they have a negligible impact on cache hit time.

The final parameter value that must be determined to know whether direct-mapped or set-associative caches are faster is $\Delta t_{cache}$. This parameter is difficult to determine, because it is implementation dependent and very sensitive to the delay through a few parts. As discussed previously, I examined board-level caches (TTL and ECL) where $\Delta t_{cache}$ was around 10 percent.
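The effective access time model used in this section is simple enough to evaluate directly. The sketch below is illustrative only: the function names are invented for the example, and the miss ratio difference of -0.0052 is an assumption obtained by reading the article's 5.2 percent figure for 64-Kbyte unified caches as 0.052 cycles at a 10-cycle miss penalty. All times are in cycles, where one cycle is the hit time of the faster cache.

```python
def effective_access_time(t_cache, miss_ratio, t_memory):
    """t_eff = t_cache + m * t_memory (all times in cycles)."""
    return t_cache + miss_ratio * t_memory


def delta_t_eff(delta_t_cache, delta_m, t_memory):
    """Change in effective access time between two caches that share
    the same miss penalty: delta_t_cache + delta_m * t_memory."""
    return delta_t_cache + delta_m * t_memory


def break_even_slowdown(delta_m, t_memory):
    """Hit-time increase (in cycles) at which the lower miss ratio of
    the second cache is exactly cancelled, i.e. delta_t_eff == 0."""
    return -delta_m * t_memory


# Assumed value, inferred from the 5.2 percent figure quoted for
# 64-Kbyte unified caches with a 10-cycle miss penalty.
DELTA_M_64K_UNIFIED = -0.0052

for t_memory in (10, 20):
    limit = break_even_slowdown(DELTA_M_64K_UNIFIED, t_memory)
    print(f"t_memory={t_memory}: break-even slowdown = {limit:.3f} cycles")

# A 10 percent slower hit time with 10-cycle misses.
change = delta_t_eff(0.10, DELTA_M_64K_UNIFIED, 10)
print(f"delta_t_eff = {change:+.3f} cycles")  # +0.048
```

Under these assumed numbers the break-even slowdown doubles from 0.052 to 0.104 cycles when the miss penalty doubles, matching the statement that the maximum possible improvement is twice as large with 20-cycle misses. With a 10 percent slower hit time and 10-cycle misses, the change is positive, so the direct-mapped cache would have the better effective access time.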
The effect of a 10 percent slowdown can be studied in Figures 10 through 12 by only considering design points on a vertical line at $\Delta t_{cache}$ = 10 percent. For the 10-cycle miss penalty, $\Delta t_{cache}$ = 10 percent implies that direct-mapped caches have better effective access times than two-way set-associative caches for caches equal to and larger than 16, 8, and 16 Kbytes for unified, instruction, and data caches. For the 20-cycle miss penalty, the corresponding sizes are 64, 16, and 64 Kbytes.

The exact cache size at which the effective access time of a direct-mapped cache becomes better than that of a two-way set-associative cache is sensitive to many assumptions. Nevertheless, that it does cross over is inevitable, given that miss ratio differences diminish as caches get larger and that set-associative caches have slower hit times. At cache sizes less than the cross-over size, a direct-mapped cache may still be preferred to one that is set-associative, since a direct-mapped cache may cost less and its effective access time may not be much worse. Even for 20-cycle miss penalties, as Figures 10 through 12 show, the effective access time of a two-way set-associative cache is never more than five percent better than that of the corresponding direct-mapped cache at cache sizes of 32 Kbytes and larger.

Figure 10. Effective access time changes in unified caches. (Panels: (a) 10-cycle cache miss time, (b) 20-cycle cache miss time; x-axis: cache hit time change $\Delta t_{cache}$, 0% to 50%.) This figure shows the change in effective access time ($\Delta t_{eff}$) that results from moving from a direct-mapped cache to a two-way set-associative cache when both caches are unified, have 32-byte blocks, and have 10-cycle (a) or 20-cycle (b) miss penalties ($t_{memory}$). This figure is constructed by substituting miss ratio differences ($\Delta m$'s) for unified caches from Figure 9 into Figure 8. The lines are labeled with cache sizes in bytes and positioned by the miss ratio difference at that cache size. The data for 16-Kbyte caches with 10-cycle miss penalties, for example, can be interpreted as follows: increasing associativity from direct-mapped to two-way improves effective access time by 0.10 if there is no speed cost to adding associativity ($\Delta t_{cache} = 0$); increasing associativity has no effect on effective access time if the set-associative cache's hit time is 10 percent longer; and increasing associativity causes a worse effective access time, despite lowering the miss ratio, if the set-associative cache is more than 10 percent slower.

Other trends

Up to this point, I have concentrated on single-level caches in uniprocessors. Here I discuss future trends toward caches in hierarchies and multiprocessors.
I examine why these trends may occur and discuss how and whether my arguments for single-level caches in uniprocessors apply to these new situations.

Toward caches in hierarchies. In two-level cache hierarchies, a level-one cache services processor references, but it obtains data for misses from a level-two cache instead of memory. A level-two cache services only level-one cache misses and obtains data for its misses from memory. Two-level cache hierarchies, heretofore rarely used, may become more common in future systems for three reasons. First, implementation considerations can force a partition. Some recently introduced microprocessors, for example, devote some of their limited on-chip area to caches, but they require larger caches to avoid frequent accesses to relatively slow main memory. Since the on-chip caches cannot be made larger, a second on-board cache is required. Second, a detailed computation of effective access time shows that two-level cache hierarchies can offer superior performance to a single-level cache as processors speed up relative to main memories. Third, there may be functional and performance benefits to specializing caches at different levels in a multiprocessor. In a multiprocessor, a level-one cache can be optimized to minimize effective access time, while the level-two cache is designed to reduce cost or interconnection traffic. Similar reasons are expressed by Short and Levy [13].

The utility of direct-mapped caches in two-level cache hierarchies is, as yet, undetermined. Level-one caches will be direct-mapped if technological constraints permit large enough cache sizes that the hit time advantage of direct-mapped caches (due in part to allowing data to be returned before the tag comparison is complete) is more important than the miss ratio disadvantage. Direct-mapped caches can be preferred for cache sizes as small as 16 Kbytes if misses are serviced by a level-two cache in 10 cycles or less. Level-two caches, on the other hand, are more likely to be set-associative, since level-two cache hit times are less critical and a lower miss ratio can improve multiprocessor performance. The only argument for direct-mapped level-two caches is that straightforward implementations of large set-associative caches will be expensive, requiring multiple-word-wide banks of memory chips.

Figure 11. Effective access-time differences in instruction caches. (Panels: (a) 10-cycle cache miss time, (b) 20-cycle cache miss time; x-axis: cache hit time change $\Delta t_{cache}$, 0% to 50%.) This figure shows the effective access-time change ($\Delta t_{eff}$) of moving from a direct-mapped instruction cache to a two-way set-associative instruction cache with miss penalties of either 10 cycles (a) or 20 cycles (b). Other assumptions match those of Figure 10. Because miss-ratio differences ($\Delta m$'s) are smaller, the benefit of associativity is smaller for instruction caches than it is for unified or data caches.

Toward caches in multiprocessors. To provide a rate of growth of computing power that exceeds the rate of technological improvement, many manufacturers, particularly of high-end computers, are turning toward multiprocessors. To facilitate ease of programming, some multiprocessors provide shared memory and use caches.
Caches in multiprocessors may be designed differently than those in uniprocessors, since multiprocessor caches may be more concerned with minimizing memory and interconnect contention than with minimizing effective access time [4]. Here, the relative miss ratio difference ($\Delta m / m$) is more important than the absolute miss ratio difference ($\Delta m$). I found relative miss ratio differences are constant across wide changes in miss ratio and cache size. For example, decreasing associativity from two-way to direct-mapped in unified caches causes a relative miss ratio increase of about 30 percent even for large caches. Relative miss ratio differences are most important in single-bus shared-memory cache-coherent multiprocessors, where bus bandwidth can easily limit system throughput. In multiprocessors based on long-latency high-bandwidth interconnection networks, however, cache design should proceed as in a uniprocessor with slow main memory. Nevertheless, both cases make set-associative caches more attractive, but not necessarily better.

If multiprocessors use two-level cache hierarchies, the above arguments apply to level-two caches. I would expect most level-one caches, on the other hand, to be designed like level-one caches in a uniprocessor, making direct-mapped caches likely. This expectation may be incorrect if the misses for many level-one caches are serviced by a single level-two cache and contention between level-one caches is significant.

Figure 12. Effective access-time differences in data caches. (Panels: (a) 10-cycle cache miss time, (b) 20-cycle cache miss time; x-axis: cache hit time change $\Delta t_{cache}$, 0% to 50%.) This figure displays effective access-time changes that result from moving from a direct-mapped data cache to a two-way set-associative data cache with miss penalties of either 10 cycles (a) or 20 cycles (b). Other assumptions match those of Figure 10. The benefit of associativity in data caches is similar to that for unified caches.

Direct-mapped caches will be common in uniprocessors as single-level or level-one caches and in multiprocessors as level-one caches. Direct-mapped caches are preferred when they are sufficiently large that hit time benefits are more significant than miss ratio drawbacks. This can occur in single-level caches of 64 Kbytes and larger (16 Kbytes for instruction caches) and can occur at 16 Kbytes and larger for level-one caches whose misses are serviced more rapidly by level-two caches.

The arguments against direct-mapped caches, with respect to set-associative caches, are that they (1) have worse miss ratios, (2) have more common worst-case behavior, and (3) preclude parallel address translation. I have shown that the significance of the first two points becomes questionable for large caches where absolute miss ratio differences are small, and that the third is not a disadvantage for large direct-mapped caches, since large set-associative caches also preclude parallel address translation. The arguments for direct-mapped caches are that they (1) cost less, (2) have faster hit (access) times, and (3) can have superior effective (average) access times. I have shown that the strength of these arguments is not diminished by increasing cache size, and the third point is more likely to be true for large cache sizes.
An alternate way of stating this result is: set-associative caches reduce the time spent on cache misses; direct-mapped caches reduce the time spent on cache hits, especially if a CPU can use data before a hit or miss is determined; set-associative caches are preferred in small caches where misses are common; direct-mapped caches are preferred in large caches where misses are rare; and many future caches will be sufficiently large and therefore direct-mapped. These arguments may not apply to single-level or level-two caches in multiprocessors, where minimizing contention or very long miss penalties may favor set-associative caches over direct-mapped caches.

Acknowledgments

I would like to thank my thesis advisors, Alan Smith and David Patterson, for their many suggestions that improved the quality of my research. Thanks also to those who read and improved drafts of this article: Sue Dentinger, James Goodman, David Patterson, Gurindar Sohi, and the anonymous referees. The material presented here is based on research supported in part by the Defense Advanced Research Projects Agency monitored by Naval Electronics Systems Command under Contract No. N C-0269, the National Science Foundation under grants CCR and MIP, the State of California under the MICRO program, the graduate school at the University of Wisconsin-Madison, and by IBM, Digital Equipment Corporation, Hewlett-Packard, and Signetics.

References

1. A.J. Smith, "Cache Memories," Computing Surveys, Vol. 14, No. 3, Sept. 1982.
2. A.J. Smith, "Bibliography and Readings on CPU Cache Memories and Related Topics," Computer Architecture News, Jan. 1986.
3. A.J. Smith, "Line (Block) Size Choice for CPU Caches," IEEE Trans. Computers, Vol. C-36, No. 9, Sept. 1987.
4. J. Bell, D. Casasent, and C.G. Bell, "An Investigation of Alternative Cache Organizations," IEEE Trans. Computers, Vol. C-23, No. 4, Apr. 1974.
5. A.J. Smith, "A Comparative Study of Set Associative Memory Mapping Algorithms and Their Use for Cache and Main Memory," IEEE Trans. Software Engineering, Vol. SE-4, No. 2, Mar. 1978.
6. C. Alexander et al., "Cache Memory Performance in a Unix Environment," Computer Architecture News, Vol. 14, No. 3, June 1986.
7. A. Agarwal, "Analysis of Cache Performance for Operating Systems and Microprogramming," PhD dissertation, Tech. Report CSL-TR, Stanford University, Stanford, Calif., May.
8. S. Przybylski, M. Horowitz, and J. Hennessy, "Performance Tradeoffs in Cache Design," Proc. 15th Ann. Int'l Symp. Computer Architecture, No. 861, Computer Society Press, Los Alamitos, Calif., 1988.
9. M.D. Hill, "Aspects of Cache Memory and Instruction Buffer Performance," PhD dissertation, Tech. Report 87/381, Computer Science Dept., Univ. of California, Berkeley, Calif., Nov.
10. J.R. Goodman, "Coherency for Multiprocessor Virtual Address Caches," Proc. Symp. Architectural Support for Programming Languages and Operating Systems, No. M805 (microfiche), Computer Society Press, Los Alamitos, Calif., 1987.
11. D.A. Wood et al., "An In-Cache Address Translation Mechanism," Proc. 13th Ann. Int'l Symp. Computer Architecture, No. 719, Computer Society Press, Los Alamitos, Calif., 1986.
12. J.H. Chang, H. Chao, and K. So, "Cache Design of a Sub-Micron CMOS System/370," Proc. 14th Ann. Int'l Symp. Computer Architecture, No. 716, Computer Society Press, Los Alamitos, Calif., 1987.
13. R.T. Short and H.M. Levy, "A Simulation Study of Two-Level Caches," Proc. 15th Ann. Int'l Symp. Computer Architecture, No. 861, Computer Society Press, Los Alamitos, Calif., 1988.
14. J.R. Goodman, "Using Cache Memory to Reduce Processor-Memory Traffic," Proc. 10th Ann. Int'l Symp. Computer Architecture, No. M473 (microfiche), Computer Society Press, Los Alamitos, Calif., 1983.

Mark D. Hill is an assistant professor in the Computer Sciences Department at the University of Wisconsin at Madison. His research interests center on performance and implementation factors in memory systems. He was a principal contributor to the SPUR project to build a shared-bus multiprocessor at the University of California at Berkeley. He is currently working on Multicube, a project designing a multiprocessor using a grid of buses. Hill earned a BS in computer engineering from the University of Michigan in 1981, and an MS and PhD in computer science from the University of California at Berkeley in 1983 and 1987, respectively. He is a member of IEEE, the IEEE Computer Society, and ACM.

Hill may be contacted at the University of Wisconsin-Madison, Computer Sciences Department, 1210 W. Dayton St., Madison, WI.


More information

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks

A Dual-Hamiltonian-Path-Based Multicasting Strategy for Wormhole-Routed Star Graph Interconnection Networks A Dual-Hamiltonian-Path-Based Multiasting Strategy for Wormhole-Routed Star Graph Interonnetion Networks Nen-Chung Wang Department of Information and Communiation Engineering Chaoyang University of Tehnology,

More information

Acoustic Links. Maximizing Channel Utilization for Underwater

Acoustic Links. Maximizing Channel Utilization for Underwater Maximizing Channel Utilization for Underwater Aousti Links Albert F Hairris III Davide G. B. Meneghetti Adihele Zorzi Department of Information Engineering University of Padova, Italy Email: {harris,davide.meneghetti,zorzi}@dei.unipd.it

More information

13.1 Numerical Evaluation of Integrals Over One Dimension

13.1 Numerical Evaluation of Integrals Over One Dimension 13.1 Numerial Evaluation of Integrals Over One Dimension A. Purpose This olletion of subprograms estimates the value of the integral b a f(x) dx where the integrand f(x) and the limits a and b are supplied

More information

Reading Object Code. A Visible/Z Lesson

Reading Object Code. A Visible/Z Lesson Reading Objet Code A Visible/Z Lesson The Idea: When programming in a high-level language, we rarely have to think about the speifi ode that is generated for eah instrution by a ompiler. But as an assembly

More information

Semi-Supervised Affinity Propagation with Instance-Level Constraints

Semi-Supervised Affinity Propagation with Instance-Level Constraints Semi-Supervised Affinity Propagation with Instane-Level Constraints Inmar E. Givoni, Brendan J. Frey Probabilisti and Statistial Inferene Group University of Toronto 10 King s College Road, Toronto, Ontario,

More information

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications

System-Level Parallelism and Throughput Optimization in Designing Reconfigurable Computing Applications System-Level Parallelism and hroughput Optimization in Designing Reonfigurable Computing Appliations Esam El-Araby 1, Mohamed aher 1, Kris Gaj 2, arek El-Ghazawi 1, David Caliga 3, and Nikitas Alexandridis

More information
