Direct-Mapped Caches


A Case for Direct-Mapped Caches
Mark D. Hill, University of Wisconsin

Despite having worse miss ratios, large direct-mapped caches often handle processor references faster than more-expensive set-associative caches.

A cache is a small, fast buffer in which a system keeps those parts of the contents of a larger, slower memory that are likely to be used soon. The purpose of a cache is to improve system cost-performance by providing the capacity of the large, slow memory with an average access time close to that of the small, fast cache. This is possible only if most memory references can be serviced rapidly by the cache without the intervention of the slower memory. Usually caches are successful due to temporal and spatial locality, two properties of most real reference streams. Temporal locality means future references are likely to be made to the same locations as recent references, while spatial locality suggests that future references are also likely to be made to locations near recent references. Caches take advantage of temporal locality by retaining recently referenced information, while they exploit spatial locality by loading and retaining blocks of information surrounding recent references.

A CPU cache is a cache of main memory. Like caches in general, CPU caches are faster and smaller than the memory they buffer. They are usually five to 20 times faster and 50 to 1,000 times smaller than main memory. Because CPU caches must be extremely fast, they are managed entirely by hardware, and for this reason, CPU-cache access and management policies must be relatively simple.

CPU caches have been studied extensively because they have proven effective at increasing system performance, lowering system cost, or both. CPU caches continue to be worth studying because their importance to system cost-performance is increasing and technological improvements are altering their characteristics. (Since this article examines only CPU caches, the term cache is often used instead of CPU cache.)
The important cache design parameter examined here is associativity, which is also called degree of associativity or set size. The associativity of a cache is the number of block frames in which a given block may reside. Reducing associativity allows fewer block frames to be searched on a reference, a potential implementation advantage. However, this further constrains which blocks can be simultaneously resident, a potential performance disadvantage.

The terms fully-associative, set-associative, and direct-mapped express the relationship between a cache's associativity and capacity. A cache is called fully associative if a block can reside in any block frame (associativity equal to the number of block frames), n-way set-associative if a block can reside only in one of n block frames, where n is greater than 1 and less than the number of block frames (associativity n), and direct-mapped if a block can reside in only one block frame (associativity 1). Figure 1 illustrates set-associative mapping.

It is worthwhile studying associativity because technological trends toward large, fast static RAMs are facilitating larger cache sizes and architectural trends toward reduced instruction set computers (RISCs) are requiring faster hit times. The trend to larger caches is illustrated by the VAX 11 family. The recently introduced VAX 8800 uses a 64-Kbyte direct-mapped cache, while older VAX 11 implementations like the VAX 11/780 and VAX 11/785 use set-associative caches of 8 and 16 Kbytes.

Computer, December 1988

RISCs accentuate the need for caches

with fast hit times by having simple pipelines that facilitate shorter cycle times, and by referencing memory so frequently (once per cycle) that CPU cycle times are often determined by cache hit times. Commercial RISC processors have been introduced by AMD, Hewlett-Packard, IBM, Intel, MIPS, Motorola, Sun, and others.

This article will show that trends toward larger cache sizes and faster hit times favor direct-mapped caches.

The arguments in the main body of this article are restricted to single-level caches in uniprocessors. A single-level cache services processor references and obtains data for misses directly from main memory. Most past and present computers use single-level caches, as will many future computers. I expect some future computers, however, to use two-level (or more) cache hierarchies, where a level-one cache services processor references and obtains data for misses from a level-two cache, which in turn services level-one-cache misses and obtains data for its misses from memory. Later in the article, I discuss how my arguments regarding single-level caches extend to two-level cache hierarchies.

I restrict my arguments to uniprocessors for two reasons. First, uniprocessors are and will continue to be important, especially for computers less costly than mainframes, such as engineering workstations. Second, a thorough analysis of caches in multiprocessors requires coverage of many degrees of freedom, which would dilute the thrust of this article. These include interconnection topology, whether control is single-instruction-multiple-data or multiple-instruction-multiple-data, synchronization and cache coherency mechanisms, and number of processors (hence granularity of sharing). I will, however, discuss how and when these arguments for caches in uniprocessors apply to caches in multiprocessors.

Performance metrics

To examine cache performance, I use miss ratio and an extended model of effective access time.

Miss ratio. Miss ratio is the most commonly-used cache performance metric. The miss ratio for a cache C is

$$m(C) = \frac{\text{No. of misses with cache } C}{\text{No. of processor references}}$$

I use m(C), rather than m, to emphasize that the miss ratio is a function of a cache organization. "C" represents all attributes of cache C. Miss ratio is used because it is easy to define, interpret, and compute, and perhaps most important, because it is implementation independent. This independence facilitates cache performance comparisons between caches not yet implemented and those implemented with different technologies and in different kinds of systems.

Unfortunately, some comparisons of dissimilar caches can lead to misleading results. A miss ratio comparison, for example, between the Cray-1 instruction buffers and the Motorola on-chip instruction cache is meaningless because the technologies and workloads have little in common. Since miss-ratio comparisons contrast the number of misses, they can also be misleading if the penalty for a miss varies. For instance, increasing cache block size often reduces the number of misses and hence the miss ratio, but it often also increases the number of cycles needed to load a

block.

[Figure 1 diagram: an address, divided into block number and block offset, feeds the set-mapping function and set decoder; the cache array has associativity n and s sets; the block number is compared with the tags to select the data word.]

Figure 1. Set-associative mapping. A set-associative cache uses a set-mapping function f to partition all main-memory blocks into equivalence classes. Some cache block frames are assigned to hold recently referenced blocks from each class. Each group of block frames is called a set. The number of groups, called the number of sets (s), equals the number of classes. The number of block frames in each set is called the associativity (degree of associativity, set size, n). The number of block frames in the cache always equals the associativity times the number of sets (n*s). A cache is fully-associative if it contains only one set (s = 1, so n equals the number of block frames), is direct-mapped if each set contains one block frame (n = 1, so s equals the number of block frames), and is n-way set-associative otherwise (where n is the associativity and s is the number of block frames divided by n). On a reference to block x, set-mapping function f feeds the set decoder with f(x) to select one set (one row); each block frame is searched until x is found (a cache hit) or the set is exhausted (a cache miss). On a cache miss, one block in set f(x) is replaced with the block x obtained from memory. Finally, the word requested from block x is returned to the processor. For conceptual simplicity, the figure shows the word selected last (in the box labeled "Compare block number with tags and select data word"). To reduce the number of bits that must be read, many implementations select the word while selecting the set. The most commonly used set-mapping function is the block number modulo the number of sets, where the number of sets is a power of two. This function is called bit selection since it equals several low-order bits of the block number. For 256 sets, for example, f(x) = x mod 256 or f(x) = x AND 0xff, where mod is remainder and AND is bitwise-and.
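The bit-selection mapping in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the function names `set_index` and `organization` are made up for the example, and the sketch assumes block numbers are plain non-negative integers.

```python
def set_index(block_number: int, num_sets: int) -> int:
    """Bit selection: block number modulo a power-of-two number of sets."""
    if num_sets <= 0 or num_sets & (num_sets - 1):
        raise ValueError("num_sets must be a positive power of two")
    # Masking the low-order bits gives the same value as block_number % num_sets.
    return block_number & (num_sets - 1)


def organization(num_frames: int, associativity: int) -> str:
    """Name the organization implied by the frame count and associativity."""
    if num_frames % associativity:
        raise ValueError("associativity must divide the number of block frames")
    if associativity == num_frames:
        return "fully-associative"      # one set holding every block frame
    if associativity == 1:
        return "direct-mapped"          # one block frame per set
    return f"{associativity}-way set-associative"


if __name__ == "__main__":
    print(hex(set_index(0x1234, 256)))  # low eight bits of the block number
    print(organization(1024, 4))
```

With 256 sets, `set_index` keeps the low eight bits of the block number, which matches the caption's f(x) = x AND 0xff.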
The actual change in cache performance will depend on how much the number of misses decreases and how much the time to service a miss increases. The penalty for a miss can also vary because of delays indirectly affected by changes in miss ratio, such as memory contention in a multiprocessor.

Effective access time. Another commonly used cache performance metric is effective access time, t_eff(C) (average access time). Effective access time is the average latency, as seen by the processor, required by the memory system to service a memory reference. In this article, I model it as

$$t_{eff}(C) = t_{cache}(C) + m(C) \cdot t_{memory}(C)$$

where m(C), t_cache(C), and t_memory(C) are the miss ratio, cache hit time, and average miss penalty (delay beyond a cache access to access memory) for cache C. Strictly speaking, cache hit time should be called cache access time, since this delay occurs on all accesses, not just hits. I choose not to use cache access time, because it is too easily confused with effective access time.

Using effective access time rather than miss ratio allows caches with different hit and miss times to be more accurately compared. One can, for example, determine whether increasing cache block size improves performance as well as miss ratio. The disadvantage, however, is that implementation details must be examined and assumptions must be made for the values of t_cache(C) and t_memory(C). Performance estimates with any implementation assumptions are less general, and those with incorrect assumptions are misleading.

Unlike many other cache memory analyses, my analysis does not assume that t_cache(C) is the same for all caches studied. The disadvantage of including changes in t_cache(C) is that more implementation-dependent parameters must be estimated, further limiting the generality of results. However, variability in cache hit time must be considered, since ignoring it can lead to incorrect conclusions when comparing

large caches of varying associativities. On the other hand, like many other cache memory analyses, mine assumes that cache changes do not affect the average miss penalty, t_memory(C). This assumption simplifies analysis, but it can bias results for multiprocessors where delays due to contention or updating memory are large and variable.

This article does not evaluate caches with system performance metrics like benchmark execution time or effective number of processors, since these metrics require many system-dependent assumptions that limit their usefulness to comparing similar alternative caches within the context of an existing system. Furthermore, system metrics rarely produce conclusions that generalize to cache designs in other systems, because of the difficulty of isolating cache effects from other system effects.

[Figure 2 diagram: an address latch feeds the tag memory (state and tag) and the data memory; match logic ("valid and match?") produces MatchOut and the data memory produces DataOut. Field widths: b = log2(number of words per block frame) bits; i = log2(number of block frames) bits; t = 32 - i - b - 2 bits.]

Figure 2. A direct-mapped cache. The access logic (hit, not miss logic) for a direct-mapped cache using bit selection to select the set (block frame) of the reference has three components. The first component, the data memory, holds all cached data and instructions. The second component, tag memory, holds the state bits and address tag associated with a cached block. The last component, the match logic, produces a single bit indicating whether the referenced block is present.

Implementing caches

This section examines the implementation of direct-mapped and set-associative caches. I concentrate on direct-mapped cache hit (access) logic and set-associativity logic, because the delay through this logic determines cache hit time and directly affects effective access time. Set-associativity logic is the additional logic required by a set-associative cache over a direct-mapped cache. For this discussion, I assume a generic memory system with a single four-gigabyte address space of aligned four-byte words addressed with 32-bit byte addresses.
I also assume address translation is done in a way that does not affect the cache hit time.

Direct-mapped cache. A direct-mapped cache is simpler to build than a set-associative cache because the cache location of a referenced word is a function of the address of a reference only and the replacement algorithm is trivial. The address of a reference to a direct-mapped cache using bit selection is divided into several fields. From least-significant to most-significant, they are (1) two bits that are ignored, assuming a byte address and aligned word references; (2) b = log2(number of words per block frame) bits of the block (offset); (3) i = log2(number of block frames) bits of the index; and (4) t = 32 - i - b - 2 bits of the address tag.

Direct-mapped access logic, illustrated in Figure 2, has three components: data memory, tag memory, and match logic. Data memory holds all the cached data and instructions. Its size is, by definition, the cache size. Conceptually, it can be organized as if it were one word wide and accessed with an address formed by concatenating the index (i bits) and block (b

bits) fields of the address. If it is implemented as a wider memory, some or all of the bits in the block field will be used to select a word after the data memory access. A block-wide data memory is often preferred when unaligned memory references are permitted.

The second component, the tag memory, which holds the state bits (s bits) and address tag (t bits) associated with a cached block, has one entry per block frame and is addressed by the index field. The state bits for a block, usually one or two bits, indicate the block's status regarding memory update or in a cache coherency protocol. Cache hit logic is only concerned with whether a block is valid.

The last component, the match logic, produces a single bit indicating whether the referenced block is present. This bit is asserted only if the tag read from the tag memory is equal to the tag field of the address and the state read from the tag memory is valid.

A direct-mapped cache lookup requires two parallel actions. One action, called read-data, consists of accessing the data memory and passing the word read to DataOut. The second action, called match-found, requires two steps: first, accessing the tag memory to read the state and address tag for a block frame; second, asserting MatchOut if the state is valid and the tag matches the reference's tag. Thus, a direct-mapped cache lookup is simpler than a set-associative lookup (described below) because actions read-data and match-found can proceed independently. In set-associative caches, the results of match-found influence the data selected.

[Figure 3 diagram: the address (t + i + b bits) is sent to Bank[0] through Bank[n-1]; each bank produces Match[i] and Data[i], which are combined into MatchOut and DataOut.]

Figure 3. A set-associative cache. Cache access (hit) logic for an n-way set-associative cache consists of n banks and the logic to combine bank results. Each bank can be thought of as a direct-mapped cache with one n-th of the blocks and can be implemented using the logic in the dashed box of Figure 2.

Set-associative cache.
An n-way set-associative cache (n = 2, 4, 8, or 16) is a commonly used cache organization. An n-way set-associative cache allows any one of the n blocks in a reference's set to be replaced on a miss. While this flexibility usually yields lower miss ratios, it requires checking n blocks on each reference. To keep a set-associative cache hit time similar to that of a direct-mapped cache, each of the n tags in a set must be read and compared to the tag of the reference in parallel. This associative lookup and comparison adds significant cost, as measured in chip count and board area.

Figure 3 shows the basic structure of an n-way set-associative cache. Each bank has the same structure as an n-times-smaller direct-mapped cache (see Figure 2). Thus, the index field for each bank requires i = log2(number of block frames/n) bits, making the tag field, t, log2(n) bits larger than for a direct-mapped cache of the same size. In addition, some

logic, called the set-associativity logic, is needed to select the result from one of the n banks.

On a reference, the address is passed to all the direct-mapped banks. In parallel, each bank selects a block, sends 32 bits of data to Data[i], and computes Match[i], which is asserted on valid tag matches. The set of a reference consists of the n blocks selected by the n banks.

After the n direct-mapped banks compute Match[i]'s and Data[i]'s, the set-associativity logic, shown in the dashed box in Figure 2, produces a single MatchOut signal and DataOut word. MatchOut, asserted on a cache hit, is the logical OR of the n Match[i] signals. DataOut, the data to be returned, must be driven to the Data[i] for the bank that matched and can be any value if none matched.

One way to implement set-associativity logic is illustrated in Figure 3. Here, MatchOut is computed with a single n-input OR gate and DataOut with a 32-bit-wide n-to-1 multiplexer. The multiplexer Select input is driven with the number of the bank that matched and can be any value if none matched. Select can be computed with an n-bit encoder or with a single level of log2(n) n/2-input OR gates.

[Figure 4 diagram: the address (t + i + b bits) feeds n banks whose outputs are combined with wired-or logic and tri-state buffers into MatchOut and DataOut.]

Figure 4. An alternative set-associative cache. This figure shows cache hit logic for an n-way set-associative cache with a different set-associativity logic implementation from that of Figure 3. First, it uses wired-or logic instead of an OR gate to compute MatchOut. Second, the 32-bit-wide n-to-1 multiplexer and select logic have been replaced with n 32-bit-wide tri-state buffers.

Alternate ways of computing MatchOut and DataOut are illustrated in Figure 4. MatchOut is computed by wire-oring all Match[i]'s together, as is possible using open-collector (oc) gates in TTL or any ECL gates. This approach requires computing two copies of each Match[i] so that the wire-oring does not affect which data is selected. This duplication does not cause additional delay if the final AND-gate in the bank match logic (not shown) is duplicated.
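The bank-and-combine structure described above can be modeled in software. The following Python sketch is only a functional illustration under the article's assumptions (aligned four-byte words, bit selection); the class and function names (`Frame`, `Bank`, `set_associative_lookup`) are hypothetical, and real hardware evaluates the banks in parallel instead of in a loop.

```python
from dataclasses import dataclass, field

WORD_BYTES = 4  # aligned four-byte words, as the text assumes


@dataclass
class Frame:
    valid: bool = False
    tag: int = 0
    words: list = field(default_factory=list)


class Bank:
    """One direct-mapped bank: tag memory, data memory, and match logic."""

    def __init__(self, num_frames: int, words_per_block: int):
        self.num_frames = num_frames
        self.words_per_block = words_per_block
        self.frames = [Frame() for _ in range(num_frames)]

    def split(self, addr: int):
        word = addr // WORD_BYTES                 # drop the two ignored byte bits
        offset = word % self.words_per_block      # b bits: word within the block
        block = word // self.words_per_block
        index = block % self.num_frames           # i bits: bit selection
        tag = block // self.num_frames            # remaining t bits
        return tag, index, offset

    def fill(self, addr: int, words):
        tag, index, _ = self.split(addr)
        self.frames[index] = Frame(True, tag, list(words))

    def lookup(self, addr: int):
        tag, index, offset = self.split(addr)
        frame = self.frames[index]
        match = frame.valid and frame.tag == tag           # match-found
        data = frame.words[offset] if frame.words else 0   # read-data
        return match, data


def set_associative_lookup(banks, addr: int):
    """Combine bank results: MatchOut is the OR of Match[i]; DataOut is the
    Data[i] of the matching bank (None here stands for "any value")."""
    results = [bank.lookup(addr) for bank in banks]
    match_out = any(match for match, _ in results)
    data_out = next((data for match, data in results if match), None)
    return match_out, data_out
```

For example, with two banks of four frames and four-word blocks, filling bank 1 with the block at address 0x100 makes a lookup of 0x104 hit and return the block's second word.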
The alternative implementation for DataOut uses tri-state buffers. Here, each Data[i] is connected to the input of a tri-state buffer, whose enable is controlled by Match[i]. All n tri-state buffer outputs are connected together and to DataOut. At most, one tri-state buffer is enabled since, at most, one bank can match. If no banks match, DataOut is undefined.

The distinction between the logic within the n banks and the set-associativity logic is not as clear in many implementations as

it is in Figures 3 and 4. For example, the n comparators and the encoding logic can be combined into a single n-way comparator that directly controls the multiplexer. Nevertheless, a set-associative cache always requires more circuitry than a direct-mapped cache.

The delay through a set-associative cache is determined by one of three timing paths, illustrated in Figure 5: (1) match-found, which signals a cache hit or miss; (2) select-data, which selects the data word that corresponds to the tag that matched; and (3) read-data, which provides data on a cache hit. A direct-mapped cache has timing paths read-data and match-found, but it does not have path select-data since the location of cached data in a direct-mapped cache does not depend on which comparator matched.

Figure 5. Timing paths in a set-associative cache. The three timing paths in the cache hit logic for an n-way set-associative cache are (1) match-found, which signals a cache hit or miss (Address to Match[i] to MatchOut); (2) select-data, which selects the data word that corresponds to the tag that matched (Address to Match[i] to Select to DataOut); and (3) read-data, which provides the data on a cache hit (Address to Data[i] to DataOut). Path select-data is not needed in a direct-mapped cache.

Arguments against direct-mapped caches

The arguments against direct-mapped caches are that they (1) have worse miss ratios than set-associative caches of the same size, (2) have terrible worst-case behavior, and (3) preclude doing address translation in parallel with the first part of the cache lookup. In the following section, I show that as single-level caches in uniprocessors get larger, the effects of the first two arguments are diminished and the third argument becomes moot.

Larger miss ratios. It is well-known that direct-mapped caches have larger miss ratios than set-associative caches. Consider the likelihood of prematurely replacing an active block (one that is being referenced) when multiple active blocks map to the same set.
A direct-mapped cache allows only one of the multiple active blocks to reside in the cache at any time, while an n-way set-associative cache allows n blocks to be cached. Data from simulation and measurement show, however, that the size of the miss

ratio difference that results from changing associativity is less than one might expect (Figure 6). The intuition that associativity makes a tremendous difference is wrong, because it fails to consider that references are not made to random locations. Rather, references are usually made to locations in recently referenced blocks. The tendency to re-reference blocks makes the miss ratios of all caches much less than one, thereby diminishing all potential miss-ratio differences.

A trend in the data shown in Figure 6, not heretofore emphasized, is that the miss ratio differences diminish as the caches get larger. For 8-Kbyte unified (data and instructions cached together) caches with 32-byte blocks, for example, the data show that reducing associativity from two-way to direct-mapped causes an absolute miss ratio change of about 0.013, while at 32 Kbytes the change is [...]. Miss ratio differences for further associativity increases (from two-way to four-way, from four-way to eight-way), not shown, are much smaller and diminish further as the caches get larger.

[Figure 6 plots: miss ratio difference versus cache size (bytes); (a) two-way to direct-mapped, 16-byte blocks; (b) two-way to direct-mapped, 32-byte blocks.]

Figure 6. Miss ratio differences for unified caches. This figure shows the changes in miss ratio, Δm, that result when associativity is reduced from two-way to direct-mapped for unified (data and instructions cached together) caches with 16-byte (a) or 32-byte (b) blocks. The data show that miss ratio differences diminish as caches get larger. In comparing 16-byte and 32-byte miss ratios, ignore the dashed lines since this data comes from different traces. The set-associative caches use LRU replacement. (Sources: The miss ratio data in both figures (solid lines) is derived from Tables 2 and 3 in Alexander [6] and Table 3-4 in Hill [9]. Additional data (dashed lines) for 16-byte blocks (a) comes from Figures 5.10a and 5.10b in Agarwal. Additional data (dashed lines) for 32-byte blocks (b) comes from Figures in Smith.)
Miss ratio differences diminish as caches get larger for two reasons. First, the active blocks are less likely to map to the same set in larger caches, since larger caches have more sets. For fixed associativity and block size, the number of sets is proportional to cache size. Second, the miss ratios of all cache organizations get smaller with increasing cache size, diminishing potential miss-ratio differences.

The data from many sources conclusively show that the miss ratio difference between a direct-mapped cache and a set-associative cache of the same size diminishes as cache size increases. Consequently, the disadvantage to direct-mapped caches becomes less important for larger caches.

Terrible worst-case behavior. Another argument against direct-mapped caches is that their worst-case behavior, when multiple blocks collide in a set, is terrible. While this is true, one must ask whether an analysis of worst-case behavior should include how likely this behavior is. If not, then I submit that the worst-case behavior of direct-mapped caches is no worse than that of set-associative caches. If too many blocks map to a given set, both organizations will thrash. That fewer active blocks can cause direct-mapped caches to thrash does not change the severity of the worst-case behavior, only its likelihood, which we just chose to ignore. On the other hand, if one wishes to


include the probability that worst-case behavior occurs in one's analysis, then one must observe that (1) worst-case behavior does not occur very often, as is indicated by the small differences in average miss ratios, and (2) it occurs less often in larger caches, as is indicated by the diminishing average-miss-ratio differences. In summary, the worst-case behavior of all caches, including large caches, is bad, but while worst-case behavior is more likely in large direct-mapped caches than in large set-associative caches, it is still unlikely.

Parallel address translation difficult. Almost all high-end computers in the last two decades used paged virtual memory and organized their caches with physical addresses. In these systems, address translation (the translation of virtual addresses to physical addresses) occurs logically before the cache is accessed. For some of these cache configurations, however, it is possible to do the address translation in parallel with part of the cache access.

An important disadvantage of reasonably sized direct-mapped caches is that this technique, called parallel address translation, is impractical, since straightforward implementations require that a cache's size not exceed its associativity times the page size. The IBM 3033, for example, uses parallel address translation and has a 16-way set-associative, 64-Kbyte, physically-tagged cache and 4-Kbyte pages. A 4-Kbyte direct-mapped cache, on the other hand, would not be adequate.

As caches get larger, parallel address translation will become impractical in architectures with fixed page sizes. Eventually the increased hit time and implementation costs of wider associativity will overwhelm the benefits of parallel address translation. Designers will be forced to choose between doing address translation before or after the cache lookup. Address translation is done before the cache lookup on all DEC VAX-11 implementations, for example, since reasonable cache sizes are much larger than the VAX-11's 512-byte page size.
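The size constraint above (a cache's size must not exceed its associativity times the page size) reduces to simple arithmetic. The Python sketch below works through the examples in the text; the function names are illustrative, and the 64-Kbyte cache paired with 512-byte pages is a hypothetical combination chosen only to show how quickly the required associativity grows.

```python
def max_parallel_cache_bytes(associativity: int, page_bytes: int) -> int:
    """Largest cache that still permits straightforward parallel translation."""
    return associativity * page_bytes


def min_associativity(cache_bytes: int, page_bytes: int) -> int:
    """Smallest associativity that satisfies cache size <= associativity * page size."""
    return max(1, -(-cache_bytes // page_bytes))  # ceiling division


KBYTE = 1024

# IBM 3033 example from the text: 16-way, 4-Kbyte pages, 64-Kbyte cache.
print(max_parallel_cache_bytes(16, 4 * KBYTE) // KBYTE)   # Kbytes

# A direct-mapped cache (associativity 1) is limited to a single page.
print(max_parallel_cache_bytes(1, 4 * KBYTE) // KBYTE)    # Kbytes

# Hypothetical: a 64-Kbyte cache with 512-byte pages, as on the VAX-11.
print(min_associativity(64 * KBYTE, 512))
```

The last case shows why the constraint becomes impractical with small fixed page sizes: the associativity needed grows in direct proportion to the cache size.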
Doing address translation after the cache lookup implies that caches are organized with virtual addresses and address translation is necessary only on cache misses. Some researchers argue that the advantage of this approach, namely, a faster hit time, will justify the additional complexity required to implement a cache organized with virtual addresses. In either case, if address translation is not done in parallel with the cache lookup, it will no longer affect whether a cache should be direct-mapped or set-associative.

Arguments for direct-mapped caches

The arguments for direct-mapped caches are (1) they can be implemented at less cost than set-associative caches, (2) their cache hit (access) times are smaller than those of comparable set-associative caches, and (3) they have smaller effective (average) access times than set-associative caches for sufficiently large cache sizes. Below, I support the above arguments for single-level caches in uniprocessors and show why I expect the direct-mapped organization to become commonly used.

Lower cost. A direct-mapped cache never costs more than a set-associative cache, because there is a way to convert from a set-associative to a direct-mapped design at no cost. (The cost of a cache can be measured in many dimensions, such as number of chips, chip area, power consumption, dollars, and design time.) An n-way set-associative cache, like the one shown in Figure 3, can be converted to one that is direct-mapped simply by changing the replacement algorithm. On a cache miss, an n-way set-associative cache selects a victim, or block to be replaced, using some algorithm, perhaps LRU or random. A direct-mapped cache is created if the victim is selected with the lower log2(n) bits of the address tag of the new reference. Since this replacement algorithm requires less hardware than the original replacement algorithm, a direct-mapped cache will cost less than one that is set-associative.

In practice, direct-mapped caches cost significantly less, since less parallelism is required if parallel address translation is not done.
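The conversion described above can be checked with a small simulation. The Python sketch below is an illustration under simplifying assumptions (block-number traces, bit selection, power-of-two sizes); the function names are hypothetical. It runs an n-way set-associative cache whose victim is chosen by the low bits of the tag and compares it, reference by reference, with a direct-mapped cache holding the same number of block frames.

```python
import random


def simulate_set_associative(trace, num_sets, n, pick_victim):
    """Return a hit/miss list for an n-way cache with a pluggable replacement rule."""
    sets = [[None] * n for _ in range(num_sets)]  # each way holds a tag or None
    hits = []
    for block in trace:
        index, tag = block % num_sets, block // num_sets
        ways = sets[index]
        hit = tag in ways                  # associative search of all n ways
        if not hit:
            ways[pick_victim(tag, n)] = tag
        hits.append(hit)
    return hits


def victim_from_tag(tag, n):
    """Replacement rule from the text: the lower log2(n) bits of the tag."""
    return tag % n


def simulate_direct_mapped(trace, num_frames):
    """Return a hit/miss list for a direct-mapped cache using bit selection."""
    frames = [None] * num_frames
    hits = []
    for block in trace:
        index = block % num_frames
        hits.append(frames[index] == block)
        frames[index] = block
    return hits


rng = random.Random(0)
trace = [rng.randrange(64) for _ in range(2000)]
same = simulate_set_associative(trace, 4, 4, victim_from_tag) == simulate_direct_mapped(trace, 16)
print(same)
```

The two simulations agree on every reference because a block with a given tag can only ever be placed in one way of its set, so each (set, way) pair behaves like a single direct-mapped frame.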
An n-way set-associative cache must read n tags in parallel and compare each of them with the high-order bits of the reference's address. A direct-mapped cache need only read and compare one tag. Thus, direct-mapped caches need fewer comparators, require fewer connections, and can use fewer, larger (deeper) memory chips. Similarly, the data memory (and connections to it) in an n-way set-associative cache must be n times as wide as that for a direct-mapped cache, enabling the direct-mapped cache to use fewer, larger memory chips.

Faster hit time. The hit (access) time of a direct-mapped cache is less than or equal to that of a comparable set-associative cache. It is at most equal, because the transformation described above creates a direct-mapped cache with exactly the same hit time as a set-associative cache. In practice, the hit time of a direct-mapped cache is less than that of a comparable set-associative cache because the critical timing path can be made shorter (unless the set-associative cache was small enough to allow parallel address translation). The delay paths, displayed in Figure 5, are match-found, select-data, and read-data.

The hit time of a direct-mapped cache can be less than that of a set-associative cache, because the select-data path can be eliminated in a direct-mapped cache. Instead of letting the results of tag comparisons determine the data returned to the CPU, the data can be selected with several bits from a reference's address. These bits can directly control a multiplexer or be decoded to control tri-state buffers. In either case, this timing path is so much faster than the others that it is effectively eliminated. Figure 7 illustrates this improvement.

An important effect of eliminating the select-data timing path is that the match-found and read-data paths are now independent. This makes it possible for a direct-mapped cache to return the correct data and for the CPU to resume execution even before the system knows whether a hit will occur, so long as the CPU can back out of execution begun with incorrect data.
This optimistic use of cache data is being used in a research machine at DEC WRL, where it enables the cache hit time and the machine cycle time to be reduced by approximately one-third. Optimistic use of cache data is possible in a set-associative cache if one always returns the most-recently-used (MRU) block in the selected set. I found, however, that the performance of a simple direct-mapped cache is similar to that of a more complex MRU cache.

It is also possible to improve the read-data path, since it is no longer necessary to read from n data blocks in parallel. Instead only one block need be read. This flexibility allows designers to organize data memory chips differently and to use larger, deeper chips. It is possible, for example, to completely eliminate the multiplexer or tri-state buffers previously used to select data from different blocks.

Finally, improvement in the match-found path is also possible, since it is no longer necessary to read and compare n

tags in parallel and then OR the results for the cache hit/miss signal. Rather, one need only read and compare one tag. This flexibility allows the tag memory to be implemented with fewer, deeper chips and eliminates the final OR stage.

The exact magnitude of the improvement possible depends on many implementation factors. I examined caches implemented in three technologies: (1) TTL logic and MOS SRAM memory chips, (2) ECL logic and memory chips, and (3) custom CMOS. I found that moving from a direct-mapped to a two-way set-associative cache increases cache hit time in (1) from 100 to 109 ns (nine percent), in (2) from 30.0 to 33.5 ns (12 percent), and in (3) from 50.0 to 51.0 ns (two percent). The difference is about 10 percent for board-level TTL and ECL caches and much smaller for custom CMOS caches. I do not regard the difference between the TTL and ECL times as significant, since both numbers are sensitive to the propagation delays through a few parts. Since custom CMOS assumptions are radically different from those for MSI, comparing CMOS results with TTL or ECL results is subject to more error. However, one may expect the penalty for adding a multiplexer to be larger in MSI, where it adds logic delay and two chip crossings, than on a custom chip, where it adds just the logic delay.

Figure 7. Converting to a direct-mapped cache. An n-way set-associative cache can be converted to a direct-mapped cache by changing the replacement algorithm to replace the block in bank r, where r is the reference's tag modulo n. Since this function can be done with bit selection (at trivial cost) and off the critical path for a cache hit, the resulting direct-mapped cache has the same cost and hit time as the original set-associative cache. Thus, moving to a direct-mapped cache never increases and, as explained in the text, can decrease cost and hit time.
In summary, the hit time of a direct-mapped cache will be less than that of a comparable set-associative cache, since block selection can be done before the tag comparison completes, and the tag and data memories do not need to read information from n blocks in parallel.

Superior effective access times. A direct-mapped cache has a smaller effective (average) access time than that of a set-associative cache of the same size if (1) the direct-mapped cache has a smaller hit time and (2) both caches are sufficiently large that the miss ratio difference between them is small. Recall that effective access time, t_eff(C), is the average latency, as seen by the processor, required by the memory system to service a memory reference. I model it as

$$t_{\mathrm{eff}}(C) = t_{\mathrm{cache}}(C) + m(C)\, t_{\mathrm{memory}}(C)$$

where m(C), t_cache(C), and t_memory(C) are the miss ratio, hit time (cache access time), and average miss penalty (delay beyond a cache access to access memory) for cache C. If two caches have the same miss penalty, the change in effective access time moving from a cache C1 to a cache C2 is

$$\Delta t_{\mathrm{eff}} = t_{\mathrm{eff}}(C_2) - t_{\mathrm{eff}}(C_1) = \Delta t_{\mathrm{cache}} + \Delta m \, t_{\mathrm{memory}}$$

If cache C1 is direct-mapped and cache C2 set-associative, then Δt_cache ≥ 0 and Δm·t_memory ≤ 0, since set-associative caches typically have a slower hit time and smaller miss ratio than direct-mapped caches of the same size. Figure 8 illustrates Δt_eff = Δt_cache + Δm·t_memory for hypothetical direct-mapped and set-associative caches. It shows that Δt_eff can be either positive or negative.

Figure 8. Change in effective access time. This figure shows the change in effective access time (Δt_eff = Δt_cache + Δm·t_memory) that results when moving from a cache with a relatively fast hit time and a relatively large miss ratio (e.g., a direct-mapped cache) to another cache with a slower hit time but smaller miss ratio (a set-associative cache). The graphs assume 10-cycle (a) and 20-cycle (b) miss penalties, where a cycle is defined to be equal to the hit time of the faster cache. The x-axis displays values of Δt_cache, the hit time difference. An x value of 20 percent implies that the slower cache's hit time is 1.2 cycles, 1.2 times the hit time of the faster cache. The y-axis gives values of Δt_eff, the change in effective access time. A y value of -0.10 implies that the effective access time improves by 0.10 cycles. Since most effective access times are slightly larger than 1 cycle, an absolute improvement of 0.10 cycles translates into slightly less than a 10 percent relative improvement. The various lines show miss ratio changes, Δm, ranging up to 0.0. All Δm's are nonpositive, since we assume the second cache has a smaller miss ratio. Points on the y-axis represent the effective access time change that results when Δt_cache is zero or ignored. Here, all points are below the x-axis, since the latter cache, with the smaller miss ratio, always has a better effective access time (Δt_eff < 0). If Δt_cache > 0, the benefit of the lower miss ratio is diminished. For all points above the x-axis, the drawback of the slower hit time exceeds the benefit of the lower miss ratio, making the former cache preferred (Δt_eff > 0).

If, on the other hand, implementation considerations are ignored, then Δt_cache = 0 and

$$\Delta t_{\mathrm{eff}} = \Delta m \, t_{\mathrm{memory}} \le 0$$

which implies increasing associativity always improves effective access time (Δt_eff is negative). Thus, the effect of including implementation considerations is to diminish or reverse the miss ratio benefit of increasing associativity. To see whether implementation considerations matter in practice, typical values must be determined for t_memory,

Δm, and Δt_cache. Reasonable values for t_memory are 10 or 20 cycles, where a cycle is equal to the hit time of the faster cache. Smaller values are possible, especially in systems where cache misses are serviced by larger level-two caches instead of main memory. Larger values are possible in a system where the mismatch between the technologies used to implement the cache and memory is larger than normal.

Typical values for Δm, the absolute difference in miss ratio, can be derived from trace-driven simulation. Figure 9 shows miss ratio differences between some direct-mapped and two-way set-associative caches with 32-byte blocks. The data show that Δm's generally get smaller as cache size is increased, and that the absolute values of the Δm's are small for larger caches, such as those larger than 16 Kbytes.

Figure 9. Miss ratio differences. This figure displays the miss ratios from direct-mapped caches less the miss ratio of two-way set-associative caches of the same size for unified, instruction, and data caches with 32-byte blocks using operating system and multiprogramming traces from IBM/370 and VAX 11 architectures. Results show miss ratio differences (Δm) generally diminish with increasing cache size, and are smaller for instruction caches than for unified or data caches.

Figure 10 shows effective access time changes with actual miss-ratio differences for unified caches from Figure 9. Lines are labeled with cache sizes and positioned according to the miss ratio difference for that cache size. Figures 11 and 12 show similar results for instruction and data caches. These figures illustrate three points: (1) Moving from a direct-mapped to a two-way set-associative cache has little potential for improving effective access time as caches get larger. At 64 Kbytes (see lines labeled 64K) and with 10-cycle misses, the maximum improvement possible is 5.2, 3.6, and 4.5 percent for unified, instruction, and data caches.
With 20-cycle misses, the maximum possible improvement is twice as large. (2) Moving from a direct-mapped to a two-way set-associative cache can cause a worse effective access time if cache hit time increases by even a small amount. The improvement is offset if the cache hit time increase is equal to the maximum improvement possible from the smaller miss ratio (for example, 5.2, 3.6, and 4.5 percent for unified, instruction, and data caches of 64 Kbytes, having 10-cycle miss penalties). (3) Moving from a direct-mapped to a two-way set-associative cache offers less to instruction caches than it does to unified or data caches. The potential benefit from increasing associativity in instruction caches with a 10-cycle miss time is less than 6.4 percent for sizes as small as 2 Kbytes. The actual benefit will be less if the miss penalty is less than 10 cycles or increasing associativity impacts cache hit time.

Furthermore, increasing block size or increasing associativity beyond two-way does not hurt the case for large direct-mapped caches. Increasing block size in large caches to 64 bytes improves the performance of direct-mapped caches relative to set-associative ones by decreasing all miss ratios and miss ratio differences. Further increases will exhibit similar behavior until the number of blocks in the cache becomes limited. Miss ratio improvements resulting from increasing associativity beyond two-way are much smaller than the improvements between direct-mapped and two-way set-associativity, implying that further increases in associativity will not improve effective access time unless they have a negligible impact on cache hit time.

The final parameter value that must be determined to know whether direct-mapped or set-associative caches are faster is Δt_cache. This parameter is difficult to determine, because it is implementation dependent and very sensitive to the delay through a few parts. As discussed previously, I examined board-level caches (TTL and ECL) where Δt_cache was around 10 percent.
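The trade-off between a slower hit time and a smaller miss ratio can be worked through numerically. The sketch below is illustrative only: the function names are hypothetical, times are in cycles of the faster cache, and the miss ratio change of -0.01 is an assumption inferred from the 16-Kbyte unified case in the Figure 10 caption (a 0.10-cycle improvement with a 10-cycle miss penalty), not a value quoted from the simulations.

```python
def effective_access_time(t_cache: float, miss_ratio: float, t_memory: float) -> float:
    """Effective access time: hit time plus miss ratio times miss penalty."""
    return t_cache + miss_ratio * t_memory


def delta_t_eff(delta_t_cache: float, delta_m: float, t_memory: float) -> float:
    """Change in effective access time when both caches share one miss penalty."""
    return delta_t_cache + delta_m * t_memory


def break_even_delta_t_cache(delta_m: float, t_memory: float) -> float:
    """Hit-time increase at which the miss ratio benefit is exactly offset."""
    return -delta_m * t_memory


if __name__ == "__main__":
    t_memory = 10.0   # miss penalty in cycles (one of the article's two cases)
    delta_m = -0.01   # assumed miss ratio change, see the note above
    for slowdown in (0.0, 0.10, 0.20):
        change = delta_t_eff(slowdown, delta_m, t_memory)
        print(f"hit time +{slowdown:.2f} cycles -> effective access time {change:+.2f} cycles")
    print("break-even hit-time increase:", break_even_delta_t_cache(delta_m, t_memory))
```

Under these assumed values the set-associative cache wins only while its hit time penalty stays below about 0.10 cycles, that is, below roughly 10 percent of the faster cache's hit time.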
The effect of a 10 percent slowdown can be studied in Figures 10 through 12 by only considering design points on a vertical line at Δt_cache = 10 percent. For the 10-cycle miss penalty, Δt_cache = 10 percent implies that direct-mapped caches have better effective access times than two-way set-associative caches for caches equal to and larger than 16, 8, and 16 Kbytes for unified, instruction, and data caches. For the 20-cycle miss penalty, the corresponding sizes are 64, 16, and 64 Kbytes.

The exact cache size at which the effective access time of a direct-mapped cache becomes better than that of a two-way set-associative cache is sensitive to many assumptions. Nevertheless, that it does cross over is inevitable, given that miss ratio differences diminish as caches get larger and that set-associative caches have

slower hit times. At cache sizes less than the cross-over size, a direct-mapped cache may still be preferred to one that is set-associative, since a direct-mapped cache may cost less and its effective access time may not be much worse. Even for 20-cycle miss penalties, as Figures 10 through 12 show, the effective access time of a two-way set-associative cache is never more than five percent better than that of the corresponding direct-mapped cache at cache sizes of 32 Kbytes and larger.

Figure 10. Effective access time changes in unified caches. This figure shows the change in effective access time (Δt_eff) that results from moving from a direct-mapped cache to a two-way set-associative cache when both caches are unified, have 32-byte blocks, and have 10-cycle (a) or 20-cycle (b) miss penalties (t_memory). This figure is constructed by substituting miss ratio differences (Δm's) for unified caches from Figure 9 into Figure 8. The lines are labeled with cache sizes in bytes and positioned by the miss ratio difference at that cache size. The data for 16-Kbyte caches with 10-cycle miss penalties, for example, can be interpreted as follows: increasing associativity from direct-mapped to two-way improves effective access time by 0.10 cycles if there is no speed cost to adding associativity (Δt_cache = 0); increasing associativity has no effect on effective access time if the set-associative cache's hit time is 10 percent longer; and increasing associativity causes a worse effective access time, despite lowering the miss ratio, if the set-associative cache is more than 10 percent slower.

Other trends

Up to this point, I have concentrated on single-level caches in uniprocessors. Here I discuss future trends toward caches in hierarchies and multiprocessors.
I examine why these trends may occur and discuss how and whether my arguments for single-level caches in uniprocessors apply to these new situations.

Toward caches in hierarchies. In two-level cache hierarchies, a level-one cache services processor references, but it obtains data for misses from a level-two cache instead of memory. A level-two cache services only level-one cache misses and obtains data for its misses from memory. Two-level cache hierarchies, heretofore rarely used, may become more common in future systems for three reasons. First, implementation considerations can force a partition. Some recently introduced microprocessors, for example, devote some of their limited on-chip area to caches, but they require larger caches to avoid frequent accesses to relatively slow main memory. Since the on-chip caches cannot be made larger, a second on-board cache is required. Second, a detailed computation of effective access time shows that two-level cache hierarchies can offer superior performance to a single-level cache as processors speed up relative to main memories. Third, there may be functional and performance benefits to specializing caches at different levels in a multiprocessor. In a multiprocessor, a level-one cache can be optimized to minimize effective access time, while the level-two cache is designed to reduce cost or

interconnection traffic. Similar reasons are expressed by Short and Levy [13].

The utility of direct-mapped caches in two-level cache hierarchies is, as yet, undetermined. Level-one caches will be direct mapped if technological constraints permit large enough cache sizes that the hit time advantage of direct-mapped caches (due in part to allowing data to be returned before the tag comparison is complete) is more important than the miss ratio disadvantage. Direct-mapped caches can be preferred for cache sizes as small as 16 Kbytes if misses are serviced by a level-two cache in 10 cycles or less. Level-two caches, on the other hand, are more likely to be set-associative, since level-two cache hit times are less critical and a lower miss ratio can improve multiprocessor performance. The only argument for direct-mapped level-two caches is that straightforward implementations of large set-associative caches will be expensive, requiring multiple-word-wide banks of memory chips.

Figure 11. Effective access-time differences in instruction caches. This figure shows the effective access-time change (Δt_eff) of moving from a direct-mapped instruction cache to a two-way set-associative instruction cache with miss penalties of either 10 cycles (a) or 20 cycles (b). Other assumptions match those of Figure 10. Because miss-ratio differences (Δm's) are smaller, the benefit of associativity is smaller for instruction caches than it is for unified or data caches.

Toward caches in multiprocessors. To provide a rate of growth of computing power that exceeds the rate of technological improvement, many manufacturers, particularly of high-end computers, are turning toward multiprocessors. To facilitate ease of programming, some multiprocessors provide shared memory and use caches.
Caches in multiprocessors may be designed differently than those in uniprocessors, since multiprocessor caches may be more concerned with minimizing memory and interconnect contention than with minimizing effective access time [4]. Here, the relative miss ratio difference (Δm/m) is more important than the absolute miss ratio difference (Δm). I found that relative miss ratio differences are constant across wide changes in miss ratio and cache size. For example, decreasing associativity from two-way to direct-mapped in unified caches causes a relative miss ratio increase of about 30 percent even for large caches. Relative miss ratio differences are most important in single-bus shared-memory cache-coherent multiprocessors, where bus bandwidth can easily limit system throughput. In multiprocessors based on long-latency high-bandwidth interconnection networks, however, cache design should proceed as in a uniprocessor with slow main memory. Nevertheless, both cases make set-associative caches more attractive, but not necessarily better.

If multiprocessors use two-level cache hierarchies, the above arguments apply to level-two caches. I would expect most level-one caches, on the other hand, to be designed like level-one caches in a uniprocessor, making direct-mapped caches likely. This expectation may be incorrect if the misses for many level-one caches are serviced by a single level-two cache and contention between level-one caches is significant.

Direct-mapped caches will be common in uniprocessors as single-level or level-one caches and in

multiprocessors as level-one caches. Direct-mapped caches are preferred when they are sufficiently large that hit time benefits are more significant than miss ratio drawbacks. This can occur in single-level caches of 64 Kbytes and larger (16 Kbytes for instruction caches) and can occur at 16 Kbytes and larger for level-one caches whose misses are serviced more rapidly by level-two caches.

The arguments against direct-mapped caches, with respect to set-associative caches, are that they (1) have worse miss ratios, (2) have more common worst-case behavior, and (3) preclude parallel address translation. I have shown that the significance of the first two points becomes questionable for large caches where absolute miss ratio differences are small, and that the third is not a disadvantage for large direct-mapped caches, since large set-associative caches also preclude parallel address translation.

The arguments for direct-mapped caches are that they (1) cost less, (2) have faster hit (access) times, and (3) can have superior effective (average) access times. I have shown that the strength of these arguments is not diminished by increasing cache size, and the third point is more likely to be true for large cache sizes.

Figure 12. Effective access-time differences in data caches. This figure displays effective access-time changes that result from moving from a direct-mapped data cache to a two-way set-associative data cache with miss penalties of either 10 cycles (a) or 20 cycles (b). Other assumptions match those of Figure 10. The benefit of associativity in data caches is similar to that for unified caches.
An alternate way of stating this result is: set-associative caches reduce the time spent on cache misses; direct-mapped caches reduce the time spent on cache hits, especially if a CPU can use data before a hit or miss is determined; set-associative caches are preferred in small caches where misses are common; direct-mapped caches are preferred in large caches where misses are rare; and many future caches will be sufficiently large and therefore direct-mapped. These arguments may not apply to single-level or level-two caches in multiprocessors, where minimizing contention or very long miss penalties may favor set-associative caches over direct-mapped caches.

Acknowledgments

I would like to thank my thesis advisors, Alan Smith and David Patterson, for their many suggestions that improved the quality of my research. Thanks also to those who read and improved drafts of this article: Sue Dentinger, James Goodman, David Patterson, Gurindar Sohi, and the anonymous referees. The material presented here is based on research supported in part by the Defense Advanced Research Projects Agency monitored by Naval Electronics Systems Command under Contract No. N C-0269, the National Science Foundation under grants CCR and MIP, the State of California under the MICRO program, the graduate school at the University of Wisconsin-Madison, and by IBM, Digital Equipment Corporation, Hewlett-Packard, and Signetics.

References

1. A.J. Smith, "Cache Memories," Computing Surveys, Vol. 14, No. 3, Sept. 1982, pp. 473-530.
2. A.J. Smith, "Bibliography and Readings on CPU Cache Memories and Related Topics," Computer Architecture News, Jan. 1986.
3. A.J. Smith, "Line (Block) Size Choice for CPU Caches," IEEE Trans. Computers, Vol. C-36, No. 9, Sept. 1987.
4. J. Bell, D. Casasent, and C.G. Bell, "An Investigation of Alternative Cache Organizations," IEEE Trans. Computers, Vol. C-23, No. 4, Apr. 1974.
5. A.J. Smith, "A Comparative Study of Set Associative Memory Mapping Algorithms and Their Use for Cache and Main Memory," IEEE Trans. Software Engineering, Vol. SE-4, No. 2, Mar. 1978.
6. C. Alexander et al., "Cache Memory Performance in a Unix Environment," Computer Architecture News, Vol. 14, No. 3, June 1986.
7. A. Agarwal, "Analysis of Cache Performance for Operating Systems and Multiprogramming," PhD dissertation, Tech. Report CSL-TR, Stanford University, Stanford, Calif., May.
8. S. Przybylski, M. Horowitz, and J. Hennessy, "Performance Tradeoffs in Cache Design," Proc. 15th Ann. Int'l Symp. Computer Architecture, No. 861, Computer Society Press, Los Alamitos, Calif., 1988.
9. M.D. Hill, "Aspects of Cache Memory and Instruction Buffer Performance," PhD dissertation, Tech. Report 87/381, Computer Science Dept., Univ. of California, Berkeley, Calif., Nov.
10. J.R. Goodman, "Coherency for Multiprocessor Virtual Address Caches," Proc. Symp. Architectural Support for Programming Languages and Operating Systems, No. M805 (microfiche), Computer Society Press, Los Alamitos, Calif., 1987.
11. D.A. Wood et al., "An In-Cache Address Translation Mechanism," Proc. 13th Ann. Int'l Symp. Computer Architecture, No. 719, Computer Society Press, Los Alamitos, Calif., 1986.
12. J.H. Chang, H. Chao, and K. So, "Cache Design of a Sub-Micron CMOS System/370," Proc. 14th Ann. Int'l Symp. Computer Architecture, No. 716, Computer Society Press, Los Alamitos, Calif., 1987.
13. R.T. Short and H.M. Levy, "A Simulation Study of Two-Level Caches," Proc. 15th Ann. Int'l Symp. Computer Architecture, No. 861, Computer Society Press, Los Alamitos, Calif., 1988.
14. J.R. Goodman, "Using Cache Memory to Reduce Processor-Memory Traffic," Proc. 10th Ann. Int'l Symp. Computer Architecture, No. M473 (microfiche), Computer Society Press, Los Alamitos, Calif., 1983.

Mark D. Hill is an assistant professor in the Computer Sciences Department at the University of Wisconsin at Madison. His research interests center on performance and implementation factors in memory systems. He was a principal contributor to the SPUR project to build a shared-bus multiprocessor at the University of California at Berkeley. He is currently working on Multicube, a project designing a multiprocessor using a grid of buses. Hill earned a BS in computer engineering from the University of Michigan in 1981, and an MS and PhD in computer science from the University of California at Berkeley in 1983 and 1987, respectively. He is a member of IEEE, the IEEE Computer Society, and ACM.
Hill may be contacted at the University of Wisconsin-Madison, Computer Sciences Department, 1210 W. Dayton St., Madison, WI.
