Bus Encoding Techniques for System-Level Power Optimization


The switching activity on system-level buses is often responsible for a substantial fraction of the total power consumption of large VLSI systems. Thus, minimizing the transitions at the I/O interfaces can provide significant savings on the overall power budget. This chapter introduces innovative encoding techniques suitable for minimizing the switching activity of system-level address buses in microprocessor-based systems. In particular, the proposed schemes target the reduction of the average number of bus line transitions per clock cycle by exploiting the spatial locality principle. Experimental results, obtained on address streams generated by a real microprocessor executing benchmark programs, demonstrate the effectiveness of the proposed methods with respect to previous encoding approaches. A trade-off analysis has been carried out to verify that the power saved by bus activity reduction is not offset by the encoding/decoding logic.

5.1 Introduction

In microprocessor-based systems, significant power savings can be achieved by reducing the transition activity of the on- and off-chip buses. This is because the total capacitance switched when a voltage change occurs on a bus line is usually considerably larger than the capacitive load that must be charged or discharged when internal nodes toggle. Heavy loads are usually connected to off-chip buses due to I/O pins, long external board wires, and board-level connected devices. In order to drive these large board-level capacitances, the devices in the I/O pads need to be much larger than the average on-chip features, which also increases the pads' intrinsic parasitic capacitance.
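The magnitude of this effect can be illustrated with the familiar dynamic-power relation P = α·C·V²·f, where α is the per-cycle switching activity. The sketch below uses capacitance and operating figures we have assumed purely for illustration (a 50 pF off-chip bus line versus a 0.05 pF internal node), not measurements from this chapter:

```python
# Rough dynamic-power estimate for a single switching node:
#   P = alpha * C * Vdd^2 * f
# (alpha = average transitions per cycle; some texts fold a 1/2 factor
# into alpha). All numeric values below are illustrative assumptions.

def dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """Average dynamic power of one switching node, in watts."""
    return alpha * cap_farads * vdd ** 2 * freq_hz

# Assumed figures: a 50 pF off-chip bus line vs. a 0.05 pF internal node,
# both with activity 0.5, at 100 MHz and a 3.3 V supply.
p_bus = dynamic_power(0.5, 50e-12, 3.3, 100e6)     # off-chip bus line
p_node = dynamic_power(0.5, 0.05e-12, 3.3, 100e6)  # internal node

print(p_bus, p_node, p_bus / p_node)
```

With these assumed loads the off-chip line dissipates three orders of magnitude more power than the internal node, which is why removing transitions from the bus pays off so directly.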

In this chapter, we deal with the communication occurring on both data and address buses in general-purpose microprocessor-based systems. The increasing performance requirements and the existing gap between the speed of current microprocessors and the speed of the system interfaces have pushed system designers to increase the bandwidth of data transfers. For example, the PowerPC 620 processor can be configured with either a 64- or a 128-bit data bus to support the transfer of the 64-byte cache block within four clock cycles. Moreover, modern software applications span a very large address space. The latest versions of existing processors, such as the DEC Alpha AXP and the PowerPC 620, support a 64-bit address space. Hence, both data and address buses have become very wide in modern microprocessor-based systems. Due to the intrinsic capacitances of the bus lines, a considerable amount of power is required at the I/O pins of the microprocessor when data have to be transmitted over the bus. More specifically, it has been estimated that the capacitance driven by the I/O nodes is usually much larger (up to three orders of magnitude [146]) than that seen by the internal nodes of the microprocessor. As a consequence, dramatic optimizations of the average power consumption can be achieved by minimizing the number of transitions (i.e., the switching activity) on system-level buses. In this chapter, we focus on the problem of reducing the switching activity on system-level buses through the application of dedicated encoding schemes. The aim is to propose innovative encoding techniques targeting the reduction of the average number of transitions on the system-level address buses of microprocessor-based systems. In these systems, the data and address buses are at the core of the interface between the processor and the memory and I/O sub-systems. 
More specifically, they are used, on-processor, to access the first- and second-level caches, as well as, off-processor, to access the external lower-level caches, the main memory (usually through a memory controller), and the I/O sub-systems (to support direct memory accesses from the I/O controllers). Our goal is to avoid any modification to the standard memory components; hence the encoding circuitry is added inside the processor, and the decoding logic inside the memory and I/O controllers. First, we analyzed the execution traces of benchmark programs on the MIPS RISC-based architecture [92], and we observed that the address sequentiality of instruction address streams is equal to 63.04% on average, and can reach up to 76.56%. As a matter of fact, instruction

address streams are typically composed of bursts of in-sequence addresses, intermingled with out-of-sequence ones corresponding to taken branches and jumps. On the other hand, the percentage of in-sequence data addresses that we derived for the MIPS processor is equal to 11.39% on average and 52.47% at most. In fact, data streams accessing data arrays generate in-sequence addresses, while other accesses to non-structured data generate addresses randomly distributed in time, with very limited temporal correlation among them. Then, we defined the asymptotic zero-transition encoding scheme, which is suitable for reducing the switching activity on the address bus lines. The technique relies on the observation that, in most cases, patterns travelling on address buses are consecutive, as the spatial locality principle states and as confirmed by the MIPS tracing. Under this condition it is therefore possible, for the devices located at the receiving end of the bus, to automatically compute the address to be received at the next clock cycle; consequently, the transmission of the new pattern can be avoided, resulting in an overall decrease in switching activity. The main idea exploited by our encoding scheme, called in the sequel the T0 code, consists of avoiding the transfer of consecutive addresses on the bus by using a redundant line, INC, to communicate to the receiving sub-system the information on the sequentiality of the addresses. When two addresses in the stream to be transmitted are consecutive, the INC line is set to 1 and the address bus lines are frozen, to avoid unnecessary switching, while the new address is computed directly by the receiver. On the other hand, when two addresses are not consecutive, the INC line is driven to 0 and the bus lines operate normally. 
With the hypothesis of infinite streams of consecutive addresses, the T0 code enjoys the property of zero transitions occurring on the bus. Therefore, it outperforms the Gray code since, under the same assumption, Gray addressing requires one line transition per pair of patterns. The increments between consecutive patterns can be parametric, reflecting the addressability scheme adopted in the given architecture. In this respect, our code has the same capabilities as the Gray scheme [127]. The proposed encoding has been applied to several address streams, broken down into instruction-only, data-only, and multiplexed instruction-data traces. The goal is to evaluate the effects of encoding on separated and shared address buses for both instruction and data streams. The chapter also introduces mixed encoding schemes, aiming at combining the best

properties of existing approaches, with the ultimate objective of exploiting the advantages offered by each technique and thus identifying the best method for the specific processor (the MIPS RISC we have used for the experiments). The combined techniques are suitable for address buses carrying both instruction and data streams. In fact, the separated accesses to instruction and data arrays occur mainly in sequence, but unfortunately they are often interleaved on the same address bus, and the sequentiality is thus destroyed. The mixed encoding schemes aim at restoring this sequentiality. Furthermore, we present analytical and experimental analyses showing the improved performance of our encoding schemes when compared to existing bus encoding techniques. Since our global target is the reduction of the power consumed by the overall system, we need to guarantee that the power savings achieved by our encoding schemes are not offset by the power overhead dissipated by the encoder/decoder. Moreover, bus latency is as critical a design constraint as power. Performance of the encoding-decoding scheme is an essential requirement, since in modern microprocessor-based systems bus width and clock rate are both constantly increasing. Therefore, the simultaneous optimization of power and timing has been targeted during the synthesis of the encoding/decoding logic. Although the T0 encoder/decoder demand more area than the Gray ones, the T0 bus interface turns out to be faster. In fact, the critical delay of a Gray decoder grows linearly with the number of bus lines to be decoded, while in the T0 decoder we propose the critical delay grows logarithmically. In spite of the fact that the encoding and decoding logic implementing the T0 and related codes is more expensive, in terms of area, than the Gray one, the power savings achieved by bus encoding are not offset by the energy consumed by the additional circuitry implementing the proposed codes. 
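The asymmetry between Gray encoding and decoding delay can be seen directly in the standard conversion formulas: encoding XORs each bit with its higher-order neighbor and is fully parallel, while decoding is a prefix XOR in which every output bit depends on all higher-order input bits, so a naive implementation is a linear XOR chain (a parallel-prefix tree brings the depth down to logarithmic). A Python sketch of the two conversions, given here only to illustrate the data dependency:

```python
def binary_to_gray(b: int) -> int:
    """Gray encoding: each output bit is b[i] XOR b[i+1] -- a single
    XOR level, computed for all bits in parallel."""
    return b ^ (b >> 1)

def gray_to_binary(g: int, width: int = 32) -> int:
    """Gray decoding: output bit i is the XOR of input bits width-1..i.
    The running accumulator mirrors the serial XOR chain of a naive
    hardware decoder (depth linear in the bus width)."""
    acc = 0
    result = 0
    for i in reversed(range(width)):   # MSB first
        acc ^= (g >> i) & 1            # depends on ALL higher-order bits
        result |= acc << i
    return result

# Consecutive binary values always map to Gray codes at Hamming distance 1.
for x in range(15):
    diff = binary_to_gray(x) ^ binary_to_gray(x + 1)
    assert bin(diff).count("1") == 1
assert gray_to_binary(binary_to_gray(0xCAFE), 32) == 0xCAFE
```

The encoder's single XOR level against the decoder's bit-to-bit dependency is precisely the delay asymmetry discussed above.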
To support this claim, we present detailed implementations of such additional logic, and we report power dissipation results obtained through simulation of a large set of properly selected input patterns for typical on- and off-chip capacitive loads. The rest of this chapter is organized as follows. Previous work on power-oriented bus encoding techniques is outlined in Section 5.2, while Section 5.3 discusses the main characteristics of the proposed T0 code with respect to previous encoding schemes. Mixed bus encoding schemes are proposed in Section 5.4, and their performance, in terms of average bus transitions, is also reported for real address streams. Section 5.5

discusses the encoder/decoder architectures and their power consumption for typical on-chip and off-chip system-level bus loads. Finally, in Section 5.6 we draw some conclusions from our analysis.

5.2 Previous Work on Bus Encoding Techniques

Encoding paradigms for reducing the switching activity on the bus lines of the processor-to-memory interface have recently been investigated. They all rely on encoding the binary patterns before transmitting them, depending on the distinctive spectral characteristics of the pattern streams to be exchanged. In the well-known one-hot encoding technique [35], the bus is composed of m = 2^n wires. An n-bit value is coded by a logical 1 on the i-th wire, where 0 <= i <= 2^n - 1 is the binary value corresponding to the bit pattern, while a logical 0 is put on the remaining m - 1 wires. The one-hot encoding has the drawback of requiring an exponential number of wires, but it guarantees exactly one 0-to-1 and one 1-to-0 transition when a new value is sent over the bus. In [167], Stan and Burleson have proposed the use of a redundant encoding scheme, the bus-invert code, to limit the average power. The bus-invert method performs well when the patterns to be transmitted are randomly distributed in time and no information about pattern correlation is available. Therefore, the method seems to be appropriate for encoding the information travelling on data buses. A redundant bus line, INV, is needed to signal to the receiving end of the bus which polarity is used for the transmission of the incoming pattern. The encoding depends on the Hamming distance (i.e., the number of bit differences) between the value of the encoded bus lines at time t-1 (also counting the redundant line at time t-1) and the value of the address bus lines at time t. The Hamming distance is compared to N/2, where N is the bus width (assuming N even without loss of generality). 
If the Hamming distance between two successive patterns is larger than N/2, the current address is transmitted with inverted polarity and the redundant line is asserted; otherwise, the current address is transmitted as is, and the INV line is de-asserted. The bus-invert method can be expressed by the following equations:

(B(t), INV(t)) = (b(t), 0)     if H(t) <= N/2
                 (~b(t), 1)    if t > 0 and H(t) > N/2        (1)

where B(t) is the value of the encoded bus lines at time t, INV(t) is the additional bus line, b(t) is the address value at time t, N is the bus width of b(t), and H(t) is the Hamming distance between the previously transmitted value (B(t-1), INV(t-1)) and the candidate value (b(t), 0). The corresponding decoding scheme is simply defined as:

b(t) = B(t)     if INV = 0
       ~B(t)    if INV = 1        (2)

where ~ denotes bitwise complementation. If the words transmitted on the bus are independent and uniformly distributed, the average number of transitions per clock cycle is lowered by less than 25% of the original value, due to the binomial distribution of the distance between consecutive patterns. The major drawbacks of this approach are the required redundant bus line and the overhead of the logic implementing the voter that decides whether the Hamming distance exceeds N/2. The encoding methods discussed so far perform well when no information about possible data correlation is available. In particular, they work well when the patterns to be transmitted are randomly distributed in time, as for the data exchange between a microprocessor and the data cache. This is because, except for some specific applications such as arithmetic and DSP circuits, patterns transmitted over these buses usually have very limited correlation. When the objective shifts to address buses, a radically different behavior can be observed. Other techniques have been explored, all relying on the well-known fact that the addresses generated by a microprocessor are often consecutive, due to the spatial locality principle [78], which states that memory elements whose addresses are close to each other tend to be referenced during successive clock cycles. The addresses generated by a running microprocessor are often consecutive, since instructions are stored in adjacent sections of the memory space, and structured data are stored in consecutive memory locations for better locality. 
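The bus-invert rules in equations (1) and (2) are straightforward to prototype in software. The sketch below is our own illustrative model, not an implementation from [167]; it drives uniform random 8-bit words through the encoder and checks that the measured saving stays below the 25% bound implied by the binomial distance distribution:

```python
import random

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def bus_invert_encode(prev_bus: int, prev_inv: int, word: int, n: int):
    """Equation (1): invert the word when the Hamming distance from the
    previous (bus, INV) state, with INV tentatively 0, exceeds N/2."""
    mask = (1 << n) - 1
    dist = hamming(prev_bus, word) + prev_inv   # data lines + INV line
    if dist > n // 2:
        return (~word) & mask, 1
    return word, 0

def bus_invert_decode(bus: int, inv: int, n: int) -> int:
    """Equation (2): re-invert when INV is asserted."""
    mask = (1 << n) - 1
    return (~bus) & mask if inv else bus

# Simulation with uniform random words (N = 8): the transition count
# should drop, but by less than 25%.
random.seed(0)
N = 8
words = [random.getrandbits(N) for _ in range(20000)]

plain = sum(hamming(a, b) for a, b in zip(words, words[1:]))

coded = 0
bus, inv = 0, 0
for w in words:
    nb, ni = bus_invert_encode(bus, inv, w, N)
    coded += hamming(bus, nb) + (inv ^ ni)   # count data lines + INV line
    assert bus_invert_decode(nb, ni, N) == w
    bus, inv = nb, ni

saving = 1 - coded / plain
print(f"saving: {saving:.1%}")
```

For uniform random words the measured saving is positive but, as predicted above, remains below the 25% bound.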
Clearly, there are exceptions to this behavior such as control-flow instructions, which cause interruptions in the sequence of consecutive addresses on the instruction flow, and data not stored in arrays are often addressed without any regular pattern. In any case, sequential addressing usually dominates and so the temporal correlation between successive addresses is usually very high. To exploit the peculiar characteristics of the address buses, Su et al. [169] have proposed to encode the patterns with the Gray code, since it guarantees a single bit transition when consecutive addresses are accessed. The results reported in [169] show that the number of bit switches is reduced by 37%, on average, on several benchmark programs, when binary

encoding is replaced with Gray encoding. Although the Gray code is suitable for reducing the switching activity, we need to consider the power overhead caused by the presence of additional circuitry for encoding and decoding. In fact, it is unrealistic to assume that the address computation units, the data-path, the memory decoders and even the compiler could be modified to generate Gray-coded addresses. Therefore, a Gray encoder must be placed at the transmitting end of the bus, and a Gray decoder is required at all the receiving ends. In [169], the trade-off between the cost of encoding/decoding and the savings on the address buses is not discussed. Further investigation of Gray code addressing has been carried out in [127], where architectural solutions are proposed for the realization of the Gray addressing circuitry, and their performance is compared with pure binary addressing in the case of a 16-bit address bus. In addition, the issue of modifying the Gray code so as to preserve the one-transition property for consecutive addresses of byte-addressable machines has been extensively discussed. A further encoding method for decreasing the address bus activity still relies on the locality property of the memory references ([132], [133]). In more detail, this work originates from the consideration that a program favors a limited number of working zones of the memory address space at each instant of time. Consequently, given a reference to one of these zones, we need to send over the bus only the information related to the offset of this reference with respect to the previous reference to that zone, along with the identifier of the current working zone. The offset information is coded on the bus by adopting the Modified One-hot Encoding mode (MOE) on the entire bus width, so that the activity of the n-bit address bus is reduced to one transition per transmitted pattern. 
Basically, the MOE encoding of a value v (where -n/2 <= v <= n/2 - 1) consists of two steps. First, the one-hot encoding of v into n bits is performed; second, the one-hot encoding of v is XOR-ed with the previous value sent over the n-bit address bus. The Working Zone Encoding (WZE) scheme is suitable for data address buses and for shared address buses. In the former case, the address sequentiality is often destroyed by the interleaved accesses to different data arrays, while in the latter case, the address sequentiality is diminished by the interleaved accesses to instruction and data locations. The WZE method restores sequentiality by memorizing the reference address of each working zone on the

receiver side and by sending only the highly sequential offset. Whenever the memory access moves to a new working zone, this information is sent to the receiver with a special codeword, so that the receiver changes the default reference address and the offset transmission can resume. The WZE technique has been extended in [103] to data buses, based on the observation that the data of successive accesses to a working zone frequently differ by a small amount; therefore, it is also effective to use an encoding based on offsets for the data buses. Basically, the data is sent as an offset, coded in one-hot encoding with transition signaling, and an additional register indicates the current working zone. The main limitation of the WZE is its larger encoder and decoder logic overhead, which limits the power benefits of the I/O switching activity reduction and introduces additional delays on signal paths. Another main drawback of the WZE is that it requires extra bus lines for communicating the working zone change. In fact, besides the n bits of the original address bus, the proposed method requires (log2 B + 1) extra wires: 1 bit signals whether the value sent over the bus is an offset, while the remaining log2 B bits identify which of the B previous references to each working zone is being used. Furthermore, WZE relies on strong assumptions on the patterns in the stream. If the data-access policy is not array-based, or if the number of working zones is too large, this encoding scheme loses its effectiveness. Other encoding techniques at the system level have been reviewed in [168], where the authors discuss the introduction of redundancy in space (number of bus lines), time (number of cycles) and voltage (number of distinct voltage levels). In particular, time redundancy has been demonstrated to be as effective as space redundancy for decreasing the average switching activity. 
Nevertheless, it is impractical to apply in high-performance systems, since it requires extra transfer cycles. In [168], codes have also been reported for both level signaling and transition signaling. Among transition signaling codes, Limited-Weight Codes (LWCs) are suitable for reducing the bus switching activity. In fact, LWCs reduce the number of logical 1s transmitted over the bus, which are responsible for all the high-to-low and low-to-high transitions, while logical 0s are represented by the absence of transitions. A further work on encoding [154] discusses the lower and upper bounds on the transition activity provided by encoding techniques, and a low-power encoder/decoder suitable for high

capacitance buses, where the extra power due to the encoder/decoder is offset by the power savings at the bus level. The bounds on the expected transition activity have been obtained through an information-theoretic approach, where a signal with a certain entropy rate is coded with additional bits per sample on average. The architecture proposed in this work consists of a data source that is passed through a de-correlating function followed by a variable entropy coding function, aiming at reducing the switching activity. Several encoding schemes can then be employed for both functions. All the aforementioned schemes are suitable for general-purpose microprocessor-based systems, while an application-dependent encoding method for special-purpose embedded systems has been proposed in ([20], [22]). The approach, called the Beach solution, reduces the address bus activity of special-purpose systems, where a dedicated or intellectual property core processor executes the same portion of embedded code repeatedly. The analysis of the execution traces of a given program allows an accurate computation of the correlations existing between blocks of bits in consecutive patterns. This information is exploited to determine an application-dependent encoding that reduces the transition activity. The target of the Beach code is thus to reduce the bus activity even in cases where the percentage of in-sequence addresses is limited. In this situation, it may be possible to exploit types of temporal correlation other than arithmetic sequentiality between the patterns being transmitted. Although the sequentiality of addresses may be lost after a taken branch or jump instruction, some portions of the patterns may still maintain high correlation. 
For processors adopting segment- or page-based memory organizations, this can be justified by the fact that intra-segment/page branches and jumps are much more frequent than inter-segment/page ones. The encoding strategy is then determined depending on the particular stream being transmitted. The main limitation of the Beach solution is that it is strongly application-dependent; thus it can find applicability only in dedicated core-based systems executing a given program many times. Moreover, the power saving may be offset by the extra power due to the encoding/decoding logic. The bus encoding techniques described so far aim at reducing the switching activity at the memory-processor interface by changing the format of the information transmitted over the bus. An alternative approach consists of directly changing the way the information is stored in

memory, so that the address streams generated by the processor already have low transition activity. Memory organization techniques have mainly investigated two research areas. The first one exploits the low power cost associated with accesses to the highest levels of the memory hierarchy ([192], [50]). Basically, power is reduced by organizing data in the hierarchy so that transfers from the lowest levels are minimized. The second area of research targets the mapping of data in memory so as to increase the number of sequential accesses at the processor-memory interface ([192], [146], [193]). From our analysis, we can see that bus encoding and memory organization techniques constitute a relatively new research area coping with the problem of power reduction from a system-level standpoint. The bus encoding and memory organization techniques that have appeared so far all have advantages and limitations. Most bus encoding schemes can be applied without any knowledge about the functionality of the application being executed, while the Beach solution and the memory organization techniques need detailed information on the given application in terms of data structures, loops, and control-flow structures. On the other hand, bus encoding imposes, in most cases, hardware overhead in terms of encoder/decoder and redundant lines, while memory organization techniques exploit the existing hardware. Furthermore, memory organization and bus encoding techniques are not mutually exclusive: optimal power minimization strategies should exploit their synergy. In this scenario, the primary goal of our research is to provide original address encoding techniques suitable for a general processor-memory architecture, usable in general-purpose microprocessor-based systems as well as in dedicated embedded systems applications. 
For in-sequence addresses, the Gray code achieves the minimum switching activity among irredundant codes; however, better performance can be obtained by introducing a minimal redundancy, as proposed in our T0 code. The basic idea of the T0 code consists of exploiting the strong temporal correlation between successive addresses related to instructions and data arrays to achieve zero asymptotic transitions on the address bus. We also propose power- and timing-efficient implementations of the T0 and related mixed encoding/decoding logic, and we discuss the applicability of the technique to real microprocessor-based designs. Moreover, the codes proposed in this chapter can be used in conjunction with data allocation techniques emphasizing sequential memory access to further minimize the average number of bus line transitions.

5.3 Asymptotic Zero-Transition Encoding

In the ideal case of an infinite stream of consecutive instructions, the Gray code achieves its asymptotic best performance of a single transition per emitted address. It appears that the Gray code is the best possible code for reducing the switching activity, because one bit difference is the minimum needed to distinguish two binary numbers. The key observation that allows us to improve upon this result is that the Gray code is optimum only among irredundant codes, that is, codes that employ exactly N-bit patterns to encode a maximum of 2^N data words. If we add redundancy to the code, we can achieve better performance. Let us provide an additional redundant line, INC, to the address bus. Its purpose is to signal, with value one, that a consecutive stream of addresses is being output on the bus. If INC is high, all other lines on the bus are frozen, to avoid unnecessary switching, and the new address is computed directly by the receiver. On the other hand, when two addresses are not consecutive, the INC line is low and the remaining bus lines are used as standard binary codes for the new addresses. Obviously, this redundant code outperforms the Gray code on unlimited streams of consecutive addresses: since all addresses are consecutive, the INC line is always asserted, and the bus lines never switch. As a consequence, the asymptotic performance of the T0 code is zero transitions per emitted consecutive address. The increments between consecutive patterns can be parametric, reflecting the addressability scheme adopted in the given architecture. In this respect, our code has the same capabilities as the Gray schemes reported in [127]. 
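The behavior just described can be prototyped directly. The following sketch is our own illustrative model (the class names and the stride value of 4, matching word-aligned byte addressing, are assumptions made for this example, not part of the formal definition):

```python
class T0Encoder:
    """T0 code: freeze the bus and raise INC for consecutive addresses."""
    def __init__(self, stride: int = 4):
        self.stride = stride
        self.prev_addr = None   # last address value b(t-1)
        self.bus = 0            # last value driven on the bus, B(t-1)

    def encode(self, addr: int):
        if self.prev_addr is not None and addr == self.prev_addr + self.stride:
            inc = 1             # bus lines frozen: zero transitions
        else:
            inc = 0
            self.bus = addr     # bus operates normally
        self.prev_addr = addr
        return self.bus, inc

class T0Decoder:
    def __init__(self, stride: int = 4):
        self.stride = stride
        self.prev_addr = None

    def decode(self, bus: int, inc: int) -> int:
        if inc:
            addr = self.prev_addr + self.stride   # computed locally
        else:
            addr = bus
        self.prev_addr = addr
        return addr

# Two in-sequence bursts separated by one out-of-sequence jump.
enc, dec = T0Encoder(), T0Decoder()
for a in [0x1000, 0x1004, 0x1008, 0x2000, 0x2004]:
    bus, inc = enc.encode(a)
    assert dec.decode(bus, inc) == a
```

During an in-sequence burst the encoder output never changes, so the bus lines contribute zero transitions; only the out-of-sequence address (0x2000 in the example) forces the bus to switch.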
More formally, our encoding scheme can be described as follows:

(B(t), INC(t)) = (B(t-1), 1)    if t > 0 and b(t) = b(t-1) + S
                 (b(t), 0)      otherwise        (3)

where B(t) is the value on the encoded bus lines at time t, INC(t) is the additional bus line, b(t) is the address value at time t, and S is a constant power of 2, called the stride. The corresponding decoding scheme can be formally defined as follows:

b(t) = b(t-1) + S    if INC = 1 and t > 0
       B(t)          if INC = 0        (4)

The T0 code retains its zero-transition property even if the addresses are incremented by a constant stride equal to a power of two (as is often the case for practical machines, which are

byte addressable, but are able to access data or instructions aligned at word boundaries). Obviously, the stride does not correspond to the memory granularity, but to the memory word length or to the cache block size.

Analytical Performance Comparison

The aim of this paragraph is to compare the T0 and bus-invert codes, taking the binary code as reference. For the analysis, we consider a stream of unlimited length whose addresses have a random uniform distribution, and a stream of unlimited length whose addresses are consecutive. Table 1 reports the results of the comparison, where N is the address bus width. Notice that, in the case of the bus-invert code, the average number of transitions per clock cycle, T_N, is given by:

T_N = (1/2^N) [ sum_{k=0}^{N/2} k C(N,k) + sum_{k=N/2+1}^{N} (N-k+1) C(N,k) ]        (5)

where C(N,k) is the binomial coefficient.

Table 1: Analytical Performance Comparison of the Binary, T0, and Bus-Invert codes (for random out-of-sequence streams, Binary and T0 both average N/2 transitions per clock, while Bus-Invert averages T_N; for in-sequence streams, T0 achieves zero transitions per clock).

Experimental Performance Comparison

This paragraph aims at evaluating the performance of the T0 code in terms of the average number of transitions required by transmission over the bus, for sequences of both artificial and real patterns. Since the code is designed specifically for patterns that satisfy, in a large number of cases, the sequentiality hypothesis, we first study its behavior by encoding artificially generated streams in which out-of-sequence addresses are inserted with controlled probability. For the experiment, streams of addresses have been generated with percentages of sequential addresses ranging from 0 to 100. The diagrams of Figure 1 and

Figure 2 summarize the results of our comparison analysis with respect to the binary and Gray codes, respectively.

Figure 1: Comparison Binary-T0 for Artificially Generated Addresses.

In particular, the diagrams clearly show that the average number of transitions per bus line is smaller with T0 addressing than with pure binary encoding. As expected, the advantage of the T0 code becomes more remarkable as the percentage of in-sequence addresses in the address stream increases. Compared with Gray encoding, T0 yields fewer transitions starting from approximately 50% of in-sequence addresses.

Figure 2: Comparison Gray-T0 for Artificially Generated Addresses.

The simulation of address streams generated ad hoc substantiates the theoretical performance of the T0 code. However, to prove its applicability to real cases, sequences of addresses

produced by a commercial microprocessor running real programs have been considered. First, we briefly analyze the behavior of the addresses generated by a RISC microprocessor; then we provide experimental results comparing the encoding techniques discussed above. The number of transitions occurring on the address bus during the execution of benchmark programs, drawn from different application domains, has been used as the metric to estimate the power consumption. The addresses generated by a microprocessor obey the spatial locality principle, that is, they are often in-sequence and, in general, they present very high correlation. However, things are quite different depending on whether the addresses correspond to instructions or data. In fact, instruction addresses usually show a more sequential behavior than data addresses: instructions are usually stored in adjacent locations of the memory space, while only structured data such as arrays are generally stored in consecutive memory locations to achieve better locality. Clearly, there are exceptions to this behavior: control-flow instructions cause interruptions in the sequence of consecutive addresses on the instruction flow, and data not stored in arrays are often accessed without any regular pattern. Nevertheless, sequential addressing dominates on average. These considerations are supported by the data contained in Table 2 to Table 4, where we report the percentage of in-sequence addresses measured on real address streams occurring on the 32-bit address bus of the MIPS microprocessor, when different benchmark programs are executed. Three distinct cases have been considered: the instruction address bus, the data address bus, and the instruction/data multiplexed address bus. The tables also report the number of transitions on the address bus, for the above-mentioned classes of codes, and the percentage of transitions saved with respect to binary encoding. 
The average percentage of sequential addresses in the benchmark streams is higher for instruction addresses (63.04%) than for data address streams (11.39%). This behavior can be attributed to the fact that references to automatic variables, such as loop counters, destroy the sequentiality of the address streams even when array data structures are accessed sequentially.
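The behavior illustrated in Figures 1 and 2, and the transition metric used in the tables below, can be reproduced with a small simulation. The following Python sketch (the stream model and all names are ours, not from the chapter) encodes a synthetic address stream with the T0 code, assuming a unit stride, and compares the average number of transitions per bus line against pure binary:

```python
import random

N = 32           # bus width (bits)
S = 1            # stride assumed by the T0 code
MASK = (1 << N) - 1
LENGTH = 10_000

def transitions(words):
    """Total number of bit toggles between consecutive words on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def in_sequence_pct(stream):
    """Percentage of addresses equal to the previous address plus the stride."""
    hits = sum(b == ((a + S) & MASK) for a, b in zip(stream, stream[1:]))
    return 100.0 * hits / (len(stream) - 1)

def t0_encode(stream):
    """T0 code: freeze the bus lines and raise INC for in-sequence addresses.
    Each encoded word carries the N bus lines plus the INC line as bit N."""
    out, prev_b, lines = [], None, 0
    for b in stream:
        if prev_b is not None and b == ((prev_b + S) & MASK):
            inc = 1                      # in-sequence: bus frozen, INC raised
        else:
            lines, inc = b, 0            # out-of-sequence: address sent as-is
        out.append((inc << N) | lines)
        prev_b = b
    return out

def synthetic_stream(p_inseq, length=LENGTH):
    """Each address is in-sequence with probability p_inseq, random otherwise."""
    addr, out = random.randrange(1 << N), []
    for _ in range(length):
        out.append(addr)
        addr = (addr + S) & MASK if random.random() < p_inseq else random.randrange(1 << N)
    return out

random.seed(0)
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    s = synthetic_stream(p)
    pairs = len(s) - 1
    print(f"in-seq={in_sequence_pct(s):5.1f}%  "
          f"binary={transitions(s) / (N * pairs):.3f}  "
          f"T0={transitions(t0_encode(s)) / ((N + 1) * pairs):.3f}")
```

At 100% in-sequence addresses the T0 bus is essentially frozen, while binary still toggles on every increment; as the in-sequence fraction varies, the crossover behavior of the figures emerges.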

Table 2: Experimental Comparison of Binary, T0 and Bus-Invert Schemes for Instruction Address Streams. Benchmarks: gzip, gunzip, ghostview, espresso, nova, jedi, latex, matlab, oracle. For each benchmark the table reports the stream length, the percentage of in-sequence addresses, the binary transition count, and the T0 and Bus-Invert transition counts with their savings over binary. Average: 63.04% in-sequence addresses; T0 savings 35.52%; Bus-Invert savings 0.03%.

Table 3: Experimental Comparison of Binary, T0 and Bus-Invert Schemes for Data Address Streams (same benchmarks and columns as Table 2). Average: 11.39% in-sequence addresses; T0 savings 3.37%; Bus-Invert savings 10.78%.

Table 4: Experimental Comparison of Binary, T0 and Bus-Invert Schemes for Multiplexed Address Streams (same benchmarks and columns as Table 2). Average: 57.62% in-sequence addresses; T0 savings 10.25%; Bus-Invert savings 9.79%.

As expected, when the probability of consecutive addresses is high, as for instruction addresses, a scheme such as T0 is effective in reducing the transition count, offering an average savings of 35.52% (see Table 2), while the Bus-Invert approach does not introduce any savings.

On the other hand, when the probability of in-sequence addresses is very low, as in the case of the data addresses reported in Table 3, the T0 code provides only marginal advantages with respect to binary encoding (3.37%). The binary encoding can thus represent a good alternative for data addresses, since it does not require any redundancy and therefore no encoding and decoding circuitry. Among redundant codes, another alternative for data address buses is the Bus-Invert code, which is advantageous for the minimization of the average number of transitions (10.78% savings on average). When the address bus is multiplexed, or shared, between instruction and data addresses, as in the MIPS architecture, the sequential behavior is often interrupted when the selection signal switches from instruction to data and vice versa. Hence, the multiplexed address bus shows an intermediate behavior (57.62%), as shown in Table 4. As a matter of fact, the sequentiality of the addresses on the bus is somewhat reduced by the time multiplexing and by the inherent randomness of the data addresses. For shared address buses, both the T0 and Bus-Invert codes provide transition savings of approximately 10%.

Mixed Bus Encoding Techniques

The different effects of the above-discussed encoding schemes on the number of bus transitions suggest the use of combined techniques that exploit the best properties of each method to further improve the overall system-level power budget.
In this section, we first introduce some novel encoding options; then, we present transition results collected during the execution of the usual set of benchmark programs on the MIPS processor.

T0_BI Encoding

For architectures based on a single address bus used to transmit both instruction and data addresses, as in the case of external second-level unified data and instruction caches, or for data address buses, a new encoding can be applied that exploits the advantages provided by both the T0 and Bus-Invert approaches. Basically, the T0 code is used for in-sequence addresses, while Bus-Invert is adopted when the addresses are out-of-sequence. The combined code, called the T0_BI code, reduces the average power and requires two redundant lines, INC and INV. The formal definition of the code is the following:

(B(t), INC(t), INV(t)) =
  (B(t-1), 1, 0)   if t > 0 and b(t) = b(t-1) + S
  (~b(t), 0, 1)    if t > 0 and b(t) ≠ b(t-1) + S and H(t) > (N + 2) / 2
  (b(t), 0, 0)     otherwise                                               (6)

where B(t) is the value of the encoded bus lines at time t, INV(t) and INC(t) are the additional bus lines, b(t) is the address value at time t, ~b(t) denotes the bitwise complement of b(t), S is the stride, H(t) = HD(B(t-1) · INC(t-1) · INV(t-1), b(t) · 0 · 0) is the Hamming distance between the previous encoded word (bus lines concatenated with the INC and INV lines) and the current address concatenated with two zeros, and N is the bus width of b(t). The corresponding decoding scheme is given by:

b(t) =
  ~B(t)         if t > 0 and INC = 0 and INV = 1
  b(t-1) + S    if t > 0 and INC = 1
  B(t)          if INC = 0 and INV = 0                                     (7)

Dual_T0 Encoding

For architectures based on multiplexed address buses, such as that of the MIPS microprocessor, two address streams, α and β, with quite different behavior are time-multiplexed on the same address bus. Stream α, corresponding to instruction addresses, has a high probability of placing two consecutive addresses on the bus in two successive clock cycles; on the contrary, stream β, corresponding to data addresses, has almost no in-sequence patterns. The control signal SEL, available in the standard bus interface to de-multiplex the bus at the receiver side, is asserted when stream α is transmitted; otherwise, SEL is deasserted. An extension of the T0 code, called the Dual_T0 code, can be effective in these cases: the T0 code is applied and the encoding/decoding registers are updated whenever SEL is asserted; otherwise, the binary code is applied and the encoding/decoding registers hold.
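Returning to the T0_BI code for a moment: equations (6) and (7) can be modeled behaviorally. The following Python sketch (our illustration; the bus width, stride, and names are arbitrary) encodes and decodes a short address stream with the T0_BI rules and checks the round trip:

```python
N = 8                    # bus width (illustrative)
S = 1                    # stride
MASK = (1 << N) - 1

def hamming(x, y):
    """Hamming distance between two bit vectors."""
    return bin(x ^ y).count("1")

def t0bi_encode(stream):
    """T0_BI encoding, following eq. (6): T0 for in-sequence addresses,
    Bus-Invert for out-of-sequence ones. Words are (lines, INC, INV) triples."""
    out, prev_b = [], None
    B, inc, inv = 0, 0, 0                # encoded word at time t-1
    for b in stream:
        if prev_b is not None and b == prev_b + S:
            inc, inv = 1, 0              # bus lines frozen, INC raised
        else:
            # Hamming distance over the N+2 lines (bus plus INC and INV)
            h = hamming((B << 2) | (inc << 1) | inv, b << 2)
            if prev_b is not None and h > (N + 2) / 2:
                B, inc, inv = (~b) & MASK, 0, 1   # send complemented address
            else:
                B, inc, inv = b, 0, 0             # send address as-is
        out.append((B, inc, inv))
        prev_b = b
    return out

def t0bi_decode(words):
    """Inverse mapping, following eq. (7)."""
    out, prev = [], None
    for B, inc, inv in words:
        if inc:
            b = prev + S                 # in-sequence: increment last address
        elif inv:
            b = (~B) & MASK              # inverted word: complement the lines
        else:
            b = B
        out.append(b)
        prev = b
    return out

addrs = [16, 17, 18, 240, 15, 16]
assert t0bi_decode(t0bi_encode(addrs)) == addrs
```

In the example stream, the jump from 18 to 240 is transmitted as-is (small Hamming distance), while the jump from 240 to 15 is transmitted complemented with INV raised.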
Analytically, the Dual_T0 encoding can be expressed as:

(B(t), INC(t)) =
  (B(t-1), 1)   if t > 0 and SEL = 1 and b(t) = b'(t) + S
  (b(t), 0)     otherwise                                                  (8)

where B(t) is the value on the encoded bus lines at time t, INC(t) is the additional bus line, b(t) is the address value at time t, S is the stride, SEL is the selection signal, and b'(t), the value held in the encoding/decoding registers, is given by:

b'(t) =
  b'(t-1)   if SEL = 0
  b(t-1)    if t > 0 and SEL = 1                                           (9)

The corresponding decoding scheme follows:

b(t) =
  b'(t) + S   if t > 0 and INC = 1
  B(t)        if INC = 0                                                   (10)

where b'(t) denotes the value held in the encoding/decoding registers.

Dual_T0_BI Encoding

Similarly to what we have done for the T0_BI code, we can define the Dual_T0_BI code. With this scheme, which is expected to work well on multiplexed buses, the overall switching activity can be reduced by resorting to the T0 code anytime stream α is transmitted on the bus, while Bus-Invert is used for stream β. As for the T0_BI code, the Dual_T0_BI can offer significant savings in average power. The Dual_T0_BI encoding is defined as:

(B(t), INC_INV(t)) =
  (B(t-1), 1)   if t > 0 and SEL = 1 and b(t) = b'(t) + S
  (~b(t), 1)    if t > 0 and SEL = 0 and H(t) > N / 2
  (b(t), 0)     otherwise                                                  (11)

where B(t) is the value on the encoded bus lines at time t, INC_INV(t) is the additional bus line, b(t) is the address value at time t, ~b(t) denotes the bitwise complement of b(t), S is the stride, SEL is the selection signal, H(t) = HD(B(t-1) · INC_INV(t-1), b(t) · 0) is the Hamming distance, N is the bus width of b(t), and b'(t) is defined as before. The decoding scheme can be summarized with the following equation:

b(t) =
  b'(t) + S   if t > 0 and INC_INV = 1 and SEL = 1
  ~B(t)       if t > 0 and INC_INV = 1 and SEL = 0
  B(t)        if INC_INV = 0                                               (12)

Experimental Performance Comparison

To verify the performance of the newly proposed encoding techniques, we again used the MIPS processor as the reference architecture for measuring the number of address bus transitions required by the execution of the usual set of benchmark programs. Table 5 to Table 7 summarize the results, in terms of number of transitions and transition savings with respect to the pure binary encoding.
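Before looking at the results, the Dual_T0_BI rules in equations (11) and (12) can be sketched behaviorally as well. In this Python model (ours; bus width, stride, and names are illustrative), each bus word is a (lines, INC_INV, SEL) triple, and the register holding the last instruction address updates only when SEL = 1:

```python
N = 8                    # bus width (illustrative)
S = 4                    # word-aligned instruction stride (assumed)
MASK = (1 << N) - 1

def hamming(x, y):
    return bin(x ^ y).count("1")

def dual_t0bi_encode(stream):
    """Dual_T0_BI encoding of a stream of (address, SEL) pairs, SEL = 1 for
    instruction addresses: T0 on stream alpha, Bus-Invert on stream beta."""
    out = []
    B, inc_inv = 0, 0                    # encoded word at time t-1
    last_instr = None                    # register: holds while SEL = 0
    first = True
    for b, sel in stream:
        if sel and not first and last_instr is not None and b == last_instr + S:
            inc_inv = 1                  # in-sequence instruction: bus frozen
        elif (not sel and not first
              and hamming((B << 1) | inc_inv, b << 1) > N / 2):
            B, inc_inv = (~b) & MASK, 1  # data address sent complemented
        else:
            B, inc_inv = b, 0
        out.append((B, inc_inv, sel))
        if sel:
            last_instr = b               # register updates only when SEL = 1
        first = False
    return out

def dual_t0bi_decode(words):
    """Inverse mapping: SEL and INC_INV select among increment, complement,
    and pass-through, mirroring the encoder."""
    out, last_instr = [], None
    for B, inc_inv, sel in words:
        if inc_inv and sel:
            b = last_instr + S
        elif inc_inv:
            b = (~B) & MASK
        else:
            b = B
        out.append(b)
        if sel:
            last_instr = b
    return out

stream = [(16, 1), (20, 1), (24, 1), (200, 0), (28, 1), (60, 0)]
assert dual_t0bi_decode(dual_t0bi_encode(stream)) == [a for a, _ in stream]
```

Note how the interleaved data access at address 200 does not disturb the instruction register: the following instruction address (28) is still recognized as in-sequence and keeps the bus frozen.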

Table 5: Experimental Comparison of Mixed Encoding Schemes for Instruction Address Streams. Benchmarks: gzip, gunzip, ghostview, espresso, nova, jedi, latex, matlab, oracle. For each benchmark the table reports the stream length, the percentage of in-sequence addresses, the binary transition count, and the T0_BI, Dual_T0 and Dual_T0_BI transition counts with their savings over binary. Average: 63.04% in-sequence addresses; T0_BI savings 34.92%; Dual_T0 savings 35.52%; Dual_T0_BI savings 35.52%.

Table 6: Experimental Comparison of Mixed Encoding Schemes for Data Address Streams (same benchmarks and columns as Table 5). Average: 11.39% in-sequence addresses; T0_BI savings 12.82%; Dual_T0 savings 0.00%; Dual_T0_BI savings 10.66%.

Table 7: Experimental Comparison of Mixed Encoding Schemes for Multiplexed Address Streams (same benchmarks and columns as Table 5). Average: 57.62% in-sequence addresses; T0_BI savings 19.56%; Dual_T0 savings 12.15%; Dual_T0_BI savings 22.25%.

When the percentage of in-sequence addresses is high, as in the case of instruction address streams (see Table 5), the savings provided by the T0_BI, the Dual_T0 and the Dual_T0_BI

codes are very remarkable (35.32% on average over the three codes, with respect to pure binary). However, almost the same savings have been obtained by using the simple T0 code. Thus, the latter seems to be the preferable solution in this case, due to the relatively low cost of the T0 encoding/decoding circuitry. Regarding the data address streams, shown in Table 6, no savings are provided by the Dual_T0 code, while the T0_BI and the Dual_T0_BI offer some advantages with respect to binary encoding (12.82% and 10.66%, respectively). By comparing these results to those provided by the Bus-Invert method, we can claim that the T0_BI represents the most effective solution for average power minimization. If we consider multiplexed address buses (see Table 7), the Dual_T0_BI code shows the best savings (22.25% on average) compared to the T0, T0_BI, and Dual_T0 codes (10.25%, 19.56%, and 12.15% over pure binary). The Dual_T0_BI thus provides the absolute best savings, making it the most effective code, in terms of average power, for the address bus of the MIPS microprocessor.

Encoding and Decoding Logic

The results of the experimental comparisons show that, for a multiplexed bus, the Dual_T0_BI code is the most effective in terms of transition count. It is now necessary to evaluate whether the power savings achievable through activity reduction are offset by the circuitry required to implement the code on a system bus. After describing the basic encoder/decoder architecture for the T0 and Dual_T0_BI codes, we analyze their power consumption when the encoding scheme is used to minimize the power of on-chip and off-chip buses.

Architectures

At a given clock cycle, t, the encoder computes the incremented address of cycle t-1 and compares it to the address generated at cycle t. If the incremented old (t-1) address and the new (t) address are equal, the INC line is raised and the old address is left on the bus. The encoder architecture is shown on the left of Figure 3.
The incrementer can be made programmable, so that the constant increment S can be defined flexibly. The encoder inserts one cycle of delay between the arrival of the address b and the output of the encoded bus B. We do not consider this delay an overhead of the encoding. Even if binary code is used (i.e., no encoding), the

FFs on the output B would be needed, because the address b is generated by complex logic which produces glitches and misaligned transitions. The FFs filter out glitches and align the transitions on B to the clock edge. Glitches on B must be avoided because B is connected to large output buffers that should always be driven by clean, fast edges, to eliminate excessive power dissipation and signal quality degradation. The decoder architecture is even simpler. At any given clock cycle, the last cycle's address is incremented. If the INC line is high, the old incremented value is used for addressing; otherwise, the value coming from the bus lines is selected. The decoder is depicted on the right of Figure 3.

Figure 3: Block Diagrams of the T0 Code Encoder and Decoder.

The encoder/decoder implementation is optimized for minimum delay. If the speed constraints are not tight, low-power implementations should be considered. For example, it is possible to disable the incrementer in the decoder when INC = 0. In this case, however, the incrementer delay would be added to the critical path (starting from the late-arriving signal INC), and performance would be penalized. We consider delay constraints, which are the most critical ones in high-performance microprocessors. Since the bus input b is produced by the complex address computation logic, it is expected to have a late arrival time. Thus, the critical path will be in the encoder, from b through the comparator EQ and the control-to-output delay of the multiplexer (the setup time of the register controlling the bus B must be added to the critical path as well). If the arrival time of b is not critical, the next critical path is through the incrementer, the comparator and the multiplexer.
It is unlikely that this relatively simple logic will ever become the critical path of a complex microprocessor design, where much more complex tasks are usually performed in a single clock cycle. Consequently, we will discuss the

possibility that the encoder could constrain the critical path because of the delay from the late-arriving b to the output of the multiplexer (going through the comparator). Fortunately, the combination of the T0 encoder and decoder is very fast on the critical path: the comparator can be implemented with an XOR tree structure, which has a logarithmic delay D_cmp = K_cmp log(N), and the delay through a multiplexer is weakly dependent on the width of the bus. The dependence is due to the load on the control input, which increases linearly with the width of the bus. If we drive the control input (SEL) with a tapered buffer, the delay has a logarithmic dependence on the bus width: D_mux = K_mux log(N). In the formulas for D_cmp and D_mux, the constants K_cmp and K_mux are technology dependent. The incrementers in the encoder and decoder are not strongly timing constrained, thus we can implement them in a power-efficient fashion, as long as their delays do not become critical. Compared to the Gray encoder and decoder [127], our architectures consume more area and power. However, the performance of the Gray scheme is limited by the decoder (implemented as a chain of EXOR gates [127]), which has a delay D_gray = K_gray N. For wide buses in performance-constrained systems, the delay penalty of Gray addressing may be simply unacceptable, leaving the T0 code as the only alternative to standard binary encoding. For purely power-constrained systems, the designer's choice will be based on the trade-off between the additional power savings on the bus provided by the T0 code and the reduced power dissipation of the Gray encoder/decoder. Gray code would probably be the preferred choice for area-constrained systems where power dissipation is a secondary concern.
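The delay asymptotics just discussed can be illustrated numerically. This Python sketch (with placeholder unit constants; real values of K_cmp, K_mux and K_gray are technology dependent) contrasts the logarithmic T0 critical path with the linear Gray decoder delay:

```python
import math

# Placeholder unit constants; real K_cmp, K_mux, K_gray are technology dependent.
K_CMP = K_MUX = K_GRAY = 1.0

def t0_critical_path(n):
    """XOR-tree comparator plus buffered multiplexer: logarithmic in bus width."""
    return K_CMP * math.log2(n) + K_MUX * math.log2(n)

def gray_decoder_delay(n):
    """Chain of EXOR gates: linear in bus width."""
    return K_GRAY * n

for n in (16, 32, 64, 128):
    print(f"N={n:3d}  T0 path={t0_critical_path(n):5.1f}  Gray decoder={gray_decoder_delay(n):6.1f}")
```

Doubling the bus width adds one gate level to the T0 path but doubles the Gray decoder chain, which is why Gray addressing becomes unattractive for wide, performance-constrained buses.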
Concerning the Dual_T0_BI code, the encoder architecture consists of a section for the T0 encoding, which generates the INC signal; a section for the Bus-Invert logic, which provides the INV signal; and the output multiplexer, which is controlled by the input signal SEL and by the signal INC_INV = INC + INV. While the architecture of the T0 section has been fully described, the Bus-Invert section is realized with a Hamming distance evaluator, which compares the encoded bus lines at time t-1, concatenated with the INC_INV signal, against the address value at the present time t, followed by a majority voter that decides whether the computed Hamming distance is greater than half of the bus width. Besides the encoded bus lines, the circuit outputs the INC_INV signal (the SEL signal, needed by the decoder, is already present on the bus). Concerning the decoder, its architecture can be obtained easily from the Dual_T0_BI decoding equations (SEL and INC_INV are the control signals of the multiplexer).
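As a sketch of the Bus-Invert section just described (our illustration; names and bus width are arbitrary), the Hamming distance evaluator and majority voter reduce to a population count followed by a threshold comparison:

```python
N = 8                    # bus width (illustrative)

def popcount(x):
    """Number of 1 bits in a bit vector."""
    return bin(x).count("1")

def bus_invert_decision(prev_lines, prev_inc_inv, addr):
    """Majority decision of the Bus-Invert section: evaluate the Hamming
    distance between the previous encoded word (bus lines concatenated with
    INC_INV) and the current address concatenated with a zero, then vote to
    invert when the distance exceeds half of the bus width."""
    h = popcount(((prev_lines << 1) | prev_inc_inv) ^ (addr << 1))
    return h > N / 2
```

In hardware, the popcount corresponds to a tree of adders and the comparison to a fixed threshold check, both off the critical path discussed above.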


More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Digital System Design Using Verilog. - Processing Unit Design

Digital System Design Using Verilog. - Processing Unit Design Digital System Design Using Verilog - Processing Unit Design 1.1 CPU BASICS A typical CPU has three major components: (1) Register set, (2) Arithmetic logic unit (ALU), and (3) Control unit (CU) The register

More information

Power Aware Encoding for the Instruction Address Buses Using Program Constructs

Power Aware Encoding for the Instruction Address Buses Using Program Constructs Power Aware Encoding for the Instruction Address Buses Using Program Constructs Prakash Krishnamoorthy and Meghanad D. Wagh Abstract This paper examines the address traces produced by various program constructs.

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

Lecture notes for CS Chapter 2, part 1 10/23/18

Lecture notes for CS Chapter 2, part 1 10/23/18 Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental

More information

Ten Reasons to Optimize a Processor

Ten Reasons to Optimize a Processor By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor

More information

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and

More information

Computer Architecture V Fall Practice Exam Questions

Computer Architecture V Fall Practice Exam Questions Computer Architecture V22.0436 Fall 2002 Practice Exam Questions These are practice exam questions for the material covered since the mid-term exam. Please note that the final exam is cumulative. See the

More information

EITF20: Computer Architecture Part4.1.1: Cache - 2

EITF20: Computer Architecture Part4.1.1: Cache - 2 EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss

More information

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu

EXAM 1 SOLUTIONS. Midterm Exam. ECE 741 Advanced Computer Architecture, Spring Instructor: Onur Mutlu Midterm Exam ECE 741 Advanced Computer Architecture, Spring 2009 Instructor: Onur Mutlu TAs: Michael Papamichael, Theodoros Strigkos, Evangelos Vlachos February 25, 2009 EXAM 1 SOLUTIONS Problem Points

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

Comparing Multiported Cache Schemes

Comparing Multiported Cache Schemes Comparing Multiported Cache Schemes Smaїl Niar University of Valenciennes, France Smail.Niar@univ-valenciennes.fr Lieven Eeckhout Koen De Bosschere Ghent University, Belgium {leeckhou,kdb}@elis.rug.ac.be

More information

Chapter-5 Memory Hierarchy Design

Chapter-5 Memory Hierarchy Design Chapter-5 Memory Hierarchy Design Unlimited amount of fast memory - Economical solution is memory hierarchy - Locality - Cost performance Principle of locality - most programs do not access all code or

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

a) Memory management unit b) CPU c) PCI d) None of the mentioned

a) Memory management unit b) CPU c) PCI d) None of the mentioned 1. CPU fetches the instruction from memory according to the value of a) program counter b) status register c) instruction register d) program status word 2. Which one of the following is the address generated

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Threshold-Based Markov Prefetchers

Threshold-Based Markov Prefetchers Threshold-Based Markov Prefetchers Carlos Marchani Tamer Mohamed Lerzan Celikkanat George AbiNader Rice University, Department of Electrical and Computer Engineering ELEC 525, Spring 26 Abstract In this

More information

Maintaining Mutual Consistency for Cached Web Objects

Maintaining Mutual Consistency for Cached Web Objects Maintaining Mutual Consistency for Cached Web Objects Bhuvan Urgaonkar, Anoop George Ninan, Mohammad Salimullah Raunak Prashant Shenoy and Krithi Ramamritham Department of Computer Science, University

More information

Selective Fill Data Cache

Selective Fill Data Cache Selective Fill Data Cache Rice University ELEC525 Final Report Anuj Dharia, Paul Rodriguez, Ryan Verret Abstract Here we present an architecture for improving data cache miss rate. Our enhancement seeks

More information

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS 1.Define Computer Architecture Computer Architecture Is Defined As The Functional Operation Of The Individual H/W Unit In A Computer System And The

More information

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts Hardware/Software Introduction Chapter 5 Memory Outline Memory Write Ability and Storage Permanence Common Memory Types Composing Memory Memory Hierarchy and Cache Advanced RAM 1 2 Introduction Memory:

More information

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction Hardware/Software Introduction Chapter 5 Memory 1 Outline Memory Write Ability and Storage Permanence Common Memory Types Composing Memory Memory Hierarchy and Cache Advanced RAM 2 Introduction Embedded

More information

Chapter 3 - Top Level View of Computer Function

Chapter 3 - Top Level View of Computer Function Chapter 3 - Top Level View of Computer Function Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 3 - Top Level View 1 / 127 Table of Contents I 1 Introduction 2 Computer Components

More information

SRAMs to Memory. Memory Hierarchy. Locality. Low Power VLSI System Design Lecture 10: Low Power Memory Design

SRAMs to Memory. Memory Hierarchy. Locality. Low Power VLSI System Design Lecture 10: Low Power Memory Design SRAMs to Memory Low Power VLSI System Design Lecture 0: Low Power Memory Design Prof. R. Iris Bahar October, 07 Last lecture focused on the SRAM cell and the D or D memory architecture built from these

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

EITF20: Computer Architecture Part4.1.1: Cache - 2

EITF20: Computer Architecture Part4.1.1: Cache - 2 EITF20: Computer Architecture Part4.1.1: Cache - 2 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache performance optimization Bandwidth increase Reduce hit time Reduce miss penalty Reduce miss

More information

Transition Reduction in Memory Buses Using Sector-based Encoding Techniques

Transition Reduction in Memory Buses Using Sector-based Encoding Techniques Transition Reduction in Memory Buses Using Sector-based Encoding Techniques Yazdan Aghaghiri University of Southern California 3740 McClintock Ave Los Angeles, CA 90089 yazdan@sahand.usc.edu Farzan Fallah

More information

SDR Forum Technical Conference 2007

SDR Forum Technical Conference 2007 THE APPLICATION OF A NOVEL ADAPTIVE DYNAMIC VOLTAGE SCALING SCHEME TO SOFTWARE DEFINED RADIO Craig Dolwin (Toshiba Research Europe Ltd, Bristol, UK, craig.dolwin@toshiba-trel.com) ABSTRACT This paper presents

More information

Low-Power Instruction Bus Encoding for Embedded Processors. Peter Petrov, Student Member, IEEE, and Alex Orailoglu, Member, IEEE

Low-Power Instruction Bus Encoding for Embedded Processors. Peter Petrov, Student Member, IEEE, and Alex Orailoglu, Member, IEEE 812 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 8, AUGUST 2004 Low-Power Instruction Bus Encoding for Embedded Processors Peter Petrov, Student Member, IEEE, and Alex

More information

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010 SEMICON Solutions Bus Structure Created by: Duong Dang Date: 20 th Oct,2010 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single

More information

Low Latency via Redundancy

Low Latency via Redundancy Low Latency via Redundancy Ashish Vulimiri, Philip Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, Scott Shenker Presenter: Meng Wang 2 Low Latency Is Important Injecting just 400 milliseconds

More information

COMPUTER ORGANISATION CHAPTER 1 BASIC STRUCTURE OF COMPUTERS

COMPUTER ORGANISATION CHAPTER 1 BASIC STRUCTURE OF COMPUTERS Computer types: - COMPUTER ORGANISATION CHAPTER 1 BASIC STRUCTURE OF COMPUTERS A computer can be defined as a fast electronic calculating machine that accepts the (data) digitized input information process

More information

Computer Architecture Memory hierarchies and caches

Computer Architecture Memory hierarchies and caches Computer Architecture Memory hierarchies and caches S Coudert and R Pacalet January 23, 2019 Outline Introduction Localities principles Direct-mapped caches Increasing block size Set-associative caches

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007 Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28 th, 2007 Introduction Outline Conditional

More information

Mladen Stefanov F48235 R.A.I.D

Mladen Stefanov F48235 R.A.I.D R.A.I.D Data is the most valuable asset of any business today. Lost data, in most cases, means lost business. Even if you backup regularly, you need a fail-safe way to ensure that your data is protected

More information

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy.

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy. Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture CACHE MEMORY Introduction Computer memory is organized into a hierarchy. At the highest

More information

1.3 Data processing; data storage; data movement; and control.

1.3 Data processing; data storage; data movement; and control. CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical

More information

William Stallings Computer Organization and Architecture 8th Edition. Chapter 5 Internal Memory

William Stallings Computer Organization and Architecture 8th Edition. Chapter 5 Internal Memory William Stallings Computer Organization and Architecture 8th Edition Chapter 5 Internal Memory Semiconductor Memory The basic element of a semiconductor memory is the memory cell. Although a variety of

More information

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1]

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1] EE482: Advanced Computer Organization Lecture #7 Processor Architecture Stanford University Tuesday, June 6, 2000 Memory Systems and Memory Latency Lecture #7: Wednesday, April 19, 2000 Lecturer: Brian

More information

Errata and Clarifications to the PCI-X Addendum, Revision 1.0a. Update 3/12/01 Rev P

Errata and Clarifications to the PCI-X Addendum, Revision 1.0a. Update 3/12/01 Rev P Errata and Clarifications to the PCI-X Addendum, Revision 1.0a Update 3/12/01 Rev P REVISION REVISION HISTORY DATE P E1a-E6a, C1a-C12a 3/12/01 2 Table of Contents Table of Contents...3 Errata to PCI-X

More information

I/O CANNOT BE IGNORED

I/O CANNOT BE IGNORED LECTURE 13 I/O I/O CANNOT BE IGNORED Assume a program requires 100 seconds, 90 seconds for main memory, 10 seconds for I/O. Assume main memory access improves by ~10% per year and I/O remains the same.

More information

New Advances in Micro-Processors and computer architectures

New Advances in Micro-Processors and computer architectures New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,

More information