Low-Power Data Address Bus Encoding Method

Tsung-Hsi Weng, Wei-Hao Chiao, Jean Jyh-Jiun Shann, Chung-Ping Chung, and Jimmy Lu

Dept. of Computer Science and Information Engineering, National Chiao Tung University, HsinChu, Taiwan, ROC
IP Technology Department, SOC Technology Center, Industrial Technology Research Institute, HsinChu, Taiwan, ROC

Abstract: Reducing the power consumption of computer systems has gained much research attention recently. In a typical system, memory bus power can constitute well over 50% of total system power, and this power is consumed by bus signal transitions (0→1 or 1→0). Reducing the number of memory bus transitions is hence an effective way to reduce system power. While many techniques reduce power on the instruction address bus, only a few have been proposed for the data address bus. We present an encoding scheme to reduce data address bus power consumption. In this scheme, the data address bus can be frozen for sequential addresses, or inverted as appropriate in other cases. Furthermore, data addresses are classified into read addresses and write addresses, and each address set is encoded independently. Simulation results show that the overall reduction in bus line switching is 26% relative to an unencoded bus, or 14.5% compared with the previous T0_BI method [1].

Keywords: low-power, bus encoding, data address bus, T0_BI_1

1. Introduction

The increasing complexity of system-on-chip (SoC) designs has led to power consumption problems, and hence to cooling and reliability problems. Power consumption is becoming one of the most important design issues, especially for embedded systems. At the same time, we are witnessing a dramatic increase in the market for portable electronic devices such as mobile phones and personal digital assistants. Because these products are battery-powered and the functionality users demand of them keeps growing, low-power design for these systems has become a very important research topic.
In a digital computer system, the major power consumption comes from off-chip processor-memory bus traffic, as a result of the huge capacitances of the bus lines. More specifically, it has been estimated that the capacitance driven by the I/O nodes is usually three orders of magnitude greater than that seen by the internal nodes of a microprocessor [2]. Design techniques that decrease power dissipation on external buses will therefore have a significant impact on the overall power dissipation of the system. While many bus encoding techniques [3][4] focus on instruction address bus encoding, only a few address data address bus encoding. Although memory access instructions account for only 20% to 25% of all instructions, data address bus bit transitions account for 30% to 60% of all address bus transitions, due to the random bit-pattern nature of data addresses.

Techniques for reducing bit transitions on data address buses therefore deserve in-depth exploration.

The rest of the paper is organized as follows. Section 2 describes the background of low-power data address bus encoding. Section 3 presents the proposed design for reducing bit transitions on data address buses. Section 4 gives the performance results. The last section summarizes the work.

2. Background

In this section, the behavior of data address sequences and the related low-power bus encoding techniques are described. Some potential improvements to these existing designs, and other design issues not yet explored, are also identified.

The data address sequences of general-purpose computing are usually randomly distributed. However, accesses to arrays or to scalar data in loops do give the resulting data address sequences some pattern.

The Bus-Invert (BI) method [5] was proposed for random data sequence patterns. It inverts the bus value to be transmitted whenever doing so results in fewer bit transitions, which keeps the number of transitions to no more than half of the bus width. An extra control line, called INV, indicates whether the bus value has been inverted.

The Zero-Transition (T0) method [6], proposed by Benini et al., avoids the transmission of sequential addresses completely. An extra control line, called INC, indicates whether the bus value is sequential to the last bus value. An asserted INC renders the signals on the bus lines meaningless, so these lines can remain unchanged to save power.

The T0_BI method [1] combines the BI and T0 methods and uses two extra control signals, INV and INC, to integrate both in one design. However, these two control lines may themselves produce an excessive number of bit transitions in many cases.

We believe that there is room for improvement in existing designs, and that there are new design issues for further study. First, the two control lines of the T0_BI method may be combined into one control line that indicates both the invert and the sequential condition.
Second, the stride value of data addresses may be updated dynamically to allow for different data address strides. Finally, the read and write data address sets usually have distinct behaviors, and they should be encoded independently for the greatest benefit of both sets.

3. Designs

Our low-power data address bus encoding scheme is described in this section. Section 3.1 gives an overview of our designs, Sections 3.2 to 3.4 present the design details, and Section 3.5 gives the design summary.

3.1. Design Overview

Figure 1: Data address bus encoding architecture

Figure 1 shows our low-power data address bus encoding architecture. The encoder gets data addresses from the CPU and outputs the encoded address and some control signals. The encoded addresses and control signals are transmitted to the decoder of the data memory. When the decoder receives the encoded addresses and control signals, it converts this information back into the original data address. Two control lines, called Read/Write and enable, are traditional memory control signals, and we will make use of them later.

Three versions of our low-power bus encoding scheme are proposed in this paper, with the second and third each built upon its predecessor:
1. T0_BI_1, combining T0 and BI using a single control line;
2. T0_BI_1/S, with variable-stride capability added;
3. T0_BI_1/S/RW, preserving read/write continuities in a multiplexed data address sequence.

3.2. Combining T0 & BI using single control line (T0_BI_1)

The T0_BI_1 design uses only one control line, called INCV, to control both the INC and INV functions. To transmit a data address, the T0_BI_1 encoder first checks the continuity of the transferred address sequence. Continuity means that the current address is equal to the sum of the previous address and the stride. An address with the continuity property is transmitted with an asserted INCV and a frozen address bus. If the continuity test fails, the encoder checks whether the address pattern would produce bit transitions on more than half of the address bus lines, and whether the inverted address pattern differs from the previous bus value. If both conditions hold, INCV is also asserted and the inverted address is sent over the address bus. Otherwise, the INCV line is de-asserted and the address is sent directly.

When the INCV line is asserted, the decoder must identify its meaning from the received bus value. If the bus value is unchanged, the INCV line is interpreted as an increment-by-stride indicator. Otherwise, it is interpreted as an invert indicator. In this way, the single INCV line can act as both the INC and the INV control signal, and address encoding still benefits from both the BI and T0 schemes.

Following is the T0_BI_1 encoding algorithm:

Figure 2: T0_BI_1 encoding algorithm

And the corresponding T0_BI_1 decoding algorithm is:

Figure 3: T0_BI_1 decoding algorithm

Note that the decoder interprets the meaning of the asserted INCV line according to the received bus value.
If the encoder intends to invert the bus value but the inverted value happens to equal the current bus value, the decoder would erroneously interpret the transfer as a frozen address bus. To avoid this error, the encoder simply sends the current address out directly in that case. We believe that this situation is very unlikely, but the precaution must still be taken.
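The encoding and decoding rules of Section 3.2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the algorithm of Figures 2 and 3: the 32-bit bus width, the fixed stride of 4, the all-zero initial state, and the names `encode`, `decode`, and `roundtrip` are all hypothetical choices made for the example.

```python
WIDTH = 32                      # assumed bus width
MASK = (1 << WIDTH) - 1


def hamming(a, b):
    """Number of bus lines that differ between two bus words."""
    return bin((a ^ b) & MASK).count("1")


def encode(addr, prev_addr, prev_bus, stride=4):
    """Return (bus_value, incv) for one data address."""
    # Continuity: freeze the bus and assert INCV.
    if addr == (prev_addr + stride) & MASK:
        return prev_bus, 1
    inverted = ~addr & MASK
    # Invert only if that saves transitions AND cannot be mistaken
    # for a frozen bus by the decoder.
    if hamming(addr, prev_bus) > WIDTH // 2 and inverted != prev_bus:
        return inverted, 1
    # Otherwise send the address directly with INCV de-asserted.
    return addr, 0


def decode(bus, incv, prev_addr, prev_bus, stride=4):
    """Recover the original address from the bus value and INCV."""
    if incv and bus == prev_bus:   # unchanged bus: increment by stride
        return (prev_addr + stride) & MASK
    if incv:                       # changed bus: value was inverted
        return ~bus & MASK
    return bus                     # INCV low: address sent directly


def roundtrip(addresses, stride=4):
    """Encode then decode a trace; both ends start from an assumed zero state."""
    enc_prev = dec_prev = prev_bus = 0
    decoded = []
    for addr in addresses:
        bus, incv = encode(addr, enc_prev, prev_bus, stride)
        out = decode(bus, incv, dec_prev, prev_bus, stride)
        decoded.append(out)
        enc_prev, dec_prev, prev_bus = addr, out, bus
    return decoded
```

For example, `roundtrip([0x1000, 0x1004, 0x1008, 0xFFFFEFF0, 0x20])` returns the input trace unchanged; the second and third addresses travel over a frozen bus, and the fourth is sent inverted. The corner case discussed above shows up as `encode(0xFFFFEFFF, 0x1000, 0x1000)`, which falls back to sending the address directly because its inverse equals the previous bus value.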

3.3. T0_BI_1 with Variable-Stride capability (T0_BI_1/S)

Many data are structured (arrays, matrices, etc.), and accesses to such structured data have very predictable data addresses. We use the term stride to describe the byte offset between consecutive access addresses of this kind. Two factors affect the stride value. One is the data item size (64 bits for scientific data, 32 bits for general-purpose computing, and 16 or 8 bits for multimedia applications). The other is the access pattern (column, row, diagonal, etc.) combined with the storage scheme (row-major, column-major, or others). These factors complicate the computation and identification of the stride value; different stride values may even be mixed in the code sequence. Here we deal only with the changing-stride problem. The interleaved-stride problem is tackled in the next section.

Following is the T0_BI_1/S encoding algorithm, in which italic and underlined contents are newly added:

Figure 4: T0_BI_1/S encoding algorithm

And the corresponding T0_BI_1/S decoding algorithm, in which italic and underlined contents are newly added, is:

Figure 5: T0_BI_1/S decoding algorithm

The above algorithms are very simple and straightforward, and they work only for array accesses without any intervening data accesses. Nevertheless, with these simple ideas as the basis, many innovative schemes can be derived, such as the one introduced next.

3.4. Preserving Read/Write Continuities in a multiplexed data address sequence (T0_BI_1/S/RW)

Data memory is read and written by the CPU, both over the same set of address and data buses. While the data read sequence and the data write sequence each have their own stride characteristics, these characteristics are unfortunately torn apart and severely contaminated because read and write addresses are interleaved in a single address trace. How to preserve and utilize the individual read and write stride characteristics in bus encoding hence becomes an interesting problem.
As a result, if we can encode the read and write address sequences individually, we should gain more power savings. Figure 6 shows the T0_BI_1/S/RW block diagram. In this modification, the read/write control line, which exists in all memory systems, is used to indicate whether the address is a read or a write address. With this, we can encode each of the read and write address sequences separately using our variable-stride T0_BI_1/S method.

Figure 6: T0_BI_1/S/RW block diagram

Following is the T0_BI_1/S/RW encoding algorithm, in which italic and underlined contents are newly added:

Figure 7: T0_BI_1/S/RW encoding algorithm

And the corresponding T0_BI_1/S/RW decoding algorithm, in which italic and underlined contents are newly added, is:

Figure 8: T0_BI_1/S/RW decoding algorithm

3.5. Summary

We have introduced our low-power data address encoding/decoding schemes. The data address sequence is split into two streams, read addresses and write addresses, and each stream uses its own variable-stride T0_BI_1/S encoding logic. The decoding process is similar to encoding, and we omit its details here.
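The combination of variable strides and separate read/write streams can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: the stride-update rule used here (on a continuity miss, the new stride becomes the difference between the current and previous address of the same stream), the 32-bit width, the initial stride of 4, the all-zero initial state, and the names `StreamState`, `Codec`, and `roundtrip` are all hypothetical. The previous bus value is shared because reads and writes use the same physical address bus.

```python
WIDTH = 32                      # assumed bus width
MASK = (1 << WIDTH) - 1


class StreamState:
    """History kept separately for the read stream and the write stream."""
    def __init__(self):
        self.prev_addr = 0
        self.stride = 4         # assumed initial stride


class Codec:
    """One instance models the encoder, another the decoder."""
    def __init__(self):
        self.streams = {"R": StreamState(), "W": StreamState()}
        self.prev_bus = 0       # shared: both streams use one physical bus

    def _update(self, s, addr, sequential):
        # ASSUMED rule: learn a new stride whenever continuity fails.
        if not sequential:
            s.stride = (addr - s.prev_addr) & MASK
        s.prev_addr = addr

    def encode(self, rw, addr):
        s = self.streams[rw]
        sequential = addr == (s.prev_addr + s.stride) & MASK
        if sequential:
            bus, incv = self.prev_bus, 1            # frozen bus
        else:
            inverted = ~addr & MASK
            toggles = bin(addr ^ self.prev_bus).count("1")
            if toggles > WIDTH // 2 and inverted != self.prev_bus:
                bus, incv = inverted, 1             # inverted address
            else:
                bus, incv = addr, 0                 # direct address
        self._update(s, addr, sequential)
        self.prev_bus = bus
        return bus, incv

    def decode(self, rw, bus, incv):
        s = self.streams[rw]
        sequential = bool(incv) and bus == self.prev_bus
        if sequential:
            addr = (s.prev_addr + s.stride) & MASK
        elif incv:
            addr = ~bus & MASK
        else:
            addr = bus
        self._update(s, addr, sequential)
        self.prev_bus = bus
        return addr


def roundtrip(trace):
    """trace: list of ("R" or "W", address); returns (bus, incv, decoded)."""
    enc, dec = Codec(), Codec()
    results = []
    for rw, addr in trace:
        bus, incv = enc.encode(rw, addr)
        results.append((bus, incv, dec.decode(rw, bus, incv)))
    return results
```

With an interleaved trace such as reads at 0x2000, 0x2008, 0x2010, 0x2018 alternating with writes at 0x8000, 0x8004, 0x8008, each stream learns its own stride (8 and 4) after its second access, and every later access is sent with a frozen bus even though reads and writes alternate. A single shared history would see a different offset on every access and would never detect continuity in this trace.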

4. Simulation

We evaluate our design by simulation, using benchmarks to validate it. The target embedded system models a portable personal multimedia/communication device, and the test programs are selected accordingly. The performance metric is the ratio of reduced data address bus bit toggles. To simplify the presentation, only overall performance improvements are reported. Although readers may be interested in the effect of each individual technique and in its incremental effect on top of the other techniques, these data are not shown here due to page limits.

4.1. Simulation Environment

The assumptions for the simulated embedded system platform are listed below:
1. The processor is an ARM7TDMI, and there is only one processor in the system.
2. The memory is separated into two parts: instruction memory and data memory.
3. There is no cache memory.
4. All instructions are compiled in ARM mode.

There are four types of benchmarks, and each type has 2 programs in it. These benchmark programs are selected from MediaBench, a popular benchmark suite that includes multimedia and communication applications. The benchmarks are:
1. ADPCM audio coder and decoder
2. Efficient Pyramid Image Coder, an experimental image data compression utility
3. GSM full-rate speech transcoding
4. JPEG image compression and decompression

Figure 9 shows the flowchart of the simulation. The benchmarks are run in the ARM-Emulator7t, and the emulator dumps the trace of program execution. A simulator then takes the trace as input and counts the number of data address bus bit transitions. The numbers of bit transitions are collected for 1) a traditional data address bus, 2) a BI encoded bus, 3) a T0 encoded bus, 4) a T0_BI encoded bus, and 5) a T0_BI_1/S/RW encoded bus.

Figure 9: Simulation flowchart

4.2. Simulation Results

The goal of data address bus encoding methods is to reduce the switching activity on the data address bus, so the result of each encoding is presented as the percentage of reduced switching activities, which is calculated as

(% of reduced switching activities) = (reduced switching count) / (unencoded switching count).

The higher this value, the more effective the corresponding design. To make the results more readable, the following shows only the average over all eight benchmarks for each design.

Figure 10: Simulation Results (average reduced transitions, on a 0% to 30% axis, for our scheme, T0_BI, BI, and T0)
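The metric defined in Section 4.2 can be computed from two traces of bus words, as in the following Python sketch. It is an illustration only: the names `count_transitions` and `reduced_ratio`, the all-zero initial bus state, and the convention of packing any control lines into the encoded words as extra bits are assumptions made for the example, not details of the simulator described above.

```python
def count_transitions(words, initial=0):
    """Total number of bit toggles seen on the bus for a sequence of words."""
    prev = initial
    total = 0
    for word in words:
        total += bin(prev ^ word).count("1")   # lines that changed state
        prev = word
    return total


def reduced_ratio(unencoded_words, encoded_words):
    """(reduced switching count) / (unencoded switching count)."""
    base = count_transitions(unencoded_words)
    if base == 0:
        raise ValueError("unencoded trace has no transitions")
    # Control lines such as INCV are assumed to be packed into
    # encoded_words as extra bits so that their toggles are counted too.
    reduced = base - count_transitions(encoded_words)
    return reduced / base
```

As a small check, the unencoded sequence `[0x0, 0xF, 0x0, 0xF]` produces 12 transitions, while a hypothetical encoded sequence `[0x0, 0xF, 0xF, 0xF]` produces 4, so `reduced_ratio` gives 8/12, or about 67%.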

5. Conclusions

In this paper, we address low-power data address bus encoding. First, we have proposed the T0_BI_1 method, which integrates the T0 and BI methods using only one control line. Second, we have introduced a variable-stride method that deals with dynamically changing strides and can be combined with T0_BI_1. Lastly, we have used separate sets of encoding information for the read and write address sets to preserve their individuality. Our design achieves a 26% reduction of the original bit transitions, and its improvement rate compared with the T0_BI method is 226%. The simulation results show that our data address bus encoding scheme produces far fewer bit transitions.

To make the power estimation results more precise, a bus power model needs to be carefully constructed. The hardware overheads of the additional control lines and logic, including silicon area, delay, and power, also need to be evaluated.

Several related research directions are worth further study. First, we can allow multiple candidate strides and adaptively select among them for encoding. The challenge is how to determine which stride to use. Second, we can extend our bus encoding method to a multiplexed instruction/data address bus. Many designs with pin count or PCB real estate limitations use such a multiplexed bus. The challenge of this extension is how to capture and use the individual instruction, load, and store address sets when these three address sequences are interleaved on the multiplexed bus. Last, we can try to preserve more continuity in all kinds of address bus sequences, but how to do so remains an open question.

6. References

[1] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, "Address bus encoding techniques for system-level power optimization," Proc. of Design Automation and Test in Europe, pp. 861-866, Feb. 1998.
[2] S. Wuytack, F. Catthoor, L. Nachtergaele, and H. De Man, "Global communication and memory optimizing transformations for low power signal processing systems," IWLPD-94: ACM/IEEE International Workshop on Low Power Design, pp. 203-208, Apr. 1994.
[3] C. L. Su, C. Y. Tsui, and A. M. Despain, "Saving Power in the Control Path of Embedded Processors," IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 24-30, Winter 1994.
[4] Y. Aghaghiri, F. Fallah, and M. Pedram, "Irredundant address bus encoding for low-power," Proc. IEEE Int. Symp. Low-Power Electronics and Design, pp. 182-187, Aug. 2001.
[5] M. R. Stan and W. P. Burleson, "Bus-invert coding for low-power I/O," IEEE Transactions on VLSI Systems, Vol. 3, No. 1, pp. 49-58, 1995.
[6] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, "Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems," GLS-VLSI-97: IEEE 7th Great Lakes Symposium on VLSI, pp. 77-82, Urbana-Champaign, IL, March 1997.