Low-Power Data Address Bus Encoding Method


Tsung-Hsi Weng, Wei-Hao Chiao, Jean Jyh-Jiun Shann, Chung-Ping Chung, and Jimmy Lu
Dept. of Computer Science and Information Engineering, National Chiao Tung University, HsinChu, Taiwan, ROC
IP Technology Department, SOC Technology Center, Industrial Technology Research Institute, HsinChu, Taiwan, ROC

Abstract-Reducing the power consumption of computer systems has gained much research attention recently. In a typical system, memory bus power constitutes well over 50% of total system power, and this power is consumed by bus signal transitions (0→1 or 1→0). Reducing the number of memory bus transitions is hence an effective way to reduce system power. While many techniques reduce power on the instruction address bus, only a few have been proposed for the data address bus. We present an encoding scheme to reduce data address bus power consumption. In this scheme, the data address bus can be frozen for sequential addresses, or inverted as appropriate in other cases. Furthermore, data addresses are classified into read addresses and write addresses, and each address set is encoded independently. Simulation results show that the overall reduction in bus line switching is 26% relative to an unencoded bus, or 14.5% relative to the previous T0_BI method [1].

Keywords: low-power, bus encoding, data address bus, T0_BI_1

1. Introduction

The increasing complexity of system-on-chip (SoC) designs has led to power consumption problems and, in turn, to cooling and reliability problems. Power consumption is becoming one of the most important design issues, especially for embedded systems. At the same time, the market for portable electronic devices such as mobile phones and personal digital assistants is growing dramatically. Because these products are battery-powered and the functionality that users demand keeps increasing, low-power design for these systems has become a very important research topic.
In a digital computer system, a major share of the power consumption comes from off-chip processor-memory bus traffic, as a result of the large capacitances of the bus lines. More specifically, it has been estimated that the capacitance driven by the I/O nodes is usually three orders of magnitude larger than that seen by the internal nodes of a microprocessor [2]. Design techniques that decrease power dissipation on external buses therefore have a significant impact on the overall power dissipation of the system. While many bus encoding techniques [3][4] focus on instruction address bus encoding, only a few address data address bus encoding. Although memory access instructions account for only 20% to 25% of all instructions, data address bus bit transitions account for 30% to 60% of all address bus transitions, due to the random bit-pattern nature of data addresses.

Techniques for reducing bit transitions on data address buses therefore deserve in-depth exploration. The rest of the paper is organized as follows: Section 2 describes the background of low-power data address bus encoding. Section 3 presents the proposed design for reducing the bit transitions on data address buses. Section 4 gives the performance results. The last section summarizes the work.

2. Background

In this section, the behavior of data address sequences and the related low-power bus encoding techniques are described. Some potential improvements to these existing designs, and other design issues not yet explored, are also identified.

The data address sequences of general-purpose computing are usually randomly distributed. However, accesses to arrays or to scalar data in loops do give the resulting data address sequences some pattern.

The Bus-Invert (BI) method [5] was proposed for random data sequence patterns. It inverts the bus value to be transmitted whenever doing so results in fewer bit transitions, which keeps the number of transitions to at most half of the bus width. An extra control line, called INV, indicates whether the bus value has been inverted.

The Zero-Transition (T0) method [6], proposed by Benini et al., avoids the transmission of sequential data completely. An extra control line, called INC, indicates whether the bus value is sequential to the last bus value. An asserted INC renders the signals on the bus lines meaningless, so these lines can remain unchanged to save power.

The T0_BI method [1] combines the BI and T0 methods and uses two extra control signals, INV and INC, to integrate both in one design. However, these two control lines may themselves produce an excessive number of bit transitions in many cases.

We believe that there is room for improvement in existing designs, and that there are new design issues for further study. First, the two control lines of the T0_BI method may be combined into one control line that indicates both the invert and the sequential condition. Second, the stride value of data addresses may be updated dynamically to allow for different data address strides. Finally, the read and write data address sets usually have distinct behaviors, and they should be encoded independently for the greatest benefit of both sets.

3. Designs

Our low-power data address bus encoding scheme is described in this section. Section 3.1 gives an overview of our designs, Sections 3.2 to 3.4 present the design details, and Section 3.5 gives the design summary.

3.1. Design Overview

Figure 1: Data address bus encoding architecture

Figure 1 shows our low-power data address bus encoding architecture. The encoder receives data addresses from the CPU and outputs the encoded address and some control signals. Encoded addresses and control signals are transmitted to the decoder at the data memory. When the decoder receives the encoded addresses and control signals, it converts this information back into the original data address. Two control lines, called Read/Write and enable, are traditional memory control signals, and we will make use of them later.
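As a point of reference for the versions that follow, the two baseline schemes from Section 2 can be sketched in a few lines of Python. This is an illustrative sketch and not code from the paper: the 32-bit bus width, the default stride of 4 bytes, and the names hamming, bi_encode, and t0_encode are assumptions made here, and the BI decision below ignores the INV line itself when counting transitions.

```python
WIDTH = 32                  # assumed address bus width
MASK = (1 << WIDTH) - 1


def hamming(a, b):
    """Number of bus lines that differ between two bus values."""
    return bin((a ^ b) & MASK).count("1")


def bi_encode(addr, prev_bus):
    """Bus-Invert: return (bus, inv). Invert when more than half the lines would toggle."""
    if hamming(addr, prev_bus) > WIDTH // 2:
        return ~addr & MASK, 1
    return addr, 0


def t0_encode(addr, prev_addr, prev_bus, stride=4):
    """T0: return (bus, inc). Freeze the bus when the address is sequential."""
    if addr == (prev_addr + stride) & MASK:
        return prev_bus, 1      # bus lines keep their old value, INC asserted
    return addr, 0
```

For example, bi_encode(0xFFFFFFFF, 0) sends the all-zero pattern with INV asserted, so only the control line toggles instead of all 32 address lines.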

Three versions of our low-power bus encoding scheme are proposed in this paper, with the second and third each built upon its predecessor:
1. T0_BI_1, combining T0 and BI using a single control line;
2. T0_BI_1/S, with Variable-Stride capability added;
3. T0_BI_1/S/RW, preserving read/write continuities in a multiplexed data address sequence.

3.2. Combining T0 & BI using single control line (T0_BI_1)

The T0_BI_1 design uses only one control line, called INCV, to provide both the INC and INV functions. To transmit a data address, the T0_BI_1 encoder first tests the continuity of the transferred address sequence. Continuity means that the current address is equal to the sum of the previous address and the stride. An address with the continuity property is transmitted with an asserted INCV and a frozen address bus. If the continuity test fails, the encoder checks whether the address pattern would produce bit transitions on more than half of the address bus lines and whether the inverted address pattern differs from the previous bus value. If both conditions hold, INCV is also asserted and the inverted address is sent over the address bus. Otherwise, the INCV line is de-asserted and the address is sent directly.

Upon activation, the decoder needs to identify the meaning of an asserted INCV line from the received bus value. If the bus value is unchanged, the INCV line is interpreted as an increment-by-stride indicator. Otherwise, it is interpreted as an invert indicator. In this way, the single INCV line acts as both the INC and the INV control signal, and address encoding still benefits from both the BI and T0 schemes.

Following is the T0_BI_1 encoding algorithm:

Figure 2: T0_BI_1 encoding algorithm

And the corresponding T0_BI_1 decoding algorithm is:

Figure 3: T0_BI_1 decoding algorithm

Note that the decoder interprets the meaning of the asserted INCV line according to the received bus value. If the encoder intends to invert the bus value but the inverted value happens to equal the current bus value, the decoder would erroneously interpret this as a frozen address bus. To avoid this error, the encoder in that case simply sends the current address out directly with INCV de-asserted. We believe that this situation is very unlikely, but the precaution must be taken.
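The encoding and decoding rules described in this section can be sketched as follows. Figures 2 and 3 are not reproduced in this text, so this Python sketch is reconstructed from the prose only and is not the authors' exact algorithm. The 32-bit width, the fixed stride of 4 bytes, the all-zero initial state, and the names T0BI1, encode, and decode are assumptions made for illustration.

```python
WIDTH = 32                  # assumed address bus width
MASK = (1 << WIDTH) - 1
STRIDE = 4                  # assumed fixed stride in bytes


def hamming(a, b):
    """Number of bus lines that differ between two bus values."""
    return bin((a ^ b) & MASK).count("1")


class T0BI1:
    """State for one end of the link; encoder and decoder each hold an instance."""

    def __init__(self):
        self.prev_addr = 0  # last original (or decoded) address
        self.prev_bus = 0   # last value present on the address lines

    def encode(self, addr):
        """Return (bus, incv) for the next address."""
        if addr == (self.prev_addr + STRIDE) & MASK:
            bus, incv = self.prev_bus, 1            # continuity: freeze the bus
        else:
            inverted = ~addr & MASK
            # Invert only if it saves transitions and cannot be mistaken
            # for a frozen bus by the decoder.
            if hamming(addr, self.prev_bus) > WIDTH // 2 and inverted != self.prev_bus:
                bus, incv = inverted, 1
            else:
                bus, incv = addr, 0
        self.prev_addr, self.prev_bus = addr, bus
        return bus, incv

    def decode(self, bus, incv):
        """Recover the original address from (bus, incv)."""
        if incv and bus == self.prev_bus:
            addr = (self.prev_addr + STRIDE) & MASK  # INCV means increment
        elif incv:
            addr = ~bus & MASK                       # INCV means invert
        else:
            addr = bus
        self.prev_addr, self.prev_bus = addr, bus
        return addr
```

Feeding the trace 0x1000, 0x1004, 0x1008, 0xFFFFEFFF through encode exercises all branches: the second and third addresses freeze the bus, and the fourth hits the corner case above, because its inverse equals the frozen bus value 0x1000 and so it is sent directly.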

3.3. T0_BI_1 with Variable-Stride capability (T0_BI_1/S)

Many data are structured (arrays, matrices, etc.), and accesses to such structured data have very predictable data addresses. We use the term stride to describe the byte offset between consecutive access addresses of this kind. Two factors affect the stride value. One is the data item size (64 bits for scientific data, 32 bits for general-purpose computing, and 16 or 8 bits for multimedia applications). The other is the access pattern (column, row, diagonal, ...) interacting with the storage scheme (row-major, column-major, others). These factors complicate the computation and identification of the stride value; different stride values may even be mixed in the code sequence. Here we deal only with the changing-stride problem. The interleaved-stride problem is tackled in the next section.

Following is the T0_BI_1/S encoding algorithm, in which italic and underlined contents are newly added:

Figure 4: T0_BI_1/S encoding algorithm

And the corresponding T0_BI_1/S decoding algorithm, in which italic and underlined contents are newly added, is:

Figure 5: T0_BI_1/S decoding algorithm

The above algorithms are very simple and straightforward, and they work only with array accesses that have no intervening data accesses. Nevertheless, with these simple ideas as the basis, many innovative schemes can be derived, such as the one introduced next.

3.4. Preserving Read/Write Continuities in a multiplexed data address sequence (T0_BI_1/S/RW)

Data memory is read and written by the CPU, both over the same set of address and data buses. While the read sequence and the write sequence each have their own stride characteristics, these characteristics are unfortunately torn apart and severely contaminated by the interleaving of read and write addresses in a single address trace. How to preserve and exploit the individual read and write stride characteristics in bus encoding hence becomes an interesting problem. If we can encode the read and write address sequences individually, we should gain more power savings.

Figure 6: T0_BI_1/S/RW block diagram

Figure 6 shows the T0_BI_1/S/RW block diagram. In this modification, the read/write control line, which exists in all memory systems, is used to indicate whether the address is a read or a write address. With this, we can encode the read and write address sequences separately using our variable-stride T0_BI_1/S method.

Following is the T0_BI_1/S/RW encoding algorithm, in which italic and underlined contents are newly added:

Figure 7: T0_BI_1/S/RW encoding algorithm

And the corresponding T0_BI_1/S/RW decoding algorithm, in which italic and underlined contents are newly added, is:

Figure 8: T0_BI_1/S/RW decoding algorithm

3.5. Summary

We have introduced our low-power data address encoding/decoding schemes. The data address sequence is split into a read stream and a write stream, and each stream uses its own variable-stride T0_BI_1/S encoding logic. The decoding process is similar to encoding, and we omit its details here.
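The scheme summarized above can be sketched in Python as follows. Because Figures 4 to 8 are not reproduced in this text, the stride update rule used here is an assumption: whenever the continuity test fails, the stride of that stream is set to the difference between the current and the previous address of the same stream. The 32-bit width, the initial stride of 4, the shared bus value, and the names StreamState and T0BI1SRW are likewise assumptions made for illustration.

```python
WIDTH = 32                  # assumed address bus width
MASK = (1 << WIDTH) - 1


def hamming(a, b):
    """Number of bus lines that differ between two bus values."""
    return bin((a ^ b) & MASK).count("1")


class StreamState:
    """Per-stream history: one instance for reads, one for writes."""

    def __init__(self):
        self.prev_addr = 0
        self.stride = 4     # assumed initial stride


class T0BI1SRW:
    """State for one end of the link; the address lines are shared by both streams."""

    def __init__(self):
        self.streams = {"R": StreamState(), "W": StreamState()}
        self.prev_bus = 0

    def encode(self, addr, rw):
        s = self.streams[rw]                 # rw is taken from the Read/Write line
        if addr == (s.prev_addr + s.stride) & MASK:
            bus, incv = self.prev_bus, 1     # continuity within this stream
        else:
            s.stride = (addr - s.prev_addr) & MASK   # assumed update rule
            inverted = ~addr & MASK
            if hamming(addr, self.prev_bus) > WIDTH // 2 and inverted != self.prev_bus:
                bus, incv = inverted, 1
            else:
                bus, incv = addr, 0
        s.prev_addr, self.prev_bus = addr, bus
        return bus, incv

    def decode(self, bus, incv, rw):
        s = self.streams[rw]
        if incv and bus == self.prev_bus:
            addr = (s.prev_addr + s.stride) & MASK
        else:
            addr = (~bus & MASK) if incv else bus
            s.stride = (addr - s.prev_addr) & MASK   # mirror the encoder's update
        s.prev_addr, self.prev_bus = addr, bus
        return addr
```

With reads stepping by 8 bytes from 0x2000 interleaved with writes stepping by 4 bytes from 0x8000, each stream needs two explicit transfers to learn its stride under this rule, after which every access is sent with a frozen bus.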

4. Simulation

We evaluate our design by simulation, using benchmarks to validate it. The target embedded system is a portable personal multimedia/communication device, and the test programs are selected accordingly. The performance metric is the ratio of reduced data address bus bit toggles. To simplify the presentation, only overall performance improvements are reported. Although readers may be interested in the effect of each individual technique and in its incremental effect on top of the other techniques, these data are not shown here due to page limits.

4.1. Simulation Environment

The assumptions for the simulated embedded system platform are listed below:
1. The processor is an ARM7TDMI, and there is only one processor in the system.
2. The memory is separated into two parts: instruction memory and data memory.
3. There is no cache memory.
4. All instructions are compiled in ARM mode.

There are four types of benchmarks, and each type has 2 programs in it. These benchmark programs are selected from MediaBench, a popular benchmark suite that includes multimedia and communication applications. The benchmarks are:
1. ADPCM audio coder and decoder
2. Efficient Pyramid Image Coder, an experimental image data compression utility
3. GSM full-rate speech transcoding
4. JPEG image compression and decompression

Figure 9: Simulation flowchart

Figure 9 shows the flowchart of the simulation. The benchmarks are run in the ARM-Emulator7t, and the emulator dumps the trace of program execution. A simulator then takes the trace as input and counts the number of data address bus bit transitions. The number of bit transitions is collected for 1) a traditional data address bus, 2) a BI encoded bus, 3) a T0 encoded bus, 4) a T0_BI encoded bus, and 5) a T0_BI_1/S/RW encoded bus.

4.2. Simulation Results

The goal of data address bus encoding methods is to reduce the switching activity on the data address bus, so the result of each encoding is presented as the percentage of reduced switching activities, which is calculated as

(% of reduced switching activities) = (reduced switching count) / (unencoded switching count) × 100%

The higher this value, the more effective the corresponding design. To make the results more readable, only the average over all eight benchmarks is shown for each design.

Figure 10: Simulation results (reduced transitions, in percent, for our scheme, T0_BI, BI, and T0)
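The metric defined in this section can be computed from a dumped sequence of bus values as in the following Python sketch. The function names, the 32-bit default width, and the all-zero initial bus state are assumptions made here. The paper does not state whether control-line toggles are included in the counts, so a caller who wants them counted would have to append the control bits to each bus value.

```python
def count_transitions(bus_values, width=32, initial=0):
    """Total number of bit toggles over a sequence of bus values."""
    mask = (1 << width) - 1
    total = 0
    prev = initial
    for value in bus_values:
        total += bin((value ^ prev) & mask).count("1")
        prev = value
    return total


def reduced_switching_pct(unencoded_count, encoded_count):
    """Percentage of reduced switching activities relative to the unencoded bus."""
    return 100.0 * (unencoded_count - encoded_count) / unencoded_count
```

As an arithmetic illustration with made-up counts, an unencoded trace with 200 toggles and an encoded trace with 148 toggles give reduced_switching_pct(200, 148) = 26.0.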

5. Conclusions

In this paper, we address low-power data address bus encoding. First, we have proposed the T0_BI_1 method, which integrates the T0 and BI methods using only one control line. Second, we have introduced a variable-stride method that deals with dynamically changing strides and can be combined with T0_BI_1. Lastly, we have used separate sets of encoding information for the read and write address sets to preserve their individuality. Our design reduces the original bit transitions by 26%, and its improvement rate relative to the T0_BI method is 226%. The simulation results show that our data address bus encoding scheme produces far fewer bit transitions.

To make the power estimation results more precise, a bus power model needs to be carefully constructed. The hardware overheads of the additional control lines and logic, including silicon area, delay, and power, also need to be evaluated.

Several related research directions are worth further study. First, we can allow multiple candidate strides and adaptively select among them for encoding. The challenge is how to determine which stride is to be used. Second, we can extend our bus encoding method to a multiplexed instruction/data address bus. Many designs with pin count or PCB real estate limitations use such a multiplexed bus. The challenge of this extension is how to capture and use the individual instruction, load, and store address sets from the three interleaved address sequences on the multiplexed bus. Last, we can try to preserve more continuity in all kinds of address bus sequences, but how to do so remains an open question.

6. References

[1] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, Address bus encoding techniques for system-level power optimization, Proc. of Design Automation and Test in Europe, pp. 861-866, Feb. 1998.
[2] S. Wuytack, F. Catthoor, L. Nachtergaele, and H. De Man, Global communication and memory optimizing transformations for low power signal processing systems, IWLPD-94: ACM/IEEE International Workshop on Low Power Design, pp. 203-208, Apr. 1994.
[3] C. L. Su, C. Y. Tsui, and A. M. Despain, Saving Power in the Control Path of Embedded Processors, IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 24-30, Winter 1994.
[4] Y. Aghaghiri, F. Fallah, and M. Pedram, Irredundant address bus encoding for low-power, Proc. IEEE Int. Symp. Low-Power Electronics and Design, pp. 182-187, Aug. 2001.
[5] M. R. Stan and W. P. Burleson, Bus-invert coding for low-power I/O, IEEE Transactions on VLSI Systems, Vol. 3, No. 1, pp. 49-58, 1995.
[6] L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems, GLS-VLSI-97: IEEE 7th Great Lakes Symposium on VLSI, pp. 77-82, Urbana-Champaign, IL, March 1997.