Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Similar documents
Chapter 6 (Lect 3) Counters Continued. Unused States Ring counter. Implementing with Registers Implementing with Counter and Decoder

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

ISSN Vol.05,Issue.09, September-2017, Pages:

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

Implementing Synchronous Counter using Data Mining Techniques

of Soft Core Processor Clock Synchronization DDR Controller and SDRAM by Using RISC Architecture

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

RECENTLY, researches on gigabit wireless personal area

Resource Efficient Multi Ported Sram Based Ternary Content Addressable Memory

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

POWER REDUCTION IN CONTENT ADDRESSABLE MEMORY

ENGIN 112 Intro to Electrical and Computer Engineering

Implementation of FM0 and Manchester Encoder for Efficient Hardware Utilization

THE latest generation of microprocessors uses a combination

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

ISSN: [Garade* et al., 6(1): January, 2017] Impact Factor: 4.116

Area And Power Optimized One-Dimensional Median Filter

Fully Reused VLSI Architecture for DSRC Applications Using SOLS Technique

Design of FM0 and Manchester Encoder and Decoder for DSRC Application Using SOLS Technique

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

1. Introduction. Raj Kishore Kumar 1, Vikram Kumar 2

Low Power Floating-Point Multiplier Based On Vedic Mathematics

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit

LOGIC EFFORT OF CMOS BASED DUAL MODE LOGIC GATES

Area Efficient Z-TCAM for Network Applications

the main limitations of the work is that wiring increases with 1. INTRODUCTION

Philadelphia University Department of Computer Science. By Dareen Hamoudeh

High Performance Interconnect and NoC Router Design

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

A MULTIBANK MEMORY-BASED VLSI ARCHITECTURE OF DIGITAL VIDEO BROADCASTING SYMBOL DEINTERLEAVER

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

DUE to the high computational complexity and real-time

ARITHMETIC operations based on residue number systems

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai

The Serial Commutator FFT

FABRICATION TECHNOLOGIES

Testability Design for Sleep Convention Logic

LOW POWER SRAM CELL WITH IMPROVED RESPONSE

Optimal Built-In Self Repair Analyzer for Word-Oriented Memories

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

Design of Low Power Digital CMOS Comparator

A Countermeasure Circuit for Secure AES Engine against Differential Power Analysis

Implementation of Reduce the Area- Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay

Reconfigurable PLL for Digital System

Performance Evaluation of Guarded Static CMOS Logic based Arithmetic and Logic Unit Design

A Universal Test Pattern Generator for DDR SDRAM *

Code No: R Set No. 1

Implementation of SCN Based Content Addressable Memory

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

A Low Power DDR SDRAM Controller Design P.Anup, R.Ramana Reddy

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 13

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

Design and Implementation of 3-D DWT for Video Processing Applications

Fast FPGA Routing Approach Using Stochestic Architecture

1 MALP ( ) Unit-1. (1) Draw and explain the internal architecture of 8085.

AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

Compact Clock Skew Scheme for FPGA based Wave- Pipelined Circuits

Reducing Cache Energy in Embedded Processors Using Early Tag Access and Tag Overflow Buffer

POWER OPTIMIZATION USING BODY BIASING METHOD FOR DUAL VOLTAGE FPGA

Implementation of ALU Using Asynchronous Design

IMPLEMENTATION OF OPTIMIZED 128-POINT PIPELINE FFT PROCESSOR USING MIXED RADIX 4-2 FOR OFDM APPLICATIONS

FeRAM Circuit Technology for System on a Chip

Memories. Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu.

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

AN FFT PROCESSOR BASED ON 16-POINT MODULE

MODIFIED IMDCT-DECODER BASED MP3 MULTICHANNEL AUDIO DECODING SYSTEM Shanmuga Raju.S 1, Karthik.R 2, Sai Pradeep.K.P 3, Varadharajan.

Content Addressable Memory Architecture based on Sparse Clustered Networks

[Indu*, 5(4): April, 2016] ISSN: (I2OR), Publication Impact Factor: 3.785

Reference Sheet for C112 Hardware

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

Chapter 5 Registers & Counters

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

Design and Implementation of Advanced Modified Booth Encoding Multiplier

Programmable Memory Blocks Supporting Content-Addressable Memory

CS250 VLSI Systems Design Lecture 9: Memory

Design and Implementation of FPGA- based Systolic Array for LZ Data Compression

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

A LOSSLESS INDEX CODING ALGORITHM AND VLSI DESIGN FOR VECTOR QUANTIZATION

Model EXAM Question Bank

A 65nm Parallel Prefix High Speed Tree based 64-Bit Binary Comparator

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Analysis of Power Dissipation and Delay in 6T and 8T SRAM Using Tanner Tool

EXPERIMENT NUMBER 11 REGISTERED ALU DESIGN

Keywords: Fast Fourier Transforms (FFT), Multipath Delay Commutator (MDC), Pipelined Architecture, Radix-2 k, VLSI.

Fully Reconfigured VLSI architecture of FM0, Manchester and Miller Encoder for DSRC Applications

Ripple Counters. Lecture 30 1

HDL IMPLEMENTATION OF SRAM BASED ERROR CORRECTION AND DETECTION USING ORTHOGONAL LATIN SQUARE CODES

DETECTION AND CORRECTION OF CELL UPSETS USING MODIFIED DECIMAL MATRIX

Transcription:

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath University, Chennai 1 PG Scholar-VLSI, Department of Electronics and Communication Engineering, Bharath University, Chennai 2 PG Scholar-AE, Department of Electronics and Communication Engineering, Bharath University, Chennai 3 ABSTRACT: With the progress of VLSI technology, delay buffer plays an important role affecting the circuit design and performance. This paper presents the design of low power buffer using clock gating and gated driver tree. Since delay buffers are accessed sequentially, it adopts a gated clock ring counter addressing scheme. The ring counter employs double edge triggered (DET) flip flops instead of traditional flip flop to half the operating frequency. Also for generating clock gating signals, combinational elements (C-element) are implemented in the control logic to avoid the increasing loading of the global clock signal. For the clock distribution network, a gated driver tree technique is used and it further reduces the power consumption. In addition, this technique is used in the input and output ports of the memory to decrease their loading. The proposed delay buffer consumes less power comparing to the conventional delay buffers. INDEX TERMS: C-element, DET flip flop, delay buffer, clock gating, gated driver, ring counter. I. INTRODUCTION Due to the massive usage of portable battery operated, design of low power circuits has become an important one factor in modern VLSI technology. For these products, delay buffers which are also line buffers and delay lines are used. In the temporary storage of signals that are processed such serial access memory is needed. To compensate for the difference in the rate of being flow of date, holding data for use at a later timing, allowing timing corrections to be made on data stream, delaying the transit time of signal in order to allow other applications to occur delay buffers are used. Currently, most circuits adopt static random access memory (SRAM) plus some control/addressing logic to implement delay buffers. For smaller-length delay buffers, shift register can be used instead. In the long delay buffers SRAM-based delay buffers are more popular because of the compact SRAM cell size and small total area. There still can be considerable power consumption in the SRAM address decoder and the read/write circuits. In the proposed delay buffer, a gated clock ring counter is used to access the memory. Instead of single edge triggered flip flop, the ring counter uses double edge triggered (DET) flip flop to half the operating clock frequency. In the control logic for generating the clock gating signals to avoid the increasing loading of the global clock signal, combinational element (C-element) is used. In addition to gating the clock signal going to the DET flip flops in the ring counter, a gated clock driver tree is then applied to further reduce the activity along the clock distribution network. If no gating is applied, all drivers need to be activated. A driver tree distribution network is used for the global clock and activate only those drivers along the path from the clock source to the block that need to be activated by the clock[5]. This technique will greatly decrease the loading on distribution network of the clock signal for the Copyright to IJAREEIE www.ijareeie.com 4652

ring counter and thus the overall power consumption. The same technique is applied to the input driver and output driver of the memory part in the delay buffer. For the input circuitry, in each level of the driver tree, only one driver along the path leading to the addressed memory word is activated. Similarly, only one driver along the path from the addressed memory word to the output is activated for the read circuitry in each level of driver tree. Thus, the power wasted on the drivers can be eliminated that need not to be activated by this technique. BUFFERS II. CONVENTIONAL DELAY For implementing short delay buffers where area and power are of less importance, shift registers are used. Upon the application of clock pulses, data in the shift register can be moved.[2] Fig. 1. Delay buffer implementing shift registers In the pointer based design, a ring counter with only one rotating active cell to point the words for write-in and read-out. The bottom row of DFF is initialized with only one 1 and all the other DFFs are kept at 0. When a clock edge triggers the DFFs, this 1 signal is propagated forward. Consequently, the traditional binary address decoder can be replaced by this unary-coded ring counter. Compared to the shift register delay buffers, this approach propagates only one 1 in the ring counter instead of propagating - bit words.[2] Fig. 2. Pointer-based delay buffer Copyright to IJAREEIE www.ijareeie.com 4653

In the Oct system, i.e. every 8 DFFs in the ring counter are grouped into one block for the gated clock technique. Then, a gated signal is computed for each block to gate the frequently toggled clock signal when the block can be inactive so that unnecessary power wasted is eliminated.[2] Fig. 3. Ring counter with clock gated by R S flip flop As shown in fig. 3, when the input of the first DFF In the block is set to 1, it sets the output of R S flip flop to 1 at the next clock edge. Thus, the incoming 1 can be trapped in that block and continue to propagate inside the block. Contrary to this, when this block is active, other blocks are shut down. The successful propagation of 1 to the DFF in the next block can shut down the unnecessary clock signal in the current block. This technique of gated clock ring counter and gated driver are used to reduce power consumption during data write in or read out. III. GATED CLOCK RING COUNTER The ring counter which uses DET flip flop and C-elements are used for the generation of clock gating. To further reduce the activity along the clock distribution network, a gated clock driver tree is then applied. [2] Fig.4. Ring counters with clock gated by C-elements Copyright to IJAREEIE www.ijareeie.com 4654

In the fig.3, extra R-S flip flop still consume more power. So, it is replaced by C-element as in fig.4. Saving more power. 3.1 DET flip flop Double edge triggered flip flops are used instead of a Single edge triggered flip flop in the ring counter. Double edge triggered flip-fops are used to generate two outputs at a single clock pulse. In these flip-flops work will be done at rising edge and falling edge. Thus, clock frequency is reduced to one half. Fig 5. DET flip-flop 3.2 CLOCK GATING USING C-element A gating function is used to turn off the clock to some of the functional module for some extended period of time. Fig.6. C-element Copyright to IJAREEIE www.ijareeie.com 4655

The logic express for C-element [5] is given by C next = AB + BC + CA where A and B are its inputs and C and C next is the current and next output. If A=b, then the next output will be the same as the input. Otherwise, if A B, then output remains unchanged. Since the output of C-element can only be changed when A=B, it can avoid the possibility of glitches. It s avoiding excessive clock loading. It s works with hand shaking protocol. 3.3 GATED CLOCK DRIVER TREE Fig.7. Tree structure clock tree drivers with gating in Octree system. Loading on the global clock signal CLK is diminished further by using this technique. This method efficiently reduces the length of clock path that distributed in whole circuit and distribution layers also get reduced by which the tree contains 8 leaf. A driver tree distribution network is used for the global clock and activated only those drivers along the path from the clock source to the blocks that need to be driven by the clock. IV. GATED DRIVER TREE FOR INPUT AND OUTPUT This technique can eliminate the power wasted on the drivers that need not be activated. Of all the memory cells, only two words will be activated: one is written by the input data and the other is read to the output. Fig.8. gated driver tree of input driving circuitry (Octree system) Copyright to IJAREEIE www.ijareeie.com 4656

Driving the input signal all the way to all memory cells seems to be a waste of power. The same is in the case of read circuitry of the output port. In the input driving/output driving sensing circuitry in the memory module of the delay buffer, the gated clock tree technique is used. V. RESULTS A delay buffer based on the proposed techniques have designed and simulated. The tool used is Micro wind 3.1 for the backend and for power synthesis, Altera Quartus II 10.0 is used. Fig.9. Simulation result for buffer Fig.10. Forming the Mask files Copyright to IJAREEIE www.ijareeie.com 4657

POWER ANALYSIS Gated Ring Counter Structural 64 bits Simulated Power @1.2V, Time Scale 1ns, CMOS 90nm Quad System 1.164 mw Oct System 0.781 mw Difference = 0.383mW Comparison power analysis using Micro wind 3.1 VI. CONCLUSION In this paper, we presented a low-power delay buffer architecture which adopts several novel techniques to reduce power consumption. The ring counter with clock gated by the C-elements can effectively eliminate the excessive data transition without increasing loading on the global clock signal. An-other gated-demultiplexer tree and a gatedmultiplexer tree are used for the input and output driving circuitry to decrease the loading of the input and output data bus. Measurement results indicate that the proposed architecture consumes less than the conventional architecture that measured in Back End tool. VII. FUTURE ENHANCEMENT This low power circuit design is used in portable multimedia and wireless communication devices. More power reduction can be achieved by considering the memory section. Backend by using Micro wind 3.1 The table gives the comparison result between buffer using gated techniques and ring counter with DET flip flop for addressing and the buffer that uses binary counter for addressing. Thus, the proposed buffer consumes less power. REFERENCES [1] Low Power Design of an SRAM Cell for Portable Devices Int l Conf. on Computer & Communication Technology ICCCT 10 [2] a low-power delay buffer using gated driver tree ieee transactions on very large scale integration (vlsi) systems, vol. 17, no. 9, september 2009 [3] W. Eberle et al., 80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band, IEEE J. Solid-State Circuits, vol. 36, no.11, pp. 1829 1838, Nov. 2001. Copyright to IJAREEIE www.ijareeie.com 4658

[4] M. L. Liou, P. H. Lin, C. J. Jan, S. C. Lin, and T. D. Chiueh, Design of an OFDM baseband receiver with space diversity, IEE Proc.Commun., vol. 153, no. 6, pp. 894 900, Dec. 2006. [5] H. Mathew, A Low Power Memory Design Using Clock Gating Technique, ICVCI-2011, April 7 th, 2011. [6] G. Pastuszak, A high-performance architecture for embedded block coding in JPEG 2000, IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 9, pp. 1182 1191, Sep. 2005. [7] W. Li and L.Wanhammar, A pipeline FFT processor, in Proc. Workshop Signal Process. Syst. Design Implement., 1999, pp. 654 662. [8] E. K. Tsern and T. H. Meng, A low-power video-rate pyramid VQ decoder, IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1789 1794, Nov. 1996. [9] N. Shibata, M.Watanabe, and Y. Tanabe, A current-sensed high-speed and low-power first-in-first-out memory using a wordline/beltline- swapped dual-port SRAM cell, IEEE J. Solid-State circuits, vol.37, no. 6, pp. 735 750, Jun. 2002. [10] E. Sutherland, Micropipelines, Commun. ACM, vol. 32, no. 6, pp.720 738, Jun. 1989. [11] R. Hosain, L. D. Wronshi, and A. albicki, Low power design using double edge triggered flip-flop, IEEE Trans. Very Large Scale Integr.(VLSI ) Syst., vol. 2, no. 2, pp. 261 265, Jun. 1994.. Copyright to IJAREEIE www.ijareeie.com 4659