A Comparative Study of Power Efficient SRAM Designs

Similar documents
Column decoder using PTL for memory

Introduction to SRAM. Jasur Hanbaba

Integrated Circuits & Systems

+1 (479)

Low-Power SRAM and ROM Memories

PICo Embedded High Speed Cache Design Project

STUDY OF SRAM AND ITS LOW POWER TECHNIQUES

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

A Low Power SRAM Base on Novel Word-Line Decoding

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

CENG 4480 L09 Memory 2

8Kb Logic Compatible DRAM based Memory Design for Low Power Systems

Simulation and Analysis of SRAM Cell Structures at 90nm Technology

Minimizing Power Dissipation during Write Operation to Register Files

SRAM. Introduction. Digital IC

! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM

! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR

Low Power SRAM Design with Reduced Read/Write Time

High-Performance Full Adders Using an Alternative Logic Structure

International Journal of Advance Engineering and Research Development LOW POWER AND HIGH PERFORMANCE MSML DESIGN FOR CAM USE OF MODIFIED XNOR CELL

CMOS Logic Circuit Design Link( リンク ): センター教官講義ノートの下 CMOS 論理回路設計

Introduction to CMOS VLSI Design Lecture 13: SRAM

DESIGN AND IMPLEMENTATION OF 8X8 DRAM MEMORY ARRAY USING 45nm TECHNOLOGY

Digital Integrated Circuits Lecture 13: SRAM

Very Large Scale Integration (VLSI)

Design of Low Power Wide Gates used in Register File and Tag Comparator

Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology

Memory Design I. Semiconductor Memory Classification. Read-Write Memories (RWM) Memory Scaling Trend. Memory Scaling Trend

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

Low Power using Match-Line Sensing in Content Addressable Memory S. Nachimuthu, S. Ramesh 1 Department of Electrical and Electronics Engineering,

Unit 7: Memory. Dynamic shift register: Circuit diagram: Refer to unit 4(ch 6.5.4)

A Single Ended SRAM cell with reduced Average Power and Delay

A Novel Architecture of SRAM Cell Using Single Bit-Line

Dual Port SRAM. Research Article. Rajeshwari Mathapati a*, Geetanjali Kamble a and S.K.Shirakol a

CMPEN 411 VLSI Digital Circuits Spring Lecture 22: Memery, ROM

Design of 6-T SRAM Cell for enhanced read/write margin

A novel DRAM architecture as a low leakage alternative for SRAM caches in a 3D interconnect context.

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit

Analysis of Power Dissipation and Delay in 6T and 8T SRAM Using Tanner Tool

Semiconductor Memory Classification

Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology

Introduction to Semiconductor Memory Dr. Lynn Fuller Webpage:

Memory Arrays. Array Architecture. Chapter 16 Memory Circuits and Chapter 12 Array Subsystems from CMOS VLSI Design by Weste and Harris, 4 th Edition

Spiral 2-9. Tri-State Gates Memories DMA

6. Latches and Memories

Module 6 : Semiconductor Memories Lecture 30 : SRAM and DRAM Peripherals

Data Cache Final Project Report ECE251: VLSI Systems Design UCI Spring, 2000

Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative

SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road QUESTION BANK UNIT I

CS250 VLSI Systems Design Lecture 9: Memory

Prototype of SRAM by Sergey Kononov, et al.

Research Scholar, Chandigarh Engineering College, Landran (Mohali), 2

Design and Implementation of Low Leakage Power SRAM System Using Full Stack Asymmetric SRAM

EE577b. Register File. By Joong-Seok Moon

POWER ANALYSIS RESISTANT SRAM

Low Power PLAs. Reginaldo Tavares, Michel Berkelaar, Jochen Jess. Information and Communication Systems Section, Eindhoven University of Technology,

VERY large scale integration (VLSI) design for power

Reducing Instruction Cache Energy Using Gated Wordlines. Mukaya Panich. Submitted to the Department of Electrical Engineering and Computer Science

Optimized CAM Design

Novel low power CAM architecture

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS

A Low Power SRAM Cell with High Read Stability

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity. Donghyuk Lee Carnegie Mellon University

CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL

SYSTEM SPECIFICATIONS USING VERILOG HDL. Dr. Mohammed M. Farag

Memory Classification revisited. Slide 3

Improved Initial Overdrive Sense-Amplifier. For Low-Voltage DRAMS. Analog CMOS IC Design. Esayas Naizghi April 30, 2004

Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology

Learning Outcomes. Spiral 2-9. Typical Logic Gate TRI-STATE GATES

Dynamic CMOS Logic Gate

Lecture 11: MOS Memory

EECS 427 Lecture 17: Memory Reliability and Power Readings: 12.4,12.5. EECS 427 F09 Lecture Reminders

Unleashing the Power of Embedded DRAM

Chapter 3 Semiconductor Memories. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Introduction to CMOS VLSI Design. Semiconductor Memory Harris and Weste, Chapter October 2018

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

Design and Implementation of 8K-bits Low Power SRAM in 180nm Technology

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

Sense Amplifiers 6 T Cell. M PC is the precharge transistor whose purpose is to force the latch to operate at the unstable point.

The Memory Hierarchy 1

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007

Performance Analysis and Designing 16 Bit Sram Memory Chip Using XILINX Tool

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly)

POWER EFFICIENT SRAM CELL USING T-NBLV TECHNIQUE

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN

Memory Supplement for Section 3.6 of the textbook

Memories: Memory Technology

THE latest generation of microprocessors uses a combination

A Low Power Content Addressable Memory Implemented In Deep Submicron Technology

Digital Electronics. CHAPTER THIRTY TWO. Semiconductor Read-Only Memories

Silicon Memories. Why store things in silicon? It s fast!!! Compatible with logic devices (mostly) The main goal is to be cheap

250nm Technology Based Low Power SRAM Memory

Transcription:

A Comparative tudy of Power Efficient RAM Designs Jeyran Hezavei, N. Vijaykrishnan, M. J. Irwin Pond Laboratory, Department of Computer cience & Engineering, Pennsylvania tate University {hezavei, vijay, mji}@cse.psu.edu ABTRACT This paper investigates the effectiveness of combination of different low power RAM circuit design techniques. The divided bit line (DBL), pulsed word line (PWL) and isolated bit line (IBL) strategies have been implemented in a various size RAM designs and evaluated using 0.35Micron technology and 3.3V VDD at 100MHz frequency. Different decoder structures have been investigated for their power efficiency as well. It is observed that the power reduces by 29%, 32% and 52% over an unoptimized RAM design when (PWL+IBL), (PWL+DBL) and (PWL+IBL+DBL) are implemented in a 256*2 size RAM respectively. Keywords Low Power, RAM, oder. 1. INTRODUCTION In many current applications such as multimedia, the memory system has been demonstrated to be the main power-consuming unit. Thus, a significant effort has been invested in reducing the power of CMO RAM chips using circuit and architectural techniques. For DRAM's, the main techniques used to reduce power are partial activation of multi-divided arrays (both for word-line and bit-line), half-vdd precharging, and lowering operating voltage using external power supply reduction [4][5]. The techniques used in RAM's are of particular interest due to the large RAM-based on-chip cache structures employed in current processors. Partially activating a divided bit and word lines, isolating the sense amplifier from the bit line, pulsing the word driver and column circuitry, reducing the bit/word-line swings, and charge recycling in the I/O buffer are some of the techniques that can reduce the RAM power consumption [4][5]. In [1], an automatic-power-save architecture, a pulsed word technique and an isolated bit line technique reduced the power dissipation of the cache memory to almost 60% at a frequency of 60MHz and to 20% at 10MHz by these techniques for 0.5Micron technology, 3.3V VDD. In [2], a novel hierarchical divided bit-line approach for reducing active power in RAMs by reducing bitline capacitance was introduced. This approach was found to reduce the power consumption by 50-60%. In this paper, we apply a combination of divided bit-line approach and the isolated bit-line approach along with the pulsed word-line technique [3] in reducing the power consumption in RAM caches. In addition, we investigate the design of low power decoder designs for the cache. Various decoder designs have been investigated [6,7] for speed and power in the past. In this paper, we analyze six different decoder structures for their power efficiency. The rest of the paper is organized as follows. ection 2 details the design of the different components of the RAM structure for the different optimizations. The cache power result for the different combination of optimizations is presented in section 3. Finally, we close with conclusions. 2. RAM POWER REDUCTION TECHNIQUE The floorplan of basic cache block is shown in (Fig. 1). The core is a matrix of standard 6-transistor RAM cells. Rows are the word lines and columns include the bit lines. In this work the core's columns were initially divided into smaller chunks (divided bit line (DBL) (Fig. 2))[2,6]; however, the rows remained continuous. DBL enhances the read/write speed of the circuit as the long bit lines are split into shorter ones and the bit line capacitances are decreased. This eventually reduces the power consumption significantly. DBL also allows us to split the row decoders as only one segment needs to be activated at a time and the rest of the segments can be kept idle. It will be explained later in this paper that splitting the decoder will reduce the decoder power consumption. The decoder is a multi stage structure constructed of basic blocks, which were made of dynamic 3-input nand gates. It provides pulsed outputs. This way allows controlling the word line drivers, which enables

us to implement the Pulsed Word Line (PWL) scheme with no additional hardware overhead. The idea of PWL is to minimize the duration of active input on word lines by deactivating the word lines (and RAM cells) before the bit line voltages make a full swing. This leads to a reduced power consumption and enhanced speed. The last technique used in this work is the Isolated Bit line cheme (IBL). ense amplifiers are included in the memory-readcircuitry to speed up the read operation. Here, the sense amplifiers attached to the bit lines are isolated after they detect a sufficient voltage difference on the bit line and bit line/. This prevents a full swing on the entire bit line and saves energy. The sense amplifiers are also isolated from the bit lines during the entire write operation. 2.1 oder The speed of the row decoder has a great impact on memory performance [6]. The row decoder drives the word lines of the RAM array. At each read/write operation only one driver is active, making the RAM cells connected to the corresponding word line accessible. The pulsed word line scheme (PWL) can be implemented by gating the outputs of standard decoders with a control enable pulse (from the sensing of voltage swings on the bit lines), which limits the duration of active output signals. In our design, the PWL control is integrated in the row decoder design by use of dynamic logic. PreCharge oder Memory Core a d Read/Write Circuit Fig. 1: Basic Memory Block Diagram Pre Pre witch b e BitLines (2N Rows) N Rows N Rows W/R W/R c f Fig. 2: Divided Bit line cheme (DBL), witches are Controlled by input address. Fig. 3: Different 3 Input Basic Gates: a) CMO Nand, b) Dynamic Nand, c) kewed Nand, d)cmo Nor, e) Dynamic Nor, f) kewed Nor.

2.1.1 Basic oder Using fast, low power basic gates, can greatly optimize the delay and power consumption of the decoder. (Fig. 3) shows schematics of different core 3-input gates, which were used to build basic 3x8 decoders. In (Fig. 4) comparison between the hspice simulation results of all schematics is presented. The schemes used are as follows: a) CMO Nand, b) Dynamic Nand, c) kewed Nand, d) CMO Nor, e) Dynamic Nor, f) kewed Nor. Dynamic Nands with shared NMO and PMO transistors consume the least amount of power as they disconnect the direct VDD/GND path all the time [9]. Based on these results and according to dynamic gates' behavior basic decoders were built using 3-input dynamic nand gate. Watts 7.00E-003 6.00E-003 5.00E-003 4.00E-003 3.00E-003 2.00E-003 1.00E-003 0.00E+000 Average Power 6.57E-003 8.06E-005 9.35E-004 3.36E-004 6.26E-005 1.18E-004 a b c d e f Active Components For Addresses between 000000 and 000111 Address Idle Components Fig. 5: Two tage 6X64 oder howing Active and Idle Areas for Addresses Ranging between 000000 & 000111. PreCharge VDD Worst Case Delay econds 7.00E-010 6.00E-010 5.00E-010 4.00E-010 3.00E-010 2.00E-010 6.19E-010 4.51E-010 4.07E-010 3.79E-010 2.49E-010 2.10E-010 RAM RAM 1.00E-010 0.00E+000 a b c d e f BitLine BitLine/ Fig. 4: Average Power Consumption & Worst Case Delay of Circuits in Fig. 3. 2.1.2 Total tructure The complete decoder can be implemented by cascading the basic decoders in multi stages (based on the size of the decoder). (Fig. 5) shows a sample 6x64 decoder. Basic decoders are decoders explained in the previous part. A zero voltage on precharge input of the first stage grounds all the outputs of the first stage connected to the precharge input of the second stage decoders and consequently all the final outputs. When the precharge is at high voltage only one output of the first stage is going high. Hence, only one single output of the final stage will be active. This keeps the other components of the second stage idle and as no switching occurs in them they consume very little power. An example is shown in (Fig. 5). For an address ranging between 000000 and 000111 only the mentioned area is switching and the rest of the circuit sits idle. This way the dynamic power is minimized, while controllable pulses are generated at the output (for PWL). Wb Write Column elect RAM MUX ENE Amp IOLATE Wb/ Fig. 6: Overall View of Bit Line tructure and Connected Circuits.

2.2 Memory Core: Bit Line Architecture An overall view of the bit line structure used in this work is shown in (Fig. 6). Each column includes the following parts: 2.2.1 RAM Generic 6-transistor RAM cells (Fig. 7) [9] were used as memory core cells. They are costly and occupy large areas, but they are widely used due to important advantages including higher speed and less static current (consequently less power consumption). isolation transistors can be turned off after a minimum 10% voltage difference is sensed between the lines. BitLine/ BitLine Rd Rd/ Out VDD BitLine BitLine/ Fig. 8: Two tage ense Amplifier Fig. 7: ix-transistor RAM Cell GND 2.2.2 Pre charge Logic and Column MUX A pair of pull up transistors per each column was used in the design [8]. The pull up transistors are turned on before each read/write operation precharging both the bit line and bit line/ to a high voltage and then will be switched off at the beginning of the operation leaving the bit lines at the same voltage. During the operation either bit line or bit line/ will have a voltage swing while the other one will stay at precharge level voltage. While a precharge operation is performed the column MUX's disconnect the bit lines from the read/write circuitry. This minimizes the occurrence of direct Vdd/Gnd paths in the column, which is one of the major sources of power consumption in every circuit. However, this is not the major task of the column MUX. Column MUX's are basically used to avoid duplicating the read/write circuitry for all bit lines. A simple tree MUX was used in this work. 2.2.3 Read Circuitry ense Amplifier: During the read operation the voltage on one of the bit line or bit line/ will slightly start droping. A full swing on the line is rather a slow operation. Hence, sense amplifiers are used to sense the slight voltage difference and amplify it to a correct data value. Additional stages can boost the speed of read operation significantly [8]. A two stage sense amplifier was used (Fig. 8) in this work. Isolation Transistors: In IBL scheme a pair of isolation transistors are used to disconnect the sense amplifier from the bit lines by the time the correct data is detected. This technique reduces the read power as it prevents the complete swing on the bit lines. It also disconnects the sense amplifiers from the bit lines during write operation as they are not needed. Generally the 2.2.4 Write Circuitry Two pass transistors controlled by WB and WB/ signals are the devices used to control write operation. The two mentioned signals are assumed to be generated by a control unit not included in memory circuitry [8]. As mentioned, before the beginning of write operation both bit line and bit line/ are precharged and all the word lines are grounded. During the data write operation only one of the two signals is active and the connected pass transistor grounds the relative line forcing a zero voltage to that line. Depending on whether bit line or bit line/ is grounded a zero or one value will be written into the active RAM cell respectively. 2.3 Cache Power Characterization The layout of different memory power optimizations were done using the magic design tools using 0.35Micron technology and simulated using Avant! Hspice (100 MHz frequency and power supply voltage of 3.3 V). Initially DBL, IBL and PWL schemes were implemented. (Fig. 9) shows the percentage of average power consumed by each subcomponent during different operations. Operations are listed in (Table 1). The results shown were averaged over simulations performed on different RAM configurations ranging between (64, 128, 256 bits bit line size and 1, 2 and 4 bits word line size). Name Operation 1 H0-W0 Cell Holds 0 a 0 is written into that 2 H1-W1 Cell Holds 1 a 1 is written 3 H0-W1 Cell Holds 0 a 1 is written 4 H1-W0 Cell Holds 1 a 0 is written 5 R0 A 0 is read 6 R1 A 1 is read Table 1: Different Memory Operations

56.78% 70.30% 42.09% 28.92% H0-W0 Write: 0.44% Read: 0.69% H0-W1 Write: 0.30% Read: 0.48% 59.26% 70.00% 39.68% 29.22% H1-W1 Write: 0.42% Read: 0.65% H0-W1 Write: 0.31% Read: 0.48% 3. POWER AVING (Fig.11) shows the average power consumed by a 256*2 RAM with different memory power reduction schemes and compare them for different operations listed in (Table1). PWL is initially implemented in all schemes. The average power savings gained by different schemes are 29%, 32% and 52% using (IBL+PWL), (DBL+PWL) and (DBL+IBL+PWL) respectively. The power savings obtained using the IBL and DBL techniques are much less as compared to [1] and [2] due to the smaller size of our cell array and the decoder power is more significant in our design. Also, we observe a potential for more optimizations by combination of different circuit optimizations. 1.40E-03 50.29% 23.88% Read: 25.58% R1 Write: 0.25% 41.80% 28.59% Read: 29.32% R0 Write: 0.30% Watts 1.20E-03 1.00E-03 8.00E-04 6.00E-04 4.00E-04 Fig. 9: Average Power Consumption Percentage of Different Memory Components During Different Operations (Table 1). 2.00E-04 0.00E+00 H0-W0 H1-W1 H0-W1 H1-W0 R0 R1 Watts 6.00E-06 5.00E-06 4.00E-06 3.00E-06 2.00E-06 1.00E-06 0.00E+00 H0-W0 H1-W1 H0-W1 H1-W0 Fig. 10: Average Power Comparison of ingle Memory Cell For Operations Listed in Table 1. According to (Fig 9) the major amount of power is consumed by cell array. The amount of cell power consumption is different in each access according to the operation (Read/Write) and the values ( 0 / 1 ). (Fig. 10) shows the comparison between power consumed by each single cell (6-transistor RAM) in the core during each operation. The characterizing information provided in this figure can serve as useful information for modeling high-level cache power models for the 0.35Micron designs. R0 R1 IBL+PWL IBL+DBL+PWL PWL DBL+PWL Fig. 11: Memory Average Power Consumption Using Combinations of Different Power aving chemes. 4. CONCLUION The effectiveness of a combination of different low power RAM design circuit techniques that included the divided bit line (DBL), pulsed word line (PWL) and isolated bit line (IBL) strategies was investigated. A series of different size RAM s using these techniques was implemented in 0.35Micron, 3.3V technology and simulated at 100MHz frequency. It is observed that the power reduces by 29%, 32% and 52% over an unoptimized RAM design when (PWL+IBL), (PWL+DBL) and (PWL+IBL+DBL) are used in a 256*2 size RAM. 5. REFERENCE [1] himazi Y., et. al., An automatic-power-save cache memory for low-power RIC processors, IEEE ymposium on Low Power Electronics, pp. 58-59, 1995. [2] A. Karandikar and K. K. Parhi, Low power RAM design using hierarchical divided bit-line approach, Proc. of International Conference on Computer Design, pp.82-88, 1998.

[3] D. T. Wong, An 11-ns 8K*18 CMO static RAM with 0.5- mu m devices, IEEE Journal of olid-tate Circuits, 23(5):1095-1103, Oct. 1998. [4] K. Itoh, K. asaki, Y. Nakagome, Trends in low-power RAM circuit technologies, Proceedings of the IEEE, 83(4):524-543, April 1995. [5] M. B. Kamble, K. Ghose, Energy Efficiency of VLI Caches: A Comparative tudy, Proc. IEEE International Conference on VLI Design, pp. 261-267, 1997. [6] B.Amrutur, Design and Analysis of Fast Low Power RAMs, Ph.D. Thesis, Department of Electrical Engineering, tanford University, 1998. [7] B. Bhaumik, et. al., A low power 256 KB RAM design, Proc. 12th International Conference on VLI Design, pp. 67-70, 1999. [8]. M. Kang, Y. Leblebici, CMO Digital Integrated Circuits Analysis and Design, econd Edition, McGraw-Hill, 1999. [9] J. M. Rabaey, Digital Integrated Circuits a Design Perspective, Prentice Hall, 1996.