Filter-Based Dual-Voltage Architecture for Low-Power Long-Word TCAM Design

Similar documents
Ting-Sheng Chen, Ding-Yuan Lee, Tsung-Te Liu, and An-Yeu Wu, Fellow, IEEE

A Survey on Design of Content Addressable Memories

Ternary Content Addressable Memory Types And Matchline Schemes

Low Power using Match-Line Sensing in Content Addressable Memory S. Nachimuthu, S. Ramesh 1 Department of Electrical and Electronics Engineering,

Power Gated Match Line Sensing Content Addressable Memory

Design of high speed low power Content Addressable Memory (CAM) using parity bit and gated power matchline sensing

International Journal of Advance Engineering and Research Development LOW POWER AND HIGH PERFORMANCE MSML DESIGN FOR CAM USE OF MODIFIED XNOR CELL

A Low Power Content Addressable Memory Implemented In Deep Submicron Technology

Content Addressable Memory performance Analysis using NAND Structure FinFET

Optimized CAM Design

Implementation of Efficient Ternary Content Addressable Memory by Using Butterfly Technique

DESIGN ANDIMPLEMENTATION OF EFFICIENT TERNARY CONTENT ADDRESSABLE MEMORY

Design of low power Precharge-Free CAM design using parity based mechanism

Design of Low Power Wide Gates used in Register File and Tag Comparator

POWER REDUCTION IN CONTENT ADDRESSABLE MEMORY

EMDBAM: A Low-Power Dual Bit Associative Memory with Match Error and Mask Control

Content Addressable Memory Using Automatic Charge Balancing with Self-Control Mechanism and Master-Slave Match Line Design

Content Addressable Memory with Efficient Power Consumption and Throughput

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

Content Addressable Memory Architecture based on Sparse Clustered Networks

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

STUDY OF SRAM AND ITS LOW POWER TECHNIQUES

Resource Efficient Multi Ported Sram Based Ternary Content Addressable Memory

Implementation of SCN Based Content Addressable Memory

LOGIC EFFORT OF CMOS BASED DUAL MODE LOGIC GATES

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

Unique Journal of Engineering and Advanced Sciences Available online: Research Article

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

Memory Design I. Array-Structured Memory Architecture. Professor Chris H. Kim. Dept. of ECE.

Design of a Low Power Content Addressable Memory (CAM)

Research Scholar, Chandigarh Engineering College, Landran (Mohali), 2

MTJ-Based Nonvolatile Logic-in-Memory Architecture

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

NOISE ELIMINATION USING A BIT CAMS

Survey on Content Addressable Memory and Sparse Clustered Network

Analysis of 8T SRAM with Read and Write Assist Schemes (UDVS) In 45nm CMOS Technology

Implementation of Asynchronous Topology using SAPTL

Simulation and Analysis of SRAM Cell Structures at 90nm Technology

Design and Implementation of High Performance Application Specific Memory

Design of local ESD clamp for cross-power-domain interface circuits

LOW POWER SRAM CELL WITH IMPROVED RESPONSE

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

Survey on Stability of Low Power SRAM Bit Cells

Analysis of 8T SRAM Cell Using Leakage Reduction Technique

Minimizing Power Dissipation during Write Operation to Register Files

RECENTLY, researches on gigabit wireless personal area

120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014

VERY large scale integration (VLSI) design for power

Implementation of DRAM Cell Using Transmission Gate

Packet Classification Using Dynamically Generated Decision Trees

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1

554 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 5, MAY 2008

! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR

Power Analysis for CMOS based Dual Mode Logic Gates using Power Gating Techniques

Memory Design I. Semiconductor Memory Classification. Read-Write Memories (RWM) Memory Scaling Trend. Memory Scaling Trend

THE latest generation of microprocessors uses a combination

Design and Implementation of 8K-bits Low Power SRAM in 180nm Technology

+1 (479)

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 5 / JAN 2018

A Low Power 32 Bit CMOS ROM Using a Novel ATD Circuit

Novel low power CAM architecture

Tree-Based Minimization of TCAM Entries for Packet Classification

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

250nm Technology Based Low Power SRAM Memory

POWER ANALYSIS RESISTANT SRAM

ECEN 449 Microprocessor System Design. Memories

A Single Ended SRAM cell with reduced Average Power and Delay

Low Power SRAM Design with Reduced Read/Write Time

Analysis and Design of Low Voltage Low Noise LVDS Receiver

Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology

Semiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.

Unleashing the Power of Embedded DRAM

Design of 6-T SRAM Cell for enhanced read/write margin

8Kb Logic Compatible DRAM based Memory Design for Low Power Systems

Millimeter-Scale Nearly Perpetual Sensor System with Stacked Battery and Solar Cells

[Karpakam, 3(11): November, 2014] ISSN: Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

Design and simulation of 4-bit ALU Design using GDI Technique for Low Power Application on Microwind 2.6K

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

An Area-Efficient BIRA With 1-D Spare Segments

A 65nm LEVEL-1 CACHE FOR MOBILE APPLICATIONS

Design and Analysis of 32 bit SRAM architecture in 90nm CMOS Technology

DECOUPLING LOGIC BASED SRAM DESIGN FOR POWER REDUCTION IN FUTURE MEMORIES

Comparative Analysis of Contemporary Cache Power Reduction Techniques

DESIGN AND IMPLEMENTATION OF 8X8 DRAM MEMORY ARRAY USING 45nm TECHNOLOGY

1. Designing a 64-word Content Addressable Memory Background

Chapter 3 Semiconductor Memories. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead

Comparative Analysis of Low Leakage SRAM Cell at 32nm Technology

COMPARITIVE ANALYSIS OF SRAM CELL TOPOLOGIES AT 65nm TECHNOLOGY

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL

Lecture 13: SRAM. Slides courtesy of Deming Chen. Slides based on the initial set from David Harris. 4th Ed.

Lecture 14. Advanced Technologies on SRAM. Fundamentals of SRAM State-of-the-Art SRAM Performance FinFET-based SRAM Issues SRAM Alternatives

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY LOW POWER SRAM DESIGNS: A REVIEW Asifa Amin*, Dr Pallavi Gupta *

Digital Fundamentals. Integrated Circuit Technologies

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

ECEN 449 Microprocessor System Design. Memories. Texas A&M University

Transcription:

Filter-Based Dual-Voltage Architecture for Low-Power Long-Word TCAM Design Ting-Sheng Chen, Ding-Yuan Lee, Tsung-Te Liu, and An-Yeu (Andy) Wu Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C. {tsc, dan} @access.ee.ntu.edu.tw, ttliu@ntu.edu.tw, andywu@ntu.edu.tw Abstract Ternary Content Addressable Memory (TCAM) provides superior table-lookup performance to the conventional search algorithms. However, as word length increases, large-sized TCAM suffers from high power consumption and throughput degradation. This paper presents filter-based dual voltage architecture for long-word TCAM. Without sacrificing throughput and noise immunity, the proposed filter-based dual voltage architecture can reduce lots of dynamic and static power consumption. Simulation results show that the proposed technique can achieve 87% of dynamic power and 63% static power reduction in 40nm CMOS technology when compared with traditional TCAM architectures. This proposed architecture is well suitable for applications that need long-word TCAM. Fig. 1. Block diagram of content addressable memory (CAM). Keywords Low power, ternary content addressable memory (TCAM), dual rail voltage design, scalability issue. I. INTRODUCTION Ternary Content Addressable Memory (TCAM) provides superior table-lookup performance to the conventional search algorithms. It finds a match address in a clock cycle period by comparing a search word with all the stored word with its parallel architecture. As a result, TCAM had been widely employed in many searching/comparison applications that demand for a short search latency [1]-[3]. Fig. 1 shows the block diagram of CAM. A search word is broadcasted onto search-lines (SL) of the table of stored words. The size of a word is L-bit and the number of entries is typically from a few hundred to thousands. Each word has a match-line (ML) that indicates whether search data and stored word are the same. The priority encoder then outputs the match location. The CAM with mask ability is called Ternary CAM (TCAM). The mask bit of TCAM always matches the search bit regardless of the content of search bit. Although TCAM offers outstanding search performance, it is very power hungry and expensive due to its parallel architecture [4]. With the development of information and communication technology, the word size of table-lookup application is gradually increasing. For instance, IPv6 [5] with 144-bit of word length gradually replace IPv4 (32 bits) in network switch. Furthermore, many-field classification with more than 500-bit word size gains popularity in OpenFlow and Software-Defined Neworking environment [6]. Above examples indicate that the requirement of word length of information searching and comparing is rising. However, the word length of conventional 978-1-4673-8473-5/16/$31.00 2016 IEEE. 165 Fig. 2. Pull down currents from ML to ground: (a) match (b) 1-bit mismatch. TCAM architecture usually ranging from 32 bits to 144 bits [7] [8], which is quite insufficient for the increasing needs. Owing to the large ML capacitance, as word length increases, large sized TCAM suffers from not only high power consumption, but throughput degradation. Large-sized TCAM also affects the search operation. As shown in Fig. 2, the total ML currents of an L-bit NOR-type TCAM (1 entry) for the cases of match and 1-bit mismatch can be written as follows: I ML _ match 2 L I OFF, I ML _ miss I ON (2 L 1) I OFF, (1) (2) where ION and IOFF are the current of matched and mismatched cell, respectively.

increases, large sized TCAM suffers from not only high power consumption but throughput degradation. Although NANDtype TCAM consumes less power consumption; however, the pull down path is too long to be applied in the large-sized TCAM. As a result, NOR-type is more suitable for long word TCAM. Fig. 3. Conventional L-bit NAND-type TCAM. (1) and (2) show that if the cell ION / IOFF ratio decreases or the word length L increases, it would be hard to distinguish a match case from a 1-bit mismatch case. As a result, except for high throughput and low-power, keeping a robust noise margin is important in large-sized TCAM design. In this paper, we aim to reduce dynamic and static power consumption of long word TCAM. Moreover, the proposed technique should not sacrifice noise margin. The main contributions of this paper are as follows: 1) Filter-based architecture appeared in many related works is well suitable for long word TCAM. In this work, we further analysis the best size of the filter so that the dynamic power consumption can be optimized as low as possible. Appropriate size of word filter can reduce 87% dynamic power consumptions. 2) Without sacrificing any throughput and noise immunity, the proposed dual rail-voltage techniques can reduce 63% static power consumptions. Besides, the complexity of the control unit is relatively low. Fig. 4. High-level structure of the filter-based L-bit TCAM. The rest of this paper is organized as follows. Section II briefly introduces and analyzes the traditional TCAM architecture. Section III presents the proposed low power filter-based dual-rail voltage for long word TCAM. Section IV shows the simulation result of the proposed TCAM. Section V concludes this paper. II. BACKGROUND Fig. 5. Leakage current components in a NOR-type TCAM cell. There are two types of TCAM: NAND and NOR-type TCAM. Fig. 3 shows the NAND-type TCAM. When MLPre is logic 1, ML_NAND is pre-charged to high voltage by the PMOS P2. Search data are sent to XNOR cell, while the NMOS N1 is turned off to avoid short current to ground. When MLPre is logic 0, the PMOS P2 stop charging ML, and the NMOS N1 is turned on. If the input data match the stored word, the ML will be discharged to ground. The mask bits can turn on the pass transistor of NAND cell. Fig. 2 shows the NOR-type TCAM. Unlike NAND-type TCAM that pulls ML down when all cell match occurs, NOR-type TCAM discharge ML to ground when any mismatch happens. Owing to large capacitive loads of ML, as the word length Although a TCAM with maximum 640-bit of word length had been proposed in [10], the energy metric is 2.12 fj/bit/search, which is several times than other related works. Many energy reduction techniques are proposed to reduce TCAM power consumption [4], such as low swing match-line [7] and low swing search-line [8]. For dynamic power consumption, hierarchical ML is a very popular architecture to reduce high ML power consumption. For instance, a TCAM with Butterfly Matchline (Butterfly TCAM) proposed in [9] saves lots of dynamic power consumptions. Fig. 4 shows the high-level structure of filter-based TCAM. Since hierarchical ML can effectively filter the words, the later part of ML is seldom activated by pre-search phase. 166

Fig. 6. Filter-based dual-rail voltage architecture for long-word TCAM (1 entry). Despite the fact that hierarchical ML can reduce dynamic power significantly, most of power dissipation in the main search component is leakage power consumption, which still cause considerable power consumptions. Fig. 5 shows the leakage current in a TCAM cell. The red solid arrows are the sub-threshold leakage currents, I sp and I sn, from PMOS and NMOS transistors, respectively. The red dotted arrows are the gate-oxide leakage currents, resulting from any voltage difference between the gate and source/drain of a transistor. To reduce static power consumption, lowering the supply voltage is a popular technique at circuit level. However, lower supply voltage sacrificing noise immunity; even result more hard to distinguish a match case from a 1-bit mismatch case. III. FILTER-BASED DUAL RAIL VOLTAGE ARCHITECTURE Fig 6. shows the proposed filter-based dual-rail voltage architecture. The 1-bit pipeline register pass the match information of the first 15-bit TCAM. The activation of the main search component EN in Fig. 6 is given by: EN MLpre Match, (3) where MLpre is clock signal, pre-charge ML when logic one, and evaluate match result when logic zero. Match in (3) is the match result of pre-search filter, logic one means that input search word match the first 15-bit NOR TCAM. Only when both MLpre and Match are logic one would pre-charge high capacitive load of main search component. To ensure the function correctness, if ML_main in Fig.6 is activated but ML_filter transmit mismatch information in the next clock cycle, the ML_main in Fig.6 will be discharged to the ground. For convenience, the stored word in the pre-search TCAM is just the left 15-bit of the table. The content of the stored data is typically about 50% don t care data and 50% logic zero and logic one. If the matching probability of each bit in the pre-search is 0.75, the matching probability of the filter is (0.75) 15 0.013. Because of the low probability of mainsearch activation, the average bit of activated powerdemanding main-search is reduced from 497-bit to (497*(0.75) 15 ) 6.5-bit. TABLE I TWO MODES OF THE PROPOSED DUAL VOLTAGE TECHNIQUE Mode Active mode Data retention mode Match result of pre-search 1 (Match) 0 (Mismatch) Activation of main-search Supply voltage of storage cell ON 0.9V OFF 0.6V Since the pre-search stage can filter the word efficiently, most of power dissipation in the main search component is leakage power consumption. The operation still causes considerable power consumptions. Therefore, we lower the supply voltage of main search core cells to reduce static power consumption. In this paper, we validate and implement our design in 40nm CMOS technology. The supply voltage for normal logic operation is 0.9V. The threshold voltage is about 0.45V, depending on the design corner, temperature and process variation. We choose the lower supply voltage which is 0.6V. Despite the fact that we can use the supply voltage lower than 0.6V for more power reduction, lower voltage would result in more wake-up power consumptions and worse noise immunity. Table I indicates the dual modes of the proposed TCAM architecture. If the match result of pre-search is mismatch (logic-zero), the main-search component would be operated in data-retention mode. When the match result of pre-search is matched (logic-one), the supply voltage of main-search component would be raised to 0.9V for better noise margin and operation speed. IV. SIMULATION RESULTS The proposed filter-based dual-rail voltage TCAM architecture is simulated in CMOS 40nm technology. The search delay is 2ns. We build an array with 512-bit of word length and 128-word TCAM structure. To validate the timing stability of the proposed TCAM, the ML_main should be pulled down in time when the case of 1-bit mismatch; 167

Fig. 8. Expectation value of power consumption for different size of filter. TABLE III SUMMARY OF DIFFERENT TERNERY CONTENT-ADDRESSABLE MEMORY Fig. 7. Simulated waveforms of the filter-based dual voltage architecture. TABLE II POWER REDUCTION OF THE PROPOSED ARCHITECTURE (PER BIT) This work (Simulation) Butterfly [9] 4-parallel TCAM [10] Technology 40nm 65nm 28nm w/o Filter & Dual Voltage w/ Filter & Dual Voltage Reduction Supply Voltage 0.9/0.6 V 1.0 V 0.85 V Static Power 22.5318 nw 8.2526 nw -63.3 % Word-length 512 bits 144 bits 640 bits Dynamic Power (Mismatch) Dynamic Power (Match) 0.247 0.247 0.031 0.2836-87.4 % Throughput 500 MHz 400 MHz 400 MHz +15.0 % Energy Metric (fj/bit/search) 0.059 0.165 1.96 conversely, when the case of the match case, ML_main should remain at logic one. Fig. 7 shows simulated waveforms of the filter-based dual voltage structure. Traditional low voltage design sacrifices noise immunity and takes more evaluation time when 1-bit mismatch occurring. However, with filter-based dual rail architecture, such long-word TCAM can achieve low dynamic and static power consumption, with more stable noise margin. The dynamic power consumption is evaluated with 50% mask data and toggled input. The simulated dynamic power consumption is normalized to energy metric (i.e., /Bit) at 500-MHz search frequency. Table II shows the simulated power results with the proposed filter-based dual-rail voltage architecture. We can see a significant power reduction with the proposed architecture. A little increase (15%) in dynamic power when a match occurs in the word-filter, which is coming from raising the supply voltage of the TCAM core cell to 0.9V. Fig 8. indicates the expected value (PEX) of energy metric for each rule, which is derived as follows: PEX Filter(k)+( retention(512k) active(512-k) 168 where filter(k) is the power consumption of k-bit word filter, retention(512-k) and active(512-k) are sleep and active power consumption of the main search component, respectively. Length of word filter is scanned from 1-bit to 50-bit. Since dual rail voltage architecture would cause the penalty of wakeup power consumptions, the optimal length of word-filter would increase; the stricter rule filter would help main search component save more power consumption. However, longer length of word-filter would cause more dynamic power consumption itself. As the result, we choose 15-bit length as filter for minimal power consumptions. Table III shows a comparison of the proposed TCAM with other TCAM architectures. As we can see, owing to the filterbased dual voltage architecture, our TCAM architecture can support long search word with very low power consumption, This filter-based dual voltage architecture is well suitable for applications that needs long-word TCAM in the future. CONCLUSION In this paper, we present filter-based dual-rail voltage architecture for large-sized TCAM. Compare to traditional TCAM architectures, the proposed filter-based dual-rail voltage architecture reduces 87% of dynamic and 63% of

static power consumption without sacrificing any throughput. In addition to the significant reduction of power consumptions, this technique can let long-word TCAM be operated at stable and robust noise margin without sacrificing noise immunity. This filter-based dual voltage architecture is well suitable for applications that needs long-word TCAM. ACKNOWLEDGMENT This work was supported in part by the Ministry of Science and Technology, ROC, under grant 103-2218-E-002-033, 104-3115-E-002-005, and 103-2218-E-002-029-MY2. REFERENCES [1] M. Nakanishi and T. Ogura, Real-time CAM-based Hough transform and its performance evaluation, Machine Vision Appl., vol. 12, no. 2, pp. 59 68, Aug. 2000. [2] P.-F. Lin and J. Kuo, A 1-V 128-kb four-way set-associative cmos cache memory using wordline-oriented tag-compare (WLOTC) structure with the content-addressable-memory (CAM) 10-transistor tag cell, IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 666 675, Apr.2001. [3] C.-C. Wang, C.-J. Cheng, T.-F. Chen, and J.-S. Wang, An adaptively dividable dual-port BiTCAM for virus-detection processors in mobile devices, IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1571 1581, May 2009. [4] K. Pagiamtzis and A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey, IEEE J. Solid- State Circuits, vol. 41, no. 3, pp. 712 727, Mar. 2006. [5] Deering and R. Hinden, RFC 1883: Internet Protocol Version 6 (IPv6) Specification, Internet Engineering Task Force (IETF), Dec.1995. [6] Nunes, B.A.A, et al. "A Survey of Software-Defined Networking: Past, Present, and Future of Programmable Networks," IEEE Commun. Surv. Tut., vol.16, no.3, pp.1617-1634, Third Quarter 2014 [7] C.-S. Lin et al., A 250-MHz 18-Mb full ternary CAM with low-voltage matchline sensing scheme in 65-nm CMOS, IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2013 2671, Nov. 2013. [8] B.-D. Yang, Y.-K. Lee, S.-W. Sung, J.-J. Min, J.-M. Oh, and H.-J. Kang, A low power content addressable memory using low swing search lines, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no.12, pp. 2849 2858, Dec. 2011. [9] P.-T. Huang and W. Hwang, A 65 nm 0.165 fj/bit/search 256x144TCAM macro design for IPv6 lookup tables, IEEE J.Solid- State Circuits, vol. 46, no. 2, pp. 507 519, Feb. 2011. [10] K. Nii, T. Amano, N. Watanabe, M. Yamawaki, K. Yoshinaga, M. Wada, and I. Hayashi, A 28nm 400MHz 4-parallel 1.6Gsearch/s 80Mb ternarycam, in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb 2014, pp. 240 241. 169