Resource Efficient Multi Ported Sram Based Ternary Content Addressable Memory

Similar documents
Resource-Efficient SRAM-based Ternary Content Addressable Memory

Area Efficient Z-TCAM for Network Applications

High Performance and Energy Efficient SRAM-based Architecture for TCAM

CONTENT-addressable memory (CAM) is a particular

POWER REDUCTION IN CONTENT ADDRESSABLE MEMORY

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

Ternary Content Addressable Memory Types And Matchline Schemes

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

Pak. J. Biotechnol. Vol. 14 (Special Issue II) Pp (2017) Keerthiga D.S. and S. Bhavani

Content Addressable Memory with Efficient Power Consumption and Throughput

Design of a High Speed FPGA-Based Classifier for Efficient Packet Classification

Implementation of SCN Based Content Addressable Memory

Content Addressable Memory Architecture based on Sparse Clustered Networks

Unique Journal of Engineering and Advanced Sciences Available online: Research Article

Content Addressable Memory performance Analysis using NAND Structure FinFET

Filter-Based Dual-Voltage Architecture for Low-Power Long-Word TCAM Design

ISSN Vol.05,Issue.09, September-2017, Pages:

Implementation of Efficient Ternary Content Addressable Memory by Using Butterfly Technique

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

Low Power using Match-Line Sensing in Content Addressable Memory S. Nachimuthu, S. Ramesh 1 Department of Electrical and Electronics Engineering,

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES

Index Terms- Field Programmable Gate Array, Content Addressable memory, Intrusion Detection system.

Three DIMENSIONAL-CHIPS

International Journal of Advance Engineering and Research Development LOW POWER AND HIGH PERFORMANCE MSML DESIGN FOR CAM USE OF MODIFIED XNOR CELL

Optimized CAM Design

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

NOISE ELIMINATION USING A BIT CAMS

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

Scalable Ternary Content Addressable Memory Implementation Using FPGAs

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA

Tree-Based Minimization of TCAM Entries for Packet Classification

AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

DESIGN ANDIMPLEMENTATION OF EFFICIENT TERNARY CONTENT ADDRESSABLE MEMORY

Power Gated Match Line Sensing Content Addressable Memory

High Performance Interconnect and NoC Router Design

Reliability of Memory Storage System Using Decimal Matrix Code and Meta-Cure

Online Heavy Hitter Detector on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

ISSN Vol.05, Issue.12, December-2017, Pages:

Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

INTRODUCTION TO FPGA ARCHITECTURE

Design and Implementation of 3-D DWT for Video Processing Applications

Delay efficient mux approach for finding the first two minimum values

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

Design of high speed low power Content Addressable Memory (CAM) using parity bit and gated power matchline sensing

LOW-POWER SPLIT-RADIX FFT PROCESSORS

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

Scalable Packet Classification on FPGA

An Architecture for IPv6 Lookup Using Parallel Index Generation Units

A Survey on Design of Content Addressable Memories

The Serial Commutator FFT

Design of low power Precharge-Free CAM design using parity based mechanism

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

Novel Hybrid k-d-apriori Algorithm for Web Usage Mining

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

FPGA-BASED DATA ACQUISITION SYSTEM WITH RS 232 INTERFACE

ARITHMETIC operations based on residue number systems

Implementation of Full -Parallelism AES Encryption and Decryption

EMDBAM: A Low-Power Dual Bit Associative Memory with Match Error and Mask Control

An Efficient Design of Sum-Modified Booth Recoder for Fused Add-Multiply Operator

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

the main limitations of the work is that wiring increases with 1. INTRODUCTION

Keywords: Fast Fourier Transforms (FFT), Multipath Delay Commutator (MDC), Pipelined Architecture, Radix-2 k, VLSI.

Packet Classification Using Dynamically Generated Decision Trees

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO

Fast FPGA Routing Approach Using Stochestic Architecture

Research Scholar, Chandigarh Engineering College, Landran (Mohali), 2

Design and Implementation of High Performance DDR3 SDRAM controller

Critical-Path Realization and Implementation of the LMS Adaptive Algorithm Using Verilog-HDL and Cadence-Tool

Design and Implementation of FPGA- based Systolic Array for LZ Data Compression

Efficient And Advance Routing Logic For Network On Chip

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture

Design and Verification of Area Efficient High-Speed Carry Select Adder

VLSI Implementation of Parallel CRC Using Pipelining, Unfolding and Retiming

A Modified CORDIC Processor for Specific Angle Rotation based Applications

DESIGN AND IMPLEMENTATION ARCHITECTURE FOR RELIABLE ROUTER RKT SWITCH IN NOC

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Reconfigurable PLL for Digital System

An Area-Efficient BIRA With 1-D Spare Segments

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

ISSN Vol.08,Issue.12, September-2016, Pages:

FIR Filter Architecture for Fixed and Reconfigurable Applications

Verilog for High Performance

Design of an Efficient 128-Bit Carry Select Adder Using Bec and Variable csla Techniques

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

Assistant Professor, PICT, Pune, Maharashtra, India

A Simple Method to Improve the throughput of A Multiplier

DESIGN AND IMPLEMENTATION OF ADDER ARCHITECTURES AND ANALYSIS OF PERFORMANCE METRICS

II. MOTIVATION AND IMPLEMENTATION

Transcription:

IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 PP 11-18 www.iosrjen.org Resource Efficient Multi Ported Sram Based Ternary Content Addressable Memory S.Parkavi (1) And S.Bharath (2) PG Student,M.E, VLSI design, Surya group of institution, Tamil nadu, India Assistant professor, Communication system, Surya group of institution, Tamil nadu, India Abstract: Static Random Access Memory Is Constructed From Ternary Content Addressable Memory Which Provides TCAM Functions By Following It With SRAM. However In SRAM, The Following Of TCAM Suffers From Reduced Memory Efficiency While Map In The TCAM Table. This Is Because Of Low Physical Address And Limited Capacity In SRAM. The Optimal Resource Of Memory Architecture Is Resource Efficient Of TCAM. The Address Information Or Stored In Multiple Virtual Blocks Of SRAM Which Is Presented In The TCAM Table. In Mapping A Greater Position Of TCAM It Increases The Overall Address Space In SRAM Unit. The Overall Followed TCAM Bits/SRAM In A Reduced Throughput. A 512 X 72bit Of Resource Efficient SRAM Based TCAM Consumes A Few Distributed Rams Via Implementation On Xilinx SPARTAN LX9 Field Programmable Gate Array. When Comparing To The Conventional SRAM Based TCAM It Utilizes Only 3.3% Of Memory. Key Words - Field Programmable Gate Array (FPGA), Memory Architecture, Memory Throughput, SRAM Based Ternary Content Addressable Memory, Static Random Access Memory (SRAM). I. Introduction The main function of Ternary Content Addressable Memory (TCAM) is to explore the contents stored in the SRAM memory. This is searched by accessing of the data by a content not by an address. This is different from a static random access memory because static random access memory which is used to search the content by means of memory. The search query is compared to the TCAM with the contents that are already loaded in entire memory continuously and originated results. In the three states the TCAM can be used as follows two binary states 0 s and 1 s and don t care states (X). Data cells stores the binary cells and mask cells stores the don t care cells. The representative TCAM has the primary advantages of quick searching when compared to the SRAM but it also has a limitations. The more transistors are used in TCAM cells so the memory storage has the high production cost per bit and low storage efficiency than SRAM care. This is mainly due to don t care states the additional hardware is needed in terms of bits. To control such limitation in the representative TCAM such as high energy consumption heavy licensing, limited storage capacity, poor scalability and high cost by some TCAM vendors. The SRAM is required to map TCAM bits in the SRAM based TCAM memory architecture. In the search operation the binary query string of fixed length is required. It creates the vector in the matched address location. The logical one is created as a matched location and zero treated as a mismatched location. When the multiple matched bits are in the vector table the priority encoder is used to prioritize the location. The data s are stored in SRAM by an sequential manner the following of TCAM as more significance in the usage of decreasing memory efficiency it mainly concentrated in memory and throughput. The following architecture is to make use of memory throughput trade off. Fig.1. operation of content addressable memory International Conference On Progressive Research In Applied Sciences, Engineering And Technology 11 Page

The rest of the brief is as follows. In the section II previous works on SRAM based TCAM. Section III discussed about proposed of low power multi-port resource efficient SRAM based TCAM architecture. Section IV discusses the virtual blocks of memory architecture. The results analysis and comparison of conventional SRAM based TCAM and multi ported resource efficient SRAM based TCAM are discussed in section V. section VI shows the hardware implementation on FPGA section VII concludes this brief. II. Prievious Works On Sram Based Tcam In FPGA flat form several SRAM based TCAM have been implemented it is implemented on Xilinx FPGA it consists of primitive blocks of RAMS the rams are divided into BRAMs and distributed rams during the search operation the overall power consumption increases and SRAM are activated. To avoid the power consumption low power resource efficient search operation is proposed. The FPGA based classification engine and UETCAM are executed using Altera and Xilinx respectively. The relatively small TCAMs are divided logically from larger TCAM the researcher HP TCAM hybrid portion and SRAM based TCAM ZTCAN and ETCAM. The SRAM architecture is divided into two. The presence of QS is stored in SRAM the address information of particular QS in stored in second SRAM. rest memory architecture to achieve a reduced throughput and memory efficient. In the single rest architecture the high level organization of W X H-W Addresses and H bit of inputs clocks and outputs are shown in fig. the construction of rest block consists of consists of virtual blocks, single port architecture, a single multiplexer with an m inputs, and a single de multiplexer with m outputs, priority encoder and w bit blocks of output. The input of H bit with query strings is divided into large number of B bit substrings. The divided substrings are used as an address in the distributed RAMs with the length of 2b x 1 bit The distributed RAMs are used as an early elimination table in rest memory architecture. The data that present in the distributed ram are used to find the early elimination of substrings in B bit. The outputs of distributed RAMs are single bit output. The outputs are ANDed to form a single ENABLE signal to the virtual table. When the ENABLE signal is logically low the rest architecture is inactivated. Fig.2. Memory architecture of a single REST block of dimension. In addition to the storage of first SRAM in HP TCAM and the second TCAM is later used for generation of address. In the previous work the consider throughput of resource in a following of TCAM consumes huge SRAM. The explosion of SRAM bits in different physical block is essential to have emulation scheme in architecture. Thus the multi ported resources efficient is proposed. International Conference On Progressive Research In Applied Sciences, Engineering And Technology 12 Page

Fig. 3. Organization of the REST blocks to create larger TCAM. III. Proposed Of Low Power Multiport Sram Based Tcam The following of rest memory architecture is derived from classical TCAM for resource efficient ways the same size of strings are searched against QS in a TCAM and strings matches the address are terminated as outputs The virtual blocks in the SRAM plays an important role in Fig. 4. Organization of multiport REST blocks to create larger TCAM. The B bit substrings are correspondingly given to the M bit multiplexer the clock of M bit inputs are also given to the multiplexer. The virtual blocks are 2 b X W bits and the original address of W bit has the same data which B bit substring used. The single post SRAM stores AI which is H X W. The matched address of query string with preloaded data from the single post SRAM with the combination of virtual blocks are directed to M bit de-multiplexer with corresponding W bit register. IV. Virtual Blocks Of Memory Architecture The single port SRAM has the w-bit address and 2 n address. The w-bit address and n-bit data stream are used to store in TCAM table for the process of following of TCAM. AI table is corresponding to the w-bit. The original address position of w-bit represents the storage of query string. The type of mapping the relation of SRAM and TCAM are as follows, International Conference On Progressive Research In Applied Sciences, Engineering And Technology 13 Page

SRAM UNIT (2nxw=TCAM (WXN)) (1) The equation shows that SRAM based TCAM can follows size up to down bit and 2 n xw. Each size of virtual blocks 2 contains TCAM functionality. The rest memory architecture consists of SRAM with virtual blocks. When the virtual blocks increases Emulated TCAM Bit (ETB) also increases. It consumes large number of bit that available in SRAM. This is known as maximal memory efficiency. AI in the SRAM unit allow only limited TCAM bit to add a more number of Fig. 5. VBs of single-port SRAM unit. limited memory resources. The SRAM based TCAM does not give the required throughput because of ETB that has limited of SRAM size in the application specific integrated circuits (ASIC). Fig. 7. Analysis of energy/bit/search of emulated CAM Architectures. A comparison with the conventional TCAM structure the multiport resource efficient SRAM based TCAM has built from m to obtain ETB, latency and throughput. To the best of our knowledge, this is the first work to show relationship between performance parameters of ternary content addressable memory architecture TCAM bit the large number of SRAM required. To overcome the problem the SRAM is half divided address space. The mathematical equation has the pre parameters that are (m, n, W) ETB=m x {w x (n-log2m)} (2) 4.1 Throughput ETB tradeoffs: The comparison of ETB and throughput were in bits and megabit respectively. The clock frequency of 500MHZ that are operated in 512X72 SRAM 1 to 128 of virtual blocks in SRAM increases to 648 to 18432 bits. International Conference On Progressive Research In Applied Sciences, Engineering And Technology 14 Page

The decreases in throughput from 36 gigabit to 281 megabit. The throughput is a calculation of product of clock and W bits. A throughput can be calculated in single rest block. The address bit per second is produced in a rest block is called throughput. The clock frequency is depended on maximum frequency. The increase in maximum frequency decreases the clock frequency so that the throughput is reduced. Throughput = clock x W (3) The virtual block in the rest memory architecture m is always is greater than 1 when ETB values are calculated without virtual blocks than the m=1. This causes the SRAM based TCAM are implemented in practical design V. Result Analysis And Comparison: 5.1 Energy Efficiency Analysis: When comparing to the various TCAM energy efficiency the proposed multiport REST become efficient. The X- axis shows the sizes and cases of emulated TCAM and Y-axis shows the energy per bit. Fig.8. analysis of number of slice register 5.2 Memory Efficiency Analysis: When compared to single port resource efficient SRAM based TCAM of 72 x 28 with SRAM based TCAM of 512x72 the physical memory is increased Fig.9. output of SRAM based TCAM with 72 x28 International Conference On Progressive Research In Applied Sciences, Engineering And Technology 15 Page

Fig.10. output of SRAM based TCAM with 512X7 2 5.3 Area And Power Analysis: The area and power analysis of SRAM based TCAM is computed using the Xilinx x power tool for worst case scenario, in which BRAM and logic blocks are activated and low power is consumed. The store latency is defined as maximum number of clock cycles which is required to write operation. TCAM array size of 512 x72 requires 513 cycles as a word. Fig.11. power report of 72x28 Fig.12.power report of 512 x 72 International Conference On Progressive Research In Applied Sciences, Engineering And Technology 16 Page

Fig.13.area analysis of 72 x28 Fig.14.area analysis of 512x72 Fig.15 RTL veiw of multiport REST architecture International Conference On Progressive Research In Applied Sciences, Engineering And Technology 17 Page

VI. Hardware Implementation: The existing rest block of 72 x 28 has implemented in Xilinx KINTAX 7 FPGA. The rest block has four virtual blocks system clock of 50MHz. Fig.16. Block Diagram Of Hardware Implementation 6.1 Power Consumption: The dynamic power consumption in megahertz in the 72 x 28 rest block for the worst case the power consumption or computed in Xilinx. Logic blocks and BRAMs are activated and there is no power reduction. 6.2 Resource Utilization: In the table 1 shows the resource utilization of 512 x 72 REST block. The column shows their quantity and resources respectively. VII. Conclusion In this brief, we have investigated the trade-off among resources like memory efficiency and throughput in constructing of SRAM based TCAM. The experimental results from the implementation of a multiport 512 X 72 bit REST on FPGA shows a dramatic increase in memory efficiency using four virtual blocks at the cost of reduced throughput. The system uses one 36kb SRAM and eight 64x1 distributed RAM which is equal to 25.3% and 3.5% of memory resources when compared with the ZTCAM and HP TCAM respectively. REFERENCES [1] S. Baeg, Low-power ternary content-addressable memory design using a segmented match line, IEEE Trans. Circuits Syst. I, Reg. [2] Papers, vol. 55, no. 6, pp. 1485 1494, Jul. 2008. [3] K. Pagiamtzis and A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey, IEEE J. [4] Solid-State Circuits, vol. 41, no. 3, pp. 712 727, Mar. 2006. [5] P. Mahoney, Y. Savaria, G. Bois, and P. Plante, Parallel hashing memories: An alternative to content addressable memories, in Proc.3rd Int. IEEE-NEWCAS Conf., Jun. 2005, pp. 223 226. [6] B. Agrawal and T. Sherwood, Ternary CAM power and delay model: Extensions and uses, IEEE Trans. Very Large Scale Integr. (VLSI) [7] Syst., vol. 16, no. 5, pp. 554 564, May 2008. [8] C. A. Zerbini and J. M. Finochietto, Performance evaluation of packet classification on FPGA-based TCAM emulation architectures, in [9] Proc. Global Commun. Conf. (GLOBECOM), Dec. 2012, pp.2766 2771. [10] W. Jiang, Scalable ternary content addressable memory implementation using FPGAs, in Proc. ACM/IEEE Symp. Archit. [11] Netw. Commun. Syst. (ANCS), Oct. 2013, pp. 71 82. [12] Z. Qian and M. Margala, Low power RAM-based hierarchical CAM on FPGA, in Proc. Int. Conf. Re ConFigurable Comput. FPGAs(ReConFig), Dec. 2014, pp. 1 4. [13] Z. Ullah, K. Ilgon, and S. Baeg, Hybrid partitioned SRAM-based ternary content addressable memory, IEEE Trans. Circuits Syst. I, [14] Reg. Papers, vol. 59, no. 12, pp. 2969 2979, Dec. 2012. [15] Z. Ullah, M. K. Jaiswal, and R. C. C. Cheung, Z-TCAM: An SRAM-based Architecture for TCAM, IEEE Trans. Very Large Scale [16] Integr. (VLSI) Syst., vol. 23, no. 2, pp. 402 406, Feb. 2015. [17] Xilinx, San Jose, CA, USA. Xilinx FPGAs, accessed on Apr. 20, 2016. [Online]. Available: http://www.xilinx.com. International Conference On Progressive Research In Applied Sciences, Engineering And Technology 18 Page