Implementing High-Speed Search Applications with APEX CAM


July 1999, ver. 1.01
Application Note 119

Introduction

Most memory devices store and retrieve data by addressing specific memory locations. For example, a system using RAM or ROM searches sequentially through memory to locate data. However, this technique can slow system performance because the search requires multiple clock cycles to complete.

You can considerably reduce the time required to find an item stored in memory by identifying stored data by its content rather than by its address. Memory accessed in this way is called content-addressable memory (CAM). CAM offers a performance advantage over other memory search algorithms, such as binary-based searches, tree-based searches, or look-aside tag buffers, because it compares the desired information against the entire list of pre-stored entries simultaneously. Thus, CAM provides an order-of-magnitude reduction in search time.

CAM is ideally suited for many applications, including Ethernet address lookup, data compression, pattern recognition, cache tags, fast routing table lookup, high-bandwidth address filtering, user privileges, and security and encryption information.

This application note discusses the following topics:

- CAM Fundamentals
- CAM in APEX™ 20KE Devices
- CAM Applications

CAM Fundamentals

CAM is based on RAM technology. RAM operates as a circuit that stores data at a particular address: when retrieving data from RAM, the system supplies the address and then receives the data. With CAM, the system supplies the data instead of the address. To locate stored data, the CAM searches all memory locations in parallel in one clock cycle and returns the data's address. The CAM drives a match flag high if the data is found, or low if it is not found. Figure 1 shows a block diagram of CAM operation.

Figure 1. CAM Block Diagram (input: data; outputs: address and match flag)
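To make the RAM/CAM contrast concrete, the following short behavioral model (a Python sketch with illustrative names, not an Altera megafunction or its port list) stores words at addresses the way RAM does, but its search operation takes data in and returns the matching address together with a match flag, mirroring Figure 1.

# Minimal behavioral model of a 32-word CAM. Class and method names are
# hypothetical; real hardware compares all locations in parallel in one cycle.
class SimpleCAM:
    def __init__(self, depth=32):
        self.words = [None] * depth          # stored words

    def write(self, address, data):
        self.words[address] = data           # RAM-style: address in, data stored

    def search(self, data):
        # CAM-style: data in, address out, plus a match flag.
        for address, word in enumerate(self.words):
            if word == data:
                return address, True         # match flag high
        return 0, False                      # match flag low

cam = SimpleCAM()
cam.write(2, 0x45)
print(cam.search(0x45))   # (2, True)
print(cam.search(0x99))   # (0, False)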

Figure. Block Diagram Data Match Flag Accelerates Searches can accelerate applications requiring fast searches of databases, lists, or patterns, such as in image or voice recognition. For example, the search key could be a network user s internet protocol (IP) address, and the associated information could be a user s access privileges and location on the network. If the search key presented is available in, indicates a match and returns the associated information, i.e., the user s privileges. Integration Currently, most applications requiring fast searches use discrete. Designers must add a separate device to their printed circuit board (PCB), which increases design time and reduces the amount of usable PCB space. Discrete also reduces system performance because it introduces additional on-chip and off-chip delays. APEX 2KE devices, which contain on-chip built into their embedded system blocks (ESBs), eliminate the disadvantages of discrete. APEX 2KE on-chip has an access time of 4 ns, compared to a 2-ns access time for a typical discrete. Because is integrated inside an APEX 2KE device, it provides faster system performance than traditional discrete. APEX 2KE device is optimized for smalland medium-sized applications that are described in the Applications section on page 8. in APEX 2KE Devices In APEX 2KE devices, each ESB can implement a 32-word 32-bit block. Figure 2 shows implemented in an ESB, and Figure 3 shows a block diagram of. 2 Altera Corporation

Figure 2. Implementing CAM in an ESB (write side: data[31..0], wraddress[4..0], wren, inclock, inclocken, inaclr; read side: data_address[4..0], match, outclock, outclocken, outaclr)

Figure 3. APEX 20KE CAM Block Diagram (inputs: wraddress[4..0], data[31..0], wren, inclock, inclocken, inaclr, outclock, outclocken, outaclr; outputs: data_address[4..0], match)

Writing to APEX CAM

You can either pre-load the CAM with data during configuration, or write data during system operation. In most cases, two clock cycles are required to write each word into the CAM. Figure 4 shows the waveform for an 8-bit word written to address A of a CAM block; the data is driven to the CAM for two clock cycles.

Figure 4. Writing an 8-Bit Word to CAM (the write takes two clock cycles)

A design can also write don't-care bits into CAM words; bits set to don't care do not affect matching, so they can be used as a mask for CAM comparisons. A third clock cycle is required when don't-care bits are used: the don't-care bits are signified by inverting them on the third clock cycle. Figure 5 shows the waveform for a word containing a don't-care bit written to the CAM.

Figure 5. Writing a Word with a Don't-Care Bit to CAM (the write takes three clock cycles)

Reading from APEX CAM

The CAM output is either encoded or unencoded. When encoded, the ESB outputs the matched data's location as an encoded address. Encoded output is better suited for designs without duplicate data in the memory, and reading an encoded output requires only one clock cycle. Figure 6 shows the encoded CAM output in APEX 20KE devices.

Figure 6. CAM Encoded Output (searching for a value stored at address 2 drives addr[4..0] = 2 and match = 1)
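The masking behavior of don't-care bits can be pictured with the following rough software sketch (illustrative only; the real third-cycle encoding of don't-care bits is not modeled). Each entry stores a care mask alongside the word, masked-off bit positions are ignored during the compare, and a hit returns the encoded address as in Figure 6.

# Behavioral sketch of masked (don't-care) matching with an encoded output.
WIDTH = 32

class MaskedCAM:
    def __init__(self, depth=32):
        self.entries = [None] * depth        # each entry: (word, care_mask)

    def write(self, address, word, care_mask=(1 << WIDTH) - 1):
        # care_mask bit = 1 means "compare this bit"; 0 means "don't care".
        self.entries[address] = (word, care_mask)

    def search_encoded(self, data):
        # Encoded read: one clock cycle, one address out, so the searched data
        # is assumed to be stored at most once.
        for address, entry in enumerate(self.entries):
            if entry is None:
                continue
            word, care = entry
            if ((word ^ data) & care) == 0:  # differences only in don't-care bits
                return address, True
        return 0, False

cam = MaskedCAM()
cam.write(5, 0b1010, care_mask=0b1110)       # lowest bit is a don't-care
print(cam.search_encoded(0b1011))            # (5, True): bit 0 is ignored
print(cam.search_encoded(0b1110))            # (0, False)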

You should use an unencoded output if the same data may be written to multiple locations in the CAM. In this mode, the ESB uses 16 outputs and reads them over two clock cycles, 16 bits per cycle, to represent the 32 words in the CAM block. Each output represents one word of the CAM. There is no match output in this mode; if any of the outputs goes high, there is a match (e.g., if the data is located at address 15, the fifteenth output line goes high). Figure 7 shows the CAM's unencoded output in APEX 20KE devices.

Figure 7. CAM Unencoded Output (outputs q0 through q15 deliver words 0 to 15 in one cycle and words 16 to 31 in the other)

Notes:
(1) For an unencoded output, the ESB supports only 31 input data bits. One input bit is used by the select line to choose one of the two banks of 16 outputs.
(2) If the select input is 0, the CAM outputs words 0 through 15. If the select input is 1, the CAM outputs words 16 through 31.

Deeper & Wider CAM Blocks

Each ESB in an APEX 20KE device supports a 1-Kbit CAM block (32 words of 32 bits each). You can implement wider or deeper CAM by combining multiple CAM blocks using logic elements (LEs). The Quartus™ software combines ESBs and LEs automatically to create larger CAM blocks. There is no intrinsic limit to cascading APEX 20KE ESBs; all ESBs in a device can be combined into one very large CAM block. For example, by cascading 64 (out of 104) ESBs in an EP20K400E device, you can generate a 2,048-word × 32-bit or 1,024-word × 64-bit block of CAM. Large APEX 20KE devices, such as the EP20K1000E device (with 160 ESBs), can generate a 4,096-word × 32-bit CAM block using 128 ESBs.
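The two-cycle, one-hot readout described above can be modeled roughly as follows (a Python sketch with made-up helper names, not tool output): the 32-bit match vector is built by comparing the search data against every stored word, and is then presented as two banks of 16 outputs chosen by a single select bit, as in Figure 7.

# Sketch of the unencoded (one-hot) output mode read as two banks of 16 lines.
def one_hot_match(stored_words, data):
    # Bit i is 1 when stored word i equals the search data; duplicates are allowed.
    return [1 if word == data else 0 for word in stored_words]

def read_bank(match_lines, bank_select):
    # bank_select = 0 -> words 0..15, bank_select = 1 -> words 16..31.
    start = 16 * bank_select
    return match_lines[start:start + 16]

words = [None] * 32
words[15] = 0xCAFE
words[20] = 0xCAFE                  # the same data stored twice

lines = one_hot_match(words, 0xCAFE)
print(read_bank(lines, 0))          # first cycle: line 15 is high
print(read_bank(lines, 1))          # second cycle: line 4 (word 20) is high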

Creating Deeper CAM

To create deeper CAM blocks, the Quartus software cascades the outputs of the ESBs. Both encoded and unencoded outputs can be used to create deeper CAM. In an encoded implementation, a multiplexer selects the output of one of the ESBs and drives it out; the select line of the multiplexer is controlled by the match flags of the ESBs. Figure 8 shows an example of a 64-word × 32-bit or 31-bit CAM implemented with encoded and unencoded outputs.

Figure 8. Creating Deeper CAM with Encoded & Unencoded Outputs (two cascaded ESBs hold words 0 to 31 and words 32 to 63; in the encoded case, the ESB match flags drive the select line of the output multiplexer, while in the unencoded case the word lines are read in two banks)

Notes:
(1) Words 0 through 31 are driven out in parallel on the first clock cycle, and words 32 through 63 are driven out in parallel on the second clock cycle.
(2) For an unencoded output, the ESB supports only 31 input bits. One input bit selects one of the two banks of 16 outputs.
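The encoded depth cascade of Figure 8 can be approximated in software as follows (a sketch that assumes the searched data is stored at most once; block and function names are illustrative): each 32-word block produces a local address and match flag, and the match flags select which block's address, offset by 32 per block, is driven out.

# Behavioral sketch of cascading two 32-word blocks into a 64-word CAM with
# an encoded output.
def search_block(block_words, data):
    # One 32-word block: returns (local_address, match_flag).
    for address, word in enumerate(block_words):
        if word == data:
            return address, True
    return 0, False

def search_deep(blocks, data):
    # Models the output multiplexer whose select line is driven by the
    # per-block match flags; block i holds words 32*i .. 32*i + 31.
    for index, block in enumerate(blocks):
        local_address, match = search_block(block, data)
        if match:
            return 32 * index + local_address, True
    return 0, False

low_words = [None] * 32
high_words = [None] * 32
high_words[7] = 0x1234                                # overall address 39
print(search_deep([low_words, high_words], 0x1234))   # (39, True)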

Creating Wider CAM

To increase the width of a CAM block, the Quartus software cascades the ESBs' unencoded outputs. Encoded outputs cannot be used, because two different data words may coincidentally contain matching portions and cause an incorrect output. To cascade the ESBs, each unencoded output of the first ESB is ANDed with the corresponding output of the second ESB; when both ESBs report a match at the same location, the entire word matches the stored word. Figure 9 shows an example of a 32-word × 62-bit CAM implemented with unencoded outputs.

Figure 9. Creating Wider CAM with an Unencoded Output (one ESB compares data[30..0] and the other compares data[61..31]; their q0 to q15 outputs are ANDed bit by bit)
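A rough software model of this width cascade (illustrative names and values, not generated logic) makes the ANDing explicit: each block compares its slice of the wide word and produces a one-hot vector, and a location matches only when every slice matches there, which is also why an encoded output cannot be used.

# Sketch of widening a CAM by ANDing the unencoded outputs of two blocks.
def one_hot(stored_slices, data_slice):
    return [1 if s == data_slice else 0 for s in stored_slices]

def search_wide(slices_a, slices_b, data_a, data_b):
    match_a = one_hot(slices_a, data_a)      # low half of each stored word
    match_b = one_hot(slices_b, data_b)      # high half of each stored word
    # A word matches only if both halves match at the same address.
    return [a & b for a, b in zip(match_a, match_b)]

slices_a = [0x11, 0x22, 0x33]                # low halves of three stored words
slices_b = [0xAA, 0xBB, 0xAA]                # high halves of the same words
print(search_wide(slices_a, slices_b, 0x22, 0xBB))  # [0, 1, 0]
print(search_wide(slices_a, slices_b, 0x11, 0xBB))  # [0, 0, 0]: halves match at
                                                    # different addresses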

CAM Applications

CAM is used to accelerate a variety of applications such as local-area networks (LANs), database management, file-storage management, table lookup, pattern recognition, artificial intelligence, fully associative and processor-specific cache memories, and disk cache memories. CAM can also perform any search operation. This section discusses the following CAM applications:

- Data Compression
- Network Switch
- IP Filters
- ATM Switch
- Cache Tags
- Peripheral Component Interconnect (PCI) and Other Bus Applications

Data Compression

Data compression removes redundancy in a given piece of information, producing an equivalent but shorter message. Data compression is particularly useful in communications because it allows devices to transmit the same amount of data using fewer bits.

CAM implements data compression efficiently because it can quickly search the data structure containing the compression information. Because a good portion of a compression algorithm's time is spent searching and maintaining this data structure, a hardware search engine can greatly increase the algorithm's throughput. A CAM look-up is performed after each word is presented. If the specific code is not found in the CAM, another word is shifted in. When the code is found, the CAM outputs the appropriate token and the input register is flushed. The CAM generates a result in a single transaction regardless of the table size or the length of the search list, which makes CAM ideal for data compression schemes that use sparsely populated tables as part of their algorithm. Figure 10 shows an example of data compression using CAM.

Figure 10. Using CAM for Data Compression (each code in the uncompressed sequence is replaced by the token of the matching CAM entry)
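The shift-in/look-up loop described above can be sketched as a toy dictionary coder (the table contents and tokens below are arbitrary examples, not the values in Figure 10, and the real hardware performs the look-up as a single parallel compare rather than a loop):

# Toy sketch of CAM-assisted compression: shift words in until the accumulated
# code is found in the table, emit that entry's token, then flush the buffer.
table = {
    (42,): 0,                 # code -> token (in hardware: CAM word -> address)
    (59,): 1,
    (83, 27): 2,              # a multi-word code in a sparsely populated table
}

def compress(sequence):
    tokens, buffer = [], []
    for word in sequence:
        buffer.append(word)               # shift another word in
        token = table.get(tuple(buffer))  # single-transaction CAM look-up
        if token is not None:
            tokens.append(token)          # code found: emit its token...
            buffer = []                   # ...and flush the input register
    return tokens

print(compress([42, 83, 27, 59]))         # [0, 2, 1]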

Figure. Using as a Network Switch PC 272 PC 9743 PC 746 2 3 PC 654 Switch 4 PC 98 Network Contents Data 654 272 9743 2 746 3 98 4 Port IP Filters An IP filter is a security feature that prohibits unauthorized users from accessing LAN resources. It can also restrict IP traffic over a wide-area network (WAN) link. With an IP filter, LAN users can be restricted to specific applications on the Internet (such as e-mail). works as a filter to block all access except for packets that have permission. The addresses that have permission are stored in ; when an address is sent to memory, reports whether it contains the address. If the address resides within, it has permission for a particular activity. Figure 2 shows an example of an IP filter. Figure 2. Using as an IP Filter es which are Granted Permission Contents Data 27 3A 2 4D 3 Packet 27 3A 4F 25 Permission Permit Permit Deny Deny Altera Corporation

When multiple permissions are required, a combination of CAM and RAM enables this operation. Figure 13 shows a sample application that regulates access to e-mail, the web, file transfer protocol (FTP), and telnet. This application uses a 4-bit-wide RAM block; each bit of the RAM word refers to one permission, or access right.

Figure 13. Multiple-Permission IP Filter (the address matched by the CAM indexes a 4-bit RAM word whose bits grant web, e-mail, FTP, and telnet access)
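A rough model of this CAM-plus-RAM arrangement (illustrative addresses and permission patterns, not the values in Figure 13): the CAM converts the requested address into an index, and the RAM word at that index supplies one bit per permission.

# Sketch of multiple permissions with CAM + RAM.
cam = ["net16", "net10", "net3", "net25"]          # stored addresses
ram = [0b1100, 0b1000, 0b0011, 0b0110]             # web, e-mail, FTP, telnet bits
PERMISSIONS = ("web", "e-mail", "FTP", "telnet")   # bit 3 down to bit 0

def lookup_permissions(address):
    if address not in cam:
        return []                                  # no match: deny everything
    word = ram[cam.index(address)]                 # RAM read at the CAM's address
    return [name for i, name in enumerate(PERMISSIONS) if word & (1 << (3 - i))]

print(lookup_permissions("net16"))   # ['web', 'e-mail']
print(lookup_permissions("net99"))   # []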

ATM Switch

CAM can be used as a translation table in asynchronous transfer mode (ATM) switching network components. Because ATM networks are connection-oriented, virtual circuits must be set up before data is transferred. There are two kinds of ATM virtual circuits: virtual paths, identified by a virtual path identifier (VPI), and virtual channels, identified by a virtual channel identifier (VCI). VPI/VCI values are localized; each segment of the total connection has a unique VPI/VCI combination. Whenever an ATM cell travels through a switch, its VPI/VCI value must change for the next segment of the connection through a process called VPI/VCI translation. Optimizing the translation speed is critical to the performance of high-throughput ATM networks.

CAM acts as an address translator in an ATM switch and performs the VPI/VCI translation very quickly. During the translation process, the CAM processes the incoming VPI/VCI values in the ATM cell headers and generates addresses that access data stored in RAM; the RAM stores the VPI/VCI mapping data and other connection information. The VPI/VCI fields from the ATM cell header are compared against the list of current connections stored in the CAM array, and from the comparison the CAM generates an address that is used to access the RAM. A combination of CAM and RAM therefore implements the translation tables with fully parallel search capability.

The ATM controller modifies the cell header using the VPI/VCI data from the RAM, and the cell is sent to the switch fabric. This application is shown in Figure 14. For optimal performance, both the CAM and the RAM should be embedded in the same device.

Figure 14. CAM in an ATM Switch (the CAM matches the current connection's VPI/VCI and addresses a RAM word holding the next connection's VPI/VCI)

Cache Tags

Cache is high-speed memory that enables a microprocessor to quickly access a subset of the data in main memory; the microprocessor can access data stored in the cache much faster than data located in main memory. The cache stores recently used items in a small amount of fast memory, and recently accessed words replace previously used words. A cache uses both CAM and RAM to store data: the CAM stores the address, or tag, where the data can be found in the RAM, and the RAM contains the actual data. For optimal performance, both the CAM and the RAM should be embedded in the same device.

When requesting data, the microprocessor submits a data tag to the cache. The cache compares the tag requested by the microprocessor with the tags stored in the CAM tag field; all tags in the CAM are compared simultaneously (in parallel) with the requested tag. If the tag is located in the CAM block (i.e., a match is found), the CAM's match flag goes high. The CAM also sends the address of the data to the RAM, which in turn outputs the requested data to the microprocessor. Figure 15 diagrams this process. If the CAM does not find a match, its match flag goes low, and the cache controller transfers the requested data from main memory into the cache. The new data and its address are stored in the RAM and the CAM, respectively, replacing previously used data; only the most recently used data is stored in the cache.

Figure 15. Searching with the Tag Field in Cache (the requested tag is compared against the CAM; on a match, the CAM addresses the RAM, which returns the data; on a miss, the cache controller fills the cache from main memory)
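The tag search of Figure 15 can be sketched as a small, fully associative cache model (Python, illustrative only; the replacement policy is simplified so that a miss replaces the oldest entry, approximating the description above):

# Behavioral sketch of a fully associative cache: the CAM holds the tags and
# the RAM holds the data; a miss fills the cache from main memory.
from collections import OrderedDict

class TagCache:
    def __init__(self, main_memory, size=4):
        self.main_memory = main_memory        # address -> data
        self.size = size
        self.entries = OrderedDict()          # tag (CAM) -> data (RAM)

    def read(self, tag):
        if tag in self.entries:               # all tags compared in parallel
            return self.entries[tag], True    # match flag high: cache hit
        data = self.main_memory[tag]          # miss: controller fetches the data
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # replace a previously used entry
        self.entries[tag] = data              # store new tag (CAM) and data (RAM)
        return data, False                    # match flag was low on this access

memory = {0x132: 0xA734, 0x144: 0xF275, 0x165: 0x01B3}
cache = TagCache(memory)
print(cache.read(0x132))   # miss: data fetched from main memory
print(cache.read(0x132))   # hit: same data, match flag high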

PCI Applications

In a system that uses a dynamic memory map, you can use CAM to store memory addresses for faster access. For example, in a peripheral component interconnect (PCI) system, a single PCI device may contain up to six memory spaces allocated in system memory. The exact location of these spaces is determined at power-up, and their starting locations are written into the PCI interface's six base address registers (BARs). When a PCI master requests access to a PCI device's memory location, CAM can be used to match the request to the address in memory quickly, as shown in Figure 16. To recognize which BAR is being called, the CAM compares the requested address against the stored base addresses.

Figure 16. PCI Master Request (the CAM compares the requested address against the six stored BARs and reports which BAR is being accessed)

By using CAM for PCI applications, the PCI master can locate the requested BAR faster. The CAM implementation also requires fewer device resources than an LE-based implementation.
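A simplified model of the BAR match (a sketch only; the base addresses and sizes below are invented, and real PCI decoding also applies the size masks configured at enumeration) shows the parallel compare that reports which of the six BARs the requested address falls into.

# Sketch of matching a PCI master's address against six base address registers.
BARS = [                      # (base address, size in bytes) - example values
    (0x80000000, 0x1000),     # BAR0
    (0x80001000, 0x1000),     # BAR1
    (0x80002000, 0x4000),     # BAR2
    (0x90000000, 0x10000),    # BAR3
    (0xA0000000, 0x1000),     # BAR4
    (0xA0001000, 0x1000),     # BAR5
]

def match_bar(address):
    # In a CAM, all six compares happen in parallel; this loop models the result.
    for index, (base, size) in enumerate(BARS):
        if base <= address < base + size:
            return index, True            # which BAR is being accessed
    return 0, False                       # address not claimed by this device

print(match_bar(0x80002010))   # (2, True)
print(match_bar(0xB0000000))   # (0, False)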

Conclusion

CAM can be used to accelerate a variety of search applications. By embedding CAM into the APEX 20KE architecture, Altera improves the performance of memory searches while avoiding the on- and off-chip delays of discrete CAM.

Revision History

The information contained in Application Note 119 (Implementing High-Speed Search Applications with APEX CAM) version 1.01 supersedes information published in previous versions. Version 1.01 contains an update to Figure 2.

Copyright © 1995, 1996, 1997, 1998, 1999 Altera Corporation, 101 Innovation Drive, San Jose, CA 95134, USA. All rights reserved. By accessing this information, you agree to be bound by the terms of Altera's Legal Notice.