The iFlow Address Processor: Forwarding Table Lookups using Fast, Wide Embedded DRAM


Enabling the Future of the Internet
The iFlow Address Processor: Forwarding Table Lookups using Fast, Wide Embedded DRAM
Mike O'Connor, Director, Advanced Architecture
www.siliconaccess.com
Hot Chips 12

Forwarding Table Lookups
One key component of routing an IP packet is the forwarding table lookup of the destination address
» Uses Classless Inter-Domain Routing (CIDR) addressing
» Requires finding the longest prefix match

Example: look up the destination address 127.26.193.12 in a forwarding table containing the prefixes 124.X.X.X, 127.X.X.X, 127.12.X.X, 127.26.X.X, 127.26.190.X, 127.26.193.X, 127.244.21.6, and 128.X.X.X. The longest matching prefix is 127.26.193.X.
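The longest-prefix-match operation above can be sketched in a few lines of Python. The linear scan is illustrative only (the iAP performs this in a hardware pipeline), and the table below is the slide's example with "X" octets rewritten as explicit prefix lengths:

```python
import ipaddress

def longest_prefix_match(table, dest):
    """Return the most specific prefix in `table` that contains `dest`."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

# The slide's example forwarding table
table = ["124.0.0.0/8", "127.0.0.0/8", "127.12.0.0/16", "127.26.0.0/16",
         "127.26.190.0/24", "127.26.193.0/24", "127.244.21.6/32", "128.0.0.0/8"]
print(longest_prefix_match(table, "127.26.193.12"))  # 127.26.193.0/24
```

Both 127.0.0.0/8 and 127.26.0.0/16 also match the address, but the /24 entry wins because it is the longest.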

iFlow Address Processor (iAP)
Designed for layer 2 and layer 3 switching and routing applications requiring high table densities at up to OC-192 / 10 Gb/s Ethernet line speeds
» 10 Gb/s line rates translate into over 25 million packets per second
» Enables at least 2 lookups per packet
Operates as a coprocessor to a network processor or ASIC on a router line card
» Sits on a standard SRAM bus (standard 133 MHz ZBT SRAM interface, alongside standard ZBT SRAM modules)
» Configurable as 32, 64, 96, or 128 bits wide (plus parity)
» Bus and chip operate at up to 133 MHz
» Up to 4 iAP chips can be cascaded at full speed
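The 25 Mpps figure can be checked with back-of-envelope arithmetic. The 40-byte minimum IPv4 packet size and the ~10 bytes of per-packet framing overhead below are my assumptions, not figures from the slides:

```python
# Worst-case lookup rate is set by minimum-size packets at line rate.
line_rate_bps = 10e9                  # OC-192 / 10 GbE line rate
pkt_bytes = 40 + 10                   # assumed: 40-byte IPv4 packet + ~10 bytes framing
pps = line_rate_bps / (pkt_bytes * 8)
print(pps / 1e6)                      # 25.0 million packets per second
```

At 25 Mpps, a 133 MHz device has roughly five cycles per packet, which is consistent with the claim of at least two lookups per packet.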

iAP Applications
Longest Prefix Match / Exact Match Lookup Table
» Up to 256K 48-bit keys
  - IPv4 plus optional (up to 16-bit) VLAN or MPLS tag
  - IEEE 802.3 MAC address
» Up to 128K 96-bit keys
  - IPv4 <S,G> lookups for multicast
  - MAC address + VLAN tag
» Up to 80K 144-bit keys
  - IPv6 plus optional VLAN or MPLS tags
  - Exact flow lookups (e.g. Cisco NetFlow)
» Any combination of the above, subject to capacity (e.g. 128K 48-bit keys plus 40K 144-bit keys)
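The capacities quoted above are consistent with a simple model in which each 48-bit entry slot holds one key unit, so a 96-bit key consumes 2 slots and a 144-bit key consumes 3. This model is my inference from the Level-3 organization described later (8192 rows of 32 48-bit entries), not something the slides state explicitly:

```python
# Hedged capacity model: one 48-bit slot per 48 bits of key.
TOTAL_SLOTS = 8192 * 32                 # 262,144 slots (256K)

def slots(n_keys, key_bits):
    return n_keys * (key_bits // 48)

assert slots(256 * 1024, 48) <= TOTAL_SLOTS    # 256K 48-bit keys fit exactly
assert slots(128 * 1024, 96) <= TOTAL_SLOTS    # 128K 96-bit keys
assert slots(80 * 1024, 144) <= TOTAL_SLOTS    # 80K 144-bit keys (240K slots)
# The slide's mixed example: 128K 48-bit plus 40K 144-bit keys
assert slots(128 * 1024, 48) + slots(40 * 1024, 144) <= TOTAL_SLOTS
```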

iAP Applications (cont.)
1st-Level Associated Data (1:1 per flow)
» 256K 96-bit data words
» Treated as up to 4 fields, including 2 counters (up to 48 bits and 35 bits)
  - Optionally saturating
  - Typically byte and packet counters
» 13-bit pointer to further 2nd-level associated data
2nd-Level Associated Data
» 8K 256-bit data words
» Many-to-1 per flow, or 1-to-1 per next hop
» Two counter fields (up to 64 bits and 48 bits), optionally saturating
» Typically contains next-hop information
  - Next-hop IP destination
  - Switch fabric ID
  - Billing information, etc.
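The "optionally saturating" counter behavior can be modeled as below; the function name and wrap-on-overflow alternative are illustrative assumptions, not the device's documented interface:

```python
def update_counter(value, increment, width_bits, saturating=True):
    """Model a fixed-width hardware counter update: saturate or wrap."""
    max_val = (1 << width_bits) - 1
    new = value + increment
    if new > max_val:
        # Saturating counters pin at the maximum instead of wrapping to a
        # small value, so software polling them never sees a bogus reset.
        return max_val if saturating else new & max_val
    return new

assert update_counter((1 << 35) - 1, 10, 35) == (1 << 35) - 1        # saturates
assert update_counter((1 << 35) - 1, 10, 35, saturating=False) == 9  # wraps
```

Saturation matters here because the counters are updated in hardware on every lookup; a wrapped byte counter would silently corrupt billing statistics.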

iAP Features
Automated incremental table maintenance operations
» Inserts and deletes are processed in the background while lookups continue
» Supports over 1M inserts/deletes per second using less than 10% of the lookup capacity
Designed for high-availability telco applications
» Goal is to prevent silent failures
» Parity on all internal memories larger than 1 Kbyte
» Parity on external address and data buses

iAP Organization
(Block diagram: an SRAM-based I/O interface — 18 address bits; 32-, 64-, 96-, or 128-bit data plus parity — feeds a Request Assembler and a Result Buffer. A four-level lookup pipeline (Levels 0-3) passes a 144-bit key, 8-bit mask, and index between levels. A High-Level Engine, a Rate Manager/Field Extractor, and the 1st-level (96-bit data, 13-bit pointer) and 2nd-level (256-bit) associated data memories surround the pipeline.)
» SRAM write transactions go to the Request Assembler
» The lookup pipeline performs the longest-prefix-match lookup
» The High-Level Engine processes requests (such as inserts)
» 1st- and 2nd-level associated data are indexed and updated
» The Result Buffer holds results, accessed via SRAM read transactions

iAP Lookup Pipeline
Organized as a pipelined, tree-like structure
» Level 0: 1 row of 31 entries; indexes one of 32 rows of Level 1
» Level 1: 32 rows of 15 entries; indexes one of 512 rows of Level 2
» Level 2: 512 rows of 15 entries; indexes one of 8192 rows of Level 3
» The first three levels are implemented in SRAM and total approximately 1.2 Mbits
» The first three levels are organized as a B-tree index of the fourth level
» Each SRAM memory is more than 2,000 bits wide
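The level sizes above follow standard B-tree fan-out: a row holding k separator keys selects one of k+1 children in the next level. A quick consistency check:

```python
# B-tree fan-out: k separator entries per row -> k+1 children per row.
def children(rows, entries_per_row):
    return rows * (entries_per_row + 1)

assert children(1, 31) == 32      # Level 0 -> 32 rows of Level 1
assert children(32, 15) == 512    # Level 1 -> 512 rows of Level 2
assert children(512, 15) == 8192  # Level 2 -> 8192 rows of Level 3
```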

iAP Lookup Pipeline (cont.)
Level 3: 8192 rows of 32 48-bit entries, indexed from Level 2
The final level of the lookup pipeline holds the entire set of keys
» 25 Mbits of embedded DRAM with an access rate of 66 MHz
» Organized as 8K rows, each 3,200 bits wide
» A single row contains enough information to determine the longest prefix match for any search key that indexes that row
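The Level-3 sizing is self-consistent: 8K rows of 3,200 bits is exactly the quoted 25 Mbits, and 32 48-bit keys occupy only part of each row, leaving room for per-entry masks and pointers (how the remaining bits are used is not detailed on the slide):

```python
bits = 8192 * 3200
assert bits == 25 * 1024 * 1024   # exactly 25 Mbits of embedded DRAM

key_bits_per_row = 32 * 48        # 1,536 bits of raw keys per row
assert key_bits_per_row <= 3200   # the rest is available for metadata
```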

Associated Memory
(Diagram: the 1st-level associated memory DRAM with its update logic; an 18-bit index selects a word, 96-bit result data is read out, and 96-bit update and writeback paths modify the word in place.)
1st-Level Associated Memory
» Each word can contain two counters, one up to 48 bits and one up to 35 bits
» Supports reading counters, adding increments, and writing the values back on every lookup
» 25 Mbits of 256K x 100-bit fast embedded DRAM, supporting a 133 MHz random-access rate
2nd-Level Associated Memory
» Supports on-the-fly statistics updates
» Implemented as 2 Mbits of 8K x 256-bit fast embedded DRAM

Table Maintenance
» Embedded state machines control high-level table update operations
» Inserts into and deletes from the table are processed in the background while lookups take place
» Appropriate bypassing makes table updates invisible to lookups: a lookup returns either the original result or the new result, never an inconsistent table state
» Massive internal bandwidth makes this practical

Embedded DRAM
The iAP contains a total of 52 Mbits of custom embedded DRAM
» 25 Mbits, 3,200 bits wide, at 66 MHz
» 25 Mbits, 100 bits wide, at 133 MHz
» 2 Mbits, 256 bits wide, at 133 MHz
252 Gb/s aggregate on-chip DRAM bandwidth
» Equivalent bandwidth from external 133 MHz ZBT SRAMs would require approximately 1,900 data pins
70% of the die area is DRAM
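The pin-count claim follows directly from the bandwidth figure, assuming one bit per data pin per 133 MHz cycle on an external ZBT SRAM:

```python
aggregate_bw = 252e9            # bits/second, from the slide
pin_rate = 133e6                # one bit per data pin per 133 MHz cycle
pins = aggregate_bw / pin_rate  # ~1,895 data pins
assert round(pins, -2) == 1900  # matches the "approximately 1,900" claim
```

For comparison, the whole chip has only 476 pins, so this bandwidth is simply unreachable with external memory.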

Embedded DRAM Challenges
Even embedded DRAMs must be refreshed
» Refresh cycles flow down the pipeline at regular intervals
» Users who prefer completely deterministic lookup timing can generate refresh commands externally, which reduces the effective maximum lookup rate to 64.6M/s
Large, wide memory blocks create interesting floorplanning issues
» The 3,200-bit-wide memory spans almost the entire width of the die
Custom DRAMs require significant resources to develop
» The memory design team is ~20 engineers

Physical Details
(Die floorplan: Level-3 eDRAM blocks (8K x 544 and 8K x 512), Level-2 SRAMs (512 rows x 76 bits, 15 entries each), associated-data DRAM blocks (L4: 256K x 36 and 256K x 64; L5: 8K x 256), plus the L0/L1 SRAMs and the Field Extractor, High-Level Engine, Request Assembler/Rate Manager, and Result Buffer logic.)
» Implemented in a 0.18 micron TSMC embedded DRAM process
» Die is 13.5 x 13.5 mm
» 476 pins
» ~750,000 logic gates
» 52 Mbits eDRAM, 1.2 Mbits SRAM
» Power is ~5 W at 133 MHz

Conclusions
Embedded DRAM enables networking applications requiring high bandwidth to large tables
Alternative solutions require more:
» Parts
» Pins
» Power
» Cost
Silicon Access Networks is working on other chips as well