A Programmable and Highly Pipelined PPP Architecture for Gigabit IP over SDH/SONET

Similar documents
Investigation into Programmability for Layer 2 Protocol Frame Delineation Architectures

Flow control: Ensuring the source sending frames does not overflow the receiver

CCNA Exploration1 Chapter 7: OSI Data Link Layer

Request for Comments: June MAPOS - Multiple Access Protocol over SONET/SDH Version 1

Advanced Computer Networks. Rab Nawaz Jadoon DCS. Assistant Professor COMSATS University, Lahore Pakistan. Department of Computer Science

POS on ONS Ethernet Cards

Data Link Layer. Indian Institute of Technology Madras

POS on ONS Ethernet Cards

Framing and Stuffing. Advanced Computer Networks

Direct Link Networks. Framing. Lecture - Encoding & Framing 1. Problems. Areas for Discussion

Hubs, Bridges, and Switches (oh my) Hubs

Data Link layer (CN chap 3.1, 3.4, 3.6)

Data Link Protocols. TCP/IP Suite and OSI Reference Model

Chapter 3. The Data Link Layer

DLL: Flow Control DLL. Simplex. Fast sender / slow receiver scenario. Various protocols used. Simplified examples implemented in C.

M242 COMPUTER NETWORS AND SECURITY

Network Working Group Request for Comments: T. Yoshida Werk Mikro Systems July 2003

Part 5: Link Layer Technologies. CSE 3461: Introduction to Computer Networking Reading: Chapter 5, Kurose and Ross

Single Channel HDLC Core V1.3. LogiCORE Facts. Features. General Description. Applications

Chapter 3. The Data Link Layer

Other Protocols. Arash Habibi Lashkari

Lecture 1.1: Point to Point Protocol (PPP) An introduction

Code No: RR Set No. 1

3. Data Link Layer 3-2

QoS-Aware IPTV Routing Algorithms

Data Link Layer, Part 4. Exemplary Protocols

CS422 Computer Networks

32 Channel HDLC Core V1.2. Applications. LogiCORE Facts. Features. General Description. X.25 Frame Relay B-channel and D-channel

Direct Link Networks. Lecture - Encoding & Framing 1. Areas for Discussion. Problems

1/29/2008. From Signals to Packets. Lecture 6 Datalink Framing, Switching. Datalink Functions. Datalink Lectures. Character and Bit Stuffing.

Data Link Layer. Overview. Links. Shivkumar Kalyanaraman

3. (a) What is bridge? Explain the operation of a LAN bridge from to (b) Explain the operation of transparent bridge.

CSMC 417. Computer Networks Prof. Ashok K Agrawala Ashok Agrawala. Nov 1,

Network Working Group. Category: Informational DayDreamer August 1996

CSMC 417. Computer Networks Prof. Ashok K Agrawala Ashok Agrawala Set 4. September 09 CMSC417 Set 4 1

The Data Link Layer Chapter 3

Programmable Memory Blocks Supporting Content-Addressable Memory

Introductions. Computer Networking Lecture 01. January 16, HKU SPACE Community College. HKU SPACE CC CN Lecture 01 1/36

Ethernet Switches (more)

Virtual Private Networks (VPNs)

What is the fundamental purpose of a communication system? Discuss the communication model s elements.

Ethernet Switches Bridges on Steroids. Ethernet Switches. IEEE Wireless LAN. Ad Hoc Networks

Where we are in the Course

Lecture 5: Data Link Layer Basics

Fibre Channel Arbitrated Loop v2.3

Virtual Link Layer : Fundamentals of Computer Networks Bill Nace

POINT TO POINT DATALINK PROTOCOLS. ETI 2506 Telecommunication Systems Monday, 7 November 2016

Data Link Protocols. High Level Data. Control Protocol. HDLC Framing ~~~~~~~~ Functions of a Data Link Protocol. Framing PDUs. Addressing Destination

Data Link Layer (1) Networked Systems 3 Lecture 6

6.9. Communicating to the Outside World: Cluster Networking

Request for Comments: August 1996

Computer Networks. Today. Principles of datalink layer services Multiple access links Adresavimas, ARP LANs Wireless LANs VU MIF CS 1/48 2/48

UNIT-II OVERVIEW OF PHYSICAL LAYER SWITCHING & MULTIPLEXING

Data-link. Examples of protocols. Generating polynomials. Example. Error detection in TCP/IP. Multiple Access Links and Protocols

CS 640 Introduction to Computer Networks. Role of data link layer. Today s lecture. Lecture16

INTERNET ARCHITECTURE & PROTOCOLS

Virtual Link Layer : Fundamentals of Computer Networks Bill Nace

PPP. Point-to-Point Protocol

Chapter 10: Planning and Cabling Networks

Networks University of Stirling CSCU9B1 Essential Skills for the Information Age. Content

Distributed Queue Dual Bus

Lecture 6 Datalink Framing, Switching. From Signals to Packets

Chapter 3. The Data Link Layer. Wesam A. Hatamleh

Access to the Web. Coverage. Basic Communication Technology. CMPT 165: Review

BCS THE CHARTERED INSTITUTE FOR IT BCS HIGHER EDUCATION QUALIFICATIONS BCS Level 5 Diploma in IT COMPUTER NETWORKS APRIL 2015 EXAMINERS REPORT

Category: Informational Stac Technology August 1996

Local Area Networks (LANs): Packets, Frames and Technologies Gail Hopkins. Part 3: Packet Switching and. Network Technologies.

Lecture 6 The Data Link Layer. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it

Network Topologies & Error Performance Monitoring in SDH Technology

Ethereal Exercise 2 (Part A): Link Control Protocol

In modern computers data is usually stored in files, that can be small or very, very large. One might assume that, when we transfer a file from one

Jaringan Komputer. Data Link Layer. The Data Link Layer. Study the design principles

Controller IP for a Low Cost FPGA Based USB Device Core

Chapter 5: The Data Link Layer. Chapter 5 Link Layer and LANs. Ethernet. Link Layer. Star topology. Ethernet Frame Structure.

C08a: Data Link Protocols

CS 4453 Computer Networks Winter

ITP 140 Mobile Applications Technologies. Networks

7010INT Data Communications Lecture 7 The Network Layer

Module 10 Frame Relay and ATM

Data Communication & Computer Networks INFO

Enabling Gigabit IP for Intelligent Systems

A typical WAN structure includes the following components.

High Speed Data Transfer Using FPGA

C H A P T E R GIGABIT ETHERNET PROTOCOL

Data Link Control. Outline. DLC functions

ECE 4450:427/527 - Computer Networks Spring 2017

Guide to TCP/IP, Third Edition. Chapter 3: Data Link and Network Layer TCP/IP Protocols

Cross Layer QoS Provisioning in Home Networks

The Internet. 9.1 Introduction. The Internet is a global network that supports a variety of interpersonal and interactive multimedia applications.

Chapter 4. DataLink Layer. Reference: Computer Networking: A Top Down Approach 4 th edition. Jim Kurose, Keith Ross Addison-Wesley, July 2007.

Switched Multimegabit Data Service (SMDS)

The Data Link Layer Chapter 3

CS 5520/ECE 5590NA: Network Architecture I Spring Lecture 13: UDP and TCP

Wireless over Pseudowires

TYPES OF ERRORS. Data can be corrupted during transmission. Some applications require that errors be detected and corrected.

Introduction to Real-Time Communications. Real-Time and Embedded Systems (M) Lecture 15

Network Working Group Request for Comments: 1663 Category: Standards Track July 1994

Networks. Computer Technology

Ten most important ideas/concepts to take away from Chapter 1

Request for Comments: 1333 May 1992

Transcription:

A Programmable and Highly Pipelined PPP Architecture for Gigabit IP over SDH/SONET Ciaran Toal, Sakir Sezer School of Electrical and Electronic Engineering Queen s University Belfast Ashby Building, Stranmillis Road Belfast, BT9 5AH, N. Ireland, U.K. Ciaran.Toal@ee.qub.ac.uk Abstract This paper details the implementation of a highly pipelined 2.5 Gbps Point-to-Point-Protocol Packet Processor (P 5 ) aimed for the latest System-on-a-Programmable-Chip (SoPC) technology. Throughput rates beyond 2.5 Gbps based on FPGA technology could be achieved by designing a new highly pipelined and parallel processing architecture for frames and datagrams. A novel pipelined data sorting mechanism with an extremely low resynchronization buffer and backpressure scheme are introduced to keep the data memory requirements as low as possible for embedded on-chip applications. 1. Introduction The emergence of the Internet in recent years has seen a marked change in the nature of traffic carried by telecommunication networks. For a century these networks have been devoted, almost exclusively, to the transmission of voice traffic. These networks have evolved slowly, as in recent decades the analogue network has been replaced by a digital infrastructure. However, there has been an explosive increase in the amount of data traffic since the emergence of the Internet in the last decade. This trend is set to continue as voice and video data is now being transmitted in packets over the Internet for Voice-over-Internet-Protocol (VoIP), WEB-TV, WEB- Cam and internet video conferencing [9]. The nature of businesses has become dependent on the instant information access and exchange enabled by the internet. Consumer trends have changed. Digital television channels are distributed over satellite and optical fibre to individual consumer set top boxes; these same television channels provide interactive content-on-demand at the push of a button. More and more people have a home web browser, require more than 1 email account and will not go anywhere unless they are carrying the latest wireless mobile communication technology. Next generation wireless technology will not only be handling voice data traffic but will also be transmitting JPEG images and MPEG video streams as 3 rd and 4 th generation mobile phones hit the market [8]. While the technology behind the scenes becomes more advanced, the products utilising this technology are becoming more accessible to the consumer. Newer generations of network packet processors must be able to cope with this massive amount of information for this trend to continue. The project presented in this paper investigates a new parallel and pipelined architecture that can be implemented on new generation programmable logic devices. The PPP processor presents an ideal case study for parallel architectures as the required throughput rate of 2.5 Gbps can only be achieved through hardware acceleration parallel processing. Before implementing a 32-bit P 5, an 8-bit (625 Mbps) P 5 system was studied and implemented. This gave a sound basis from which the 32- bit system could be compared to. 2. The Point-to-Point Protocol The Point-to-Point Protocol (PPP) [4], [5], [6], [7] provides a method for transmitting datagrams over pointto-point links and is the most efficient layer 2 protocol for

encapsulating IP datagrams [2]. PPP is widely used as a data link layer protocol in dial-up, cable and ADSL modems and also for data over SDH/SONET links. PPP is composed of three parts: A framing method for encapsulating datagrams to transmit over physical layer point-to-point links. PPP uses HDLC like framing [5]. An extensible Link Protocol (LCP) to establish, configure, and test the data-link connection. [4]. A family of Network Protocols (NCP) for establishing and configuring different networklayer protocols. PPP is designed to allow the simultaneous use of multiple network-layer protocols [4]. It should be noted that the main focus of this paper is on the data-path implementation of the P 5. PPP Frame Format The PPP frame format [4] is shown in Figure 1. Bytes 1 1 1 1 or 2 Flag 01111110 Address 11111111 00000011 Protocol Variable 2 or 4 1 Payload FCS Figure 1 - The PPP frame format Flag 01111110 Flag field: All PPP frames begin and end with the standard flag value of 0x7E. This value is character stuffed if it occurs within the payload field. Address field: This field typically defaults to the hexadecimal value 0xFF to indicate that all stations are to accept the frame. However, this implementation allows this field to be programmable so that it is compatible with MAPOS systems [1], [2]. field: PPP may be configured via the LCP to use sequence numbers and acknowledgements for reliable data transmission. This is of particular use in noisy environments such as wireless networks, but will be disabled by default. In normal operating conditions the value of this field is 0x03, indicating the use of unnumbered frames [7]. Protocol field: This field provides an indication of the type of packet held within the frame payload. Protocols starting with a 0 bit are network layer protocols such as IP or IPX, those starting with a 1 bit are used to negotiate other protocols including LCP and NCP. The default size of the protocol field is 2 bytes but this may be negotiated down to 1 byte using LCP. Payload field: The payload field is variable in length up to a negotiated maximum. If the maximum length is not negotiated via LCP during link establishment a default length of 1500 bytes is allocated. This payload may be padded if necessary. Checksum field: The checksum field contains the CRC value calculated over the entire frame. In general 16-bit and 32-bit CRC checks are used. For accuracy purposes the system will incorporate 32-bit CRC checking. PPP frames are delimited using a logical `flag' denoted by the occurrence of the character value of 0x7E. Safeguards are required against the possibility that this character should occur within the frame payload. The character 0x7D is used to `escape' any instances of 0x7E. Any instances of 0x7E outside of the flag fields are replaced with a two byte sequence consisting of 0x7D and the original character with its sixth bit complimented. This is best clarified using an example: A sequence of characters destined for storage within the frame payload may be: 0x31, 0x33, 0x7E, 0x96. Post character stuffing, this sequence will be: 0x31, 0x33, 0x7D, 0x5E, 0x96. 3. System Architecture and Design Today, commercial PPP packet processors are 8-bit systems. Therefore an 8-bit system was studied, designed and simulated before designing the more complex 32-bit system. Although both systems have the same structure, the fact that only 1 byte is processed in each clock cycle makes the overall control and synchronisation of the 8-bit system much easier. The P 5 is designed with a general purpose microprocessor interface, to make the system programmable, and with a simplified physical layer interface to interlink to the most common optical transmission systems. Figure 2 shows a simplified block diagram of the P 5 system architecture. The system is composed of 3 main blocks, a Transmitter, a Receiver and a Protocol OAM (Operations, Administration and Maintenance). Data is buffered before transmission and after reception in memory. The Transmitter formats and packetises datagrams before transmitting over a physical link. The receiver unpacketises and extracts the encapsulated datagram from the received data. The Protocol OAM provides an efficient interface for control and status information to be exchanged between an external microcontroller and the internal Receiver and Transmitter blocks.

Shared Memory Microprocessor Interface PPP Transmitter Protocol OAM PPP Receiver PHY PHY Figure 2 Programmable Point-to-Point-Protocol Processor Architecture Figures 3 and 4 illustrate diagrams of the P 5 Transmitter and Receiver blocks. The pipelined structure of these blocks is easily observed as the indicated data-path travels through 3 sub-modules. 32 32 Data Path Transmitter CRC Transmitter Escape Generate Figure 4 P 5 Receiver Architecture Data Path Receiver CRC Receiver Escape Detect Figure 3 P 5 Transmitter Architecture Both, the Transmitter and Receiver blocks are essentially mirror images of each other. They each contain a unit, a CRC unit and an Escape Generate/Detect unit. A PPP frame propagates at 32 bits per clock cycle through the transmitter or receiver block and is processed in a pipelined fashion at each processing unit. The Transmitter/Receiver units accommodate the control path for the framing procedure. signals generated by the µp, PHY and the Protocol OAM are interpreted and executed by a well-defined finite state machine. These Transmitter/Receiver units also relay status information of the packet processors back to the Protocol OAM ler. The exchange of status information between a µp (host commuter) is carried out via interrupts and a status/control register map for further processing. A highly efficient and optimised parallel CRC core has been developed. The CRC unit co-ordinates and synchronises data being fed into the CRC core. The CRC core computes a 32-bit Frame Check Sequence FCS via an 8 x 32-bit parallel matrix (for the 8-bit P 5 ) or via a 32 x 32- bit parallel matrix (for the 32-bit P 5 ) [3]. Before data is transmitted, the Escape Generate module checks for the presence of a flag character in a frame location in which it is not expected. For each flag character detected, the module inserts an escape character and XORs the flag character with the value 0x20 before transmission. The receiver block carries out the reverse of this Escape operation. Although, the data path for the 32-bit P 5 system is the same as that implemented for the 8-bit system, the control and synchronisation of the 32-bit system is much more complex. To illustrate this, the Escape Generate and the Escape Detect blocks will be examined in more detail. Considering the Escape Generate block for an 8-bit system, if a flag character was present, the system will halt the input data for 1 clock cycle while simple manipulation takes place and an extra byte is inserted. With the 32-bit version however, a flag character could be present on any of four locations. If a flag is present then an Escape character must be inserted and the flag data XOR d. This means that instead of the system holding 4 bytes to transmit at this moment, there are suddenly 5 bytes to transfer on a 32-bit channel. Therefore 1 byte must be transmitted on the next clock cycle with the first 3 of the next 4 incoming bytes. Figure 5 demonstrates this problem diagrammatically. Complicated structuring and reorganising is required. 7E 12 Extra Byte 7D 5E 12 Figure 5 Escape Generate Data Organisation Problem

If all 4 byte locations consisted of flag characters, however unlikely, then there will be 4 bytes of data awaiting transmission. And say for example on the previous cycle that there was already another byte waiting to be transferred, then the complexity of this problem is easily realised especially considering all the processing and reorganising must be carried out in a single clock cycle. 7D 5E Figure 6 Escape Detect Data Organisation Problem 7E Empty The 32-bit Escape Detect unit must solve similar issues as the 32-bit Escape Generate unit. With the 8-bit version, if an escape character was present, there is no input data for 1 clock cycle as the system deletes the escape character and then XOR s the next byte of data. With the 32-bit version however, an escape character could be present on any of four locations. If an escape character is present then it must be deleted and the next data byte XOR d. This means that instead of the system holding 4 bytes to process at this moment, there are suddenly only 3 bytes and there is effectively a bubble appearing on the channel. Therefore 1 byte of the next set of incoming bytes must be inserted into this bubble. Figure 6 explains this problem diagrammatically. Again pattern reorganising and complex restructuring is required. This problem has been solved by devising a data reordering mechanism and by further pipelining the unit. In the 8-bit implementation, this process typically takes 1 clock cycle. However for the 32-bit system, the process is divided up into 4 pipelined stages with buffering and decisional mechanisms implemented. The first data transmitted is therefore delayed by 4 clock cycles, approximately 50ns. Subsequent data flow is continuous and efficient. 4. Synthesis and Circuit Study The P 5 was synthesised and targeted to Xilinx Virtex, and Virtex II devices using Synplicity and Xilinx Foundation tools. Speed and area performance is examined. The pre-layout and post-layout synthesis results are included in Tables 1, 2 and 3. One of the first points to note is that the 32-bit version of the system is not 4 times bigger than the 8-bit version as one might predict, but is approximately 11 times bigger. This is mainly due to the fact that the 32-bit system contains byte sorter mechanisms built with large decisionmaking combinational logic. XCV50-4 XC2V40-6 XCV600-4 XC2V1000-6 Table 1 P 5 8-bit Implementation 8-Bit System Pre-layout Synthesis 95.3 128.4 184 (12%) 179 (35%) 84 (5%) 84 (16%) Post-layout Synthesis 79.5 91.5 130 (17%) 124 (48%) Table 2 P 5 32-bit Implementation 32-Bit System Pre-layout Synthesis 73 125.9 2641 (16%) 2230 (20%) 841 (4%) 689 (6%) 191 (12%) 185 (36%) Post-layout Synthesis 65 78.66 1208 (17%) 1144 (22%) 2563 (15%) 2157 (19%) To confirm this point, extra synthesis testing was carried out. The 32-bit Escape Generate module and the 8- bit Escape Generate module were synthesised to a Xilinx XC2V40 FPGA separately. The synthesis results from this test are shown in Table 3. The 32-bit Escape Generate requires 492 (96% available) and 168 flip-flops (32% of the availability). Most of the combinational logic available on this device is utilised, however less than one third of the available flip-flops are utilised. The 8-bit Escape Generate module requires only 22 and 6 flip-flops. The 32-bit module requires 25 times more combinational logic and 28 times as many flip-flops as the 8-bit module. Although the complete 32-bit system is 11 times larger, this size increase is mainly due to the decisional logic required in the Escape modules and partly due to extra decisional logic involved in the CRC and data reordering mechanisms. Table 3 Escape Generator Implementation XCV40-6 Escape Generate Module 32-Bit Implementation 8-Bit Implementation 492 (96%) 168 (32%) 22 (4%) 6 (~1%) There is a considerable speed-up for the 32-bit system running on the Virtex II technology compared to the Virtex performance. Timing analysis revealed that the

critical path is the same for each device and in each case passes through 6. The delay at each LUT is slightly greater with Virtex technology. This indicates that this speed-up is not achieved by a more efficient placement and routing process but to the technological advantage Virtex II offers over Virtex. [9] P. Blatherwick, R. Bell, P. Holland, Megaco IP Phone Media Gateway Application Profile, RFC-3054, January 2001. 5. Conclusions In this paper the implementation of a 32-bit P 5 system on an FPGA has been presented. Making use of a 32-bit bus, the system had to operate at a frequency of at least 78.125. It is imperative that at this speed the system is able to process 32 bits every clock cycle. A parallel pipelined PPP architecture has been developed. The 8-bit and 32-bit systems speed requirements are met with Virtex II technology. The circuit requires approximately 25% of the resources of a XC2V-1000 device leaving more than sufficient room to incorporate a Xilinx Microblaze microprocessor core or any other general purpose embedded microcontroller, enabling system programmability. The 32-bit P 5 is approximately 11 times larger than the 8-bit system. It has been discovered that this size increase is mainly due to the byte sorter and buffering mechanisms included in the 32-bit design, which are heavy in combinational logic. 6. References [1] K. Murakami, M. Maruyama, MAPOS Multiple Access Protocol over SONET/SDH, Version 1, RFC-2171, NTT, June 1997. [2] S. Sezer, A. Marshall, R. Woods, MAPOS a Backbone Solution for Internet over SONET/SDH, IEE Teletraffic Symposium March 1998. [3] T.-Bi-Pei, C. Zukowski, "High-speed parallel CRC circuits in VLSI," IEEE Transaction on Communication, vol. 40, pp. 653-657, April 1992. [4] W. Simpson, The Point-to-Point Protocol (PPP), RFC- 1661, July 1994. [5] W. Simpson, HDCL-like Framing, RFC-1662, July 1994. [6] W. Simpson, PPP over SONET/SDH, RFC-1619, May 1994. [7] D. Rand, PPP Reliable Transmission, RFC-1663, July 1994. [8] D. Mitzel, Overview of 2000 IAB Wireless Internetworking Workshop, RFC-3002, December 2000.