RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing

S. Nishimura (1), T. Kudoh (2), H. Nishi (2), J. Yamamoto (2), R. Ueno (3), K. Harasawa (4), S. Fukuda (4), Y. Shikichi (4), S. Akutsu (4), K. Tasho (5), and H. Amano (3)
(1) RWCP Optical Interconnection Hitachi Laboratory
(2) RWCP Tsukuba Research Center
(3) Keio University
(4) Hitachi Communication Systems, Inc.
(5) Synergetech, Inc.
E-mail: nisimura@crl.hitachi.co.jp

Contents
- RHiNET concept (RWCP high-performance network)
- Concept and architecture of RHiNET-3/SW
- Key components in RHiNET-3/SW: switch-LSI, deskew-LSI, parallel optical link, board
- Evaluation test results on the LSIs: bit error rate, deskew function

RHiNET concept
[Figure: PCs with PCI-bus based NICs connected through RHiNET switches (RHiNET-3/SW) by high-speed parallel optical links.]
RHiNET switch: 8 x 8 high-speed crossbar switch
Targets:
- Low-cost, high-performance parallel computing through the combined computational power of PCs
- Connecting computers distributed within one or more floors of a building
Features:
- Reliable low-latency communication, no upper layer
- Long links (up to 1 km), free topology design
- Large bisection bandwidth (- Gbit/s)

Structure of RHiNET-3/SW (schematic structure)
[Figure: an SW-LSI with ports P0 to P7 connects through electrical I/Os (-bit data, 1-bit clock) to DS-LSIs, each containing DS-TX(0), DS-RX(0), DS-TX(1), and DS-RX(1) blocks. The DS-LSIs connect to optical TX and RX modules, which provide the optical I/Os (-bit data, 1-bit clock).]
- Switch: 10 Gbit/s x 8 ports
- Aggregate throughput: 80 Gbit/s
- 8B10B-encoded data with clock
- I/O: 1.25-Gbit/s x 12-channel optical links
- Transmission length: < 1 km
- DS-LSI: skew compensation for long transmission length
- Electrical I/O: CML or LVDS
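As a quick check on the headline figures, the arithmetic can be sketched in Python. The channel rate and channel count are read from the slide; the per-port rate and port count are taken as 10 Gbit/s and 8, which is an assumed reading that agrees with the 80-Gbit/s figure in the title and with ports P0 to P7 in the schematic. The constant and function names are illustrative only.

```python
# Figures from the slide, in Mbit/s so the arithmetic stays in integers.
# All names here are illustrative and not part of the original design.
PORT_RATE_MBPS = 10_000     # per-port switching rate (assumed reading: 10 Gbit/s)
NUM_PORTS = 8               # ports P0..P7 in the schematic
CHANNEL_RATE_MBPS = 1_250   # signalling rate of one optical channel
CHANNELS_PER_LINK = 12      # channels in one parallel optical link


def aggregate_throughput_mbps() -> int:
    """Total switching capacity: per-port rate times number of ports."""
    return PORT_RATE_MBPS * NUM_PORTS


def raw_link_rate_mbps() -> int:
    """Raw signalling capacity of one 12-channel optical link direction."""
    return CHANNEL_RATE_MBPS * CHANNELS_PER_LINK


print(aggregate_throughput_mbps() / 1000, "Gbit/s aggregate")
print(raw_link_rate_mbps() / 1000, "Gbit/s raw per optical link")
```

The raw link rate comes out higher than the per-port rate. Presumably the difference is taken up by the clock channel and the line-coding overhead, but the slides do not give the exact split.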

Design concepts of RHiNET-3
Hop-by-hop retransmission
- Low-cost optical link module
- Retransmission: needed for error-free data transmission
- Simple procedures and compact circuits
- Retransmission unit: micro frame (160 bits)
Credit-based flow control
- For long transmission length
- Effective use of packet buffer
32 virtual channels (VCs)
- Virtual lane
- Deadlock-free and topology-free

Flow control and retransmission (layered)
[Figure: TX-switch and RX-switch joined by a network link. In the TX-switch, the Tx flow controller passes flits (80 bits) to the Tx retransmission controller, which sends micro frames (2 flits, 160 bits) over the link. In the RX-switch, the Rx retransmission controller feeds the Rx flow controller and the 32-VC buffer.]
- Retransmission layer (unit: micro frame, 160 bits)
- Per-VC credit-based flow control layer (unit: flit, 80 bits)
- Small data size: reduces overhead (latency and bandwidth)

Format of micro frame (MF)
[Figure: micro frame layout. Flit 0 and Flit 1 each carry a payload field, with bit positions 0, 63, 69, 74, and 79 marked. The other labeled fields are micro frame type, MF sequence number, CRC, and Credit / Acknowledge / Retransmission request.]
- CRC and sequence-number based retransmission mechanism
- Retransmission unit: micro frame (128 bits payload / 160 bits)
- Acknowledge: sequence number of successfully received MF
- Credit, acknowledge, and retransmission request use the same field
- Small retransmission overhead
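The payload share of a micro frame can be worked out with a short calculation. This sketch assumes each flit is 80 bits wide with a 64-bit payload in bits 0 to 63, which is how the bit markers on the slide read. The exact widths of the control fields are not recoverable from the slide, so they are lumped together here, and all names are illustrative.

```python
# Assumed layout: 2 flits per micro frame, 80 bits per flit,
# payload in bits 0..63 of each flit. Names are illustrative.
FLIT_BITS = 80
FLIT_PAYLOAD_BITS = 64
FLITS_PER_MICRO_FRAME = 2

micro_frame_bits = FLIT_BITS * FLITS_PER_MICRO_FRAME
payload_bits = FLIT_PAYLOAD_BITS * FLITS_PER_MICRO_FRAME

# Everything that is not payload: frame type, sequence number, CRC,
# and the shared credit / acknowledge / retransmission-request field.
control_bits = micro_frame_bits - payload_bits

payload_fraction = payload_bits / micro_frame_bits

print(f"micro frame: {micro_frame_bits} bits, payload: {payload_bits} bits")
print(f"control fields: {control_bits} bits, payload fraction: {payload_fraction:.0%}")
```

Under these assumptions the sequence number, CRC, and flow-control information together cost 32 bits per 160-bit micro frame.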

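Retransmission mechanism (behavior)
[Figure: the Tx retransmission controller in the TX-switch keeps sent micro frames in a retransmission buffer. The Rx retransmission controller in the RX-switch checks the CRC and sequence number of each micro frame arriving over the network link and flags any detected error.]
- Hop-by-hop retransmission: error-free transmission with small overhead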
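The behavior on this slide can be illustrated with a minimal software sketch. It is not the RHiNET-3 circuit: the slides do not state the CRC polynomial, the sequence-number width, or the resend policy, so this sketch assumes CRC-32 from Python's `zlib`, a 32-value sequence space, and a go-back-N policy in which the receiver discards everything after a damaged frame. The class and function names are hypothetical.

```python
import zlib
from collections import OrderedDict

SEQ_MOD = 32  # assumed size of the sequence-number space


def make_frame(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number and append a CRC-32."""
    body = bytes([seq]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return (seq, payload) if the CRC matches, otherwise None."""
    body, crc = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return body[0], body[1:]


class TxRetransCtrl:
    def __init__(self):
        self.next_seq = 0
        self.buffer = OrderedDict()  # seq -> frame, oldest first

    def send(self, payload: bytes) -> bytes:
        frame = make_frame(self.next_seq, payload)
        self.buffer[self.next_seq] = frame  # keep a copy until acknowledged
        self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return frame

    def on_ack(self, seq: int) -> None:
        """Free every buffered frame up to and including seq."""
        if seq not in self.buffer:
            return
        while self.buffer:
            oldest, _ = self.buffer.popitem(last=False)
            if oldest == seq:
                break

    def on_retrans_request(self, seq) -> list:
        """Return the frames to resend, starting at seq (go-back-N)."""
        seqs = list(self.buffer)
        if seq not in seqs:
            return []
        return [self.buffer[s] for s in seqs[seqs.index(seq):]]


class RxRetransCtrl:
    def __init__(self):
        self.expected = 0

    def receive(self, frame: bytes):
        """Accept only an undamaged frame with the expected sequence number."""
        parsed = check_frame(frame)
        if parsed is None or parsed[0] != self.expected:
            return ("retrans_request", self.expected, None)
        seq, payload = parsed
        self.expected = (self.expected + 1) % SEQ_MOD
        return ("ack", seq, payload)


def demo() -> list:
    tx, rx = TxRetransCtrl(), RxRetransCtrl()
    frames = [tx.send(bytes([c]) * 16) for c in b"ABC"]  # 16 bytes = 128 bits
    damaged = bytearray(frames[1])
    damaged[3] ^= 0x01  # single bit error on the link
    frames[1] = bytes(damaged)

    delivered, request = [], None
    for frame in frames:
        kind, seq, payload = rx.receive(frame)
        if kind == "ack":
            tx.on_ack(seq)
            delivered.append(payload)
        elif request is None:
            request = seq  # first retransmission request wins
    for frame in tx.on_retrans_request(request):
        kind, seq, payload = rx.receive(frame)
        tx.on_ack(seq)
        delivered.append(payload)
    return delivered
```

Calling `demo()` sends three 128-bit payloads, damages the second on the link, and returns all three payloads in order after the resend. Because the check and the resend happen on each hop, a damaged frame never propagates past the switch that detects it.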

Credit-based flow control
[Figure: the TX-switch holds credit counters, the Tx flow controller, and the Tx retransmission controller. It connects over the network link to the RX-switch, which holds the Rx retransmission controller, the Rx flow controller, and the VC buffer of 256 flits (2 Kbytes).]
- Per-VC credit-based flow control
- The credit-based flow control mechanism enables long-distance data transmission and uses the VC buffer effectively
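A minimal sketch of per-VC credit-based flow control, written in Python for illustration only. It assumes that the 256-flit figure is the depth of each VC buffer and that one credit corresponds to one flit slot; the slides confirm neither detail. Link delay and the piggybacking of credits on micro frames are left out, and the class names are hypothetical.

```python
from collections import deque

NUM_VCS = 32            # virtual channels (from the slides)
VC_BUFFER_FLITS = 256   # assumed to be the buffer depth of each VC
FLIT_PAYLOAD_BYTES = 8  # assumed: 256 flits x 8 bytes = 2 Kbytes


class RxFlowController:
    """Receiver side: one FIFO per VC; draining a flit frees one credit."""

    def __init__(self, num_vcs=NUM_VCS, depth=VC_BUFFER_FLITS):
        self.depth = depth
        self.buffers = [deque() for _ in range(num_vcs)]

    def accept(self, vc, flit):
        if len(self.buffers[vc]) >= self.depth:
            raise OverflowError("sender exceeded its credits")
        self.buffers[vc].append(flit)

    def drain(self, vc):
        """Forward one flit onward; return it with the credit to send back."""
        flit = self.buffers[vc].popleft()
        return flit, 1


class TxFlowController:
    """Sender side: one credit counter per VC, initialised to the buffer depth."""

    def __init__(self, num_vcs=NUM_VCS, depth=VC_BUFFER_FLITS):
        self.credits = [depth] * num_vcs

    def try_send(self, vc, flit, rx):
        if self.credits[vc] == 0:
            return False          # no free slot downstream: hold the flit
        self.credits[vc] -= 1
        rx.accept(vc, flit)
        return True

    def add_credits(self, vc, n):
        self.credits[vc] += n


def demo():
    # Small depth so the blocking behaviour is visible.
    rx = RxFlowController(num_vcs=2, depth=4)
    tx = TxFlowController(num_vcs=2, depth=4)
    sent = [tx.try_send(0, i, rx) for i in range(5)]  # fifth flit is blocked
    other_vc = tx.try_send(1, "x", rx)                # VC 1 is unaffected
    flit, credit = rx.drain(0)
    tx.add_credits(0, credit)
    resumed = tx.try_send(0, 4, rx)
    return sent, other_vc, flit, resumed
```

In `demo()`, the sender stalls on VC 0 once its four credits are used, continues to send on VC 1, and resumes on VC 0 as soon as one credit comes back. The sender never transmits a flit that the receiver cannot store, so no flit is dropped for lack of buffer space regardless of link length.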

Components of RHiNET-3/SW (schematic structure)
[Figure: motherboard carrying the SW-LSI (ports P0 to P7), the DS-LSIs (each with DS-TX(0), DS-RX(0), DS-TX(1), and DS-RX(1) blocks), and the optical link modules.]

Block diagram of SW-LSI
[Figure: blocks shown are the demultiplexer, elastic buffer, Rx retransmission controller, routing table, RT controller, VC controller, packet buffer, crossbar, retransmission buffer, Tx retransmission controller, and multiplexer. Port input and output: 1.25 Gbit/s x -bit per port. Internal datapath: 125 Mbit/s x 80 bit per port.]

Floor plan of SW-LSI (1st cut)
[Figure: floor plan showing the VC buffer memory and the PLL.]
- 0.14-um CMOS ASIC
- Die size: 16.5 mm x 16.5 mm
- Number of gates: 1502 k
- Buffer memory: a total of 640 kbytes
- I/O: 1.25 Gbit/s per pin
- Package: 74-pin BGA

DS-LSI (LSI for skew compensation)
[Figure: two DS-LSIs, each with TX0, RX0, RX1, and TX1 blocks, sit between the SW-LSIs and the optical TX and RX modules. The modules are joined by 12-channel fiber ribbon (< 1 km). DC-coupled and AC-coupled interfaces are labeled.]
- 1.25-Gbit/s x 12-channel AC-coupled optical modules
- DS-LSI has an 8B10B encoder and decoder for high-speed (1.25 Gbit/s per pin) AC-coupled optical data transmission
- DS-LSI compensates skew between the -bit data and the 1-bit clock
- Maximum skew: +/- 256 ns, larger than the skew of a 1-km MMF fiber ribbon (+/- 64 ns)
- Initial data pattern consists of 64 8B10B special characters

12-channel parallel optical link
[Figure: TX module and RX module.]
- 12-channel parallel data transmission (products of ZARLINK(TM) Semiconductor)
- 850-nm VCSEL
- 12-channel CML interfaces
- 155 Mbit/s to 2.5 Gbit/s (AC-coupled)
- GI 50/125 12-channel MMF fiber
- Up to 300-m data transmission at 2.5 Gbit/s
- BER: 10^-12

Structure of motherboard (1st test-bed)
[Figure: motherboard with the fiber ribbon, the SW-LSI, and four DS-LSIs labeled.]
- Designed to evaluate the switching function
- Size: 550 x 550 mm
- Multi-wire interconnection board(TM) (Hitachi Chemical, Ltd.), used to overcome crosstalk, skew, and propagation loss
- Layout is optimized according to experimental results
- Eight pairs of 12-channel optical modules

Evaluation results (bit error rate)
[Figure: SW-LSI output from channel D0 of port 0.]
- BER (bit error rate): < 10^-11 at a data rate of 1.25 Gbit/s per pin
- Timing budget margin: about 400 ps

Evaluation results (deskew function)
[Figure: test setup. Port 0 and Port 1 of the SW-LSI (ports P0 to P7) are connected through a DS-LSI (TX0, RX0, RX1, TX1) and optical TX and RX modules over a 12-channel ribbon fiber (300 m).]
- The deskew function works successfully.

Summary
A prototype network switch, RHiNET-3/SW, for a RHiNET high-performance distributed parallel computing environment
Specifications
- 10 Gbit/s x 8 ports
- Parallel optical data transmission over a distance of up to 1 km
- Aggregate throughput is 80 Gbit/s per board
Architecture
- Hop-by-hop retransmission mechanism
- Credit-based flow control
- Reliable and long-transmission-distance data communication
For -nodes parallel computing
RHiNET-3/SW: high throughput, long distance, and flexible flow control in a distributed parallel computer system using commercial PCs