Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Similar documents
Scalable Computing Systems with Optically Enabled Data Movement

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors

NETWORKS-ON-CHIP (NoC) plays an important role in

Johnnie Chan and Keren Bergman VOL. 4, NO. 3/MARCH 2012/J. OPT. COMMUN. NETW. 189

Low-Power Reconfigurable Network Architecture for On-Chip Photonic Interconnects

Emerging Platforms, Emerging Technologies, and the Need for Crosscutting Tools Luca Carloni

Brief Background in Fiber Optics

IITD OPTICAL STACK : LAYERED ARCHITECTURE FOR PHOTONIC INTERCONNECTS

Loss-Aware Router Design Approach for Dimension- Ordered Routing Algorithms in Photonic Networks-on-Chip

Analyses on the Effects of Crosstalk in a Dense Wavelength Division Multiplexing (DWDM) System Considering a WDM Based Optical Cross Connect (OXC)

Graphene-enabled hybrid architectures for multiprocessors: bridging nanophotonics and nanoscale wireless communication

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Network-on-chip (NOC) Topologies


Network on Chip Architecture: An Overview

Silicon Photonics PDK Development

Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1

EDA for ONoCs: Achievements, Challenges, and Opportunities. Ulf Schlichtmann Dresden, March 23, 2018

Lecture 3: Flow-Control

Optical Interconnects: Trend and Applications

Interconnection Networks

FUTURE high-performance computers (HPCs) and data. Runtime Management of Laser Power in Silicon-Photonic Multibus NoC Architecture

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

PIC training: Interconnect System Design

OPAL: A Multi-Layer Hybrid Photonic NoC for 3D ICs

Network-on-Chip Architecture

IN the continual drive toward improved microprocessor

Hybrid Integration of a Semiconductor Optical Amplifier for High Throughput Optical Packet Switched Interconnection Networks

A Multilayer Nanophotonic Interconnection Network for On-Chip Many-core Communications

IntraChip Optical Networks for a Future Supercomputer-on-a-Chip

100 Gbit/s Computer Optical Interconnect

TDT Appendix E Interconnection Networks

Phastlane: A Rapid Transit Optical Routing Network

Silicon Nanophotonic Network-On-Chip Using TDM Arbitration

826 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 6, JUNE 2014

PHOTONIC ATM FRONT-END PROCESSOR

Photonic NoC for DMA Communications in Chip Multiprocessors

Topologies. Maurizio Palesi. Maurizio Palesi 1

Index 283. F Fault model, 121 FDMA. See Frequency-division multipleaccess

ECE/CS 757: Advanced Computer Architecture II Interconnects

Architectural Exploration and Design Methodologies of Photonic Interconnection Networks. Jong Wu Chan

170 Index. Delta networks, DENS methodology

Lecture 2: Topology - I

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1

Interconnection Networks

From Majorca with love

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Holistic Comparison of Optical Routers for Chip Multiprocessors

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

Ultra-Low Latency, Bit-Parallel Message Exchange in Optical Packet Switched Interconnection Networks

Analyzing the Effectiveness of On-chip Photonic Interconnects with a Hybrid Photo-electrical Topology

Lecture 7: Flow Control - I

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

Reliability-Performance Trade-offs in Photonic NoC Architectures

Switch Datapath in the Stanford Phictious Optical Router (SPOR)

Monolithic Integration of Energy-efficient CMOS Silicon Photonic Interconnects

Exploiting Dark Silicon in Server Design. Nikos Hardavellas Northwestern University, EECS

Silicon-Photonic Clos Networks for Global On-Chip Communication

A Torus-Based Hierarchical Optical-Electronic Network-on-Chip for Multiprocessor System-on-Chip

PREDICTION MODELING FOR DESIGN SPACE EXPLORATION IN OPTICAL NETWORK ON CHIP

Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks

Lecture 22: Router Design

3D WiNoC Architectures

A 2-Layer Laser Multiplexed Photonic Network-on-Chip

CMOS Photonic Processor-Memory Networks

Lecture: Interconnection Networks

Silicon-Photonic Clos Networks for Global On-Chip Communication

Silicon Photonics for Exascale Systems

Packet Switch Architecture

Packet Switch Architecture

Lecture 3: Topology - II

Designing Energy-Efficient Low-Diameter On-chip Networks with Equalized Interconnects

Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications

HANDSHAKE AND CIRCULATION FLOW CONTROL IN NANOPHOTONIC INTERCONNECTS

ECE 551 System on Chip Design

Lecture: Memory, Multiprocessors. Topics: wrap-up of memory systems, intro to multiprocessors and multi-threaded programming models

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

CHAPTER TWO LITERATURE REVIEW

NoC Test-Chip Project: Working Document

The Impact of Optics on HPC System Interconnects

Low Polling Time TDM ONoC With Direction-Based Wavelength Assignment

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp.

Optical switching for scalable and programmable data center networks

COMPARATIVE PERFORMANCE EVALUATION OF WIRELESS AND OPTICAL NOC ARCHITECTURES

Scaling routers: Where do we go from here?

Photonics in computing: use more than a link for getting more than Moore

Interconnection Network

Trickle Up: Photonics and the Future of Computing Justin Rattner Chief Technology Officer Intel Corporation

Topologies. Maurizio Palesi. Maurizio Palesi 1

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Interconnection Networks

The Future of Electrical I/O for Microprocessors. Frank O Mahony Intel Labs, Hillsboro, OR USA

Address InterLeaving for Low- Cost NoCs

Optical Crossbars on Chip: a comparative study based on worst-case losses

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing

Optimal Management of System Clock Networks

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

Intro to: Ultra-low power, ultra-high bandwidth density SiP interconnects

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Transcription:

Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University

Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects Performance bottleneck of on-chip networks and I/O Power dissipation constraints of the chip package > 50% of total power comes from interconnects* Intel Polaris IBM Cell AMD Opteron * N. Magen et al., Interconnect-power dissipation in a microprocessor, SLIP 2004. 2

Motivation CMPs of the future = 3D stacking Lots of data on chip Photonics offers key advantages 3

Why Photonics? Photonics changes the rules for Bandwidth, Energy, and Distance. ELECTRONICS: Buffer, receive and re-transmit at every router. Each bus lane routed independently. (P N LANES ) Off-chip BW is pin-limited and power hungry. OPTICS: Modulate/receive high bandwidth data stream once per communication event. Broadband switch routes entire multiwavelength stream. Off-chip BW = On-chip BW for nearly same power. RX RX RX TX TX TX TX RX TX RX TX RX 4

Hybrid Network Premise Optical processing difficult and limited Source, destination routing inefficient Use electronics for routing, optics for switching and transmission Hybrid Circuit-Switching 5

Hybrid Circuit-Switched Networks Step 1: Path SETUP request Electronic SETUP Msg Source core Destination Core 6

Hybrid Circuit-Switched Networks Step 2: Path ACK Electronic ACK Msg 7

Hybrid Circuit-Switched Networks Step 3: Transmit Data Photonic Switch Use Information 8

Hybrid Circuit-Switched Networks Meanwhile: Path Contention Path BLOCKED Msg (Backoff) 9

Hybrid Circuit-Switched Networks Step 4: Path TEARDOWN Electronic SETUP Msg Source core Destination Core 10

Hybrid Circuit-Switched Networks Pros: Cons: Energy-efficient end-toend transmission High bandwidth through WDM Electronic network still available for small control messages* Network-level support for secure regions Path setup latency Path setup contention (no fairness) * [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009] 11

Programming and Communication

Shared Memory scaling [OpenMP on large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation ~ Exascale Report Implicit Communication Explicit Communication 13

Partitioned Global Address Space Access Method Local Read Optical Receive Local Write Optical send Remote Read Electronic request, optical receive Remote Write Optical send Shared R/W? Implicit Communication Explicit Communication [G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010] 14

Message Passing Complex, dynamic access patterns Relatively larger blocks of data Scientific computing Implicit Communication Explicit Communication * [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009] 15

Streaming Embedded / specialized systems (Graphics, Image + Signal Proc.) Execution mode of general-purpose systems (Cell Processor) 1 4 Input Data 3 Output Data 2 Persistent optical circuits Implicit Communication Explicit Communication 16

Electronic Plane

Electronic Router Ring Cntrl Request Bus Xbar Cntrl Buffer Cntrl Data Switch Routing Logic Arbiter Flow Control Xbar Allocation Credits In Xbar Cntrl Data Switch Allocation Ring Cntrl Buffer Data Path Crossbar Control Router Low frequency operation (~ 1GHz) 1 VC (typically) Small buffers (64-28) Narrow Channels (8-32) 18

Network Gateway Network IF Serialization Deserialization Receivers Drivers Electronic Crossbar Control Router To/From Control plane Core Core Core Tx/Rx Core Bidirectional Electronic Channel 5-port photonic switch Bidirectional Waveguide External Concentration To/From Data plane [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009] 19

The Photonic Plane

Silicon Photonic Waveguide Technology 1.28 Tb/s Data Transmission Experiment (occupies small slice of available WG BW) 100 ps before injection into waveguide [Vlasov and McNab, Optics Express 12 (8) 1622 (2004)] C23 (1559 nm) C28 (1555 nm) C46 (1541 nm) C51 (1537 nm) [B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)] after 5-cm waveguide and EDFA Silicon photonic waveguides provide low-power optical interconnects in CMOS-compatible platform. Low-loss (1.7 db/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in commercial CMOS 21 process.

Silicon Photonic Modulator and Detector Technology (CW) LASER 18 Gb/s demonstrated modulator detector Ge-on-Si Detectors: 40-GHz bandwidths 1 A/W responsivities Receivers (detectors w/ CMOS amplifiers): 1.1 pj/bit demonstrated at 10 Gb/s Scalable to < 50 fj/bit [M Lipson, Optics Express (2007)] 85 fj/bit demonstrated at 10 Gb/s Scalable to < 25 fj/bit [M Watts, Group Four Photonics (2008)] [S Koester, J. Lightw. Technol. (2007)] 22

Silicon Photonic Micro-Ring Switch Explanation Transmission (in i out i ) bar state in0 out0 no current, on-resonance in1 out1 cross state fast control of resonance wavelength via carrier injection current, off-resonance 23

Higher Order Switch Designs 24

On-Chip Topology Exploration Photonic Torus Nonblocking Photonic Torus [A. Shacham et al., Trans. on Comput., 2008] [M. Petracca et al. IEEE Micro, 2008] 25

On-Chip Topology Exploration TorusNX Square Root [J. Chan et al. JLT, May 2010] 26

Photonic Plane Characteristics Insertion Loss Noise Power 27

Insertion Loss and Optical Power Budget Optical Power Budget Nonlinear Effects WDM Factor Worst-case Insertion Loss Detector Sensitivity 28

Insertion Loss vs. Bandwidth Number of λ Topologies Network Size 29

Simulation Results Insertion Loss (db) 50 40 30 20 10 0 Torus Topology 31.2 25.6 20.6 6 6 4 4 8 8 37.0 42.8 48.6 10 10 12 12 14 14 54.5 16 16 60.3 18 18 Insertion Loss (db) 50 40 30 20 10 0 Non-Blocking Torus Topology 38.0 44.1 50.6 25.3 18.7 6 6 4 4 31.5 8 8 10 10 12 12 14 14 56.8 16 16 63.2 18 18 Topology Size (nodes) Topology Size (nodes) Insertion Loss (db) 50 40 30 20 10 TorusNX Topology 15.8 23.2 19.5 27.1 31.0 34.9 38.8 42.7 Insertion Loss (db) Insertion Loss (db) 50 40 30 20 10 Square TorusNXRoot Topology 19.5 12.2 15.8 27.1 31.0 34.9 23.2 21.5 38.8 30.6 42.7 0 6 6 4 4 8 8 10 10 12 12 14 14 16 16 18 18 0 4 4 4 4 6 6 8 8 8 8 10 10 12 12 14 14 16 16 16 16 18 18 Topology Size (nodes) Topology Size (nodes) Propagation Crossing Dropping Into a Ring 30

Simulation Results Original is based on the IL results from previous slide, Improved is based on a hypothetical improvement in crossing loss from 0.15 db to 0.05 db. Torus Topology Non-Blocking Torus Topology Number of Wavelength Channels 100 10 Number of Wavelength Channels 100 10 Optical power budget 1 0 100 200 300 Number of Access Points TorusNX Topology 1 10 20 30 Number of Access Points Square Root Topology Number of Wavelength Channels 100 10 Number of Wavelength Channels 100 10 Optical power budget 1 0 100 200 300 Number of Access Points 1 0 100 200 300 Number of Access Points 31

Photonic Plane Characteristics Insertion Loss Noise Power 32

Noise and Crosstalk Laser Noise Modulation Noise Inter-Message Crosstalk Coherent noise Crosstalk Intra-Message Crosstalk Filter Incoherent noise 33

Optical SNR Effects of Noise Network Size Number of λ Network Load 34

Simulation Results Results Results are plotted for network size of 8 8 at saturation, at the detectors. Maximum OSNR = ~45 db (due to laser noise) Minimum OSNR < 17 db (due to message-to-message crosstalk) Variations between networks due to varying likelihood of two message intersecting on network topology. Optical SNR (db) System Performance SNR measures the likelihood of error-free transmission. Lower SNR designs will require additional retransmission, resulting in lower throughput performance. 50 40 30 20 10 0 Torus Non-blocking Torus TorusNX Square Root 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 Message Size (bit) The line at OSNR=16.9 db is where a bit-error-rate of 10-12 can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling. 35

Photonic Plane Characteristics Insertion Loss Noise Power 36

Power Usage Transmission Laser Power Active Power Modulating Detecting Broadband Static Power Thermal tuning Tx\Rx Power Drivers TIAs 1V Electronic Control p-region n-region 0V Ohmic Heater Thermal Control 0V 1V Off-resonance profile On-resonance profile Injected Wavelengths 37

Energy Per Bit Energy per Bit (J/bit) 10-7 10-8 10-9 10-10 10-11 Torus Non-blocking Torus TorusNX Square Root 10-12 10-13 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 Message Size (bit) 38

Power Breakdown Torus Topology Nonblocking Torus Topology Modulator 4% Detector 3% PSE 2% Thermal 1% Modulator 4% Detector 2% PSE 2% Thermal 1% Electronic Wire 3% Router Logic 43% Electronic Wire 2% Router Logic 45% Router Buffer 44% Router Buffer 44% 12 wavelengths @ 10 Gbps/each Power Dissipation = 4.31 W 7 wavelengths @ 10 Gbps/each Power Dissipation = 1.59 W Results based on randomly generated traffic with message sizes of 100 kbit, with network in saturation. Data was collected on 64 nodes topologies constrained to a total surface area of 2 cm 2 cm. 39

Power Breakdown Square Root Topology TorusNX Topology Modulator 14% PSE 2% Thermal 4% Modulator 17% PSE 1% Thermal 3% Detector 8% Router Logic 34% Detector 10% Router Logic 37% Electronic Wire 7% Router Buffer 31% Electronic Wire 1% Router Buffer 31% 27 wavelengths @ 10 Gbps/each Power Dissipation = 1.89 W 38 wavelengths @ 10 Gbps/each Power Dissipation = 3.22 W 40

Performance 41

Other Interesting Issues

Memory Access Processor Core Network Router Memory Access Point [G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010] 43

Other Arbitration Means - TDM [G. Hendry et al. Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. In HOTI, Aug. 2010] 44

Wavelength Granularity Original Re-design λ Scalable number of WDM channels 45 λ

Conclusion Some applications / programming models definitely well-suited to a circuit-switched photonic network Interesting tradeoffs and design space 46