EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 14: Photonic Interconnect

1 EECS 598: Integrating Emerging Technologies with Computer Architecture
Lecture 14: Photonic Interconnect
Instructor: Ron Dreslinski
Winter

2 Announcements
Remaining lecture schedule:
- 3/15: Photonics
- 3/17: Project Meetings
- 3/22: Student Presentations (2)
- 3/24: Student Presentations (2)
- 3/29: Student Presentations (2)
- 3/31: Student Presentations (2)
- 4/5: Student Presentations (2)
- 4/7: Student Presentations (2)
- 4/12: Project Writeup Due; Group Project Presentations (2 or 3)
- 4/14: Group Project Presentations (2 or 3)

3 Photonic Interconnect
- Used heavily in the telecommunications industry
- Encodes data in photons (light) rather than electrons
- Multiple wavelengths of light provide natural communication channels over a single connection
- But can we integrate photonics into a CMOS system and use it for chip-to-chip or even on-chip communication?
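As a rough illustration of why multiple wavelengths matter, the sketch below treats each wavelength on a waveguide as an independent channel (wavelength-division multiplexing), so peak bandwidth is simply the wavelength count times the per-wavelength rate. The 10 Gb/s rate matches the channel data rate in the evaluation table later in this lecture; the 64-wavelength count and the function name are hypothetical example choices.

```python
# Minimal sketch of wavelength-division multiplexing (WDM): every wavelength
# on one waveguide is an independent channel, so peak bandwidth scales with
# the number of wavelengths. Example numbers below are hypothetical.

def aggregate_bandwidth_gbps(num_wavelengths: int, rate_per_wavelength_gbps: float) -> float:
    """Peak bandwidth of one waveguide carrying num_wavelengths independent channels."""
    if num_wavelengths < 1:
        raise ValueError("need at least one wavelength")
    return num_wavelengths * rate_per_wavelength_gbps


if __name__ == "__main__":
    # Hypothetical: 64 wavelengths at 10 Gb/s each share a single waveguide.
    print(aggregate_bandwidth_gbps(64, 10.0), "Gb/s")  # 640.0 Gb/s
```

An electrical wire carries one signal; here adding wavelengths adds bandwidth without adding waveguides, which is the "natural communication channels over a single connection" point above.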

4 Corona
Enter the Corona paper:
Corona: System Implications of Emerging Nanophotonic Technology. Dana Vantrease, Robert Schreiber, Matteo Monchiero, Moray McLaren, Norman P. Jouppi, Marco Fiorentino, Al Davis, Nathan Binkert, Raymond G. Beausoleil, and Jung Ho Ahn. In ISCA-35, Beijing, China, June
- Discusses the use of 3D integration to provide all the components necessary for on-chip photonics
- Then addresses the architectural design

5 How does it work on chip?
Using ring resonators, the system can couple out of a waveguide only the signals that match a certain wavelength, pass them between waveguides, or detect their presence:
- A ring can couple a selected wavelength
- A ring can pass the value onto another waveguide
- A ring can detect the wavelength
[Figure: ring resonator configurations a) through e). Labels: ring resonator, SiGe doped, coupler, waveguide]
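A toy model of the ring-resonator behaviors above, assuming a waveguide can be represented as a map from wavelength to the bit it carries and that a ring interacts only with its resonant wavelength while all others pass by. The class and method names (`Waveguide`, `RingResonator`, `detect`, `transfer`) are hypothetical, and the physics (resonance linewidth, loss, thermal tuning) is deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Waveguide:
    # Maps wavelength (nm) -> bit currently carried on that wavelength.
    signals: dict = field(default_factory=dict)


@dataclass
class RingResonator:
    # Resonant wavelength (nm); the ring ignores every other wavelength.
    wavelength: float

    def detect(self, wg: Waveguide):
        """Detector ring: absorb the resonant wavelength and return its bit (None if absent)."""
        return wg.signals.pop(self.wavelength, None)

    def transfer(self, src: Waveguide, dst: Waveguide) -> bool:
        """Coupler: move the resonant wavelength from src onto dst; other wavelengths stay on src."""
        if self.wavelength not in src.signals:
            return False
        dst.signals[self.wavelength] = src.signals.pop(self.wavelength)
        return True


if __name__ == "__main__":
    a = Waveguide({1550.0: 1, 1551.0: 0})
    b = Waveguide()
    RingResonator(1550.0).transfer(a, b)  # 1550 nm moves to b; 1551 nm stays on a
    print(a.signals, b.signals)            # {1551.0: 0} {1550.0: 1}
    print(RingResonator(1551.0).detect(a))  # 0
```

The key property the model captures is wavelength selectivity: a ring tuned to one wavelength never disturbs the others sharing the waveguide.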

6 Putting them into a system
[Figure: 3D-stacked package. Labels: laser; heat sink; processor/L1 die; memory controller/directory/L2 die; analog electronics die; optical die; package; face-to-face bonds; through-silicon vias (sTSVs, pgcTSVs); fiber I/Os to ... or network]

7 High-Level Architecture
- Cluster-based approach (increases core count without growing the interconnect as much)
- On-chip directory-based coherence
- A hub in each cluster connects it optically to the other clusters on chip
- Off-chip memory also uses optical connections to improve bandwidth
[Figure 2: Architecture Overview. (a) Clusters 0 through 63 joined by the optical interconnect. (b) One cluster: cores, shared L2 cache, memory controller, hub, directory, and network interface, attached to the optical interconnect]
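A back-of-the-envelope sketch of why clustering helps, assuming interconnect cost grows roughly with the number of source/destination pairs a full crossbar must serve. With 4 cores per cluster and 64 clusters (the values in the evaluation table), the crossbar serves 64 endpoints instead of 256. The contiguous core-to-cluster mapping in `cluster_of` is a hypothetical choice for illustration.

```python
CORES_PER_CLUSTER = 4  # per-cluster core count from the evaluation table
NUM_CLUSTERS = 64      # number of clusters from the evaluation table


def crossbar_links(endpoints: int) -> int:
    """Directed source->destination pairs in a full crossbar (every endpoint reaches every other)."""
    return endpoints * (endpoints - 1)


def cluster_of(core_id: int) -> int:
    """Hypothetical contiguous mapping: cores 0-3 -> cluster 0, cores 4-7 -> cluster 1, ..."""
    return core_id // CORES_PER_CLUSTER


if __name__ == "__main__":
    total_cores = NUM_CLUSTERS * CORES_PER_CLUSTER  # 256 cores
    print(crossbar_links(total_cores))   # 65280 pairs if every core were an endpoint
    print(crossbar_links(NUM_CLUSTERS))  # 4032 pairs with one hub per cluster
```

Grouping 4 cores behind one hub cuts endpoints by 4x but pairs by roughly 16x; the cost is that cores in a cluster share the hub's optical bandwidth.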

8 More Detailed Architecture
[Figure 3: Layout with Serpentine Crossbar and Resonator Ring Detail. Core die: cores 0 through 3, each with L1-I and L1-D caches and an L1/L2 interface. Cache die: L2 cache, memory controller (MC), directory, hub, network interface (NI). Optical die: laser, star coupler, splitters, injectors, detectors, ring resonators, broadcast and arbitration waveguides, 4-waveguide bundles forming the crossbar (own and peer crossbar connections), and optically connected memory. Dies linked by a through-silicon via array.]
"... ensures that the memory bandwidth grows linearly with increased core count, and it provides local memory accessible ..."

9 Multiple-writer single-reader (MWSR) interconnects
- Latchless / wave-pipelined
- Arbitration prevents corruption of in-flight data
Source: Mikko Lipasti, University of Wisconsin
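A minimal sketch of why MWSR channels need arbitration: each reader owns one channel that any writer may drive, so two writers targeting the same reader in the same cycle would overlap and corrupt each other's data. The fixed-priority grant used when `arbitrate=True` is a hypothetical stand-in for a real arbitration scheme, not the one Corona uses; the function names are also illustrative.

```python
from collections import defaultdict


def mwsr_cycle(requests, arbitrate):
    """Resolve one cycle on a set of MWSR channels.

    requests: list of (writer, reader) pairs that want to send this cycle.
    Each reader owns exactly one channel (single reader); any writer may drive it.
    Returns {reader: [writers that actually drove its channel]}.
    """
    drivers = defaultdict(list)
    for writer, reader in requests:
        drivers[reader].append(writer)
    if arbitrate:
        # Hypothetical fixed-priority grant: lowest writer id wins, the rest retry later.
        return {reader: [min(writers)] for reader, writers in drivers.items()}
    return dict(drivers)


def corrupted(result):
    """Readers whose channel was driven by more than one writer in the same cycle."""
    return sorted(reader for reader, writers in result.items() if len(writers) > 1)


if __name__ == "__main__":
    reqs = [(0, 3), (1, 3), (2, 5)]  # writers 0 and 1 both target reader 3
    print(corrupted(mwsr_cycle(reqs, arbitrate=False)))  # [3]
    print(corrupted(mwsr_cycle(reqs, arbitrate=True)))   # []
```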

10 Arbitration solutions
- Token Channel: single token / serial writes. Token passing allows the token to pace the transmission tail (no bubbles).
- Token Slot: multiple tokens / simultaneous writes. Token passing allows the token to directly precede the slot.
Source: Mikko Lipasti, University of Wisconsin
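A simplified model of token-slot arbitration, to make "multiple tokens / simultaneous writes" concrete. It assumes writers sit in order along one wave-pipelined waveguide, one slot (preceded by its token) is launched per cycle, and slot s passes writer k at cycle s + k. These timing rules and the function names are hypothetical simplifications, not the exact protocol from the slide's source.

```python
from collections import Counter


def token_slot_schedule(ready, num_cycles):
    """Simplified token-slot arbitration on one wave-pipelined MWSR channel.

    Assumptions (hypothetical):
      - writers sit at positions 0..N-1 along the waveguide, upstream first;
      - one slot, preceded by its token, is launched every cycle, and slot s
        passes writer k at cycle s + k;
      - ready[k] is a sorted list of cycles at which writer k's packets become
        ready; a writer with a ready packet grabs a passing slot whose token
        is still present (no upstream writer has claimed it).
    Returns {slot: writer} for every claimed slot.
    """
    queues = [list(r) for r in ready]
    owner = {}
    for t in range(num_cycles):
        for k, queue in enumerate(queues):
            s = t - k  # slot passing writer k during cycle t
            if s < 0 or s in owner:
                continue  # no slot has reached k yet, or its token was taken upstream
            if queue and queue[0] <= t:
                owner[s] = k
                queue.pop(0)
    return owner


def max_concurrent_writers(owner):
    """Most writers injecting in one cycle (writer k fills slot s during cycle s + k)."""
    per_cycle = Counter(s + k for s, k in owner.items())
    return max(per_cycle.values(), default=0)


if __name__ == "__main__":
    owner = token_slot_schedule([[1], [0]], num_cycles=5)
    print(owner)                          # {1: 0, 0: 1}
    print(max_concurrent_writers(owner))  # 2
```

In the example, writers 0 and 1 both inject during cycle 1, into different slots; with a single token channel only one writer drives the channel at a time. In this model upstream writers always see a slot first, so a busy upstream writer can starve downstream ones.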

11 Token Protocol
- Assign each cluster its own wavelength (color) to read.
- How do you prevent more than one writer on a given wavelength (color)? Arbitration.
- Have a token circulate through the system to indicate who is allowed to write.
- This leads to underutilization of the potential interconnect bandwidth when the token is at a node that doesn't need it.
[Figure: token arbitration among clusters 0, 1, and 2 on wavelengths r, g, b. Labels: injectors, detectors, arbitration waveguide (WG), power WG, home cluster of each wavelength; active ring resonators lit, inactive ring resonators unlit]
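A minimal sketch of the underutilization problem for a single destination wavelength, assuming its token advances one cluster per cycle and a cluster holding the token sends at most one packet before passing it on. The timing model and function name are hypothetical simplifications of the protocol in the figure.

```python
def simulate_token_ring(num_clusters, requests, cycles):
    """Token arbitration for ONE destination wavelength (simplified, hypothetical timing).

    requests: {cluster: packets queued for this wavelength's home cluster}.
    The token starts at cluster 0 and moves one cluster per cycle; the holder
    sends one packet that cycle if it has any pending.
    Returns the fraction of cycles in which the wavelength carried data.
    """
    pending = dict(requests)
    busy = 0
    pos = 0  # cluster currently holding the token
    for _ in range(cycles):
        if pending.get(pos, 0) > 0:
            pending[pos] -= 1
            busy += 1
        pos = (pos + 1) % num_clusters
    return busy / cycles


if __name__ == "__main__":
    # Only cluster 10 wants to write, and it has plenty of data queued.
    print(simulate_token_ring(64, {10: 100}, 640))  # 0.015625
    # Every cluster always has data: the wavelength is never idle.
    print(simulate_token_ring(4, {c: 100 for c in range(4)}, 40))  # 1.0
```

With 64 clusters and a single active writer, the wavelength carries data in 10 of 640 cycles (about 1.6%) even though the writer has 100 packets queued: the token spends the rest of its time at clusters that do not need it.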

12 Evaluation Criteria

System configuration
Resource                              Value
Number of clusters                    64
Per cluster:
  L2 cache size/assoc                 4 MB/16-way
  L2 cache line size                  64 B
  L2 coherence                        MOESI
  Memory controllers                  1
  Cores                               4
Per core:
  L1 ICache size/assoc                16 KB/4-way
  L1 DCache size/assoc                32 KB/4-way
  L1 I & D cache line size            64 B
  Frequency                           5 GHz
  Threads                             4
  Issue policy                        In-order
  Issue width                         2
  64 b floating point SIMD width      4
  Fused floating point operations     Multiply-Add

Memory configuration
Resource                OCM                   ECM
Memory controllers      ...                   ...
External connectivity   256 fibers            1536 pins
Channel width           128 b half duplex     12 b full duplex
Channel data rate       10 Gb/s               10 Gb/s
Memory bandwidth        ... TB/s              0.96 TB/s
Memory latency          20 ns                 20 ns

Synthetic benchmarks
Benchmark    Description                                             # Network Requests
Uniform      Uniform random                                          1M
Hot Spot     All clusters to one cluster                             1M
Tornado      Cluster (i, j) to cluster                               1M
             ((i + floor(k/2) - 1) % k, (j + floor(k/2) - 1) % k),
             where k = network's radix
Transpose    Cluster (i, j) to cluster (j, i)                        1M

SPLASH-2 benchmarks
Benchmark    Data Set: Experimental (Default)    # Network Requests
Barnes       64 K particles (16 K)               7.2 M
Cholesky     tk29.o (tk15.o)                     0.6 M
FFT          16 M points (64 K)                  176 M
FMM          1 M particles (16 K)                1.8 M
LU           matrix ( )                          34 M
Ocean        grid ( )                            240 M
Radiosity    roomlarge (room)                    4.2 M
Radix        64 M integers (1 M)                 189 M
Raytrace     balls4 (car)                        0.7 M
Volrend      head (head)                         3.6 M
Water-Sp     32 K molecules (512)                3.2 M

Table 3: Benchmarks and Configurations
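The four synthetic patterns in Table 3 can be written as destination functions of the source cluster (i, j). This sketch assumes the 64 clusters are indexed as an 8 x 8 grid, so the radix k is 8; that layout, the choice of hot-spot cluster, and the function names are illustrative assumptions. The uniform generator here may also pick the source itself as the destination.

```python
import random

# Destination functions for the synthetic traffic patterns in Table 3.
# Clusters are addressed as (i, j) on an assumed k x k grid.


def uniform(i, j, k, rng):
    """Uniform random: every cluster (including the source) is equally likely."""
    return rng.randrange(k), rng.randrange(k)


def hot_spot(i, j, k, hot=(0, 0)):
    """Hot Spot: all clusters send to one cluster (which one is a hypothetical choice)."""
    return hot


def tornado(i, j, k):
    """Tornado: shift each coordinate by floor(k/2) - 1, wrapping modulo k."""
    return (i + k // 2 - 1) % k, (j + k // 2 - 1) % k


def transpose(i, j, k):
    """Transpose: cluster (i, j) sends to cluster (j, i)."""
    return j, i


if __name__ == "__main__":
    k = 8  # assumed 8 x 8 arrangement of the 64 clusters
    print(tornado(0, 0, k))    # (3, 3)
    print(transpose(2, 5, k))  # (5, 2)
    print(uniform(0, 0, k, random.Random(42)))
```

Tornado and Transpose are adversarial for meshes because many flows cross the same middle links, while a crossbar treats every source/destination pair alike; Hot Spot instead stresses a single destination.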

13 Speedup
[Figure 8: Normalized Speedup. Configurations: LMesh/ECM, HMesh/ECM, LMesh/OCM, HMesh/OCM, XBar/OCM. Benchmarks: Uniform, Hot Spot, Tornado, Transpose, Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-Sp]

14 Bandwidth
[Figure 9: Achieved Bandwidth. y-axis: Bandwidth (TB/s). LMesh/ECM, HMesh/ECM, LMesh/OCM, HMesh/OCM, and XBar/OCM across the synthetic and SPLASH-2 benchmarks of Table 3]
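For context when reading achieved bandwidth, peak memory bandwidth can be estimated as controllers x channel width x per-bit data rate. Assuming one memory controller per cluster across 64 clusters and counting the 12 b ECM channel width once (how the table accounts for the two full-duplex directions is an assumption here), this reproduces the 0.96 TB/s ECM figure in Table 3. The function name is hypothetical.

```python
def peak_memory_bandwidth_tbps(controllers, channel_bits, gbps_per_bit):
    """Peak aggregate bandwidth in TB/s (decimal units): controllers x width x per-bit rate."""
    total_gbps = controllers * channel_bits * gbps_per_bit
    return total_gbps / 8 / 1000  # Gb/s -> GB/s -> TB/s


if __name__ == "__main__":
    # ECM: 64 clusters x 1 controller, 12 b channel, 10 Gb/s per bit (Table 3)
    print(peak_memory_bandwidth_tbps(64, 12, 10))  # 0.96
```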

15 L2 Miss Latency
[Figure 10: Average L2 Miss Latency. y-axis: Average Request Latency (ns). LMesh/ECM, HMesh/ECM, LMesh/OCM, HMesh/OCM, and XBar/OCM across the synthetic and SPLASH-2 benchmarks of Table 3]

16 Power
[Figure 11: On-chip Network Power. y-axis: Power (W). LMesh/ECM, HMesh/ECM, LMesh/OCM, HMesh/OCM, and XBar/OCM across the synthetic and SPLASH-2 benchmarks of Table 3]