1. NoCs: What's the point?



What is the role of networks-on-chip in future many-core systems?
What topologies are most promising for performance? What about for energy scaling?
How heavily utilized are NoCs in practical applications, and how does that affect your answers?


On-Die Communication Power

80 Core TFLOP Chip (2006)
Die 21.72mm x 12.64mm; single tile 1.5mm x 2.0mm
8 X 10 Mesh, 32 bit links, 320 GB/sec bisection BW @ 5 GHz
Power breakdown: Dual FPMACs 36%, Router + Links 28%, IMEM + DMEM 21%, Clock dist. 11%, 10-port RF 4%
Floorplan labels: I/O Area, TILE, PLL, JTAG, TAP

48 Core Single Chip Cloud (2009)
Die 26.5mm x 21.4mm
2 core clusters in 6 X 4 Mesh (why not 6 x 8?), 128 bit links, 256 GB/sec bisection BW @ 2 GHz
Power breakdown: Cores 70%, MC & DDR (percentage missing in source), Routers & 2D mesh 10%, Global clocking 1%
Floorplan labels: four DDR3 MC blocks, VRC, System Interface + I/O
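The two bisection bandwidth figures on this slide (320 GB/sec and 256 GB/sec) can be reproduced with a short calculation. This is a minimal sketch under assumptions the slide does not state: the cut is taken across the longer mesh dimension, every link is bidirectional with both directions counted, and a link moves one full link width per clock. The function name is illustrative only.

```python
def mesh_bisection_gb_per_s(rows, cols, link_bits, clock_ghz):
    """Bisection bandwidth of a 2D mesh in GB/s (assumed model, see text)."""
    links_cut = min(rows, cols)       # links severed when the longer dimension is halved
    bytes_per_cycle = link_bits / 8   # one link, one direction
    return links_cut * 2 * bytes_per_cycle * clock_ghz  # x2: both directions counted


print(mesh_bisection_gb_per_s(8, 10, 32, 5.0))   # 80-core chip: 320.0
print(mesh_bisection_gb_per_s(6, 4, 128, 2.0))   # 48-core chip: 256.0
```

Under this model the 48-core chip reaches a comparable bisection bandwidth at less than half the clock rate by using links four times as wide.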

Bus: The Other Extreme
Issues: slow (< 300MHz); shared, limited scalability?
Solutions: repeaters to increase frequency; wide busses for bandwidth; multiple busses for scalability
Benefits: power? Simpler cache coherency
Move away from frequency, embrace parallelism


Mesh Retrospective
Bus: good at board level, does not extend well
Transmission line issues: loss and signal integrity, limited frequency
Width is limited by pins and board area
Broadcast, simple to implement
Point to point busses: fast signaling over longer distance
Board level, between boards, and racks
High frequency, narrow links
1D Ring, 2D Mesh and Torus to reduce latency
Higher complexity and latency in each node
Hence, emergence of packet switched network
But, pt-to-pt packet switched network on a chip?
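To illustrate the step from a 1D ring to a 2D mesh or torus, the sketch below compares worst-case hop counts (network diameter) for the same number of nodes. The node count of 64 is an assumption chosen for the example, and hop count is only a proxy for latency: it ignores the higher per-node delay that the slide also notes.

```python
def ring_diameter(n):
    """Worst-case hops on a bidirectional ring of n nodes."""
    return n // 2


def mesh_diameter(rows, cols):
    """Worst-case hops on a 2D mesh: corner to opposite corner."""
    return (rows - 1) + (cols - 1)


def torus_diameter(rows, cols):
    """Worst-case hops on a 2D torus: wrap-around halves each dimension."""
    return rows // 2 + cols // 2


print(ring_diameter(64), mesh_diameter(8, 8), torus_diameter(8, 8))  # 32 14 8
```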

Interconnect Delay & Energy
[Figure: delay (ps) and energy (pJ/bit) versus wire length (mm). Curves: repeated wire delay, wire delay, router delay, wire energy, router energy. Annotation: "u pitch, 0.5V".]

A Circuit Switched Network
[Figure: 8x8 circuit-switched NoC with routers joined by 2mm links. Timing from source (Src) through routers 0, 1, ..., n to destination (Dest): packet-switched request (PClk), circuit-switched acknowledge (Clk), circuit-switched data transmission (Clk).]
Circuit-switched NoC eliminates intra-route data storage
Packet-switching used only for channel requests
High bandwidth and energy efficiency (1.6 to 0.6 pJ/bit)
Anders et al, "A 4.1Tb/s Bisection-Bandwidth 560Gb/s/W Streaming Circuit-Switched 8x8 Mesh Network-on-Chip in 45nm CMOS", ISSCC
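The request, acknowledge, and data sequence on this slide can be sketched as a toy model. This is not the design from Anders et al: dimension-ordered (X then Y) routing, all-or-nothing link reservation, and the class and method names are assumptions made for illustration. The slide does not say how routes are chosen or how contention is resolved.

```python
def xy_route(src, dst):
    """Directed links, as (from, to) node pairs, along an X-then-Y path."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


class CircuitSwitchedMesh:
    def __init__(self):
        self.reserved = set()  # directed links currently held by a circuit

    def request(self, src, dst):
        """Packet-switched request: reserve every link on the route.
        Returns the path (standing in for the acknowledge) or None if a link is busy."""
        path = xy_route(src, dst)
        if any(link in self.reserved for link in path):
            return None
        self.reserved.update(path)
        return path

    def transmit(self, path, words):
        """Circuit-switched data: words stream end to end, no per-hop storage."""
        if not all(link in self.reserved for link in path):
            raise RuntimeError("circuit is not established")
        return list(words)

    def release(self, path):
        self.reserved.difference_update(path)


noc = CircuitSwitchedMesh()
a = noc.request((0, 0), (2, 1))
b = noc.request((1, 0), (2, 0))     # needs link (1,0)->(2,0), already held by a
print(len(a), b)                    # 3 None
noc.release(a)
print(noc.request((1, 0), (2, 0)))  # [((1, 0), (2, 0))]
```

Only the request competes for resources in this model. Once a circuit is held, data crosses the routers without being stored along the route, which is the property the slide credits for the energy efficiency.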

Hierarchical & Heterogeneous
[Figure: local busses, each attached to a router (R), joined by a 2nd level bus.]
Bus to connect over short distances
Hierarchy of busses
Or hierarchical circuit and packet switched networks

Bytes/Op and Tapered BW
[Figure: link width (a.u.) at each level of the hierarchy: Local, Regional, Cluster, Global. Labels: Local, Wide, Slow.]


2. Low-hanging Fruit
Asking for both on-chip and chip-to-chip wires, separately: Is there a physical (circuit or logic) technology that, for relatively low investment or cost, can return large dividends in energy and/or performance? Where should we be looking to improve interconnects?
Answer: I do not see one today

3. Bend, but don't break
Is there a role for interconnect in overall system resilience? Must interconnects change to maintain or enable large-scale resilience, and if so, how?

Resiliency
Faults and examples:
Permanent faults: stuck-at 0 & 1
Gradual faults: variability, temperature
Intermittent faults: soft errors, voltage droops
Aging faults: degradation
Faults cause errors (data & control):
Datapath errors: detected by parity/ECC
Silent data corruption: need HW hooks
Control errors: control lost (blue screen)
Minimal overhead for resiliency
Layers: Applications; System Software; Programming system; Microcode, Platform; Microarchitecture; Circuit & Design
Functions: error detection, fault isolation, fault confinement, reconfiguration, recovery & adapt
Interconnect plays peripheral role in resiliency
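The difference between a detected datapath error and silent data corruption can be shown with a single parity bit. This is a minimal sketch, assuming even parity over an 8-bit word; the data value and function names are made up for the example and are not from the slides.

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word contains an odd number of 1 bits."""
    return bin(word).count("1") % 2


def check(word, stored_parity):
    """True if the word still matches the parity stored alongside it."""
    return parity_bit(word) == stored_parity


data = 0b10110010
p = parity_bit(data)
one_flip = data ^ 0b00000100    # single-bit error
two_flips = data ^ 0b00000110   # double-bit error

print(check(data, p), check(one_flip, p), check(two_flips, p))  # True False True
```

The single-bit error is caught, while the double-bit error passes the check unnoticed. That second case is the kind of silent corruption for which the slide asks for additional hardware hooks.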


4. Packaging
For chip-to-chip interconnects (or even for on-chip wires), what is the enabling or supporting role played by packaging/packages, and where do we need to make the most direct research investment?
Answer:
1. Research investment in 3D design tools and automation (not in 3D processing and packaging technology)
2. Low cost, low loss materials, cables, etc.

20MB 3D-Stacked SRAM
[Figure: package cross-section. Labels: 80 Cores, SRAM, heat sink, heat spreader, Polaris die, Freya die, LGA substrate, top metal, TSVs.]
20MB 3D local memory for TFLOP performance
BW at full core clock (3GHz): ~1TB/s for TFLOP


3D Memory Architecture
[Figure labels: On-die Mesh Interconnect, Processor Tile, Memory Bus 42, Memory Tile.]
Signals and power from package, through memory, to the processor tile
TSV pitch: 190µm
SRAM die size: 275mm²
SRAM size: 256KB per tile, 20MB total
SRAM power: 7W SRAM + 2.2W IO
Bandwidth: 12GB/sec/tile, ~1TB/sec total
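The totals on this slide follow from the per-tile figures. The short check below assumes 80 memory tiles, one per core of the 80-core chip described on the previous slide; that one-to-one mapping is an assumption, as is the function name.

```python
def stacked_sram_totals(tiles, kb_per_tile, gb_per_s_per_tile):
    """Aggregate capacity (MB) and bandwidth (GB/s) across all memory tiles."""
    total_mb = tiles * kb_per_tile / 1024
    total_gb_per_s = tiles * gb_per_s_per_tile
    return total_mb, total_gb_per_s


print(stacked_sram_totals(80, 256, 12))  # (20.0, 960)
```

The 960 GB/s aggregate is what the slide rounds to "~1TB/sec total".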


Other Potential Applications
[Figure labels: heat sink; high performance CPU (high performance technology, expensive; small number of high speed IO); IO hub (lower performance technology, inexpensive; large number of low speed IO); package.]
Network on a chip: NoC fabricated on a separate die with metal system optimized for the interconnect stack
IO Hub: IO hub fabricated on older technology with high voltage and legacy support


5. Worries
What, if anything, keeps you up at night regarding interconnect scalability? Is cost (NRE, complexity, design time) a factor?
[Figure labels: Relative; On-die IC energy/mm; Off-die IC; Compute Energy; Interconnect Energy; 1.6X; 6X; Technology (nm); Energy, pJ/bit; Data Rate, Gb/s; Research.]

Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1

Outline
Evolution of interconnect fabric
On die network challenges
Some simple contrarian proposals
Evaluation and