The Sun Fireplane Interconnect in the Mid-Range Sun Fire Servers




1 TAKE IT TO THE NTH
Alan Charlesworth, Sun Microsystems
The Sun Fireplane Interconnect in the Mid-Range Sun Fire Servers

Vertical & Horizontal Scaling
Vertical: many CPUs in one box; cache-coherent shared memory (SMP); usually a proprietary interconnect; can be dynamically partitioned; high bandwidth & low latency; good performance on most parallel apps; can be more costly for bigger boxes.
Horizontal: many systems; separate systems communicate through network APIs; usually commodity interconnect & boxes; needs partitionable parallel apps; databases typically can't be horizontally scaled.

2 Shared-Memory Server Sales
Chart: factory revenue (billions of dollars) by year, broken down by system CPU capacity band. Source: IDC, March.

Data Center Tiers
Users: Sun Ray, Net Appliances, Thin Clients, PCs.
Web Tier: lots of small systems (horizontal scaling).
Application Tier: apps servers; OLAP marts, small to big.
Database Tier: OLTP database server, DSS database server; few big systems (vertical scaling).
Storage: operational storage (OLTP), DSS warehouse storage.

3 Big Database Examples
Two TPC-H configurations are compared: a Sun Fire 6800 server and a 10K (Sun Enterprise 10000) configuration, each with external storage racks. The comparison covers CPUs, memory, storage racks, drives, and total storage, plus 3-year cost in millions of dollars (server, storage, software, hardware maintenance, total), power (kW), and floor space (sq ft).

Cache Coherency
Diagram: four CPUs and memory exchanging transactions on coherency blocks (aligned 32, 64, or 128 bytes): 1. Read to Share; 2. Read to Share; 3. Read to Own; 4. Read to Share; 5. Writeback. Two Invalidate transactions to other CPUs are also shown.
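The transaction types in the Cache Coherency diagram (Read to Share, Read to Own, Invalidate, Writeback) can be illustrated with a toy snoopy protocol. The sketch below is a textbook MOSI-style model, assumed for illustration; it is not the Fireplane or UltraSPARC-III protocol, which the slides do not spell out, and the class and method names are hypothetical. It keeps one state per cache per block and logs every bus transaction.

```python
from enum import Enum

class State(Enum):
    I = "Invalid"
    S = "Shared"     # read-only copy
    O = "Owned"      # dirty copy that others may share; owner must write back
    M = "Modified"   # dirty and the only copy

class SnoopyBus:
    """Toy broadcast bus: every transaction is seen (snooped) by every cache."""

    def __init__(self, n_cpus):
        self.caches = [{} for _ in range(n_cpus)]  # per CPU: block address -> State
        self.log = []  # (transaction, cpu, block) in the order they occur

    def state(self, cpu, block):
        return self.caches[cpu].get(block, State.I)

    def read_to_share(self, cpu, block):
        self.log.append(("ReadToShare", cpu, block))
        for other, cache in enumerate(self.caches):
            if other != cpu and cache.get(block) in (State.M, State.O):
                # The dirty holder would source the data cache-to-cache and
                # keeps responsibility for the eventual writeback.
                cache[block] = State.O
        if self.state(cpu, block) == State.I:
            self.caches[cpu][block] = State.S

    def read_to_own(self, cpu, block):
        self.log.append(("ReadToOwn", cpu, block))
        for other, cache in enumerate(self.caches):
            if other != cpu and cache.get(block, State.I) != State.I:
                self.log.append(("Invalidate", other, block))
                cache[block] = State.I
        self.caches[cpu][block] = State.M

    def evict(self, cpu, block):
        # Only a dirty (M or O) copy has to be written back to memory.
        if self.state(cpu, block) in (State.M, State.O):
            self.log.append(("Writeback", cpu, block))
        self.caches[cpu].pop(block, None)
```

For example, CPU 0 and CPU 1 Read to Share, CPU 2 Read to Own (invalidating both copies), CPU 3 Read to Share (CPU 2 drops to Owned), and then CPU 2 evicts the block. The log then holds two Read to Shares, a Read to Own, two Invalidates, a Read to Share, and a Writeback, one ordering consistent with the labels on the slide.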

4 Broadcast & Point-to-Point
1. Broadcast (Snoopy): all addresses are sent everywhere; the snoop result is computed in a few cycles; lowest possible latency, especially for cache-to-cache transfers; data bandwidth is limited by snoop bandwidth.
2. Point-to-point (Directory): a directory keeps track of who is interested in each block; addresses are sent only to interested parties; latency is usually longer; bandwidth can be much greater.

Interconnect Timeline: development and production periods, up to now, for each CPU core / interconnect pair: UltraSPARC-V; UltraSPARC-III / Fireplane; UltraSPARC-I / UPA; SuperSPARC / XDBus; Cypress SPARC / MBus.
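The broadcast versus directory trade-off can be put in numbers by counting address messages and network hops for a single request. This is a deliberately simplified model, assumed for illustration: it ignores acknowledgements, the directory lookup time, and the case where the home node is also the owner. The function names are hypothetical.

```python
from typing import Set

def broadcast_messages(n_agents: int) -> int:
    """Snoopy broadcast: the address goes to every other agent, cached or not."""
    return n_agents - 1

def directory_messages(sharers: Set[int], requester: int) -> int:
    """Directory: one request to the block's home directory, plus one forward
    to each agent the directory lists as holding the block (requester excluded)."""
    return 1 + len(set(sharers) - {requester})

def cache_to_cache_hops(scheme: str) -> int:
    """Network hops for a read that another cache must supply.
    broadcast: requester -> owner (snooped), owner -> requester = 2
    directory: requester -> home, home -> owner, owner -> requester = 3"""
    return {"broadcast": 2, "directory": 3}[scheme]
```

With 24 agents and a block cached by two of them, broadcast sends 23 address messages where the directory sends 3, but the directory pays an extra hop on a cache-to-cache transfer: more bandwidth headroom, longer latency.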

5 Interconnect Generations
A table compares MBus, XDBus, UPA, and Fireplane by year (in mid-size servers), system clock (MHz), coherency type (broadcast or point-to-point), packet switching (circuit or packet switched), address & data (together or separate), coherency block (bytes), system clocks per snoop, address bandwidth (GBps), number of address buses, maximum data bandwidth (GBps), data path width (bytes), and wiring (bused or switched; for Fireplane, bused in mid-range and switched in high-end systems).

Snooping Progress: a chart of broadcast-bus bandwidth (GBps) for MBus, XDBus, UPA, and Fireplane against year of first shipment in medium-sized servers, with a trend line doubling every 18 months.
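The broadcast-bus bandwidth in these comparisons reduces to one product: snoops per second times bytes per coherency block, where snoops per second is the system clock divided by the clocks per snoop. The sketch below assumes a 150 MHz system clock, one snoop per clock, and 64-byte coherency blocks for Fireplane. These are assumptions, chosen because 150 million snoops/sec × 64 bytes reproduces the 9.6 GBps figure quoted for the Sun Fire cabinets. The function names are hypothetical.

```python
def snoop_bandwidth_gbps(clock_mhz: float, clocks_per_snoop: int, block_bytes: int) -> float:
    """Coherent bandwidth allowed by the address bus: snoops/sec x bytes per block."""
    snoops_per_sec = clock_mhz * 1e6 / clocks_per_snoop
    return snoops_per_sec * block_bytes / 1e9

def port_bandwidth_gbps(clock_mhz: float, width_bytes: int) -> float:
    """Peak bandwidth of a data port that moves width_bytes every clock."""
    return clock_mhz * 1e6 * width_bytes / 1e9

# Assumed Fireplane-like values: 150 MHz clock, one snoop per clock, 64-byte blocks.
print(snoop_bandwidth_gbps(150, 1, 64))  # 150 million snoops/s x 64 bytes
print(port_bandwidth_gbps(150, 32))      # a 32-byte data port
print(port_bandwidth_gbps(150, 16))      # a 16-byte data port
```

Under these assumptions the script prints 9.6, 4.8, and 2.4 (GBps). Halving the snoop rate (two clocks per snoop) halves the coherent bandwidth no matter how wide the data paths are, which is the sense in which broadcast data bandwidth is limited by snoop bandwidth.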

6 Fireplane Cache Coherency
1. Broadcast (snoopy) coherency inside a snooping coherency domain, which contains processors, memory, and I/O.
2. Point-to-point (directory) coherency between snooping coherency domains: each domain attaches through an SSM agent to the Scalable Shared Memory (SSM) interconnect.

Address Bus Implementation: a top-level address repeater with outgoing and incoming paths and implicit distributed arbitration; below it, board-level address repeaters (AR) connect the CPUs and I/O.
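The address bus uses implicit distributed arbitration. One common way to get that property, assumed here because the slides do not give Fireplane's actual arbitration rule, is for every agent to run an identical round-robin arbiter: each copy sees the same broadcast request vector and applies the same rule, so all copies pick the same winner without a central arbiter sending grants. The class and method names are hypothetical.

```python
from typing import List, Optional

class RoundRobinArbiter:
    """One identical copy runs at every agent. All copies see the same request
    vector each cycle and apply the same rule, so they agree on the winner."""

    def __init__(self, n_ports: int):
        self.n = n_ports
        self.last = n_ports - 1  # start so that port 0 has first priority

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        # Search starting just after the previous winner, wrapping around.
        for offset in range(1, self.n + 1):
            port = (self.last + offset) % self.n
            if requests[port]:
                self.last = port  # the winner becomes lowest priority next time
                return port
        return None  # no requests: state unchanged
```

Because the state update depends only on the shared request history, copies that start identical stay identical. That is what lets each board decide locally, and consistently, whose address goes next.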

7 Snoopy Coherence Domain
An address transaction travels on the broadcast address bus through the top-level address repeater and is snooped by every processor, memory, and I/O interface (PCI) in the domain; the data transfer goes through the data switch between the CPU/memory data paths.
Timing in system clocks: address request starting at clock 0, broadcast address ending at clock 6, snoop and memory access both starting at clock 7, followed by the data transfer.

Increasing CPU Integration: across Cypress SPARC / MBus (1990), SuperSPARC / XDBus (1993), UltraSPARC-I / UPA (1996), and UltraSPARC-III / Fireplane (2000), the processor, cache, cache tags, coherency logic, and memory control move progressively onto the CPU chip. The Cypress design used separate IU and FPU chips alongside the cache, cache tags, coherency, and memory; the UltraSPARC-III chip includes the external cache tags and memory control.
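The timing line shows the snoop and the memory access both starting at clock 7, which suggests the memory read is launched speculatively in parallel with the snoop rather than after it. The sketch below models that overlap. Only the clock-7 start times come from the slide; the durations are made-up placeholders, and the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Timing:
    """Clock numbers for one read. Start clocks follow the slide (snoop and
    memory both start at clock 7); the durations are hypothetical."""
    snoop_start: int = 7
    snoop_clocks: int = 3
    memory_start: int = 7
    memory_clocks: int = 6
    cache_supply_clocks: int = 4  # hypothetical: time for an owning cache to source data

def data_ready(t: Timing, owned_elsewhere: bool) -> int:
    """Clock at which data can be used when memory is read in parallel with the snoop."""
    snoop_done = t.snoop_start + t.snoop_clocks
    if owned_elsewhere:
        # Snoop hit in another cache: discard the memory data, use the cache's copy.
        return snoop_done + t.cache_supply_clocks
    # Snoop miss: memory data is valid, but only once the snoop result is known.
    return max(snoop_done, t.memory_start + t.memory_clocks)

def serial_ready(t: Timing) -> int:
    """The same read if memory were accessed only after the snoop missed."""
    return t.snoop_start + t.snoop_clocks + t.memory_clocks
```

With the placeholder values, overlapping delivers memory data at clock 13 on a snoop miss, versus clock 16 if the memory access waited for the snoop; a snoop hit in another cache delivers at clock 14.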

8 UltraSPARC-III Processor
Processor chip (4 instructions/clock):
Instruction Issue Unit: instruction queue, 16K branch predictor, instruction TLB.
Integer Unit: integer registers, ALU pipe 0, ALU pipe 1, load/store pipe, branch pipe.
Floating-Point Unit: 32 registers, FP multiply/graphics/divide pipe, FP add/graphics pipe.
Caches: 32 KB instruction cache with instruction prefetch; 64 KB data cache and data TLB; store queue and 2 KB write cache; 2 KB prefetch cache.
External cache control, with on-chip external cache tags (90 KB) for 8 MB of external cache data in SRAM DIMMs.
Fireplane system interface & memory control: Fireplane address bus and data path to the dual CPU data switch; memory banks 0-3 of SDRAM DIMMs.

Sun Fire System Board: address repeater (150 million snoops/sec), data path control, and a 4.8 GBps board data path; processors & caches and memory connect to the dual CPU data switches over 2.4 GBps paths.
Legend: P = parity generate & check; C = ECC check; E = ECC generate, check & correct.

9 Sun Fire I/O Assembly
Address repeater (150 million snoops/sec), data path control, and a 16-byte, 2.4 GBps data path; PCI cards on 33 MHz and 66 MHz PCI buses.
Legend: P = parity generate & check; C = ECC check; E = ECC generate, check & correct.

Fireplane Switch Boards: address repeaters with pairs of address ports, and data switches with six 32-byte ports & four 16-byte ports.

10 System Board Picture
Labeled components: power, external cache ($) DIMMs, CPUs, Address ASIC, Data Control ASIC, Data Switch ASICs, boot bus ASICs, two sets of 8 Dual CPU Data Switch ASICs, and four banks of 8 SDRAM DIMMs.
I/O Assembly Pictures: 6-slot cPCI and 8-slot PCI.


11 Cabinet Pictures
Sun Fire Server Cabinets:
Sun Fire 3800: rack mount; 8 processors, 2 CPU/memory boards, 2 I/O assemblies, 2 domains; 9.6 GBps peak bandwidth.
Sun Fire 4800 / 4810: deskside or rack mount (4800), rack mount (4810); 12 processors, 3 CPU/memory boards, 2 I/O assemblies, 2 domains; 9.6 GBps bisection bandwidth.
Sun Fire 6800 cabinet: 24 processors, 6 CPU/memory boards, 4 I/O assemblies, 4 domains; 9.6 GBps bisection bandwidth; bulk power.
Also shown: a large server cabinet.

12 A Micro Benchmark
Parallel pointer-chasing, plotted against the number of processors:
Memory latency (ns; lower is better): Enterprise 6500 (bus) vs. Sun Fire 6800 (switch).
Memory bandwidth (GBps; higher is better): Sun Fire 6800 (switch) vs. Enterprise 6500 (bus), with a linear-scaling reference line.

Benchmark Records:
SPECweb99: web serving.
SPECjbb: OLTP app-tier performance.
TPC-H (TB scale): decision support; price/performance & performance per CPU.
Oracle Apps: OLTP performance.
PeopleSoft: General Ledger; Financials.
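A pointer-chasing benchmark measures memory latency by making every load depend on the previous one, so the hardware cannot overlap the misses; the parallel variant runs one such chain per processor at the same time. The sketch below shows a single chain built with Sattolo's algorithm, so the chain visits every slot in a random order before repeating. The slides do not say how the original benchmark built its chains; that choice and all names here are assumptions. In Python, interpreter overhead dominates the timing, so the sketch illustrates the access pattern rather than measuring DRAM latency. A real measurement would use C and an array far larger than the caches.

```python
import random
import time
from typing import List

def make_chain(n: int, seed: int = 0) -> List[int]:
    """Sattolo's algorithm: a random permutation forming a single n-cycle, so
    following nxt[] from any slot visits every slot before returning."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding j == i forces a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt: List[int], steps: int, start: int = 0) -> int:
    """Dependent loads: each index comes from the previous load, so no overlap."""
    p = start
    for _ in range(steps):
        p = nxt[p]
    return p

def ns_per_step(nxt: List[int], steps: int) -> float:
    """Average time per dependent load (dominated by interpreter overhead in Python)."""
    t0 = time.perf_counter_ns()
    chase(nxt, steps)
    return (time.perf_counter_ns() - t0) / steps
```

Usage: `ns_per_step(make_chain(1 << 20), 1_000_000)`. Because the permutation is one cycle of length n, chasing exactly n steps returns to the starting slot, which is a cheap check that no short loop lets the chain stay cache-resident.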
