
TACC 'RANGER' INFINIBAND ARCHITECTURE WITH SUN TECHNOLOGY
John Fragalla
Principal Engineer, High Performance Computing
Sun Microsystems, Inc.

The Sun Constellation System
The world's largest general-purpose compute cluster based on the Sun Constellation System
> 82 racks of the Sun Blade 6048 Modular System
> 2 Sun Datacenter Switch 3456 InfiniBand switches
> 72 Sun Fire X4500 "Thumper" storage servers
Sun is the sole hardware supplier
AMD Opteron based
Single fabric based on InfiniBand with Mellanox chipsets
Expandable configuration

Configuration Summary
Two-tier InfiniBand topology
> A 24-port IB-NEM leaf switch on each 12-blade shelf
> Two Magnum central switches: 16 line cards each
> One 12x IB cable for every three nodes, 1,384 IB cables total
> Total bisection BW: 46.08 Tbps (5.76 TB/s)
82 C48 compute racks (Sun Blade 6048)
> 3,936 Pegasus blades w/ 2.3 GHz AMD, 4-socket, quad-core, 32 GB
> 15,744 sockets, 62,976 cores, 125.95 TB of memory
> 579.4 TFLOP/s Rpeak
> 433 TFLOP/s Rmax (#6)
Sun Lustre InfiniBand storage
> 72 X4500 bulk storage nodes with 24 TB each
> Total capacity is 1.7 PB
> Peak measured BW: 50 GB/s (approx. 0.1 GB/s per TFLOP/s)
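
The headline numbers above follow directly from the per-blade figures. A minimal back-of-the-envelope check, assuming 4 floating-point operations per clock per Barcelona core (which is what the quoted Rpeak implies):

```python
# Back-of-the-envelope check of the Ranger configuration summary.
blades = 3936            # Sun Blade 6048 "Pegasus" blades
sockets_per_blade = 4
cores_per_socket = 4     # quad-core AMD Opteron (Barcelona)
ghz = 2.3
flops_per_cycle = 4      # assumed: 4 FLOP/cycle per core, consistent with the quoted Rpeak
mem_per_blade_gb = 32

sockets = blades * sockets_per_blade                  # 15,744
cores = sockets * cores_per_socket                    # 62,976
memory_tb = blades * mem_per_blade_gb / 1000          # ~125.95 TB
rpeak_tflops = cores * ghz * flops_per_cycle / 1000   # ~579.4 TFLOP/s

# Fabric: one 12x cable (three 4x links) per three blades
cables_12x = blades // 3                              # 1,312 trunk cables (plus splitter cables)

# The quoted 46.08 Tbps bisection bandwidth expressed in bytes
bisection_tbytes = 46.08 / 8                          # 5.76 TB/s

print(sockets, cores, round(memory_tb, 2), round(rpeak_tflops, 1),
      cables_12x, bisection_tbytes)
```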

Sun Blade 6048 Modular System
Massive horizontal scale
> Unibody chassis/rack design
> 48 server modules per rack
> 4-socket AMD blades with 32 DIMMs
> 192 sockets per rack, 768 cores per rack
> Weighs ~500 lbs. less than a conventional rack/chassis combination
> TFLOP/s per rack @ TACC: 7.07 (2.3 GHz AMD Barcelona)
Modular design
> 100 percent of active components hot-plug and redundant (power, fans, I/O, management)
Power and cooling
> Eight 8,400 W (redundant in an N+N configuration; may be configured to 5,600 W)
> Common power and cooling
I/O
> Industry-standard PCI Express midplane
> Four x8 PCIe links per server module
> Two PCIe EMs per server module, eight NEMs per rack
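
The per-rack density figures are the same arithmetic scaled to one chassis/rack; a small sketch, again assuming 4 FLOP/cycle per core:

```python
# Per-rack density check for the Sun Blade 6048 as deployed at TACC.
modules_per_rack = 48
sockets_per_module = 4
cores_per_socket = 4      # quad-core Barcelona
ghz = 2.3
flops_per_cycle = 4       # assumed, consistent with the quoted 7.07 TFLOP/s

sockets_per_rack = modules_per_rack * sockets_per_module         # 192
cores_per_rack = sockets_per_rack * cores_per_socket             # 768
tflops_per_rack = cores_per_rack * ghz * flops_per_cycle / 1000  # ~7.07

print(sockets_per_rack, cores_per_rack, round(tflops_per_rack, 2))
```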

Sun Blade 6048 InfiniBand DDR Switched Network Express Module
Supports 13,824-node clusters
Industry's only chassis-integrated leaf switch
> Dual-port DDR HCA per server module
> Pass-through GigE per server module
> Leverages native x8 PCIe I/O
> Compact design maximizes chassis utilization
Designed for highly resilient fabrics
> Two onboard DDR IB switches
> Increased resiliency, allowing connectivity to up to four ultra-dense switches
3:1 reduction in cabling simplifies cable management
> 24 ports of 4x DDR connectivity realized with only 8 physical 12x connectors
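
Both headline figures fall out of simple port arithmetic. The sketch below is a hedged check; it assumes the 13,824-node figure comes from fanning the NEMs out to four 3,456-port core switches, which is consistent with "connectivity to up to four ultra-dense switches" but is not stated explicitly on the slide:

```python
# Cabling and scaling arithmetic for the 6048 IB DDR switched NEM (a sketch).
ports_4x_external = 24          # external 4x DDR ports per NEM
connectors_12x = 8              # physical 12x connectors per NEM
links_per_connector = ports_4x_external // connectors_12x   # 3 -> the "3:1" cabling reduction

core_switch_ports = 3456        # Sun Datacenter Switch 3456 (4x ports)
max_core_switches = 4           # assumed: the fabric fans out to up to four core switches
max_cluster_nodes = core_switch_ports * max_core_switches    # 13,824 server modules

print(links_per_connector, max_cluster_nodes)
```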

Inside the SB6048 IB DDR Switched NEM
[Block diagram: twelve dual-port DDR HCAs connect over x8 PCIe to the 12 server modules; two 24-port, 384 Gbps IB switch chips feed the 12x connectors toward the Sun Datacenter Switch 3456, with GigE ports for server administration]
Contains 2 x 24-port Mellanox 384 Gbps InfiniScale III switch chips
Supports cluster sizes of 13K server modules
> Routes around failed 12x links

Sun Datacenter Switch 3456
World's highest-scale and highest-density InfiniBand switch
I/O
> 3,456 IB 4x SDR/DDR connections
> 110 Terabits per second
> 1 usec latency
Fabric
> 720 24-port IB switch chips
> 5-stage switching
Availability
> Hot-swap power supplies, fans, and CMCs
> Multi-failover subnet manager servers
> Multi-redundant switching nodes
Management
> IPMI, CLI, HTTP, SNMP
Software
> Host-based subnet management software
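
The fabric figures are consistent with a three-tier folded-Clos fat tree (five switching stages when unfolded) built from 24-port chips. A hedged sanity check, assuming that standard construction and counting both directions for the aggregate bandwidth:

```python
# Sanity check: a 3-tier folded Clos (5 switching stages) from 24-port chips.
k = 24                                   # switch chip radix
end_ports = k**3 // 4                    # 3,456 external 4x ports
switch_chips = 5 * k**2 // 4             # 720 chips (edge + aggregation + core)

gbps_per_4x_ddr = 16                     # 4x DDR data rate (20 Gb/s signaling, 8b/10b encoded)
aggregate_tbps = end_ports * gbps_per_4x_ddr * 2 / 1000   # ~110 Tb/s, both directions

print(end_ports, switch_chips, round(aggregate_tbps, 1))
```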

InfiniBand Cables
[Photos: dual 12x socket on a Magnum line card or C48 NEM; 12x cable connector]
Magnum switch and IB NEM use Molex iPass connectors for three 4x connections
> 3x improvement in density over standard IB connectors
> Three 4x connections in one 12x connector
12x-to-12x cable
> Magnum to C48 NEM
12x-to-triple-4x standard (splitter) cable
> Magnum to three servers with standard IB HCAs

A Lustre Cluster

Sun Fire X4500 Dual-Socket Data Server
Lustre OSS building block at TACC
> Lustre OSS BW: 0.8 GB/s
> 24 TB raw capacity
Compute
> 2x AMD Opteron sockets, dual-core
> 8x DDR1 DIMM slots, up to 32 GB memory
IO
> 2 PCI-X slots
> 4x Gigabit Ethernet ports
> 48x SATA 3.5" disk drives
Availability
> Hot-swap disks
> Hot-plug fans
> Hot-plug PSUs, N+N
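
Scaling the per-OSS figures to the 72 bulk storage nodes reproduces the storage totals quoted in the configuration summary; a rough check (the 0.1 GB/s per TFLOP/s ratio is simply measured bandwidth divided by system peak):

```python
# Rough aggregate check of the Lustre storage tier from per-X4500 figures.
oss_nodes = 72
capacity_per_oss_tb = 24
bw_per_oss_gbs = 0.8

raw_capacity_pb = oss_nodes * capacity_per_oss_tb / 1000   # ~1.7 PB
aggregate_bw_gbs = oss_nodes * bw_per_oss_gbs              # 57.6 GB/s upper bound
measured_bw_gbs = 50                                       # peak measured (summary slide)
rpeak_tflops = 579.4
ratio = measured_bw_gbs / rpeak_tflops                     # ~0.09 GB/s per TFLOP/s

print(round(raw_capacity_pb, 2), round(aggregate_bw_gbs, 1), round(ratio, 2))
```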


TACC Configuration
[System diagram; the figures recoverable from its labels:]
> 3,936 compute nodes in 328 blade shelves; each blade shelf carries a NEM with 12 IB HCAs and a 24-port leaf switch
> Compute node: 4S x 4C x 4 flop, 2.0 GHz CPU, 32 GB memory; 504 Tflops and 123 TiB of memory in total
> 1,312 cables from 82 blade racks to two Magnum switches, each with 16 line cards (2,304 ports)
> Bisection bandwidth: 3,936 * 1 GBps = 3.9 TBps (at the current SDR IB rate)
> 7 N1GE, N1SM, ROCKS and IB subnet manager nodes (4S x 2C, 2.6 GHz, 16 GB, X4600)
> 8 gateway and datamover nodes (4S x 2C, 2.6 GHz, 16 GB, X4600) with 10 GbE and FC
> 4 login nodes (4S x 4C, 2.1 GHz, 32 GB, X4600)
> 72 bulk storage nodes (2S x 2C X4500 "Thumper", 48 x 500 GB drives each), 1.7 PBytes
> 6 metadata nodes (4S x 2C, 2.6 GHz, 16 GB, X4600) with a 32-port FC4 switch and STK 6540 FC-RAID (9 TB) for metadata
> Service and I/O nodes attach to the Magnum switches through splitter cables (48 + 18 + 6)
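
The node, shelf, and memory totals in this diagram can be cross-checked from the rack counts; note that the 504 Tflops figure follows from the 2.0 GHz clock quoted here, whereas the summary slide uses 2.3 GHz, and 123 TiB is the binary-unit equivalent of ~126 TB. A minimal bookkeeping sketch:

```python
# Node, shelf, and memory bookkeeping for the configuration diagram.
racks = 82
shelves_per_rack = 4
blades_per_shelf = 12

shelves = racks * shelves_per_rack          # 328
nodes = shelves * blades_per_shelf          # 3,936

cores = nodes * 4 * 4                       # 62,976
tflops_at_2_0 = cores * 2.0 * 4 / 1000      # ~503.8 -> the "504 Tflops" in this diagram
memory_tib = nodes * 32 / 1024              # 123 TiB (vs. ~126 TB in decimal units)

bisection_tbps = nodes * 1 / 1000           # 3.9 TB/s at 1 GB/s per node (SDR 4x)

print(shelves, nodes, round(tflops_at_2_0, 1), round(memory_tib, 1),
      round(bisection_tbps, 1))
```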

Blades and Shelf as Used at TACC
Per blade:
> 4 quad-core CPUs
> 16 x 2 GB DIMMs
> 8 GB flash for booting
> One PCIe x8 connection
> One Mellanox IB HCA (on the NEM)
Per shelf (12 blades):
> One 24-port IB leaf switch
> Four 12x IB cables, each to a different line card
> CMM connection to 100 MbE
NEM GigE ports, the second NEM switch chip, and the PEM slots are not used
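
At this cabling level the leaf switch carries as many uplinks as node links. The sketch below assumes one active 4x HCA port per blade (the second NEM switch chip, and hence the second HCA port, is unused at TACC per the slide):

```python
# Leaf-level link budget per shelf, as cabled at TACC (a sketch from the slide's figures).
blades_per_shelf = 12
hca_ports_per_blade = 1          # assumed: one active port per dual-port HCA (second chip unused)
cables_12x_per_shelf = 4
links_4x_per_12x = 3

downlinks = blades_per_shelf * hca_ports_per_blade    # 12 node-facing 4x links
uplinks = cables_12x_per_shelf * links_4x_per_12x     # 12 core-facing 4x links
oversubscription = downlinks / uplinks                # 1.0 -> no oversubscription at the leaf

dimm_gb = 16 * 2                                      # 32 GB per blade, matching the summary slide

print(downlinks, uplinks, oversubscription, dimm_gb)
```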

Magnum Switches as Configured at TACC
Two Magnum switches
> Each with 16 line cards
> A total of 4,608 4x ports
> Line cards cabled in pairs, with empty slots left for cable access
> Each line card connects to 48 blade shelves
Largest IB network so far
> Equivalent to 42 conventional 288-port IB switches
> Only ~1,400 cables needed; conventional IB switches would require ~8,000 cables
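
The "42 switches" and "~8,000 cables" comparison can be reproduced if one assumes the alternative is a two-tier fat tree built from conventional 288-port switches with half the ports facing nodes; that assumption is mine, but it lands exactly on the slide's numbers:

```python
# Why "42 switches" and "~8,000 cables": a sketch, assuming a two-tier fat tree
# built from conventional 288-port IB switches with full bisection bandwidth.
nodes = 3936
switch_ports = 288

edge_down = switch_ports // 2                          # 144 node-facing ports per edge switch
edge_switches = -(-nodes // edge_down)                 # ceil(3936 / 144) = 28
uplinks = edge_switches * (switch_ports - edge_down)   # 28 * 144 = 4,032 inter-switch cables
core_switches = uplinks // switch_ports                # 4,032 / 288 = 14

total_switches = edge_switches + core_switches         # 42
total_cables = nodes + uplinks                         # 3,936 + 4,032 = 7,968 (~8,000)

print(total_switches, total_cables)

# By contrast, the Magnum fabric needs only ~1,312 12x trunk cables (plus splitter
# cables), because each 12x cable carries three 4x links.
```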

Rack to Switch Cabling
[Diagram: each blade rack runs two bundles of eight cables, one bundle to each of Switch 1 and Switch 2]
Each blade shelf connects to two different line cards in both switches
Four separate paths from each shelf
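
The cable counts on the earlier slides follow from this wiring pattern; a short bookkeeping sketch:

```python
# Cable bookkeeping for the rack-to-switch wiring.
racks = 82
shelves_per_rack = 4
cables_per_shelf = 4          # four 12x cables per shelf, two to each Magnum switch

cables_per_rack = shelves_per_rack * cables_per_shelf   # 16 (one bundle of 8 per switch)
total_cables = racks * cables_per_rack                  # 1,312 12x cables
paths_per_shelf = cables_per_shelf                      # 4 separate paths into the core

print(cables_per_rack, total_cables, paths_per_shelf)
```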

TACC Floorplan
Size: approximately half a basketball court
[Floor plan: rows of compute (C) and I/O (I) racks separated by alternating hot and cold aisles, with power wiring runs between rows and the two Magnum switches placed mid-row]
> 2 Magnum switches, 16 line cards each (2,304 4x IB ports each)
> 82 blade compute racks (3,936 4S blades)
> 12 IO racks (25 X4600, 4 RU; 72 X4500, 4 RU)
> 112 APC row coolers
> 1,312 12x cables (16 per rack), 16 km total length
> 12x cable lengths: 171 x 9 m, 679 x 11 m, 406 x 13 m, 56 x 15 m
> 72 splitter cables, 6 per IO rack
> Splitter cable lengths: 54 x 14 m, 18 x 16 m
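
The quoted length mix adds up to the "16 km of cable" figure; a quick cross-check:

```python
# Cross-check of the "16 km of cable" figure from the quoted length mix.
twelve_x = {9: 171, 11: 679, 13: 406, 15: 56}      # length (m) -> cable count
splitters = {14: 54, 16: 18}

count_12x = sum(twelve_x.values())                 # 1,312
count_split = sum(splitters.values())              # 72
total_m = (sum(l * n for l, n in twelve_x.items())
           + sum(l * n for l, n in splitters.items()))   # ~16,170 m

print(count_12x, count_split, round(total_m / 1000, 1))  # ~16.2 km
```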

TACC 'RANGER' INFINIBAND ARCHITECTURE WITH SUN TECHNOLOGY
Thank you
John.Fragalla@Sun.COM