Cray XD1 Supercomputer Release 1.3 CRAY XD1 DATASHEET

Similar documents
The Cray XD1. Technical Overview. Amar Shan, Senior Product Marketing Manager. Cray XD1. Cray Proprietary

Scalable Computing at Work

Cray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:

CS500 SMARTER CLUSTER SUPERCOMPUTERS

Who says world-class high performance computing (HPC) should be reserved for large research centers? The Cray CX1 supercomputer makes HPC performance

HP ProLiant BL35p Server Blade

Introducing the next generation of affordable and productive massively parallel processing (MPP) computing the Cray XE6m supercomputer.

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Sun Fire X4170 M2 Server Frequently Asked Questions

SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine

BlueGene/L. Computer Science, University of Warwick. Source: IBM

Huawei KunLun Mission Critical Server. KunLun 9008/9016/9032 Technical Specifications

DESCRIPTION GHz, 1.536TB shared memory RAM, and 20.48TB RAW internal storage teraflops About ScaleMP

ABySS Performance Benchmark and Profiling. May 2010

Architecting Storage for Semiconductor Design: Manufacturing Preparation

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Xyratex ClusterStor6000 & OneStor

The Red Storm System: Architecture, System Update and Performance Analysis

Huawei KunLun Mission Critical Server. KunLun 9008/9016/9032 Technical Specifications

IBM System x servers. Innovation comes standard

IBM System x3455 AMD Opteron SMP 1 U server features Xcelerated Memory Technology to meet the needs of HPC environments

Smarter Clusters from the Supercomputer Experts

2008 International ANSYS Conference

AcuSolve Performance Benchmark and Profiling. October 2011

IBM s Data Warehouse Appliance Offerings

Configuring a Single Oracle ZFS Storage Appliance into an InfiniBand Fabric with Multiple Oracle Exadata Machines

OceanStor 9000 InfiniBand Technical White Paper. Issue V1.01 Date HUAWEI TECHNOLOGIES CO., LTD.

HIGH-PERFORMANCE STORAGE FOR DISCOVERY THAT SOARS

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

QLogic TrueScale InfiniBand and Teraflop Simulations

Cisco SFS 7000D InfiniBand Server Switch

STAR-CCM+ Performance Benchmark and Profiling. July 2014

The Use of Cloud Computing Resources in an HPC Environment

IBM BladeCenter solutions

Designed for Maximum Accelerator Performance

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

Scaling Across the Supercomputer Performance Spectrum

Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node

Models Smart Array 6402/128 Controller B21 Smart Array 6404/256 Controller B21

IBM System Storage DS5020 Express

p5 520 server Robust entry system designed for the on demand world Highlights

White Paper. Technical Advances in the SGI. UV Architecture

FUJITSU SPARC ENTERPRISE SERVER FAMILY MID-RANGE AND HIGH- END MODELS ARCHITECTURE FLEXIBLE, MAINFRAME- CLASS COMPUTE POWER

IBM Storwize V5000 disk system

IBM System x servers. Innovation comes standard

The Cray Rainier System: Integrated Scalar/Vector Computing

Highest Levels of Scalability Simplified Network Manageability Maximum System Productivity

Cluster Network Products

MM5 Modeling System Performance Research and Profiling. March 2009

Support for Programming Reconfigurable Supercomputers

Cisco HyperFlex HX220c M4 Node

Assessment of LS-DYNA Scalability Performance on Cray XD1

In-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017

IBM System Storage DCS3700

Cisco 4000 Series Integrated Services Routers: Architecture for Branch-Office Agility

Assessing performance in HP LeftHand SANs

EMC Backup and Recovery for Microsoft SQL Server

Refining and redefining HPC storage

EMC Celerra CNS with CLARiiON Storage

IBM BladeCenter solutions

IBM Power Systems HPC Cluster

AcuSolve Performance Benchmark and Profiling. October 2011

Overview of Tianhe-2

IBM System p5 510 and 510Q Express Servers

SCRAMNet GT. A New Technology for Shared-Memor y Communication in High-Throughput Networks. Technology White Paper

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

Matrox Supersight. HPC: made for imaging. Benefits. High-performance computing (HPC) platform for computationally-demanding industrial imaging.

Dell PowerVault MD Family. Modular Storage. The Dell PowerVault MD Storage Family

GW2000h w/gw175h/q F1 specifications

Best Practices for Setting BIOS Parameters for Performance

Cisco UCS C210 M2 General-Purpose Rack-Mount Server

NAMD Performance Benchmark and Profiling. January 2015

SAP High-Performance Analytic Appliance on the Cisco Unified Computing System

Dell PowerVault MD Family. Modular storage. The Dell PowerVault MD storage family

CESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011

IBM System x family brochure

Apace Systems. Avid Unity Media Offload Solution KIT

IBM System p5 550 and 550Q Express servers

An Oracle Technical White Paper October Sizing Guide for Single Click Configurations of Oracle s MySQL on Sun Fire x86 Servers

IBM System x family brochure

Achieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server

PART-I (B) (TECHNICAL SPECIFICATIONS & COMPLIANCE SHEET) Supply and installation of High Performance Computing System

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

EMC ISILON X-SERIES. Specifications. EMC Isilon X200. EMC Isilon X400. EMC Isilon X410 ARCHITECTURE

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

Metropolitan Road Traffic Simulation on FPGAs

Cisco UCS C210 M1 General-Purpose Rack-Mount Server

IBM System x3755 SMP-capable rack server supports new dual-core AMD Opteron processors

INFOBrief. Dell PowerEdge Key Points

SUN SPARC ENTERPRISE M5000 SERVER

Dell Fluid Data solutions. Powerful self-optimized enterprise storage. Dell Compellent Storage Center: Designed for business results

Virtual Leverage: Server Consolidation in Open Source Environments. Margaret Lewis Commercial Software Strategist AMD

Sugon TC6600 blade server

Cisco UCS C24 M3 Server

IBM System x3850 M2 servers feature hypervisor capability

An Oracle White Paper April Consolidation Using the Fujitsu M10-4S Server

IBM System p5 185 Express Server

Oracle Coherence + Oracle Exalogic Elastic Cloud

ClearPath Forward Libra 8480 / 8490

Enabling Performance-per-Watt Gains in High-Performance Cluster Computing

Transcription:

CRAY XD1 DATASHEET Cray XD1 Supercomputer Release 1.3 Purpose-built for HPC delivers exceptional application performance Affordable power designed for a broad range of HPC workloads and budgets Linux, 32 and 64-bit x86 compatible runs a wide variety of ISV applications and open source codes Simplified system administration automates configuration and management functions Highly reliable monitors and maintains system health Scalable to hundreds of compute nodes high bandwidth and low latency let applications scale The Cray XD1 supercomputer combines breakthrough interconnect, management and reconfigurable computing technologies to meet users demands for exceptional performance, reliability and usability. Designed to meet the requirements of highly demanding HPC applications in fields ranging from product design to weather prediction to scientific research, the Cray XD1 system is an indispensable tool for engineers and scientists to simulate and analyze faster, solve more complex problems, and bring solutions to market sooner.

Direct Connected Processor Architecture Cray XD1 System Highlights The Cray XD1 system is based on the Direct Connected Processor (DCP) architecture, harnessing many processors into a single, unified system to deliver new levels of application performance. Cray s implementation of the DCP architecture optimizes message-passing applications by directly linking processors to each other through a high performance interconnect fabric, eliminating shared memory contention and PCI bus bottlenecks. Compute Processors 12 AMD Opteron 64-bit single or dual core processors run Linux and are organized as six nodes of 2 or 4-way SMPs to deliver up to 106 GFLOPS* per chassis. Matching memory and I/O performance removes bottlenecks and maximizes processor performance. RapidArray Interconnect The industry s fastest embedded switching fabric, the RapidArray interconnect uses 12 custom communications processors and a 96 GigaBytes (GB) per second nonblocking switching fabric per chassis to deliver 8 GB per second bandwidth between nodes with 1.7 microsecond MPI latency. Each chassis presents 24 RapidArray links externally with an aggregate 48 GB per second bandwidth between chassis. Application Acceleration Six Xilinx Virtex-4 Field Programmable Gate Arrays (FPGAs) per chassis attach to the RapidArray fabric for massively parallel execution of critical algorithm components, promising orders of magnitude performance improvement for target applications. Highly modular, the Cray XD1 base unit is a chassis. Up to 12 chassis can be installed in a cabinet. Multi-cabinet configurations integrate hundreds of processors into a single system. Chassis Compute Processors 12 144 Each Cabinet Performance 106 GFLOPS* 1.27 TFLOPS* Aggregate Switching Capacity 96 GB/s 1152 GB/s MPI Interprocessor Latency 1.7 µsec 2.0 µsec Aggregate Memory Bandwidth 77 GB/s 922 GB/s Maximum Memory 96 GB 1.2 TB Maximum Local Disk Storage 1.5 TB 18 TB * peak theoretical performance with 2.2 GHz Dual Core AMD Opteron processors Active Management A management processor in each chassis, a realtime operating system, distributed software and an independent supervisory network operate together to monitor, control and manage every aspect of the computer. The Cray XD1 system enables single system command and control and provides extensive high availability features. CRAY XD1 DATASHEET

The Cray XD1 HPC system is based on the DCP architecture. This innovative new computer unifies up to hundreds of processors into a single, balanced computer, delivering exceptional performance, reliability and usability. The Cray XD1 architecture includes four key subsystems: Compute Environment RapidArray Interconnect Application Acceleration Active Management Compute Environment The Cray XD1 compute subsystem is composed of a high-performance Linux operating system and AMD Opteron 64-bit processors. Direct Connect Bandwidth The AMD Opteron processor s integrated memory controller increases memory bandwidth and reduces latencies. Three HyperTransport links per processor provide up to 19.2 GB per second I/O bandwidth. Dual Core Processors Dual Core AMD Opteron processors are available as an option on the Cray XD1 system. Two processing cores exist on a single die, doubling peak performance, without raising power consumption and heat levels. Linux for a wide variety of HPC codes Global Process Synchronization Optimizations to the Linux scheduler result in the synchronization of process execution across the system, minimizing time spent by applications at barriers, and increasing application performance as much as 50 percent. Standards Based The Linux operating system, combined with the AMD Opteron processor s support for 32 and 64-bit x86 compatible computing, allows users to take advantage of a wide range of commercial and open-source applications. RapidArray Interconnect The Cray XD1 RapidArray interconnect directly connects processors over high-speed, low-latency pathways. This DCP architecture eliminates the memory contention and PCI bus bottlenecks. Each fully configured chassis includes: 12 custom communications processors a 96 GB per second nonblocking embedded switch fabric, which includes main and optional expansion fabrics Optimized communications libraries 96 GB/s nonblocking switching fabric per chassis RapidArray Communications Processors Tightly coupled to the AMD Opteron processors and switching fabric, the RapidArray processors offload and accelerate communications functions from the Opteron processor, freeing it to perform core compute tasks and enabling concurrent computing and communication. The communications processors deliver up to 8 GB per second bandwidth with 1.7 microsecond MPI latency between nodes. The communications processors also deliver 1.1 microsecond latency on the randomly ordered ring latency test, part of the HPC Challenge (HPCC) benchmark suite. With interconnect bandwidth on par with memory bandwidth, a major system bottleneck is removed for many applications, improving performance and greatly simplifying software development. RapidArray Embedded Switching Fabric A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each node and twenty-four 2 GB per second interchassis links. RapidArray Communications Libraries The RapidArray communications libraries work in conjunction with the RapidArray communications processors to streamline communications protocols, bypassing the Linux kernel where possible and optimizing the compute/communicate interaction. The RapidArray interconnect delivers outstanding performance for highly parallel applications built upon the MPI, shmem, and Global Arrays standards. 1.7 microsecond MPI latency System Topologies The Cray XD1 embedded switching fabric and 24 interchassis links enable a wide variety of network topologies. Topologies initially supported include direct connect and fat tree. A fat tree topology is ideal for larger systems (or small systems that are expected to grow) that run applications that use all-to-one, oneto-all, or all-to-all communications. Systems up to 24 chassis can be supported with a single layer fat tree. Each half of the RapidArray fabric forms a separate fat tree, with its own set of external switches. Lustre Global Parallel File System The Lustre parallel file system provides a highperformance, high-availability, object-based storage architecture for the Cray XD1 system and can scale to thousands of nodes. Lustre runs natively across the Cray XD1 RapidArray interconnect, avoiding TCP/IP overheads. Application Acceleration The application acceleration subsystem incorporates reconfigurable computing capabilities to deliver superlinear speedup of targeted applications. Each Cray XD1 chassis can be configured with six application acceleration processors, based on Xilinx Virtex-4 FPGAs, that can be programmed to accelerate key algorithms. Well suited to functions such as all_reduce operations, searching, sorting and signal processing, the application acceleration subsystem acts as a coprocessor to the AMD Opteron processors, handling the computationally intensive and highly repetitive algorithms that can be significantly accelerated through parallel execution. The application acceleration processors are tightly integrated with Linux and the AMD Opteron processors and use standard software programming APIs, removing a major obstacle to application speedup.

Balance The Key to Exceptional Application Performance Active Management The active management subsystem delivers outstanding usability and reliability through single system command and control and selfhealing capabilities. Single System Command and Control The active management system combines partitioning and intelligent self-configuring features to allow administrators to view and manage hundreds of processors as one or more logical computers. Partitions divide the Cray XD1 system into logical computers. Administrators view and operate on partitions, rather than on individual nodes. Self-configuration features automatically translate partition details into node configurations. Transaction processing techniques ensure configuration integrity. Single system control is provided over common administrative functions: system configuration, monitoring and fault tolerance, software upgrades, storage management, network management, user management, security, and resource and queue management. These functions may be accessed using the active manager graphical user interface (GUI) or a command line interface (CLI). Single system command and control eliminates common configuration problems, increases system uptime, and significantly reduces the time and effort necessary to manage a multiprocessor, high performance computer. Self-Healing The Cray XD1 system provides extensive fault detection, isolation, and prediction capabilities, coupled with automated proactive and reactive self-healing intelligence. 200 sensors to monitor system health Fault Detection: In each chassis, a dedicated management processor with its own supervisory network continuously monitors over 200 critical hardware functions, including air flow, temperatures, voltages, in-rush currents, parity errors and component diagnostics. Proactive Management: Sophisticated proactive controls adjust a broad range of operating parameters to maintain peak performance and optimal operating conditions. These proactive measures improve the mean time between failures (MTBF) and ensure system resiliency and job completion. Recovery from Failures: The Cray XD1 s selfhealing intelligence facilitates a quick and automated recovery in the event of a hardware failure, reducing outages from hours to minutes. Redundancy features include N+1 sparing and the ability to reallocate resources in the event of a failure, enabling a replacement node to assume the persona of a failed node and restore full capacity to the affected partition. Jobs are automatically rescheduled from the last checkpoint. It takes more than simply adding processors to increase HPC application performance. MPI applications, such as simulation and modeling, are compute- and communicateintensive. They must perform millions of calculations and then exchange intermediate results before beginning another iteration. To maximize processor performance, this exchange of data must be fast enough to prevent the compute processor from sitting idle, waiting for data. Thus, the faster the processors, the greater the bandwidth required and the lower the latency that can be tolerated. A well-balanced system matches processing power with memory, interprocessor and I/O bandwidth, ensuring that applications are free to communicate results without experiencing delays.

CRAY XD1 SUPERCOMPUTER Technical data Release 1.3 CPU Cache FLOPS SMP Main Memory Memory Bandwidth 12 64-bit AMD Opteron 200 series single or dual core processors per chassis 64K L1 instruction cache, 64K L1 data cache, 1 MB L2 cache per processor 106 GFLOPS theoretical peak performance (@ 2.2 GHz Dual Core) 2-way (single core) or 4-way (dual core) nodes 12 48 GB PC3200 (DDR400) Registered ECC SDRAM per chassis or 96 GB PC2700 (DDR333) Registered EEC SDRAM per chassis (1-8 GB per socket) 12.8 GB/s per node Interconnect 2 or 4 Cray RapidArray links per node (4 or 8 GB/s per node) Fully nonblocking Cray RapidArray switch fabric (48 GB/s or 96 GB/s) 12 or 24 external Cray RapidArray interchassis links 24 or 48 GB/s aggregate 1.7 µs MPI latency between nodes Direct Connect or Fat Tree Topology Application Acceleration (FPGA) 6 Xilinx Virtex-4 FPGAs, XC4VLX160-10 or XC4VSX55-10, 16 MB QDR RAM, 3.2 GB/s interconnect External I/O Options 4 PCI-X bus slots Quad Port Gigabit Ethernet PCI-X card Single Port 10 Gigabit Ethernet PCI-X Card Single and Dual Port Fibre Channel HBAs Single Port Infiniband HCA Disk Up to six 3.5 inch Serial ATA drives (74GB 10K RPM or 250 GB 7200 RPM) System Administration Graphical and command line system administration tool sets for partition management, fault management, configuration, security, software updating, telemetry and provisioning across all chassis in a system Partitioning of system into multiple logical computers Administration of entire partition as a single entity (single system command and control) Transaction processing used to ensure configuration consistency Workload management - Grid Engine, PBS Pro, Platform LSF Automatic interconnect topology verification and auto-configuration of L2 and IP networking Automatic response to component failures: isolation of hard failures, re-initialization on soft failures, switching around redundant components. Automatic re-start of jobs from last checkpoint following system failure Reliability Management processor, network, over 200 measurement points on each Cray XD1 chassis Independent 100 Mb/s management fabric within and between Cray XD1 chassis Thermal stability maintained through temperature monitoring and regulation of fan speed Proactive detection of impending component failure (CPUs, fans, power supplies, memory, interconnect) and automatic isolation of failed components Hot swappable fans and disk blades Operating System Cray HPC enhanced Linux, Kernel version 2.6.5 File Systems EXT2/3, NFS v2/3, ReiserFS, Lustre Global Parallel File System External Storage Fibre Channel disk controllers and drive enclosures Parallel Processing MPI 1.2 MPI-IO, MPI-2, Sockets Direct Protocol (SDP) for TCP/IP acceleration Shared Memory Access Shmem, OpenMP, Global Arrays Compilers Fortran 77, 90, 95, HPF; C/C++; Java Power 2200 W typical. 3 phase: Circuit requirement: 20A, 208 V per chassis Single phase: Circuit requirement: 30A, 230 V per chassis Dimensions 3 VU (5.25 ) x 23 W x 32 D per chassis (13.3 cm high x 58.4 cm wide x 81.3 cm deep), 12 chassis per cabinet For the most up-to-date information, please visit www.cray.com

The Supercomputer Company Global Headquarters: Cray Inc. 411 First Avenue S., Suite 600 Seattle, WA 98104-2860 USA tel (206) 701 2000 fax (206) 701 2500 Sales Inquiries: North America: 1 (877) CRAY INC Worldwide: 1 (651) 605 8817 sales@cray.com www.cray.com 2005 Cray Inc. All rights reserved. Specifications subject to change without notice. Cray is a registered trademark, and the Cray logo, Cray XD1 and RapidArray are trademarks of Cray Inc. All other trademarks mentioned herein are the properties of their respective owners. 0405