
Customer Success Story: Los Alamos National Laboratory

Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory

Case Study, June 2010

Highlights

First Petaflop Supercomputer
- #1 on the Top-500 list in 2009
- Over 3,250 compute nodes
- Over 156 I/O nodes
- Over 12,000 Cell processors
- Hundreds of thousands of processor cores

Panasas High Performance Storage Solutions
- 100 Panasas storage shelves
- 2 petabytes of capacity
- 55 GB/s of throughput
- Throughput scales linearly with capacity
- Non-stop availability, simple to deploy

Abstract

Scientists want faster, more powerful high-performance supercomputers to simulate complex physical, biological, and socioeconomic systems with greater realism and predictive power. In May 2008, Los Alamos scientists doubled the processing speed of the previously fastest computer. Roadrunner, a new hybrid supercomputer, uses specialized Cell coprocessors to propel performance to petaflop speed: more than a thousand trillion calculations per second. One of the keys to the project's success was a highly reliable storage subsystem that could provide massively parallel I/O throughput with linear scalability while remaining simple to deploy and maintain. Los Alamos National Laboratory deployed Panasas high performance storage to meet the stringent needs of the Roadrunner project. Built from commodity parts for excellent price/performance, Panasas storage provides capacity and performance that scale symmetrically with processor, caching, and network bandwidth.

Introduction

From its origins as a secret Manhattan Project laboratory, Los Alamos National Laboratory (LANL) has attracted world-class scientists to solve the nation's most challenging problems. As one of the U.S. Department of Energy's multi-program, multi-disciplinary research laboratories, Los Alamos thrives on having the best people doing the best science to solve problems of global importance. The Laboratory's central focus is to foster science, encourage innovation, and recruit and retain top talent while balancing short-term and long-term needs, with research driven by principal investigators and aligned with missions.

In 2002, when Los Alamos scientists were planning for their next-generation supercomputer, they looked at the commodity market for a way to make an end run around the speed and memory barriers looming in the future.

What they found was a joint project by Sony Computer Entertainment, Toshiba, and IBM to develop a specialized microprocessor that could revolutionize computer games and consumer electronics, as well as scientific computing. The major application areas addressed were radiation transport (how radiation deposits energy in and moves through matter), neutron transport (how neutrons move through matter), molecular dynamics (how matter responds at the molecular level to shock waves and other extreme conditions), fluid turbulence, and the behavior of plasmas (ionized gases) in relation to fusion experiments at the National Ignition Facility at Lawrence Livermore National Laboratory.

Roadrunner Architecture

Roadrunner is a cluster of approximately 3,250 compute nodes interconnected by an off-the-shelf parallel-computing network. Each compute node consists of two AMD Opteron dual-core microprocessors, with each of the Opteron cores internally attached to one of four enhanced Cell microprocessors. This enhanced Cell does double-precision arithmetic faster and can access more memory than can the original Cell in a PlayStation 3. The entire machine has almost 13,000 Cells and half as many dual-core Opterons.

Unique I/O Challenges

LANL used a breakthrough architecture designed to achieve high performance, and the cluster could run a large number of jobs. To make the best use of the cluster, several I/O challenges and considerations had to be addressed, including achieving:

- The performance required to serve and write data for the cluster to keep it busy
- Parallel I/O for optimized performance on each node
- The scalability needed to support a large number of cluster nodes
- The reliability needed to keep the cluster running
- A reasonable cost, both in terms of acquisition and management
- A storage architecture that could support future generations of clusters

[Figure source: Los Alamos National Laboratory]

Storage Requirements

LANL wanted a shared storage architecture in which all of its compute clusters on a network could access shared Panasas storage. This is in contrast to many other HPC sites, which bind a storage cluster tightly to a compute cluster. LANL had been using Panasas for all of its high performance storage needs for several years, so when it deployed Roadrunner, Panasas storage was the natural choice to satisfy its demanding performance, availability, and scalability requirements. To achieve the performance goals of Roadrunner, the network storage system would have to deliver superior bandwidth, low latency, a high file creation rate per second, and high aggregate throughput.
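As a concrete illustration of two of those metrics, the sketch below is a minimal single-client probe of file-creation rate and aggregate write throughput. It is only a toy: a serious evaluation of parallel storage runs many clients concurrently (for example with a parallel I/O benchmark such as IOR), and the mount point, file count, and sizes here are hypothetical.

```python
import os
import time

TARGET_DIR = "/mnt/panfs/bench"       # hypothetical mount point
NUM_FILES = 100                       # toy workload sizes
CHUNK = b"x" * (4 * 1024 * 1024)      # 4 MiB write buffer
CHUNKS_PER_FILE = 4                   # 16 MiB per file

os.makedirs(TARGET_DIR, exist_ok=True)

start = time.perf_counter()
for i in range(NUM_FILES):
    path = os.path.join(TARGET_DIR, f"f{i:05d}")
    with open(path, "wb") as f:
        for _ in range(CHUNKS_PER_FILE):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())          # push data to storage, not just page cache
elapsed = time.perf_counter() - start

total_bytes = NUM_FILES * CHUNKS_PER_FILE * len(CHUNK)
print(f"file creates per second: {NUM_FILES / elapsed:.1f}")
print(f"write throughput:        {total_bytes / elapsed / 1e9:.2f} GB/s")
```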

Parallel I/O would be important, as it would enable parallel data streams to the 156 I/O nodes, which in turn provide I/O service to the compute nodes. In addition, the storage system would have to scale to capacities of 10 petabytes (PB) and beyond, plus a growing number of nodes in the cluster. This eliminated the possibility of implementing NFS-based storage systems, as they would not be able to scale past a certain number of nodes.

Availability was another key consideration. Because Roadrunner is such a high-demand resource, downtime for maintenance, expansion, and repair is extremely scarce, so the storage system would need to support automatic provisioning for easy growth.

LANL designed the cluster architecture to be simple, easy to manage, and cost effective. One aspect of simplicity and cost was to use I/O nodes that interface with network attached storage, lowering cost by reducing the number of GbE connections from a few thousand to a few hundred. The storage system, likewise, would have to be easy to manage, provide a reasonable total cost of ownership, and fit into the established architecture. Last but not least, the storage architecture needed headroom to provide I/O to larger future cluster configurations: instead of supporting just a single large cluster, it needed to be able to scale to serve multiple clusters from a single, central storage pool.

Deployment of Storage from Panasas

Panasas high performance storage met all the criteria dictated by LANL's computer cluster architecture. Panasas utilizes a parallel file system and provides GbE or InfiniBand connectivity between the cluster nodes and the storage; in fact, the Panasas storage system is itself a cluster. It uses an object-based file system that is an integral part of the PanFS™ storage operating system. PanFS divides files into large virtual data objects, which can be stored on any Panasas StorageBlade module (the unit of storage), enabling dynamic distribution of data activity throughout the storage system. Parallel data paths between the compute clusters and the StorageBlade modules provide high performance access to large files. The result is that Panasas Scale-out NAS delivers performance that scales almost linearly with capacity: the current implementation supports more than 2 PB of capacity while delivering a massive 55 GB per second of throughput.
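To make the scaling idea concrete, the toy model below divides a file into fixed-size objects, spreads the objects round-robin across blades, and fetches them over parallel streams, so aggregate bandwidth grows with the number of blades. The object size, blade count, and all names are invented for illustration; this is not the actual PanFS layout or API.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BLADES = 100                           # e.g. one object store per blade
OBJECT_SIZE = 64 * 1024 * 1024             # hypothetical 64 MiB objects

def object_placement(file_size: int):
    """Map each object of a file to a blade, round-robin."""
    num_objects = -(-file_size // OBJECT_SIZE)     # ceiling division
    return [(obj, obj % NUM_BLADES) for obj in range(num_objects)]

def fetch(obj_and_blade):
    obj, blade = obj_and_blade
    # A real client would issue a network read to one blade here;
    # the string stands in for the returned data.
    return f"object {obj} read from blade {blade}"

layout = object_placement(file_size=10 * 1024**3)  # a 10 GiB file
with ThreadPoolExecutor(max_workers=16) as pool:   # parallel data streams
    results = list(pool.map(fetch, layout))

print(f"{len(layout)} objects spread across {NUM_BLADES} blades")
print(results[0], "...", results[-1])
```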

Parallel access is made possible by equipping each of the LANL cluster I/O nodes with a small installable file system from Panasas: the DirectFLOW client access software. This enables direct communication and data transfer between the I/O nodes and the StorageBlade modules. A simple three-step process initiates direct data transfers (a schematic sketch of this flow appears at the end of this section):

1. Requests for I/O are made to a Panasas DirectorBlade module, which controls access to data.
2. The DirectorBlade module authenticates the requests, obtains the object maps of all applicable objects across the StorageBlade modules, and sends the maps to the I/O nodes.
3. With authentication and the virtual maps, the I/O nodes access data on the StorageBlade modules directly and in parallel.

This concurrency eliminates the bottleneck of traditional, monolithic storage systems, which manage data in small blocks, and delivers record-setting data throughput. The number of data streams is limited only by the number of StorageBlade modules and the number of I/O nodes in the server cluster.

Performance is also a key factor in evaluating the cost effectiveness of storage for large, expensive clusters. It is important to keep a powerful cluster busy doing computations and processing jobs rather than waiting for I/O operations to complete. If a cluster costs $3.5M and is amortized over 3 years, the cost is approximately $3,200 per day, so it makes sense to keep the cluster utilized and completing jobs as fast as possible. To do this, outages have to be minimized and the cluster must be kept up and running.

System availability is therefore another key factor. The Panasas high performance storage system provided the availability that LANL was looking for. Ron Minnich, leader of the cluster research team at LANL, said, "Rather than having a cluster node failure at least once a week, as a comparable system with local disks would experience, the time between node failures was increased to once every 7 weeks."

In terms of simplicity of administration, the Panasas architecture allows management of all data within a single seamless namespace. There is no NFS root, as NFS is replaced by a global file system that is scalable. Data objects can be dynamically rebalanced across the StorageBlade modules for continual, ongoing performance optimization. Furthermore, the object-based architecture enables faster data reconstruction in the event of a drive failure, because StorageBlade modules have the intelligence to reconstruct only the data objects, not unused sectors on a drive.

Finally, the Panasas storage architecture is capable of supporting future generations of more complex cluster configurations, including the scalability to support multiple clusters from one central storage pool. Instead of using one big, expensive GbE switch on one subnet, Panasas storage can be configured across many subnets through smaller, less expensive network switches that connect to the I/O nodes. This improves reliability by providing even more paths to serve data to the compute cluster. Furthermore, with a centralized pool of high-performance storage, there is no need to copy data for different kinds of jobs: after the computation jobs, visualization tasks can take place with a compute-in-place approach rather than copying the data to another storage system.
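The sketch below models the three-step flow described above in runnable form. Every class and method name here is invented for illustration; this is not the actual PanFS or DirectFLOW client API.

```python
class StorageBlade:
    """Holds data objects; serves reads directly to I/O nodes (step 3)."""
    def __init__(self):
        self.objects = {}                  # object_id -> bytes

    def read(self, object_id):
        return self.objects[object_id]

class DirectorBlade:
    """Controls access to data: authenticates requests, hands out object maps."""
    def __init__(self):
        self.maps = {}                     # path -> [(object_id, blade_id)]

    def open(self, path, credentials):
        if credentials != "valid-token":   # step 2: authenticate (toy check)
            raise PermissionError("not authorized")
        return self.maps[path]             # step 2: return the object map

class IONode:
    def __init__(self, director, blades):
        self.director = director
        self.blades = blades

    def read_file(self, path, credentials):
        # Step 1: request I/O from the DirectorBlade.
        object_map = self.director.open(path, credentials)
        # Step 3: read each object straight from its StorageBlade,
        # bypassing the DirectorBlade (a real client does this in parallel).
        return b"".join(self.blades[b].read(o) for o, b in object_map)

# Tiny demo: one file split into two objects on two blades.
blades = {0: StorageBlade(), 1: StorageBlade()}
blades[0].objects["obj-a"] = b"hello "
blades[1].objects["obj-b"] = b"world"
director = DirectorBlade()
director.maps["/panfs/demo"] = [("obj-a", 0), ("obj-b", 1)]
print(IONode(director, blades).read_file("/panfs/demo", "valid-token"))
```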

Summary

The Roadrunner project has proven to be a tremendous asset to the Laboratory's nuclear weapons program simulations, as well as to other scientific endeavors such as cosmology, antibiotic drug design, HIV vaccine development, astrophysics, ocean and climate modeling, turbulence, and many others. The Panasas architecture is designed specifically to support Linux clusters, scaling performance in concert with capacity. Panasas Scale-out NAS is capable of meeting the needs of the world's leading high performance computing clusters, both now and for future generations of cluster technology.

About Panasas

Panasas, Inc., the leader in high-performance scale-out NAS storage solutions, enables enterprise customers to rapidly solve complex computing problems, speed innovation, and bring new products to market faster. All Panasas solutions leverage the patented PanFS storage operating system to deliver exceptional performance, scalability, and manageability.

Phone: 1-888-PANASAS

© 2010 Panasas Incorporated. All rights reserved. Panasas is a trademark of Panasas, Inc. in the United States and other countries. PW-10-20301