Linux Networx HPC Strategy and Roadmap

Eric Pitcher, October 2006

Agenda: Business Update; Technology Trends; Linux Networx Drivers; Hardware Roadmap; Software Highlights.

Linux Networx Overview
Founded in 1989, headquartered in Salt Lake City, with operations in the Americas, EMEA, and Asia Pacific. Privately held, with strong venture capital backing (Oak Investment Partners, Tudor Ventures, Lehman Brothers). Linux supercomputing is our only business: nearly a decade of Linux cluster experience, 200+ sites, 500+ systems deployed. LS Series Linux Supersystems.

Selected Linux Networx Large Systems

Site / Computer Name / Theoretical Peak / Best Top500 Ranking / System Model / Processors
US Army Research Laboratory / John Von Neumann / 13.926 TFlops / 13 / Evolocity II / 2048 Xeon 3.4 GHz
Los Alamos National Laboratory / Lightning / 11.26 TFlops / 6 / Evolocity / 2816 Opteron 2 GHz
Lawrence Livermore National Lab. / MCR / 11.2 TFlops / 3 / Evolocity II / 2304 Xeon 2.4 GHz
Los Alamos National Laboratory / Lightening Bolt / 16.645 TFlops / N/A / Evolocity II / 1536 Opteron 2.4 GHz + 245 Dual Core 1.8 GHz
Los Alamos National Laboratory / Pink / 10.0 TFlops / N/A / Evolocity / 2050 Xeon 2.4 GHz
NERSC / Jacquard / 3.1 TFlops / N/A / Evolocity II / 722 Opteron 2.2 GHz
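
As a rough check on the peak numbers above, theoretical peak is simply processor count times clock rate times floating-point operations per clock. A small sketch; the 2 FLOPS/clock figure is an assumption (not stated on the slide) that reproduces the quoted values:

# Theoretical peak = processors x clock (GHz) x FLOPS/clock, in TFlops.
# Assumes 2 FLOPS/clock per processor, typical for CPUs of this era.
def peak_tflops(processors, clock_ghz, flops_per_clock=2):
    return processors * clock_ghz * flops_per_clock / 1000.0

print(peak_tflops(2048, 3.4))  # ~13.93 TFlops (John Von Neumann)
print(peak_tflops(2816, 2.0))  # ~11.26 TFlops (Lightning)
print(peak_tflops(722, 2.2))   # ~3.18 TFlops  (Jacquard)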

Selected Recent Orders

Fleet Numerical Meteorology and Oceanography Center: 229 nodes and 4 accelerator nodes, mid-November.

DoD TI-06:
1. Classified system: 842 Intel Dempsey nodes
2. Non-classified system: 1024 Intel Woodcrest nodes
3. Dugway: 64-node Dempsey
4. ARL/Visualization: 64 nodes
5. TDS: 16 nodes, Woodcrest/Dempsey

NASA/Goddard:
1. Baseline system: 128 Dempsey nodes
2. Visualization system: 16 nodes
3. SU01: 256 Woodcrest nodes
4. SU02: 256 Woodcrest nodes

What Drives Us
Create the best Linux supercomputing systems, software, and services to help our customers solve important problems.

Technology Trends

Processor Highlights

CPU:
On-die memory controllers seem to be a likely direction (improved latency).
Trend towards point-to-point FSB (improved on-node scalability).
Both major architectures moving towards 4 FLOPS/clock.
Focus on power efficiency through 2012.

Multi-paradigm computing:
Increase in the use of co-processors for multi-paradigm computing.
Multi-paradigm computing will provide highly capable offload engines, though initially difficult to program.

[Chart: Mainstream CPU Trends; cores per socket (1 to 16) plotted by year, 2005 through 2012]

[Chart: Mainstream CPU Trends; GFLOPS per socket (10 to 160) and cores per socket (1 to 16) plotted by year, 2005 through 2012]

[Chart: CPU Power Trends; GFLOPS per socket (10 to 160) and MFLOPS per watt (100 to 1600) plotted by year, 2005 through 2012]
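
The per-socket and per-watt figures behind these trend charts follow from simple arithmetic: peak GFLOPS per socket is cores times clock times FLOPS per clock, and MFLOPS per watt divides that peak by socket power. A small illustrative sketch; the example core count, clock rate, and 100 W socket power are assumptions, not values from the slides:

# Peak per-socket throughput and efficiency, using the 4 FLOPS/clock figure
# cited in "Processor Highlights". Example inputs are illustrative only.
def gflops_per_socket(cores, clock_ghz, flops_per_clock=4):
    return cores * clock_ghz * flops_per_clock

def mflops_per_watt(gflops, socket_watts):
    return gflops * 1000.0 / socket_watts

quad = gflops_per_socket(cores=4, clock_ghz=2.5)            # 40 GFLOPS/socket
print(quad, mflops_per_watt(quad, socket_watts=100.0))      # 40.0, 400.0 MFLOPS/W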

CPU Summary
Multi-core CPUs will greatly exceed single-core CPUs in computing power. The cost-effectiveness of multi-core CPUs may make it attractive to apply fewer cores to an application to get more bandwidth per core. A renaissance in CPU and co-processor architectures is expected over the next 5 years: expect to see a lot of interesting ideas, and it may be hard to pick winners.

[Chart: Memory Bandwidth; bytes per FLOP (0.5 to 1.5) and per-core bandwidth in bytes per clock (2 to 16) plotted by year, 2005 through 2012, assuming multi-core processors at 4 FLOPS/clock]

Memory Summary
Bytes/FLOP stays mostly constant through 2010, so performance per node should increase significantly. Remedy for memory-intensive apps: turn off some cores, leaving more bandwidth for the remaining cores. Latency is currently worse for FB-DIMMs, but capacity is better.
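
The remedy works because a socket's memory bandwidth is shared: deactivating cores raises the bytes/FLOP available to each core that remains. A small sketch of that arithmetic; the 10.6 GB/s channel and 10 GFLOPS per-core peak are illustrative assumptions, not figures from the slides:

# Bytes/FLOP seen by each active core when socket memory bandwidth is shared.
# Example figures: a 10.6 GB/s memory channel feeding cores that each peak
# at 10 GFLOPS.
def bytes_per_flop(bandwidth_gbs, gflops_per_core, active_cores):
    return bandwidth_gbs / (gflops_per_core * active_cores)

print(bytes_per_flop(10.6, 10.0, active_cores=4))  # ~0.27 with all four cores busy
print(bytes_per_flop(10.6, 10.0, active_cores=2))  # ~0.53 with half the cores off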

Interconnect Highlights
InfiniBand will offer leading-edge performance through 2009. Bandwidth will increase rapidly, up through 100 Gbps. Latencies will likely flatten out at roughly 500-800 ns by 2011. Significant improvements in optics are coming in the near future. 10 Gb Ethernet will become more cost effective.

InfiniBand in 2007
Half-round-trip ping-pong latency < 1.5 µs. Messaging rate > 10 million messages/sec. Inexpensive optical cabling for SDR and DDR up to 100 m. More mature software stack, natively supported in SLES 10. More goodies to be announced later this year.
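
For reference, half-round-trip ping-pong latency is typically measured by bouncing a small message between two ranks and halving the average round-trip time. A minimal sketch using mpi4py (an assumption: it presumes an MPI stack plus mpi4py are installed; run with two ranks, e.g. mpirun -np 2 python pingpong.py):

# Minimal ping-pong latency microbenchmark: rank 0 sends a 1-byte message to
# rank 1 and waits for the echo; half the average round trip is the latency.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = bytearray(1)
iters = 10000

comm.Barrier()
start = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send([buf, MPI.BYTE], dest=1)
        comm.Recv([buf, MPI.BYTE], source=1)
    elif rank == 1:
        comm.Recv([buf, MPI.BYTE], source=0)
        comm.Send([buf, MPI.BYTE], dest=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    print("half round-trip latency: %.2f us" % (elapsed / iters / 2 * 1e6))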

Linux Networx Drivers

Key Engineering Themes

Use off-the-shelf components to optimize price/performance.

Systems: standard, supported systems; integrated, validated, tested software stack; innovate on top of standard hardware and software.

Environment/Pricing: TCO increasingly important (e.g., quick system deployment); power and cooling requirements becoming more important.

Focus on production supercomputing: necessitates a full-featured software stack.

LNXI Systems Strategy
[Diagram: hardware and software strategy timeline]
Pre-2006, 1st generation clusters: LNXI leadership with stable systems; supercomputing expertise and credibility; custom approach less scalable.
2005/6, standard hardware platforms: integrated systems delivery (Storage, LS Series, VIS, App-tuned, Accelerated).
2006, performance software platform: integrated/validated software stack.
2006, application-tuned Supersystems: performance-tuned for specific applications; production at power-up.
2007, cluster system management software: performance and utilization; system usability and management; RAS; operable on other vendor systems.
The Linux Networx Difference: application expertise; integrated systems (VIS, Storage, LS); management/usability; RAS; performance; utilization; software.

Hardware Roadmap

LS Series
[Roadmap chart, Q2 2006 through H2 2007: LS-1 AMD line moving from Socket 940 to Santa Rosa (dual core); LS-1 Intel line moving from Dempsey (dual core) to Woodcrest (dual core) to Clovertown (quad core, CTN2)]

[Storage positioning chart spanning Value Performance, Premium Performance, and Ultimate Performance tiers. Customer criteria range from total cost of ownership, no in-house software development, and limited Linux experience, through multiple applications/balanced systems, online capacity expansion, and production system environments, up to extreme I/O performance, savvy in-house technologists, and time to results as the top priority. The workload axis runs from small-file I/O, through diverse streaming and small-file applications, to streaming bandwidth. File system options: NFS, Linux Networx GPFS, CFS Lustre.]

Software Highlights

Key Linux Networx System Features: Clusterworx
Maximizes system performance with comprehensive system monitoring. Increases uptime with automated system management. Quickly updates the entire system with fast multicast provisioning. Implements risk-free changes to OS, applications, and kernels through version-controlled image management. Allows users to assess the state of the entire system and each node at a glance.

System Dashboard

Key Linux Networx System Features (Cont'd)
Full hardware validation: complete system integration and testing before shipping; project management.
Professional services: site planning and environmental design, application parallelization consulting, data storage and management assessment, data storage design and implementation, capacity planning.
Linux Networx training: provides users with information, tools, and practical experience to successfully manage LNXI supercomputing systems.

WRF Model on LS-1 System Scaling: 64-512 Cores
WRF 2.5 km CONUS benchmark, 1501 x 1201 domain.
[Chart: speedup relative to 64-way (0 to 9) versus number of CPUs (0 to 600), comparing Linux Networx dual-core/InfiniBand against linear scaling]
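
Speedup relative to 64-way in a plot like this is the 64-core runtime divided by the runtime at each larger core count, and dividing speedup by the core-count ratio gives parallel efficiency. A small sketch of that calculation; the runtimes below are placeholder values, not measurements from this benchmark:

# Speedup and parallel efficiency relative to a 64-core baseline.
# Runtimes (seconds) are placeholders purely to illustrate the arithmetic.
runtimes = {64: 1000.0, 128: 520.0, 256: 270.0, 512: 150.0}

base_cores = 64
base_time = runtimes[base_cores]
for cores, t in sorted(runtimes.items()):
    speedup = base_time / t
    efficiency = speedup / (cores / base_cores)
    print(f"{cores:4d} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")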