Cray events: Cray User Group (CUG) and Cray Technical Workshop Europe


1 Cray events
! Cray User Group (CUG):
  ! When: May 16-19, 2005
  ! Where: Albuquerque, New Mexico, USA
  ! Registration: reserved to CUG members
  ! Web site:
! Cray Technical Workshop Europe:
  ! When: September 20-22, 2005
  ! Where: Manno, Lugano, Switzerland
  ! Registration: free
  ! Web site:

2 XD1 Presentation Agenda
! Cray XD1
  ! Product Overview
  ! Interconnect
  ! Management
  ! FPGA-Based Application Acceleration
  ! Benchmark results
! Usage of ENEA's system
  ! Login
  ! Compilation
  ! Job submission

3 Cray XD1 Product Overview

4 The Cray XD1
! Built for price/performance
  ! Interconnect bandwidth/latency
  ! System-wide process synchronization
  ! Application Acceleration FPGAs
! Standards-based
  ! 32/64-bit x86, Linux, MPI
! High resiliency
  ! Self-configuring, self-monitoring, self-healing
! Single system command & control
  ! Intuitive, tightly integrated management software
Purpose-built and optimized for high-performance workloads

5 Cray XD1 System Architecture
Compute:
! 12 AMD Opteron 32/64-bit x86 processors
! High-performance Linux
RapidArray Interconnect:
! 12 communications processors
! 1 Tb/s switch fabric
Active Management:
! Dedicated processor
Application Acceleration:
! 6 co-processors
Processors directly connected via integrated switch fabric

6 XD1 Chassis
Chassis front:
! Six SATA hard drives
! Fans
! Six two-way Opteron blades
Chassis rear:
! Six FPGA modules
! 0.5 Tb/s switch
! Three I/O slots (e.g. JTAG)
! Four 133 MHz PCI-X slots
! 12 x 2 GB/s ports to fabric
! Connector for 2nd 0.5 Tb/s switch and 12 more 2 GB/s ports to fabric

7 Compute Blade
! Two AMD Opteron 248 processors (2.2 GHz for ENEA)
! Two banks of 4 DIMM sockets for DDR 400 registered ECC memory (2+2 GB for ENEA)
! RapidArray communications processor
! Connector to main board

8 The AMD Opteron Processor
! Dedicated memory bus
! Native 32- and 64-bit x86 compatibility
! 64 KB + 64 KB L1 caches, 1 MB L2 cache
! Up to 19.2 GB/s I/O

9 Cray Innovations
! Balanced Interconnect
! Active Management
! Application Acceleration
Performance and usability

10 Interconnect

11 Cray XD1 Interconnect System
RapidArray:
! Interconnect processors
! Switch fabric
! Communications software

12 Typical HPC Application
Compute - Communicate - Compute - Communicate - Compute ...
! HPC applications exhibit intense compute/communicate cycles
! 20%-60% of the time, CPUs sit idle, stalled by communications
! Application performance is very sensitive to latency and bandwidth
Interconnect drives system performance
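A minimal MPI sketch of this cycle (the kernel, array size, and step count are illustrative assumptions, not taken from the presentation): every rank computes locally, then blocks in a collective, which is exactly where interconnect latency and bandwidth are felt.

```c
/* Compute/communicate cycle sketch: local work followed by a blocking
 * collective each step. Kernel and sizes are made up for illustration. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000
#define STEPS 100

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static double local[N];
    double partial, global = 0.0;

    for (int step = 0; step < STEPS; step++) {
        /* Compute phase: purely local work. */
        partial = 0.0;
        for (int i = 0; i < N; i++) {
            local[i] = local[i] * 0.5 + rank;
            partial += local[i];
        }
        /* Communicate phase: every rank blocks here; the interconnect
         * determines how long the CPUs sit idle. */
        MPI_Allreduce(&partial, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    }
    if (rank == 0) printf("result %g\n", global);
    MPI_Finalize();
    return 0;
}
```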

13 Balanced Interconnect
Memory, I/O and interconnect bandwidth (gigabytes per second) per processor:
! Xeon server: 6.4 GB/s DDR memory, ... GB/s PCI-X I/O, 0.25 GB/s GigE interconnect
! Cray XD1: 6.4 GB/s DDR memory, ... GB/s RapidArray interconnect
Removing the communications bottleneck

14 HPC Communications Optimizations
Cray communications libraries:
! MPI 1.2 library
! TCP/IP
! PVM
! Shmem
! Global Arrays
! System-wide process & time synchronization
RapidArray communications processor:
! HT/RA tunnelling
! Routing with route redundancy
! Reliable transport
! Short-message latency optimization
! DMA operations
! System-wide clock synchronization
[Diagram: AMD Opteron 2XX processor linked to the RapidArray communications processor at 3.2 GB/s, with two 2 GB/s RA links into the fabric - the Direct Connected Processor architecture.]
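The Shmem library listed above exposes one-sided puts and gets rather than matched send/receive pairs. A minimal sketch, written against the modern OpenSHMEM API (Cray's 2005-era Shmem differed in detail, e.g. in initialization calls):

```c
/* One-sided communication in the Shmem style: a PE writes directly
 * into a neighbour's memory with no matching receive on the far side. */
#include <shmem.h>
#include <stdio.h>

long target[8];   /* symmetric: exists at the same address on every PE */

int main(void) {
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    long src[8];
    for (int i = 0; i < 8; i++) src[i] = me * 100 + i;

    /* Put 8 longs into the next PE's 'target' array. */
    shmem_long_put(target, src, 8, (me + 1) % npes);
    shmem_barrier_all();   /* ensure all puts have completed */

    printf("PE %d holds data from PE %d\n", me, (me + npes - 1) % npes);
    shmem_finalize();
    return 0;
}
```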

15 Interconnect Benchmarks (MPI Latency)
[Chart: MPI latency (microseconds) versus message size (bytes) for Cray XD1 (RapidArray), Quadrics (Elan 4), 4x InfiniBand, and Myrinet (D card).]
! Cray XD1 latency is 4 times lower than InfiniBand
! Cray XD1 can send 2 KB before InfiniBand sends its first byte
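Latency curves like this are typically produced with a ping-pong microbenchmark. A sketch of the technique (the message sizes and repetition count are arbitrary choices, not the benchmark actually used for the chart):

```c
/* MPI ping-pong: rank 0 and rank 1 bounce a message back and forth;
 * half the round-trip time approximates one-way latency.
 * Run with at least 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    static char buf[1 << 20];
    memset(buf, 0, sizeof buf);

    for (int size = 1; size <= (1 << 20); size *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        const int reps = 1000;
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8d bytes  one-way latency %.2f us\n",
                   size, dt / (2.0 * reps) * 1e6);
    }
    MPI_Finalize();
    return 0;
}
```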

16 Interconnect Benchmarks (MPI Throughput)
[Chart: bandwidth (MB/s) versus message size (bytes) for Cray XD1 (1/2 RapidArray fabric), Quadrics Elan 4, 4x InfiniBand, and Myrinet (D card).]
! Cray XD1 delivers 2X the throughput of InfiniBand (1 KB message size)

17 Management

18 Active Manager System
CLI and web access to the Active Management software
Usability:
! Single system command and control
Resiliency:
! Dedicated management processors, real-time OS and communications fabric
! Proactive background diagnostics with self-healing
Automated management for exceptional reliability, availability, serviceability

19 Active Manager GUI: SysAdmin Portal
The GUI provides quick access to status information and system functions

20 Automated Management
[Diagram: users & administrators managing compute partitions 1 and 2, a file services partition, and a front-end partition.]
! Partition management
! Linux configuration
! Hardware monitoring
! Software upgrades
! File system management
! Data backups
! Network configuration
! Accounting & user management
! Security
! Performance analysis
! Resource & queue management
Single system command and control

21 Active Manager Job Scheduler
Job management is integrated with self-healing features to increase job completion rates

22 Application Acceleration FPGA

23 Application Acceleration
! Reconfigurable computing
! Tightly coupled to the Opteron
! FPGA acts like a programmable co-processor
! Performs vector operations
! Well-suited for: searching, sorting, signal processing, audio/video/image manipulation, encryption, error correction, coding/decoding, packet processing, random number generation
Superlinear speedup for key algorithms

24 Application Acceleration Co-Processor
[Diagram: the AMD Opteron connects over 3.2 GB/s HyperTransport to the RAP; the application acceleration FPGA (Xilinx Virtex II Pro) attaches at 3.2 GB/s and carries its own QDR SRAM; the RAP feeds the Cray RapidArray interconnect through two 2 GB/s links.]

25 FPGA Linux API
Administration commands:
! fpga_open - allocate and open FPGA
! fpga_close - close allocated FPGA
! fpga_load - load binary into FPGA
Control commands:
! fpga_start - start FPGA (release from reset)
! fpga_stop - stop FPGA
Status commands:
! fpga_status - get status of FPGA
Data commands:
! fpga_put - put data to FPGA RAM
! fpga_get - get data from FPGA RAM
Interrupt/blocking commands:
! fpga_intwait - blocks process while waiting for FPGA interrupt
The programmer sees a get/put and message-passing programming model
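A hypothetical usage sketch of this API: only the fpga_* names come from the slide; the header name, argument types, return values, device path, and RAM offsets are invented for illustration.

```c
/* Illustrative get/put flow against the fpga_* API listed above.
 * All signatures here are assumptions, not Cray's documented ones. */
#include <stdio.h>
#include "fpga.h"   /* assumed header name */

int main(void) {
    int fd = fpga_open("/dev/fpga0");        /* allocate and open FPGA */
    if (fd < 0) { perror("fpga_open"); return 1; }

    fpga_load(fd, "design.bin");             /* load binary (bitstream) */
    fpga_start(fd);                          /* release from reset */

    unsigned data_in[256] = {0}, data_out[256];
    fpga_put(fd, 0x0000, data_in, sizeof data_in);    /* write to FPGA RAM */
    fpga_intwait(fd);                        /* block until FPGA interrupts */
    fpga_get(fd, 0x1000, data_out, sizeof data_out);  /* read results back */

    fpga_stop(fd);
    fpga_close(fd);
    return 0;
}
```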

26 Programming & Applications Environment

27 Programming Environment
! Operating system: Cray HPC enhanced Linux distribution (derived from SuSE 8.2)
! System management: Active Manager for system administration & workload management
! Application Acceleration Kit: IP cores, reference designs, command-line tools, API, JTAG interface card
! Scientific libraries: AMD Core Math Library (ACML)
! Shared memory access: Shmem, Global Arrays, OpenMP
! 3rd-party tools: Fortran 77/90/95, HPF, C/C++, Java
! Communications libraries: MPI 1.2
The Cray XD1 is standards-based for ease of programming: Linux, x86, MPI
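ACML exports the standard Fortran BLAS symbols, so a C program can reach its tuned kernels through the usual underscore interface. A small sketch (the link line and calling convention vary by compiler and ACML version, and the hidden Fortran string-length arguments are omitted as is common practice):

```c
/* Calling DGEMM via the Fortran BLAS symbol that ACML provides. */
#include <stdio.h>

extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(void) {
    enum { N = 2 };
    /* Column-major 2x2 matrices: a = [[1,3],[2,4]], b = [[5,7],[6,8]]. */
    double a[N * N] = {1, 2, 3, 4};
    double b[N * N] = {5, 6, 7, 8};
    double c[N * N] = {0};
    double alpha = 1.0, beta = 0.0;
    int n = N;

    /* c = alpha * a * b + beta * c */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    printf("c = [%g %g; %g %g]\n", c[0], c[2], c[1], c[3]);  /* 23 31; 34 46 */
    return 0;
}
```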

28 Benchmark results

29 HPCC Results
! Slide by J. Dongarra (SOS9 meeting, 03/2005)

30 PTRANS Benchmark
! PTRANS (parallel matrix transpose) implements a parallel matrix transpose for two-dimensional block-cyclic storage
! It is an important benchmark because it heavily exercises the machine's communications on a realistic problem in which pairs of processors communicate with each other simultaneously
! It is a useful test of the total communications capacity of the network; unit: GByte/s
! Several molecular dynamics codes and some climate models must transpose large arrays to perform multidimensional FFTs (CPMD, FPMD, VASP, climate spectral models)
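To see why the transpose is communication-bound, consider the simpler case of a matrix distributed by rows: transposing it forces every rank to exchange one block with every other rank, an all-to-all whose aggregate volume is the entire matrix. A simplified sketch of that exchange (this is not the PTRANS code itself, which uses two-dimensional block-cyclic storage):

```c
/* All-to-all communication pattern behind a distributed transpose.
 * Matrix sizes are arbitrary illustration values. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int rows = 256;                 /* rows owned by this rank */
    int n = rows * nprocs;          /* global matrix is n x n */
    double *a  = calloc((size_t)rows * n, sizeof *a);
    double *at = calloc((size_t)rows * n, sizeof *at);

    /* Every rank sends one rows x rows block to every other rank;
     * the aggregate traffic equals the whole matrix. (A real transpose
     * would pack each block before the exchange and transpose each
     * received block locally afterwards.) */
    MPI_Alltoall(a,  rows * rows, MPI_DOUBLE,
                 at, rows * rows, MPI_DOUBLE, MPI_COMM_WORLD);

    free(a); free(at);
    MPI_Finalize();
    return 0;
}
```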

31 G-PTRANS, EP-STREAM-TRIAD
[Chart: PTRANS and EP-STREAM-TRIAD results (GByte/s) for Cray XD1/RA (64 CPUs), Dalco Opteron/QsNetII (64), Dell Xeon/InfiniBand (64), IBM p-series/Federation (128), HP SC/QsNet (32), SGI Altix Itanium2 1.5 GHz, and IBM p655; STREAM results scaled to a single CPU.]

32 FFTE Benchmark
! FFTE measures the floating-point rate of execution of a double-precision complex one-dimensional Fast Fourier Transform
! It is an important benchmark because it exercises both the computation and the all-to-all communications required by a global FFT algorithm
! It is a useful test of the total communications capacity of the network; unit: Gflop/s
! Several molecular dynamics codes and some climate models perform multidimensional FFTs (CPMD, FPMD, VASP, climate spectral models)

33 G-FFTE, EP-DGEMM
[Chart: Global FFT (G-FFTE) and EP-DGEMM results (Gflop/s) for the same systems as the previous slide; DGEMM results scaled to a single CPU.]

34 Cray XD1 Benchmark: FPMD
[Chart: FPMD, H2Obig case, elapsed time (seconds) on 64 CPUs for a 1.8 GHz Opteron/Myrinet cluster, IBM Cluster 1350 (Intel Xeon 3.06 GHz/Myrinet), IBM 1.3 GHz SP4, Cray XD1 (2.4 GHz), and Cray X1.]
! The Cray XD1 is ... times faster than the IBM SP4, ... times faster than the IBM Cluster 1350, and ... times faster than the Opteron/Myrinet cluster

35 ECHAM5, case T63/L...
[Chart: forecast years per day versus number of processors for the Cray X1 and XD1; "5 x vector = scalar", i.e. roughly five XD1 scalar CPUs match one X1 vector CPU (1=5, 6=30, 12=60).]

36 Parallel Performance on XD1
! 5 Opteron CPUs on the XD1 >= 1 CPU of the X1
! At the low end of the scaling curve (no real surprise)
! At the high end of the scaling curve too (proprietary high-speed interconnect)
[Chart: parallel efficiency versus number of processors; series: XD1 non-radiation, radiation, full run (including I/O), and X1 reference (1/5 of the MSPs).]

37 XD1 Benchmark: GROMACS
[Chart: GROMACS (DPPC in water) speedup versus number of CPUs for the XD1 (2.2 GHz), a 1.8 GHz Opteron cluster with Myrinet, and perfect scaling.]
! The Cray XD1 delivers 63% greater speedup than the 1.8 GHz Opteron/Myrinet cluster at ... CPUs (higher is better)

38 Cray XD1 Benchmark: Amber 8 (scaling)
[Chart: Amber8 scaling, XD1 vs. Altix; speedup versus number of CPUs for XD1 jac, XD1 factor_ix, Altix jac, Altix factor_ix, and perfect scaling.]
! The Cray XD1 delivers 40% greater speedup than the Altix Itanium2 cluster (higher is better)

39 Cray XD1 Benchmarks: CHARMM
! Next slide:
  ! Itanium2 data: CHARMM version c31a2
  ! XD1: CHARMM version c31b1

40 Cray XD1 Benchmarks: CHARMM
[Chart: CHARMM, MbCO + 4985 waters (17,491 atoms), 100 steps; elapsed time (seconds) versus number of CPUs for the XD1 (2.2 GHz) and Itanium2 (1.4 GHz).]
! The XD1 is 20% faster than the 1.4 GHz Itanium2 at 16 CPUs, and is less expensive (lower is better)

41 Cray XD1 Benchmarks: LS-DYNA
[Chart: LS-DYNA mpp970, revision 5434a, 3-car collision, simulation time 150 ms; number of runs per day versus number of CPUs for the XD1 (2.2 GHz Opteron/RapidArray), HP (2.2 GHz Opteron/InfiniBand), 1.5 GHz Itanium2 rx2600, and Intel Xeon 3.06 GHz.]
! The Cray XD1 is 29% faster than an HP Opteron cluster and 9% faster than an Itanium2 cluster at ... CPUs, and 12% faster than the Itanium2 cluster at ... CPUs (higher is better)

42 Cray XD1 Benchmarks: LS-DYNA
[Chart: LS-DYNA mpp970, revision 5434a, Neon_refined, simulation time 30 ms; number of runs per day versus number of CPUs for the XD1 (2.2 GHz Opteron/RapidArray), HP (2.2 GHz Opteron/InfiniBand), and 1.5 GHz Itanium2 rx2600.]
! The Cray XD1 is 31% faster than an HP Opteron cluster and 13% faster than an Itanium2 cluster at ... CPUs, and 11% faster than the Itanium2 cluster at ... CPUs (higher is better)

43 Cray XD1 Benchmarks: STAR-HPC
[Chart: STAR-HPC 3.24, engine test case; number of runs per day versus number of CPUs for the XD1 (2.2 GHz Opteron/RapidArray), SGI Altix (1.5 GHz Itanium2), IBM p5-570 (1.9 GHz Power5), and AMD 2 GHz Opteron/Myrinet.]
! The Cray XD1 is 40% faster than the SGI Itanium2 cluster and 2.2X faster than the AMD Opteron cluster at ... CPUs, and 5% faster than the IBM Power5 at ... CPUs (higher is better)

