Transcription
1 Cray events
Cray User Group (CUG):
- When: May 16-19, 2005
- Where: Albuquerque, New Mexico, USA
- Registration: reserved to CUG members
- Web site:
Cray Technical Workshop Europe:
- When: September 20-22, 2005
- Where: Manno, Lugano, Switzerland
- Registration: free
- Web site:
2 XD1 Presentation agenda
Cray XD1:
- Product Overview
- Interconnect
- Management
- FPGA-Based Application Acceleration
Benchmark results
Usage of ENEA's system:
- Login
- Compilation
- Job submission
3 Cray XD1 Product Overview
4 The Cray XD1
Built for price/performance:
- Interconnect bandwidth/latency
- System-wide process synchronization
- Application Acceleration FPGAs
Standards-based:
- 32/64-bit x86, Linux, MPI
High resiliency:
- Self-configuring, self-monitoring, self-healing
Single system command & control:
- Intuitive, tightly integrated management software
Purpose-built and optimized for high performance workloads
5 Cray XD1 System Architecture
Compute:
- 12 AMD Opteron 32/64-bit x86 processors
- High Performance Linux
RapidArray Interconnect:
- 12 communications processors
- 1 Tb/s switch fabric
Active Management:
- Dedicated processor
Application Acceleration:
- 6 co-processors
Processors directly connected via integrated switch fabric
6 XD1 Chassis
Chassis front:
- Six SATA hard drives
- Fans
- Six two-way Opteron blades
- Six FPGA modules
Chassis rear:
- 0.5 Tb/s switch
- Three I/O slots (e.g. JTAG)
- Four 133 MHz PCI-X slots
- 12 x 2 GB/s ports to fabric
- Connector for a 2nd 0.5 Tb/s switch and 12 more 2 GB/s ports to fabric
7 Compute Blade
- Two AMD Opteron 248 processors (2.2 GHz for ENEA)
- 2 x 4 DIMM sockets for DDR 400 registered ECC memory (2+2 GB for ENEA)
- RapidArray communications processor
- Connector to main board
8 The AMD Opteron Processor
- Dedicated memory bus
- Native 32- and 64-bit x86 compatibility
- 64 KB + 64 KB L1 caches, 1 MB L2 cache
- Up to 19.2 GB/s I/O
9 Cray Innovations
Performance and usability through:
- Balanced Interconnect
- Active Management
- Application Acceleration
10 Interconnect
11 Cray XD1 Interconnect System
RapidArray:
- Interconnect processors
- Switch fabric
- Communications software
12 Typical HPC Application
Compute - Communicate - Compute - Communicate - Compute ...
- HPC applications exhibit intense compute/communicate cycles
- 20% - 60% of the time, CPUs sit idle, stalled by communications
- Application performance is very sensitive to latency and bandwidth
Interconnect drives system performance
13 Balanced Interconnect
Bandwidth balance (memory / I/O / interconnect, GB/s):
- Xeon server: 6.4 GB/s DDR memory, PCI-X I/O, 0.25 GB/s GigE interconnect
- Cray XD1: 6.4 GB/s DDR memory, RapidArray interconnect
Removing the communications bottleneck
14 HPC Communications Optimizations
Cray Communications Libraries:
- MPI 1.2 library
- TCP/IP
- PVM
- Shmem
- Global Arrays
- System-wide process & time synchronization
RapidArray Communications Processor:
- HT/RA tunnelling
- Routing with route redundancy
- Reliable transport
- Short message latency optimization
- DMA operations
- System-wide clock synchronization
Direct Connected Processor architecture: each AMD Opteron 2XX couples to a RapidArray communications processor at 3.2 GB/s, with 2 GB/s links into the fabric
15 Interconnect Benchmarks (MPI Latency)
MPI latency versus message size: Cray XD1 (RapidArray), Quadrics (Elan 4), 4x InfiniBand, Myrinet (D card)
- Cray XD1 latency is 4 times lower than InfiniBand
- Cray XD1 can send 2 KB before InfiniBand sends its first byte
16 Interconnect Benchmarks (MPI Throughput)
Bandwidth versus message size: Cray XD1 (1/2 RapidArray fabric), Quadrics Elan 4, 4x InfiniBand, Myrinet (D card)
- Cray XD1 delivers 2X the throughput of InfiniBand (1 KB message size)
17 Management
18 Active Manager System
CLI and Web access to the Active Management Software
Usability:
- Single system command and control
Resiliency:
- Dedicated management processors, real-time OS and communications fabric
- Proactive background diagnostics with self-healing
Automated management for exceptional reliability, availability, serviceability
19 Active Manager GUI: SysAdmin Portal
The GUI provides quick access to status info and system functions
20 Automated Management
Users & administrators manage compute, file services, and front-end partitions as one system:
- Partition management
- Linux configuration
- Hardware monitoring
- Software upgrades
- File system management
- Data backups
- Network configuration
- Accounting & user management
- Security
- Performance analysis
- Resource & queue management
Single system command and control
21 Active Manager Job Scheduler
Job management is integrated with self-healing features to increase job completion rates
22 Application Acceleration FPGA
23 Application Acceleration
- Reconfigurable computing
- Tightly coupled to the Opteron
- FPGA acts like a programmable coprocessor
- Performs vector operations
- Well-suited for: searching, sorting, signal processing, audio/video/image manipulation, encryption, error correction, coding/decoding, packet processing, random number generation
Superlinear speedup for key algorithms
24 Application Acceleration Co-Processor
The AMD Opteron connects over 3.2 GB/s HyperTransport links to the RapidArray processor (RAP) and to the Application Acceleration FPGA (Xilinx Virtex II Pro with QDR SRAM); 2 GB/s links lead into the Cray RapidArray interconnect
25 FPGA Linux API
Administration commands:
- fpga_open: allocate and open FPGA
- fpga_close: close allocated FPGA
- fpga_load: load binary into FPGA
Control commands:
- fpga_start: start FPGA (release from reset)
- fpga_stop: stop FPGA
Status commands:
- fpga_status: get status of FPGA
Data commands:
- fpga_put: put data to FPGA RAM
- fpga_get: get data from FPGA RAM
Interrupt/blocking commands:
- fpga_intwait: blocks process, waiting for FPGA interrupt
Programmer sees a get/put and message passing programming model
26 Programming & Applications Environment
27 Programming Environment
- Operating system: Cray HPC Enhanced Linux distribution (derived from SuSE 8.2)
- System management: Active Manager for system administration & workload management
- Application Acceleration Kit: IP cores, reference designs, command-line tools, API, JTAG interface card
- Scientific libraries: AMD Core Math Library (ACML)
- Shared memory access: Shmem, Global Arrays, OpenMP
- 3rd-party tools: Fortran 77/90/95, HPF, C/C++, Java
- Communications libraries: MPI 1.2
The Cray XD1 is standards-based (Linux, x86, MPI) for ease of programming
28 Benchmark results
29 HPCC results
Slide by J. Dongarra (SOS9 meeting, 03/2005).
30 PTRANS Benchmark
- PTRANS (parallel matrix transpose) implements a parallel matrix transpose for two-dimensional block-cyclic storage
- It is an important benchmark because it heavily exercises the communications of the computer on a realistic problem where pairs of processors communicate with each other simultaneously
- It is a useful test of the total communications capacity of the network. Unit: gigabytes per second
- Several molecular dynamics codes and some climate models must transpose large arrays to perform multidimensional FFTs (CPMD, FPMD, VASP, climate spectral models)
31 G-PTRANS, EP-STREAM-TRIAD
PTRANS and EP STREAM TRIAD results (GByte/s) for: Cray XD1 RA (64 CPUs), Dalco Opteron/QsNetII (64), Dell Xeon/InfiniBand (64), IBM p Federation (128), HP SC QsNet (32), SGI Altix Itanium2, IBM p655. STREAM results scaled to a single CPU.
32 FFTE Benchmark
- FFTE measures the floating point rate of execution of a double precision complex one-dimensional Fast Fourier Transform
- It is an important benchmark because it exercises the computation and the all-to-all communications required by a global FFT algorithm
- It is a useful test of the total communications capacity of the network. Unit: gigaflops per second
- Several molecular dynamics codes and some climate models perform multi-dimensional FFTs (CPMD, FPMD, VASP, climate spectral models)
33 G-FFTE, EP-DGEMM
G-FFTE (global FFT) and EP DGEMM results (Gflop/s) for: Cray XD1 RA (64 CPUs), Dalco Opteron/QsNetII (64), Dell Xeon/InfiniBand (64), IBM p Federation (128), HP SC QsNet (32), SGI Altix Itanium2, IBM p655. DGEMM results scaled to a single CPU.
34 Cray XD1 benchmark - FPMD
FPMD, H2Obig case, 64 CPUs: elapsed time (seconds) for a 1.8 GHz Opteron/Myrinet cluster, an IBM Cluster 1350 (Intel Xeon 3.06 GHz/Myrinet), an IBM 1.3 GHz SP4, the Cray XD1 (2.4 GHz), and the Cray X1.
The Cray XD1 is faster than the IBM Power4, the IBM Cluster 1350, and the Opteron/Myrinet cluster.
35 ECHAM5, case T63/L
Forecast years per day versus number of processors for the Cray X1 and XD1 (5 x Vector = Scalar).
36 Parallel performance on XD1
- 5 Opterons on the XD1 >= 1 CPU of the X1
- At the low end of the scaling curve (no real surprise)
- At the high end of the scaling curve too (proprietary high speed interconnect)
Parallel efficiency versus number of processors: XD1 non-radiation, radiation, and full run (including IO); X1 reference (1/5 MSPs)
37 XD1 Benchmark - GROMACS
GROMACS (DPPC in water): speedup versus CPUs for the XD1 (2.2 GHz), a 1.8 GHz Opteron/Myrinet cluster, and perfect scaling.
The Cray XD1 delivers 63% greater speedup than the 1.8 GHz Opteron/Myrinet cluster. (Higher is better.)
38 Cray XD1 Benchmark: Amber 8 (scaling)
Amber8, XD1 vs. Altix scaling: speedup versus CPUs for XD1 jac, XD1 factor_ix, Altix jac, Altix factor_ix, and perfect scaling.
The Cray XD1 delivers 40% greater speedup than the Altix Itanium2 cluster. (Higher is better.)
39 Cray XD1 Benchmarks: CHARMM
Next slide:
- Itanium2 data: CHARMM version c31a2
- XD1: CHARMM version c31b1
40 Cray XD1 Benchmarks: CHARMM
CHARMM, MbCO + 4985 waters (17491 atoms), 100 steps: elapsed time (seconds) versus CPUs for the XD1 (2.2 GHz) and Itanium2 (1.4 GHz).
The XD1 is 20% faster than the 1.4 GHz Itanium2 at 16 CPUs, and is less expensive. (Lower is better.)
41 Cray XD1 Benchmarks: LS-DYNA
LS-DYNA mpp970, revision 5434a; 3-car collision, simulation time 150 ms: number of runs per day versus CPUs for the XD1 (2.2 GHz Opteron/RapidArray), HP (2.2 GHz Opteron/InfiniBand), 1.5 GHz Itanium2 rx2600, and Intel Xeon 3.06 GHz.
The Cray XD1 is 29% faster than the HP Opteron cluster and 9% to 12% faster than the Itanium2 cluster. (Higher is better.)
42 Cray XD1 Benchmarks: LS-DYNA
LS-DYNA mpp970, revision 5434a; Neon_refined, simulation time 30 ms: number of runs per day versus CPUs for the XD1 (2.2 GHz Opteron/RapidArray), HP (2.2 GHz Opteron/InfiniBand), and 1.5 GHz Itanium2 rx2600.
The Cray XD1 is 31% faster than the HP Opteron cluster and 11% to 13% faster than the Itanium2 cluster. (Higher is better.)
43 Cray XD1 Benchmarks: STAR-HPC
STAR-HPC 3.24, engine test case: number of runs per day versus CPUs for the XD1 (2.2 GHz Opteron/RapidArray), SGI Altix (1.5 GHz Itanium2), IBM p5-570 (1.9 GHz Power 5), and AMD 2 GHz Opteron/Myrinet.
The Cray XD1 is 40% faster than the SGI Itanium2 cluster, 2.2X faster than the AMD Opteron cluster, and 5% faster than the IBM Power5. (Higher is better.)
The Cray XD1: Technical Overview. Amar Shan, Senior Product Marketing Manager. Cray Proprietary.
More information4. LS-DYNA Anwenderforum, Bamberg 2005 IT I. September 28, 2005 Computation Products Group 1. September 28, 2005 Computation Products Group 2
4. LS-DYNA Anwenderforum, Bamberg 2005 IT I High Performance Enterprise Computing Hardware Design & Performance Application Optimization Guide Performance Evaluation Lynn Lewis Director WW FAE MSS lynn.lewis@amd.com
More informationAgenda. Sun s x Sun s x86 Strategy. 2. Sun s x86 Product Portfolio. 3. Virtualization < 1 >
Agenda Sun s x86 1. Sun s x86 Strategy 2. Sun s x86 Product Portfolio 3. Virtualization < 1 > 1. SUN s x86 Strategy Customer Challenges Power and cooling constraints are very real issues Energy costs are
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationHigh Performance Computing with Accelerators
High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing
More informationThe Cray Rainier System: Integrated Scalar/Vector Computing
THE SUPERCOMPUTER COMPANY The Cray Rainier System: Integrated Scalar/Vector Computing Per Nyberg 11 th ECMWF Workshop on HPC in Meteorology Topics Current Product Overview Cray Technology Strengths Rainier
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationHPC Technology Update Challenges or Chances?
HPC Technology Update Challenges or Chances? Swiss Distributed Computing Day Thomas Schoenemeyer, Technology Integration, CSCS 1 Move in Feb-April 2012 1500m2 16 MW Lake-water cooling PUE 1.2 New Datacenter
More informationExchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers
Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers By Todd Muirhead Dell Enterprise Technology Center Dell Enterprise Technology Center dell.com/techcenter
More informationHimeno Performance Benchmark and Profiling. December 2010
Himeno Performance Benchmark and Profiling December 2010 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource
More informationLS-DYNA Performance Benchmark and Profiling. October 2017
LS-DYNA Performance Benchmark and Profiling October 2017 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: LSTC, Huawei, Mellanox Compute resource
More informationTo Infiniband or Not Infiniband, One Site s s Perspective. Steve Woods MCNC
To Infiniband or Not Infiniband, One Site s s Perspective Steve Woods MCNC 1 Agenda Infiniband background Current configuration Base Performance Application performance experience Future Conclusions 2
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationIntel Select Solutions for Professional Visualization with Advantech Servers & Appliances
Solution Brief Intel Select Solution for Professional Visualization Intel Xeon Processor Scalable Family Powered by Intel Rendering Framework Intel Select Solutions for Professional Visualization with
More informationMaximizing Memory Performance for ANSYS Simulations
Maximizing Memory Performance for ANSYS Simulations By Alex Pickard, 2018-11-19 Memory or RAM is an important aspect of configuring computers for high performance computing (HPC) simulation work. The performance
More informationCray RS Programming Environment
Cray RS Programming Environment Gail Alverson Cray Inc. Cray Proprietary Red Storm Red Storm is a supercomputer system leveraging over 10,000 AMD Opteron processors connected by an innovative high speed,
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationIBM System x3455 AMD Opteron SMP 1 U server features Xcelerated Memory Technology to meet the needs of HPC environments
IBM Europe Announcement ZG07-0492, dated July 17, 2007 IBM System x3455 AMD Opteron SMP 1 U server features Xcelerated Memory Technology to meet the needs of HPC environments Key prerequisites...2 Description...3
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationFPGA Solutions: Modular Architecture for Peak Performance
FPGA Solutions: Modular Architecture for Peak Performance Real Time & Embedded Computing Conference Houston, TX June 17, 2004 Andy Reddig President & CTO andyr@tekmicro.com Agenda Company Overview FPGA
More informationEPYC VIDEO CUG 2018 MAY 2018
AMD UPDATE CUG 2018 EPYC VIDEO CRAY AND AMD PAST SUCCESS IN HPC AMD IN TOP500 LIST 2002 TO 2011 2011 - AMD IN FASTEST MACHINES IN 11 COUNTRIES ZEN A FRESH APPROACH Designed from the Ground up for Optimal
More informationNAMD Performance Benchmark and Profiling. February 2012
NAMD Performance Benchmark and Profiling February 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource -
More information