Understanding Hardware Selection to Speedup Your CFD and FEA Simulations
- Cecilia Carr
- 5 years ago
1 Understanding Hardware Selection to Speedup Your CFD and FEA Simulations 1
2 Agenda Why Talking About Hardware HPC Terminology ANSYS Work-flow Hardware Considerations Additional resources 2
4 Most Users Constrained by Hardware 4 Source: HPC Usage survey with over 1,800 ANSYS respondents
5 Problem Statement I am not achieving the performance and throughput I was expecting from my hardware & software 5 Image courtesy of Intel Corporation
6 Building A Balanced System Is The Key To Improving Your Experience If Your System Is Slow So Are Your Engineers & Analysts Networks Storage Memory Processors 6
7 What Hardware Configuration to Select? CPUs? GPUs? Clusters? Interconnects? HDD vs. SSD SMP vs. DMP The right combination of hardware and software leads to maximum efficiency 7
8 Agenda Why Talking About Hardware HPC Terminology ANSYS Work-flow Hardware Considerations Additional resources 8
9 HPC Hardware Terminology Machine 1 (or Node 1) Machine N (or Node N) Processor 1 (or Socket 1) Processor 1 (or Socket 1) Processor 2 (or Socket 2) Processor 2 (or Socket 2) GPU GPU Interconnect (GigE or InfiniBand) 9
10 Shared Memory Parallel Machine 1 (or Node 1) Processor 1 (or Socket 1) Shared memory parallel (SMP) systems share a single global memory image that may be distributed physically across multiple cores, but is globally addressable. OpenMP is the industry standard. 10
11 Distributed Memory Parallel Machine 1 (or Node 1) Processor 1 (or Socket 1) Distributed memory parallel processing (DMP) assumes that physical memory for each process is separate from all other processes. Parallel processing on such a system requires some form of message passing software to exchange data between the cores. 11 MPI (Message Passing Interface) is the industry standard for this.
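The SMP/DMP distinction above can be sketched in a few lines. This is an illustrative sketch only (not ANSYS or MPI code): Python's multiprocessing stands in for MPI ranks, with each worker summing its own chunk in a private address space and sending its partial result back as a message; the names worker and distributed_sum are hypothetical.

```python
# Illustrative DMP sketch: separate processes (separate address spaces)
# exchange data only by message passing, in the spirit of MPI.
from multiprocessing import Process, Pipe

def worker(conn, chunk):
    # Each process computes a partial sum in its own private memory...
    conn.send(sum(chunk))
    conn.close()

def distributed_sum(data, nprocs=2):
    chunks = [data[i::nprocs] for i in range(nprocs)]
    pipes, procs = [], []
    for chunk in chunks:
        parent, child = Pipe()
        p = Process(target=worker, args=(child, chunk))
        p.start()
        pipes.append(parent)
        procs.append(p)
    total = sum(conn.recv() for conn in pipes)  # ...and a "master" gathers them
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(distributed_sum(list(range(100))))  # 4950
```

In an SMP code the workers would instead be threads reading and writing one shared array directly, which is exactly the programming-model difference the two slides describe.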
12 Agenda Why Talking About Hardware HPC Terminology ANSYS Work-flow Hardware Considerations Additional resources 12
13 Typical HPC Growth Path Desktop User Workstation and/or Server Users Cluster Users Cloud Solution 13
14 Remote Visualization Ideal for: remote users submitting jobs from a Windows machine to a Linux cluster; local users submitting jobs to a Linux cluster; users that do not have enough power (memory or graphics) on their local workstation to build large meshes or view graphics. ANSYS 16.0 supports the following remote visualization applications: NICE Desktop Cloud Visualization (DCV) 2013 (Linux server + Linux/Windows client); OpenText Exceed onDemand 8 SP2/SP3 (Linux server + Linux/Windows client); RealVNC Enterprise Edition with VirtualGL (Linux server + Linux/Windows client); on a Windows cluster, Microsoft Remote Desktop. Remote visualization servers require: GPU-capable video cards; large amounts of RAM accessible to multiple users; availability when running ANSYS applications and pre/post-processing. 14
15 Virtual Desktop (VDI) Support Key focus area at ANSYS (internal use & software QA). Focus on GPU pass-through: one GPU per VM, up to 8 VMs per machine (K1, K2 cards); memory constraints will limit this in any case. vGPU (NVIDIA GRID) will be adopted as it matures; being tested internally. Not software rendering, not shared GPU (too slow). Supported at R16.0. 15
16 ANSYS Remote Solve Manager (RSM) Desktop Server Cluster (with 3rd-party scheduler) The Remote Solve Manager (RSM) is a GUI-based job queuing system that distributes simulation tasks to (shared) computing resources. RSM as a scheduler: submits to RSM itself. RSM as a transport mechanism: submits through RSM to a high-level scheduler such as LSF, PBS Pro, Windows HPC Server 2008 R2 / 2012, and Univa Grid Engine (at R15.0). RSM enables tasks to be: run in background mode on the local machine; sent to a remote compute machine; broken into a series of jobs for parallel processing across a variety of computers. Unit of recognition: jobs (e.g. a run of a solver such as CFX, Fluent, Mechanical) and cores. 16
19 RSM Usage Scenarios Submission from a client to a centralized (shared) compute resource, allowing back-ground queuing on a centralized machine multiple users to share a common, usually large memory/fast machine (compared to client machine) Submission from a client to multiple (shared) compute resources, allowing back-ground queuing on a centralized machine that submits to other machines (compute servers) multiple users to share user workstations (often at night) using the RSM Limit Times for Job Submission feature Submission from a client to a centralized (shared) compute resource with a job scheduler, allowing back-ground queuing on a centralized machine that submits to a job scheduler (e.g. LSF) multiple users to run multi-node jobs on shared compute resources 19
20 Recent Enhancements in RSM Improved robustness and scalability. Added support for Univa Grid Engine. Added support for Mechanical/MAPDL restart. Non-root users on Linux can now use the RSM wizard. Enriched support for RSM customization. Added component override for design point update. Improved efficiency of design point updates. Example: parametric optimization of an intake manifold (initial vs. optimized). Design objectives: equal fresh and exhaust gas mass flow distribution to each cylinder; minimize the overall pressure drop. Input parameters: radii of 3 fillets near the inlet (8 design points). ~5.0x speed-up over sequential execution. 20
21 Guidelines Know your hardware lifecycle. Have a goal in mind for what you want to achieve. Use licensing productively. Use ANSYS-provided processes effectively. 21
22 Agenda Why Talking About Hardware HPC Terminology ANSYS Work-flow Hardware Considerations Additional resources 22
23 What Hardware Configuration to Select? CPUs? GPU/Phi? HDD vs. SSD SMP vs. DMP Clusters? Interconnects? 23
24 Understanding the effect of clock speed Generally, ANSYS applications scale with clock frequency. Cost/performance argues for a high clock (but maybe not the top bin). Using a higher clock speed is always helpful to realize productivity gains. ANSYS DMP benchmarks (8 cores): the clock effect is highest for the sparse solver. 24
25 Understanding the effect of memory bandwidth - Is 24 Cores Equal to 24 Cores? 3 x (2 x 4) = 24 cores (Xeon X5570) vs. 2 x (2 x 6) = 24 cores (Xeon X5670). Consider memory per core! 26
27 Understanding the effect of memory bandwidth - Is 16 Cores Equal to 16 Cores? 2 x (2 x 4) = 16 cores (Xeon X5570) vs. 2 x (2 x 4) = 16 cores (Xeon X5670). Using fewer cores per node can be helpful to realize productivity gains. 27
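The "memory per core" point behind these slides is simple arithmetic, sketched here under illustrative assumptions (triple-channel memory at 1333 MT/s, typical of the Xeon 5500/5600-era parts mentioned; bandwidth_per_core is a hypothetical helper, not a vendor formula):

```python
# Back-of-envelope estimate (illustrative assumptions, not vendor specs):
# per-socket memory bandwidth = channels * transfer rate * 8 bytes/transfer.
def bandwidth_per_core(channels, mt_per_s, cores_per_socket):
    """Rough GB/s of memory bandwidth available to each core."""
    socket_gb_s = channels * mt_per_s * 8 / 1e3  # MT/s * 8 B -> MB/s -> GB/s
    return socket_gb_s / cores_per_socket

# On the same memory system, a 4-core socket leaves each core more
# bandwidth headroom than a 6-core socket:
print(round(bandwidth_per_core(3, 1333, 4), 1))  # 8.0 (GB/s per core)
print(round(bandwidth_per_core(3, 1333, 6), 1))  # 5.3 (GB/s per core)
```

This is why "24 cores" on fewer, denser sockets can be slower than "24 cores" spread over more sockets, and why leaving cores idle on a node can pay off for bandwidth-bound solvers.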
28 Understanding the effect of memory bandwidth - ANSYS Mechanical Consider memory per core! 28
29 Understanding the effect of memory speed We can see here the effect of memory speed, which has implications for how you build your hardware. Some processor types have slower memory speeds by default. On other processors, non-optimally filling the memory channels can slow the memory speed. This has an effect on memory bandwidth. Using a higher memory speed can be helpful to realize productivity gains. 29
30 Turbo Boost (Intel) / Turbo Core (AMD) - ANSYS CFD Turbo Boost (Intel) / Turbo Core (AMD) is a form of over-clocking that gives more GHz to individual cores when others are idle. With Intel processors we have seen variable performance with this, ranging between 0-8% improvement depending on the number of cores in use. The graph below is for CFX on an Intel X5550, which sees a maximum of only 2.5% improvement. 30
31 Turbo Boost (Intel) / Turbo Core (AMD) - ANSYS Mechanical Relative to 1 core, we see good performance gains in many cases by using Turbo Boost on the E5 processor family. Using Turbo Boost / Turbo Core can be helpful to realize productivity gains, particularly at lower core counts. 31
32 Hyper-threading Evaluation of Hyper-Threading on ANSYS Fluent performance: iDataPlex M3 (Intel Xeon X5670, 2.93 GHz), Turbo ON; measurement is improvement relative to Hyper-Threading OFF. HT OFF: 12 threads on 12 physical cores. HT ON: 24 threads on 12 physical cores. Models: eddy_417k, turbo_500k, aircraft_2m, sedan_4m, truck_14m. Hyper-threading is NOT recommended. 32
33 Generation to Generation - ANSYS Mechanical Optimized for Intel Xeon E5 v3 processors: ANSYS Mechanical 16.0 performs well on the latest Intel processor architecture. A Haswell processor-based system is 20% to 40% faster than a Sandy Bridge processor-based system for a variety of benchmarks. 34
34 ANSYS Fluent on Intel Ivy Bridge Ivy Bridge vs. Sandy Bridge, single node. Ivy Bridge is the "tick" release of Sandy Bridge: similar micro-architecture, more cores, reduced power. Expect similar core-to-core performance on Ivy Bridge and Sandy Bridge, with improved node-to-node performance. Single-node performance of ANSYS Fluent 14.5 over six benchmark cases (turbo_500k, eddy_417k, aircraft_2m, sedan_4m, truck_14m, truck_poly_14m), 2x8-core Sandy Bridge vs. 2x12-core Ivy Bridge: a 50% performance boost matches the core count increase. Scaling is maintained at the higher core density, achieved via efficient memory use (and higher RAM speed). ANSYS, Inc. June 18, 2015
35 ANSYS Fluent Ivy Bridge vs. Sandy Bridge Scaling Multi-node performance of ANSYS Fluent 14.5, up to 192 cores. Nearly identical core-to-core scaling confirms system balance for Fluent. Chart: truck_14m Fluent solver rating vs. number of cores, Sandy Bridge vs. Ivy Bridge.
36 Per Node vs. Per Core Comparisons This is a 4-socket vs. 2-socket node comparison: Xeon E v GHz (4 socket) vs. Xeon E v GHz (2 socket). From the per-node comparison you'd assume it was better to go with the 4-socket system; per core, however, the 2-socket system is the better choice. Neither shows linear scalability, as they are running on all the cores per node (bandwidth constrained). 37
37 Generation to Generation - ANSYS Fluent ANSYS Application Example Case Details: Flow through a Combustor Number of cells: 12 Million Cell Type: Polyhedra Models used: Realizable K-ε turbulence Pressure based coupled, species transport, Least Square cell based, pseudo transient 38
38 Generation to Generation - ANSYS Fluent ANSYS Application Example Case Details: External flow over a passenger sedan Number of cells: 4 Million Cell Type: Mixed Models used: Standard K-ε turbulence Solver: Pressure based coupled, steady, Green-Gauss cell based 39
39 Recap Faster cores mean a faster solution. Faster memory means a faster solution. Memory bandwidth is an important factor for (linear) scalability. Turbo Boost / Turbo Core modes do give some benefit, especially at low core counts per node. In general, hyper-threading should not be used because of licensing implications. Be careful when looking at comparisons! Make sure you are comparing like with like! 40
40 What Hardware Configuration to Select? CPUs? GPU/Phi? HDD vs. SSD SMP vs. DMP Clusters? Interconnects? 41
41 Understanding the effect of the interconnect Need fast interconnects to feed fast processors Two main characteristics for each interconnect: latency and bandwidth Distributed ANSYS is highly bandwidth bound D I S T R I B U T E D A N S Y S S T A T I S T I C S Release: 14.5 Build: UP Platform: LINUX x64 Date Run: 08/09/2012 Time: 23:07 Processor Model: Intel(R) Xeon(R) CPU E GHz Total number of cores available : 32 Number of physical cores available : 32 Number of cores requested : 4 (Distributed Memory Parallel) MPI Type: INTELMPI Core Machine Name Working Directory hpclnxsmc00 /data1/ansyswork 1 hpclnxsmc00 /data1/ansyswork 2 hpclnxsmc01 /data1/ansyswork 3 hpclnxsmc01 /data1/ansyswork Latency time from master to core 1 = microseconds Latency time from master to core 2 = microseconds Latency time from master to core 3 = microseconds Communication speed from master to core 1 = MB/sec Same machine Communication speed from master to core 2 = MB/sec QDR Infiniband Communication speed from master to core 3 = MB/sec QDR Infiniband 42
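Latency and bandwidth of the kind reported in the statistics above can be estimated with a ping-pong test. A minimal sketch, assuming a loopback socketpair as a stand-in for a real interconnect (proper MPI-level tools such as osu_latency or the Intel MPI Benchmarks measure this across nodes); pingpong_latency_us is a hypothetical helper:

```python
# Illustrative latency micro-benchmark: bounce a 1-byte message back and
# forth and take half the average round-trip time as the one-way latency.
import socket
import time

def pingpong_latency_us(iters=1000):
    """Estimate one-way message latency over a loopback socketpair."""
    a, b = socket.socketpair()
    t0 = time.perf_counter()
    for _ in range(iters):
        a.sendall(b"x")   # 1-byte "message" out...
        b.recv(1)
        b.sendall(b"x")   # ...and echoed straight back
        a.recv(1)
    elapsed = time.perf_counter() - t0
    a.close()
    b.close()
    return elapsed / iters / 2 * 1e6  # half the round trip, in microseconds

if __name__ == "__main__":
    print(f"loopback latency ~ {pingpong_latency_us():.1f} us")
```

The same ping-pong idea with large messages instead of 1-byte ones yields the bandwidth figure; Distributed ANSYS, as the slide notes, is mostly sensitive to the latter.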
42 Understanding the effect of the interconnect - ANSYS Fluent ANSYS Fluent performance: iDataPlex M3 (Intel Xeon X5670, 12C 2.93 GHz). Networks: Gigabit, 10-Gigabit, 4X QDR InfiniBand (QLogic, Voltaire). Hyper-threading OFF, Turbo ON. Model: truck_14m. Chart: Fluent rating (higher is better) vs. number of cores used by a single job, by interconnect (QLogic, Voltaire, 10-Gigabit, Gigabit). 43
43 Understanding the effect of the interconnect - ANSYS Fluent Exhaust model ( M cells): transient simulation with explicit time stepping for an engine startup cycle. Fujitsu PRIMERGY CX250 HPC systems (E5-2690v2 with 20 and E5-2697v2 with 24 cores per node, respectively). For CFD we can see the performance of InfiniBand vs. GigE: GigE starts to drop off after 2 nodes. 44
44 Understanding the effect of the interconnect - ANSYS Fluent For CFD, 10GigE starts to taper off after 8 nodes. 45
45 Understanding the effect of the interconnect - ANSYS Mechanical V13sp-5 model: turbine geometry, 2,100K DOF, SOLID187 FEs, static, nonlinear, one iteration, direct sparse solver. Linux cluster (8 cores per node). Chart: rating (runs/day) for Gigabit Ethernet vs. DDR InfiniBand at 8, 16, 32, 64, and 128 cores. 46
46 Understanding the effect of the interconnect - ANSYS Mechanical For ANSYS Mechanical, GigE does not scale to more than 1 node! 47
47 Understanding the effect of the interconnect - ANSYS Mechanical GigE (Gigabit Ethernet): 1 Gbit/s (~100 MB/s), not recommended!! 10GigE: 10 Gbit/s (~1000 MB/s), bare minimum!! Myrinet (Myricom, Inc.): 2 Gbit/s (~250 MB/s); Myri-10G: 10 Gbit/s (4th-generation Myrinet). InfiniBand (many vendors/speeds): SDR/DDR/QDR, 1x, 4x, 12x. RECOMMENDATION: over 1000 MB/s, especially when running on more than 4 nodes. 48
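The Gbit/s-to-MB/s figures quoted above are easy to sanity-check: the raw conversion is divide-by-8, and the slide's effective ~100 and ~1000 MB/s figures reflect protocol overhead on top of that. gbit_to_mb is a hypothetical helper for illustration:

```python
# Quick unit sanity check: raw link rate in MB/s from Gbit/s.
# Sustained application-level rates land below this due to protocol overhead.
def gbit_to_mb(gbit_per_s):
    return gbit_per_s * 1000 / 8  # 8 bits per byte

print(gbit_to_mb(1))    # 125.0  (GigE: ~100 MB/s effective)
print(gbit_to_mb(10))   # 1250.0 (10GigE: ~1000 MB/s effective)
```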
48 Recap 10GigE and InfiniBand are recommended for HPC clusters. Currently, only InfiniBand is recommended for large clusters: QDR should be more than adequate for small to medium clusters, FDR for large clusters. For more than 1 node you will see a performance decrease using GigE. Mechanical users should not use GigE at all if their jobs span more than one node. 49
49 What Hardware Configuration to Select? CPUs? GPU/Phi? HDD vs. SSD SMP vs. DMP Clusters? Interconnects? 50
50 Parallel file systems NFS: server and/or master node causes an I/O bottleneck. Master node causes an I/O bottleneck. Parallel file system: I/O scales with the cluster. 51
51 Parallel file systems - ANSYS Mechanical The example shown here uses GPFS for Mechanical. Notice how it is very similar in speed to a local RAID 0 configuration (4 x 15k SAS). 52
52 Understanding the effect of I/O - ANSYS Fluent Parallel I/O is based on MPI-IO, implemented for data file read and write. A single file is written collectively by the nodes. Suited for parallel file systems; does not work on NFS. Support for Panasas, PVFS2, HP/SFS, IBM/GPFS, EMC/MPFS2, Lustre. Files cannot be written directly compressed, but can be compressed asynchronously. 53
53 Understanding the effect of I/O - ANSYS Fluent Truck-111m (111-million-cell model using DES with the segregated implicit solver), 176 cores. Chart: data file write throughput (MB/s) for Legacy NAS, Serial I/O, Parallel I/O, and Parallel I/O (RAID-10, CW). Parallel I/O = 7x (vs. Legacy NAS); Parallel I/O = 4x (vs. Serial I/O). Panasas layout available with MPI-IO hints in Fluent. 54
54 Understanding the effect of I/O - ANSYS Fluent Landing Gear Noise Predictions using Scale-Resolving Simulations (180M cell model using pressure based segregated solver) 55
55 Understanding the effect of I/O - ANSYS Fluent Asynchronous I/O for Linux Fluent: total write time 3-5x quicker over NFS; even larger speed-ups on bigger cases and local disk (up to 10x).
Mesh | File | Location | Async I/O | Time
15M | Cas | NFS | OFF | 217s
15M | Cas | NFS | ON | 62s
15M | Dat | NFS | OFF | 113s
15M | Dat | NFS | ON | 8s
30M | Cas | NFS | OFF | 207s
30M | Cas | NFS | ON | 75s
30M | Dat | NFS | OFF | 144s
30M | Dat | NFS | ON | 10s
56
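The quoted speed-ups follow directly from the times in the table; a quick check (times copied from the table above):

```python
# Async I/O write times over NFS, in seconds: (async_off, async_on).
writes = {
    "15M cas": (217, 62), "15M dat": (113, 8),
    "30M cas": (207, 75), "30M dat": (144, 10),
}
for name, (off, on) in writes.items():
    print(f"{name}: {off/on:.1f}x")          # per-file speedup
total_off = sum(off for off, _ in writes.values())
total_on = sum(on for _, on in writes.values())
print(f"total: {total_off/total_on:.1f}x")   # lands in the quoted 3-5x range
```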
56 Understanding the effect of I/O - ANSYS Mechanical SP-5 (in-core) R14.5 benchmark results. Chart: rating (jobs/day) vs. #machines x #cores (1x1, 1x2, 1x4, 1x8, 1x16; memory used: 29 GB, 33 GB, 35.6 GB, 40.8 GB, 47.8 GB) for four disk configurations: 4x SSD RAID-0 SATA 3Gb/s, 2x SSD RAID-0 SATA 3Gb/s, SSD SATA 6Gb/s, and HD (7.2k RPM) SATA 6Gb/s. 57
59 Recap I/O is very important for the Mechanical solver: RAID 0 is mandatory for multiple disks; SSDs are recommended for speed, or 15k SAS drives. Fluent and CFX for most customers won't require fast local disk access (for most types of job). Parallel file systems can meet the requirements of both types of solver. 60
60 Is Your Hardware Ready for HPC? - ANSYS Mechanical Chart: I/O [MB/s] vs. RAM [GB], mapping disk configurations (1x SAS, 2x SAS, 2x SSD, 4x SSD) to the model sizes they can support (2 Mdof, 4 Mdof, > 6 Mdof). 61
61 What Hardware Configuration to Select? CPUs? GPU/Phi? HDD vs. SSD SMP vs. DMP Clusters? Interconnects? 62
62 DMP Outperforming SMP 6 Mio Degrees of Freedom Plasticity, Contact Bolt pretension 4 load steps 63
63 DMP: Good Performance at High Core Counts Number of Cores 10.7 Mio Degrees of Freedom Static, linear, structural 1 load step Number of Cores 1 Mio Degrees of Freedom Harmonic, linear, structural 4 frequencies Intel Xeon E processors (2.9 GHz, 16 cores total) 128 GB of RAM 64
64 ANSYS Mechanical 14.5 DMP Enabling Scalability at High Core Counts Minimum time to solution more important than scaling V14sp-5 Model Solution Scalability Turbine geometry 2.1 million DOF Static, nonlinear analysis 1 loadstep, 7 substeps, 25 equilibrium iterations 8-node Linux cluster (with 8 cores per node) Speedup
65 ANSYS Mechanical 15.0 Faster Performance at Higher Core Counts by an enhanced domain decomposition method. Improved scaling at 8 cores: 8-node Linux cluster (with 8 cores and 48 GB of RAM per node, InfiniBand DDR). Speedups over R14.5 of 1.7x, 2.7x, and 2.4x across the Engine (9 MDOF), Stent (520 KDOF), Clutch (160 KDOF), and Bracket (45 KDOF) benchmarks. 66
66 ANSYS Mechanical 15.0 Faster Performance at Higher Core Counts by an enhanced domain decomposition method. Improved scaling at 16 cores: 8-node Linux cluster (with 8 cores and 48 GB of RAM per node, InfiniBand DDR). Speedups over R14.5 of 1.8x, 3.8x, and 4.0x across the Engine (9 MDOF), Stent (520 KDOF), Clutch (160 KDOF), and Bracket (45 KDOF) benchmarks. 67
67 ANSYS Mechanical 15.0 Faster Performance at Higher Core Counts by an enhanced domain decomposition method. Improved scaling at 32 cores: 8-node Linux cluster (with 8 cores and 48 GB of RAM per node, InfiniBand DDR). Speedups over R14.5 of 2.2x, 3.9x, and 5.0x across the Engine (9 MDOF), Stent (520 KDOF), Clutch (160 KDOF), and Bracket (45 KDOF) benchmarks. 68
68 ANSYS Mechanical 16.0 Faster Performance at Higher Core Counts Continually improving Core Solver Rating to 128 cores Courtesy of HP 70
69 ANSYS Mechanical 15.0 HPC & Solver Technology Improvements Coupled acoustic, 1.2 M DOF, full harmonic response: improved scalability of the distributed solver at higher core counts. NEW subspace eigensolver supports shared- and distributed-parallel technology (2.09 MDOF, first 20 modes). NEW MSUP harmonic method for unsymmetric systems, e.g. vibro-acoustics. 71
70 What Hardware Configuration to Select? CPUs? GPU/Phi? HDD vs. SSD SMP vs. DMP Clusters? Interconnects? 72
71 Some Basics ANSYS Software on NVIDIA GPUs GPUs are accelerators and can significantly speed up your simulations GPUs work hand in hand with CPUs Most ANSYS GPU acceleration is user-transparent Only requirement is to inform ANSYS of how many GPUs to use Schematic of a CPU with an attached GPU accelerator CPU begins/ends job, GPU manages heavy computations 73
72 GPU Accelerator Capability - ANSYS Fluent GPU-based Model: Radiation Heat Transfer using OptiX GPU-based Solver: Coupled Algebraic Multigrid (AMG) PBNS linear solver Operating Systems: Both Linux and Win64 for workstations and servers Parallel Methods: Shared and distributed memory Supported GPUs: Tesla K40, Tesla K80, and Quadro 6000 Multi-GPU Support: Full multi-gpu and multi-node support Model Suitability: Unlimited (hardware dependent) 74
73 ANSYS Fluent on GPU Performance of Pressure-Based Solver Sedan geometry: 3.6M mixed cells, steady, turbulent, external aerodynamics, coupled PBNS, DP. CPU: Intel Xeon E5-2680 (8 cores); GPU: 2x Tesla K40. Results (jobs/day, higher is better): CPU-only segregated solver, 12; CPU-only coupled solver, 15; CPU + GPU coupled solver, 27 (1.9x). Convergence criteria: 10e-03 for all variables. Iterations until convergence: segregated CPU, 2798 iterations (7070 s); coupled CPU, 967 iterations (5900 s); coupled CPU + GPU, 985 iterations (3150 s). NOTE: times are for the total solution until convergence. 75
74 ANSYS Fluent on GPU Performance of Pressure-Based Solver Truck model: external aerodynamics, 14 million cells, steady, k-ε turbulence, coupled PBNS, DP. 2 nodes, each with dual Intel Xeon E V3 (16 CPU cores) and dual Tesla K80 GPUs. Results (jobs/day, higher is better, with an HPC Workgroup 64 license): 64 CPU cores, 11 jobs/day; 56 CPU cores + 4 Tesla K80s, 33 jobs/day. Adding GPUs cost 40% of the CPU-only solution cost but delivered 200% additional simulation productivity over the CPU-only system (100%). 76
75 ANSYS Fluent on GPU Better Speedup on Larger Models Truck model: external aerodynamics, steady, k-ε turbulence, double-precision solver. CPU: Intel Xeon E5-2667 (12 cores per node); GPU: Tesla K40 (4 per node). Configurations: 36 CPU cores vs. 36 CPU cores + 12 GPUs on 14 million cells, and 144 CPU cores vs. 144 CPU cores + 48 GPUs on 111 million cells. Chart: ANSYS Fluent time in seconds (lower is better). NOTE: reported times are per iteration. 77
76 NVIDIA-GPU Solution Fit for ANSYS Fluent CFD analysis decision flow: Is it single-phase and flow dominant? No: not ideal for GPUs. Yes: are you using the pressure-based coupled solver? Yes: best fit for GPUs. No (segregated solver): if it is a steady-state analysis, consider switching to the pressure-based coupled solver for better performance (faster convergence) and further speedups with GPUs. Please see the next slide. 78
77 NVIDIA-GPU Solution Fit for ANSYS Fluent - Supported Hardware Configurations Requirements: homogeneous process distribution across nodes, homogeneous GPU selection, and the number of processes must be an exact multiple of the number of GPUs. Invalid examples: some nodes with 16 processes and some with 12 processes; some nodes with 2 GPUs and some with 1 GPU; 15 processes, not divisible by 2 GPUs. 79
78 ANSYS Fluent - Power Consumption Study Adding GPUs to a CPU-only node resulted in 2.1x speed up while reducing energy consumption by 38% 80
79 NVIDIA-GPU Solution Fit for ANSYS Fluent GPUs accelerate the AMG solver portion of the CFD analysis, and thus benefit problems with a relatively high %AMG. Coupled solvers have a high %AMG, in the range of 60-70%. Fine meshes and low-dissipation problems have a high %AMG. In some cases, pressure-based coupled solvers offer faster convergence compared to segregated solvers (problem-dependent). The whole problem must fit on the GPUs for the calculations to proceed: in the pressure-based coupled solver, each million cells needs approx. 4 GB of GPU memory, so high-memory cards such as Tesla K40 or Quadro K6000 are ideal. Moving scalar equations such as turbulence to the GPU may not benefit much because of low workloads (using the "scalar yes" option in "amg-options"). Better performance at lower CPU core counts: a ratio of 3 or 4 CPU cores to 1 GPU is recommended. 81
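The %AMG point is Amdahl's law: if only the AMG portion runs on the GPU, the fraction it represents bounds the overall gain. A sketch, where the 3x AMG-only speedup is an illustrative assumption, not a measured figure:

```python
# Amdahl-style estimate: only the AMG fraction of an iteration is
# accelerated; the rest stays on the CPU at its original speed.
def overall_speedup(amg_fraction, amg_speedup):
    return 1 / ((1 - amg_fraction) + amg_fraction / amg_speedup)

# Coupled solver, ~60% AMG, assuming the GPU runs AMG 3x faster:
print(round(overall_speedup(0.6, 3.0), 2))   # 1.67
# A segregated solver with low %AMG barely benefits:
print(round(overall_speedup(0.3, 3.0), 2))   # 1.25
```

This is why the slide steers coupled-solver, fine-mesh cases toward GPUs and expects little from low-%AMG segregated runs.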
80 GPU Accelerator Capability - ANSYS Mechanical Supports majority of ANSYS structural mechanics solvers: Covers both sparse direct and PCG iterative solvers Only a few minor limitations Ease of use: Requires at least one supported GPU card to be installed No rebuild, no additional installation steps Performance: Offer significantly faster time to solution Should never slow down your simulation V14sp-5 Model 82
81 Influence of GPU Accelerator on Speedup ANSYS Mechanical Model Impeller: impeller geometry of ~2M DOF, solid FEs; normal modes analysis using cyclic symmetry; ANSYS Mechanical SMP and Block-Lanczos solver; 4 cores + GPU = 2.4x speedup vs. 4 cores. ANSYS Mechanical Model Speaker: speaker geometry of ~0.7M DOF, solid FEs; vibro-acoustic harmonic analysis for one frequency; ANSYS Mechanical distributed sparse solver; 4 cores + GPU = 2.7x speedup vs. 4 cores. 83
82 NVIDIA-GPU Solution Fit for ANSYS Mechanical GPUs accelerate the solver part of the analysis; consequently, problems with high solver workloads benefit the most from GPUs, characterized by both high DOF and high factorization requirements. Models with solid elements (such as castings) that have >500K DOF experience good speedups. Better performance when run in DMP mode rather than SMP mode. GPU and system memories both play important roles in performance. Sparse solver: bulkier and/or higher-order FE models are good and will be accelerated; if the model exceeds 5M DOF, either add another GPU with 5-6 GB of memory (Tesla K20 or K20X) or use a single GPU with 12 GB of memory (Tesla K40 or Quadro K6000). PCG/JCG solver: the memory-saving (MSAVE) option should be turned off to enable GPUs; models with a lower Level of Difficulty value (Lev_Diff) are better suited for GPUs. 84
83 GPU Achievements ANSYS Mechanical 16.0 Supporting Newest GPUs V15sp-4 model (turbine geometry, 3.2 million DOF, SOLID187 elements, static nonlinear analysis, sparse direct solver): 159 jobs/day on 8 CPU cores vs. 371 jobs/day on 6 CPU cores + K80 GPU (2.3x, higher is better). V15sp-5 model (ball grid array geometry, 6.0 million DOF, static nonlinear analysis, sparse direct solver): 135 jobs/day on 8 CPU cores vs. 247 jobs/day on 6 CPU cores + K80 GPU (1.8x). Distributed ANSYS Mechanical 16.0 with Intel Xeon E5-2697v2 2.7 GHz 8-core CPU; Tesla K80 GPU with boost clocks. 87
84 GPU Achievements ANSYS Mechanical 15.0 Supporting Newest GPUs GPUs can offer significantly faster time to solution Higher core counts favor multiple GPUs Lower core counts favor a single GPU Courtesy of HP 89
85 GPU Achievements ANSYS Mechanical 16.0 Supporting Xeon Phi Background: ANSYS Mechanical 15.0 was the first commercial FEA program to support the Intel Xeon Phi coprocessor, but it was limited to shared memory parallelism (SMP) on Linux only. R16 now supports distributed memory parallelism (DMP) and Windows. Chart: speedup with and without Xeon Phi at 1, 2, 4, 8, and 16 cores.
86 GPU Achievements ANSYS License Scheme for GPU and Phi Licensing examples: 1x ANSYS HPC Pack = total 8 HPC tasks (4 GPU/Phi max); example valid configurations: 6 CPU cores + 2 GPU/Phi, or 4 CPU cores + 4 GPU/Phi. 2x ANSYS HPC Pack = total 32 HPC tasks (16 GPU/Phi max); e.g. 24 CPU cores + 8 GPU/Phi (total use of 2 compute nodes). (Applies to all schemes: ANSYS HPC, ANSYS HPC Pack, ANSYS HPC Workgroup.) 93
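The pack figures quoted above (1 pack = 8 tasks, 2 packs = 32) follow a 2 x 4^n pattern. Sketching that here; extrapolating beyond two packs is an assumption of this sketch, so consult current ANSYS licensing documentation for authoritative numbers:

```python
# HPC Pack sizing pattern implied by the quoted examples:
# tasks = 2 * 4**packs (1 pack -> 8, 2 packs -> 32).
# Values beyond two packs are an extrapolation, not quoted in the deck.
def hpc_pack_tasks(packs):
    return 2 * 4 ** packs

print([hpc_pack_tasks(n) for n in range(1, 4)])  # [8, 32, 128]
```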
87 Maximizing Performance Putting it Together HDD vs. SSD SMP vs. DMP The right combination of hardware and software leads to maximum efficiency 95 CPUs? Clusters? GPU/Phi? Interconnects?
88 Maximizing Performance ANSYS Mechanical #1 rule: avoid waiting for I/O to complete. Always check if a job is I/O bound or compute bound: check the output file for CPU and elapsed times ("Total CPU time for main thread" and "Elapsed Time (sec)"). When elapsed time >> main thread CPU time, the job is I/O bound: consider adding more RAM or a faster hard drive configuration. When elapsed time is close to main thread CPU time, the job is compute bound: consider moving the simulation to a machine with newer, faster processors; consider using Distributed ANSYS (DMP) instead of SMP; consider running on more CPU cores or possibly using GPU(s). 96
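The elapsed-vs-CPU-time rule can be sketched as a simple check. The 1.5x threshold is illustrative (the slide only says elapsed time >> CPU time), and classify_run is a hypothetical helper, not part of any ANSYS tool:

```python
# Rule-of-thumb classifier: compare solver CPU time with wall-clock time,
# as read from the "Total CPU time" and "Elapsed Time" lines of the output.
def classify_run(cpu_time_s, elapsed_s, io_bound_ratio=1.5):
    """Flag a run as I/O bound when elapsed time far exceeds CPU time."""
    if elapsed_s > io_bound_ratio * cpu_time_s:
        return "I/O bound: add RAM or a faster disk configuration"
    return "compute bound: faster CPUs, DMP, more cores, or GPU"

print(classify_run(cpu_time_s=1000, elapsed_s=4000))  # I/O bound
print(classify_run(cpu_time_s=1000, elapsed_s=1100))  # compute bound
```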
89 Maximizing Performance ANSYS Mechanical How to improve an I/O bound simulation First consider adding more RAM Always the best option for optimal performance Allows the operating system to cache file data in memory Next consider improving the I/O configuration Need fast hard drives to feed fast processors Consider SSDs Higher bandwidths and extremely low seek times Consider RAID configurations RAID 0 for speed RAID 1,5 for redundancy RAID 10 for speed and redundancy 97
90 Maximizing Performance ANSYS Mechanical Example of an I/O bound simulation: 2.1 million DOF, nonlinear static analysis, direct sparse solver (DSPARSE), 2 Intel Xeon E (2.6 GHz, 16 cores total), one 10k RPM HDD, one SSD, Windows 7. Chart: relative speedup (benefits of SSD and RAM) for 2 cores + HDD, 8 cores + HDD, and 8 cores + SSD, with 16 GB vs. 128 GB RAM (from 0.8x up to 5.9x). Adding RAM gives the biggest gains and allows good scaling. A single SSD helps allow some scaling; not as helpful as RAM, but cheaper. Lack of RAM and a slow HDD ruin scaling. 98
91 Maximizing Performance ANSYS Mechanical How to improve a compute bound simulation First consider using newer, faster processors New CPU architecture and faster clock speeds always help Next consider using parallel processing DMP virtually always recommended over SMP More computations performed in parallel with DMP Significantly faster speedups achieved using DMP DMP can take advantage of all resources on a cluster Whole new class of problems can be solved!! Last consider using GPU acceleration Can help accelerate critical, time-consuming computations 99
92 Maximizing Performance ANSYS Mechanical Example of a compute bound simulation: 2.1 million DOF, nonlinear static analysis, direct sparse solver (DSPARSE), 2 Intel Xeon E (2.6 GHz, 16 cores total), 128 GB RAM, 1 Tesla K20c, Windows 7. Chart: relative speedup (benefits of DMP and GPU), Xeon X5675 vs. Xeon E5, at 2 cores, 8 cores, and 8 cores + GPU (up to 11.0x). Using newer Xeons gives a big gain; using 8 cores gives faster performance; maximum performance is found by adding the GPU. 100
93 Maximizing Performance ANSYS Mechanical Balanced System for Overall Optimum Performance 2.1 million DOF, nonlinear static analysis, direct sparse solver (DSPARSE), 2 Intel Xeon E (2.6 GHz, 16 cores total), 16 GB RAM, SSD and SATA disks, 1 Tesla K20c, Windows 7. Chart: relative speedup (I/O bound) for 2 cores, 8 cores, 8 cores + GPU, and 8 cores + GPU + SSD (2.7x, 5.2x, up to 12.5x). 101
94 Maximizing Performance ANSYS Mechanical Balanced System for Overall Optimum Performance 2.1 million DOF, nonlinear static analysis, direct sparse solver (DSPARSE), 2 Intel Xeon E (2.6 GHz, 16 cores total), 128 GB RAM, SSD and SATA disks, 1 Tesla K20c, Windows 7. Chart: relative speedup, I/O bound (16 GB) vs. balanced/compute bound (128 GB), for 2 cores, 8 cores, 8 cores + GPU, and 8 cores + GPU + SSD; the balanced series reaches 5.7x, 12.0x, 24.8x, and 27.3x, against 12.5x for the I/O bound configuration. 102
95 Agenda Why Talking About Hardware HPC Terminology ANSYS Work-flow Hardware Considerations Additional resources 103
96 Wrap-up - Hardware An important part of specifying an HPC system is to purchase a balanced system. There is no point in spending all your money on the processor if the I/O is your biggest bottleneck. You are only as good as your slowest component! 104
97 Scalable HPC Licensing ANSYS HPC (per-process) ANSYS HPC Pack HPC product rewarding volume parallel processing for high-fidelity simulations Each simulation consumes one or more Packs Parallel enabled increases quickly with added Packs ANSYS HPC Workgroup HPC product rewarding volume parallel processing for increased simulation throughput within a single colocated workgroup 16 to parallel shared across any number of simulations on a single server Enterprise options available to deploy and use anywhere in the world Single HPC solution for FEA/CFD/FSI and any level of fidelity Parallel Enabled (Cores) HPC Packs per Simulation 105
98 Which type of licensing is right for me? ANSYS HPC and ANSYS HPC Workgroup give flexible use of a pool of licenses. ANSYS HPC Pack gives quick scale-up but is more restrictive in how it can be used. That added flexibility is why the HPC Workgroup options cost more than HPC Packs.
99 ANSYS HPC Parametric Pack License HPC license for running parametric FEA or CFD simulations on multiple CPU cores simultaneously, and more cost-effectively.
Key benefits:
- Ability to automatically and simultaneously execute design points while consuming just one set of application licenses
- Scalable, because the number of simultaneous design points enabled increases quickly with added Packs
- Amplifies the complete workflow, because design points can include execution of multiple applications (pre, meshing, solve, HPC, post)
(Chart: number of simultaneous design points enabled vs. number of HPC Parametric Pack licenses.)
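The throughput benefit of executing design points simultaneously can be sketched as a simple batching calculation (the sweep size, concurrency levels, and 30-minute solve time below are illustrative assumptions, not figures from the deck):

```python
import math

# Wall-clock batches for a parametric sweep when `simultaneous` design
# points can execute at once (e.g. as enabled by Parametric Pack licenses).
def sweep_batches(total_points, simultaneous):
    return math.ceil(total_points / simultaneous)

# Illustrative sweep: 64 design points at 30 minutes of solve time each.
for simult in (1, 4, 16):
    batches = sweep_batches(64, simult)
    print(f"{simult:2d} at a time -> {batches} batches -> {batches * 30} min of wall clock")
```

Running 16 design points at a time turns a 32-hour serial sweep into a 2-hour one, which is where the "more cost effectively" argument comes from.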
100 Additional Resources - IT Webinars
Recorded webinars:
- Understanding Hardware Selection for ANSYS 15.0
- How to Speed Up ANSYS 15.0 with GPUs
- Intel Technologies Enabling Faster, More Effective Simulation
- Optimizing Remote Access to Simulation
See the HPC/IT webinar listings for more and upcoming sessions.
101 Additional Resources - IT White Papers & Technical Briefs
White papers:
- Optimizing Business Value in High-Performance Engineering Computing
- IBM Application Ready Solutions Reference Architecture for ANSYS
- Intel Solid-State Drives Increase Productivity of Product Design and Simulation
- Value of HPC for Ensuring Product Integrity
Technical briefs:
- Parallel Scalability of ANSYS 15.0 on Hewlett-Packard Systems
- SGI Technology Guide for ANSYS Mechanical Analysts
- SGI Technology Guide for ANSYS Fluent Analysts
- Accelerating ANSYS Fluent 15.0 Using NVIDIA GPUs
102 Additional Resources - ANSYS IT Webcast Series
On-demand webinars:
- Understanding Hardware Selection for ANSYS 15.0
- How to Speed Up ANSYS 15.0 with GPUs
- Cloud Hosting of ANSYS: Gompute On-Demand Solutions
- Simplified HPC Clusters for ANSYS Users
- Intel Technologies Enabling Faster, More Effective Simulation
- Accelerating Time-to-Results with Parallel I/O
- Extreme Scalability for High-Fidelity CFD Simulations
- Methodology and Tools for Compute Performance at Any Scale
- Understanding Hardware Selection for Structural Mechanics
- Optimizing Remote Access to Simulation
- Scalable Storage and Data Management for Engineering Simulation
103 Additional Resources - ANSYS Platform Support
- Platform Support Policies
- Supported Platforms
- Supported Hardware
- Tested Systems
- ANSYS Benchmarks
104 Additional Resources - ANSYS Partner Solutions
- Reference configurations
- Performance data
- White papers
- Sales contact points
105 Additional Resources - The Manual
- Sections on best practices and parallel processing for various solvers
- Performance Guide for Mechanical
- Installation walkthroughs for installing the products, parallel processing, licensing, and RSM (Remote Solve Manager)
- ANSYS Advantage Online Magazine
106 Thank You! Connect with Me. Connect with ANSYS, Inc.: LinkedIn ANSYSInc, Facebook ANSYSInc, and follow our blog at ansys-blog.com