PowerEdge NUMA Configurations with AMD EPYC Processors

Similar documents
Dell EMC NUMA Configuration for AMD EPYC (Naples) Processors

NUMA Topology for AMD EPYC Naples Family Processors

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance

EPYC VIDEO CUG 2018 MAY 2018

Linux Network Tuning Guide for AMD EPYC Processor Based Servers

Linux Network Tuning Guide for AMD EPYC Processor Based Servers

Memory Population Guidelines for AMD EPYC Processors

Accelerating Microsoft SQL Server 2016 Performance With Dell EMC PowerEdge R740

NVMe Performance Testing and Optimization Application Note

Microsoft Windows 2016 Mellanox 100GbE NIC Tuning Guide

Six-Core AMD Opteron Processor

Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance

Atos ARM solutions for HPC

VMware Infrastructure Update 1 for Dell PowerEdge Systems. Deployment Guide. support.dell.com

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Dell EMC Ready Bundle for HPC Digital Manufacturing ANSYS Performance

Exchange Server 2007 Performance Comparison of the Dell PowerEdge 2950 and HP Proliant DL385 G2 Servers

Cisco UCS M5 Blade and Rack Servers

Altair OptiStruct 13.0 Performance Benchmark and Profiling. May 2015

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740

VMware Infrastructure Update 1 for Dell PowerEdge Systems. Deployment Guide. support.dell.com

DELL EMC READY BUNDLE FOR VIRTUALIZATION WITH VMWARE AND FIBRE CHANNEL INFRASTRUCTURE

GLOBAL [Change] Input Product Model. Products Support News & Events Where to Buy FOX Club About FOXCONN. Home Products Motherboards Socket AM3 A74GA

Dell EMC PowerEdge R6415

Server-Related Policy Configuration

VMware Infrastructure 3.5 for Dell PowerEdge Systems. Deployment Guide. support.dell.com

DELL Reference Configuration Microsoft SQL Server 2008 Fast Track Data Warehouse

The PowerEdge M830 blade server

VMware Infrastructure 3.5 Update 2 for Dell PowerEdge Systems. Deployment Guide. support.dell.com

Dell EMC PowerEdge R7415

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell Storage PS Series Arrays

goals review some basic concepts and terminology

Dell EMC PowerEdge Installation, Management and Diagnostics

Dell PowerVault Network Attached Storage (NAS) Systems Running Windows Storage Server 2012 Troubleshooting Guide

Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure

Avid Configuration Guidelines Dell R7920 Rack workstation Dual 8 to 28 Core CPU System

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

Hyper-converged storage for Oracle RAC based on NVMe SSDs and standard x86 servers

I/O Channels. RAM size. Chipsets. Cluster Computing Paul A. Farrell 9/8/2011. Memory (RAM) Dept of Computer Science Kent State University 1

AMD Opteron 4200 Series Processor

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Reference Architecture Microsoft Exchange 2013 on Dell PowerEdge R730xd 2500 Mailboxes

DELL EMC READY BUNDLE FOR VIRTUALIZATION WITH VMWARE AND ISCSI INFRASTRUCTURE

Best Practices for Setting BIOS Parameters for Performance

VMware VMmark V1.1 Results

(Please refer "CPU Support List" for more information.) (Please refer "Memory Support List" for more information.)

POWEREDGE RACK SERVERS

AMD EPYC Processors Showcase High Performance for Network Function Virtualization (NFV)

Active System Manager Release 8.2 Compatibility Matrix

NAMD Performance Benchmark and Profiling. January 2015

Consolidating Microsoft SQL Server databases on PowerEdge R930 server

Designing High Performance Communication Middleware with Emerging Multi-core Architectures

Open storage architecture for private Oracle database clouds

Dell EMC PowerEdge Systems SUSE Linux Enterprise Server 15. Installation Instructions and Important Information

DELL EMC READY BUNDLE FOR VIRTUALIZATION WITH VMWARE VSAN INFRASTRUCTURE

Avid Configuration Guidelines Dell R7920 Rack workstation Dual 8 to 28 Core CPU System

Consolidating OLTP Workloads on Dell PowerEdge R th generation Servers

Maximizing Six-Core AMD Opteron Processor Performance with RHEL

WHITE PAPER SINGLE & MULTI CORE PERFORMANCE OF AN ERASURE CODING WORKLOAD ON AMD EPYC

SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine

BIOS Parameters by Server Model

HostEngine 5URP24 Computer User Guide

Viewing Server Properties

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Optimal BIOS settings for HPC with Dell PowerEdge 12 th generation servers

VMmark 3.0 Results. Number of Hosts: 2 Uniform Hosts [yes/no]: yes Total sockets/cores/threads in test: 4/112/224

Dell PowerEdge VRTX Storage Subsystem Compatibility Matrix

Notes Section Notes for Workload. Configuration Section Configuration

LAMMPS-KOKKOS Performance Benchmark and Profiling. September 2015

Dell EMC SC Series Storage with SAS Front-end Support for VMware vsphere

GPU > CPU. FOR HIGH PERFORMANCE COMPUTING PRESENTATION BY - SADIQ PASHA CHETHANA DILIP

BIOS Parameters by Server Model

Hardware Evolution in Data Centers

Technical Specifications

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

Dell Reference Configuration for Large Oracle Database Deployments on Dell EqualLogic Storage

21 TB Data Warehouse Fast Track for Microsoft SQL Server 2014 Using the PowerEdge R730xd Server Deployment Guide

1. AMD Ryzen 2nd Generation processors 2. AMD Ryzen with Radeon Vega Graphics processors 3. AMD Ryzen 1st Generation processors

Veritas Access. Installing Veritas Access in VMWare ESx environment. Who should read this paper? Veritas Pre-Sales, Partner Pre-Sales

HyperTransport Technology

SGI UV 300RL for Oracle Database In-Memory

GROMACS (GPU) Performance Benchmark and Profiling. February 2016

Avid Configuration Guidelines Dell 7820 Tower workstation Dual 8 to 28 Core CPU System

PowerEdge FX2 - Upgrading from 10GbE Pass-through Modules to FN410S I/O Modules

Dell PowerEdge M620 Systems (For Dell PowerEdge VRTX Enclosure) Owner's Manual

Pexip Infinity Server Design Guide

Competitive Power Savings with VMware Consolidation on the Dell PowerEdge 2950

Dell EMC PowerEdge R7425

(Please refer "CPU Support List" for more information.)

Dell EMC PowerEdge R740xd as a Dedicated Milestone Server, Using Nvidia GPU Hardware Acceleration

About 2CRSI. OCtoPus Solution. Technical Specifications. OCtoPus servers. OCtoPus. OCP Solution by 2CRSI.

Boot Mode Considerations: BIOS vs. UEFI

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

About 2CRSI. OCtoPus Solution. Technical Specifications. OCtoPus. OCP Solution by 2CRSI.

STAR-CCM+ Performance Benchmark and Profiling. July 2014

An Oracle White Paper December Accelerating Deployment of Virtualized Infrastructures with the Oracle VM Blade Cluster Reference Configuration

Data Sheet FUJITSU Server PRIMERGY CX2550 M1 Dual Socket Server Node

How Might Recently Formed System Interconnect Consortia Affect PM? Doug Voigt, SNIA TC

Avid Configuration Guidelines Dell 7920 Tower workstation Dual 8 to 28 Core CPU System

Transcription:

PowerEdge Product Group Direct from Development PowerEdge NUMA Configurations with AMD EPYC Processors Tech Note by: Jose Grande SUMMARY With the introduction of AMD s EPYC (Naples) x86 Server CPUs featuring four Zeppelin dies per package, there is a need to clarify how AMD s new silicon design establishes Non- Uniform Memory Access (NUMA) domains across dies and sockets. This tech note explains how Dell EMC PowerEdge Servers leverage AMD s EPYC CPUs to configure NUMA domains for optimal performance by using Dell EMC BIOS Settings. AMD EPYC is a Multi-Chip Module (MCM) processor and per silicon package there are four Zeppelin SOCs/dies leveraged from AMD Ryzen. Each of the four dies have direct Infinity Fabric connections to each of the other dies as well as a possible socket-to-socket interconnect. This design allows, at most, four NUMA nodes per socket or eight NUMA nodes in a dual-socket system. AMD EPYC s four dies each have two Unified Memory Controllers (UMC), that each control one DDR channel with two DIMMs per channel, along with one controller for IO, as shown in Figure 1 below: Channel 0 Channel 1 IO IO DIE 0 DIE 1 Channel 2 Channel 3 Channel 4 Channel 6 DIE 2 DIE 3 Channel 5 Channel 7 IO IO Figure 1: Die layout of the AMD EPYC processor Memory Interleaving Options and Rules AMDs EPYC processor supports four memory interleaving options: Socket Interleaving (2 processor configurations) Die Interleaving Channel Interleaving Memory Interleaving disabled. Other trademarks may be trademarks of their respective owners.

The following are the rules for each memory interleave options: The system can socket interleave, but only if all channels in the entire system have the same amount of memory. Die interleaving must be enabled as well. The system can die interleave, but only if all channels on the socket have the same amount of memory. Channel interleaving must be enabled as well. The system can channel interleave as long as both channels have at least one DIMM. The channels do not have to be symmetrical. This is the default configuration. No interleave at all, where each channel is stacked on top of the previous channel. However, it should be noted that probe filter performance may be affected if there is one UMC with less memory than the other UMC on the same die. NUMA Domains per Memory Interleave Option AMD s new silicon architecture adds nuances on how to configure platforms for NUMA. The focus of AMDs scheme to NUMA lies within its quad-die layout and its potential to have four NUMA domains. Socket interleaving is memory interleave option meant only for inter-socket memory interleaving, and is only available with 2-processor configurations. In this configuration memory across both sockets will be seen as a single memory domain producing a non-numa configuration. Die interleaving is available for all configurations. Die interleaving is the intra-socket memory interleave option that creates one NUMA domain for all the four dies on socket. In a 2-processor configuration, this will produce two NUMA domains, one domain pertaining to each socket, providing customers with the first option for NUMA configuration. In a one socket platform, die interleaving will be the maximum option of memory interleaving, and will produce one memory domain thus producing a non-numa configuration. Channel interleaving is also available with all configurations. Channel interleaving is the intra-die memory interleave option and is the default setting for Dell EMC platforms. With channel interleaving the memory behind each UMC will be interleaved and seen as one NUMA domain per die. This will generated with four NUMA domains per socket. Memory interleaving disabled - When memory interleaving is disabled, four NUMA nodes will be seen as in the case for channel interleaving. In this case however, the memory will not be interleaved but rather stacked next to one another. Number of Processors Socket Interleave Die Interleave Channel Interleave 2 1 2 8 1 NA 1 4 Figure 2: NUMA Domain count per Memory Interleave Option Performance Tuning For best performance from AMD EPYC processors, it is recommended that each die have one DIMM populated in each channel. This allows all IO behind each die to access memory, with optimal latency.

Memory DIMM Population Guidelines Populate empty channels, with the same type/capacity of DIMMs, before populating two DIMMs on a given channel Recommendations for best performance: o 1 DIMM per channel dedicates full memory bandwidth o Populating 2 DIMMs per channel will increase capacity but will lower the clock speed, resulting in lower memory bandwidth. There is a dependency between memory speed and the bandwidth of the Infinity Fabric. Memory Bus Speed Infinity Fabric Speed Infinity Fabric Speed (Die to Die in same CPU socket) (Socket-to-socket) 2666 MT/s 5.3 GT/s 10.6 GT/s 2400 MT/s 4.8 GT/s 9.6 GT/s 2133 MT/s 4.2 GT/s 8.5 GT/s 1866 MT/s 3.7 GT/s 7.4 GT/s Figure 3: Memory Bus Speed to Infinity Fabric Bus Speed Minimum recommended: o At least 1 DIMM is per die in the system for a total of 4 DIMM per CPU On Dell EMC platforms populate DIMM 1 first (white slots in Figure 4, below) A 2 socket system (2 CPUs are populated) will need equivalent memory configurations on both CPUs for optimal performance. D1 D0 D1 D0 D1 D0 D1 D0 CPU D0 D1 D0 D1 D0 D1 D0 D1 Figure 4: DIMM layout of AMD EPYC processors

PCIe Configuration Guidelines When PCIe cards are populated into particular slots with NUMA-unaware application/software, make sure to have memory DIMMs populated in the corresponding NUMA-node mapping as local memory. Mappings can be found in Section 4 of Platform Specific NUMA/Die Domain Details Consider also pinning the interrupts to local CPUs to get maximum performance. For instructions on how to tune network cards for better performance on AMD EPYC processors, access the following links and download provided documentation: https://support.amd.com/techdocs/56224.pdf https://developer.amd.com/resources/epyc-resources/epyc-white-papers/ BIOS Setup The Memory Interleaving setting controls whether the system is configured for Socket, Die, or Channel interleaving. In System Setup (F2 prompt during system boot), enter System BIOS > Memory Settings, and navigate to Memory Interleaving to choose the memory interleave for desired configuration. This option is also available in system management consoles such as RACADM. Figure 5: Using the BIOS Setup screen to select Memory Interleaving option

Platform-specific NUMA/Die Domain Details The following matrices shows how CPU die, memory and PCIe slots are physically grouped to each NUMA domain for Dell EMC PowerEdge EPYC-based platforms, PowerEdge R6415, R7415, and R7425. PowerEdge R7415 (Sys configuration with rear side disks) PCIe Slot / Device CPU Die Memory Slots NUMA Node Slot1 2 A7, A8, A15, A16 2 Slot2 0 A3, A4, A11, A12 0 Slot3 2 A7, A8, A15, A16 2 Embedded LOM 2 A7, A8, A15, A16 2 Mini PERC 3 A5, A6, A13, A14 3 CONCLUSIONS Dell EMC PowerEdge Servers with the new AMD EPYC processors can deliver significant performance improvements for key applications and workloads. By following the guidelines above and implementing the desired NUMA configuration with the PowerEdge system BIOS, optimal performance can be attained. For further information, please see: Dell TechCenter, at https://www.dell.com/community/dell-community/ct-p/english is an online technical community where IT professionals have access to numerous resources for Dell EMC software, hardware and services. https://community.amd.com/community/server-gurus EPYC Server Community Forum https://developer.amd.com/resources/epyc-resources/epyc-white-papers/ Linux Network Tuning Guide for AMD EPYC Processor Based Servers