Maximizing System x and ThinkServer Performance with a Balanced Memory Configuration


Front cover

Maximizing System x and ThinkServer Performance with a Balanced Memory Configuration

Last Update: October 2017

Introduces three balanced memory guidelines for Intel Xeon processors
Compares the performance of balanced and unbalanced memory configurations
Explains memory interleaving and its importance
Provides tips on how to balance memory

Dan Colglazier
Joseph Jakubowski
Tristian "Truth" Brown

Abstract

Configuring a server with balanced memory is important for maximizing its memory bandwidth and overall performance. Lenovo servers running Intel Xeon processors have 4 or 8 memory channels per processor and up to three DIMMs per channel, so it is important to understand what is considered a balanced configuration and what is not.

In this paper, we introduce three balanced memory guidelines to help you select a balanced memory configuration. Balanced and unbalanced memory configurations are presented along with their relative measured memory bandwidths to show the effect of unbalanced memory. Suggestions are also provided on how to produce balanced memory configurations.

This paper is for System x and ThinkServer customers and for business partners and sellers wishing to understand how to maximize the performance of Lenovo servers.

At Lenovo Press, we bring together experts to produce technical publications around topics of importance to you, providing information and best practices for using Lenovo products and solutions to solve IT challenges. See a list of our most recent publications at the Lenovo Press web site: http://lenovopress.com

Do you have the latest version? We update our papers from time to time, so check whether you have the latest version of this document by clicking the Check for Updates button on the front page of the PDF. Pressing this button will take you to a web page that will tell you if you are reading the latest version of the document and give you a link to the latest if needed. While you're there, you can also sign up to get notified via email whenever we make an update.

Contents
Introduction
Memory interleaving
Balanced memory configurations
About the STREAM test
Memory topology
Applying the balanced memory guidelines - E5 processors
Applying the balanced memory guidelines - E7 processors
Maximizing memory bandwidth
Cluster on Die option impact on memory bandwidth
Summary
Change history
Authors
Notices
Trademarks

Introduction

The memory subsystem is a key component of the Intel Xeon processor architecture of Lenovo servers that can affect overall server performance. When properly configured, the memory subsystem can deliver extremely high memory bandwidth and low memory access latency. When the memory subsystem is incorrectly configured, however, the memory bandwidth available to the server can become limited and overall server performance can be reduced.

This brief explains the concept of balanced memory configurations that yield the highest possible memory bandwidth from the Intel Xeon architecture. Examples of balanced and unbalanced memory configurations are shown to illustrate their effect on memory subsystem performance. This paper is applicable to servers based on Intel Xeon E5 v4 and upcoming E7 v4 processors as well as the previous generation E5 v3 and E7 v3 families.

Memory interleaving

Access to the information stored on memory DIMMs is controlled by the memory controller(s) within the Intel Xeon processor. Depending on the specific processor, one or two memory controllers are present in each processor. Each memory controller is attached to memory channels that are connected to the physical connectors that interface to memory DIMMs. Figure 1 illustrates how an Intel Xeon E5 processor's memory controllers are connected to DIMM slots.

Figure 1 Layout of the Intel Xeon E5 memory controllers and DIMM slots (each memory controller drives two channels, each with DIMM slots 0, 1, and 2)

The Xeon processor optimizes memory accesses by creating interleave sets across the memory controllers and/or memory channels. For example, if identical DIMMs are populated on both memory channels attached to a memory controller, the memory controller creates a 2-way interleave set across both DIMMs. Figure 2 shows two such interleave sets.

Figure 2 Two-way interleave sets (four identical DIMMs, one 2-way interleave set per memory controller)

Interleaving enables higher memory bandwidth by spreading contiguous memory accesses across both memory channels rather than sending all memory accesses to one memory channel or the other. If DIMMs with different memory capacities are populated on the memory channels attached to a memory controller, or if different numbers of identical-capacity DIMMs are populated on the memory channels, the memory controller has to create multiple interleave sets. Managing multiple interleave sets creates overhead for the memory controller, which can reduce memory bandwidth.

Figure 2 illustrates a 2-way memory interleave set on an Intel Xeon E5 processor, which results from populating identical memory DIMMs on each memory channel. These 2-way memory interleave sets can be further interleaved across the memory controllers: consecutive addresses would alternate between the memory controllers, with every fourth address going to each memory channel.

Within a memory channel, a second level of interleaving called memory rank interleaving can occur. A memory rank is a block of data created from the memory chips on a memory DIMM. A memory rank is typically 64 bits wide; if ECC is supported, an additional 8 bits are added for a total of 72 bits. A DIMM may contain one, two, or four memory ranks. Memory rank interleaving is optimal when each DIMM on a memory channel has the same number of memory ranks, for example when every DIMM on a memory channel is a 2-rank DIMM.

When installing memory DIMMs into your server, follow the DIMM installation sequence for your particular server. The configurations shown in this paper did not always follow the installation sequences for the servers they were implemented and measured on, because a number of these configurations were put together purely for demonstration purposes.
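To make the channel and controller interleaving described above concrete, the following minimal sketch routes consecutive cache-line addresses across two memory controllers and two channels per controller so that every fourth access lands on each channel. It is illustrative only; the 64-byte line size, the exact controller-then-channel ordering, and the function names are assumptions rather than details taken from this paper or from Intel documentation.

```python
# Illustrative sketch only - not the actual Xeon address-interleaving policy.
LINE_SIZE = 64  # bytes per cache line (assumed)

def route(address):
    """Map a physical address to (memory controller, channel) in a
    hypothetical fully interleaved 4-channel configuration."""
    line = address // LINE_SIZE
    controller = line % 2        # consecutive lines alternate controllers
    channel = (line // 2) % 2    # then alternate channels on each controller
    return controller, channel

for addr in range(0, 8 * LINE_SIZE, LINE_SIZE):
    ctrl, chan = route(addr)
    print(f"0x{addr:05x} -> controller {ctrl}, channel {chan}")
```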

Balanced memory configurations

Balanced memory configurations enable optimal interleaving across all attached memory channels so that memory bandwidth is maximized. Bandwidth can be optimized if both memory controllers on the same physical socket are identically configured. System-level memory performance can be further optimized if each physical socket has the same physical memory capacity.

As a result, the basic guidelines for a balanced memory subsystem are as follows:

1. All populated memory channels should have the same total memory capacity and the same total number of ranks.
2. All memory controllers on a socket should have the same configuration of memory DIMMs.
3. All sockets on the same physical server should have the same configuration of memory DIMMs.

Tip: We refer to these three guidelines throughout this paper as balanced memory guidelines 1, 2, and 3.

About the STREAM test

STREAM Triad is a simple, synthetic benchmark designed to measure sustainable memory bandwidth. Its intent is to measure the best memory bandwidth available. STREAM Triad is used in this paper to measure the sustained memory bandwidth of various memory configurations and to show the effect of unbalanced memory configurations on memory bandwidth. For more information about STREAM Triad, see:
http://www.cs.virginia.edu/stream/
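The Triad kernel in STREAM computes a(i) = b(i) + q*c(i) over arrays far larger than the processor caches and reports the sustained data rate. As a rough sketch of that idea only (this is not the official benchmark; the array size, the NumPy implementation, and the bytes-moved arithmetic are illustrative assumptions), the kernel could be timed like this:

```python
# Minimal STREAM-Triad-style sketch (not the official benchmark).
import time
import numpy as np

N = 50_000_000              # doubles; large enough to overflow the CPU caches
q = 3.0
b = np.full(N, 1.0)
c = np.full(N, 2.0)
a = np.empty(N)

start = time.perf_counter()
np.multiply(c, q, out=a)    # a = q * c
np.add(b, a, out=a)         # a = b + q * c  (the Triad kernel)
elapsed = time.perf_counter() - start

bytes_moved = 3 * N * 8     # read b, read c, write a (8-byte doubles)
print(f"Triad: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

The bandwidth figures quoted in this paper are relative scores, with the best configuration normalized to 100, rather than absolute GB/s values.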

Memory topology

In this paper we are testing with a server that has an Intel Xeon E5-2600 v3 processor with two memory controllers; however, the results apply equally to servers with E5-2600 v4 processors.

Low-end processors: Intel Xeon E5-2600 v3 processors with fewer than 10 cores and E5-2600 v4 processors with fewer than 12 cores have only one memory controller in the package, with all four memory channels connected to that one memory controller. As a result, balanced memory guideline 2 does not apply to processors with only one memory controller.

To illustrate the various memory topologies for a processor with two memory controllers, different memory configurations will be designated as A:B:C:D, where each letter indicates the number of DIMMs populated on each memory channel:

A refers to Channel 0 on Memory Controller 0
B refers to Channel 1 on Memory Controller 0
C refers to Channel 0 on Memory Controller 1
D refers to Channel 1 on Memory Controller 1

This designation is shown in Figure 3.

Figure 3 Channel designation (A and B are the channels on Memory Controller 0; C and D are the channels on Memory Controller 1)

As an example, a 3:2:1:0 memory configuration has:

3 DIMMs on Channel 0 on Memory Controller 0
2 DIMMs on Channel 1 on Memory Controller 0
1 DIMM on Channel 0 on Memory Controller 1
0 DIMMs on Channel 1 on Memory Controller 1

Applying the balanced memory guidelines - E5 processors

We will start with the assumption that balanced memory guideline 3, described in "Balanced memory configurations" above, is followed, so that all sockets on the same physical server have the same configuration of memory DIMMs. Therefore, we only have to look at one socket for each memory configuration.

Configuration of 4 DIMMs - balanced

We will start with one DIMM in each memory channel, which yields the 1:1:1:1 memory configuration shown in Figure 4.

Figure 4 One memory DIMM in each channel (STREAM Triad relative memory bandwidth = 100)

This is a balanced memory configuration: it follows balanced memory guideline 1, with all memory channels having 16 GB of memory capacity and two ranks total, and it follows balanced memory guideline 2, with both memory controllers having the same configuration of DIMMs. Interleaving can be done across the memory controllers and across both pairs of memory channels. The 1:1:1:1 memory configuration provides a maximum amount of memory bandwidth with a relative STREAM Triad score of 100.
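Before stepping through the remaining configurations, the three guidelines can be expressed as a small check over the A:B:C:D designation. This is only a sketch under simplifying assumptions: every installed DIMM is identical in capacity and rank count, channels A and B sit on Memory Controller 0 and C and D on Memory Controller 1 as defined above, and "same configuration" per controller is read as the same set of per-channel DIMM counts. It ignores each server's DIMM installation rules.

```python
# Sketch: classify an A:B:C:D configuration against the three balanced-memory
# guidelines, assuming all installed DIMMs are identical.

def guideline1(socket):
    # All populated channels should hold the same capacity and rank count,
    # which for identical DIMMs means the same DIMM count per channel.
    return len({count for count in socket if count > 0}) <= 1

def guideline2(socket):
    # Both memory controllers should carry the same set of DIMMs.
    return sorted(socket[:2]) == sorted(socket[2:])

def guideline3(sockets):
    # Every socket in the server should be populated identically.
    return len(set(sockets)) == 1

def classify(sockets):
    g1 = all(guideline1(s) for s in sockets)
    g2 = all(guideline2(s) for s in sockets)
    g3 = guideline3(sockets)
    verdict = "balanced" if (g1 and g2 and g3) else "unbalanced"
    return f"{sockets[0]}: G1={g1} G2={g2} G3={g3} -> {verdict}"

# Configurations from this section, assuming two identically populated sockets:
for cfg in [(1, 1, 1, 1), (2, 1, 1, 1), (2, 1, 1, 2), (3, 0, 3, 0), (2, 2, 2, 2)]:
    print(classify((cfg, cfg)))
```

As the following sections show, meeting the guidelines alone does not guarantee maximum bandwidth: the 3:0:3:0 configuration satisfies them yet leaves two channels unpopulated and measures well below 100.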

Configuration of 5 DIMMs - unbalanced

Next we will add one DIMM to see its effect on memory bandwidth. This 2:1:1:1 memory configuration is shown in Figure 5.

Figure 5 2:1:1:1 memory configuration (STREAM Triad relative memory bandwidth = 26)

This is not a balanced memory configuration: it breaks balanced memory guideline 1, because one memory channel has a different memory capacity and number of ranks than the others, and balanced memory guideline 2, because the memory controllers have different DIMM configurations. While a 2-way interleave set is created on Memory Controller 1, Memory Controller 0 has to create two separate interleave sets for the three DIMMs on its two memory channels:

A 2-way interleave set is created using one DIMM from each memory channel.
A 1-way interleave set is created using the remaining DIMM on Channel 0.

Interleaving across the memory controllers is not done for this configuration due to the unbalanced memory between them. The inability to interleave across the memory controllers, combined with the interleaving overhead on Channel 0, reduces the relative memory bandwidth of the 2:1:1:1 memory configuration to 26, or 26% of the 1:1:1:1 memory configuration.

Configurations of 6 DIMMs - 3 unbalanced and 1 balanced

There are numerous ways to populate six DIMMs in the twelve DIMM slots. We will look at four different ways to see the effect on memory bandwidth. The first is the 2:1:1:2 memory configuration shown in Figure 6.

Figure 6 2:1:1:2 memory configuration (STREAM Triad relative memory bandwidth = 52)

This is an unbalanced memory configuration: it does not follow balanced memory guideline 1, because two memory channels have 32 GB of memory and four total ranks while the other two have only 16 GB of memory and two ranks. Balanced memory guideline 2 is followed, because both memory controllers have the same configuration of memory DIMMs, allowing interleaving to be done across the memory controllers.

Each memory controller has to create two separate interleave sets for the three DIMMs on its two memory channels, as described above for Memory Controller 0 in the 2:1:1:1 memory configuration. Even though balanced memory guideline 2 is followed, with each memory controller having the same configuration of DIMMs, the interleaving overhead of multiple sets on the channels reduces the relative memory bandwidth of the 2:1:1:2 memory configuration to only 52.

Another way to populate memory with six DIMMs is in a 3:1:1:1 configuration, as shown in Figure 7.

Figure 7 3:1:1:1 memory configuration (STREAM Triad relative memory bandwidth = 20)

This unbalanced memory configuration does not follow balanced memory guideline 1, because one channel has more memory capacity and ranks than the others. It also does not follow balanced memory guideline 2, because the memory controllers do not have the same DIMM configurations, so interleaving cannot be done across the memory controllers. Memory Controller 1 can interleave its memory channels because they follow balanced memory guideline 1. Memory Controller 0 will need to create a 2-way interleave set across its memory channels and two 1-way interleave sets for the two other DIMMs on its Channel 0. Since most of this configuration's memory is on Memory Controller 0, this interleaving overhead is very detrimental. The result is a relative memory bandwidth of 20.

One way to fulfill balanced memory guideline 2 with six DIMMs is with the 3:0:3:0 memory configuration shown in Figure 8.

Figure 8 3:0:3:0 memory configuration (STREAM Triad relative memory bandwidth = 39)

This memory configuration also follows balanced memory guideline 1 and is a balanced memory configuration. Even so, all memory channels should be populated to achieve maximum memory bandwidth. Interleaving is done across the memory controllers, and each memory controller will create three 1-way interleave sets on its Channel 0. Even though balanced memory guideline 2 is met, this memory configuration produces a relative memory bandwidth of only 39, mostly because neither Channel 1 contributes any bandwidth.

The final way to arrange six DIMMs is to put all of them on one memory controller in the 3:3:0:0 memory configuration shown in Figure 9.

Figure 9 3:3:0:0 memory configuration (STREAM Triad relative memory bandwidth = 38)

This memory configuration does not follow balanced memory guideline 1 in the same way as the 3:0:3:0 memory configuration. It also does not follow balanced memory guideline 2, because all the memory is on one memory controller. Leaving one memory controller entirely unpopulated removes its contribution to memory bandwidth. Interleaving can occur between the channels on Memory Controller 0. Even so, the 3:3:0:0 memory configuration has a relative memory bandwidth of only 38.

Configurations of 8 DIMMs - balanced

Using eight DIMMs, it is easy to create the balanced 2:2:2:2 memory configuration shown in Figure 10.

Figure 10 2:2:2:2 memory configuration (STREAM Triad relative memory bandwidth = 100)

This memory configuration fulfills balanced memory guideline 1, with all channels populated equally, and balanced memory guideline 2, with each memory controller having the same DIMM configuration. This memory configuration can interleave across both the memory controllers and the memory channels. The 2:2:2:2 memory configuration provides the maximum relative memory bandwidth of 100.

Configurations of 8 DIMMs - unbalanced

The same eight memory DIMMs can also be arranged in the unbalanced 3:1:1:3 memory configuration shown in Figure 11.

Figure 11 3:1:1:3 memory configuration (STREAM Triad relative memory bandwidth = 53)

This configuration does not follow balanced memory guideline 1, because half the memory channels have 48 GB of memory and 6 total ranks while the other memory channels have only 16 GB of memory and 2 ranks. This memory configuration can be interleaved across the memory controllers but not efficiently across the memory channels. Each memory controller will have one 2-way interleave set and two 1-way interleave sets. This limits its relative memory bandwidth to 53.

Configurations of 3 DIMMs - unbalanced

One other memory configuration to look at is having only three memory DIMMs, in the 1:1:1:0 memory configuration shown in Figure 12.

Figure 12 1:1:1:0 memory configuration (STREAM Triad relative memory bandwidth = 60)

This unbalanced memory configuration follows neither balanced memory guideline 1, because one memory channel is unpopulated, nor balanced memory guideline 2, because one memory controller has two DIMMs and the other has only one. This memory configuration cannot be interleaved efficiently across the memory controllers, nor at all across the memory channels of Memory Controller 1. It can be interleaved across the memory channels of Memory Controller 0. It also suffers from lacking any contribution to memory bandwidth from Channel 1 on Memory Controller 1, as that channel is unpopulated. The result is a relative memory bandwidth of 60.

Applying the balanced memory guidelines - E7 processors

The Intel Xeon E7 processor uses a slightly different memory architecture to enable support of more DIMMs. Each processor has two integrated memory controllers, and each memory controller has two Scalable Memory Interconnect generation 2 (SMI2) links that are connected to two scalable memory buffers. Each memory buffer has two memory channels, and each channel supports three DIMMs, for a total of 24 DIMMs per processor. The use of memory buffers doubles the number of memory channels and DIMMs supported per processor. The E7 memory architecture is shown in Figure 13.

Figure 13 Intel Xeon E7 memory DIMM layout (two memory controllers, four memory buffers, eight memory channels, with DIMM slots 0, 1, and 2 on each channel)

Memory configurations will be designated as A:B:C:D:E:F:G:H, where each letter indicates the number of DIMMs populated on each memory channel:

A refers to Channel 0 on Memory Buffer 0 on Memory Controller 0
B refers to Channel 1 on Memory Buffer 0 on Memory Controller 0
C refers to Channel 0 on Memory Buffer 1 on Memory Controller 0
D refers to Channel 1 on Memory Buffer 1 on Memory Controller 0
E refers to Channel 0 on Memory Buffer 2 on Memory Controller 1
F refers to Channel 1 on Memory Buffer 2 on Memory Controller 1
G refers to Channel 0 on Memory Buffer 3 on Memory Controller 1
H refers to Channel 1 on Memory Buffer 3 on Memory Controller 1

This designation is shown in Figure 14.

Figure 14 Channel designation for E7 processors (A through D are the channels on Memory Buffers 0 and 1; E through H are the channels on Memory Buffers 2 and 3)

The memory buffers provide another level that can be interleaved. Balanced memory guideline 2 now applies equally to memory buffers as it does to memory controllers, in that all memory buffers on a socket should have the same configuration of DIMMs to provide for efficient interleaving across them.
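Extending the earlier sketch to the E7 designation (same assumptions: identical DIMMs, channels grouped by the buffer and controller mapping listed above, and "same configuration" read as the same set of per-channel DIMM counts), a configuration can be checked at both the memory buffer and memory controller level:

```python
# Sketch: group an E7 A:B:C:D:E:F:G:H configuration by memory buffer and by
# memory controller, assuming identical DIMMs throughout.

def e7_groups(cfg):
    buffers = [cfg[i:i + 2] for i in range(0, 8, 2)]    # Memory Buffers 0..3
    controllers = [cfg[:4], cfg[4:]]                     # Memory Controllers 0, 1
    return buffers, controllers

def same_layout(groups):
    # True if every group carries the same set of per-channel DIMM counts.
    return len({tuple(sorted(group)) for group in groups}) == 1

for cfg in [(1, 1, 1, 1, 1, 1, 1, 1),
            (2, 1, 1, 1, 1, 1, 1, 1),
            (2, 1, 2, 1, 2, 1, 2, 1),
            (2, 2, 2, 2, 2, 2, 2, 2)]:
    buffers, controllers = e7_groups(cfg)
    print(cfg,
          "controllers match:", same_layout(controllers),
          "buffers match:", same_layout(buffers))
```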

Configuration of 8 DIMMs - balanced

The first memory configuration we will look at is a balanced 1:1:1:1:1:1:1:1 memory configuration with one DIMM on each memory channel, as shown in Figure 15. This memory configuration follows all the balanced memory guidelines and can be interleaved efficiently across the memory controllers, memory buffers, and memory channels. It provides the maximum relative memory bandwidth of 100.

Figure 15 1:1:1:1:1:1:1:1 memory configuration (STREAM Triad relative memory bandwidth = 100)

Configuration of 9 DIMMs - unbalanced

Adding one additional memory DIMM produces the unbalanced 2:1:1:1:1:1:1:1 memory configuration shown in Figure 16.

Figure 16 2:1:1:1:1:1:1:1 memory configuration (STREAM Triad relative memory bandwidth = 16)

This configuration does not follow balanced memory guideline 1, because one memory channel has twice the memory capacity and ranks of all the others. It also does not follow balanced memory guideline 2, because the memory controllers have different memory DIMM configurations, although it is balanced across the memory buffers on Memory Controller 1. Efficient interleaving occurs between Memory Buffer 2 and Memory Buffer 3, as they both have the same configurations of DIMMs attached to them.

It is difficult to create a balanced configuration with an odd number of DIMMs. The result is a very poor relative memory bandwidth of 16.

Configuration of 10 DIMMs - unbalanced

Adding one more DIMM produces the unbalanced 2:1:1:1:2:1:1:1 memory configuration shown in Figure 17.

Figure 17 2:1:1:1:2:1:1:1 memory configuration (STREAM Triad relative memory bandwidth = 48)

This configuration does follow balanced memory guideline 2 for the memory controllers: both have the same DIMM configurations, allowing interleaving across the memory controllers. The guideline is not followed for the memory buffers, however. Balanced memory guideline 1 is not followed, because two memory channels have twice the memory capacity and ranks of the other six. Memory Buffers 1 and 3 do follow balanced memory guideline 1, allowing efficient interleaving to be done across their memory channels. The result is a relative memory bandwidth of 48.

Configuration of 12 DIMMs - unbalanced

Adding two more DIMMs produces the unbalanced 2:1:2:1:2:1:2:1 memory configuration shown in Figure 18.

Figure 18 2:1:2:1:2:1:2:1 memory configuration (STREAM Triad relative memory bandwidth = 64)

Balanced memory guideline 2 is followed for both memory controllers and all memory buffers, which have matching DIMM configurations. This allows for interleaving between all of them. Unfortunately, balanced memory guideline 1 is not followed, because half the memory channels have twice the memory capacity and ranks of the other half. The result is less efficient interleaving across the channels of each memory buffer and a relative memory bandwidth of 64.

Configuration of 14 DIMMs - unbalanced

Using 14 DIMMs produces the unbalanced 2:1:2:2:2:1:2:2 memory configuration shown in Figure 19.

Figure 19 2:1:2:2:2:1:2:2 memory configuration (STREAM Triad relative memory bandwidth = 33)

This configuration follows balanced memory guideline 2 for the memory controllers, because both have the same DIMM configurations, which allows for efficient interleaving across the memory controllers. The memory buffers do not follow balanced memory guideline 2, as half have three DIMMs and half have four. Balanced memory guideline 1 is not followed, as not all memory channels have the same memory capacity and ranks. Pairs of memory buffers do have the same DIMM configurations - Buffers 0 and 2, and Buffers 1 and 3 - so interleaving can be done efficiently between these memory buffer pairs. The result is a relative memory bandwidth of 33.

Configuration of 16 DIMMs - balanced

Adding two more DIMMs produces the balanced 2:2:2:2:2:2:2:2 memory configuration shown in Figure 20.

Figure 20 2:2:2:2:2:2:2:2 memory configuration (STREAM Triad relative memory bandwidth = 100)

This configuration follows balanced memory guideline 1, as all memory channels have the same memory capacity and total ranks. It also follows balanced memory guideline 2, as both memory controllers and all memory buffers have the same DIMM configurations.

This allows for efficient interleaving at all levels and provides a relative memory bandwidth of 100.

Maximizing memory bandwidth

In order to maximize the memory bandwidth of a server, the following rules should be followed:

Balance the memory across the sockets - all sockets on the same physical server should have the same configuration of memory DIMMs
Balance the memory across the memory controllers - all memory controllers on a socket should have the same configuration of memory DIMMs
Balance the memory across the memory channels - all memory channels should be populated and have the same total memory capacity and the same total number of ranks

If a server requires a specific memory capacity, sometimes smaller DIMMs or a mix of DIMM sizes are needed to produce a balanced memory configuration. For example, you may require 96 GB of memory on a server with one Intel Xeon processor. This can be accomplished using six 16 GB DIMMs. Unfortunately, there is no way to follow all the rules above with six DIMMs on four memory channels. Options in this situation are:

Use twelve 8 GB DIMMs. These would be arranged as a 3:3:3:3 configuration, which does follow all the rules.
Use one 16 GB DIMM and one 8 GB DIMM on each channel.
Have more memory than required to produce a balanced memory configuration. In this case, eight 16 GB DIMMs can be arranged as a 2:2:2:2 balanced memory configuration.

The important thing is to create a balanced memory configuration that follows all the above rules to maximize memory bandwidth and overall server performance.
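As a rough planning aid for cases like the 96 GB example above, the following sketch enumerates per-channel DIMM mixes that keep every channel identical. It is illustrative only: the four-channel socket, three slots per channel, the DIMM sizes listed (the paper itself only discusses 8 GB and 16 GB DIMMs), and the omission of rank counts and server-specific DIMM support lists are all assumptions.

```python
# Sketch: list capacities reachable with every memory channel populated
# identically. Illustrative only; check the server's memory population rules.
from itertools import combinations_with_replacement

CHANNELS = 4
SLOTS_PER_CHANNEL = 3
DIMM_SIZES_GB = (8, 16, 32)

def balanced_options(target_gb):
    options = []
    for slots in range(1, SLOTS_PER_CHANNEL + 1):
        for mix in combinations_with_replacement(DIMM_SIZES_GB, slots):
            total = CHANNELS * sum(mix)
            if total >= target_gb:
                options.append((total, mix))
    return sorted(options)

for total, mix in balanced_options(96):
    print(f"{total:4d} GB total: DIMMs of {mix} GB on each of {CHANNELS} channels")
```

The options listed above for 96 GB - twelve 8 GB DIMMs, an 8 GB plus a 16 GB DIMM per channel, or eight 16 GB DIMMs for 128 GB - all appear in this enumeration.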

Cluster on Die option impact on memory bandwidth

Cluster on Die (CoD) is a parameter that is set in UEFI and effectively splits a processor into two parts so that it acts like two processors. The CoD option is only available on Intel Xeon E5-2600 v3 and E5-2600 v4 processors with two memory controllers - that is, E5-2600 v3 processors with 10 or more cores and E5-2600 v4 processors with 12 or more cores.

Configurations with the CoD option can improve performance in workloads that are highly Non-Uniform Memory Access (NUMA) optimized. Each half of the processor is treated as a separate NUMA node with ownership of half of the cores, a memory controller, and locally attached memory. Such a configuration can improve performance for workloads whose data is found almost entirely within the cluster it is running on. Figure 21 shows an Intel Xeon E5 processor with the CoD option producing two clusters.

Figure 21 2:2:1:1 memory configuration with Cluster on Die making two clusters (STREAM Triad relative memory bandwidth = 100)

The CoD option can reduce the impact of not following balanced memory guideline 2. More memory capacity can be attached to one memory controller than the other and still produce maximum memory bandwidth, as long as memory is balanced across the memory channels of each memory controller. Without the CoD option, balanced memory guideline 2 is not followed in the 2:2:1:1 configuration, making it an unbalanced memory configuration with reduced memory bandwidth. With the CoD option, the 2:2:1:1 configuration provides maximum memory bandwidth. Even so, it is recommended that all the balanced memory guidelines be followed, even in configurations with the CoD option.

Summary

Overall server performance is affected by the memory subsystem, which can provide both high memory bandwidth and low memory access latency when properly configured. Balancing memory across the memory controllers and the memory channels produces memory configurations that can efficiently interleave memory references among their DIMMs, producing the highest possible memory bandwidth. An unbalanced memory configuration can reduce the total memory bandwidth to as low as 16% of that of a balanced memory configuration. Implementing all three of the balanced memory guidelines described in this paper results in balanced memory configurations that produce the best possible memory bandwidth and overall performance.

Change history

October 2017: Correction that one of the 6-DIMM configurations is balanced
April 2016: Initial publication

Authors

This paper was produced by the following team of specialists:

Dan Colglazier is a Senior Engineer in the Lenovo Data Center Performance Group in Morrisville, NC. He previously spent over 30 years at IBM, where he started out in the IBM Data Processing Division performance team helping improve the designs of future mainframe storage hierarchies. Assessing the performance of future System x servers in both IBM and Lenovo has been his main focus. He is currently the leader of the Performance Design Guidance team. Dan holds a Bachelor of Science degree in Computer Engineering and a Master of Science degree in Electrical Engineering, both from the University of Illinois.

Joe Jakubowski is a Senior Technical Staff Member in the Lenovo Server Performance Laboratory in Morrisville, NC. Previously, he spent 30 years at IBM. He started his career in the IBM Networking Hardware Division test organization and worked on various Token Ring adapters, switches, and test tool development projects. He has spent the past 21 years in the server performance organization, focusing primarily on database, virtualization, and new technology performance. His current role includes all aspects of x86 server architecture and performance. Joe holds Bachelor of Science degrees in Electrical Engineering and Engineering Operations from North Carolina State University and a Master of Science degree in Telecommunications from Pace University.

Tristian "Truth" Brown is a Hardware Performance Engineer on the Lenovo Server Performance Team in Raleigh, NC. He is responsible for the hardware analysis of high-performance, flash-based storage solutions for System x servers. Truth earned a bachelor's degree in Electrical Engineering from Tennessee State University and a master's degree in Electrical Engineering from North Carolina State University. His focus areas were Computer Architecture and System-on-Chip (SoC) micro design and validation.

Thanks to the following people for their contributions to this project:

Charles Stephan, Lenovo Server Performance Team
David Watts, Lenovo Press

Notices

Lenovo may not offer the products, services, or features discussed in this document in all countries. Consult your local Lenovo representative for information on the products and services currently available in your area. Any reference to a Lenovo product, program, or service is not intended to state or imply that only that Lenovo product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any Lenovo intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any other product, program, or service.

Lenovo may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

Lenovo (United States), Inc.
1009 Think Place - Building One
Morrisville, NC 27560
U.S.A.
Attention: Lenovo Director of Licensing

LENOVO PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. Lenovo may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

The products described in this document are not intended for use in implantation or other life support applications where malfunction may result in injury or death to persons. The information contained in this document does not affect or change Lenovo product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of Lenovo or third parties. All information contained in this document was obtained in specific environments and is presented as an illustration. The result obtained in other operating environments may vary.

Lenovo may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any references in this publication to non-Lenovo Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this Lenovo product, and use of those Web sites is at your own risk.

Any performance data contained herein was determined in a controlled environment. Therefore, the result obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Copyright Lenovo 2017. All rights reserved.

Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by General Services Administration (GSA) ADP Schedule Contract

This document was created or updated on October 4, 2017.

Send us your comments via the Rate & Provide Feedback form found at http://lenovopress.com/lp0501

Trademarks

Lenovo, the Lenovo logo, and For Those Who Do are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. These and other Lenovo trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by Lenovo at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of Lenovo trademarks is available on the Web at http://www.lenovo.com/legal/copytrade.html.

The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo
Lenovo (logo)
System x

The following terms are trademarks of other companies:

Intel, Xeon, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Other company, product, or service names may be trademarks or service marks of others.