AMD EPYC Empowers Single-Socket Servers

Similar documents
AMD Disaggregates the Server, Defines New Hyperscale Building Block

Top 5 Reasons to Consider

Cisco HyperFlex HX220c M4 Node

Qualcomm Centriq 2400 Server TCO HHVM Web Server Series sponsored by Qualcomm Datacenter Technologies (QDT)

EPYC VIDEO CUG 2018 MAY 2018

OpEx Drivers in the Enterprise

HP ProLiant BladeSystem Gen9 vs Gen8 and G7 Server Blades on Data Warehouse Workloads

Mellanox Virtual Modular Switch

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

Infrastructure Matters: POWER8 vs. Xeon x86

Cisco UCS C200 M2 High-Density Rack-Mount Server

Cisco UCS C250 M2 Extended-Memory Rack-Mount Server

CAPEX Savings and Performance Advantages of BigTwin s Memory Optimization

7 Trends driving the Industry to Software-Defined Servers

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

83951c01.qxd:Layout 1 1/24/07 10:14 PM Page 1 PART. Technology Evolution COPYRIGHTED MATERIAL

Taking Hyper-converged Infrastructure to a New Level of Performance, Efficiency and TCO

Cisco UCS B460 M4 Blade Server

Intel Select Solutions for Professional Visualization with Advantech Servers & Appliances

Building the Ecosystem for ARM Servers

HPE ProLiant ML350 Gen P 16GB-R E208i-a 8SFF 1x800W RPS Solution Server (P04674-S01)

UCS M-Series + Citrix XenApp Optimizing high density XenApp deployment at Scale

IBM System x family brochure

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

About 2CRSI. OCtoPus Solution. Technical Specifications. OCtoPus servers. OCtoPus. OCP Solution by 2CRSI.

HPE ProLiant ML350 Gen10 Server

It s Time to Move Your Critical Data to SSDs Introduction

Find the right platform for your server needs

Cisco UCS B200 M3 Blade Server

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

Cisco UCS C210 M2 General-Purpose Rack-Mount Server

Cisco UCS C250 M2 Extended-Memory Rack-Mount Server

Optimizing Apache Spark with Memory1. July Page 1 of 14

AMD EPYC PRESENTS OPPORTUNITY TO SAVE ON SOFTWARE LICENSING COSTS

IBM Storwize V7000 TCO White Paper:

SAS Technical Update Connectivity Roadmap and MultiLink SAS Initiative Jay Neer Molex Corporation Marty Czekalski Seagate Technology LLC

About 2CRSI. OCtoPus Solution. Technical Specifications. OCtoPus. OCP Solution by 2CRSI.

BUT HOW DID THE CLOUD AS WE KNOW IT COME TO BE AND WHERE IS IT GOING?

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

IBM System x family brochure

EPYC Offers x86 Compatibility

Accelerating Workload Performance with Cisco 16Gb Fibre Channel Deployments

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

Key Considerations for Improving Performance And Virtualization in Microsoft SQL Server Environments

2 to 4 Intel Xeon Processor E v3 Family CPUs. Up to 12 SFF Disk Drives for Appliance Model. Up to 6 TB of Main Memory (with GB LRDIMMs)

JMR ELECTRONICS INC. WHITE PAPER

Sugon TC6600 blade server

Cisco UCS C210 M1 General-Purpose Rack-Mount Server

All-new HP ProLiant ML350p Gen8 Server series

ZeroStack vs. AWS TCO Comparison ZeroStack s private cloud as-a-service offers significant cost advantages over public clouds.

Ultimate Workstation Performance

Cisco UCS C24 M3 Server

Cisco UCS B230 M2 Blade Server

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability

Density Optimized System Enabling Next-Gen Performance

Broadberry. Artificial Intelligence Server for Fraud. Date: Q Application: Artificial Intelligence

TechValidate Survey Report: SaaS Application Trends and Challenges

Huawei KunLun Mission Critical Server. KunLun 9008/9016/9032 Technical Specifications

Data center day. Non-volatile memory. Rob Crooke. August 27, Senior Vice President, General Manager Non-Volatile Memory Solutions Group

Facilitating IP Development for the OpenCAPI Memory Interface Kevin McIlvain, Memory Development Engineer IBM. Join the Conversation #OpenPOWERSummit

Sun Microsystems Product Information

6WINDGate. White Paper. Packet Processing Software for Wireless Infrastructure

Agenda. Sun s x Sun s x86 Strategy. 2. Sun s x86 Product Portfolio. 3. Virtualization < 1 >

Server Success Checklist: 20 Features Your New Hardware Must Have

Expand In-Memory Capacity at a Fraction of the Cost of DRAM: AMD EPYCTM and Ultrastar

Flexible General-Purpose Server Board in a Standard Form Factor

Server-Grade Performance Computer-on-Modules:

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

FUJITSU Server PRIMERGY CX400 M4 Workload-specific power in a modular form factor. 0 Copyright 2018 FUJITSU LIMITED

Cisco HyperFlex HX220c Edge M5

Huawei KunLun Mission Critical Server. KunLun 9008/9016/9032 Technical Specifications

Erkenntnisse aus aktuellen Performance- Messungen mit LS-DYNA

The Use of Cloud Computing Resources in an HPC Environment

Global Headquarters: 5 Speen Street Framingham, MA USA P F

HPE ProLiant ML110 Gen P 8GB-R S100i 4LFF NHP SATA 350W PS DVD Entry Server/TV (P )

HYPER-CONVERGED INFRASTRUCTURE 101: HOW TO GET STARTED. Move Your Business Forward with a Software-Defined Approach

HP Z Turbo Drive G2 PCIe SSD

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Future-Proof Your Hardware Investment PRESENTED BY:

Calxeda : RACK TRUMPS THE CHIP

Four-Socket Server Consolidation Using SQL Server 2008

The Future of Computing: AMD Vision

CyberServe Atom Servers

16GFC Sets The Pace For Storage Networks

System Memory at a Fraction of the DRAM Cost

DDN About Us Solving Large Enterprise and Web Scale Challenges

WHITE PAPER SINGLE & MULTI CORE PERFORMANCE OF AN ERASURE CODING WORKLOAD ON AMD EPYC

FAST FORWARD TO YOUR <NEXT> CREATION

Performance and Energy Efficiency of the 14 th Generation Dell PowerEdge Servers

Meeting the 3G/4G Challenge: Maintaining Profitability in the Mobile Backhaul Through Ethernet

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710

How Architecture Design Can Lower Hyperconverged Infrastructure (HCI) Total Cost of Ownership (TCO)

AMD Opteron Processors In the Cloud

INTEL NEXT GENERATION TECHNOLOGY - POWERING NEW PERFORMANCE LEVELS

New Approach to Unstructured Data

How Microsoft IT Reduced Operating Expenses Using Virtualization

Achieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server

INCREASING DENSITY AND SIMPLIFYING SETUP WITH INTEL PROCESSOR-POWERED DELL POWEREDGE FX2 ARCHITECTURE

NVMe Takes It All, SCSI Has To Fall. Brave New Storage World. Lugano April Alexander Ruebensaal

Transcription:

Whitepaper Sponsored by AMD May 16, 2017 This paper examines AMD EPYC, AMD s upcoming server system-on-chip (SoC). Many IT customers purchase dual-socket (2S) servers to acquire more I/O or memory capacity than what is available on Intel single-socket (1S) server architectures. AMD EPYC s 1S processing and I/O resources have the potential to displace 2S server designs for many workloads. Figure 1: AMD EPYC SoC seated in socket (left), bottom view (right) Sources: TIRIAS Research and AMD History Multi-processor servers have been around for decades. In the 1980s they were built from many boards per processor (CPU), then many single board computers. Single-chip microprocessors enabled server designers to put several processor sockets onto one motherboard. These multisocket motherboards were labeled as n-processor (np) where n is the number of processors on the board, such as 1P, 2P, 4P, etc. Today, a single server socket can host dozens of hardware threads. Multi-socket motherboards are now labeled as n-socket (ns), so a 1S server has one processor socket containing a number of processor cores, and so on. Figure 2: AMD EPYC SoC has twice the hardware threads of 2008-era four-socket system Source: TIRIAS Research TIRIAS RESEARCH ALL RIGHTS RESERVED Page 1

Multi-core processors entered the mainstream server market a decade ago when AMD introduced the AMD Opteron Dual Core processor. Multithreaded processors cores, which have extra CPU resources in each core to run more than one software execution thread at a time, also became popular during the past decade. Multithreaded, multi-core processors caused an industry shift from counting processors on motherboards to counting sockets. Counting sockets eliminates confusion between the number of hardware threads a processor chip can handle and the number of physical chips seated in sockets on a motherboard. Over the same period, server-based parallel-programming practices have stalled at a point where most have eight or fewer software execution threads per application, and few have over 16 threads. This is true for most mainstream commercial applications. The rise of virtualization is due to hardware performance evolving at a faster pace than individual software applications could absorb that increase in hardware performance. In the 2000s, the proliferation of multi-core processors enabled 4S systems to displace the more expensive 8-socket (8S) market as 4S systems became more capable. Then, over the past decade, 2S systems took over many workloads from more expensive 4S systems. 2S systems now define the enterprise-class server mainstream. A byproduct of continuing to cram more cores and threads into each server processor is that a single AMD EPYC SoC socket now contains roughly twice the hardware thread execution resources of a 2008-era dual-threaded, quad-core, 4S server (see Figure 2). That begs the question: why are 2S systems still the dominant server motherboard form factor? Is there potential to move mainstream server workloads to 1S infrastructure? And can 1S server be enterprise class? 2S or 1S? That is the Question There are few workloads that generate over 16 simultaneous threads per schedulable task or process, while most generate no more than eight threads per instance. Most of the workloads that do generate more than 16 threads per process are embarrassingly parallel high performance computing (HPC) workloads and generate orders of magnitude more threads per instance they are better suited to GPU or other offload acceleration than scaling up to higher socket counts. Most workloads, such as business logic running in virtual machines and cloud microservices running in containers, could run as fast and economically on a 1S server as they can on a 2S server, if there was a 1S system available today comparably provisioned to a 2S server with only one socket populated. Intel s server processor feature and pricing segmentation strategy appears designed to preserve its substantial 2S volume and margins. Due to increasing core counts as well as improvements to the underlying core and cache architectures, Intel's Xeon "Haswell" generation introduction in late 2014 increased the number of 2S servers shipped with only one socket populated. Page 2

Figure 3 shows that the number of under $3,000 2S servers shipping with only one socket populated permanently jumped about 20% (from 25% share to over 30% share of the price band) during the Haswell ramp. In addition, the number of $3,000-$6,000 2S servers shipping with only one socket populated jumped about 25% as well (from 27% to about 35% share of that price band). Figure 3: Share of 2S servers shipped with only one socket populated Source: IDC, 2016 AMD's EPYC architecture has the same kind of core count and architectural uplift, so stands a good chance of causing the same kind of effect in the marketplace, with the caveat that AMD is also interested in promoting a native 1S market. Today, over one-third of 2S capable rack optimized servers ship with only one processor populated. See Table 1 for detail on the two highest volume 2S server price bands. Table 1: 2S rack optimized servers shipped with only one processor populated during Q1- Q3 2016 Price Band 2S Rack Share One Processor Shipped $3,000 to $5,999 43% 35% $6,000 to $9,999 30% 39% Source: IDC, 2016 Page 3

Features Many 2S servers are purchased today simply to acquire more and different I/O capacity or memory than what is available on Intel s 1S server architectures. To add significantly more memory and PCIe lanes requires adding the second processor. Those designs use the 2S chipset for more or different I/O (mostly PCIe, NVMe, and/or SATA), not for the second socket s compute resources. Most enterprise and hyperscale server buyers will not trade down from Intel s Xeon E5 enterpriseclass features, such as use of registered DIMMs (RDIMMs), to Xeon E3 s used in 1U servers that have consumer-pc-derived features, such as unbuffered DIMMs (UDIMMs). The price difference between buying one Xeon E3 vs. one Xeon E5, with the potential to upgrade to two Xeon E5 processors, is considered a small cost to pay even though the financial cost is actually very high. In addition, practically no one who buys a 2S server with only one Xeon E5 populated will exercise the option of installing a second processor at a later time 1. Table 2: EPYC competitive landscape Vendor Intel Intel Intel AMD Model Xeon D 1500 Xeon E3-1200 v5 Xeon E5-2600 v4 Epyc Max Sockets 1 1 2 2 Core Generation Broadwell Skylake Broadwell-EP Zen Core Count per Socket 4, 8, 12, 16 4 Up to 22 Up to 32 Thread Count per Socket 8, 16, 12, 32 8 Up to 44 Up to 64 Memory Type DDR4/DDR3L DDR4/DDR3L DDR4/3DS DDR4/3DS Speed (MHz) 2400 2133 2400 2667 R/LRDIMM per Socket (GB) 128 N/A 1536 2048 UDIMM per Socket (GB) 64 64 N/A N/A DIMM Size - Max (GB) 32 16 128 128 Channels 2 2 4 8 DIMMs per Channel 2 2 3 2 PCIe Gen 3 Lanes 24 16 40 Up to 128 PCIe Gen 2 Lanes 8 No No No Integrated SATA3 6 No No Up to 32 Integrated NVMe No No No Up to 32 Integrated USB3/2 4 No No 4 Integrated Ethernet 2 x 10G No No No Requires Chipset No C230 C612 No Source: TIRIAS Research Table 2 shows a side-by-side comparison of AMD EPYC and competitive Intel Xeon products. All the information in Table 3 is for 1S configurations, although both EPYC and Xeon E5 are dual- 1 Anecdotally and as measured by processor shipment data. Page 4

socket capable. The table clearly shows that AMD EPYC is not only pushing well beyond Xeon E3 limitations, but also exceeding Xeon E5 v4 integration and features. For the case where both processor sockets are populated largely for the additional memory capacity and/or PCIe lanes, not only does this add additional capital expense (CAPEX) from increased board area, chassis volume, and the second processor, but there is also ongoing operational expense (OPEX) increases because of additional power consumption of the second processor. Table 3 shows that Intel is pricing the Xeon D 1S SoC at twice the price of a comparable Xeon E3 processor, but the Xeon E3 requires additional southbridge chip, networking, and I/O support. In contrast, Intel is pricing the Xeon E5 processor, which also requires additional southbridge chip, networking, and I/O support, at six times the price of a comparable E3 SKU for a 2S board with only one socket populated, an E5 is required for upgradeability (which, as mentioned above, rarely happens, but is still considered in purchase decisions). With both sockets of a 2S board populated, Intel s list price for dual E5 processors is twelve times (12x) the price of a single E3 processor. Table 3: Intel Xeon model lines list price comparison 2 Intel Xeon Representative Multiple of E3 Approximate List Product Line SKU List Price Price (Mar 2017) E3 E3-1275V5 1 $325 D D-1548 2 $650 E5 E5-2667V4 6 $2,000 E7 E7-8867V4 14 $4,600 Source: TIRIAS Research and https://ark.intel.com/ Upgradeability There are two ways to look at upgradeability: 1. Upgrade the processors in both sockets for clock speed and other architectural improvements. This technique worked well when each new generation of processor added substantial performance via clock rate increases and architectural improvements. Unfortunately, Intel s current x86 processor core architectures are maturing and Moore s Law no longer automatically generates faster clock speeds because of process shrinks. As a result, performance improvements are often within single digits from one generation to another. Servers today are typically left in-service until they fail, until they are not supported, or until they cannot run a specific workload. 2 January 10, 2017 prices are rounded down slightly for ease of comparison. Page 5

2. Start out with only one processor and then add a second to add more capability. Anecdotally, from talking with many cloud and enterprise IT shops, most servers will never have their chassis covers opened to upgrade any part of the server. That is true for 2S servers purchased with only one processor, as well. While adding a second processor was a reasonable bet two decades ago from an applications software point of view, modern applications tend to scale-out nicely by adding more inexpensive servers, rather than buying expensive scale-up servers that are overbuilt for many applications in virtual machine hosting deployments. It is much easier to scale-out to add capacity these days. If operators are going to invest the manpower to open a server rack and pull out a chassis they might as well replace the whole chassis than trying to install an additional CPU, heatsink, and more memory. AMD s New Server Architecture and Design AMD s EPYC SoC is designed to address the latent 1S server demand that Intel has pushed to expensive and unnecessary 2U servers, as well as high-performance 2U server demands. A EPYC SoC can directly connect up to 32 SATA or NVMe devices. Configuration of EPYC s PCIe Gen3 lanes as NVMe, SATA, or 2S socket interconnect is determined when a motherboard is designed. In a 2S configuration, EPYC s system memory capacity doubles to 2/4TB of memory (RDIMM/LRDIMM), but system I/O is identical to a 1S solution half of the I/O lanes are used for high-speed links between the two SoC sockets. Figure 4: AMD EPYC 1S development board (left) and 2S pre-production board (right) Source: TIRIAS Research Page 6

Economics There are two economic components to server ownership: CAPEX and OPEX. CAPEX is the buyer s price for a complete server, while OPEX is dominated by the server s power consumption and management overhead costs. Generalizing a comparison between 1S and 2S economics is somewhat fuzzy. The features that dominate the comparison are listed in Table 4. Table 4: CAPEX and OPEX contributions to differences in 1S and 2S server ownership CAPEX OPEX Processor Pricing Power consumption Motherboard Pricing Power consumption Power supply Pricing Data center density Opportunity cost Source: TIRIAS Research There is also an OPEX software licensing component, but that is extremely difficult to generalize across a wide range of applications. We estimate $200 to $300 increase in pricing between typical 1S and 2S motherboards (due to additional components, increased board costs, etc.) close to the difference in single processor pricing between Xeon E3 and Xeon E5. This price difference is nominal for a larger and more complex board design, and for support components (power regulators, socket, etc.) required for the second processor or SoC socket. Additionally, a higher capacity, and therefore more expensive power supply, is used in 2S designs to power the additional processor and support components, even if the second processor socket is never populated. While the additional 2S motherboard components figure into an increase in power consumption, here again, processor power consumption dwarfs the rest of the motherboard components. Intel s 1S Xeon E3 series power consumption specifications range from 25W to 80W, and Xeon D power consumption specifications range from 45W to 60W. Intel s 2S Xeon E5 series power consumption ranges from 50W to 145W. 3 For comparison, DIMM slots are configured for 7.5W power consumption, so four DIMMs may consume up to 30W and eight may consume up to 60W. 3 Specifications for Intel products were obtained from https://ark.intel.com/ Page 7

However, the motherboard price differential is dwarfed by today s approximately $600 to over $1000 price difference moving from an Intel 1S Xeon E3 series processor to a 2S Xeon E5 series processor. EPYC SoCs that fit in a 2S system will also fit into 1S systems without diluting per socket capabilities or functionality at any design point (see Table 2). Memory and I/O Scaling Each EPYC SoC will have the same memory bandwidth and capacity, whether in a 1S or 2S design. This will dramatically impact system designers ability to create performant 1S systems, as Intel segments their 1S and 2S product lines by restricting 1S memory bandwidth and capacity, which has the effect of upselling customers to 2S designs. Scaling EPYC from 1S to 2S does consume some I/O lanes. A EPYC 2S design uses half of each SoC s high speed I/O lanes to connect to the other SoC socket. But both the 1S and 2S EPYC motherboards support the same number of PCIe lanes. From Table 2, note the limited memory capacity of both Xeon D and Xeon E3, and that Xeon E3 does not support server standard RDIMMs. Conclusion AMD s EPYC architecture will enable uniquely capable 1S server solutions. TIRIAS Research believes that economics, coupled with memory and I/O scaling, are sufficient justification for data center customers to adopt 1S designs in volume, as AMD s EPYC SoC credibly addresses the majority of 1S architectural and economic objections. In other words, we believe that buying overprovisioned 2S servers will be throwing money away once EPYC 1S servers are generally available. This is the same kind of transition the server market experienced after AMD shipped the dual-core AMD Opteron and Intel responded in-kind, which caused 8S and higher system sales to collapse into the 4S segment. Since then, public cloud providers have almost completely ignored 4S servers in favor of 2S configurations in their metalas-a-service (MaaS), infrastructure-as-a-service (IaaS), and platform-as-a-service (PaaS) product offerings. From a software development perspective, developing for EPYC 1S solutions is no different than developing for any other server processor. Plus, customers can also design top performing 2S solutions with EPYC and they can use the exact same EPYC SKUs to do so. From a customer perspective, each server should support a baseline level of performance for a given application (number of users, number of transactions, throughput, etc.). It should not matter if a server is 1S or 2S, as long as it meets or beats that baseline level of performance with enterprise- Page 8

class features. The challenge has been that, until EPYC, all things had not been equal in the supply chain. EPYC s high core count and robust features should enable AMD to convert what could be a simple increase in 2S servers shipped with only one processor into an expanding market for purpose-built 1S server designs. AMD designed EPYC to address enterprise-class server deployments. We look forward to EPYC performance metrics and TCO analysis as its launch date approaches to further quantify its 1S advantages over Intel s Xeon processors. Page 9

Edited on February 8, 2018 to correct AMD EPYC branding. Copyright TIRIAS Research LLC 2017. All rights reserved. Reproduction in whole or in part is prohibited without written permission from TIRIAS Research LLC. This report is the property of TIRIAS Research LLC and is made available only upon these terms and conditions. The contents of this report represent the interpretation and analysis of statistics and information that is either generally available to the public or released by responsible agencies or individuals. The information contained in this report is believed to be reliable but is not guaranteed as to its accuracy or completeness. TIRIAS Research LLC reserves all rights herein. Reproduction or disclosure in whole or in part is permitted only with the written and express consent of TIRIAS Research LLC. Page 10