NVMe Performance Testing and Optimization Application Note
NVMe Performance Testing and Optimization Application Note
Publication # 56163 Revision: 0.72 Issue Date: December 2017
Advanced Micro Devices
© 2017 Advanced Micro Devices, Inc. All rights reserved.

The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.

Trademarks

AMD, the AMD Arrow logo, AMD EPYC, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Linux is a registered trademark of Linus Torvalds. PCI and PCIe are registered trademarks of PCI SIG.
56163 Rev December 2017 NVMe Performance Testing and Optimization

Contents

Introduction
AMD EPYC Processor Architecture
System Optimizations
FIO CPU Pinning
Test System
Test Setup
Results
  100% Read Test
  70% Read 30% Write Test
  30% Read 70% Write Test
  100% Write Test
Summary
List of Figures

Figure 1. AMD EPYC Processor Architecture
Figure 2. 100% Read IOPs
Figure 3. 100% Read Bandwidth
Figure 4. 100% Read CPU Utilization
Figure 5. 70% Read 30% Write IOPs
Figure 6. 70% Read 30% Write Bandwidth
Figure 7. 70% Read 30% Write CPU Utilization
Figure 8. 30% Read 70% Write IOPs
Figure 9. 30% Read 70% Write Bandwidth
Figure 10. 30% Read 70% Write CPU Utilization
Figure 11. 100% Write IOPs
Figure 12. 100% Write Bandwidth
Figure 13. 100% Write CPU Utilization
Revision History

Date        Revision   Description
December               Initial public release; updated line a. in the System Optimizations section.
September              Updated the FIO CPU Pinning section; updated the legends for Figures 4, 5, and 7.
August                 Initial NDA release.
Introduction

The AMD EPYC processor has more PCIe lanes and NUMA nodes than a traditional processor, which can adversely affect synthetic IO test results on an untuned system. Some optimizations are needed to achieve maximum performance in synthetic IO testing. This application note discusses the EPYC architecture and how to optimize IO for it.

AMD EPYC Processor Architecture

The AMD EPYC processor is functionally different from any other CPU on the market. The processor uses four dies to create a single CPU. Each die contains two Core Compute Complexes (CCXs), and each CCX has four Zen cores that share a single L3 cache. The test system uses the AMD EPYC 7601 processor, which has 8 physical cores per die and therefore 32 cores in total. Communication between the dies is handled by the Infinity Fabric, a low-latency fabric that manages inter-die and inter-CCX communication. The AMD EPYC processor architecture provides enhanced performance, core count, and PCIe connectivity over traditional CPU architectures.

When doing synthetic disk testing, it is best to pin the IO to the die associated with the device under test. Linux sees each die as a NUMA node, and each IO device is attached to one of these nodes. The optimizations in this paper can be applied to SATA, SAS, and NVMe drives.
Figure 1. AMD EPYC Processor Architecture

Figure 1 shows the separate NUMA nodes with their associated dies and their direct internal connectivity to the SATA, NVMe, and PCIe devices installed on this test system. For example, PCI device 144d:a822 is a Samsung NVMe drive, of which 22 are connected to PCIe root complexes in the platform. The testing below focuses on a single one of these drives, attached to die 0.

Keeping the IO local to the die that is associated with the SATA drive, NVMe device, or PCIe device minimizes the use of memory attached to other dies. The Infinity Fabric provides high-speed connectivity between the dies, but it is not as fast as die-local IO.
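To find which NUMA node (die) a drive is attached to, and which CPUs are local to it, the Linux sysfs tree can be read directly. The following is a minimal sketch, not part of the original note. It assumes the controller is named nvme0 and that the kernel exposes the usual numa_node and cpulist files; the function names and the sysfs_root parameter are illustrative only.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,64-71' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def local_cpus_for_nvme(controller="nvme0", sysfs_root="/sys"):
    """Return (numa_node, cpus) for an NVMe controller, read from sysfs.

    The controller name and sysfs_root are assumptions for illustration;
    sysfs_root exists mainly so the function can be tried on a copy of the tree.
    """
    root = Path(sysfs_root)
    node_file = root / "class" / "nvme" / controller / "device" / "numa_node"
    node = int(node_file.read_text().strip())
    if node < 0:
        # The kernel reports -1 when it has no NUMA affinity for the device.
        return node, []
    cpulist = root / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    return node, parse_cpulist(cpulist.read_text())


if __name__ == "__main__":
    try:
        node, cpus = local_cpus_for_nvme()
        print(f"nvme0 is local to NUMA node {node}; CPUs: {cpus}")
    except (OSError, ValueError) as exc:
        print(f"could not read NUMA information from sysfs: {exc}")
```

The CPU numbers returned for the drive's node are the candidates for pinning the IO workload.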
System Optimizations

The system optimizations that should be performed for synthetic disk testing are the standard optimizations for Linux.

a. Load the latest kernel, which provides patches that optimize IO.
b. Change the IO scheduler to NOOP. Edit the GRUB configuration (on Ubuntu, /etc/default/grub), add elevator=noop to the GRUB_CMDLINE_LINUX_DEFAULT line, and then run update-grub.
c. Set the CPU governor to performance. Run this command from a prompt: cpufreq-set -c 0 -g performance. The -c 0 option applies the setting to CPU 0 only, so repeat the command for each CPU used in the test.
d. If the cpufreq-set command is not available, install the cpufrequtils package.

As described previously, the AMD EPYC processor architecture is fundamentally different from prior CPU architectures. AMD has worked with the open source community to provide updates to the Linux kernel so that it uses the EPYC CPU to its full capabilities. A large amount of work has been done on the IRQBALANCE service, specifically around optimizations for data locality and core count. The latest version can be found at: cd84bd93

If the IO load is expected to be extremely high, it is best to pin the workload to the CPU cores on the die to which the IO device is connected. A simple way to see this connectivity is to install the HWLOC package on Linux, which contains LSTOPO. LSTOPO is a command that shows PCIe connectivity and visualizes the NUMA nodes installed on the system. The following is the command and its output for the example platform. Only a single NUMA node is shown to simplify the results.

    root@nvmetestsys1:~# lstopo-no-graphics
    Machine (252GB total)
      NUMANode L#0 (P#0 63GB)
        Package L#0
          L3 L#0 (8192KB)
            L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0
              PU L#0 (P#0)
              PU L#1 (P#1)
            L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1
              PU L#2 (P#2)
              PU L#3 (P#3)
            L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2
              PU L#4 (P#4)
              PU L#5 (P#5)
            L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3
              PU L#6 (P#6)
              PU L#7 (P#7)
          L3 L#1 (8192KB)
            L2 L#4 (512KB) + L1d L#4 (32KB) + L1i L#4 (64KB) + Core L#4
              PU L#8 (P#8)
              PU L#9 (P#9)
            L2 L#5 (512KB) + L1d L#5 (32KB) + L1i L#5 (64KB) + Core L#5
              PU L#10 (P#10)
              PU L#11 (P#11)
            L2 L#6 (512KB) + L1d L#6 (32KB) + L1i L#6 (64KB) + Core L#6
              PU L#12 (P#12)
              PU L#13 (P#13)
            L2 L#7 (512KB) + L1d L#7 (32KB) + L1i L#7 (64KB) + Core L#7
              PU L#14 (P#14)
              PU L#15 (P#15)
        HostBridge L#0
          PCIBridge
            PCI 144d:a822
          PCIBridge
            PCI 144d:a822
          PCIBridge
            PCIBridge
              PCI 1a03:2000
                GPU L#0 "card0"
                GPU L#1 "controld64"
          PCIBridge
            PCI 144d:a822
          PCIBridge
            PCI 144d:a822
          PCIBridge
            PCI 144d:a822
          PCIBridge
            PCI 144d:a822
          PCIBridge
            PCI 1022:7901
              Block(Disk) L#2 "sda"
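Before running a test, it can be worth confirming that the scheduler and governor settings from the System Optimizations list actually took effect. The sketch below is an illustration rather than part of the original note. It assumes the drive is nvme0n1 and that the kernel exposes the standard queue/scheduler and cpufreq/scaling_governor files in sysfs; the function names are hypothetical. On multi-queue kernels the no-op choice is reported as none rather than noop.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the active scheduler from a queue/scheduler string.

    The kernel marks the active choice with brackets, for example
    'noop deadline [cfq]'. Some kernels print a single unbracketed word.
    """
    tokens = text.split()
    for token in tokens:
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    if len(tokens) == 1:
        return tokens[0]
    return None


def check_tuning(block_dev="nvme0n1", cpu=0, sysfs_root="/sys"):
    """Read the current IO scheduler and CPU governor (names are assumptions)."""
    root = Path(sysfs_root)
    sched_file = root / "block" / block_dev / "queue" / "scheduler"
    gov_file = (root / "devices" / "system" / "cpu" / f"cpu{cpu}"
                / "cpufreq" / "scaling_governor")
    return {
        "scheduler": active_scheduler(sched_file.read_text()),
        "governor": gov_file.read_text().strip(),
    }


if __name__ == "__main__":
    try:
        print(check_tuning())
    except OSError as exc:
        print(f"could not read tuning state from sysfs: {exc}")
```

A result other than noop (or none) for the scheduler, or other than performance for the governor, suggests that the corresponding step was not applied to that device or CPU.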
FIO CPU Pinning

FIO supports CPU pinning within the FIO workload file. An example is as follows:

    [global]
    name=4k random read 4 ios in the queue in 32 queues
    ioengine=libaio
    direct=1
    readwrite=randrw
    rwmixread=70
    iodepth=64
    buffered=0
    size=100%
    runtime=30
    time_based
    randrepeat=0
    norandommap
    refill_buffers
    ramp_time=10

    [job1]
    filename=/dev/nvme0n1
    bs=4k
    cpus_allowed=0

    [job2]
    filename=/dev/nvme0n1
    bs=4k
    cpus_allowed=2

    [job3]
    filename=/dev/nvme0n1
    bs=4k
    cpus_allowed=4

Job 1 is pinned to CPU 0 and sends IO to nvme0n1, which is directly attached to the die that contains CPU 0. To verify that pinning affects synthetic disk benchmark performance, we tested the four corners of disk IO to ensure there was an improvement.

Test System

System: HPE CL
Memory: GB of installed memory
CPU: AMD EPYC 7601 processor
NVMe Drive: Samsung PM1725a
OS: Ubuntu
Optimizations to the OS:
- IO scheduler set to NOOP
- CPU governor set to performance
- Latest IRQBALANCE patches

Test Setup

The single-drive test was set up using FIO and focused on four scenarios:

1. 100% Read, pinned vs. unpinned IO
2. 70/30% Read/Write, pinned vs. unpinned IO
3. 30/70% Read/Write, pinned vs. unpinned IO
4. 100% Write, pinned vs. unpinned IO

The test was set up to verify that these optimizations improve the performance of synthetic disk benchmarking.

Results

100% Read Test

Figures 2, 3, and 4 show the results of the 100% Read test.
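The four scenarios from the Test Setup section, each in a pinned and an unpinned variant, could be generated from one template instead of being edited by hand. The following is a rough sketch under stated assumptions: build_job_file and MIXES are hypothetical names, the option values are copied from the example job file in the FIO CPU Pinning section, and the unpinned variant is assumed to be the same jobs with the cpus_allowed line left out. The note does not say how the unpinned runs were actually configured.

```python
# Read percentages for the four scenarios; rwmixread=0 means 100% write.
MIXES = {
    "100% read": 100,
    "70/30 read/write": 70,
    "30/70 read/write": 30,
    "100% write": 0,
}


def build_job_file(rwmixread, cpus, pinned=True,
                   device="/dev/nvme0n1", block_size="4k"):
    """Return fio job-file text with one job per entry in cpus.

    When pinned is False the jobs are identical but carry no cpus_allowed
    line, so the scheduler is free to place them on any CPU.
    """
    lines = [
        "[global]",
        "ioengine=libaio",
        "direct=1",
        "readwrite=randrw",
        f"rwmixread={rwmixread}",
        "iodepth=64",
        "runtime=30",
        "time_based",
        "ramp_time=10",
    ]
    for index, cpu in enumerate(cpus, start=1):
        lines.append("")
        lines.append(f"[job{index}]")
        lines.append(f"filename={device}")
        lines.append(f"bs={block_size}")
        if pinned:
            lines.append(f"cpus_allowed={cpu}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # CPUs 0, 2 and 4 match the example job file; adjust to the drive's die.
    for label, mix in MIXES.items():
        for pinned in (True, False):
            variant = "pinned" if pinned else "unpinned"
            print(f"# {label}, {variant}")
            print(build_job_file(mix, [0, 2, 4], pinned=pinned))
```

Generating both variants from the same function keeps every other option identical, so any difference in the results can be attributed to pinning alone.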
Figure 2. 100% Read IOPs (series: Read IOPs Pinned, Read IOPs Unpinned)

Figure 3. 100% Read Bandwidth (series: BW Pinned, BW Unpinned)
Figure 4. 100% Read CPU Utilization (series: USR CPU Pinned, USR CPU Unpinned, SYS CPU Pinned, SYS CPU Unpinned)

The 100% Read test shows a significant improvement for pinned IO over unpinned IO. The CPU spent less time in SYS space, which means the kernel handled the IO more efficiently. As a result, the IOPs numbers went up and the drive performed at its full capabilities.
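A comparison like the one above can be computed from FIO's JSON output instead of being read off a chart. The sketch below is illustrative only: summarize and percent_change are hypothetical helper names, and it assumes each run was saved with fio's --output-format=json option, where every entry under "jobs" carries read and write IOPs plus usr_cpu and sys_cpu percentages. No measurements from this note are reproduced in it.

```python
import json


def summarize(fio_result):
    """Collapse one parsed fio JSON result into totals and averages."""
    jobs = fio_result["jobs"]
    return {
        "read_iops": sum(job["read"]["iops"] for job in jobs),
        "write_iops": sum(job["write"]["iops"] for job in jobs),
        "usr_cpu": sum(job["usr_cpu"] for job in jobs) / len(jobs),
        "sys_cpu": sum(job["sys_cpu"] for job in jobs) / len(jobs),
    }


def percent_change(pinned, unpinned):
    """Percent change of each pinned metric relative to the unpinned run.

    A metric whose unpinned value is zero (for example write IOPs in a
    100% read test) is reported as None rather than dividing by zero.
    """
    change = {}
    for key, base in unpinned.items():
        if base:
            change[key] = 100.0 * (pinned[key] - base) / base
        else:
            change[key] = None
    return change


def compare_files(pinned_path, unpinned_path):
    """Load two fio JSON result files and return the percent change."""
    with open(pinned_path) as f:
        pinned = summarize(json.load(f))
    with open(unpinned_path) as f:
        unpinned = summarize(json.load(f))
    return percent_change(pinned, unpinned)
```

For example, running fio with --output-format=json --output=pinned.json for the pinned job file, and likewise for the unpinned one, produces two files that compare_files can read. A positive read_iops change together with a negative sys_cpu change would match the behavior described for the 100% Read test.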
70% Read 30% Write Test

Figures 5, 6, and 7 show the results of the 70% Read 30% Write test.

Figure 5. 70% Read 30% Write IOPs (series: Read IOPs Pinned, Read IOPs Unpinned, Write IOPs Pinned, Write IOPs Unpinned)
Figure 6. 70% Read 30% Write Bandwidth (series: Bandwidth Pinned, Bandwidth Unpinned)
Figure 7. 70% Read 30% Write CPU Utilization (series: USR CPU Pinned, USR CPU Unpinned, SYS CPU Pinned, SYS CPU Unpinned)

From an IOPs perspective, the mixed read/write tests are affected less by pinning. The IOPs numbers were slightly lower with pinned IO, but the CPU performed more efficiently in this workload than it did unpinned, as the CPU Utilization graph shows.
30% Read 70% Write Test

Figures 8, 9, and 10 show the results of the 30% Read 70% Write test.

Figure 8. 30% Read 70% Write IOPs (series: Read IOPs Pinned, Read IOPs Unpinned, Write IOPs Pinned, Write IOPs Unpinned)
Figure 9. 30% Read 70% Write Bandwidth (series: BW Pinned, BW Unpinned)

Figure 10. 30% Read 70% Write CPU Utilization (series: USR CPU Pinned, USR CPU Unpinned, SYS CPU Pinned, SYS CPU Unpinned)
As in the 70% Read 30% Write test, pinning has less effect on IOPs in this mixed workload. The IOPs numbers were slightly lower with pinned IO, but the CPU performed more efficiently than it did unpinned, as the CPU Utilization graph shows.

100% Write Test

Figures 11, 12, and 13 show the results of the 100% Write test.

Figure 11. 100% Write IOPs (series: Write IOPs Pinned, Write IOPs Unpinned)
Figure 12. 100% Write Bandwidth (series: BW Pinned, BW Unpinned)

Figure 13. 100% Write CPU Utilization (series: USR CPU Pinned, USR CPU Unpinned, SYS CPU Pinned, SYS CPU Unpinned)
Pinned IO performed better than unpinned IO in this synthetic test.

Summary

The optimizations increase performance and allow the CPU to perform at its potential. The AMD EPYC processor performs better when the IO is localized to the die to which the device is attached. For synthetic testing, the best options are to run the latest IRQBALANCE patches and to pin the IO to the local CPUs.
More informationAMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling
AMD Radeon Open Compute Platform Felix Kuehling ROCM PLATFORM ON LINUX Compiler Front End AMDGPU Driver Enabled with ROCm GCN Assembly Device LLVM Compiler (GCN) LLVM Opt Passes GCN Target Host LLVM Compiler
More informationRun Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008
Run Anywhere The Hardware Platform Perspective Ben Pollan, AMD Java Labs October 28, 2008 Agenda Java Labs Introduction Community Collaboration Performance Optimization Recommendations Leveraging the Latest
More informationMaximizing Six-Core AMD Opteron Processor Performance with RHEL
Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1 Agenda Six-Core AMD Opteron processor
More informationSCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL
SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL Matthias Bach and David Rohr Frankfurt Institute for Advanced Studies Goethe University of Frankfurt I: INTRODUCTION 3 Scaling
More informationPerformance Prediction and Optimization using Linux/cgroups
Linux Con JAPAN 0(Yokohama) June st 0 Performance Prediction and Optimization using Linux/cgroups Yuzuru Maya Hitachi, Ltd., Yokohama Research Laboratory Agenda Background Outline of Linux/Cgroups Performance
More informationAccelerate Finger Printing in Data Deduplication Xiaodong Liu & Qihua Dai Intel Corporation
Accelerate Finger Printing in Data Deduplication Xiaodong Liu & Qihua Dai Intel Corporation Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS
More informationUser Guide. Storage Executive. Introduction. Storage Executive User Guide. Introduction
Introduction User Guide Storage Executive Introduction This guide describes how to install and use Storage Executive to monitor and manage Micron solid state drives (SSDs). Storage Executive provides the
More informationTHE STORAGE PERFORMANCE DEVELOPMENT KIT AND NVME-OF
14th ANNUAL WORKSHOP 2018 THE STORAGE PERFORMANCE DEVELOPMENT KIT AND NVME-OF Paul Luse Intel Corporation Apr 2018 AGENDA Storage Performance Development Kit What is SPDK? The SPDK Community Why are so
More informationPROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017
PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017 BACKGROUND-- HARDWARE MEMORY ENCRYPTION AMD Secure Memory Encryption (SME) / AMD Secure Encrypted Virtualization (SEV) Hardware AES engine
More informationRavindra Babu Ganapathi
14 th ANNUAL WORKSHOP 2018 INTEL OMNI-PATH ARCHITECTURE AND NVIDIA GPU SUPPORT Ravindra Babu Ganapathi Intel Corporation [ April, 2018 ] Intel MPI Open MPI MVAPICH2 IBM Platform MPI SHMEM Intel MPI Open
More informationJim Harris Principal Software Engineer Intel Data Center Group
Jim Harris Principal Software Engineer Intel Data Center Group Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
More informationIntroducing NVDIMM-X: Designed to be the World s Fastest NAND-Based SSD Architecture and a Platform for the Next Generation of New Media SSDs
, Inc. Introducing NVDIMM-X: Designed to be the World s Fastest NAND-Based SSD Architecture and a Platform for the Next Generation of New Media SSDs Doug Finke Director of Product Marketing September 2016
More informationDeveloping Extremely Low-Latency NVMe SSDs
Developing Extremely Low-Latency NVMe SSDs Young Paik Director of Product Planning Samsung Electronics Santa Clara, CA 1 Disclaimer This presentation and/or accompanying oral statements by Samsung representatives
More informationAVR42789: Writing to Flash on the New tinyavr Platform Using Assembly
AVR 8-bit Microcontrollers AVR42789: Writing to Flash on the New tinyavr Platform Using Assembly APPLICATION NOTE Table of Contents 1. What has Changed...3 1.1. What This Means and How to Adapt...4 2.
More informationIdentifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage
Identifying Performance Bottlenecks with Real- World Applications and Flash-Based Storage TechTarget Dennis Martin 1 Agenda About Demartek Enterprise Data Center Environments Storage Performance Metrics
More informationOPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER
OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER Budirijanto Purnomo AMD Technical Lead, GPU Compute Tools PRESENTATION OVERVIEW Motivation AMD APP Profiler
More informationMEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE
MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE VIGNESH ADHINARAYANAN, INDRANI PAUL, JOSEPH L. GREATHOUSE, WEI HUANG, ASHUTOSH PATTNAIK, WU-CHUN FENG POWER AND ENERGY ARE FIRST-CLASS
More informationCES TECH DAY JIM ANDERSON. SVP and GM, Computing and Graphics Business Group
CES TECH DAY JIM ANDERSON SVP and GM, Computing and Graphics Business Group CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including
More informationSPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation
SPDK China Summit 2018 Ziye Yang Senior Software Engineer Network Platforms Group, Intel Corporation Agenda SPDK programming framework Accelerated NVMe-oF via SPDK Conclusion 2 Agenda SPDK programming
More informationRelease Notes Compute Abstraction Layer (CAL) Stream Computing SDK New Features. 2 Resolved Issues. 3 Known Issues. 3.
Release Notes Compute Abstraction Layer (CAL) Stream Computing SDK 1.4 1 New Features 2 Resolved Issues 3 Known Issues 3.1 Link Issues Support for bilinear texture sampling. Support for FETCH4. Rebranded
More informationRadeon Pro Software: Radeon Pro ReLive. User Guide v3.0
Radeon Pro Software: Radeon Pro ReLive User Guide v3.0 This guide will detail how to use Radeon Pro ReLive to capture high quality desktop videos and screenshots for your professional needs. DISCLAIMER
More informationIntel Cache Acceleration Software (Intel CAS) for Linux* v2.9 (GA)
Intel Cache Acceleration Software (Intel CAS) for Linux* v2.9 (GA) Release Notes June 2015 Revision 010 Document Number: 328497-010 Notice: This document contains information on products in the design
More informationGPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011
GPGPU COMPUTE ON AMD Udeepta Bordoloi April 6, 2011 WHY USE GPU COMPUTE CPU: scalar processing + Latency + Optimized for sequential and branching algorithms + Runs existing applications very well - Throughput
More informationAMD EPYC Delivers Linear Scalability for Docker with Bare-Metal Performance
Solution Brief February, 2019 AMD EPYC Delivers Linear Scalability for Docker with Bare-Metal Performance The AMD EPYC SoC brings a new balance to the datacenter. Utilizing x86 architecture, the AMD EPYC
More informationCeph in a Flash. Micron s Adventures in All-Flash Ceph Storage. Ryan Meredith & Brad Spiers, Micron Principal Solutions Engineer and Architect
Ceph in a Flash Micron s Adventures in All-Flash Ceph Storage Ryan Meredith & Brad Spiers, Micron Principal Solutions Engineer and Architect 217 Micron Technology, Inc. All rights reserved. Information,
More information6th Generation Intel Core Processor Series
6th Generation Intel Core Processor Series Application Power Guidelines Addendum Supporting the 6th Generation Intel Core Processor Series Based on the S-Processor Lines August 2015 Document Number: 332854-001US
More informationADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer
ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session 2117 Olivier Zegdoun AMD Sr. Software Engineer CONTENTS Rendering Effects Before Fusion: single discrete GPU case Before Fusion: multiple discrete
More informationIntel Cluster Ready Allowed Hardware Variances
Intel Cluster Ready Allowed Hardware Variances Solution designs are certified as Intel Cluster Ready with an exact bill of materials for the hardware and the software stack. When instances of the certified
More information