Maximizing Six-Core AMD Opteron Processor Performance with RHEL

Similar documents
HyperTransport Technology

Six-Core AMD Opteron Processor

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to


AMD Graphics Team Last Updated February 11, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview February 2013 Approved for public distribution

EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON GPUS USING THE CSR STORAGE FORMAT

FLASH MEMORY SUMMIT Adoption of Caching & Hybrid Solutions

AMD IOMMU VERSION 2 How KVM will use it. Jörg Rödel August 16th, 2011

AMD APU and Processor Comparisons. AMD Client Desktop Feb 2013 AMD

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

AMD Graphics Team Last Updated April 29, 2013 APPROVED FOR PUBLIC DISTRIBUTION. 1 3DMark Overview April 2013 Approved for public distribution

CAUTIONARY STATEMENT This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

CAUTIONARY STATEMENT 1 AMD NEXT HORIZON NOVEMBER 6, 2018

Multi-core processors are here, but how do you resolve data bottlenecks in native code?

Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008

AMD RYZEN PROCESSOR WITH RADEON VEGA GRAPHICS CORPORATE BRAND GUIDELINES

Understanding GPGPU Vector Register File Usage

Panel Discussion: The Future of I/O From a CPU Architecture Perspective

KVM CPU MODEL IN SYSCALL EMULATION MODE ALEXANDRU DUTU, JOHN SLICE JUNE 14, 2015

Quad-core Press Briefing First Quarter Update

DR. LISA SU

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016

SIMULATOR AMD RESEARCH JUNE 14, 2015

The Road to the AMD. Fiji GPU. Featuring Die Stacking and HBM Technology 1 THE ROAD TO THE AMD FIJI GPU ECTC 2016 MAY 2015

オープンソ プンソース技術者のための AMD 最新テクノロジーアップデート 日本 AMD 株式会社 マーケティング ビジネス開発本部 エンタープライズプロダクトマーケティング部 山野 洋幸

ADVANCED RENDERING EFFECTS USING OPENCL TM AND APU Session Olivier Zegdoun AMD Sr. Software Engineer

OPENCL TM APPLICATION ANALYSIS AND OPTIMIZATION MADE EASY WITH AMD APP PROFILER AND KERNELANALYZER

FUSION PROCESSORS AND HPC

STREAMING VIDEO DATA INTO 3D APPLICATIONS Session Christopher Mayer AMD Sr. Software Engineer

HPCA 18. Reliability-aware Data Placement for Heterogeneous memory Architecture

Resource Saving: Latest Innovation in Optimized Cloud Infrastructure

EPYC VIDEO CUG 2018 MAY 2018

Introducing NVDIMM-X: Designed to be the World s Fastest NAND-Based SSD Architecture and a Platform for the Next Generation of New Media SSDs

Sun Fire X4170 M2 Server Frequently Asked Questions

PROTECTING VM REGISTER STATE WITH AMD SEV-ES DAVID KAPLAN LSS 2017

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

INTERFERENCE FROM GPU SYSTEM SERVICE REQUESTS

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

Designing Natural Interfaces

SOLUTION TO SHADER RECOMPILES IN RADEONSI SEPTEMBER 2015

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD EPYC CORPORATE BRAND GUIDELINES

Use cases. Faces tagging in photo and video, enabling: sharing media editing automatic media mashuping entertaining Augmented reality Games

ROCm: An open platform for GPU computing exploration

MEASURING AND MODELING ON-CHIP INTERCONNECT POWER ON REAL HARDWARE

AMD HD3D Technology. Setup Guide. 1 AMD HD3D TECHNOLOGY: Setup Guide

ABySS Performance Benchmark and Profiling. May 2010

THE PROGRAMMER S GUIDE TO THE APU GALAXY. Phil Rogers, Corporate Fellow AMD

Desktop Telepresence Arrived! Sudha Valluru ViVu CEO

Family 15h Models 00h-0Fh AMD Opteron Processor Product Data Sheet

AMD SEV Update Linux Security Summit David Kaplan, Security Architect

Automatic Intra-Application Load Balancing for Heterogeneous Systems

SCALING DGEMM TO MULTIPLE CAYMAN GPUS AND INTERLAGOS MANY-CORE CPUS FOR HPL

ACCELERATING MATRIX PROCESSING WITH GPUs. Nicholas Malaya, Shuai Che, Joseph Greathouse, Rene van Oostrum, and Michael Schulte AMD Research

3D Numerical Analysis of Two-Phase Immersion Cooling for Electronic Components

viewdle! - machine vision experts

Linux Network Tuning Guide for AMD EPYC Processor Based Servers

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

Virtual Leverage: Server Consolidation in Open Source Environments. Margaret Lewis Commercial Software Strategist AMD

Generic System Calls for GPUs

Newest generation of HP ProLiant DL380 takes #1 position overall on Oracle E-Business Suite Small Model Benchmark

Pattern-based analytics to estimate and track yield risk of designs down to 7nm

Memory Population Guidelines for AMD EPYC Processors

HPG 2011 HIGH PERFORMANCE GRAPHICS HOT 3D

Dell PowerEdge R920 System Powers High Performing SQL Server Databases and Consolidates Databases

Gestural and Cinematic Interfaces - DX11. David Brebner Unlimited Realities CTO

Accelerating Applications. the art of maximum performance computing James Spooner Maxeler VP of Acceleration

Family 15h Models 00h-0Fh AMD FX -Series Processor Product Data Sheet

Heterogeneous Computing

Vertical Scaling of Oracle 10g Performance on Red Hat

Consolidating Microsoft SQL Server databases on PowerEdge R930 server

AMD EPYC Processors Showcase High Performance for Network Function Virtualization (NFV)

Fusion Enabled Image Processing

Microsoft Windows 2016 Mellanox 100GbE NIC Tuning Guide

AMD RYZEN CORPORATE BRAND GUIDELINES

СОВРЕМЕННЫЕ СЕРВЕРНЫЕ ПЛАТФОРМЫ НА БАЗЕ AMD OPTERON AMD OPTERON 6300 SERIES PROCESSOR, CODENAMED ABU DHABI

MM5 Modeling System Performance Research and Profiling. March 2009

AMD AIB Partner Guidelines. Version February, 2015

Broadcast-Quality, High-Density HEVC Encoding with AMD EPYC Processors

Aster Database Platform/OS Support Matrix, version 6.00

clarmor: A DYNAMIC BUFFER OVERFLOW DETECTOR FOR OPENCL KERNELS CHRIS ERB, JOE GREATHOUSE, MAY 16, 2018

NUMA Topology for AMD EPYC Naples Family Processors

Aster Database Platform/OS Support Matrix, version 5.0.2

EXPLOITING ACCELERATOR-BASED HPC FOR ARMY APPLICATIONS

Reference Architecture

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

GPGPU COMPUTE ON AMD. Udeepta Bordoloi April 6, 2011

Sequential Consistency for Heterogeneous-Race-Free

HIGHLY PARALLEL COMPUTING IN PHYSICS-BASED RENDERING OpenCL Raytracing Based. Thibaut PRADOS OPTIS Real-Time & Virtual Reality Manager

Performance Tuning Guidelines for Low Latency Response on AMD EPYC -Based Servers Application Note

Performance and Energy Efficiency of the 14 th Generation Dell PowerEdge Servers

NEXT-GENERATION MATRIX 3D IMMERSIVE USER INTERFACE [ M3D-IUI ] H Raghavendra Swamy AMD Senior Software Engineer

AMD Opteron Processors In the Cloud

Linux Network Tuning Guide for AMD EPYC Processor Based Servers

AMD Radeon ProRender plug-in for Unreal Engine. Installation Guide

Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740

BIOMEDICAL DATA ANALYSIS ON HETEROGENEOUS PLATFORM. Dong Ping Zhang Heterogeneous System Architecture AMD

The Oracle Database Appliance I/O and Performance Architecture

Transcription:

Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1

Agenda Six-Core AMD Opteron processor codenamed Istanbul overview Six-Core AMD Opteron processor feature support Continued virtualization support New Innovations Red Hat Enterprise Linux software support Performance benchmarking results Conclusions 2

Six-Core AMD Opteron Processor ( Istanbul ) Six True Cores New HyperTransport Technology HT Assist Increased HyperTransport 3.0 Technology (HT3) Bandwidth Higher Performing Integrated Memory Controller Same power/thermal envelopes as Quad-Core AMD Opteron Processor Continued AMD Virtualization (AMD-V ) technology support, Rapid Virtualization Indexing 3

Prior Generation Innovations that Continue All the performance-enhancing features of Quad-Core AMD Opteron processor AMD Wide Floating-Point Accelerator AMD Memory Optimizer Technology Dual Dynamic Power Management AMD Balanced Smart Cache HyperTransport 3 Technology CPU HT3 CPU AMD Virtualization (AMD-V ) technology VM1 VM2 Virtual Memory 1 Virtual Memory 2 Physical Memory 4

Prior Generation Innovations that Continue Independent Dynamic Core Technology All the power-efficiency features of Quad-Core AMD Opteron processor AMD CoolCore Technology Dual Dynamic Power Management Low-Power DDR2 Memory AMD PowerCap manager Core Select AMD Smart Fetch technology 5

New Innovations in Six-Core AMD Opteron processor ( Istanbul ) Six cores per socket Six core support for F (1207) socket infrastructure Improves performance (compared to Quad-Core AMD Opteron processor) HT Assist in multi-socket systems: Reduces probe traffic Resolves probes more quickly Higher HyperTransport 3.0 Technology Speeds Support for up to 4.8GT/s per link Overall system performance 6

HT Assist : What is it? Micro-architectural feature in Six-Core AMD Opteron processor Helps reduce memory latency Helps increase overall system performance in 4- socket and 8-socket systems Improves HyperTransport technology link efficiency and increases performance by: Reducing probe traffic Resolving probes more quickly Probe broadcasting can be eliminated in 8 of 11 typical CPU-to-CPU transactions 7

HT Assist : How does it work? Query Example: CPU 1 CPU 2 L3 L3 CPU 1 CPU 2 L3 L3 CPU 3 CPU 4 L3 L3 CPU 3 CPU 4 L3 L3 Without HT Assist (Total 10 transactions) With HT Assist (Total 2 transactions) = Data Request = Probe Request =L3 Directory = Data Response = Probe Response =Directory Read 8

HT Assist : What is the cache directory? The HT Assist is a sparse directory cache Associated with memory controller of home node Tracks all lines cached in the system from home node Logically part of the memory controller Physically in L3 cache, occupying 1MB of L3 cache For many transactions, eliminates probe broadcasts Host CPU knows exactly which CPU to probe for data local accesses get local DRAM latency, less queuing delay due to lower HT traffic overhead Results in reduced latency and increased system performance in multi-socket systems 9

HT Assist: What is the result? Helps reduce memory latency Helps increase overall system performance 4-way stream memory bandwidth performance improves by ~60% (42 GB/s with HT Assist, and 25.5 GB/s without HT Assist)* Can result in faster query times that can increase performance for cache sensitive applications: Database Virtualization HPC T Assist vs. 25.5GB/s without HT Assist) * *See backup slides for performance and configuration information. 10

HyperTransport 3.0 Technology Advantages of HyperTransport 3.0 technology (HT3) Compared to HyperTransport 1.0 technology (HT1), improves system bandwidth between CPUs and I/O Increased interconnect rate (from 2GT/s with HT1 up to 4.8GT/s per link with HT3) Improves overall system balance and scalability, especially in commercial applications (database, web server, etc.) T Assist vs. 25.5GB/s without HT Assist) * 11

Six-Core AMD Opteron Processor Support For Red Hat Enterprise Linux Excellent relationship with Red Hat Hardware enablement Virtualization and performance collaboration Six-Core AMD Opteron processor works best with RHEL5.4 Continued support for AMD-V with Rapid Virtualization Index Continued support for AMD Power Now! technology driver Continued support for Xen 2MB super pages 12

Six-Core AMD Opteron Support For Red Hat Enterprise Linux RHEL5.4: New Features and support Supports Six-Core AMD Opteron processors AMD-Vi on SR5690 enabled systems KVM virtualization support RHEL6.0: New Features and support 1GB huge page table 13

RHEL5.4 Performance Testing on Six-Core AMD Opteron Istanbul Bare Metal Scalability Testing with Oracle OLTP workload Multiple instance testing with OLTP workload Taking advantage of NUMA KVM multiguest testing with Oracle OLTP workload KVM multiguest testing with Sybase OLTP workload 14

RHEL5.4 Testing on Six-Core AMD Opteron Istanbul Bare Metal Scalability testing with Oracle OLTP workload Multiple Instance testing with Oracle OLTP workload Taking advantage of NUMA KVM Multiguest testing with Oracle OLTP workload KVM Multiguest testing with Sybase OLTP workload 15

Hardware Configuration System Storage 4 Socket - Six-Core AMD Opteron(tm) Processor 8431 @ 2400.099 MHz 64 GB Memory HP HSV300 Fusion IO SSD Device 16

450000.00 Scaling with Oracle OTLP workload 400000.00 350000.00 300000.00 Trans / Min 250000.00 200000.00 RHEL54 FC RHEL54 SSD 150000.00 100000.00 50000.00 0.00 10U 20U 40U 60U 80U 100U Number of Users Graph shows scaling with Oracle OLTP workload (Running in batch commit mode) Scaling improves with storage with low latency higher throughput characteristics 17

KVM 2 Vcpu Multi guest - Oracle OLTP 250000.00 24 cpu Istanbul - 64G 800 742.56 700 200000.00 600 150000.00 500 Trans / min 100000.00 398.53 400 300 216.15 200 50000.00 100 100 0.00 1 Guest 2Vcpu 2 Guests 4 Vcpu 4 Guests 8 Vcpu 8 Guests 16 Vcpu No. of Guests - Total Vcpu 0 Scaling with multiple 2 Vcpu guests running Oracle OLTP workload Near linear Scaling 18

KVM 4 Vcpu Multi guest - Oracle OLTP 24 cpu Istanbul - 64G 300000.00 500.00 450.00 250000.00 400.00 200000.00 350.00 Trans / min 150000.00 100000.00 300.00 250.00 200.00 150.00 50000.00 100.00 50.00 0.00 1Guest 4 Vcpu-8G 2Guest 8 Vcpu-16G 4Guest 16 Vcpu-32G 6Guest-24 Vcpu-48G 8G-32Vcpu-64G-Oversub No of Guests - Total Vcpu - Total Memory 0.00 19 4 vcpu multi guest testing with Oracle OLTP workload shows good scaling Last bar shows no significant penalty with oversubscription of cpus

KVM 8 Vcpu Multi guest - Oracle OLTP Istanbul - 24 cpus - 64G 300000.00 300 271.99 277.85 280 250000.00 260 200000.00 240 220 Trans / min 150000.00 206.02 200 180 100000.00 160 50000.00 140 120 0.00 100 1Guest-8 Vcpu-14G 2Guest-16 Vcpu-28G 3Guest-24Vcpu-42G 4Guest-32Vcpu 56G 100 No of Guests - Total Vcpu-Total Memory 20 8 vcpu multi guest testing with Oracle OLTP workload shows linear scaling Last bar shows no significant penalty with oversubscription of cpus

NUMA pinning with numactl Istanbul - 24 CPUs - 64G Mem 40 0000.0 0 45 0 3500 00.00 389.1 40 0 3000 00.00 326.68 350 291.2 295.76 300 2500 00.00 Trans / Min 2000 00.00 15 0000.0 0 197.56 196.66 250 200 15 0 10 0000.0 0 100 100 10 0 5000 0.00 50 0.0 0 2Guest-12 vcp u-28g 4Guest-24vcpu-56G 2Guest-12 vcp u-28g-numa 4Guest-24vcpu-56G-NUMA 1Guest-6vcpu -14G 3Guest-18 vcp u-42g 1Guest-6vcpu -14G-NUMA 3Guest-18 vcp u-42g-numa No of Guest - Total vcpu - Total Mem 0 Platform shows good scaling without NUMA tuning (Bars 1-4) Using numactl, linear scaling is achieved with multiple guests (Bars 5-8) 21

NUMA Pinning with taskset RHEL5.4 Multi - Instance Scaling Oracle database workload 6000 00.00 5000 00.00 40 0000.0 0 No of Trans 3000 00.00 2000 00.00 10 0000.0 0 0.0 0 1 Instance 2 Insta nces 4 Instances 4 Instances NUMA No of Instances 22 The The platform platform supports supports NUMA. NUMA. By By pinning pinning 4 database database instances instances into into 4 NUMA 4 NUMA nodes nodes a 10% a 10% performance performance improvement improvement was was seen seen (Compare bar 3 & 4)

16 0000.0 0 KVM 4 Vcpu Multi guest - Sybase OLTP Istanbul - 24 cpu - 64G 140000.00 12 0000.0 0 10 0000.0 0 Trans / min 8000 0.00 6000 0.00 40 000.00 2000 0.00 0.0 0 1Guest 4 Vcpu-8G 2 Guest 8 Vcpu-16G 4 Guests 16 Vcpu-32G 6 Guests 24 Vcp u-48g Guests - Total Vcpu - Total Memory 4 vcpu guests showed scaling trend as more guests were added. Scaling was not linear as the workload was not tuned to run in KVM guest 23

140000.00 KVM 8 Vcpu Multi guest - Sybase OLTP Istanbul 24 cpu - 64G 12 0000.0 0 10 0000.0 0 Trans / Min 8000 0.00 6000 0.00 40 000.00 2000 0.00 0.0 0 1 Guest 8Vcpu-14G 2 Guests 16 Vcp u-28g 3 Guests 24 Vcp u-42g Guests - Total Vcpu-Total Memory 8 vcpu guests showed scaling trend as more guests were added. Scaling was not linear as the workload was not tuned to run in KVM guest 24

Conclusion Six-core AMD Opteron Processor Istanbul has shown: Good Vertical scaling Storage (low latency) Memory (Dense memory) Good Horizontal scaling Consolidation Virtualization Storage (low latency, high bandwidth) Memory (Dense memory) NUMA 25

Conclusion (contd) Six-core AMD Opteron Processor Istanbul : Retains the prior generation innovations Adds new innovations six-core, HTAssist, higher HyperTransport 3.0 bandwidth Optimized on RHEL, new hardware features enabled System consolidation in data centers What is your data center story? 26

Questions? Bhavna Sarathy bhavna.sarathy@amd.com Sanjay Rao srao@redhat.com 27

Backup 28

Four-Socket STREAM Performance Improvement with HT Assist (slide 10) 42GB/s using 4 x Six-Core AMD Opteron processors ( Istanbul ) Model 8435 in Tyan Thunder n4250qe (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux Enterprise Server 10 SP1 64-bit (with HT Assist enabled) 25.5GB/s using 4 x Six-Core AMD Opteron processors ( Istanbul ) Model 8435 in Tyan Thunder n4250qe (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux Enterprise Server 10 SP1 64-bit (with HT Assist disabled) 24GB/s using 4 x Quad-Core AMD Opteron processors ( Shanghai ) Model 8384 in Tyan Thunder n4250qe (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux Enterprise Server 10 SP1 64-bit 9GB/s using 4 x Hex-Core Intel Xeon processors ( Dunnington ) Model E7450 in Supermicro X7QC3+ motherboard, 32GB (16x2GB DDR2-667 FB-DIMM) memory, SuSE Linux Enterprise Server 10 SP1 64-bit 29

Disclaimer & Attribution DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION 2009 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD CoolCore, AMD Opteron, AMD PowerNow!, AMD Virtualization, AMD-V, Dual Dynamic Power Management, and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Microsoft, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners. 30