Run Anywhere. The Hardware Platform Perspective. Ben Pollan, AMD Java Labs October 28, 2008


Agenda
- Java Labs Introduction
- Community Collaboration
- Performance Optimization Recommendations
- Leveraging the Latest Hardware Improvements

AMD Java Labs
Dedicated AMD Java Labs organization supports Java development community through

AMD and Java Relationship
Java Platform Performance Trends on x64: SPECjbb2005 on AMD Barcelona processor
Data obtained from Sun Microsystems. Results not verified by AMD.

Community Collaboration

Contribution Areas
- Open Source
  - OpenJDK: contributing performance enhancements
  - CodeSleuth: Java profiling plug-in for Eclipse, released as open source project on sourceforge.net
- Collaboration with proprietary JVM vendors
  - Performance optimizations to leverage hardware features

Pursuing Optimizations For:
- More efficient use of hardware features
  - Instruction selection: convert integer to double, float to double, shift by constants
- Improve cache efficiency
  - HashMap: common case usage improvements
  - BigDecimal: class size reductions
  - Field reordering / removal
- Improved performance / profiling data
  - JVMTI: allow tools to track JITed method inlining
  - Performance data via Instruction-Based Sampling

HashMap Optimization
- HashMap.get: leading cause of cache misses
- Statistics showed a common use pattern:
    HashMap.put(int i, object); // where i < # elements
    HashMap.get(int i);
- Solution
  - For this case, implement hashing functions and buckets as an array lookup
  - Touches less memory, causing fewer cache misses
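The array-lookup idea on this slide can be illustrated with a small sketch. It is not the OpenJDK change itself, which the slides do not show; DenseIntMap and its overflow fallback are hypothetical names chosen for illustration. Keys in the range [0, capacity) index straight into an array, so a lookup touches one slot instead of hashing and walking a bucket chain.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: a map specialised for small, dense int keys. */
final class DenseIntMap<V> {
    // Direct-indexed storage for keys in [0, capacity).
    private final Object[] slots;
    // Ordinary hash map used only for keys outside the dense range.
    private final Map<Integer, V> overflow = new HashMap<>();

    DenseIntMap(int capacity) {
        slots = new Object[capacity];
    }

    /** Stores the value and returns the previous value for the key, or null. */
    V put(int key, V value) {
        if (key >= 0 && key < slots.length) {
            @SuppressWarnings("unchecked")
            V old = (V) slots[key];
            slots[key] = value;   // one array store, no hashing
            return old;
        }
        return overflow.put(key, value);
    }

    /** Returns the value for the key, or null if absent. */
    @SuppressWarnings("unchecked")
    V get(int key) {
        if (key >= 0 && key < slots.length) {
            return (V) slots[key]; // one array load, no bucket walk
        }
        return overflow.get(key);
    }
}
```

Usage: new DenseIntMap<String>(1024) followed by put(3, "x") and get(3) stays entirely within the array path; only keys that are negative or at least 1024 fall back to the hash map.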

JVMTI Method Inlining Information
- Method address to source code mapping is a common performance analysis task
  - Memory location -> JITed code -> bytecode -> source code
- JVMTI's CompiledMethodLoad callback returns information to assist with this mapping
  - Broken: does not provide information for inlined methods
- OpenJDK extended to make inlining information available via JVMTI
  - Tool writers can use this to produce better mapping
  - Within JVMTI specification (existing void pointer)

Performance Optimization Recommendations

Configuration Options - NUMA
- Processors have their own local memory, which is less costly to access
- If running multiple JVMs per system, pin individual JVMs to a processor
  - Windows: start /affinity xx java
    where xx is a hexadecimal mask specifying the cores the process will run on
  - Linux: numactl --cpunodebind=processor_num --membind=processor_num java
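The Windows affinity mask has one bit per logical core, written in hexadecimal. A minimal sketch of building such a mask from a list of core indices follows; AffinityMask and hexMask are hypothetical helper names, and which core indices belong to which processor is system-specific and must be checked on the target machine.

```java
/** Hypothetical helper: builds the hex mask expected by "start /affinity". */
final class AffinityMask {
    /** Returns a hex string with one bit set per listed core index (0 to 63). */
    static String hexMask(int... cores) {
        long mask = 0L;
        for (int core : cores) {
            if (core < 0 || core > 63) {
                throw new IllegalArgumentException("core out of range: " + core);
            }
            mask |= 1L << core; // bit n selects logical core n
        }
        return Long.toHexString(mask).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(hexMask(0, 1, 2, 3)); // prints F
        System.out.println(hexMask(4, 5, 6, 7)); // prints F0
    }
}
```

On a system where cores 0 to 3 sit on the first processor and cores 4 to 7 on the second (an assumption), two JVMs could be started with masks F and F0 respectively.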

Configuration Options - NUMA (cont.)
- If running a single JVM per system
  - Try node interleaved memory setting in BIOS
  - For Sun Solaris JVM, use -XX:+UseNUMA
  - For IBM on Linux, heap automatically interleaved
  - Don't interleave both (they cancel each other out)

Configuration Options - Page Size
- Page table maintains virtual to physical address mappings
- Translation Lookaside Buffer (TLB) maintains a cache of these
- With small page sizes, the number of mappings can exceed the cache size, leading to (slower) page table access
- For large (2 MB) page size system configuration instructions, see the article in AMD Java Zone: "Supersizing Java", parts 1 and 2
- For those with even larger requirements, 1 GB page support to be submitted soon for OpenJDK
  - Only Solaris supports this for now
  - Working with Linux distributions for support
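A short worked calculation shows why page size matters: the number of mappings needed to cover a heap is the heap size divided by the page size. The sketch below assumes a 4 GB heap purely as an example and uses the standard x86-64 page sizes of 4 KB, 2 MB and 1 GB; PageMappings is a hypothetical name.

```java
/** Hypothetical helper: counts the page mappings needed to cover a heap. */
final class PageMappings {
    /** Number of pages of pageBytes needed to cover heapBytes, rounded up. */
    static long mappingsNeeded(long heapBytes, long pageBytes) {
        return (heapBytes + pageBytes - 1) / pageBytes;
    }

    public static void main(String[] args) {
        long heap = 4L << 30; // example heap size: 4 GB
        System.out.println(mappingsNeeded(heap, 4L << 10)); // 4 KB pages: prints 1048576
        System.out.println(mappingsNeeded(heap, 2L << 20)); // 2 MB pages: prints 2048
        System.out.println(mappingsNeeded(heap, 1L << 30)); // 1 GB pages: prints 4
    }
}
```

The fewer mappings a heap needs, the more likely they all fit in the TLB, which is the effect the large-page settings aim for.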

Configuration Options - Page Size (cont.)
- Sun: -XX:+UseLargePages -XX:LargePageSizeInBytes=n (n = 2m or 1g)
- Oracle: determines if large pages are enabled in the system, then enables their support in the JVM
- IBM: -Xlp
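One way to confirm that an option such as those above actually reached the JVM is to inspect the startup arguments from inside the process. The sketch below uses the standard RuntimeMXBean.getInputArguments() call; JvmFlagCheck and hasFlag are hypothetical names. Note that it only shows the flag was passed, not that the operating system granted large pages.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: checks which options the running JVM was started with. */
final class JvmFlagCheck {
    /** True if the exact option string appears in the JVM argument list. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Large pages requested: "
                + (hasFlag(jvmArgs, "-XX:+UseLargePages") || hasFlag(jvmArgs, "-Xlp")));
    }
}
```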

Configuration Options - Compressed References
- For 64-bit OS and 64-bit JVM, object references stored on the heap are 64 bits
- Compressed references limit references to 32 bits, using fewer heap resources
- To use, your memory requirements must be lower than:
  - IBM: 25 GB
  - Oracle: 4 GB
  - Sun: 32 GB
- To enable:
  - IBM: -Xcompressedrefs
  - Oracle: -XXcompressedRefs
  - Sun: -XX:+UseCompressedOops
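The heap limits follow from simple arithmetic: a 32-bit reference can name 2^32 distinct locations, and if every object starts on an aligned boundary the reference can count alignment units instead of bytes. The sketch below is an illustration under that assumption. The 8-byte alignment is the usual HotSpot object alignment, the result is consistent with the 32 GB and 4 GB figures on the slide, and the slides do not explain the IBM figure. CompressedRefReach is a hypothetical name.

```java
/** Hypothetical helper: heap reach of a compressed reference. */
final class CompressedRefReach {
    /** Bytes addressable by a refBits-wide reference counting units of alignmentBytes. */
    static long reachBytes(int refBits, int alignmentBytes) {
        return (1L << refBits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 32-bit reference scaled by 8-byte object alignment.
        System.out.println(reachBytes(32, 8) / gb + " GB"); // prints 32 GB
        // 32-bit reference used as a plain byte offset.
        System.out.println(reachBytes(32, 1) / gb + " GB"); // prints 4 GB
    }
}
```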

Configuration Options - IBM -XtlhPrefetch
- Causes newly allocated area on the heap to be prefetched with PREFETCHNTA
- Prevents L2 processor cache from being polluted, because when objects are removed from L1, they aren't moved to L2
- Good to use when you have many short-lived objects

Leveraging the Latest Hardware Improvements

AMD Hardware Advances
- Increase in cores per processor
- Balance workload between GPU and CPU
- Instruction-Based Sampling (IBS)
  - Rich set of processor event data
  - Precisely associates event data with the instructions that cause the event
  - JVMs can use this data to make dynamic optimization decisions
- Lightweight Profiling (LWP) Specification
  - Enable code to make dynamic decisions about how to improve performance
  - Suggestions welcome: lwp.feedback@amd.com

AMD Hardware Advances
- Advanced Synchronization Facility (ASF)
  - Experimental AMD64 extension
  - Lighter weight locking mechanism
- Instruction optimizations
  - With each processor release

Taking Advantage of Hardware Features
- Upgrade to the latest JVMs
  - Many benefits you will get for free
  - Instruction optimizations
  - Profiling information, feedback
- Use the Java Concurrency Classes
  - ParallelArray (JDK7) will take advantage of work that will leverage multiple cores
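As an illustration of using the concurrency classes to spread work over multiple cores, the sketch below splits an array sum across a fixed thread pool. It uses only java.util.concurrent classes that are part of the standard library (ExecutorService, Callable, Future) rather than the ParallelArray API named on the slide; ParallelSum is a hypothetical name and the even chunking is just one possible design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: sums an array using one task per worker thread. */
final class ParallelSum {
    static long sum(final int[] data, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunk = (data.length + workers - 1) / workers; // chunk size, rounded up
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                tasks.add(() -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(data, cores)); // prints 500500
    }
}
```

Sizing the pool from availableProcessors() lets the same code use more cores as they become available, without changes to the application.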

More Information at AMD Developer Central
- Detailed technical articles
- Documentation, tutorials, and guides
  - Featuring tips to help optimize software for AMD Barcelona processors
- AMD-optimized build tools (compilers, JITs)
- Performance analysis tools (AMD CodeAnalyst(TM) software)
- Performance libraries: AMD Core Math Library (ACML) and Framewave (open source)
- Inside look at AMD's vision
- Community discussions
- Subscribe to our free newsletter: developer.amd.com

Java Zone on AMD Developer Central

Disclaimer

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Trademark Attribution

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. SPECjbb is a registered trademark of Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.

© 2008 Advanced Micro Devices, Inc. All rights reserved.