Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers


An Oracle White Paper
September 2010

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers

Contents

Introduction
What Is Memory Ordering?
Compiler Memory Ordering
Volatile Variables Are Not Necessarily the Best Solution
Compiler Barriers
Conclusion

Introduction

Oracle Solaris Studio 12 Update 2 introduces a new header file, <mbarrier.h>. This header file provides a set of intrinsics that are useful for enforcing memory ordering constraints in multithreaded applications. This article is part 1 of a two-part series. This part discusses how compiler barriers can be used to stop the compiler from generating code that is incorrect because of reordered memory accesses. Part 2 discusses how memory barriers or memory fences can be used to ensure that the processor does not reorder memory operations.

What Is Memory Ordering?

Memory ordering is the order in which memory operations (loads and stores) are performed. There are two ways that memory ordering might be changed:

- The compiler might change memory ordering as a result of compile-time optimizations.
- The processor might change memory ordering at run time.

Compiler Memory Ordering

For performance reasons, a compiler performs various optimizations. It might hold variable values in registers (rather than in memory), reorder expressions, avoid reloading values, eliminate the use of variables altogether, and so on. There are two constraints on what the compiler can do. First, it must produce code that is functionally equivalent to the code as written. This is the "as-if" rule, which says that the output of the optimized program must be as if the program were executed exactly as written. The second constraint is the set of user-specified flags passed to the compiler. If those flags say, for example, that pointers never alias, the compiler can use that assumption to produce better code.

The code shown in Listing 1 reflects a situation in which a compiler could produce a more efficient code sequence by reordering the memory operations. The code starts a team of threads. Each thread waits until it is signaled before starting work, and it signals as soon as it completes its work. The main thread is responsible for creating the team of threads, starting them, and then waiting until their work is complete.

Listing 1. Multithreaded Code in Which the Compiler Might Perform Undesirable Optimizations

  #include <stdio.h>
  #include <pthread.h>

  volatile int start[10];
  volatile int ended[10];
  pthread_t    threads[10];

  void * work( void * param )
  {
    int id = (int)param;
    while( start[ id ] == 0 ) {}   // Wait until work started
    printf( "Thread %i started\n", id );
    ended[ id ] = 1;               // Indicate that work completed
    printf( "Thread %i finished\n", id );
  }

  void dowork()
  {
    for( int count = 0; count < 10; count++ )
    {
      start[ count ] = 1;               // Start thread working
    }
    for( int count = 0; count < 10; count++ )
    {
      while( ended[ count ] == 0 ) {}   // Wait until thread completes work
    }
  }

  int main()
  {
    // Create team of threads
    for( int count = 0; count < 10; count++ )
    {
      start[ count ] = 0;
      ended[ count ] = 0;
      pthread_create( &threads[count], 0, work, (void*)count );
    }
    dowork();
    // Join team of threads
    for( int count = 0; count < 10; count++ )
    {
      pthread_join( threads[count], 0 );
    }
  }

Compiling and running the code in Listing 1 without optimization produces the results shown in Listing 2. The threads start and finish in an unspecified order.

Listing 2. Compiling and Running the Code Without Optimization

  % cc -mt listing1.c
  % ./a.out
  Thread 1 started
  Thread 1 finished
  Thread 9 started
  Thread 9 finished
  Thread 5 started
  Thread 5 finished
  Thread 0 started
  Thread 0 finished
  Thread 3 started
  Thread 3 finished
  Thread 4 started
  Thread 4 finished
  Thread 6 started
  Thread 6 finished
  Thread 7 started
  Thread 7 finished
  Thread 2 started
  Thread 2 finished
  Thread 8 started
  Thread 8 finished

We can compare the results in Listing 2 with the results when the code is compiled with optimization, shown in Listing 3.

Listing 3. Compiling and Running the Code with Optimization

  % cc -g -O -mt listing1.c
  % ./a.out
  Thread 0 started
  Thread 0 finished
  Thread 1 started
  Thread 1 finished
  Thread 2 started
  Thread 2 finished
  Thread 3 started
  Thread 3 finished
  Thread 4 started
  Thread 4 finished
  Thread 5 started
  Thread 5 finished
  Thread 6 started
  Thread 6 finished
  Thread 7 started
  Thread 7 finished
  Thread 8 started
  Thread 8 finished
  Thread 9 started
  Thread 9 finished

When the code is compiled with optimization, the threads start and complete in order. The reason is an optimization that the compiler performed on the routine dowork(); the optimized version of this routine is shown in Listing 4. The compiler merged the two loops in dowork(), so the main thread starts each child thread and then waits until that child thread completes. This is in contrast to the desired behavior of starting all the child threads in parallel.

Listing 4. Compiler-Optimized Version of the Routine dowork()

  void dowork()
  {
    for( int count = 0; count < 10; count++ )
    {
      start[ count ] = 1;               // Start thread working
      while( ended[ count ] == 0 ) {}   // Wait until thread completes work
    }
  }

Volatile Variables Are Not Necessarily the Best Solution

It is worth discussing the use of the keyword volatile in this code. The keyword volatile means that a value must be loaded from memory before use and stored back to memory immediately after use. This behavior is useful because it ensures that loops such as the one in Listing 5 are not converted to infinite loops during optimization.

Listing 5. Loop That Could Be Converted to an Infinite Loop During Optimization

  while( ended[ count ] == 0 ) {}   // Wait until thread completes work

However, using volatile variables is not without cost. If the variable is used as part of a computation, its value cannot be carried in a register; it must be refreshed at each use, so there is some performance impact.

The use of volatile variables also does not make code correct; it essentially just reduces the optimization of code that uses them. For example, the code in Listing 6 contains a data race regardless of whether the variable is declared volatile.

Listing 6. Volatile Variables Do Not Protect Against Data Races

  void increment( int * variable )
  {
    (*variable)++;
  }

Finally, as we saw in the example in Listing 1, volatile variables do not protect against side effects of optimization. The conclusion we should draw is that volatile variables are not a panacea; they might solve a subset of problems, but they do not solve all of them.

At this point it is appropriate to ask what is required for the code to function correctly. The answer is that we need points in the code that the compiler cannot optimize around. For the code in Listing 5, we want the compiler to reload the variable ended[] at every iteration of the loop, rather than assume that the variable is invariant. Similarly, we want the compiler to avoid merging the two loops, as it did in the optimization shown in Listing 4. What we need is a compiler barrier.

Compiler Barriers

A compiler barrier is a sequence point. At such a point, we want all previous operations to have stored their results to memory, and we want no future operations to have started yet. The most common sequence point is a function call, so we can recode Listing 5 to use a function call to achieve the desired effect. This method is shown in Listing 7.
The call to DoNothing() can return immediately having, as its name suggests, done nothing. However, the compiler does not know that nothing was done; it has to assume that the function call might have changed global data. Consequently, the compiler needs to reload the value of the variable ended[].

Listing 7. Using a Function Call as a Compiler Barrier

  while( ended[ count ] == 0 ) { DoNothing(); }   // Wait until thread completes work

There are two problems with this approach. The first is that it makes an unnecessary function call, and the code might be more efficient if the call were not performed. The second, more serious, problem is that under optimization the call to DoNothing() might be inlined. If that happens, the sequence point is removed and the loop is once again converted to an infinite loop.

The header file <mbarrier.h> introduces the compiler_barrier() intrinsic, which is designed to provide the required functionality. This intrinsic has two advantages: the barrier remains in place during optimization, and it generates no instructions. The compiler_barrier() intrinsic can be used to force the desired reloading of the variable ended[], as shown in Listing 8.

Listing 8. Using the compiler_barrier() Intrinsic to Avoid Generating Infinite Loops During Optimization

  // Wait until thread completes work
  while( ended[ count ] == 0 ) { compiler_barrier(); }

The compiler_barrier() intrinsic removes the need to declare the variables volatile, and it can be placed between the two loops in the dowork() routine to ensure that the compiler does not merge them. The full code modified to use the compiler barrier is shown in Listing 9; the changed lines are marked with comments.

Listing 9. Full Listing Modified to Use the compiler_barrier() Intrinsic

  #include <stdio.h>
  #include <pthread.h>
  #include <mbarrier.h>

  int       start[10];   // volatile removed
  int       ended[10];   // volatile removed
  pthread_t threads[10];

  void * work( void * param )
  {
    int id = (int)param;
    // Use a compiler barrier to reload start[id] every iteration
    while( start[ id ] == 0 ) { compiler_barrier(); }   // Wait until work started
    printf( "Thread %i started\n", id );
    ended[ id ] = 1;                                    // Indicate that work completed
    printf( "Thread %i finished\n", id );
  }

  void dowork()
  {
    for( int count = 0; count < 10; count++ )
    {
      start[ count ] = 1;   // Start thread working
    }
    // Use a compiler barrier to stop the two loops from being merged
    compiler_barrier();
    for( int count = 0; count < 10; count++ )
    {
      // Use a compiler barrier to reload ended[count] every iteration
      while( ended[ count ] == 0 ) { compiler_barrier(); }   // Wait until thread completes work
    }
  }

  int main()
  {
    // Create team of threads
    for( int count = 0; count < 10; count++ )
    {
      start[ count ] = 0;
      ended[ count ] = 0;
      pthread_create( &threads[count], 0, work, (void*)count );
    }
    dowork();
    // Join team of threads
    for( int count = 0; count < 10; count++ )
    {
      pthread_join( threads[count], 0 );
    }
  }

Conclusion

This article, which is part 1 of a two-part series, described how to use compiler barriers to stop the compiler from generating code that is incorrect because of reordered memory accesses. Part 2 of the series discusses how memory barriers or memory fences can be used to ensure that the processor does not reorder memory operations.

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 1, Compiler Barriers
September 2010
Author: Darryl Gove

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2010, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.