Tracing mfence White Paper


Doug Deao
Texas Instruments
All rights reserved

Document History

Revision  Modifications
0.4       Added Alert Appendix to the end of the document. This section provides guidance for dealing with mfence instruction alerts in regards to trace.
0.4       Updated the Trace Triggers Required section to include an example of setting up workaround properties for Event Trace.
0.5       Added section with instructions for setting up the Trace Job workaround with AETLib.
0.5       Updated for CCS 5.x and later.

The Issue

Trace data generation for the mfence instruction, added to Keystone devices, is incorrect. The mfence instruction stalls the instruction pipeline until all outstanding CPU-triggered memory transactions have completed. To determine whether they have completed, the instruction checks an internal busy flag. The mfence instruction always waits at least 5 clock cycles before checking the busy flag in order to account for pipeline delays. While an mfence operation is executing, any enabled interrupts will still be serviced.

While the mfence is waiting on the busy flag, the Trace PC stream continues to advance, wrongly indicating that the instruction pipeline is advancing. This causes any branch data between the mfence and the next trace sync point to be reconstructed incorrectly in the Trace Viewer or by TD.EXE, and can cause bad Trace Status column messages.

Workaround Overview

The workaround requires three components:

1. CCS v5.x (and earlier CCS releases) must be updated with Emupack or later.
2. In your code, the mfence instruction must be followed immediately by nop and mark instructions.
3. An additional trace trigger is required to Don't Sample PC on a Mark.

The Don't Sample PC on Mark trigger causes a new sync point in the trace stream. The Emupack update causes all cycles between the mfence and the new sync point to be associated with the mfence instruction, rather than being attributed in error to the instructions after the mfence. The following sections provide details on implementing the workaround.

Validation Discussion

Validation of the workaround used the TSCL counter to confirm the trace timing data. Validation confirmed that all interrupts and branches that occur after the new sync point behave correctly. Validation also included generating an interrupt during the first cycle of an mfence, which behaved as expected, with the return from interrupt going back to the mfence.
We also tested the case where the interrupt occurred immediately after the mfence. In this case the interrupt returned to the nop instruction after the mfence as expected, but the trace timing data was one cycle less than the TSCL count. There are potential boundary condition cases caused by interrupts during an mfence instruction that our testing may not have covered. We encourage you to check your specific mfence trace cases and confirm proper return from any interrupts that occurred during the mfence instruction.
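The TSCL comparison described above can be sketched in C. This is a minimal illustration, not the validation code used for this paper: the helper names cycles_between and trace_agrees are hypothetical, and the target-side steps in the comment assume the TI C6000 compiler, whose c6x.h header declares TSCL.

```c
#include <stdint.h>

/*
 * Assumed target-side usage (TI C6000 compiler, c6x.h declares TSCL):
 *     TSCL = 0;                  // any write starts the free-running counter
 *     start = TSCL;
 *     ... mfence / nop / mark sequence ...
 *     end = TSCL;
 * The helpers below are plain C, so they can be checked on a host.
 */

/* Cycles elapsed between two 32-bit TSCL reads. Unsigned subtraction
 * still gives the right count if the counter wrapped once in between. */
uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Returns 1 when the trace timing is within `tolerance` cycles of the
 * TSCL count, 0 otherwise. A tolerance of 1 would accept the one-cycle
 * difference described for an interrupt right after the mfence. */
int trace_agrees(uint32_t tscl_cycles, uint32_t trace_cycles, uint32_t tolerance)
{
    uint32_t diff = (tscl_cycles > trace_cycles)
                        ? (tscl_cycles - trace_cycles)
                        : (trace_cycles - tscl_cycles);
    return diff <= tolerance;
}
```

A tolerance of 0 demands an exact match between the TSCL count and the trace timing; raise it only for cases where a known offset is expected.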

Code Changes Required

Every occurrence of the mfence instruction must be followed by nop and mark instructions. Methods to include the nop/mark code:

1. For C code, use the preprocessor to update the code:

       #define _mfence() asm("\tmfence\n\tnop\n\tmark 0")

   Note that we do not recommend using the compiler _mfence() and _mark() intrinsics in this case because the compiler can schedule code between the intrinsics.

2. For assembly:

       mfence
       nop
       mark 0
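If the mark number needs to be configurable, the preprocessor approach above could be extended as in the following sketch. The names MFENCE_MARK_ID, MFENCE_TRACE_ASM and mfence_traced are hypothetical and not part of any TI header. The sketch also assumes that the TI compiler accepts an asm() argument assembled from adjacent string literals, which should be confirmed on your toolchain.

```c
/* Two-level stringification so MFENCE_MARK_ID is expanded before quoting. */
#define TRACE_STR2(x) #x
#define TRACE_STR(x)  TRACE_STR2(x)

/* Hypothetical knob: which mark instruction (0 to 3) follows the mfence. */
#ifndef MFENCE_MARK_ID
#define MFENCE_MARK_ID 0
#endif

/* The instruction sequence required by the workaround, as one string. */
#define MFENCE_TRACE_ASM "\tmfence\n\tnop\n\tmark " TRACE_STR(MFENCE_MARK_ID)

#ifdef _TMS320C6X
/* On the target, emit the three instructions as one asm statement so the
 * compiler cannot schedule code between them. */
#define mfence_traced() asm(MFENCE_TRACE_ASM)
#else
/* Host builds: no-op, so the surrounding code still compiles. */
#define mfence_traced() ((void)0)
#endif

/* Returns the sequence text, useful for checking what would be emitted. */
const char *mfence_trace_sequence(void)
{
    return MFENCE_TRACE_ASM;
}
```

Building with -DMFENCE_MARK_ID=1 (or the equivalent compiler option) would switch the sequence to mark 1 without touching call sites. The mark selected in the trace trigger must match.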


Trace Triggers Required

A Don't Store Sample trace trigger with the following properties must be enabled, along with your normal trace triggers. For PC Trace use cases, CCSv5.4 and later provides a predefined Workaround (Don't Store Sample on Mark 0) trace trigger automatically. The Workaround trigger is not automatically added for Custom Core Trace use cases and must be added by the user. Also, for the Custom Core Trace use case, there are differences between the Standard Trace and Event Trace Don't Store Sample triggers.

The following shows the properties for a Standard Trace Don't Store Sample trace trigger:


The following shows the properties for an Event Trace Don't Store Sample trace trigger:

Note that you must select the specific mark instruction used in your code for this purpose. If you are using the mark 0 instruction for some other trace purpose, then for this workaround you must use one of the other mark instructions (mark 1, 2, or 3) in your code and in the Don't Store Sample trigger to avoid conflicts.

If using AETLib, add the following to your code:

    /* Set up AET trigger for the Trace workaround */
    AET_jobParams MarkTraceParams;

    MarkTraceParams = AET_JOBPARAMS; /* Initialize Job Parameter Structure */
    MarkTraceParams.eventNumber[0] = AET_EVT_MISC_MARK_INS_0;
    MarkTraceParams.triggerType = AET_TRIG_TRACE_PCSUSPEND;

    /* Set up the desired job */
    if (err = AET_setupJob(AET_JOB_TRIG_ON_EVENTS, &MarkTraceParams)) {
        printf("error setting up AET resources for mark job [error = 0x%X]\n", err);
        return err;
    }


How do I know the workaround is functional

At cycle 802 (the MARK 0 sample) the Trace Status column contains Pc_Off Timing_On, indicating that the PC stream has been turned off on that cycle, thus causing the new sync point. At cycle 803 the Trace Status column contains Pc_On Timing_On, indicating that the PC data stream has been turned back on. In this case the number of cycles read from the TSCL was 49, which is the same as the highlighted trace timing. Any additional post-processing you do with the data will work as normal.
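If the Trace Status text is available to a post-processing tool, the Pc_Off then Pc_On pattern described above could be checked as in the sketch below. It assumes the status strings for the mark sample and the following sample have already been extracted from the trace output. The function name workaround_visible is hypothetical, and no particular export format is implied.

```c
#include <string.h>

/* Returns 1 when the status at the mark sample shows the PC stream being
 * turned off and the status at the next sample shows it turned back on,
 * which is the pattern that indicates the new sync point was created. */
int workaround_visible(const char *status_at_mark, const char *status_after)
{
    return strstr(status_at_mark, "Pc_Off") != NULL
        && strstr(status_after, "Pc_On") != NULL;
}
```

For the example above, the inputs would be "Pc_Off Timing_On" for cycle 802 and "Pc_On Timing_On" for cycle 803.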

Mfence Alert Appendix

Single Issue: This alert addresses an issue with a store instruction that directly precedes an mfence instruction. The solution requires two mfence instructions back-to-back after the store instruction. For the trace workaround to function properly, both mfence instructions must be followed immediately by nop and mark instructions.

Change:

    STORE_A
    MFENCE
    TRANSACTION_B

To:

    STORE_A
    MFENCE
    NOP
    MARK 0
    MFENCE
    NOP
    MARK 0
    TRANSACTION_B

End of Document
