GR712RC Incorrect Annulation of Floating-point Operation on Instruction Cache Parity Error. Technical note Doc. No GRLIB-TN-0012 Issue 1.

Similar documents
GRFPU Floating-point controller (GRFPC): Store Forwarding Error After Single-precision Loads to Adjacent Registers

GRETH ERRATUM: Overrun May Cause GRETH Receiver to Hang. Technical note Doc. No GRLIB-TN-0008 Issue 1.0

GRPCI Master/Target Abort Erratum. Technical note Doc. No GRLIB-TN-0003 Issue 1.1

LEON SRMMU Behaviour. Technical note Doc. No GRLIB-TN-0002 Issue 1.0

Meltdown, Spectre, and Security Boundaries in LEON/GRLIB. Technical note Doc. No GRLIB-TN-0014 Issue 1.1

FTMCTRL: BCH EDAC with multiple 8-bit wide PROM and SRAM banks. Application note Doc. No GRLIB-AN-0003 Issue 1.0

Xuan Guo. Lecture XIV: Review of Chapter 3 & 4. CSC 3210 Computer Organization and Programming Georgia State University. March 5, 2015.

Booting a LEON system over SpaceWire RMAP. Application note Doc. No GRLIB-AN-0002 Issue 2.1

Interfacing the RAD1419 A/D converter with the GRLIB IP core library

GAISLER. IEEE-STD-754 Floating Point Unit GRFPU Lite / GRFPU-FT Lite CompanionCore Data Sheet

Chapter 4 Simple Calculations

Installing LEON VxWorks 6.9

COMPARISON BETWEEN GR740, LEON4-N2X AND NGMP

Overview of the SPARC Architecture

Migrating from the UT699 to the UT699E

GRLIDE. LEON IDE plugin for Eclipse User's Manual. The most important thing we build is trust GR-LIDE-UM. August 2016, Version 1.

LLVM LEON Backend Final Presentation

Atmel AT697 validation report

LINUXBUILD User's Manual

Specifying Representations of Machine Instructions

Installing LEON VxWorks

Atmel AT697F. Rad-Hard 32-bit SPARC v8 Processor ERRATA SHEET. Active Errata List

Hardware-based Speculation

Executive Summary Next Generation Microprocessor (NGMP) Engineering Models Product code: GR740

GAISLER. IEEE-STD-754 Floating Point Unit GRFPU / GRFPU-FT CompanionCore Data Sheet

Lecture VII: The debugger gdb and Branching. Xuan Guo. CSC 3210 Computer Organization and Programming Georgia State University.

Topics in computer architecture

Pipelining, Branch Prediction, Trends

Reprinted from 1995 Usenix Technical Conference January 16{20, 1995 New Orleans, LA. The New Jersey Machine-Code Toolkit.

USING INLINE PTX ASSEMBLY IN CUDA

ESA Contract 18533/04/NL/JD

Blackfin Embedded Processor

Lecture VIII: Branching and Control Statements. Xuan Guo. CSC 3210 Computer Organization and Programming Georgia State University.

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

Technical Note on NGMP Verification. Next Generation Multipurpose Microprocessor. Contract: 22279/09/NL/JK

The following revision history lists the anomaly list revisions and major changes for each anomaly list revision.

Chapter 2 SPARC Architecture Chinua Umoja

MACHINE LANGUAGE AND ASSEMBLY LANGUAGE

INLINE PTX ASSEMBLY IN CUDA

The Impact of Fetch Rate and Reorder Buffer Size on Speculative Pre-Execution

Computer Organization & Assembly Language Programming

Atmel AT697F. Rad-Hard 32-bit SPARC v8 Processor ERRATA SHEET. Active Errata List

CUDA TOOLKIT 3.2 READINESS FOR CUDA APPLICATIONS

SPARC/ERC32 V7, V8, V8E

Super Scalar. Kalyan Basu March 21,

Course on Advanced Computer Architectures

SOLUTION. Midterm #1 February 26th, 2018 Professor Krste Asanovic Name:

CS 3330 Exam 2 Spring 2016 Name: EXAM KEY Computing ID: KEY

Errata Sheet RV-8803-C7

EECC551 - Shaaban. 1 GHz? to???? GHz CPI > (?)

BCC User's Manual BCC. Bare-C Cross-Compiler User's Manual. The most important thing we build is trust

CS152 Computer Architecture and Engineering SOLUTIONS Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12

Parallelism. Execution Cycle. Dual Bus Simple CPU. Pipelining COMP375 1

Android PerfHUD ES quick start guide

Final CSE 131B Winter 2003

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

Tuning CUDA Applications for Fermi. Version 1.2

EITF20: Computer Architecture Part2.2.1: Pipeline-1

LECTURE 10. Pipelining: Advanced ILP

ECE 252 / CPS 220 Advanced Computer Architecture I. Lecture 14 Very Long Instruction Word Machines

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 15 Very Long Instruction Word Machines

CS152 Computer Architecture and Engineering. Complex Pipelines, Out-of-Order Execution, and Speculation Problem Set #3 Due March 12

ELE 655 Microprocessor System Design

BCC User's Manual BCC. Bare-C Cross-Compiler User's Manual. The most important thing we build is trust

Processor and Peripheral IP Cores for Microcontrollers in Embedded Space Applications

About these Release Notes. Documentation Accessibility. New Features in Pro*COBOL

RMA PROCESS. vr384 October RMA Process

Computer Systems Architecture I. CSE 560M Lecture 10 Prof. Patrick Crowley

CDA 3101: Summer 2018 Project 2 - Pipeline Simulator

Chapter 9. Pipelining Design Techniques

INSTRUCTION LEVEL PARALLELISM

Handout 2 ILP: Part B

GRID SOFTWARE FOR RED HAT ENTERPRISE LINUX WITH KVM VERSION /370.28

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

Pentium 4 Processor Block Diagram

GR740 Technical Note on Benchmarking and Validation

A Boot-Strap Loader and Monitor for SPARC LEON2/3/FT

5008: Computer Architecture

CS152 Computer Architecture and Engineering. Complex Pipelines

Spectre and Meltdown. Clifford Wolf q/talk

ECO and Workarounds for Bugs in ESP32

NVIDIA CAPTURE SDK 7.1 (WINDOWS)

GRID VIRTUAL GPU FOR HUAWEI UVP Version /

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

Pipelining and Vector Processing

CS420/520 Computer Architecture I

Computer Science 146. Computer Architecture

Intel released new technology call P6P

Engineer-to-Engineer Note

Cg Toolkit. Cg 1.4 rc 1 Release Notes

SISTEMI EMBEDDED. Computer Organization Pipelining. Federico Baronti Last version:

CS425 Computer Systems Architecture

Review Tomasulo. Lecture 17: ILP and Dynamic Execution #2: Branch Prediction, Multiple Issue. Tomasulo Algorithm and Branch Prediction

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS

Pipelined Processor Design. EE/ECE 4305: Computer Architecture University of Minnesota Duluth By Dr. Taek M. Kwon

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Vorlesung / Course IN2075: Mikroprozessoren / Microprocessors

GRID SOFTWARE FOR HUAWEI UVP VERSION /370.12

Transcription:

Template: GQMS-TPLT-1-1-0 GR712RC Incorrect Annulation of Floating-point Operation on Instruction Cache Parity Error Technical note 2017-12-21 Doc. No Issue 1.3

Date: 2017-12-21 Page: 2 of 9 CHANGE RECORD Issue Date Section / Page Description 1.2 2017-11-17 All First public release 1.3 2017-12-21 Section 3.2 Overview of compiler workaround and list of toolchains with the workaround available. TABLE OF CONTENTS 1 INTRODUCTION...3 1.1 Scope of the Document...3 1.2 Affected components...3 1.3 Distribution...3 1.4 Contact...3 2 TECHNICAL DESCRIPTION...4 2.1 Overview...4 2.2 Sequence A...4 2.3 Sequence B...5 2.3.1 Effect...6 3 WORKAROUND / MITIGATION...7 3.1 Workaround...7 3.2 Toolchain versions with workaround...7 3.3 Scan script...8 4 FAQ...8 4.1 Are floating-point loads and stores affected?...8

Date: 2017-12-21 Page: 3 of 9 1 INTRODUCTION 1.1 Scope of the Document This document describes a corner case present in the GR712RC floating-point pipeline where an annulled floating-point instruction, with a tag or data parity error in instruction cache combined with an integer branch instruction resolved late in the pipeline, is executed and committed even though it should have been annulled. 1.2 Affected components Cobham and Aeroflex components that are affected are: GR712RC Cobham and Aeroflex components that are unaffected are: GR740 LEON4FT is unaffected LEON3FT-RTAX UT699 UT699E UT700 No GRLIB version is affected from this bug. 1.3 Distribution GR712RC users are free to use the material in this document in their own documents and to redistribute this document. Please contact Cobham Gaisler for inquires on other use. 1.4 Contact For questions on this document, please contact Cobham Gaisler support at support@gaisler.com. When requesting support include the part name if the question is a specific device or the full GRLIB IP library package name if the question relates to a GRLIB IP library license.

Date: 2017-12-21 Page: 4 of 9 2 TECHNICAL DESCRIPTION 2.1 Overview The processor behaviour described in this document can cause an incorrect update of a floatingpoint register or floating-point condition register due to execution and commit of an extra floating point instruction outside of the correct instruction flow. This can in turn cause erroneous behaviour of software. Two different instruction sequences are vulnerable to this issue. They are described in the subsections below. 2.2 Sequence A 1. Any instruction that modifies the integer unit condition codes. 2. Branch instruction with annul bit = '1' that depends on integer condition code 3. The delay slot of the branch instruction is a floating point instruction Notes: All the instructions in the above sequence must be back-to-back in the instruction flow without any intervening instructions. Instructions in 1. include: ANDcc, ANDNcc ORcc, ORNcc XORcc, XNORcc ADDcc, ADDXcc, TADDcc SUBcc, SUBXcc, TSUBcc MULScc, UMULcc, SMULcc UDIVcc, SDIVcc Pseudo instructions: cmp inccc deccc tst btst Instruction in 2.include Bicc,a ( see SPARC architecture manual V8 section B.21 ) Instruction in 3. is the delay slot of 2 and include FPop (see SPARC architecture manual V8 section B.33)

Date: 2017-12-21 Page: 5 of 9 Example: cmp %l0,%l1 (subcc %l0,%l1,%g0) bne,a L1 fmuld %f10,%f10,%f16 In order for the erroneous behaviour to be encountered, the two following conditions must met: 1. Branch is not taken. 2. Floating point operation or branch with floating point condition at the delay slot of 1. has a tag or data parity error in instruction cache (can be caused by radiation induced single-event upset or inserted by force using the error injection diagnostic interface) by the time it is fetched from the instruction cache. 2.3 Sequence B 1. Any instruction that modifies the integer unit condition codes. 2. Branch instruction that depends on integer condition code 3. Any instruction in the delay slot of the branch. 4. At the branch target address, a floating point operate instruction or a branch with floating point condition code. Notes: Instructions 1.2.3 in the above sequence must be back-to-back in the instruction flow without any intervening instructions. Instruction 4 is the branch target address of instruction 2 which is executed after instruction 3. Instruction in 1. include: same as Sequence A. Instructions in 2. include Bicc ( see SPARC architecture manual V8 section B.21 ) Bicc,a ( see SPARC architecture manual V8 section B.21 ) Instruction in 3. is the delay slot of 2. Instruction in 4. include: FPop (see SPARC architecture manual V8 section B.33) FBfcc (see sparc architecture manual V8 section B.22)

Date: 2017-12-21 Page: 6 of 9 Example: cmp %l0,%l1 (subcc %l0,%l1,%g0) be L1 nop L1: fmuld %f10,%f10,%f16 Another example: cmp %l0,%l1 (subcc %l0,%l1,%g0) be L1 nop L1: fbe L2 In order for the erroneous behaviour to be encountered, the two following conditions must met: 1. Branch is not taken. 2. Floating point operation or branch with floating point condition at the branch target address of 1. should have a tag or data parity error in instruction cache (can be caused by radiation induced single-event upset or inserted by force using the error injection diagnostic interface) by the time it is fetched from the instruction cache. 2.3.1 Effect When erroneous behaviour is encountered an extra floating point operation will be executed and committed which must not happen in a correct execution flow. What floating-point operation is performed and which register is affected is not predictable. Whether this creates a software error depends on if the modified register and/or condition code is subsequently used. It should be noted that the two sequences (A,B) can be combined and two consecutively annulled floating point operations can be executed wrongly.

Date: 2017-12-21 Page: 7 of 9 Example: In this example, if the branch is not taken and both of the floating point instructions encounter a parity error in the instruction cache while they are fetched then both of them will be executed and incorrectly committed. cmp %l0,%l1 bne,a L1 fmuld %f10,%f10,%f16.... L1: fadd %f12,%f14,%f8 3 WORKAROUND / MITIGATION 3.1 Workaround Erroneously executed floating point operations can be avoided by avoiding sequences A and B. Sequence A can be avoided by avoiding floating point operations in delay slots of conditional integer branches with the annul bit set. Sequence B can be avoided by moving all floating point operations and floating point branches that resides on branch targets by inserting a nop instruction on those branch targets. 3.2 Toolchain versions with workaround Errata workarounds are available for BCC, RCC, and VxWorks. The workarounds include compiler fixes as well as patched assembly code for the system libraries. The following release versions and later will include the workarounds in toolchain and OS/libraries: BCC 1.0.50 BCC 2.0.2 RCC 1.2.22 RCC 1.3-RC3 VxWorks 6.7 1.0.21 Toolchain 1.0.15 VxWorks 6.9 2.0.8 Toolchain 1.0.4 VxWorks 7 Toolchain 7.2.0.0 MKPROM 2.0.63

Date: 2017-12-21 Page: 8 of 9 The compiler workaround will prevent any floating-point operation from being placed in the delay slot of conditional integer branches with the annul bit set. It will also place a NOP instruction at the branch target of an integer branch if it is a floating-point operation or a floating-point branch. The workaround is thus in accordance with the method described in the previous section. Note that the compiler workaround will not make any modifications to inline assembly code in the user application. Any sensitive assembly instruction sequence will have to be modified manually. The compiler workaround is activated by compiling with the "-mfix-gr712rc" flag. The patches for the compiler workaround are listed below. They are included in the official GCC distribution for version 7.3.0 and later. [SPARC] Errata workaround for https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255234 * config/sparc/sparc.c (sparc_do_work_around_errata): Use mem_ref https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255393 SPARC: Make sure that jump is to a label in errata workaround https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=255807 3.3 Scan script A scan script is provided to allow scans of disassembled programs for code that can trigger the issue described in this document. The script is available from: http://www.gaisler.com/notes Please see the header in the script file for usage instructions. 4 FAQ 4.1 Are floating-point loads and stores affected? No, loads and stores are not affected and can be placed in delay slots and branch targets.

Date: 2017-12-21 Page: 9 of 9 Copyright 2017 Cobham Gaisler. Information furnished by Cobham Gaisler is believed to be accurate and reliable. However, no responsibility is assumed by Cobham Gaisler for its use, or for any infringements of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of Cobham Gaisler. All information is provided as is. There is no warranty that it is correct or suitable for any purpose, neither implicit nor explicit.