Enhancing LLVM's Floating-Point Exception and Rounding Mode Support. Andy Kaylor, David Kreitzer, Intel Corporation


2. What's Needed?
- User-controlled rounding mode needs to be respected by the optimizer
- FP exception status flags need to be correctly maintained
  - No exceptions hidden by optimizations
  - No false exceptions introduced by optimizations
- FP instruction side effects (in existing intrinsics) need to be modeled
- Need extra support for masked vector operations
  - Masked-off lanes shouldn't raise exceptions
- Other issues?

3. Proposed Solution

    = thread_local global i8, section llvm.metadata
    declare %lhs, double %rhs, metadata %rounding_behavior, metadata %exception_behavior, i8* %fp_env)

- %rounding_behavior: what passes can assume about rounding (DYNAMIC, TONEAREST, DOWNWARD, UPWARD, ...)
- %exception_behavior: what passes can assume about exceptions (IGNORE, RETURN, MAYTRAP)
- %fp_env: opaque reference to the FP environment. Must ...

Hat tip to Chandler Carruth.

4. Rounding Behavior
The rounding behavior argument is information for the optimizer. It is not equivalent to the actual runtime rounding mode.

    define %a, double %b) {
      %rm = call                ; We can't use this value at compile time.
      %add1 = call (double %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FPEXCEPT_RETURN,
      %result = call            ; FE_TOWARDZERO -- Now we know.
      %add2 = call (double %a, double %b, metadata! LLVM_ROUND_TOWARDZERO, metadata! LLVM_FPEXCEPT_RETURN,

5. What happens in CodeGen?
The proposal to this point centers on making things work in IR. Do we need a way to make sure CodeGen behaves also?
Possible solutions:
- Defer the lowering of the intrinsic as late as possible.
- Correctly model the registers used by FP operations. For x86 targets, at least, the implicit uses of MXCSR and the x87 control and status registers are not currently modeled.
- Add attributes describing the rounding and exception behavior.

6. Masked Vector Operations
- We're teaching the vectorizer to create masked vector operations.
- We need to avoid false exceptions when the target hardware does not support masking.

    declare <2 x (<2 x double> %a, <2 x double> %b, <2 x i1> %mask, <2 x double> %passthru, metadata %rounding_behavior, metadata %exception_behavior, i8* %fp_env)

7. Other concerns?

8. Backup

9. Why?
- #pragma STDC FENV_ACCESS ON
- More pragmas on the way
- -ftrapping-math
- New masked vector operations

10. Implementation Goals
- No changes required to maintain the current handling when the default modes are used.
- Limit scope of changes needed for conservatively correct behavior.
- Allow a path to optimize constrained FP code.
- Try not to limit potential vectorization.

11. Example: Constant Folding

    define
      %div = fdiv double e+00, e+01
      ret double %div

Sparse Conditional Constant Propagation:

    define {
      ret double e-01

Looks right, but the folded constant must be rounded!

12. Example: Rounding (a - 0)

    define %a){
      %sub = fsub double %a, e+00
      ret double %sub

Early CSE:

    define %a) {
      ret double %a

This is incorrect if %a is zero and the rounding mode is FE_DOWNWARD!

13. Example: Rounding (-(-a)*b)

    define %a, double %b){
      %sub = fsub double e+00, %a
      %mul = fmul double %sub, %b
      %sub1 = fsub double e+00, %mul
      ret double %sub1

Combine Redundant Instructions:

    define %a, double %b) {
      %mul = fmul double %a, %b
      ret double %mul

Optimized code produces a different result for some rounding modes!

14. Example: Speculative Execution

    define %n, double %d) {
      %cmp = icmp sgt i32 %n, 0
      br i1 %cmp, label %if.then, label %if.end
    if.then:
      %add = fadd double e+00, %d
      br label %if.end
    if.end:
      %d.0 = phi double [%add, %if.then], [ %d, %entry ]
      ret double %d.0

Simplify CFG:

    define %n, double %d) {
      %cmp = icmp sgt i32 %n, 0
      %add = fadd double e+00, %d
      %0 = select i1 %cmp, double %add, double %d
      ret double %0

Speculative execution may set exception status flags!

15. Example: Side Effects

    define <4 x x float> %v) {
      %tmp = alloca i32, align 4
      %tmp1 = alloca i32, align 4
      %0 = bitcast i32* %tmp to i8*
      call %0)
      %stmxcsr = load i32, i32* %tmp, align 4
      %or = or i32 %stmxcsr,
      store i32 %or, i32* %tmp1, align 4
      %1 = bitcast i32* %tmp1 to i8*
      call %1)
      %floorint = call <4 x x float> %0) readnone
      %result = sitofp <4 x i32> %floorint to <4 x float>
      call %0)
      ret %result

The instructions shown in bold on the slide all implicitly use MXCSR!

16. The FP Environment
The FP environment is kind of an abstract idea. On Intel64 targets, for instance, it consists of:
- The x87 FPU status register
- The x87 FPU control register
- The MXCSR register

How should we be modeling its use? Implicit state, SSA value, or intrinsic global?
If an explicit value is used, it cannot be used outside the new intrinsics.

17. Example: Implicit State

    define %a, double %b, double %c) {
      %add1 = call (double %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN )
      call %add1)
      %add2 = call (double %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN )
      %add3 = call (double %add2, double %c, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN )
      ret double %add3

Problem: The intrinsic properties must be very restrictive.

18. Example: SSA Value

    define %a, double %b, double %c) {
      %fenv = call
      %add1.ret = call { double, %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN, token %fenv)
      %fenv.2 = extractvalue { double, token %add1.ret, 1
      call token %fenv.1)
      %add1 = extractvalue { double, token %add1.ret, 0
      call %add1) [ fp_env (token %fenv.2)]
      %fenv.2 = call
      %add2.ret = call { double, %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN, token %fenv.2)
      %fenv.3 = extractvalue { double, token %add2.ret, 1
      %add2 = extractvalue { double, token %add2.ret, 0
      %add3.ret = call { double, %add2, double %c, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN, token %fenv.3)
      %fenv.4 = extractvalue { double, token %add3.env, 0
      call token %fenv.4)
      %add3 = extractvalue { double, token %add3.env, 0
      ret float %add3

Problem: We aren't allowed to use tokens this way.

19. Example: Intrinsic

    = thread_local global i8, section llvm.metadata
    define %a, double %b, double %c) {
      %add1 = call (double %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN,
      call
      %add2 = call (double %a, double %b, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN,
      %add3 = call (double %add2, double %c, metadata! LLVM_ROUND_DYNAMIC, metadata! LLVM_FEEXCEPT_RETURN,
      ret float %add3

20. Bad Masked Vector Lowering (hypothetical)

    %div.v = call <8 x (<8 x double> %a, <8 x double> %b, <8 x i1> %mask, <8 x double> %passthru, i32 1, i32 -1,

AVX-512F:

    %vreg4 = VDIVPDZrrk %vreg0, %vreg3, %vreg1, %vreg2

AVX (bad code):

    %vreg4 = V_SET0
    %vreg5 = VPCMPEQDrr %vreg3, %vreg4
    %vreg6 = V_SETALLONES
    %vreg7 = VPXORrr %vreg5, %vreg6
    %vreg8 = VPMOVSXDQYrr %vreg7
    %vreg9 = VDIVPDYrr %vreg1, %vreg2
    %vreg10 = VBLENDVPDYrr %vreg0, %vreg9, %vreg8

This may raise false divide-by-zero exceptions!

21. Command Line Options
clang currently recognizes the following command line options:
  -ffast-math : Allow aggressive, lossy floating-point optimizations
  -ffinite-math-only : Assume no NaNs or infinities are generated
  -ffloat-store : Don't allocate floats and doubles in extended precision registers
  -frounding-math : Disable optimizations that assume default FP rounding behavior
  -fsignaling-nans : Disable optimizations observable by IEEE signaling NaNs
  -fsigned-zeros : Disable floating point optimizations that ignore the IEEE signedness of zero
  -fsingle-precision-constants : Convert floating point constants to single precision constants
  -ftrapping-math : Assume floating point operations can trap
  -funsafe-math-optimizations : Allow math optimizations that may violate IEEE or ISO standards
Some of these are ignored. Others are implemented using the fast math flags.

22. FP-related pragmas
C99
  FENV_ACCESS (on/off)
  FP_CONTRACT (on/off)
  CX_LIMITED_RANGE (on/off)
ISO/IEC TS :2014
  FE_ROUND (dynamic/direction)
ISO/IEC TS :2015
  FE_DEC_ROUND (dynamic/direction)
ISO/IEC TS :2016
  FENV_FLT_EVAL_METHOD (width)
  FENV_DEC_EVAL_METHOD (width)
  FENV_ALLOW_VALUE_CHANGING_OPTIMIZATION (on/off)
  FENV_ALLOW_ASSOCIATIVE_LAW (on/off)
  FENV_ALLOW_DISTRIBUTIVE_LAW (on/off)
  FENV_ALLOW_MULTIPLY_BY_RECIPROCAL (on/off)
  FENV_ALLOW_ZERO_SUBNORMAL (on/off)
  FENV_ALLOW_CONTRACT_FMA (on/off)
  FENV_ALLOW_CONTRACT_OPERATION_CONVERSION (on/off)
  FENV_ALLOW_CONTRACT (on/off)
  FENV_REPRODUCIBLE (on/off)
  FENV_EXCEPT (action except-list)

23. <fenv.h> Functions
  int feclearexcept(int except);
  int fegetexceptflag(fexcept_t *pflag, int except);
  int feraiseexcept(int except);
  int fesetexceptflag(const fexcept_t *pflag, int except);
  int fetestexcept(int except);
  int fegetround(void);
  int fesetround(int mode);
  int fegetenv(fenv_t *penv);
  int feholdexcept(fenv_t *penv);
  int fesetenv(const fenv_t *penv);
  int feupdateenv(const fenv_t *penv);


More information

Sequence 5.1 Building stack frames in LLVM

Sequence 5.1 Building stack frames in LLVM Sequence 5.1 Building stack frames in LLVM P. de Oliveira Castro S. Tardieu 1/13 P. de Oliveira Castro, S. Tardieu Reminder: Stack frames We have seen earlier that: A function can access its local variables

More information

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory

Announcements. ECE4750/CS4420 Computer Architecture L11: Speculative Execution I. Edward Suh Computer Systems Laboratory ECE4750/CS4420 Computer Architecture L11: Speculative Execution I Edward Suh Computer Systems Laboratory suh@csl.cornell.edu Announcements Lab3 due today 2 1 Overview Branch penalties limit performance

More information

OpenMP: Vectorization and #pragma omp simd. Markus Höhnerbach

OpenMP: Vectorization and #pragma omp simd. Markus Höhnerbach OpenMP: Vectorization and #pragma omp simd Markus Höhnerbach 1 / 26 Where does it come from? c i = a i + b i i a 1 a 2 a 3 a 4 a 5 a 6 a 7 a 8 + b 1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 = c 1 c 2 c 3 c 4 c 5 c

More information

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany

Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Guy Blank Intel Corporation, Israel March 27-28, 2017 European LLVM Developers Meeting Saarland Informatics Campus, Saarbrücken, Germany Motivation C AVX2 AVX512 New instructions utilized! Scalar performance

More information

Floating-Point Arithmetic

Floating-Point Arithmetic ENEE446---Lectures-4/10-15/08 A. Yavuz Oruç Professor, UMD, College Park Copyright 2007 A. Yavuz Oruç. All rights reserved. Floating-Point Arithmetic Integer or fixed-point arithmetic provides a complete

More information

Lecture 4 Overview of the LLVM Compiler

Lecture 4 Overview of the LLVM Compiler Lecture 4 Overview of the LLVM Compiler Pratik Fegade Thanks to: Vikram Adve, Jonathan Burket, Deby Katz, David Koes, Chris Lattner, Gennady Pekhimenko, and Olatunji Ruwase, for their slides Visualizing

More information

Lecture 4 More on the LLVM Compiler

Lecture 4 More on the LLVM Compiler Lecture 4 More on the LLVM Compiler Abhilasha Jain Thanks to: Jonathan Burket, Deby Katz, Gabe Weisz, Luke Zarko, and Dominic Chen for their slides Visualizing the LLVM Compiler System C C++ Java Source

More information

Homework #3: CMPT-379

Homework #3: CMPT-379 Only submit answers for questions marked with. Homework #3: CMPT-379 Download the files for this homework: wget http://www.cs.sfu.ca/ msiahban/personal/teaching/cmpt-379-spring-2016/hw3.tgz Put your solution

More information

OptiCode: Machine Code Deobfuscation for Malware Analysis

OptiCode: Machine Code Deobfuscation for Malware Analysis OptiCode: Machine Code Deobfuscation for Malware Analysis NGUYEN Anh Quynh, COSEINC CONFidence, Krakow - Poland 2013, May 28th 1 / 47 Agenda 1 Obfuscation problem in malware analysis

More information

HSAIL: PORTABLE COMPILER IR FOR HSA

HSAIL: PORTABLE COMPILER IR FOR HSA HSAIL: PORTABLE COMPILER IR FOR HSA HOT CHIPS TUTORIAL - AUGUST 2013 BEN SANDER AMD SENIOR FELLOW STATE OF GPU COMPUTING GPUs are fast and power efficient : high compute density per-mm and per-watt But:

More information

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont)

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont) LLVM Compiler Infrastructure Source: LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation by Lattner and Adve Reuse Optimization Eliminate redundant operations in the dynamic execution

More information

Compiler Options. Linux/x86 Performance Practical,

Compiler Options. Linux/x86 Performance Practical, Center for Information Services and High Performance Computing (ZIH) Compiler Options Linux/x86 Performance Practical, 17.06.2009 Zellescher Weg 12 Willers-Bau A106 Tel. +49 351-463 - 31945 Ulf Markwardt

More information

Compiler construction. Course info. Today. Lecture 1: Introduction and project overview. Compiler Construction Why learn to write a compiler?

Compiler construction. Course info. Today. Lecture 1: Introduction and project overview. Compiler Construction Why learn to write a compiler? Today Compiler construction Lecture 1: Introduction and project overview Course info Introduction to compiling Some examples Project description Magnus Myreen Spring 2018 Chalmers University of Technology

More information

x86: assembly for a real machine Compiler construction 2012 x86 assembler, a first example Example explained Lecture 7

x86: assembly for a real machine Compiler construction 2012 x86 assembler, a first example Example explained Lecture 7 x86 architecture Compiler construction 2012 x86: assembly for a real machine x86 architecture Calling conventions Some x86 instructions Instruction selection Instruction scheduling Register allocation

More information

Measuring the User Debugging Experience. Greg Bedwell Sony Interactive Entertainment

Measuring the User Debugging Experience. Greg Bedwell Sony Interactive Entertainment Measuring the User Debugging Experience Greg Bedwell Sony Interactive Entertainment introducing DExTer introducing Debugging Experience Tester introducing Debugging Experience Tester (currently in internal

More information

Assignment 1c: Compiler organization and backend programming

Assignment 1c: Compiler organization and backend programming Assignment 1c: Compiler organization and backend programming Roel Jordans 2016 Organization Welcome to the third and final part of assignment 1. This time we will try to further improve the code generation

More information

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks

4/1/15 LLVM AND SSA. Low-Level Virtual Machine (LLVM) LLVM Compiler Infrastructure. LL: A Subset of LLVM. Basic Blocks 4//5 Low-Level Virtual Machine (LLVM) LLVM AND SSA Slides adapted from those prepared by Steve Zdancewic at Penn Open-Source Compiler Infrastructure see llvm.org for full documntation Created by Chris

More information

Tutorial: Building a backend in 24 hours. Anton Korobeynikov

Tutorial: Building a backend in 24 hours. Anton Korobeynikov Tutorial: Building a backend in 24 hours Anton Korobeynikov anton@korobeynikov.info Outline 1. Codegen phases and parts 2. The Target 3. First steps 4. Custom lowering 5. Next steps Codegen Phases Preparation

More information

Advanced OpenMP Features

Advanced OpenMP Features Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group {terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University Vectorization 2 Vectorization SIMD =

More information

CS 426 Fall Machine Problem 4. Machine Problem 4. CS 426 Compiler Construction Fall Semester 2017

CS 426 Fall Machine Problem 4. Machine Problem 4. CS 426 Compiler Construction Fall Semester 2017 CS 426 Fall 2017 1 Machine Problem 4 Machine Problem 4 CS 426 Compiler Construction Fall Semester 2017 Handed out: November 16, 2017. Due: December 7, 2017, 5:00 p.m. This assignment deals with implementing

More information

CSE 401/M501 Compilers

CSE 401/M501 Compilers CSE 401/M501 Compilers Intermediate Representations Hal Perkins Autumn 2018 UW CSE 401/M501 Autumn 2018 G-1 Agenda Survey of Intermediate Representations Graphical Concrete/Abstract Syntax Trees (ASTs)

More information

LDC: The LLVM-based D Compiler

LDC: The LLVM-based D Compiler LDC: The LLVM-based D Compiler Using LLVM as backend for a D compiler Kai Nacke 02/02/14 LLVM devroom @ FOSDEM 14 Agenda Brief introduction to D Internals of the LDC compiler Used LLVM features Possible

More information

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University Computer Architecture Chapter 3 Fall 2005 Department of Computer Science Kent State University Objectives Signed and Unsigned Numbers Addition and Subtraction Multiplication and Division Floating Point

More information

Overview (4) CPE 101 mod/reusing slides from a UW course. Assignment Statement: Review. Why Study Expressions? D-1

Overview (4) CPE 101 mod/reusing slides from a UW course. Assignment Statement: Review. Why Study Expressions? D-1 CPE 101 mod/reusing slides from a UW course Overview (4) Lecture 4: Arithmetic Expressions Arithmetic expressions Integer and floating-point (double) types Unary and binary operators Precedence Associativity

More information

CS240: Programming in C

CS240: Programming in C CS240: Programming in C Lecture 5: Functions. Scope of variables. Program structure. Cristina Nita-Rotaru Lecture 5/ Fall 2013 1 Functions: Explicit declaration Declaration, definition, use, order matters.

More information

Type Checking. Chapter 6, Section 6.3, 6.5

Type Checking. Chapter 6, Section 6.3, 6.5 Type Checking Chapter 6, Section 6.3, 6.5 Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens Syntax analyzer (aka parser) Creates a parse tree

More information

Advanced Computer Architecture

Advanced Computer Architecture ECE 563 Advanced Computer Architecture Fall 2010 Lecture 6: VLIW 563 L06.1 Fall 2010 Little s Law Number of Instructions in the pipeline (parallelism) = Throughput * Latency or N T L Throughput per Cycle

More information

Improving Compiler Optimizations using Program Annotations

Improving Compiler Optimizations using Program Annotations Improving Compiler Optimizations using Program Annotations BY NIKO ZARZANI Laurea, Politecnico di Milano, Milan, Italy, 2011 THESIS Submitted as partial fulfillment of the requirements for the degree of

More information

Improving Numerical Reproducibility in C/C++/Fortran

Improving Numerical Reproducibility in C/C++/Fortran Improving Numerical Reproducibility in C/C++/Fortran Steve Lionel Intel Corporation steve.lionel@intel.com 1 The Three Objectives Accuracy Reproducibility Performance Pick two Reproducibility Consistent

More information

Lecture 7 CIS 341: COMPILERS

Lecture 7 CIS 341: COMPILERS Lecture 7 CIS 341: COMPILERS Announcements HW2: X86lite Available on the course web pages. Due: TOMORROW at 11:59:59pm NOTE: submission server was broken last night/this morning It should now support Ocaml

More information