Enhancing LLVM's Floating-Point Exception and Rounding Mode Support. Andy Kaylor, David Kreitzer, Intel Corporation
2 What's Needed?
- The user-controlled rounding mode needs to be respected by the optimizer.
- FP exception status flags need to be correctly maintained:
  - No exceptions hidden by optimizations
  - No false exceptions introduced by optimizations
- FP instruction side effects (in existing intrinsics) need to be modeled.
- Extra support is needed for masked vector operations:
  - Masked-off lanes shouldn't raise exceptions
- Other issues?
3 Proposed Solution
  = thread_local global i8, section llvm.metadata
  declare %lhs, double %rhs, metadata %rounding_behavior, metadata %exception_behavior, i8* %fp_env)

- metadata %rounding_behavior: What passes can assume about rounding (DYNAMIC, TONEAREST, DOWNWARD, UPWARD,
- metadata %exception_behavior: What passes can assume about exceptions (IGNORE, RETURN, MAYTRAP)
- i8* %fp_env: Opaque reference to the FP environment. Must

Hat tip to Chandler Carruth:
4 Rounding Behavior
The rounding behavior argument is information for the optimizer. It is not equivalent to the actual runtime rounding mode.

  define %a, double %b) {
    %rm = call                ; We can't use this value at compile time.
    %add1 = call (double %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FPEXCEPT_RETURN,
    %result = call            ; FE_TOWARDZERO -- Now we know.
    %add2 = call (double %a, double %b, metadata !LLVM_ROUND_TOWARDZERO, metadata !LLVM_FPEXCEPT_RETURN,
5 What happens in CodeGen?
The proposal to this point centers on making things work in IR. Do we also need a way to make sure CodeGen behaves?
Possible solutions:
- Defer the lowering of the intrinsic as late as possible.
- Correctly model the registers used by FP operations. For x86 targets, at least, the implicit uses of MXCSR and of the x87 control and status registers are not currently modeled.
- Add attributes describing the rounding and exception behavior.
6 Masked Vector Operations
We're teaching the vectorizer to create masked vector operations. We need to avoid false exceptions when the target hardware does not support masking.

  declare <2 x (<2 x double> %a, <2 x double> %b, <2 x i1> %mask, <2 x double> %passthru, metadata %rounding_behavior, metadata %exception_behavior, i8* %fp_env)
7 Other concerns?
8 Backup
9 Why?
- #pragma STDC FENV_ACCESS ON
- More pragmas on the way
- -ftrapping-math
- New masked vector operations
10 Implementation Goals
- No changes required to maintain the current handling when the default modes are used.
- Limit the scope of changes needed for conservatively correct behavior.
- Allow a path to optimize constrained FP code.
- Try not to limit potential vectorization.
11 Example: Constant Folding
  define
    %div = fdiv double e+00, e+01
    ret double %div

After Sparse Conditional Constant Propagation:
  define {
    ret double e-01

Looks right, but the folded constant must be rounded!
12 Example: Rounding (a - 0)
  define %a){
    %sub = fsub double %a, e+00
    ret double %sub

After Early CSE:
  define %a) {
    ret double %a

This is incorrect if %a is zero and the rounding mode is FE_DOWNWARD!
13 Example: Rounding (-(-a)*b)
  define %a, double %b){
    %sub = fsub double e+00, %a
    %mul = fmul double %sub, %b
    %sub1 = fsub double e+00, %mul
    ret double %sub1

After Combine Redundant Instructions:
  define %a, double %b) {
    %mul = fmul double %a, %b
    ret double %mul

Optimized code produces a different result for some rounding modes!
14 Example: Speculative Execution
  define %n, double %d) {
    %cmp = icmp sgt i32 %n, 0
    br i1 %cmp, label %if.then, label %if.end
  if.then:
    %add = fadd double e+00, %d
    br label %if.end
  if.end:
    %d.0 = phi double [%add, %if.then], [ %d, %entry ]
    ret double %d.0

After Simplify CFG:
  define %n, double %d) {
    %cmp = icmp sgt i32 %n, 0
    %add = fadd double e+00, %d
    %0 = select i1 %cmp, double %add, double %d
    ret double %0

Speculative execution may set exception status flags!
15 Example: Side Effects
  define <4 x x float> %v) {
    %tmp = alloca i32, align 4
    %tmp1 = alloca i32, align 4
    %0 = bitcast i32* %tmp to i8*
    call %0)
    %stmxcsr = load i32, i32* %tmp, align 4
    %or = or i32 %stmxcsr,
    store i32 %or, i32* %tmp1, align 4
    %1 = bitcast i32* %tmp1 to i8*
    call %1)
    %floorint = call <4 x x float> %0) readnone
    %result = sitofp <4 x i32> %floorint to <4 x float>
    call %0)
    ret %result

The instructions in bold all implicitly use MXCSR!
16 The FP Environment
The FP environment is kind of an abstract idea. On Intel64 targets, for instance, it consists of:
- The x87 FPU status register
- The x87 FPU control register
- The MXCSR register
How should we be modeling its use? Implicit state, SSA value, or intrinsic global?
If an explicit value is used, it cannot be used outside the new intrinsics.
17 Example: Implicit State
  define %a, double %b, double %c) {
    %add1 = call (double %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN)
    call %add1)
    %add2 = call (double %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN)
    %add3 = call (double %add2, double %c, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN)
    ret double %add3

Problem: The intrinsic properties must be very restrictive.
18 Example: SSA Value
  define %a, double %b, double %c) {
    %fenv = call
    %add1.ret = call { double, %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN, token %fenv)
    %fenv.2 = extractvalue { double, token %add1.ret, 1
    call token %fenv.1)
    %add1 = extractvalue { double, token %add1.ret, 0
    call %add1) [ fp_env (token %fenv.2)]
    %fenv.2 = call
    %add2.ret = call { double, %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN, token %fenv.2)
    %fenv.3 = extractvalue { double, token %add2.ret, 1
    %add2 = extractvalue { double, token %add2.ret, 0
    %add3.ret = call { double, %add2, double %c, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN, token %fenv.3)
    %fenv.4 = extractvalue { double, token %add3.env, 0
    call token %fenv.4)
    %add3 = extractvalue { double, token %add3.env, 0
    ret float %add3

Problem: We aren't allowed to use tokens this way.
19 Example: Intrinsic
  = thread_local global i8, section llvm.metadata

  define %a, double %b, double %c) {
    %add1 = call (double %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN,
    call
    %add2 = call (double %a, double %b, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN,
    %add3 = call (double %add2, double %c, metadata !LLVM_ROUND_DYNAMIC, metadata !LLVM_FEEXCEPT_RETURN,
    ret float %add3
20 Bad Masked Vector Lowering (hypothetical)
  %div.v = call <8 x (<8 x double> %a, <8 x double> %b, <8 x i1> %mask, <8 x double> %passthru, i32 1, i32 -1,

AVX-512F:
  %vreg4 = VDIVPDZrrk %vreg0, %vreg3, %vreg1, %vreg2

AVX (bad code):
  %vreg4 = V_SET0
  %vreg5 = VPCMPEQDrr %vreg3, %vreg4
  %vreg6 = V_SETALLONES
  %vreg7 = VPXORrr %vreg5, %vreg6
  %vreg8 = VPMOVSXDQYrr %vreg7
  %vreg9 = VDIVPDYrr %vreg1, %vreg2
  %vreg10 = VBLENDVPDYrr %vreg0, %vreg9, %vreg8

This may raise false divide-by-zero exceptions!
21 Command Line Options
clang currently recognizes the following command line options:
- -ffast-math: Allow aggressive, lossy floating-point optimizations
- -ffinite-math-only: Assume no NaNs or infinities are generated
- -ffloat-store: Don't allocate floats and doubles in extended precision registers
- -frounding-math: Disable optimizations that assume default FP rounding behavior
- -fsignaling-nans: Disable optimizations observable by IEEE signaling NaNs
- -fsigned-zeros: Disable floating point optimizations that ignore the IEEE signedness of zero
- -fsingle-precision-constants: Convert floating point constants to single precision constants
- -ftrapping-math: Assume floating point operations can trap
- -funsafe-math-optimizations: Allow math optimizations that may violate IEEE or ISO standards
Some of these are ignored. Others are implemented using the fast math flags.
22 FP-related pragmas
C99
- FENV_ACCESS (on/off)
- FP_CONTRACT (on/off)
- CX_LIMITED_RANGE (on/off)
ISO/IEC TS :2014
- FENV_ROUND (dynamic/direction)
ISO/IEC TS :2015
- FENV_DEC_ROUND (dynamic/direction)
ISO/IEC TS :2016
- FENV_FLT_EVAL_METHOD (width)
- FENV_DEC_EVAL_METHOD (width)
- FENV_ALLOW_VALUE_CHANGING_OPTIMIZATION (on/off)
- FENV_ALLOW_ASSOCIATIVE_LAW (on/off)
- FENV_ALLOW_DISTRIBUTIVE_LAW (on/off)
- FENV_ALLOW_MULTIPLY_BY_RECIPROCAL (on/off)
- FENV_ALLOW_ZERO_SUBNORMAL (on/off)
- FENV_ALLOW_CONTRACT_FMA (on/off)
- FENV_ALLOW_CONTRACT_OPERATION_CONVERSION (on/off)
- FENV_ALLOW_CONTRACT (on/off)
- FENV_REPRODUCIBLE (on/off)
- FENV_EXCEPT (action except-list)
23 <fenv.h> Functions
  int feclearexcept(int except);
  int fegetexceptflag(fexcept_t *pflag, int except);
  int feraiseexcept(int except);
  int fesetexceptflag(const fexcept_t *pflag, int except);
  int fetestexcept(int except);
  int fegetround(void);
  int fesetround(int mode);
  int fegetenv(fenv_t *penv);
  int feholdexcept(fenv_t *penv);
  int fesetenv(const fenv_t *penv);
  int feupdateenv(const fenv_t *penv);
More informationTutorial: Building a backend in 24 hours. Anton Korobeynikov
Tutorial: Building a backend in 24 hours Anton Korobeynikov anton@korobeynikov.info Outline 1. Codegen phases and parts 2. The Target 3. First steps 4. Custom lowering 5. Next steps Codegen Phases Preparation
More informationAdvanced OpenMP Features
Christian Terboven, Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group {terboven,schmidl@itc.rwth-aachen.de IT Center der RWTH Aachen University Vectorization 2 Vectorization SIMD =
More informationCS 426 Fall Machine Problem 4. Machine Problem 4. CS 426 Compiler Construction Fall Semester 2017
CS 426 Fall 2017 1 Machine Problem 4 Machine Problem 4 CS 426 Compiler Construction Fall Semester 2017 Handed out: November 16, 2017. Due: December 7, 2017, 5:00 p.m. This assignment deals with implementing
More informationCSE 401/M501 Compilers
CSE 401/M501 Compilers Intermediate Representations Hal Perkins Autumn 2018 UW CSE 401/M501 Autumn 2018 G-1 Agenda Survey of Intermediate Representations Graphical Concrete/Abstract Syntax Trees (ASTs)
More informationLDC: The LLVM-based D Compiler
LDC: The LLVM-based D Compiler Using LLVM as backend for a D compiler Kai Nacke 02/02/14 LLVM devroom @ FOSDEM 14 Agenda Brief introduction to D Internals of the LDC compiler Used LLVM features Possible
More informationComputer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University
Computer Architecture Chapter 3 Fall 2005 Department of Computer Science Kent State University Objectives Signed and Unsigned Numbers Addition and Subtraction Multiplication and Division Floating Point
More informationOverview (4) CPE 101 mod/reusing slides from a UW course. Assignment Statement: Review. Why Study Expressions? D-1
CPE 101 mod/reusing slides from a UW course Overview (4) Lecture 4: Arithmetic Expressions Arithmetic expressions Integer and floating-point (double) types Unary and binary operators Precedence Associativity
More informationCS240: Programming in C
CS240: Programming in C Lecture 5: Functions. Scope of variables. Program structure. Cristina Nita-Rotaru Lecture 5/ Fall 2013 1 Functions: Explicit declaration Declaration, definition, use, order matters.
More informationType Checking. Chapter 6, Section 6.3, 6.5
Type Checking Chapter 6, Section 6.3, 6.5 Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens Syntax analyzer (aka parser) Creates a parse tree
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2010 Lecture 6: VLIW 563 L06.1 Fall 2010 Little s Law Number of Instructions in the pipeline (parallelism) = Throughput * Latency or N T L Throughput per Cycle
More informationImproving Compiler Optimizations using Program Annotations
Improving Compiler Optimizations using Program Annotations BY NIKO ZARZANI Laurea, Politecnico di Milano, Milan, Italy, 2011 THESIS Submitted as partial fulfillment of the requirements for the degree of
More informationImproving Numerical Reproducibility in C/C++/Fortran
Improving Numerical Reproducibility in C/C++/Fortran Steve Lionel Intel Corporation steve.lionel@intel.com 1 The Three Objectives Accuracy Reproducibility Performance Pick two Reproducibility Consistent
More informationLecture 7 CIS 341: COMPILERS
Lecture 7 CIS 341: COMPILERS Announcements HW2: X86lite Available on the course web pages. Due: TOMORROW at 11:59:59pm NOTE: submission server was broken last night/this morning It should now support Ocaml
More information