LLVM IR Code Generations Inside YACC. Li-Wei Kuo


LLVM IR
LLVM code representation:
  In-memory compiler IR (intermediate representation)
  On-disk bitcode representation (*.bc)
  Human-readable assembly language, LLVM IR (*.ll): our target
LLVM IR is in SSA (static single assignment) form:
  Each variable is assigned exactly once.
  Use-def chains are explicit and each contains a single element.

LLVM commands
Generate the bitcode (*.bc), then disassemble it:
  $ clang -c -emit-llvm a.c -o a.bc
  $ llvm-dis a.bc -o a.ll
Generate the human-readable *.ll directly:
  $ clang -S -emit-llvm a.c -o a.ll
Run bitcode or IR with the interpreter:
  $ lli test.bc
  $ lli test.ll

LLVM IR example
[Figure: test1.c compiled by clang into test1.ll, with the header, global, function, local, and body parts of the IR labeled.]

LLVM Module
In LLVM, a module represents a single unit of code that is to be processed together. A module contains things like global variables, function declarations, and function implementations.
Format:
  ; ModuleID = 'file name'

Target data layout & triple
Data layout: a module may specify a target-specific data layout string that describes how data is to be laid out in memory.
Triple: a helper class for working with autoconf configuration names (they used to contain exactly three fields).

Overview of routines
  extdef:
      TYPESPEC notype_declarator ';'
        { if (TRACEON) printf("7 "); set_global_vars($2); }
    | notype_declarator
        { if (TRACEON) printf("10 "); cur_scope++; set_scope_and_offset_of_param($1); code_gen_func_header($1); }
      '{' xdecls
        { if (TRACEON) printf("10.5 "); set_local_vars($1); }
      stmts
        { if (TRACEON) printf("11 "); pop_up_symbol(cur_scope); cur_scope--; code_gen_at_end_of_function_body($1); }
    ;

Overview of routines
  extdef:
      TYPESPEC notype_declarator ';'
        { if (TRACEON) printf("7 "); set_global_vars($2); }
    | TYPESPEC notype_declarator
        { if (TRACEON) printf("10 "); cur_scope++; set_scope_and_offset_of_param($2); code_gen_func_header($2); }
      '{' xdecls
        { if (TRACEON) printf("10.5 "); set_local_vars($2); }
      stmts
        { if (TRACEON) printf("11 "); pop_up_symbol(cur_scope); cur_scope--; code_gen_at_end_of_function_body($2); }
    ;

Code generation with header
File pointer: f_llvm
Both C89/90 and C99 still officially support K&R-style declarations such as:
  void f(a) float a; { /* ... */ }

Global variable: int, float, double
Format of the clang output:
  @variable_name = linkage_type global variable_type value, alignment
32-bit x86 alignment:
  char: 1-byte aligned
  short: 2-byte aligned
  int: 4-byte aligned
  long: 4-byte aligned
  float: 4-byte aligned
  double: 8-byte aligned
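The alignment list above can be captured in a small lookup, sketched here in C. The function name x86_32_align is hypothetical and not part of the original slides; it covers only the six types listed and returns 0 for anything else.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: alignment in bytes of a C type name under the
   32-bit x86 rules listed on the slide. Returns 0 for unlisted types. */
int x86_32_align(const char *c_type)
{
    assert(c_type != NULL);
    if (strcmp(c_type, "char") == 0)   return 1;
    if (strcmp(c_type, "short") == 0)  return 2;
    if (strcmp(c_type, "int") == 0)    return 4;
    if (strcmp(c_type, "long") == 0)   return 4;
    if (strcmp(c_type, "float") == 0)  return 4;
    if (strcmp(c_type, "double") == 0) return 8;
    return 0; /* type not covered by the slide */
}
```

A code generator could pass the result straight into the "align N" field of a global or alloca line.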

Code generation with global vars
Only the integer type without an initial value is implemented.
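A minimal sketch of what such a routine might look like, assuming the output goes to a FILE pointer such as f_llvm. The name emit_global_int is hypothetical, and the "common" linkage and zero value are assumptions modeled on what older clang releases print for an uninitialized global int.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter for an uninitialized global int.
   Assumption: "common" linkage and a zero value, as older clang
   releases print; int is 4-byte aligned on 32-bit x86. */
void emit_global_int(FILE *f_llvm, const char *name)
{
    assert(f_llvm != NULL && name != NULL && strlen(name) > 0);
    fprintf(f_llvm, "@%s = common global i32 0, align 4\n", name);
}
```

For a declaration such as "int a;" the call emit_global_int(f_llvm, "a") writes one line in the global-variable format.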

Local variable: int, float, double
Format of the clang output:
  %variable_name = alloca variable_type, alignment

Set up local variables
Only the integer type without an initial value is implemented.
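The local case can be sketched the same way. The name emit_local_int is hypothetical; the sketch assumes the caller has already written the function header, so the alloca line lands in the entry block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emitter for an uninitialized local int: reserves a
   4-byte-aligned stack slot named after the source variable. */
void emit_local_int(FILE *f_llvm, const char *name)
{
    assert(f_llvm != NULL && name != NULL && strlen(name) > 0);
    fprintf(f_llvm, "  %%%s = alloca i32, align 4\n", name);
}
```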

Function
Format of the clang output:
  define return_type @function_name(parm_type %parm_name) function_attributes {
  entry:
    %parm_name.addr = alloca parm_type, alignment
    store parm_type %parm_name, parm_type* %parm_name.addr, alignment
    ret return_type value
  }

Code generation: function header
Only an integer return type with no parameters is implemented.

Code generation: function end
Only an integer return type is implemented.
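Under the same restrictions (integer return type, no parameters), the header and end routines might be sketched as follows. The names emit_func_header and emit_func_end are hypothetical stand-ins, not the deck's code_gen_func_header and code_gen_at_end_of_function_body, whose real signatures take a symbol from the parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: open an int-returning, parameterless function and
   label its first basic block "entry". */
void emit_func_header(FILE *f_llvm, const char *name)
{
    assert(f_llvm != NULL && name != NULL && strlen(name) > 0);
    fprintf(f_llvm, "define i32 @%s() {\nentry:\n", name);
}

/* Hypothetical: return the given value (a constant such as "0" or an
   SSA name such as "%3") and close the function body. */
void emit_func_end(FILE *f_llvm, const char *value)
{
    assert(f_llvm != NULL && value != NULL && strlen(value) > 0);
    fprintf(f_llvm, "  ret i32 %s\n}\n", value);
}
```

Calling emit_func_header(f_llvm, "main") followed by emit_func_end(f_llvm, "0") produces a complete, empty function definition.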

Arithmetic operation: Add
Add with 2 operands. Format of the clang output, where %|@ stands for a local (%) or global (@) name:
  %SSA_form_temp_var = load variable_type* %|@var, alignment
  %SSA_form_temp_var = add nuw nsw variable_type op1, op2
  store variable_type %SSA_form_temp_var, variable_type* %|@result, alignment
nuw and nsw stand for No Unsigned Wrap and No Signed Wrap.

Arithmetic operation: Add
Add with 3 operands: a = b + c + d;
Add with 4 operands: a = b + c + d + e;
[Figures: clang output for each statement.]

Grammar
  expr_no_commas:
      primary { }
    | expr_no_commas '+' expr_no_commas { }
    | expr_no_commas '=' expr_no_commas { }
    | expr_no_commas '*' expr_no_commas { }
    ;
  primary:
      IDENTIFIER { }
    | CONSTANT { }
    | STRING { }
    | primary PLUSPLUS { }
    ;

Grammar
  expr_no_commas:
      primary { }
    | expr_no_commas '+' after_expr_no_commas { }
    | expr_no_commas '=' expr_no_commas { }
    | expr_no_commas '*' after_expr_no_commas { }
    ;
  after_expr_no_commas:
      primary { }
    ;
  primary:
      IDENTIFIER { }
    | CONSTANT { }
    | STRING { }
    | primary PLUSPLUS { }
    ;
Callouts attached to the actions on the slide: load operand 1, store result, handle operand 2, load operand 2, handle variable, handle int, handle string.

Type conflict
Solution: change the grammar, or use a variable to record the value.
  primary:
      IDENTIFIER { }
    | CONSTANT { }
    | STRING { }
    | primary PLUSPLUS { }
    ;

Handle SSA
A global counter stores the current SSA value number.
The +, -, *, and / instructions are implemented with SSA temporary variables.
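A global SSA counter can be sketched in a few lines of C. The names ssa_counter, new_ssa_temp, and reset_ssa_counter are hypothetical. The reset exists because LLVM numbers unnamed values consecutively within each function, so the counter has to start over for every function body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical global counter holding the next unused SSA number. */
static int ssa_counter = 0;

/* Start numbering again; call this when a new function begins. */
void reset_ssa_counter(void)
{
    ssa_counter = 0;
}

/* Write a fresh temporary name such as "%0" into buf and return buf.
   Each call consumes one number, so no name is ever assigned twice. */
char *new_ssa_temp(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "%%%d", ssa_counter++);
    return buf;
}
```

Every load and arithmetic instruction asks for a new name, which keeps the emitted code in single-assignment form without any renaming pass.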

Load variable: int
Global and local variables. Only the integer type is implemented.

Add operation: int
Global variables store the operand values.
The add instruction is handled with SSA temporary variables.

Add operation: int (cont.)

Implement instruction
Use a node to store each operand and SSA variable, e.g.:
  fprintf(f_llvm, "%s = load %s* %s, align %d\n", SSA, type, var, align);
  fprintf(f_llvm, "%s = add nsw %s %s, %s\n", addssa, type, op1, op2);
  fprintf(f_llvm, "store %s %s, %s* %s, align %d\n", type, var, type, result, align);
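Put together, the three fprintf calls can translate a statement such as a = b + c. The sketch below is an assumption-laden illustration, not the deck's code: emit_load, emit_add_assign, and next_temp are hypothetical names, operands are passed as IR names with their sigil ("@b" for a global, "%b" for a local), and only i32 is handled. It uses the older "load i32* @b" spelling shown on the slides; newer LLVM releases expect the loaded type to be written explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter for numbered SSA temporaries (%0, %1, ...). */
static int next_temp = 0;

/* Load an int variable into a fresh temporary; returns its number. */
static int emit_load(FILE *f_llvm, const char *var)
{
    int t = next_temp++;
    fprintf(f_llvm, "  %%%d = load i32* %s, align 4\n", t, var);
    return t;
}

/* Emit IR for `result = op1 + op2`, all ints: two loads, one add
   into a new temporary, then a store back to the result variable. */
void emit_add_assign(FILE *f_llvm, const char *result,
                     const char *op1, const char *op2)
{
    assert(f_llvm != NULL && strlen(result) > 0);
    assert(strlen(op1) > 0 && strlen(op2) > 0);
    int lhs = emit_load(f_llvm, op1);
    int rhs = emit_load(f_llvm, op2);
    int sum = next_temp++;
    fprintf(f_llvm, "  %%%d = add nsw i32 %%%d, %%%d\n", sum, lhs, rhs);
    fprintf(f_llvm, "  store i32 %%%d, i32* %s, align 4\n", sum, result);
}
```

With globals a, b, and c, the call emit_add_assign(f_llvm, "@a", "@b", "@c") writes four lines: two loads, the add, and the store.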

Store result: int
Global and local variables. Only the integer type and the add operation are implemented.

Optimized IR
[Figure: clang output compared with the self-made compiler's output.]

Optimized IR (cont.)
a = a + 1 + 2 + 3 + 4;
a = 1 + 2 + 3 + 4 + a;
[Figures: clang output for each statement.]

Unimplemented parts
  Initializers in declarations
  Precedence: a + (b + c)
  Other types: char, string, float, double
  Function calls
  Signed and unsigned
  If-then-else
  printf

Char, string
[Figure: clang output.]

Type conversion
[Figure: clang output.]

printf
[Figure: clang output.]

Reference
LLVM Language Reference Manual, http://llvm.org/docs/langref.html
John R. Levine, Tony Mason & Doug Brown, lex & yacc, 2nd Edition, O'Reilly, ISBN: 1-56592-000-7

First compiler?
Bootstrapping: http://en.wikipedia.org/wiki/bootstrapping_%28compilers%29
History of compiler construction: http://en.wikipedia.org/wiki/history_of_compiler_writing

A compiler is itself software
[Diagram: components combined with a compiler; labels: machine code, assembler.]