How Compiling and Compilers Work

Dr. Axel Kohlmeyer
Research Professor, Department of Mathematics
Associate Director, Institute for Computational Science
Assistant Vice President for High-Performance Computing
Temple University, Philadelphia, USA
External Associate HPC
International Centre for Theoretical Physics, Trieste, Italy
a.kohlmeyer@temple.edu

Representing Real Numbers (1)
- Real numbers have unlimited precision
- Yet computers think digitally, i.e. in integer math
  => only a fixed range of numbers can be represented by a fixed number of bits
  => the distance between two integers is 1
- We can reduce the distance through fractions (= fixed point), but that also reduces the range
          16-bit    32-bit         64-bit              28-bit / 4-bit     22-bit / 10-bit
    Min.  -32768    -2147483648    ~ -9.2233 * 10^18   -16777216.0000     -2048.000000
    Max.   32767     2147483647    ~  9.2233 * 10^18    16777215.9375     ~ 2047.999023
    Dist.  1         1             1                    0.0625            0.0009765625

Representing Real Numbers (2)
- We need a way to represent a wider range of numbers with the same number of bits
- We need a way to represent numbers with a reasonable amount of precision (distance)
- The same relative precision is often sufficient:
  => Scientific notation: +/-(mantissa) * (base)^(+/-(exponent))
- Mantissa -> fixed-point number (1.0 <= x < 2.0)
- Base -> 2
- Exponent -> a small integer (few bits)

IEEE 754 Floating-point Numbers
- The IEEE 754 standard defines: storage format, result of operations, special values (infinity, overflow, invalid number), error handling
  => portability of compute kernels ensured
- Numbers are defined as bit patterns with a sign bit, an exponent field, and a fraction field
- Single precision: 8-bit exponent, 23-bit fraction
- Double precision: 11-bit exponent, 52-bit fraction

Values of Floating-Point Numbers
- Value: (1 + {mantissa} / 2^{fraction bits}) * 2^({exponent} - {bias})
  with 0 <= {mantissa} < 2^{fraction bits} and {exponent} > 0
- Special case: 0.0 is all bits set to zero
- Special case: -0.0 is like 0.0 but the sign bit is set
- More special cases: Inf, -Inf, NaN, -NaN
- Single precision: ~±1.2*10^-38 < x < ~±3.4*10^38, relative precision: ~7 decimal digits
- Double precision: ~±2.2*10^-308 < x < ~±1.8*10^308, relative precision: ~15 decimal digits

Density of Floating-point Numbers
- How can we represent so many more numbers in floating point than in integer?
- We don't! The number of unique bit patterns has to be the same as with integers of the same bitness
- There are 8,388,607 single-precision numbers in 1.0 < x < 2.0, but only 16,383 in 1023.0 < x < 1024.0
  => absolute precision depends on the magnitude
  => some numbers are not represented exactly
  => approximated using rounding mode (nearest)

Floating-Point Math Pitfalls
- Floating-point math is commutative, but not associative!
- Example (single precision):
    1.0 + (1.5*10^38 + (-1.5*10^38)) = 1.0
    (1.0 + 1.5*10^38) + (-1.5*10^38) = 0.0
  => the result of a summation depends on the order in which the numbers are summed up
  => results may change significantly if a compiler changes the order of operations for optimization
  => prefer adding numbers of the same magnitude
  => avoid subtracting very similar numbers

Floating Point Comparison
- Floating-point results are usually inexact
  => comparing for equality is dangerous
- Example: don't use a floating-point number for controlling a loop count. Integers are made for it.
- It is OK to use exact comparison:
  - when results have to be bitwise identical
  - to prevent division-by-zero errors
  => otherwise compare against the expected absolute error
  => don't expect higher accuracy than possible

Pre-process / Compile / Link
- Creating an executable includes multiple steps
- The compiler (gcc) is a wrapper for several commands that are executed in succession
- The compiler flags similarly fall into categories and are handed down to the respective tools
- The wrapper selects the compiler language from the source file name, but always links its own runtime (e.g. gcc links the C runtime even when it compiles a C++ source file)
- We will look into a C example first, since this is the language the OS is (mostly) written in

A simple C Example
Consider the minimal C program 'hello.c':
    #include <stdio.h>
    int main(int argc, char **argv)
    {
        printf("hello world\n");
        return 0;
    }
i.e.: what happens if we do:
    > gcc -o hello hello.c
    (try: gcc -v -o hello hello.c)

Step 1: Pre-processing
- Pre-processing is mandatory in C (and C++)
- Pre-processing will handle '#' directives:
  - file inclusion with support for nested inclusion
  - conditional compilation and macro expansion
- In this case: /usr/include/stdio.h, and all files included by it, are inserted and the contained macros expanded
- Use the -E flag to stop after pre-processing:
    > cc -E -o hello.pp.c hello.c

Step 2: Compilation
- Compiler converts a high-level language into the specific instruction set of the target CPU
- Individual steps:
  - Parse text (lexical + syntactical analysis)
  - Do language-specific transformations
  - Translate to internal representation units (IRs)
  - Optimization (reorder, merge, eliminate, transform)
  - Replace IRs with chunks of assembly language
- Try: > gcc -S hello.c (produces hello.s)

Compilation cont'd
Assembly produced for hello.c:
            .file   "hello.c"
            .section        .rodata
    .LC0:
            .string "hello, world!"
            .text
    .globl main
            .type   main, @function
    main:
            pushl   %ebp
            movl    %esp, %ebp
            andl    $-16, %esp
            subl    $16, %esp
            movl    $.LC0, (%esp)
            call    puts
            movl    $0, %eax
            leave
            ret
            .size   main, .-main
            .ident  "GCC: (GNU) 4.5.1 20100924 (Red Hat 4.5.1-4)"
            .section        .note.GNU-stack,"",@progbits
- gcc replaced printf with puts
- Try: gcc -fno-builtin -S hello.c

Step 3: Assembler / Step 4: Linker
- Assembler (as) translates assembly to binary
- Creates so-called object files (in ELF format)
    Try: > gcc -c hello.c
    Try: > nm hello.o
        00000000 T main
                 U puts
- Linker (ld) puts the binary together with startup code and required libraries
- Final step, result is an executable
    Try: > gcc -o hello hello.o

Example 2: exp.c (Adding Libraries)
    #include <math.h>
    #include <stdio.h>
    int main(int argc, char **argv)
    {
        double a = 2.0;
        printf("exp(2.0)=%f\n", exp(a));
        return 0;
    }
> gcc -o exp exp.c
  Fails with "undefined reference to 'exp'". Add: -lm
> gcc -O3 -o exp exp.c
  Works, because at a high optimization level the compiler can evaluate exp(2.0) at compile time, so no call into the math library remains.

Symbols in Object Files & Visibility
- Compiled object files have multiple sections and a symbol table describing their entries:
  - Text : this is executable code
  - Data : pre-allocated variables storage
  - Constants : read-only data
  - Undefined : symbols that are used but not defined
  - Debug : debugger information (e.g. line numbers)
- Entries in the object files can be inspected with either the nm tool or the readelf command

Example File: visibility.c
    #include <stdio.h>     /* printf */
    #include <stdlib.h>    /* abs */
    static const int val1 = -5;   /* local, read-only:  'r' */
    const int val2 = 10;          /* global, read-only: 'R' */
    static int val3 = -20;        /* local data:        'd' */
    int val4 = -15;               /* global data:       'D' */
    extern int errno;             /* defined elsewhere: 'U' */
    static int add_abs(const int v1, const int v2)
    {
        return abs(v1) + abs(v2);
    }
    int main(int argc, char **argv)
    {
        int val5 = 20;            /* on the stack, no symbol */
        printf("%d / %d / %d\n",
               add_abs(val1, val2),
               add_abs(val3, val4),
               add_abs(val1, val5));
        return 0;
    }
nm visibility.o:
    00000000 t add_abs
             U errno
    00000024 T main
             U printf
    00000000 r val1
    00000004 R val2
    00000000 d val3
    00000004 D val4

What Happens During Linking?
- Historically, the linker combines a startup object (crt1.o) with all compiled or listed object files, the C library (libc) and a finish object (crtn.o) into an executable (a.out)
- With current compilers it is more complicated
- The linker then builds the executable by matching undefined references with available entries in the symbol tables of the objects
- crt1.o has an undefined reference to main, thus C programs start at the main() function

What is Different in Fortran?
- Basic compilation principles are the same
  => preprocess, compile, assemble, link
- In Fortran, symbols are case insensitive
  => most compilers translate them to lower case
- In Fortran, symbol names may be modified to make them different from C symbols (e.g. by appending one or more underscores)
- The Fortran entry point is not main (no arguments)
  PROGRAM => MAIN__ (in gfortran)
- A C-like main() is provided as startup code (to store args)

Pre-processing in C and Fortran
- Pre-processing is mandatory in C/C++
- Pre-processing is optional in Fortran
- Fortran pre-processing is enabled implicitly via an upper-case file name extension: name.F, name.F90, name.FOR
- Legacy Fortran packages often use /lib/cpp:
    /lib/cpp -C -P -traditional -o name.f name.F
  -C : keep comments (may be legal Fortran code)
  -P : no '#line' markers (not legal Fortran syntax)
  -traditional : don't collapse whitespace (collapsing is incompatible with fixed format sources)

Common Compiler Flags
- Optimization: -O0, -O1, -O2, -O3, -O4, ...
  - Compiler will try to rearrange generated code so it executes faster
  - Aggressive compiler optimization may not always execute faster or may miscompile code
  - High optimization level (> 2) may alter semantics
- Preprocessor flags: -I/some/dir -DSOM_SYS
- Linker flags: -L/some/other/dir -lm
  -> search for libm.so/libm.a also in /some/other/dir

Makefiles: Concepts
- Simplify building large code projects
- Speed up re-compile on small changes
- Consistent build command: make
- Platform-specific configuration via variable definitions

Makefiles: Syntax
- Rules (the command line must start with a 'Tab' character):
    target: prerequisites
    	command
- Variables:
    NAME= VALUE1 VALUE2 value3
- Comments:
    # this is a comment
- Special keywords:
    include linux.mk

Makefiles: Rules Examples
    # first target is default:
    all: hello sqrt
    hello: hello.c
    	cc -o hello hello.c
    sqrt: sqrt.o
    	f77 -o sqrt sqrt.o
    sqrt.o: sqrt.f
    	f77 -o sqrt.o -c sqrt.f

Makefiles: Variables Examples
    # uncomment as needed
    CC= gcc
    #CC= icc -i-static
    LD=$(CC)
    CFLAGS= -O2
    hello: hello.o
    	$(LD) -o hello hello.o
    hello.o: hello.c
    	$(CC) -c $(CFLAGS) hello.c

Makefiles: Automatic Variables
    CC= gcc
    CFLAGS= -O2
    howdy: hello.o yall.o
    	$(CC) -o $@ $^
    hello.o: hello.c
    	$(CC) -c $(CFLAGS) $<
    yall.o: yall.c
    	$(CC) -c $(CFLAGS) $<

Makefiles: Pattern Rules
    OBJECTS=hello.o yall.o
    howdy: $(OBJECTS)
    	$(CC) -o $@ $^
    hello.o: hello.c
    yall.o: yall.c
    # Rule to translate all XXX.c files to XXX.o files
    .c.o:
    	$(CC) -o $@ -c $(CFLAGS) $<

Makefiles: Special Targets
    # Clear list of all known suffixes
    .SUFFIXES:
    # Register new suffixes
    .SUFFIXES: .o .F
    # Tell make to not look for these files
    .PHONY: clean install
    .F.o:
    	$(CPP) $(CPPFLAGS) $< -o $*.f
    	$(FC) -o $@ -c $(FFLAGS) $*.f
    clean:
    	rm -f *.f *.o

Makefiles: Calling make
- Override variables: make CC=icc CFLAGS='-O2 -unroll'
- Dry run (don't execute): make -n
- Don't stop at errors (dangerous): make -i
- Parallel make (requires careful design): make -j2
- Use alternative Makefile: make -f make.pgi